
Compare commits


3 Commits

16 changed files with 2139 additions and 28 deletions

View File

@@ -7,7 +7,8 @@ internal sealed class AnchorConfig
{
public string ApiKey { get; set; } = "";
public string Model { get; set; } = "qwen/qwen3.5-397b-a17b";
public string Provider { get; set; } = "openrouter";
public string Endpoint { get; set; } = "https://openrouter.ai/api/v1";
// ── Persistence ──────────────────────────────────────────────────────
private static string ConfigPath =>

View File

@@ -5,6 +5,7 @@ namespace AnchorCli.OpenRouter;
/// </summary>
internal sealed class TokenTracker
{
public string Provider { get; set; } = "Unknown";
public long SessionInputTokens { get; private set; }
public long SessionOutputTokens { get; private set; }
public int RequestCount { get; private set; }
@@ -23,7 +24,6 @@ internal sealed class TokenTracker
/// <summary>Fixed USD per API request.</summary>
public decimal RequestPrice { get; set; }
/// <summary>
/// Record usage from one response (may span multiple LLM rounds).
/// </summary>

PROVIDERS.md Normal file
View File

@@ -0,0 +1,293 @@
# Provider Support Plan
## Current Problems
1. **OpenRouter Hardcoded**: Endpoint, headers, and pricing API calls are hardcoded to OpenRouter
2. **Config Ineffective**: SetupTui allows "custom endpoint" but Program.cs ignores it
3. **Token Count**: Token usage tracking only works with OpenRouter response headers
4. **Pricing Only for One Provider**: Models list shows pricing, but only when using OpenRouter
---
## Goals
1. Make the system **endpoint-agnostic**
2. Support pricing/token tracking for **multiple providers**
3. Keep **OpenRouter as the default** (familiar)
4. Allow users to configure any OpenAI-compatible endpoint
5. Show pricing/token info **only when available** for each provider
---
## Provider Categories
### Tier 1: Native Support (Built-in)
- OpenRouter (default)
- Ollama (local, no auth)
- Groq (high-speed inference)
- Anthropic (native or via API)
- OpenAI (official API)
### Tier 2: Config-Based Support
- Cerebras
- DeepSeek
- Any OpenAI-compatible endpoint that supports custom headers
### Tier 3: Manual Configuration Required
- Self-hosted endpoints
- Corporate proxies
- Custom middleware layers
---
```csharp
// Example: provider interface (illustrative sketch)
internal interface IProvider
{
    // Get pricing info from the provider's API
    Task<List<ModelPricing>> GetModelsAsync(string apiKey);

    // Get token usage from a response
    Task<TokenUsage> GetTokensFromResponseAsync(HttpResponseMessage response);

    // Add provider-specific headers if needed
    void AddHeaders(HttpRequestMessage request, string apiKey);
}
```
**Supported Implementations:**
- `OpenRouterProvider` (uses `/api/v1/models` + `x-total-tokens`)
- `GroqProvider` (uses Groq's pricing API + response headers)
- `OllamaProvider` (free tier, no pricing lookup, basic token counting)
- `OpenAIProvider` (uses OpenAI's model list + token counting)
- `GenericProvider` (fallback for any OpenAI-compatible endpoint)
**Configuration:**
Store provider selection in `anchor.config.json`:
```json
{
  "apiKey": "your-key",
  "model": "qwen3.5-27b",
  "endpoint": "https://openrouter.ai/api/v1",
  "provider": "openrouter"
}
```
Auto-detect provider from endpoint URL if not specified.
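The auto-detection can be as simple as substring matching on the endpoint URL (a sketch; `ProviderDetector` is an illustrative helper, and the provider names follow the config example above):

```csharp
using System;

static class ProviderDetector
{
    // Infer the provider name from the endpoint URL; "generic" is the fallback.
    public static string Detect(string? endpoint)
    {
        if (string.IsNullOrWhiteSpace(endpoint)) return "openrouter"; // default provider
        var url = endpoint.ToLowerInvariant();
        if (url.Contains("openrouter")) return "openrouter";
        if (url.Contains("groq")) return "groq";
        if (url.Contains("openai.com")) return "openai";
        if (url.Contains("localhost") || url.Contains("11434")) return "ollama";
        return "generic";
    }
}
```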
---
## Pricing System
### Current State
- Uses OpenRouter's `/api/v1/models` endpoint
- Displays pricing in a table during startup
- Only works when using OpenRouter
### Improved Behavior
**When endpoint matches known provider:**
1. Fetch pricing from that provider's API
2. Display pricing in the startup table
3. Show per-prompt costs in chat output
**When endpoint is generic/unsupported:**
1. Skip API call (no pricing lookup)
2. Display `---` or `$` placeholders
3. Optional: Show "Pricing not available" note
**User Feedback:**
- Show clear messaging: "Pricing data loaded from OpenRouter"
- Show: "Pricing not available for this endpoint" (for unsupported)
- Don't break chat functionality if pricing fails
### Pricing Data Format
Store in `ModelPricing` class:
```csharp
class ModelPricing
{
    public string ModelId { get; set; } = "";
    public decimal InputPricePerMTokens { get; set; }
    public decimal OutputPricePerMTokens { get; set; }
    public decimal? CacheCreationPricePerMTokens { get; set; } // if supported
}
```
---
## Token Tracking System
### Current State
- Uses `x-total-tokens` from OpenRouter headers
- Only works with OpenRouter responses
### Multi-Provider Strategy
**OpenRouter:**
- Use `x-total-tokens` header
- Use `x-response-timing` for latency tracking
**Groq:**
- Use `x-groq-tokens` header
- Use `x-groq-response-time` for latency
**OpenAI:**
- Use `x-ai-response-tokens` header (if available)
- Fall back to response body if needed
**Ollama:**
- No official token counting
- Use output length as proxy estimate
- Optional: Show message token estimates
**Generic/Fallback:**
- Parse `total_tokens` from response JSON
- Fall back to character count estimates
- Show placeholder when unavailable
### Integration Points
**During Chat Session:**
1. After each response, extract tokens from response headers
2. Store in `ChatSession.TokensUsed` object
3. Display in status bar: `Tokens: 128/2048 • Cost: $0.002`
**At Session End:**
1. Show summary: `Total tokens: 1,024 | Total cost: $0.015`
2. Write to session log or history file
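The cost figures shown in the status bar and session summary follow directly from the extracted token counts and the per-million-token prices (a sketch; `CostCalculator` is illustrative, and `requestPrice` covers providers that charge a fixed per-request fee):

```csharp
using System;

static class CostCalculator
{
    // Prices are USD per 1M tokens, as stored by the pricing providers.
    public static decimal CostFor(long inputTokens, long outputTokens,
                                  decimal inputPricePerM, decimal outputPricePerM,
                                  decimal requestPrice = 0m)
    {
        return inputTokens  * inputPricePerM  / 1_000_000m
             + outputTokens * outputPricePerM / 1_000_000m
             + requestPrice; // fixed per-request fee, if any
    }
}
```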
---
## Implementation Roadmap
### Phase 1: Conditional Pricing (Current Issues First)
- [ ] Check if endpoint is OpenRouter before fetching pricing
- [ ] Skip pricing API call for non-OpenRouter endpoints
- [ ] Show placeholder message if pricing not available
- [ ] **Time estimate:** 2 hours
### Phase 2: Provider Configuration
- [ ] Add `provider` field to `AnchorConfig` model
- [ ] Update `SetupTui` to ask "Which provider?" (openrouter, ollama, groq, etc.)
- [ ] Auto-detect provider from endpoint URL (smart default)
- [ ] Write provider to config file on setup
- [ ] **Time estimate:** 3 hours
### Phase 3: Provider Abstraction
- [ ] Create `IPricingProvider` interface
- [ ] Move existing `PricingProvider` to `OpenRouterProvider`
- [ ] Create `GenericPricingProvider` for fallback
- [ ] Add provider factory: `ProviderFactory.Create(providerName)`
- [ ] **Time estimate:** 5 hours
### Phase 4: Token Tracking Enhancement
- [ ] Create `ITokenTracker` interface
- [ ] Implement token extraction for multiple providers
- [ ] Display token usage in status bar
- [ ] Add per-prompt cost calculation
- [ ] **Time estimate:** 6 hours
### Phase 5: Second Provider Implementation
- [ ] Implement `GroqProvider` (similar to OpenRouter)
- [ ] Test with Groq API
- [ ] Update documentation
- [ ] **Time estimate:** 4 hours
### Phase 6: Future-Proofing (Optional)
- [ ] Add plugin system for custom providers
- [ ] Allow users to define custom pricing rules
- [ ] Support OpenRouter-compatible custom endpoints
- [ ] **Time estimate:** 8+ hours
---
## User Configuration Guide
### Automatic Setup
Run `/setup` in the chat or `anchor setup` in CLI:
```
Which provider are you using?
1) OpenRouter (qwen models)
2) Groq (qwen/gemma models)
3) Ollama (local models)
4) OpenAI (gpt models)
5) Custom endpoint
```
### Manual Configuration
Edit `anchor.config.json` directly:
```json
{
  "apiKey": "your-api-key",
  "model": "qwen3.5-27b",
  "endpoint": "https://api.groq.com/openai/v1",
  "provider": "groq"
}
```

The `provider` field is optional; it is auto-detected from the endpoint when omitted. (JSON does not allow comments, so the note lives here rather than in the file.)
### Environment Variables
For custom setup:
```
ANCHOR_ENDPOINT=https://api.groq.com/openai/v1
ANCHOR_PROVIDER=groq
ANCHOR_API_KEY=...
ANCHOR_MODEL=qwen3.5-27b
```
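The precedence rule is: environment variable over config file over built-in default (a sketch; `Resolve` is an illustrative helper, not an existing API):

```csharp
using System;

static class EnvOverrides
{
    // Environment variable wins; otherwise the config value; otherwise the default.
    public static string Resolve(string envName, string? configValue, string fallback)
    {
        var env = Environment.GetEnvironmentVariable(envName);
        if (!string.IsNullOrWhiteSpace(env)) return env;
        if (!string.IsNullOrWhiteSpace(configValue)) return configValue;
        return fallback;
    }
}
```

Usage would look like `EnvOverrides.Resolve("ANCHOR_ENDPOINT", cfg.Endpoint, "https://openrouter.ai/api/v1")`.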
---
## Known Limitations
### Tier 1 Providers (Full Support)
**✓ OpenRouter**
- Pricing: ✓ (native API)
- Tokens: ✓ (response headers)
- Cost tracking: ✓
**✓ Groq** (after Phase 4)
- Pricing: ✓ (will add)
- Tokens: ✓ (response headers)
- Cost tracking: ✓
### Tier 2 Providers (Partial Support)
**○ Ollama**
- Pricing: ○ (free, no lookup needed)
- Tokens: ○ (estimated from output)
- Cost tracking: ○ (placeholder)
**○ OpenAI**
- Pricing: ○ (manual pricing display)
- Tokens: ○ (header extraction)
- Cost tracking: ○ (config-based)
### Tier 3 Providers (Basic Support)
**□ Custom Endpoints**
- Pricing: □ (manual only)
- Tokens: □ (fallback parsing)
- Cost tracking: □ (user-defined)
---
## Future Enhancements
1. **Pricing Database**: Maintain own pricing database (like OpenRouter's)
2. **Cost Estimator**: Predict costs before sending message
3. **Usage Alerts**: Warn user when approaching budget limits
4. **Multi-Model Support**: Compare costs between different providers
5. **Plugin System**: Allow community to add new providers
---
## Success Criteria
- ✅ Users can choose from 3+ providers in setup
- ✅ Pricing displays only for supported endpoints
- ✅ Token tracking works for all Tier 1 providers
- ✅ No breaking changes to existing OpenRouter users
- ✅ Clear documentation on what each provider supports
- ✅ Graceful degradation for unsupported features
---
*Last Updated: 2025-12-23*

View File

@@ -1,4 +1,5 @@
using System.ClientModel;
using AnchorCli.Providers;
using Microsoft.Extensions.AI;
using OpenAI;
using AnchorCli;
@@ -15,10 +16,11 @@ if (args.Length > 0 && args[0].Equals("setup", StringComparison.OrdinalIgnoreCas
}
// ── Config ──────────────────────────────────────────────────────────────
const string endpoint = "https://openrouter.ai/api/v1";
var cfg = AnchorConfig.Load();
string apiKey = cfg.ApiKey;
string model = cfg.Model;
string provider = cfg.Provider ?? "openrouter";
string endpoint = cfg.Endpoint ?? "https://openrouter.ai/api/v1";
if (string.IsNullOrWhiteSpace(apiKey))
{
@@ -26,11 +28,14 @@ if (string.IsNullOrWhiteSpace(apiKey))
return;
}
// ── Fetch model pricing from OpenRouter ─────────────────────────────────
var pricingProvider = new PricingProvider();
var tokenTracker = new TokenTracker();
// ── Create token extractor for this provider ───────────────────────────
var tokenExtractor = ProviderFactory.CreateTokenExtractorForEndpoint(endpoint);
var tokenTracker = new TokenTracker { Provider = tokenExtractor.ProviderName };
// ── Fetch model pricing (only for supported providers) ─────────────────
ModelInfo? modelInfo = null;
if (ProviderFactory.IsOpenRouter(endpoint))
{
await AnsiConsole.Status()
.Spinner(Spinner.Known.BouncingBar)
.SpinnerStyle(Style.Parse("cornflowerblue"))
@@ -38,6 +43,7 @@ await AnsiConsole.Status()
{
try
{
var pricingProvider = new OpenRouterProvider();
modelInfo = await pricingProvider.GetModelInfoAsync(model);
if (modelInfo?.Pricing != null)
{
@@ -48,6 +54,7 @@ await AnsiConsole.Status()
}
catch { /* pricing is best-effort */ }
});
}
// ── Pretty header ───────────────────────────────────────────────────────
AnsiConsole.Write(
@@ -68,9 +75,12 @@ var infoTable = new Table()
.AddColumn(new TableColumn("[dim]Value[/]"));
infoTable.AddRow("[grey]Model[/]", $"[cyan]{Markup.Escape(modelInfo?.Name ?? model)}[/]");
infoTable.AddRow("[grey]Endpoint[/]", $"[blue]OpenRouter[/]");
infoTable.AddRow("[grey]Provider[/]", $"[blue]{tokenExtractor.ProviderName}[/]");
infoTable.AddRow("[grey]Endpoint[/]", $"[dim]{endpoint}[/]");
infoTable.AddRow("[grey]CWD[/]", $"[green]{Markup.Escape(Environment.CurrentDirectory)}[/]");
if (modelInfo?.Pricing != null)
{
var inM = tokenTracker.InputPrice * 1_000_000m;

View File

@@ -0,0 +1,89 @@
using System.Net.Http.Headers;
using System.Text.Json;
namespace AnchorCli.Providers;
/// <summary>
/// Generic token extractor for any OpenAI-compatible endpoint.
/// Tries common header names and JSON body parsing.
/// </summary>
internal sealed class GenericTokenExtractor : ITokenExtractor
{
    public string ProviderName => "Generic";

    public (int inputTokens, int outputTokens)? ExtractTokens(HttpResponseHeaders headers, string? responseBody)
    {
        // Try various common header names
        var headerNames = new[]
        {
            "x-total-tokens",
            "x-ai-response-tokens",
            "x-tokens",
            "x-prompt-tokens",
            "x-completion-tokens"
        };
        foreach (var headerName in headerNames)
        {
            if (headers.TryGetValues(headerName, out var values))
            {
                if (int.TryParse(values.FirstOrDefault(), out var tokens))
                {
                    // Assume all tokens are output if we can't determine split
                    return (0, tokens);
                }
            }
        }

        // Fallback: try parsing from response body JSON
        if (!string.IsNullOrEmpty(responseBody))
        {
            try
            {
                using var doc = JsonDocument.Parse(responseBody);
                var root = doc.RootElement;
                // Try standard OpenAI format: usage.prompt_tokens, usage.completion_tokens
                if (root.TryGetProperty("usage", out var usage))
                {
                    var prompt = usage.TryGetProperty("prompt_tokens", out var p) ? p.GetInt32() : 0;
                    var completion = usage.TryGetProperty("completion_tokens", out var c) ? c.GetInt32() : 0;
                    if (prompt > 0 || completion > 0)
                    {
                        return (prompt, completion);
                    }
                }
            }
            catch
            {
                // Ignore parsing errors
            }
        }
        return null;
    }

    public int? ExtractLatency(HttpResponseHeaders headers)
    {
        // Try various common latency headers
        var headerNames = new[]
        {
            "x-response-time",
            "x-response-timing",
            "x-latency-ms",
            "x-duration-ms"
        };
        foreach (var headerName in headerNames)
        {
            if (headers.TryGetValues(headerName, out var values))
            {
                if (int.TryParse(values.FirstOrDefault(), out var latency))
                {
                    return latency;
                }
            }
        }
        return null;
    }
}

Providers/GroqProvider.cs Normal file
View File

@@ -0,0 +1,61 @@
using System.Net.Http.Headers;
namespace AnchorCli.Providers;
/// <summary>
/// Token extractor for Groq responses.
/// </summary>
internal sealed class GroqTokenExtractor : ITokenExtractor
{
    public string ProviderName => "Groq";

    public (int inputTokens, int outputTokens)? ExtractTokens(HttpResponseHeaders headers, string? responseBody)
    {
        // Groq provides x-groq-tokens header (format: "n;<prompt_tokens>,n;<completion_tokens>")
        if (headers.TryGetValues("x-groq-tokens", out var values))
        {
            var tokenStr = values.FirstOrDefault();
            if (!string.IsNullOrEmpty(tokenStr))
            {
                // Parse format: "n;123,n;45" where first is prompt, second is completion
                var parts = tokenStr.Split(',');
                if (parts.Length >= 2)
                {
                    var inputPart = parts[0].Trim();
                    var outputPart = parts[1].Trim();
                    // Extract numbers after "n;"
                    if (inputPart.StartsWith("n;") && outputPart.StartsWith("n;"))
                    {
                        if (int.TryParse(inputPart[2..], out var input) &&
                            int.TryParse(outputPart[2..], out var output))
                        {
                            return (input, output);
                        }
                    }
                }
            }
        }

        // Fallback: try parsing from response body
        if (!string.IsNullOrEmpty(responseBody))
        {
            // TODO: Parse usage from JSON body if headers aren't available
        }
        return null;
    }

    public int? ExtractLatency(HttpResponseHeaders headers)
    {
        if (headers.TryGetValues("x-groq-response-time", out var values))
        {
            if (int.TryParse(values.FirstOrDefault(), out var latency))
            {
                return latency;
            }
        }
        return null;
    }
}

View File

@@ -0,0 +1,18 @@
using AnchorCli.OpenRouter;
namespace AnchorCli.Providers;
/// <summary>
/// Interface for fetching model pricing information.
/// </summary>
internal interface IPricingProvider
{
    /// <summary>
    /// Fetches pricing info for a specific model.
    /// </summary>
    Task<ModelInfo?> GetModelInfoAsync(string modelId, CancellationToken ct = default);

    /// <summary>
    /// Fetches all available models with pricing.
    /// </summary>
    Task<Dictionary<string, ModelInfo>> GetAllModelsAsync(CancellationToken ct = default);
}

View File

@@ -0,0 +1,25 @@
using System.Net.Http.Headers;
namespace AnchorCli.Providers;
/// <summary>
/// Interface for extracting token usage from provider responses.
/// </summary>
internal interface ITokenExtractor
{
    /// <summary>
    /// Extracts token usage from response headers and/or body.
    /// Returns (inputTokens, outputTokens) or null if unavailable.
    /// </summary>
    (int inputTokens, int outputTokens)? ExtractTokens(HttpResponseHeaders headers, string? responseBody);

    /// <summary>
    /// Gets the latency from response headers (in ms).
    /// </summary>
    int? ExtractLatency(HttpResponseHeaders headers);

    /// <summary>
    /// Gets the provider name for display purposes.
    /// </summary>
    string ProviderName { get; }
}

View File

@@ -0,0 +1,39 @@
using System.Net.Http.Headers;
namespace AnchorCli.Providers;
/// <summary>
/// Token extractor for Ollama responses.
/// Ollama doesn't provide official token counts, so we estimate.
/// </summary>
internal sealed class OllamaTokenExtractor : ITokenExtractor
{
    public string ProviderName => "Ollama";

    public (int inputTokens, int outputTokens)? ExtractTokens(HttpResponseHeaders headers, string? responseBody)
    {
        // Ollama doesn't provide token headers
        return null;
    }

    public int? ExtractLatency(HttpResponseHeaders headers)
    {
        // Ollama doesn't provide latency headers
        return null;
    }

    /// <summary>
    /// Estimates token count from text length (rough approximation).
    /// Assumes ~4 characters per token on average.
    /// </summary>
    public static int EstimateTokens(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0;
        }
        // Rough estimate: 4 characters per token
        return text.Length / 4;
    }
}

View File

@@ -0,0 +1,40 @@
using System.Net.Http.Json;
using System.Text.Json;
using AnchorCli.OpenRouter;
namespace AnchorCli.Providers;
/// <summary>
/// Pricing provider for OpenRouter API.
/// </summary>
internal sealed class OpenRouterProvider : IPricingProvider
{
    private const string ModelsUrl = "https://openrouter.ai/api/v1/models";
    private static readonly HttpClient Http = new();
    private Dictionary<string, ModelInfo>? _models;

    static OpenRouterProvider()
    {
        OpenRouterHeaders.ApplyTo(Http);
    }

    public async Task<Dictionary<string, ModelInfo>> GetAllModelsAsync(CancellationToken ct = default)
    {
        if (_models != null) return _models;
        var response = await Http.GetAsync(ModelsUrl, ct);
        response.EnsureSuccessStatusCode();
        var json = await response.Content.ReadAsStringAsync(ct);
        var result = JsonSerializer.Deserialize(json, AppJsonContext.Default.ModelsResponse);
        _models = result?.Data?.ToDictionary(m => m.Id) ?? [];
        return _models;
    }

    public async Task<ModelInfo?> GetModelInfoAsync(string modelId, CancellationToken ct = default)
    {
        var models = await GetAllModelsAsync(ct);
        return models.GetValueOrDefault(modelId);
    }
}

View File

@@ -0,0 +1,42 @@
using System.Net.Http.Headers;
namespace AnchorCli.Providers;
/// <summary>
/// Token extractor for OpenRouter responses.
/// </summary>
internal sealed class OpenRouterTokenExtractor : ITokenExtractor
{
    public string ProviderName => "OpenRouter";

    public (int inputTokens, int outputTokens)? ExtractTokens(HttpResponseHeaders headers, string? responseBody)
    {
        // OpenRouter provides x-total-tokens header
        if (headers.TryGetValues("x-total-tokens", out var values))
        {
            // Note: OpenRouter only provides total tokens, not split
            // We'll estimate split based on typical ratios if needed
            if (long.TryParse(values.FirstOrDefault(), out var total))
            {
                // For now, return total as output (placeholder until we have better splitting)
                // In practice, you'd need to track input separately from the request
                return (0, (int)total);
            }
        }
        return null;
    }

    public int? ExtractLatency(HttpResponseHeaders headers)
    {
        if (headers.TryGetValues("x-response-timing", out var values))
        {
            if (int.TryParse(values.FirstOrDefault(), out var latency))
            {
                return latency;
            }
        }
        return null;
    }
}

View File

@@ -0,0 +1,70 @@
namespace AnchorCli.Providers;
/// <summary>
/// Factory for creating provider instances based on endpoint or provider name.
/// </summary>
internal static class ProviderFactory
{
    /// <summary>
    /// Creates a token extractor based on the provider name.
    /// </summary>
    public static ITokenExtractor CreateTokenExtractor(string providerName)
    {
        return providerName.ToLowerInvariant() switch
        {
            "openrouter" => new OpenRouterTokenExtractor(),
            "groq" => new GroqTokenExtractor(),
            "ollama" => new OllamaTokenExtractor(),
            _ => new GenericTokenExtractor()
        };
    }

    /// <summary>
    /// Creates a token extractor by auto-detecting from the endpoint URL.
    /// </summary>
    public static ITokenExtractor CreateTokenExtractorForEndpoint(string endpoint)
    {
        if (string.IsNullOrEmpty(endpoint))
        {
            return new GenericTokenExtractor();
        }
        var url = endpoint.ToLowerInvariant();
        if (url.Contains("openrouter"))
        {
            return new OpenRouterTokenExtractor();
        }
        if (url.Contains("groq"))
        {
            return new GroqTokenExtractor();
        }
        if (url.Contains("ollama") || url.Contains("localhost") || url.Contains("127.0.0.1"))
        {
            return new OllamaTokenExtractor();
        }
        return new GenericTokenExtractor();
    }

    /// <summary>
    /// Creates a pricing provider based on the provider name.
    /// Only OpenRouter has a pricing API currently.
    /// </summary>
    public static IPricingProvider? CreatePricingProvider(string providerName)
    {
        return providerName.ToLowerInvariant() switch
        {
            "openrouter" => new OpenRouterProvider(),
            _ => null // Other providers don't have pricing APIs yet
        };
    }

    /// <summary>
    /// Determines if an endpoint is OpenRouter.
    /// </summary>
    public static bool IsOpenRouter(string endpoint) =>
        !string.IsNullOrEmpty(endpoint) && endpoint.Contains("openrouter", StringComparison.OrdinalIgnoreCase);
}

View File

@@ -27,10 +27,85 @@ internal static class SetupTui
AnsiConsole.WriteLine();
// ── Provider ────────────────────────────────────────────────────
var providers = new List<(string Value, string Description)>
{
("openrouter", "default, pricing support"),
("groq", "high-speed inference"),
("ollama", "local, no auth required"),
("openai", "official OpenAI API"),
("custom", "generic OpenAI-compatible endpoint")
};
string currentProvider = config.Provider ?? "openrouter";
AnsiConsole.MarkupLine($" Current provider: [cyan]{Markup.Escape(currentProvider)}[/]");
var selectedProviderChoice = AnsiConsole.Prompt(
new SelectionPrompt<(string Value, string Description)>()
.Title(" Select a provider:")
.UseConverter(p => p.Value + (string.IsNullOrEmpty(p.Description) ? "" : $" [dim]({p.Description})[/]"))
.AddChoices(providers));
config.Provider = selectedProviderChoice.Value;
if (config.Provider == "custom")
{
string customEndpoint = AnsiConsole.Prompt(
new TextPrompt<string>(" Enter endpoint URL:")
.DefaultValue(config.Endpoint)
.AllowEmpty());
if (!string.IsNullOrWhiteSpace(customEndpoint))
{
config.Endpoint = customEndpoint.Trim();
}
}
else
{
config.Endpoint = config.Provider.ToLowerInvariant() switch
{
"openrouter" => "https://openrouter.ai/api/v1",
"groq" => "https://api.groq.com/openai/v1",
"ollama" => "http://localhost:11434/v1",
"openai" => "https://api.openai.com/v1",
_ => config.Endpoint
};
}
AnsiConsole.WriteLine();
// ── Model ─────────────────────────────────────────────────────
AnsiConsole.MarkupLine($" Current model: [cyan]{Markup.Escape(config.Model)}[/]");
var models = new List<(string Value, string Description)>
var models = config.Provider.ToLowerInvariant() switch
{
"groq" => new List<(string Value, string Description)>
{
("llama-3.3-70b-versatile", "fast, powerful"),
("llama-3.1-8b-instant", "very fast"),
("mixtral-8x7b-32768", "sparse MoE"),
("gemma2-9b-it", "Google's Gemma"),
("Custom...", "")
},
"ollama" => new List<(string Value, string Description)>
{
("llama3.2", "Meta's Llama 3.2"),
("qwen2.5", "Alibaba Qwen"),
("mistral", "Mistral AI"),
("codellama", "code-focused"),
("Custom...", "")
},
"openai" => new List<(string Value, string Description)>
{
("gpt-4o", "most capable"),
("gpt-4o-mini", "fast, affordable"),
("o1-preview", "reasoning model"),
("Custom...", "")
},
_ => new List<(string Value, string Description)>
{
("qwen/qwen3.5-397b-a17b", "smart, expensive"),
("qwen/qwen3.5-122b-a10b", "faster"),
@@ -38,6 +113,7 @@ internal static class SetupTui
("qwen/qwen3.5-flash-02-23", "cloud, fast"),
("qwen/qwen3.5-plus-02-15", "cloud, smart"),
("Custom...", "")
}
};
string selectedModel = AnsiConsole.Prompt(

File diff suppressed because it is too large

docs/NEW_SYSTEM_DESIGN.md Normal file
View File

@@ -0,0 +1,134 @@
# Advanced AI Agent CLI System Design
This document outlines the architecture for a completely new, built-from-scratch AI Agent Command Line Interface system, inspired by the lessons learned from the `Anchor CLI` refactoring.
## 1. Core Principles
* **Event-Driven UI & Decoupled State:** The UI and display layers communicate exclusively through an asynchronous Event Bus.
* **Explicit Control Flow:** Core agent execution utilizes a Mediator pattern (Request/Response) for predictable, traceable control flow rather than pure event spaghetti.
* **Dependency Injection:** A robust IoC container manages lifecycles and dependencies.
* **Pluggable Architecture:** Everything—from the LLM provider to the UI renderer and memory storage—is an injectable plugin.
* **Stateless Components:** Services maintain minimal internal state. State is managed centrally in a session or context store with immutable snapshots.
* **Test-First Design:** Complete absence of static delegates and global mutable state ensures every component is unit-testable in isolation.
* **Pervasive Cancellation:** Every asynchronous operation accepts a `CancellationToken` for graceful termination.
## 2. High-Level Architecture & Project Structure (AOT-Ready)
The system is structurally divided into three distinct C# projects to enforce decoupling, testability, and future-proof design, while maintaining strict compatibility with **.NET Native AOT** compilation for single-file, zero-dependency distribution on Linux/Windows.
### 2.1 Project: `Anchor.AgentFramework` (Class Library)
The core logic and abstractions. It has **no knowledge** of the console, the file system, or specific LLM SDKs.
* **Contains:** Interfaces (`IEventBus`, `IMediator`, `IAgentAvatar`), Memory Management (`ISessionManager`), Execution Loop (`ChatCoordinator`), and the `ToolRunner`.
* **Responsibilities:** Orchestrating the agent's thought process, managing state, and firing events.
### 2.2 Project: `Anchor.Providers` (Class Library)
The vendor-specific implementations for Language Models.
* **Contains:** `OpenAIAvatar`, `AnthropicAvatar`.
* **Responsibilities:** Translating the framework's semantic requests into vendor-specific API calls (e.g., mapping `ToolResult` to OpenAI's tool response format) via SDKs like `Azure.AI.OpenAI`.
### 2.3 Project: `Anchor.Cli` (Console Application)
The "Hosting Shell" and the physical "Senses/Hands" of the application.
* **Contains:** `Program.cs` (Composition Root), `RichConsoleRenderer`, `ConsoleInputDispatcher`, and concrete Tool implementations (e.g., `FileSystemTool`, `CmdTool`).
* **Responsibilities:** Wiring up Dependency Injection, reading from stdin, rendering UI/spinners to stdout, and executing side-effects on the host OS.
### 2.4 Logical Layers
Across these projects, the system operates in five primary layers:
1. **Hosting & Lifecycle (The Host)**
2. **Event & Messaging Backbone (The Bus)**
3. **State & Memory Management (The Brain)**
4. **I/O & User Interface (The Senses & Voice)**
5. **Execution & Tooling (The Hands)**
### 2.5 Dependency Injection Graph
```text
Anchor.Cli (Composition Root - Program.cs)
├── IEventBus → AsyncEventBus
├── IMemoryStore → VectorMemoryStore / SQLiteMemoryStore
├── ISessionManager → ContextAwareSessionManager
│ └── ICompactionStrategy → SemanticCompactionStrategy
├── IUserInputDispatcher → ConsoleInputDispatcher
├── ICommandRegistry → DynamicCommandRegistry
├── IAgentAvatar (LLM Interface) → AnthropicAvatar / OpenAIAvatar
├── IResponseStreamer → TokenAwareResponseStreamer
├── IUiRenderer → RichConsoleRenderer
│ ├── ISpinnerManager → AsyncSpinnerManager
│ └── IStreamingRenderer → ConsoleStreamingRenderer
└── IToolRegistry → DynamicToolRegistry
└── (Injected Tools: FileSystemTool, CmdTool, WebSearchTool)
```
## 3. Component Details
### 3.1 The Messaging Backbone: `IEventBus` and `IMediator` (AOT Safe)
The system utilizes a dual-messaging approach to prevent "event spaghetti":
* **Publish-Subscribe (Events):** Used for things that *happened* and might have multiple or zero listeners (e.g., UI updates, diagnostics).
* `EventBus.PublishAsync(EventBase @event)`
* **Request-Response (Commands):** Used for linear, required actions with a return value.
* `Mediator.Send(IRequest<TResponse> request)`
> [!WARNING]
> Standard `MediatR` relies heavily on runtime reflection for handler discovery, making it **incompatible with Native AOT**. We must use an AOT-safe source-generated alternative, such as the [Mediator](https://github.com/martinothamar/Mediator) library, or implement a simple, source-generated Event/Command bus internally.
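For illustration, a reflection-free publish-subscribe bus can be hand-rolled in a few lines: handlers register as explicit delegates at the composition root, so nothing needs runtime discovery (a sketch of the internal-bus option, not the Mediator library's API):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Minimal AOT-safe pub/sub: subscriptions are explicit delegates, no reflection.
sealed class AsyncEventBus
{
    private readonly ConcurrentDictionary<Type, List<Func<object, Task>>> _handlers = new();

    public void Subscribe<TEvent>(Func<TEvent, Task> handler)
    {
        var list = _handlers.GetOrAdd(typeof(TEvent), _ => new List<Func<object, Task>>());
        lock (list) list.Add(e => handler((TEvent)e));
    }

    public async Task PublishAsync<TEvent>(TEvent @event)
    {
        if (!_handlers.TryGetValue(typeof(TEvent), out var list)) return;
        Func<object, Task>[] snapshot;
        lock (list) snapshot = list.ToArray(); // copy so publishing tolerates concurrent subscribes
        foreach (var handler in snapshot)
            await handler(@event!); // sequential dispatch; fire-and-forget is a policy choice
    }
}
```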
**Key Events (Pub/Sub):**
* `UserInputReceived`: Triggered when the user hits Enter.
* `LLMStreamDeltaReceived`: Emitted for token-by-token streaming to the UI.
* `ToolExecutionStarted` / `ToolExecutionCompleted`: Emitted for UI spinners and logging.
* `ContextLimitWarning`: High token usage indicator.
**Key Commands (Request/Response):**
* `ExecuteToolCommand`: Sent from the Avatar to the Tool Runner, returns a `ToolResult`.
### 3.2 The Brain: `ISessionManager` & Memory
Instead of just a simple list of messages, the new system uses a multi-tiered memory architecture with thread-safe access.
* **Short-Term Memory (Context Window):** The active conversation. Must yield **Immutable Context Snapshots** to prevent collection modification exceptions when tools/LLM run concurrently with background tasks.
* **Long-Term Memory (Vector DB):** Indexed facts, summaries, and user preferences.
* **ICompactionStrategy:**
Instead of implicitly using an LLM on the critical path, the system uses tiered, deterministic strategies:
1. **Sliding Window:** Automatically drop the oldest user/assistant message pairs.
2. **Tool Output Truncation:** Remove large file reads from old turns.
3. **LLM Summarization (Optional):** As a last resort, explicitly lock state and summarize old context into a "Context Digest".
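The sliding-window tier can be a pure function over an immutable snapshot (a sketch assuming messages are simple role/content pairs; the real `ISessionManager` types may differ):

```csharp
using System.Collections.Generic;
using System.Linq;

// Sliding-window compaction: keep system messages, drop the oldest other turns
// until the message count fits the budget. Deterministic, no LLM on the critical path.
static class SlidingWindow
{
    public static IReadOnlyList<(string Role, string Content)> Compact(
        IReadOnlyList<(string Role, string Content)> messages, int maxMessages)
    {
        if (messages.Count <= maxMessages) return messages;
        var system = messages.Where(m => m.Role == "system").ToList();
        var rest = messages.Where(m => m.Role != "system").ToList();
        var keep = maxMessages - system.Count;
        // Skip the oldest non-system messages, keep the most recent `keep`.
        return system.Concat(rest.Skip(rest.Count - keep)).ToList();
    }
}
```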
### 3.3 The Senses & Voice: Event-Driven CLI UI
The UI is strictly separated from business logic, which is an ideal architecture for a dedicated CLI tool. The `RichConsoleRenderer` only listens to the `IEventBus`.
* **Input Loop:** `IUserInputDispatcher` sits in a loop reading stdin. When input is received, it fires `UserInputReceived`. It captures `Ctrl+C` to trigger a global `CancellationToken`.
* **Output Loop:** `IUiRenderer` subscribes to `LLMStreamDeltaReceived` and renders tokens. It subscribes to `ToolExecutionStarted` and spins up a dedicated UI spinner, preventing async console output from overwriting the active prompt.
* **Headless CLI Mode:** For CI/CD environments or scripting, the system can run non-interactively by simply swapping the `RichConsoleRenderer` with a `BasicLoggingRenderer`—the core agent logic remains untouched.
### 3.4 The Hands: Plugins and Tooling
Tools are no longer hardcoded.
* **IToolRegistry:** Discovers tools at startup. To stay compatible with the Native AOT goal in Section 5, discovery should use explicit or source-generated registration rather than runtime reflection or assembly scanning.
* **Tool Execution:** When the LLM API returns a `tool_calls` stop reason, the `IAgentAvatar` iteratively or concurrently sends an `ExecuteToolCommand` via the Mediator. It directly awaits the results, appends them to the context snapshot, and resumes the LLM generation. This provides explicit, traceable control flow.
* **Cancellation:** Every async method across the entire system accepts a `CancellationToken` to allow graceful termination of infinite loops or runaway processes.
## 4. Execution Flow (Anatomy of a User Turn)
1. **Input:** User types "Find the bug in main.py".
2. **Dispatch:** `ConsoleInputDispatcher` reads it and publishes `UserInputReceived`.
3. **Routing:** Built-in command handler (if applicable) checks if it's a structural command (`/clear`, `/exit`). Otherwise `SessionManager` adds it to the active context.
4. **Inference:** A `ChatCoordinator` service reacts to the updated context and asks the `IAgentAvatar` for a response.
5. **Streaming:** The Avatar calls the Anthropic/OpenAI API. As tokens arrive, it publishes `LLMStreamDeltaReceived`.
6. **Rendering:** `RichConsoleRenderer` receives the deltas and prints them to the terminal.
7. **Tool Request:** The LLM API returns a tool call. The Avatar dispatches an `ExecuteToolCommand` via the Mediator. The EventBus also publishes a `ToolExecutionStarted` event for the UI spinner.
8. **Execution & Feedback:** `ToolRunner` handles the command, runs it safely with the `CancellationToken`, and returns the result back to the Avatar. The Avatar feeds this back to the LLM API automatically.
9. **Completion:** The turn ends. The `SessionManager` checks token bounds and runs compaction if necessary.
## 5. Conclusion (Native AOT Focus)
While `ARCHITECTURE_REFACTOR.md` focuses on migrating a legacy "God Class", this new design assumes a green-field, **AOT-first** approach.
To achieve true Native AOT, we must strictly avoid runtime reflection. This means:
1. Using `CreateSlimBuilder()` instead of `CreateDefaultBuilder()` in `Microsoft.Extensions.Hosting`.
2. Using Source Generators for Dependency Injection setup.
3. Using Source Generators for JSON Serialization (`System.Text.Json.Serialization.JsonSerializableAttribute`).
4. Replacing reflection-heavy libraries like `MediatR` and `Scrutor` with AOT-friendly source-generated alternatives.
By adhering to these constraints, the resulting single-binary Linux executable will have near-instant startup time and a dramatically reduced memory footprint compared to a standard JIT-compiled .NET application.
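As an example of point 3, a `JsonSerializerContext` subclass makes `System.Text.Json` emit (de)serialization code at compile time, so no reflection is needed at runtime (`AgentConfig` is a hypothetical config type used only for illustration):

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record AgentConfig(string Model, string Endpoint);

// The source generator emits serializers for every [JsonSerializable] type.
[JsonSerializable(typeof(AgentConfig))]
public sealed partial class AgentJsonContext : JsonSerializerContext { }

public static class ConfigJson
{
    public static string Serialize(AgentConfig cfg) =>
        JsonSerializer.Serialize(cfg, AgentJsonContext.Default.AgentConfig);

    public static AgentConfig? Deserialize(string json) =>
        JsonSerializer.Deserialize(json, AgentJsonContext.Default.AgentConfig);
}
```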

docs/ToolConsolidation.md Normal file
View File

@@ -0,0 +1,112 @@
# Tool Consolidation Ideas
This document outlines opportunities to merge similar tools to simplify the API.
## 1. File Write Operations
**Current tools:** `CreateFile`, `InsertAfter`, `AppendToFile`
**Proposed merge:** `WriteToFile`
```csharp
public static string WriteToFile(
string path,
string[] content,
string? mode = "create",
string? anchor = null)
```
**Behavior:**
- `mode="create"` - Creates new file (error if exists)
- `mode="append"` - Appends to EOF (creates if missing)
- `mode="insert"` - Inserts after anchor (requires existing file)
**Benefits:**
- Reduces 3 tools to 1
- Cleaner API for LLM
- Unified error handling
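A sketch of how the merged tool's dispatch could look (error messages and anchor semantics are illustrative, not the final API):

```csharp
using System;
using System.IO;
using System.Linq;

static class FileWriter
{
    // Unified write tool: dispatch on mode, share validation and error handling.
    public static string WriteToFile(string path, string[] content,
                                     string mode = "create", string? anchor = null)
    {
        var text = string.Join(Environment.NewLine, content);
        switch (mode)
        {
            case "create":
                if (File.Exists(path)) return $"Error: {path} already exists";
                Directory.CreateDirectory(Path.GetDirectoryName(Path.GetFullPath(path))!);
                File.WriteAllText(path, text);
                return $"Created {path}";
            case "append":
                File.AppendAllText(path, text + Environment.NewLine); // creates if missing
                return $"Appended to {path}";
            case "insert":
                if (!File.Exists(path)) return $"Error: {path} not found";
                if (string.IsNullOrEmpty(anchor)) return "Error: insert mode requires an anchor";
                var lines = File.ReadAllLines(path).ToList();
                var idx = lines.FindIndex(l => l.Contains(anchor));
                if (idx < 0) return "Error: anchor not found";
                lines.InsertRange(idx + 1, content);
                File.WriteAllLines(path, lines);
                return $"Inserted after anchor in {path}";
            default:
                return $"Error: unknown mode '{mode}'";
        }
    }
}
```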
## 2. File Move Operations
**Current tools:** `RenameFile`, `CopyFile`
**Proposed merge:** `MoveFile`
```csharp
public static string MoveFile(
string sourcePath,
string destinationPath,
bool copy = false)
```
**Behavior:**
- `copy=false` - Moves file (current RenameFile behavior)
- `copy=true` - Copies file (current CopyFile behavior)
**Benefits:**
- 90% identical logic
- Only difference is File.Move vs File.Copy
- Both create parent directories
- Similar error handling patterns
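A sketch showing how little the two paths differ once merged (illustrative only):

```csharp
using System.IO;

static class FileMover
{
    // One tool, one flag: the only branch is File.Copy vs File.Move.
    public static string MoveFile(string sourcePath, string destinationPath, bool copy = false)
    {
        if (!File.Exists(sourcePath)) return $"Error: {sourcePath} not found";
        // Both variants create parent directories, as the current tools do.
        Directory.CreateDirectory(Path.GetDirectoryName(Path.GetFullPath(destinationPath))!);
        if (copy) File.Copy(sourcePath, destinationPath, overwrite: true);
        else File.Move(sourcePath, destinationPath, overwrite: true);
        return copy ? $"Copied to {destinationPath}" : $"Moved to {destinationPath}";
    }
}
```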
## 3. Grep Operations
**Current tools:** `GrepFile`, `GrepRecursive`
**Proposed merge:** `Grep`
```csharp
public static string Grep(
string path,
string pattern,
bool recursive = false,
string? filePattern = null)
```
**Behavior:**
- `recursive=false` - Searches single file (current GrepFile)
- `recursive=true` - Searches directory recursively (current GrepRecursive)
- `filePattern` - Optional glob to filter files when recursive
**Benefits:**
- Very similar logic
- Reduces 2 tools to 1
- Cleaner API for LLM
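A sketch of the unified implementation (regex matching; `filePattern` maps onto the glob accepted by `Directory.EnumerateFiles`):

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class GrepTool
{
    // Single entry point: one file, or a recursive walk filtered by glob.
    public static string Grep(string path, string pattern,
                              bool recursive = false, string? filePattern = null)
    {
        var regex = new Regex(pattern);
        var files = recursive
            ? Directory.EnumerateFiles(path, filePattern ?? "*", SearchOption.AllDirectories)
            : new[] { path };
        var sb = new StringBuilder();
        foreach (var file in files)
        {
            var lines = File.ReadAllLines(file);
            for (int i = 0; i < lines.Length; i++)
                if (regex.IsMatch(lines[i]))
                    sb.AppendLine($"{file}:{i + 1}: {lines[i]}"); // 1-based line numbers
        }
        return sb.ToString();
    }
}
```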
## 4. Delete Operations
**Current tools:** `DeleteFile`, `DeleteDir`
**Proposed merge:** `Delete`
```csharp
public static string Delete(
string path,
bool recursive = true)
```
**Behavior:**
- Auto-detects if path is file or directory
- `recursive=true` - Delete directory and all contents
- `recursive=false` - Only matters for directories (error if not empty)
**Benefits:**
- Auto-detects file vs directory
- Similar error handling patterns
- Reduces 2 tools to 1
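A sketch of the auto-detection (illustrative):

```csharp
using System.IO;

static class DeleteTool
{
    // Auto-detect file vs directory; `recursive` only applies to directories.
    public static string Delete(string path, bool recursive = true)
    {
        if (File.Exists(path))
        {
            File.Delete(path);
            return $"Deleted file {path}";
        }
        if (Directory.Exists(path))
        {
            Directory.Delete(path, recursive); // throws if non-empty and recursive=false
            return $"Deleted directory {path}";
        }
        return $"Error: {path} not found";
    }
}
```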
## Summary
These consolidations would reduce the tool count from 17 to 12 tools, making the API simpler and easier for the LLM to use effectively.
**High priority merges:**
1. ✅ File Write Operations (3 → 1)
2. ✅ File Move Operations (2 → 1)
3. ✅ Grep Operations (2 → 1)
4. ✅ Delete Operations (2 → 1)
**Kept separate:**
- `ReadFile` - distinct read-only operation
- `ListDir`, `FindFiles`, `GetFileInfo` - different purposes
- `CreateDir` - simple enough to keep standalone
- `ReplaceLines`, `InsertAfter`, `DeleteRange` - too complex to merge without confusing LLM