docs: Add a detailed plan for multi-provider support, including pricing and token tracking strategies.
# Provider Support Plan

## Current Problems

1. **OpenRouter Hardcoded**: Endpoint, headers, and pricing API calls are hardcoded to OpenRouter
2. **Config Ineffective**: SetupTui allows a "custom endpoint" but Program.cs ignores it
3. **Token Count**: Token usage tracking only works with OpenRouter response headers
4. **Pricing Only for One Provider**: The models list shows pricing, but only when using OpenRouter

---

## Goals

1. Make the system **endpoint-agnostic**
2. Support pricing/token tracking for **multiple providers**
3. Keep **OpenRouter as the default** (familiar)
4. Allow users to configure any OpenAI-compatible endpoint
5. Show pricing/token info **only when available** for each provider

---

## Provider Categories

### Tier 1: Native Support (Built-in)

- OpenRouter (default)
- Ollama (local, no auth)
- Groq (high-speed inference)
- Anthropic (native or via API)
- OpenAI (official API)

### Tier 2: Config-Based Support

- Cerebras
- DeepSeek
- Any OpenAI-compatible endpoint that supports custom headers

### Tier 3: Manual Configuration Required

- Self-hosted endpoints
- Corporate proxies
- Custom middleware layers

---

```csharp
// Example: provider interface (introduced as IPricingProvider in Phase 3)
interface IPricingProvider
{
    // Get pricing info from the provider's API
    Task<List<ModelPricing>> GetModelsAsync(string apiKey);

    // Extract token usage from a response
    Task<TokenUsage> GetTokensFromResponseAsync(HttpResponseMessage response);

    // Add provider-specific headers if needed
    void AddHeaders(HttpRequestMessage request, string apiKey);
}
```

**Supported Implementations:**

- `OpenRouterProvider` (uses `/api/v1/models` + `x-total-tokens`)
- `GroqProvider` (uses Groq's pricing API + response headers)
- `OllamaProvider` (free tier, no pricing lookup, basic token counting)
- `OpenAIProvider` (uses OpenAI's model list + token counting)
- `GenericProvider` (fallback for any OpenAI-compatible endpoint)

**Configuration:**

Store provider selection in `anchor.config.json`:

```json
{
  "apiKey": "your-key",
  "model": "qwen3.5-27b",
  "endpoint": "https://openrouter.ai/api/v1",
  "provider": "openrouter"
}
```

Auto-detect the provider from the endpoint URL if not specified.

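A minimal sketch of what that auto-detection could look like (the helper name and host-matching rules are illustrative, not existing code):

```csharp
using System;

// Hypothetical helper: map known endpoint hosts to provider names,
// falling back to "generic" for anything unrecognized.
static string DetectProvider(string endpoint)
{
    var host = new Uri(endpoint).Host;
    if (host.EndsWith("openrouter.ai")) return "openrouter";
    if (host.EndsWith("groq.com"))      return "groq";
    if (host == "api.openai.com")       return "openai";
    if (host == "localhost" || host == "127.0.0.1") return "ollama";
    return "generic";
}
```

Keeping the match on the URL host (rather than a substring of the whole URL) avoids false positives from path segments.
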

---

## Pricing System

### Current State

- Uses OpenRouter's `/api/v1/models` endpoint
- Displays pricing in a table during startup
- Only works when using OpenRouter

### Improved Behavior

**When the endpoint matches a known provider:**

1. Fetch pricing from that provider's API
2. Display pricing in the startup table
3. Show per-prompt costs in chat output

**When the endpoint is generic/unsupported:**

1. Skip the API call (no pricing lookup)
2. Display `---` or `$` placeholders
3. Optional: show a "Pricing not available" note

**User Feedback:**

- Show clear messaging: "Pricing data loaded from OpenRouter"
- Show "Pricing not available for this endpoint" for unsupported endpoints
- Don't break chat functionality if pricing fails

### Pricing Data Format

Store it in a `ModelPricing` class:

```csharp
class ModelPricing
{
    public string ModelId { get; set; }
    public decimal InputPricePerMTokens { get; set; }
    public decimal OutputPricePerMTokens { get; set; }
    public decimal? CacheCreationPricePerMTokens { get; set; } // if supported
}
```

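With prices stored per million tokens, the per-prompt cost is a single calculation; a sketch (the helper is illustrative):

```csharp
// Sketch: per-prompt cost in USD from per-million-token prices.
static decimal ComputeCost(decimal inputPricePerM, decimal outputPricePerM,
                           long inputTokens, long outputTokens)
{
    return inputTokens  * inputPricePerM  / 1_000_000m
         + outputTokens * outputPricePerM / 1_000_000m;
}
```

For example, 1,000 input tokens at $3/M plus 500 output tokens at $15/M comes to $0.0105. Using `decimal` throughout avoids binary floating-point rounding in money amounts.
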

---

## Token Tracking System

### Current State

- Uses `x-total-tokens` from OpenRouter headers
- Only works with OpenRouter responses

### Multi-Provider Strategy

**OpenRouter:**

- Use the `x-total-tokens` header
- Use `x-response-timing` for latency tracking

**Groq:**

- Use the `x-groq-tokens` header
- Use `x-groq-response-time` for latency
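
Whichever header a provider uses, the extraction step looks the same; a sketch (header names come from the lists in this section and may differ in practice):

```csharp
using System.Linq;
using System.Net.Http;

// Sketch: read a provider-specific token-count header, returning
// null when the header is missing or not a number.
static long? ReadTokenHeader(HttpResponseMessage response, string headerName)
{
    if (response.Headers.TryGetValues(headerName, out var values) &&
        long.TryParse(values.FirstOrDefault(), out var tokens))
        return tokens;
    return null;
}
```

Returning `long?` lets the caller distinguish "header absent" from a real zero and fall through to the next strategy.
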
**OpenAI:**

- Use the `x-ai-response-tokens` header (if available)
- Fall back to the response body if needed

**Ollama:**

- No official token counting
- Use output length as a proxy estimate
- Optional: Show message token estimates

**Generic/Fallback:**

- Parse `total_tokens` from response JSON
- Fall back to character count estimates
- Show placeholder when unavailable
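
The generic fallback could be sketched like this (assuming an OpenAI-style `usage` object in the body; the 4-characters-per-token estimate is a rough heuristic, not a provider guarantee):

```csharp
using System.Text.Json;

// Sketch: pull usage.total_tokens from an OpenAI-compatible body,
// returning null when the field is absent or not a number.
static long? ExtractTotalTokens(string responseJson)
{
    using var doc = JsonDocument.Parse(responseJson);
    if (doc.RootElement.TryGetProperty("usage", out var usage) &&
        usage.TryGetProperty("total_tokens", out var total) &&
        total.ValueKind == JsonValueKind.Number &&
        total.TryGetInt64(out var value))
        return value;
    return null;
}

// Rough character-count fallback (~4 chars per token for English text).
static long EstimateTokens(string text) => text.Length / 4;
```
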
### Integration Points

**During Chat Session:**

1. After each response, extract tokens from response headers
2. Store in `ChatSession.TokensUsed` object
3. Display in status bar: `Tokens: 128/2048 • Cost: $0.002`

**At Session End:**

1. Show summary: `Total tokens: 1,024 | Total cost: $0.015`
2. Write to session log or history file
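
The two display strings above could be produced with culture-invariant formatting so the thousands separator stays stable across locales (helper names are illustrative):

```csharp
using System.Globalization;

// Sketch: status-bar and session-summary lines in the formats above.
static string StatusBar(long used, long max, decimal cost) =>
    string.Format(CultureInfo.InvariantCulture,
        "Tokens: {0}/{1} • Cost: ${2:0.000}", used, max, cost);

static string SessionSummary(long totalTokens, decimal totalCost) =>
    string.Format(CultureInfo.InvariantCulture,
        "Total tokens: {0:N0} | Total cost: ${1:0.000}", totalTokens, totalCost);
```
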

---

## Implementation Roadmap

### Phase 1: Conditional Pricing (Current Issues First)

- [ ] Check if endpoint is OpenRouter before fetching pricing
- [ ] Skip pricing API call for non-OpenRouter endpoints
- [ ] Show placeholder message if pricing not available
- [ ] **Time estimate:** 2 hours

### Phase 2: Provider Configuration

- [ ] Add `provider` field to `AnchorConfig` model
- [ ] Update `SetupTui` to ask "Which provider?" (openrouter, ollama, groq, etc.)
- [ ] Auto-detect provider from endpoint URL (smart default)
- [ ] Write provider to config file on setup
- [ ] **Time estimate:** 3 hours

### Phase 3: Provider Abstraction

- [ ] Create `IPricingProvider` interface
- [ ] Move existing `PricingProvider` to `OpenRouterProvider`
- [ ] Create `GenericPricingProvider` for fallback
- [ ] Add provider factory: `ProviderFactory.Create(providerName)`
- [ ] **Time estimate:** 5 hours
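
The Phase 3 factory could be as small as a switch expression; a sketch (the stub types here stand in for the real provider implementations):

```csharp
// Illustrative stubs standing in for the real provider classes.
interface IPricingProvider { string Name { get; } }
class OpenRouterProvider : IPricingProvider { public string Name => "openrouter"; }
class GenericProvider : IPricingProvider { public string Name => "generic"; }

static class ProviderFactory
{
    // Unknown provider names fall back to the generic
    // OpenAI-compatible provider rather than failing.
    public static IPricingProvider Create(string providerName) => providerName switch
    {
        "openrouter" => new OpenRouterProvider(),
        _            => new GenericProvider(),
    };
}
```
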
### Phase 4: Token Tracking Enhancement

- [ ] Create `ITokenTracker` interface
- [ ] Implement token extraction for multiple providers
- [ ] Display token usage in status bar
- [ ] Add per-prompt cost calculation
- [ ] **Time estimate:** 6 hours

### Phase 5: Second Provider Implementation

- [ ] Implement `GroqProvider` (similar to OpenRouter)
- [ ] Test with Groq API
- [ ] Update documentation
- [ ] **Time estimate:** 4 hours

### Phase 6: Future-Proofing (Optional)

- [ ] Add plugin system for custom providers
- [ ] Allow users to define custom pricing rules
- [ ] Support OpenRouter-compatible custom endpoints
- [ ] **Time estimate:** 8+ hours


---

## User Configuration Guide

### Automatic Setup

Run `/setup` in the chat or `anchor setup` in the CLI:

```
Which provider are you using?
1) OpenRouter (qwen models)
2) Groq (qwen/gemma models)
3) Ollama (local models)
4) OpenAI (gpt models)
5) Custom endpoint
```

### Manual Configuration

Edit `anchor.config.json` directly (`provider` is optional and auto-detected from the endpoint if missing):

```json
{
  "apiKey": "your-api-key",
  "model": "qwen3.5-27b",
  "endpoint": "https://api.groq.com/openai/v1",
  "provider": "groq"
}
```

### Environment Variables

For a custom setup:

```
ANCHOR_ENDPOINT=https://api.groq.com/openai/v1
ANCHOR_PROVIDER=groq
ANCHOR_API_KEY=...
ANCHOR_MODEL=qwen3.5-27b
```
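
Environment variables would typically take precedence over the config file; a sketch (the precedence rule is an assumption, not documented behavior):

```csharp
using System;

// Sketch: an environment variable, when set, overrides the
// corresponding config-file value.
static string Resolve(string envVar, string configValue) =>
    Environment.GetEnvironmentVariable(envVar) ?? configValue;
```
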

---

## Known Limitations

### Tier 1 Providers (Full Support)

**✓ OpenRouter**

- Pricing: ✓ (native API)
- Tokens: ✓ (response headers)
- Cost tracking: ✓

**✓ Groq** (after Phase 5)

- Pricing: ✓ (will add)
- Tokens: ✓ (response headers)
- Cost tracking: ✓

### Tier 2 Providers (Partial Support)

**○ Ollama**

- Pricing: ○ (free, no lookup needed)
- Tokens: ○ (estimated from output)
- Cost tracking: ○ (placeholder)

**○ OpenAI**

- Pricing: ○ (manual pricing display)
- Tokens: ○ (header extraction)
- Cost tracking: ○ (config-based)

### Tier 3 Providers (Basic Support)

**□ Custom Endpoints**

- Pricing: □ (manual only)
- Tokens: □ (fallback parsing)
- Cost tracking: □ (user-defined)


---

## Future Enhancements

1. **Pricing Database**: Maintain our own pricing database (like OpenRouter's)
2. **Cost Estimator**: Predict costs before sending a message
3. **Usage Alerts**: Warn the user when approaching budget limits
4. **Multi-Model Support**: Compare costs between different providers
5. **Plugin System**: Allow the community to add new providers

---

## Success Criteria

- ✅ Users can choose from 3+ providers in setup
- ✅ Pricing displays only for supported endpoints
- ✅ Token tracking works for all Tier 1 providers
- ✅ No breaking changes for existing OpenRouter users
- ✅ Clear documentation on what each provider supports
- ✅ Graceful degradation for unsupported features

---

*Last Updated: 2025-12-23*