diff --git a/PROVIDERS.md b/PROVIDERS.md
new file mode 100644
index 0000000..d0e3cfe
--- /dev/null
+++ b/PROVIDERS.md
@@ -0,0 +1,293 @@

# Provider Support Plan

## Current Problems

1. **OpenRouter Hardcoded**: Endpoint, headers, and pricing API calls are hardcoded to OpenRouter
2. **Config Ineffective**: `SetupTui` allows a "custom endpoint", but `Program.cs` ignores it
3. **Token Count**: Token usage tracking only works with OpenRouter response headers
4. **Pricing Only for One Provider**: The models list shows pricing, but only when using OpenRouter

---

## Goals

1. Make the system **endpoint-agnostic**
2. Support pricing/token tracking for **multiple providers**
3. Keep **OpenRouter as the default** (familiar)
4. Allow users to configure any OpenAI-compatible endpoint
5. Show pricing/token info **only when available** for each provider

---

## Provider Categories

### Tier 1: Native Support (Built-in)
- OpenRouter (default)
- Ollama (local, no auth)
- Groq (high-speed inference)
- Anthropic (native or via API)
- OpenAI (official API)

### Tier 2: Config-Based Support
- Cerebras
- DeepSeek
- Any OpenAI-compatible endpoint that supports custom headers

### Tier 3: Manual Configuration Required
- Self-hosted endpoints
- Corporate proxies
- Custom middleware layers

---

```csharp
// Example: provider interface
interface IPricingProvider
{
    // Get pricing info from the provider's API
    Task<List<ModelPricing>> GetModelsAsync(string apiKey);

    // Extract token usage from a response
    Task<int> GetTokensFromResponseAsync(HttpResponseMessage response);

    // Add provider-specific headers if needed
    void AddHeaders(HttpRequestMessage request, string apiKey);
}
```

**Supported Implementations:**
- `OpenRouterProvider` (uses `/api/v1/models` + `x-total-tokens`)
- `GroqProvider` (uses Groq's pricing API + response headers)
- `OllamaProvider` (free tier, no pricing lookup, basic token counting)
- `OpenAIProvider` (uses OpenAI's model list + token counting)
- `GenericProvider` (fallback for any OpenAI-compatible endpoint)

**Configuration:**
Store the provider selection in `anchor.config.json`:
```json
{
  "apiKey": "your-key",
  "model": "qwen3.5-27b",
  "endpoint": "https://openrouter.ai/api/v1",
  "provider": "openrouter"
}
```

Auto-detect the provider from the endpoint URL if not specified.

---

## Pricing System

### Current State
- Uses OpenRouter's `/api/v1/models` endpoint
- Displays pricing in a table during startup
- Only works when using OpenRouter

### Improved Behavior

**When the endpoint matches a known provider:**
1. Fetch pricing from that provider's API
2. Display pricing in the startup table
3. Show per-prompt costs in chat output

**When the endpoint is generic/unsupported:**
1. Skip the API call (no pricing lookup)
2. Display `---` or `$` placeholders
3. Optional: Show a "Pricing not available" note

**User Feedback:**
- Show clear messaging: "Pricing data loaded from OpenRouter"
- Show "Pricing not available for this endpoint" for unsupported endpoints
- Don't break chat functionality if pricing fails

### Pricing Data Format

Store it in a `ModelPricing` class:
```csharp
class ModelPricing
{
    string ModelId;
    decimal InputPricePerMTokens;
    decimal OutputPricePerMTokens;
    decimal? CacheCreationPricePerMTokens; // if supported
}
```

---

## Token Tracking System

### Current State
- Uses `x-total-tokens` from OpenRouter headers
- Only works with OpenRouter responses

### Multi-Provider Strategy

**OpenRouter:**
- Use the `x-total-tokens` header
- Use `x-response-timing` for latency tracking

**Groq:**
- Use the `x-groq-tokens` header
- Use `x-groq-response-time` for latency

**OpenAI:**
- Use the `x-ai-response-tokens` header (if available)
- Fall back to the response body if needed

**Ollama:**
- No official token counting
- Use output length as a proxy estimate
- Optional: Show per-message token estimates

**Generic/Fallback:**
- Parse `total_tokens` from the response JSON
- Fall back to character-count estimates
- Show a placeholder when unavailable

### Integration Points

**During a Chat Session:**
1. After each response, extract tokens from the response headers
2. Store them in the `ChatSession.TokensUsed` object
3. Display in the status bar: `Tokens: 128/2048 • Cost: $0.002`

**At Session End:**
1. Show a summary: `Total tokens: 1,024 | Total cost: $0.015`
2. Write it to the session log or history file

---

## Implementation Roadmap

### Phase 1: Conditional Pricing (Current Issues First)
- [ ] Check if the endpoint is OpenRouter before fetching pricing
- [ ] Skip the pricing API call for non-OpenRouter endpoints
- [ ] Show a placeholder message if pricing is not available
- [ ] **Time estimate:** 2 hours

### Phase 2: Provider Configuration
- [ ] Add a `provider` field to the `AnchorConfig` model
- [ ] Update `SetupTui` to ask "Which provider?" (openrouter, ollama, groq, etc.)
- [ ] Auto-detect the provider from the endpoint URL (smart default)
- [ ] Write the provider to the config file on setup
- [ ] **Time estimate:** 3 hours

### Phase 3: Provider Abstraction
- [ ] Create an `IPricingProvider` interface
- [ ] Move the existing `PricingProvider` to `OpenRouterProvider`
- [ ] Create a `GenericPricingProvider` for fallback
- [ ] Add a provider factory: `ProviderFactory.Create(providerName)`
- [ ] **Time estimate:** 5 hours

### Phase 4: Token Tracking Enhancement
- [ ] Create an `ITokenTracker` interface
- [ ] Implement token extraction for multiple providers
- [ ] Display token usage in the status bar
- [ ] Add per-prompt cost calculation
- [ ] **Time estimate:** 6 hours

### Phase 5: Second Provider Implementation
- [ ] Implement `GroqProvider` (similar to OpenRouter)
- [ ] Test with the Groq API
- [ ] Update documentation
- [ ] **Time estimate:** 4 hours

### Phase 6: Future-Proofing (Optional)
- [ ] Add a plugin system for custom providers
- [ ] Allow users to define custom pricing rules
- [ ] Support OpenRouter-compatible custom endpoints
- [ ] **Time estimate:** 8+ hours

---

## User Configuration Guide

### Automatic Setup
Run `/setup` in the chat or `anchor setup` from the CLI:
```
Which provider are you using?
1) OpenRouter (qwen models)
2) Groq (qwen/gemma models)
3) Ollama (local models)
4) OpenAI (gpt models)
5) Custom endpoint
```

### Manual Configuration
Edit `anchor.config.json` directly (the `provider` field is optional and auto-detected if missing):
```json
{
  "apiKey": "your-api-key",
  "model": "qwen3.5-27b",
  "endpoint": "https://api.groq.com/openai/v1",
  "provider": "groq"
}
```

### Environment Variables
For a custom setup:
```
ANCHOR_ENDPOINT=https://api.groq.com/openai/v1
ANCHOR_PROVIDER=groq
ANCHOR_API_KEY=...
ANCHOR_MODEL=qwen3.5-27b
```

---

## Known Limitations

### Tier 1 Providers (Full Support)

**✓ OpenRouter**
- Pricing: ✓ (native API)
- Tokens: ✓ (response headers)
- Cost tracking: ✓

**✓ Groq** (after Phase 5)
- Pricing: ✓ (will add)
- Tokens: ✓ (response headers)
- Cost tracking: ✓

### Tier 2 Providers (Partial Support)

**○ Ollama**
- Pricing: ○ (free, no lookup needed)
- Tokens: ○ (estimated from output)
- Cost tracking: ○ (placeholder)

**○ OpenAI**
- Pricing: ○ (manual pricing display)
- Tokens: ○ (header extraction)
- Cost tracking: ○ (config-based)

### Tier 3 Providers (Basic Support)

**□ Custom Endpoints**
- Pricing: □ (manual only)
- Tokens: □ (fallback parsing)
- Cost tracking: □ (user-defined)

---

## Future Enhancements

1. **Pricing Database**: Maintain our own pricing database (like OpenRouter's)
2. **Cost Estimator**: Predict costs before sending a message
3. **Usage Alerts**: Warn the user when approaching budget limits
4. **Multi-Model Support**: Compare costs between different providers
5. **Plugin System**: Allow the community to add new providers

---

## Success Criteria

- ✅ Users can choose from 3+ providers in setup
- ✅ Pricing displays only for supported endpoints
- ✅ Token tracking works for all Tier 1 providers
- ✅ No breaking changes for existing OpenRouter users
- ✅ Clear documentation on what each provider supports
- ✅ Graceful degradation for unsupported features

---

*Last Updated: 2025-12-23*
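
## Appendix: Auto-Detection Sketch

The plan above repeatedly relies on auto-detecting the provider from the endpoint URL. A minimal sketch of that rule is below; the substring-to-provider mapping and the `ProviderDetector` name are assumptions for illustration, not the final design:

```csharp
using System;

// Hypothetical helper: map a configured endpoint URL to a provider name.
// Unknown hosts fall back to the generic OpenAI-compatible provider.
static class ProviderDetector
{
    public static string Detect(string endpoint)
    {
        var host = new Uri(endpoint).Host.ToLowerInvariant();

        if (host.Contains("openrouter.ai")) return "openrouter";
        if (host.Contains("groq.com")) return "groq";
        if (host.Contains("api.openai.com")) return "openai";
        // Local endpoints are assumed to be Ollama by default
        if (host == "localhost" || host == "127.0.0.1") return "ollama";

        return "generic";
    }
}
```

For example, `ProviderDetector.Detect("https://api.groq.com/openai/v1")` returns `"groq"`, and an unrecognized host falls through to `"generic"`, matching the `GenericProvider` fallback described earlier.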