docs: add comprehensive documentation with README and detailed guides
- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
682
docs/architecture.md
Normal file
@@ -0,0 +1,682 @@
|
||||
# Architecture
|
||||
|
||||
Deep dive into OpenQuery's system design, architectural patterns, and data flow.
|
||||
|
||||
## 📋 Table of Contents
|
||||
|
||||
1. [System Overview](#system-overview)
|
||||
2. [Architectural Patterns](#architectural-patterns)
|
||||
3. [Component Architecture](#component-architecture)
|
||||
4. [Data Flow](#data-flow)
|
||||
5. [Concurrency Model](#concurrency-model)
|
||||
6. [Error Handling & Resilience](#error-handling--resilience)
|
||||
7. [Performance Considerations](#performance-considerations)
|
||||
8. [Design Decisions](#design-decisions)
|
||||
|
||||
## System Overview
|
||||
|
||||
OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.
|
||||
|
||||
### Core Design Principles
|
||||
|
||||
1. **Separation of Concerns** - Each component has a single, well-defined responsibility
|
||||
2. **Parallel First** - Wherever possible, operations are parallelized for speed
|
||||
3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
|
||||
4. **Configurable** - Most parameters can be adjusted without code changes
|
||||
5. **Observable** - Progress reporting and verbose mode for debugging
|
||||
|
||||
### High-Level Architecture
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│           CLI (System.CommandLine) → OpenQueryApp           │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Orchestration Layer                     │
│        OpenQueryApp → SearchTool (4-phase pipeline)         │
└─────────────────────────────┬───────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌──────────────┐     ┌──────────────────┐   ┌────────────────┐
│ Search Layer │     │ Processing Layer │   │    AI Layer    │
│              │     │                  │   │                │
│ SearxngClient│     │ ArticleService   │   │OpenRouterClient│
│              │     │ ChunkingService  │   │                │
│              │     │ EmbeddingService │   │                │
└──────────────┘     └──────────────────┘   └────────────────┘
```
|
||||
|
||||
## Architectural Patterns
|
||||
|
||||
### 1. Pipeline Pattern
|
||||
|
||||
The main workflow (SearchTool.ExecuteAsync) implements a multi-stage pipeline:
|
||||
|
||||
```
Phase 1: ExecuteParallelSearchesAsync
    ↓ (List<SearxngResult>)
Phase 2: ExecuteParallelArticleFetchingAsync
    ↓ (List<Chunk>)
Phase 3: ExecuteParallelEmbeddingsAsync
    ↓ ((queryEmbedding, chunkEmbeddings))
Phase 4: RankAndSelectTopChunks
    ↓ (List<Chunk> topChunks)
→ Formatted context string returned
```
|
||||
|
||||
Each phase:
|
||||
- Accepts input from previous phase
|
||||
- Processes in parallel where applicable
|
||||
- Returns output to next phase
|
||||
- Reports progress via callbacks
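
The hand-off between phases can be sketched as follows. This is a minimal illustration, not the actual `SearchTool` code: `PipelineSketch`, `SearchAsync`, and `FetchAsync` are hypothetical stand-ins, and the "embedding" step simply scores each chunk by its length so the example runs without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the 4-phase pipeline; only the chaining shape matters.
public static class PipelineSketch
{
    public static async Task<string> RunAsync(
        IReadOnlyList<string> queries,
        Action<string>? onProgress = null)
    {
        onProgress?.Invoke("[Phase 1: searching]");
        // Phase 1: one task per query, awaited together, then de-duplicated.
        string[][] perQuery = await Task.WhenAll(queries.Select(q => SearchAsync(q)));
        List<string> urls = perQuery.SelectMany(r => r).Distinct().ToList();

        onProgress?.Invoke($"[Phase 2: fetching {urls.Count} articles]");
        string[] chunks = await Task.WhenAll(urls.Select(u => FetchAsync(u)));

        onProgress?.Invoke("[Phase 3: embedding]");
        // Placeholder "embedding": the score is just the chunk length.
        var scored = chunks.Select(c => (Chunk: c, Score: (double)c.Length)).ToList();

        onProgress?.Invoke("[Phase 4: ranking]");
        var top = scored.OrderByDescending(s => s.Score).Take(2).Select(s => s.Chunk);
        return string.Join("\n\n", top);
    }

    // Fake search: returns two made-up URLs per query.
    private static Task<string[]> SearchAsync(string query) =>
        Task.FromResult(new[] { $"https://example.com/{query}/a", $"https://example.com/{query}/b" });

    // Fake fetch: returns a single "chunk" per URL.
    private static Task<string> FetchAsync(string url) =>
        Task.FromResult($"content of {url}");
}
```

Calling `await PipelineSketch.RunAsync(new[] { "x" }, Console.WriteLine)` prints one progress line per phase before returning the joined context string.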
|
||||
|
||||
### 2. Service Layer Pattern
|
||||
|
||||
Services (`Services/` directory) are stateless classes that encapsulate specific operations:
|
||||
|
||||
- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
|
||||
- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
|
||||
- **Extractors**: `ArticleService` (content extraction)
|
||||
- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)
|
||||
|
||||
All dependencies are explicit (constructor injection), making services easily testable.
|
||||
|
||||
### 3. Dependency Injection (Manual)
|
||||
|
||||
While not using a DI container, OpenQuery follows DI principles:
|
||||
|
||||
```csharp
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
```
|
||||
|
||||
Benefits:
|
||||
- Clear dependency graph
|
||||
- Easy to substitute mocks for testing
|
||||
- No magic; visible construction
|
||||
|
||||
### 4. Observer Pattern (Progress Reporting)
|
||||
|
||||
`StatusReporter` and the progress callbacks implement the observer pattern:

```csharp
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)

// Components invoke the callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");

// Caller (OpenQueryApp) passes a lambda that forwards each message to the reporter
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
```
|
||||
|
||||
### 5. Resilience Patterns (Polly)
|
||||
|
||||
`EmbeddingService` uses Polly's retry policy:
|
||||
|
||||
```csharp
_retryPipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
    })
    .Build();
```
|
||||
|
||||
This automatically retries failed embedding requests with exponential backoff.
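
To make the backoff schedule concrete, the sketch below computes the wait before each retry for a plain exponential policy (base delay doubled on every retry). It does not call Polly; `BackoffSketch` is a hypothetical helper, and it assumes jitter is disabled, as in the configuration above.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffSketch
{
    // Wait before retry n (0-based) under plain exponential backoff: baseDelay * 2^n.
    public static IReadOnlyList<TimeSpan> Delays(TimeSpan baseDelay, int maxRetryAttempts)
    {
        var delays = new List<TimeSpan>(maxRetryAttempts);
        for (int retry = 0; retry < maxRetryAttempts; retry++)
        {
            delays.Add(TimeSpan.FromTicks(baseDelay.Ticks * (1L << retry)));
        }
        return delays;
    }
}
```

With a 1-second base delay and 3 retries this gives waits of 1s, 2s, and 4s, so a request that keeps failing spends about 7 seconds waiting before the error surfaces.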
|
||||
|
||||
### 6. Producer-Consumer Pattern (Channel-based)
|
||||
|
||||
`StatusReporter` uses `System.Threading.Channels.Channel<string>` for asynchronous progress updates:
|
||||
|
||||
- Producer: `UpdateStatus()` writes messages to channel
|
||||
- Consumer: Background task `ProcessStatusUpdatesAsync()` reads and displays
|
||||
- Benefit: No blocking between progress generation and display
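
A minimal sketch of this producer-consumer arrangement is shown below. It is not the real `StatusReporter`: the class name `ChannelReporterSketch`, the injected `display` callback, and the use of `IAsyncDisposable` are assumptions made so the example is self-contained and testable.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class ChannelReporterSketch : IAsyncDisposable
{
    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private readonly Action<string> _display;
    private readonly Task _consumer;

    public ChannelReporterSketch(Action<string> display)
    {
        _display = display;
        _consumer = ConsumeAsync(); // start the background reader
    }

    // Producer side: never blocks. On an unbounded channel TryWrite only
    // fails once the writer has been completed.
    public void UpdateStatus(string message) => _channel.Writer.TryWrite(message);

    // Consumer side: drains messages in FIFO order until the channel completes.
    private async Task ConsumeAsync()
    {
        await foreach (string message in _channel.Reader.ReadAllAsync())
        {
            _display(message);
        }
    }

    public async ValueTask DisposeAsync()
    {
        _channel.Writer.TryComplete(); // no more messages
        await _consumer;               // wait until everything queued is displayed
    }
}
```

Completing the writer before awaiting the consumer means messages queued just before disposal are still displayed rather than dropped.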
|
||||
|
||||
### 7. Disposable Pattern
|
||||
|
||||
Components that own resources needing deterministic cleanup implement `IDisposable` or `IAsyncDisposable`:

- `StatusReporter` - stops background spinner task
- `RateLimiter` - disposes semaphore

Used via `using` statements (or `await using` for `IAsyncDisposable`) for deterministic cleanup.
|
||||
|
||||
## Component Architecture
|
||||
|
||||
### OpenQueryApp (OpenQuery.cs)
|
||||
|
||||
**Role**: Main orchestrator; coordinates the entire workflow
|
||||
|
||||
**Responsibilities**:
|
||||
- Parse CLI options into `OpenQueryOptions`
|
||||
- Load configuration and resolve API keys/models
|
||||
- Optionally generate expanded search queries via LLM
|
||||
- Invoke `SearchTool` with progress callbacks
|
||||
- Stream final answer from LLM
|
||||
|
||||
**Key Methods**:
|
||||
- `RunAsync(OpenQueryOptions)` - Main entry point
|
||||
|
||||
**Interactions**:
|
||||
- Instantiates `OpenRouterClient` (for both query gen and final answer)
|
||||
- Instantiates `SearxngClient` (passed to `SearchTool`)
|
||||
- Instantiates `EmbeddingService` (passed to `SearchTool`)
|
||||
- Instantiates `SearchTool` (orchestration)
|
||||
- `StatusReporter` for UI updates
|
||||
|
||||
### SearchTool (Tools/SearchTool.cs)
|
||||
|
||||
**Role**: Core search-retrieve-rank pipeline orchestrator
|
||||
|
||||
**Responsibilities**:
|
||||
- Execute 4-phase pipeline (search → fetch → embed → rank)
|
||||
- Manage concurrency limits (via semaphores)
|
||||
- Coordinate parallel operations
|
||||
- Generate context string for final answer
|
||||
|
||||
**Interactions**:
|
||||
- Uses `SearxngClient` for Phase 1
|
||||
- Uses `ArticleService` + `ChunkingService` for Phase 2
|
||||
- Uses `EmbeddingService` for Phase 3
|
||||
- Has no external UI dependency (pure logic)
|
||||
|
||||
**Parallelization Strategy**:
|
||||
- **Phase 1**: `Task.WhenAll` on search tasks (unbounded but limited by SearxNG instance)
|
||||
- **Phase 2**: Semaphore (max 10 concurrent fetches)
|
||||
- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)
|
||||
|
||||
### EmbeddingService (Services/EmbeddingService.cs)
|
||||
|
||||
**Role**: Generate vector embeddings with batching, rate limiting, and retries
|
||||
|
||||
**Responsibilities**:
|
||||
- Batch embedding requests (default: 300 per batch)
|
||||
- Parallelize batches (default: 4 concurrent)
|
||||
- Apply rate limiting (via `RateLimiter`)
|
||||
- Retry failed requests (Polly)
|
||||
- Calculate cosine similarity
|
||||
|
||||
**Key Methods**:
|
||||
- `GetEmbeddingsAsync(List<string> texts, ...)` - batch with progress
|
||||
- `GetEmbeddingAsync(string text)` - single embedding
|
||||
- `CosineSimilarity(float[], float[])` - static vector math
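
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is one straightforward way to write it; `VectorMathSketch` is a hypothetical name, and returning 0 for zero-length or all-zero vectors is an assumption, not necessarily what `EmbeddingService` does.

```csharp
using System;

public static class VectorMathSketch
{
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Assumption: an empty or all-zero vector is treated as unrelated.
        if (normA == 0 || normB == 0) return 0f;

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Identical directions score 1, orthogonal vectors score 0, and opposite directions score -1, which is why ranking by descending score puts the most query-like chunks first.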
|
||||
|
||||
**Design Notes**:
|
||||
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
|
||||
- Batches of 300 reduce API overhead
|
||||
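- Polly retry handles transient failures (429, 5xx, network blips); because the predicate matches any `HttpRequestException`, non-transient 4xx responses such as 401 are retried as well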
|
||||
|
||||
### OpenRouterClient (Services/OpenRouterClient.cs)
|
||||
|
||||
**Role**: HTTP client for OpenRouter API (completions + embeddings)
|
||||
|
||||
**Responsibilities**:
|
||||
- Serialize requests to JSON (source-generated)
|
||||
- Send HTTP with authorization header
|
||||
- Stream responses for chat completions (IAsyncEnumerable)
|
||||
- Return full responses for non-streaming
|
||||
- Throw on non-2xx status codes
|
||||
|
||||
**Endpoints**:
|
||||
- POST `/chat/completions` (stream and non-stream)
|
||||
- POST `/embeddings`
|
||||
|
||||
**Configuration**:
|
||||
- Base URL: `https://openrouter.ai/api/v1`
|
||||
- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json`
|
||||
|
||||
**Design**:
|
||||
- Low-level client; no retry logic (retry is in `EmbeddingService`)
|
||||
- Thin wrapper around `HttpClient`
|
||||
- Could be replaced with `HttpClientFactory` in larger apps
|
||||
|
||||
### SearxngClient (Services/SearxngClient.cs)
|
||||
|
||||
**Role**: HTTP client for SearxNG metasearch
|
||||
|
||||
**Responsibilities**:
|
||||
- Construct search URL with query param
|
||||
- GET request and deserialize JSON
|
||||
- Limit results (`.Take(limit)`)
|
||||
- Return empty list on failure (no exceptions)
|
||||
|
||||
**Endpoint**: `GET /search?q={query}&format=json`
|
||||
|
||||
**Design**:
|
||||
- Very simple; no retry (failures are acceptable, OpenQuery continues with other queries)
|
||||
- `DistinctBy(r => r.Url)` deduplication happens upstream
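
Building the request URL mostly comes down to escaping the query correctly. The helper below is a hypothetical sketch (`SearxngUrlSketch` and its parameters are not from the codebase) that uses `Uri.EscapeDataString` so spaces and reserved characters survive the round trip.

```csharp
using System;

public static class SearxngUrlSketch
{
    // Builds GET /search?q={query}&format=json against a base URL,
    // tolerating a trailing slash on the base.
    public static string BuildSearchUrl(string baseUrl, string query) =>
        $"{baseUrl.TrimEnd('/')}/search?q={Uri.EscapeDataString(query)}&format=json";
}
```

For example, `BuildSearchUrl("http://localhost:8002/", "quantum entanglement?")` yields `http://localhost:8002/search?q=quantum%20entanglement%3F&format=json`; the host and port here are placeholders.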
|
||||
|
||||
### ArticleService (Services/ArticleService.cs)
|
||||
|
||||
**Role**: Extract clean article content from URLs
|
||||
|
||||
**Responsibilities**:
|
||||
- Call `SmartReader.ParseArticleAsync(url)`
|
||||
- Return `Article` object with `Title`, `TextContent`, `IsReadable`
|
||||
|
||||
**Design**:
|
||||
- Single responsibility: extraction only (no chunking); the HTTP fetch itself is delegated to SmartReader
|
||||
- SmartReader handles all complexity (HTML parsing, boilerplate removal)
|
||||
- Exceptions propagate to `SearchTool` (handled there)
|
||||
|
||||
### ChunkingService (Services/ChunkingService.cs)
|
||||
|
||||
**Role**: Split long text into 500-char chunks at natural boundaries
|
||||
|
||||
**Algorithm**:
|
||||
1. Start at index 0
|
||||
2. Take up to 500 chars
|
||||
3. If not at end, backtrack to last space/newline/period
|
||||
4. Add chunk, advance start
|
||||
5. Repeat until done
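
The steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the actual `ChunkingService`: the name `ChunkingSketch`, trimming each chunk, and falling back to a hard cut when no boundary exists in the window are all choices made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingSketch
{
    private static readonly char[] Boundaries = { ' ', '\n', '.' };

    public static List<string> ChunkText(string text, int maxChunkSize = 500)
    {
        var chunks = new List<string>();
        int start = 0;

        while (start < text.Length)
        {
            int length = Math.Min(maxChunkSize, text.Length - start);

            if (start + length < text.Length)
            {
                // Not at the end: backtrack to the last boundary inside the window.
                int cut = text.LastIndexOfAny(Boundaries, start + length - 1, length);
                if (cut > start) length = cut - start + 1; // keep the boundary char
                // Otherwise no usable boundary was found: hard cut at maxChunkSize.
            }

            string chunk = text.Substring(start, length).Trim();
            if (chunk.Length > 0) chunks.Add(chunk);
            start += length; // length >= 1, so the loop always advances
        }

        return chunks;
    }
}
```

With a small window for readability, `ChunkText("aaaa bbbb cccc", 6)` splits on the spaces and returns `aaaa`, `bbbb`, `cccc`, while text without any boundary is cut at exactly the window size.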
|
||||
|
||||
**Design**:
|
||||
- Static class (stateless utility)
|
||||
- No dependencies
|
||||
- Pure function (input text → output chunks)
|
||||
|
||||
### RateLimiter (Services/RateLimiter.cs)
|
||||
|
||||
**Role**: Limit concurrent operations via semaphore
|
||||
|
||||
**Responsibilities**:
|
||||
- Wrap actions with semaphore acquisition
|
||||
- Support both sync and async actions
|
||||
- Implement `IAsyncDisposable` for cleanup
|
||||
|
||||
**Usage Pattern**:
|
||||
```csharp
await _rateLimiter.ExecuteAsync(async () =>
{
    // operation limited by semaphore
    return await SomeApiCall();
});
```
|
||||
|
||||
**Design**:
|
||||
- Generic wrapper (can return TResult)
|
||||
- `SemaphoreSlim` initialized at construction
|
||||
- Used in `EmbeddingService` for parallel embedding batches
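
A semaphore-backed wrapper of this kind can be sketched in a few lines. `RateLimiterSketch` is a hypothetical reconstruction from the description above, not the actual class; only the async, result-returning overload is shown.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class RateLimiterSketch : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public RateLimiterSketch(int maxConcurrent) =>
        _semaphore = new SemaphoreSlim(maxConcurrent, maxConcurrent);

    public async Task<TResult> ExecuteAsync<TResult>(
        Func<Task<TResult>> action,
        CancellationToken cancellationToken = default)
    {
        // Wait for a free slot; at most maxConcurrent actions run at once.
        await _semaphore.WaitAsync(cancellationToken);
        try
        {
            return await action();
        }
        finally
        {
            _semaphore.Release(); // always free the slot, even on failure
        }
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return default;
    }
}
```

Releasing in `finally` matters: without it, a single failed request would permanently reduce the available concurrency.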
|
||||
|
||||
### StatusReporter (Services/StatusReporter.cs)
|
||||
|
||||
**Role**: Show real-time progress with spinner or verbose mode
|
||||
|
||||
**Responsibilities**:
|
||||
- Maintain spinner animation (background task)
|
||||
- Receive status updates via channel
|
||||
- Display updates with appropriate formatting
|
||||
- Stop spinner on completion
|
||||
|
||||
**Features**:
|
||||
- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille chars
|
||||
- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
|
||||
- **Thread-safe**: Channel is safe for concurrent writes
|
||||
- **Non-blocking**: Background spinner doesn't block updates
|
||||
|
||||
**Design**:
|
||||
- `Channel<string>` for asynchronous producer-consumer
|
||||
- Background task (`_statusProcessor`) reads from channel
|
||||
- Spinner runs on its own task with 100ms delay per frame
|
||||
- `IDisposable` ensures proper cleanup
|
||||
|
||||
### ConfigManager (ConfigManager.cs)
|
||||
|
||||
**Role**: Load and save configuration from/to file
|
||||
|
||||
**Responsibilities**:
|
||||
- Get config path (XDG: `~/.config/openquery/config`)
|
||||
- Parse key-value pairs (no INI library, manual parsing)
|
||||
- Provide `AppConfig` object with defaults
|
||||
- Save settings back to file
|
||||
|
||||
**Design**:
|
||||
- Static class (no instances)
|
||||
- Creates config directory if missing
|
||||
- Line-by-line parsing (simple, no dependencies)
|
||||
- Could be improved with proper INI parser or JSON
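
Line-by-line key-value parsing can be as small as the sketch below. The file format details are assumptions for illustration only: `=` as the separator, `#` for comments, case-insensitive keys, and the key names in the usage example are not taken from the actual `ConfigManager`.

```csharp
using System;
using System.Collections.Generic;

public static class ConfigParserSketch
{
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line.StartsWith('#')) continue; // assumed comment syntax

            int separator = line.IndexOf('=');
            if (separator <= 0) continue; // skip lines without a key

            // Split on the first '=' only, so values may themselves contain '='.
            values[line[..separator].Trim()] = line[(separator + 1)..].Trim();
        }

        return values;
    }
}
```

Feeding it `File.ReadLines(path)` gives a dictionary that can then be mapped onto an options object with defaults for any missing key.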
|
||||
|
||||
## Data Flow
|
||||
|
||||
### End-to-End Data Flow
|
||||
|
||||
```
User: "What is quantum entanglement?"

1. OpenQueryOptions created
   { Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }

2. Query Generation (if Queries > 1)
   → ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
   → Deserialize to List<string> (generatedQueries)

3. Search Phase
   generatedQueries → Parallel.SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
   → DistinctBy(Url) → List<SearxngResult> (up to 15 results = 3 queries × 5 results, fewer after deduplication)

4. Fetch Phase
   searchResults → Parallel.ArticleService.FetchArticleAsync → Article
   → ChunkingService.ChunkText (split into ~500-char pieces)
   → ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)

5. Embedding Phase
   originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
   chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)

6. Ranking Phase
   For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
   OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)

7. Answer Phase
   context = string.Join("\n\n", topChunks.Select(...))
   → ChatCompletionRequest to OpenRouter with context + question
   → StreamAsync → Console.Write(delta) (real-time display)

Result: User sees answer with [Source N] citations
```
|
||||
|
||||
### Data Structures
|
||||
|
||||
**Chunk** - The core data structure flowing through the pipeline:
|
||||
```csharp
public record Chunk(
    string Content,       // Text content (~500 chars)
    string SourceUrl,     // Where it came from
    string? Title = null  // Article title
)
{
    public float[]? Embedding { get; set; } // Added in Phase 3
    public float Score { get; set; }        // Added in Phase 4
}
```
|
||||
|
||||
**Data Flow State**:
|
||||
- Phase 1-2: `Chunk` without embedding
|
||||
- Phase 3: `Chunk.Embedding` populated
|
||||
- Phase 4: `Chunk.Score` populated
|
||||
- Phase 5: Serialized into context string
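
The transition from "embedding populated" to "score populated" can be illustrated with a small ranking sketch. `RankedChunk` is a hypothetical, trimmed-down stand-in for `Chunk`, and the similarity function is passed in as a delegate so the example does not depend on the real `EmbeddingService`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the Chunk record.
public record RankedChunk(string Content, float[]? Embedding)
{
    public float Score { get; set; }
}

public static class RankingSketch
{
    public static List<RankedChunk> SelectTop(
        float[] queryEmbedding,
        IEnumerable<RankedChunk> chunks,
        int topK,
        Func<float[], float[], float> similarity)
    {
        var scored = new List<RankedChunk>();

        foreach (RankedChunk chunk in chunks)
        {
            // Skip chunks whose embedding is missing or empty (e.g. a failed batch).
            if (chunk.Embedding is null || chunk.Embedding.Length == 0) continue;

            chunk.Score = similarity(queryEmbedding, chunk.Embedding);
            scored.Add(chunk);
        }

        return scored.OrderByDescending(c => c.Score).Take(topK).ToList();
    }
}
```

Filtering before scoring keeps chunks from failed embedding batches out of the final context instead of ranking them with a meaningless score.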
|
||||
|
||||
### Memory Footprint
|
||||
|
||||
**Per 15-article run (approximate)**:
|
||||
- Raw HTML (fetched): ~5MB (transient, discarded after extract)
|
||||
- Articles: ~500KB (15 articles × ~30KB extracted text)
|
||||
- Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
|
||||
- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
|
||||
- Total retained: ~1-2MB (excluding the transient raw HTML and the .NET runtime's own overhead)
|
||||
|
||||
**Note**: AOT compilation reduces runtime memory compared to JIT.
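
The embedding line in the estimate above is plain arithmetic: chunk count × dimensions × 4 bytes per `float`. The helper below is a hypothetical one-liner that reproduces it; the 1536-dimension default matches the figure used in this section and will differ for other embedding models.

```csharp
public static class MemoryEstimateSketch
{
    // Raw size of the embedding vectors only (array object headers not counted).
    public static long EmbeddingBytes(int chunkCount, int dimensions = 1536) =>
        (long)chunkCount * dimensions * sizeof(float);
}
```

`EmbeddingBytes(50)` is 307,200 bytes and `EmbeddingBytes(100)` is 614,400 bytes, which is where the 300-600KB range comes from.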
|
||||
|
||||
## Concurrency Model
|
||||
|
||||
OpenQuery uses multiple parallelization strategies:
|
||||
|
||||
### Unbounded Parallelism (Task.WhenAll)
|
||||
|
||||
**Where**: Search queries (Phase 1)
|
||||
|
||||
```csharp
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
```
|
||||
|
||||
**Rationale**: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.
|
||||
|
||||
**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). Default 3 is safe.
|
||||
|
||||
### Semaphore-Controlled Parallelism
|
||||
|
||||
**Where**: Article fetching (Phase 2)
|
||||
|
||||
```csharp
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
await Task.WhenAll(fetchTasks); // Each task waits on semaphore
```
|
||||
|
||||
**Rationale**: Prevent flooding target websites with requests (DoS-like behavior). 10 concurrent fetches is polite but still fast.
|
||||
|
||||
**Configurable**: Only in code, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (currently a compile-time constant, so changing it requires a rebuild).
|
||||
|
||||
### Parallel.ForEachAsync with MaxDegreeOfParallelism
|
||||
|
||||
**Where**: Embedding batch processing (Phase 3)
|
||||
|
||||
```csharp
await Parallel.ForEachAsync(
    batchIndices,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (batchIndex, ct) => { ... }
);
```
|
||||
|
||||
**Rationale**: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.
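
A fuller sketch of bounded batch processing is shown below. It is an illustration under assumptions rather than the actual `EmbeddingService` code: `BatchEmbeddingSketch` and the `embedBatchAsync` delegate are hypothetical, and results are kept in input order by giving each batch its own slot in a pre-sized array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BatchEmbeddingSketch
{
    public static async Task<float[][]> EmbedAllAsync(
        IReadOnlyList<string> texts,
        Func<IReadOnlyList<string>, CancellationToken, Task<float[][]>> embedBatchAsync,
        int batchSize = 300,
        int maxParallel = 4,
        CancellationToken cancellationToken = default)
    {
        string[][] batches = texts.Chunk(batchSize).ToArray();
        var results = new float[batches.Length][][];

        await Parallel.ForEachAsync(
            Enumerable.Range(0, batches.Length),
            new ParallelOptions
            {
                MaxDegreeOfParallelism = maxParallel,
                CancellationToken = cancellationToken
            },
            async (batchIndex, ct) =>
            {
                // Each batch writes only to its own slot, so no lock is needed.
                results[batchIndex] = await embedBatchAsync(batches[batchIndex], ct);
            });

        // Flatten back into one embedding per input text, in input order.
        return results.SelectMany(batch => batch).ToArray();
    }
}
```

Indexing by batch number avoids sorting afterwards, which a shared concurrent collection would require because batches can finish out of order.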
|
||||
|
||||
**Configurable**: Only in code, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (currently a compile-time constant).
|
||||
|
||||
### Progress Reporting (Channel)
|
||||
|
||||
**Where**: All phases pass `onProgress` callback
|
||||
|
||||
**Implementation**:
|
||||
- `StatusReporter.UpdateStatus()` → writes to channel
|
||||
- Background task reads channel and displays
|
||||
- Non-blocking; callbacks are fire-and-forget (TryWrite)
|
||||
|
||||
**Thread Safety**: Channel is thread-safe; multiple phases may write concurrently.
|
||||
|
||||
## Error Handling & Resilience
|
||||
|
||||
### HTTP Errors
|
||||
|
||||
**OpenRouterClient**:
|
||||
- Calls `response.EnsureSuccessStatusCode()` → throws `HttpRequestException` on 4xx/5xx
|
||||
- No retry (handled at higher level in `EmbeddingService`)
|
||||
|
||||
**SearxngClient**:
|
||||
- Returns empty `List<SearxngResult>` on non-success
|
||||
- No exception thrown (searches are non-critical; if some queries fail, others proceed)
|
||||
|
||||
### Retry Policy (Polly)
|
||||
|
||||
**Location**: `EmbeddingService` constructor
|
||||
|
||||
**Scope**: Only embedding requests (`_client.EmbedAsync`)
|
||||
|
||||
**Policy**:
|
||||
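- Max 3 retry attempts (up to 4 calls in total, counting the initial request)
- Exponential backoff: 1s, 2s, 4s
- Retries any `HttpRequestException`: network errors and, because `OpenRouterClient` throws on every non-2xx response, 429 and 5xx as well as non-transient 4xx errors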
|
||||
|
||||
**Why not on chat completions?**
|
||||
- Query generation and final answer are critical; failures should surface immediately
|
||||
- Could be added in future if transient failures are common
|
||||
|
||||
### Graceful Degradation
|
||||
|
||||
**Query Generation Failure**:
|
||||
```csharp
try { /* ... generate queries ... */ }
catch (Exception)
{
    // Fall back to original question as sole query
    if (options.Verbose) reporter.WriteLine("[Failed to generate queries, falling back to original question]");
}
```
|
||||
|
||||
**Embedding Batch Failure**:
|
||||
```csharp
catch
{
    // Return empty embeddings for this batch (its chunks will be filtered out)
    var emptyBatch = new float[batch.Count][];
    Array.Fill(emptyBatch, Array.Empty<float>()); // fill with empty arrays
    results.Add((batchIndex, emptyBatch));
}
```
|
||||
|
||||
**Article Fetch Failure**:
|
||||
```csharp
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
    if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
    // Chunk not added; continue with others
}
```
|
||||
|
||||
### User-Facing Errors
|
||||
|
||||
Top-level exception handler in `Program.cs`:
|
||||
```csharp
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```
|
||||
|
||||
### Cancellation Support
|
||||
|
||||
`OpenRouterClient.StreamAsync` and `EmbeddingService` methods accept `CancellationToken`.
|
||||
|
||||
Used in:
|
||||
- Streaming answer (Ctrl+C stops immediately)
|
||||
- Parallel embeddings (can be cancelled)
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Latency Breakdown (Typical)
|
||||
|
||||
| Stage | Time | Description |
|-------|------|-------------|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4-parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| **Total** | **15-50s** | Varies widely based on network & content |
|
||||
|
||||
### Bottlenecks
|
||||
|
||||
1. **Network I/O** (article fetching, API calls) - I/O-bound, so a faster CPU does not help
|
||||
2. **OpenRouter API latency** - varies by model and load
|
||||
3. **SmartReader parsing** - CPU-bound for large HTML
|
||||
4. **Embedding API rate** - OpenRouter may rate limit if too many concurrent
|
||||
|
||||
### Optimization Strategies
|
||||
|
||||
- **Parallelism**: Already maximized within API constraints
|
||||
- **Caching**: Not implemented; future enhancement could cache embeddings per URL
|
||||
- **Batching**: 300-chunk batches reduce API overhead
|
||||
- **AOT**: Native compilation reduces startup overhead vs JIT
|
||||
|
||||
### Scalability Limits
|
||||
|
||||
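- **Memory**: Scales with number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Thousands of chunks still fit comfortably; the similarity pass is O(n) in the number of chunks (each comparison touches all 1536 dimensions), so CPU time grows proportionally.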
|
||||
- **API Limits**: OpenRouter rate limits per API key; may need to reduce concurrency if hitting 429s.
|
||||
- **SearxNG Limits**: Single SearxNG instance can handle ~10-50 QPS; above that may need load balancing (not in scope).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Why Not Use a DI Container?
|
||||
|
||||
OpenQuery manually wires dependencies in `Program.cs`. For a small CLI app, this is:
|
||||
- Simpler (no container configuration)
|
||||
- More explicit (easy to trace dependency graph)
|
||||
- No runtime overhead
|
||||
- Easier to understand for contributors
|
||||
|
||||
Container-based DI would be overengineering.
|
||||
|
||||
### Why AOT?
|
||||
|
||||
.NET 10 AOT provides:
|
||||
- **Fast startup** (<100ms vs ~500ms JIT)
|
||||
- **Smaller footprint** (trimmed, no JIT)
|
||||
- **No runtime dependencies** (self-contained)
|
||||
- Better for CLI tools distributed to users
|
||||
|
||||
Trade-offs:
|
||||
- Longer build time
|
||||
- Some reflection-based APIs not supported (not needed here)
|
||||
- Less flexible (can't load dynamic assemblies, but not needed)
|
||||
|
||||
### Why SmartReader for Article Extraction?
|
||||
|
||||
SmartReader uses Readability-based algorithm similar to Firefox Reader View:
|
||||
- Removes ads, navigation, comments, boilerplate
|
||||
- Extracts main article content
|
||||
- Handles malformed HTML gracefully
|
||||
- Pure .NET (no native dependencies)
|
||||
|
||||
Alternatives considered:
|
||||
- `HtmlAgilityPack` (too low-level, need to implement extraction logic)
|
||||
- `AngleSharp` (similar, still need extraction)
|
||||
- External services (like diffbot) - require API keys, costs money
|
||||
|
||||
SmartReader is the sweet spot: free, good quality, easy integration.
|
||||
|
||||
### Why Embeddings + Cosine Similarity vs Full-Text Search?
|
||||
|
||||
Full-text search (like Lucene) would:
|
||||
- Require inverted index, more complex
|
||||
- Be faster for exact keyword matching
|
||||
- Not understand semantic similarity
|
||||
|
||||
Embeddings provide:
|
||||
- Semantic similarity (understand meaning, not just keywords)
|
||||
- Simple math (cosine similarity of float arrays)
|
||||
- No index to maintain (just compute on-the-fly)
|
||||
|
||||
Trade-off: every run pays embedding API cost and latency; caching embeddings (for example in an LRU cache) could offset this but is not implemented.
|
||||
|
||||
### Why Not RAG (Retrieval Augmented Generation) Framework?
|
||||
|
||||
OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
|
||||
- Add dependency bloat
|
||||
- Reduce control
|
||||
- Increase abstraction complexity
|
||||
|
||||
Custom implementation is ~1000 LOC and perfectly matches needs.
|
||||
|
||||
### Why System.CommandLine?
|
||||
|
||||
Provides:
|
||||
- Native-like CLI help (`openquery --help`)
|
||||
- Strongly-typed options
|
||||
- Command hierarchy (main + subcommands)
|
||||
- Good error messages
|
||||
|
||||
Alternative: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- [Components](components/overview.md) - Deep dive into each module
|
||||
- [API Reference](api/cli.md) - Complete command documentation
|
||||
- [Troubleshooting](troubleshooting.md) - Debug issues
|
||||
|
||||
---
|
||||
|
||||
**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece.