docs: add comprehensive documentation with README and detailed guides

- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
OpenQuery Documentation
2026-03-19 10:01:58 +01:00
parent b28d8998f7
commit 65ca2401ae
16 changed files with 7073 additions and 0 deletions

docs/components/overview.md
# Components Overview
Detailed documentation for each major component in the OpenQuery system.
## 📋 Table of Contents
1. [Component Hierarchy](#component-hierarchy)
2. [Core Components](#core-components)
3. [Services](#services)
4. [Data Models](#data-models)
5. [Component Interactions](#component-interactions)
## Component Hierarchy
```
OpenQuery/
├── Program.cs                [Entry Point, CLI]
├── OpenQuery.cs              [OpenQueryApp - Orchestrator]
├── Tools/
│   └── SearchTool.cs         [Pipeline Orchestration]
├── Services/
│   ├── OpenRouterClient.cs   [LLM & Embedding API]
│   ├── SearxngClient.cs      [Search API]
│   ├── EmbeddingService.cs   [Embedding Generation + Math]
│   ├── ChunkingService.cs    [Text Splitting]
│   ├── ArticleService.cs     [Content Extraction]
│   ├── RateLimiter.cs        [Concurrency Control]
│   └── StatusReporter.cs     [Progress Display]
├── Models/
│   ├── OpenQueryOptions.cs   [CLI Options Record]
│   ├── Chunk.cs              [Content + Metadata]
│   ├── ParallelOptions.cs    [Concurrency Settings]
│   ├── OpenRouter.cs         [API DTOs]
│   ├── Searxng.cs            [Search Result DTOs]
│   └── JsonContexts.cs       [JSON Context]
└── ConfigManager.cs          [Configuration Persistence]
```
## Core Components
### 1. Program.cs
**Type**: Console Application Entry Point
**Responsibilities**: CLI parsing, dependency wiring, error handling
**Key Elements**:
- `RootCommand` from System.CommandLine
- Options: `--chunks`, `--results`, `--queries`, `--short`, `--long`, `--verbose`
- Subcommand: `configure` (with interactive mode)
- Configuration loading via `ConfigManager.Load()`
- Environment variable resolution
- Service instantiation and coordination
- Top-level try-catch for error reporting
**Code Flow**:
1. Load config file
2. Define CLI options and commands
3. Set handler for root command
4. Handler: resolve API key/model → instantiate services → call `OpenQueryApp.RunAsync()`
5. Set handler for configure command (writes config file)
6. Invoke command parser: `await rootCommand.InvokeAsync(args)`
**Exit Codes**:
- 0 = success
- 1 = error
### 2. OpenQueryApp (OpenQuery.cs)
**Type**: Main Application Class
**Responsibilities**: Workflow orchestration, query generation, answer streaming
**Constructor Parameters**:
- `OpenRouterClient client` - for query gen and final answer
- `SearchTool searchTool` - for search-retrieve-rank pipeline
- `string model` - LLM model identifier
**Main Method**: `RunAsync(OpenQueryOptions options)`
**Workflow Steps**:
1. Create `StatusReporter` (for progress UI)
2. **Optional Query Generation** (if `options.Queries > 1`):
- Create system message instructing JSON array output
- Create user message with `options.Question`
- Call `client.CompleteAsync()` with query gen model
- Parse JSON response; fall back to original question on failure
- Result: `List<string> queries` (1 or many)
3. **Execute Search Pipeline**:
- Call `_searchTool.ExecuteAsync()` with queries, options
- Receive `string context` (formatted context with source citations)
- Progress reported via callback to `StatusReporter`
4. **Generate Final Answer**:
- Build system prompt (append "short" or "long" modifier)
- Create user message with `Context:\n{context}\n\nQuestion: {options.Question}`
- Stream answer via `client.StreamAsync()`
- Write each `chunk.TextDelta` to Console as it arrives
- Stop spinner on first chunk, continue streaming
5. Dispose reporter
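A minimal sketch of the step-2 parse-with-fallback (the helper name and exact shape are illustrative assumptions, not the actual implementation):

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Parse the LLM's query-expansion output; fall back to the original
// question when the model does not return a usable JSON array.
static List<string> ParseGeneratedQueries(string llmResponse, string originalQuestion)
{
    try
    {
        // Expect a JSON array of strings, e.g. ["query one", "query two"]
        var queries = JsonSerializer.Deserialize<List<string>>(llmResponse);
        if (queries is { Count: > 0 })
            return queries;
    }
    catch (JsonException)
    {
        // Malformed model output: fall through to the fallback below
    }
    // Fall back to searching with the user's original question only
    return new List<string> { originalQuestion };
}
```

The fallback makes query expansion best-effort: a misbehaving model degrades the run to a single-query search rather than failing it.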
**Error Handling**:
- Exceptions propagate to `Program.cs` top-level handler
- `Program.cs` reports `HttpRequestException` (network/API failures) separately from generic `Exception`
**Note**: Query generation uses the same model as final answer; could be separated for cost/performance.
### 3. SearchTool (Tools/SearchTool.cs)
**Type**: Pipeline Orchestrator
**Responsibilities**: Execute 4-phase search-retrieve-rank-return workflow
**Constructor Parameters**:
- `SearxngClient searxngClient`
- `EmbeddingService embeddingService`
**Main Method**: `ExecuteAsync(originalQuery, generatedQueries, maxResults, topChunksLimit, onProgress, verbose)`
**Returns**: `Task<string>` - formatted context string with source citations
**Pipeline Phases**:
#### Phase 1: ExecuteParallelSearchesAsync
- Parallelize `searxngClient.SearchAsync(query, maxResults)` for each query
- Collect all results in `ConcurrentBag<SearxngResult>`
- Deduplicate by `DistinctBy(r => r.Url)`
**Output**: `List<SearxngResult>` (aggregated, unique)
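The aggregate-then-dedup step is essentially one LINQ call; sketched here with `(Title, Url, Content)` tuples standing in for the real `SearxngResult` DTO:

```csharp
using System.Collections.Generic;
using System.Linq;

// Deduplicate aggregated search results by URL. DistinctBy keeps the
// first occurrence, so results from earlier queries win on collision.
static List<(string Title, string Url, string Content)> DeduplicateByUrl(
    IEnumerable<(string Title, string Url, string Content)> allResults) =>
    allResults.DistinctBy(r => r.Url).ToList();
```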
#### Phase 2: ExecuteParallelArticleFetchingAsync
- Semaphore: `MaxConcurrentArticleFetches` (default 10)
- For each `SearxngResult`: fetch URL via `ArticleService.FetchArticleAsync()`
- Extract article text, title
- Chunk via `ChunkingService.ChunkText(article.TextContent)`
- Add each chunk as new `Chunk(content, url, title)`
**Output**: `List<Chunk>` (potentially 50-100 chunks)
#### Phase 3: ExecuteParallelEmbeddingsAsync
- Start two parallel tasks:
1. Query embedding: `embeddingService.GetEmbeddingAsync(originalQuery)`
2. Chunk embeddings: `embeddingService.GetEmbeddingsWithRateLimitAsync(chunkTexts, onProgress)`
- `Parallel.ForEachAsync` with `MaxConcurrentEmbeddingRequests` (default 4)
- Batch size: 300 chunks per embedding API call
- Filter chunks with empty embeddings (failed batches)
**Output**: `(float[] queryEmbedding, float[][] chunkEmbeddings)`
#### Phase 4: RankAndSelectTopChunks
- Calculate cosine similarity for each chunk vs query
- Assign `chunk.Score`
- Order by descending score
- Take `topChunksLimit` (from `--chunks` option)
- Return `List<Chunk>` (top N)
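Phase 4 can be sketched as a pure function: a plain-loop cosine stands in for `TensorPrimitives.CosineSimilarity`, and `(Content, Score)` tuples stand in for scored `Chunk` records (names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Score every chunk against the query embedding, sort descending, keep top N.
static List<(string Content, float Score)> RankAndSelectTop(
    float[] queryEmbedding, string[] chunkTexts, float[][] chunkEmbeddings, int topN)
{
    static float Cosine(float[] a, float[] b)
    {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }

    return chunkTexts
        .Select((text, i) => (Content: text, Score: Cosine(queryEmbedding, chunkEmbeddings[i])))
        .OrderByDescending(c => c.Score)
        .Take(topN)
        .ToList();
}
```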
**Formatting**:
```csharp
string context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
```
**Progress Callbacks**: Invoked at each major step for UI feedback
## Services
### OpenRouterClient
**Purpose**: HTTP client for OpenRouter API (chat completions + embeddings)
**Base URL**: `https://openrouter.ai/api/v1`
**Authentication**: `Authorization: Bearer {apiKey}`
**Methods**:
#### `StreamAsync(ChatCompletionRequest request, CancellationToken)`
- Sets `request.Stream = true`
- POST to `/chat/completions`
- Reads SSE stream line-by-line
- Parses `data: {json}` chunks
- Yields `StreamChunk` (text delta or tool call)
- Supports cancellation
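One step of that read loop, sketched as a pure helper (simplified; the real client deserializes each payload into `ChatCompletionChunk` and yields `StreamChunk` values):

```csharp
using System;

// Extract the JSON payload from one SSE line, or return null when the line
// should be skipped: blank keep-alives, ": comment" lines, and the
// terminating "[DONE]" sentinel.
static string? ParseSseLine(string line)
{
    const string prefix = "data: ";
    if (!line.StartsWith(prefix, StringComparison.Ordinal))
        return null;
    var payload = line[prefix.Length..].Trim();
    return payload == "[DONE]" ? null : payload;
}
```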
#### `CompleteAsync(ChatCompletionRequest request)`
- Sets `request.Stream = false`
- POST to `/chat/completions`
- Deserializes full response
- Returns `ChatCompletionResponse`
#### `EmbedAsync(string model, List<string> inputs)`
- POST to `/embeddings`
- Returns `float[][]` (ordered by input index)
**Error Handling**: `EnsureSuccessStatusCode()` throws `HttpRequestException` on failure
**Design**: Thin wrapper; no retry logic (delegated to EmbeddingService)
### SearxngClient
**Purpose**: HTTP client for SearxNG metasearch
**Base URL**: Configurable (default `http://localhost:8002`)
**Methods**:
#### `SearchAsync(string query, int limit = 10)`
- GET `{baseUrl}/search?q={query}&format=json`
- Deserializes to `SearxngRoot`
- Returns `Results.Take(limit).ToList()`
- On failure: returns empty `List<SearxngResult>` (no exception)
**Design**: Very simple; failures are tolerated (OpenQuery continues with other queries)
### EmbeddingService
**Purpose**: Batch embedding generation with rate limiting, parallelization, and retries
**Configuration** (from `ParallelProcessingOptions`):
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300
**Default Embedding Model**: `openai/text-embedding-3-small`
**Methods**:
#### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallelizes batches with `Parallel.ForEachAsync` + `MaxConcurrentEmbeddingRequests`
- Each batch: rate-limited + retry-wrapped `client.EmbedAsync(model, batch)`
- Collects results in order (by batch index)
- Returns `float[][]` (same order as input texts)
- Failed batches return empty `float[]` for each text
#### `GetEmbeddingAsync(string text, CancellationToken)`
- Wraps single-text call in rate limiter + retry
- Returns `float[]`
#### `CosineSimilarity(float[] v1, float[] v2)`
- Static method using `TensorPrimitives.CosineSimilarity`
- Returns float between -1 and 1 (typically 0-1 for normalized embeddings)
**Retry Policy** (Polly):
- Max 3 attempts
- 1s base delay, exponential backoff
- Only `HttpRequestException`
**Rate Limiting**: `RateLimiter` semaphore with `MaxConcurrentEmbeddingRequests`
**Design Notes**:
- Two similar methods (`GetEmbeddingsAsync` and `GetEmbeddingsWithRateLimitAsync`) - could be consolidated
- Uses Polly for resilience (good pattern)
- Concurrency control prevents overwhelming OpenRouter
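The batch-and-reassemble core of `GetEmbeddingsAsync`, written sequentially for clarity (the real service runs batches through `Parallel.ForEachAsync` behind the rate limiter and Polly retries; `embedBatch` stands in for `client.EmbedAsync`):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Split texts into fixed-size batches, embed each, and write results back
// by batch offset so the output order matches the input order.
static float[][] EmbedInBatches(
    List<string> texts, int batchSize, Func<List<string>, float[][]> embedBatch)
{
    var results = new float[texts.Count][];
    for (int start = 0; start < texts.Count; start += batchSize)
    {
        var batch = texts.GetRange(start, Math.Min(batchSize, texts.Count - start));
        float[][] embeddings;
        try
        {
            embeddings = embedBatch(batch);
        }
        catch (Exception)
        {
            // Failed batch: one empty embedding per text, filtered out downstream
            embeddings = batch.Select(_ => Array.Empty<float>()).ToArray();
        }
        for (int i = 0; i < embeddings.Length; i++)
            results[start + i] = embeddings[i];
    }
    return results;
}
```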
### ChunkingService
**Purpose**: Split long text into manageable pieces
**Static Class** (no dependencies, pure function)
**Algorithm** (in `ChunkText(string text)`):
- Constant `MAX_CHUNK_SIZE = 500`
- While remaining text:
- Take up to 500 chars
- If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']`
- Trim and add non-empty chunk
- Advance start position
**Rationale**: 500 chars is a sweet spot for embeddings - long enough for context, short enough for semantic coherence.
**Edge Cases**: Handles text shorter than 500 chars, empty text, text with no natural breaks.
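A runnable sketch of the algorithm above (chunk size is a parameter here for illustration; the real code uses the `MAX_CHUNK_SIZE = 500` constant):

```csharp
using System;
using System.Collections.Generic;

// Greedy chunker: take up to maxChunkSize chars, then backtrack to the last
// natural break so words and sentences are not cut mid-way.
static List<string> ChunkText(string text, int maxChunkSize = 500)
{
    var breakChars = new[] { ' ', '\n', '\r', '.', '!' };
    var chunks = new List<string>();
    int start = 0;
    while (start < text.Length)
    {
        int length = Math.Min(maxChunkSize, text.Length - start);
        if (start + length < text.Length)
        {
            // Not at the end of the text: backtrack to the last break, if any
            int lastBreak = text.LastIndexOfAny(breakChars, start + length - 1, length);
            if (lastBreak > start)
                length = lastBreak - start + 1;
        }
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0)
            chunks.Add(chunk);           // drop whitespace-only chunks
        start += length;
    }
    return chunks;
}
```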
### ArticleService
**Purpose**: Extract clean article content from URLs
**Method**: `FetchArticleAsync(string url)`
**Implementation**: Delegates to `SmartReader.ParseArticleAsync(url)`
**Returns**: `Article` object (from SmartReader)
- `Title` (string)
- `TextContent` (string) - cleaned article body
- `IsReadable` (bool) - quality indicator
- Other metadata (author, date, etc.)
**Error Handling**: Exceptions propagate (handled by `SearchTool`)
**Design**: Thin wrapper around third-party library. Could be extended to add caching, custom extraction rules, etc.
### RateLimiter
**Purpose**: Limit concurrent operations via semaphore
**Interface**:
```csharp
public Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct);
public Task ExecuteAsync(Func<Task> action, CancellationToken ct);
```
**Implementation**: `SemaphoreSlim` with `WaitAsync` and `Release`
**Disposal**: `IAsyncDisposable` (awaits semaphore disposal)
**Usage**: Wrap API calls that need concurrency control
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
await _client.EmbedAsync(model, batch), cancellationToken);
```
**Design**: Simple, reusable. Could be replaced with `Polly.RateLimiting` policy but this is lightweight.
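The core wrap pattern behind `ExecuteAsync`, sketched with a bare `SemaphoreSlim` rather than restating the class: `WaitAsync` gates entry, and `Release` runs in `finally` so a failed action never leaks a permit.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var semaphore = new SemaphoreSlim(4, 4);   // e.g. MaxConcurrentEmbeddingRequests

async Task<T> ExecuteLimited<T>(Func<Task<T>> action, CancellationToken ct = default)
{
    await semaphore.WaitAsync(ct);         // blocks when 4 actions are in flight
    try { return await action(); }
    finally { semaphore.Release(); }       // release even if the action throws
}
```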
### StatusReporter
**Purpose**: Real-time progress UI with spinner and verbose modes
**Architecture**:
- Producer: UpdateStatus(text) → writes to `Channel<string>`
- Consumer: Background task `ProcessStatusUpdatesAsync()` reads from channel
- Spinner: Separate task animates Braille characters every 100ms
**Modes**:
**Verbose Mode** (`_verbose = true`):
- All progress messages written as `Console.WriteLine()`
- No spinner
- Full audit trail
**Compact Mode** (default):
- Status line with spinner (overwrites same line)
- Only latest status visible
- Example: `⠋ Fetching articles 3/10...`
**Key Methods**:
- `UpdateStatus(message)` - fire-and-forget, non-blocking
- `WriteLine(text)` - stops spinner temporarily, writes full line
- `StartSpinner()` / `StopSpinner()` - manual control
- `ClearStatus()` - ANSI escape `\r\x1b[K` to clear line
- `Dispose()` - completes channel, waits for background tasks
**Spinner Chars**: `['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']` (Braille patterns, smooth animation)
**ANSI Codes**: `\r` (carriage return), `\x1b[K` (erase to end of line)
**Thread Safety**: Channel is thread-safe; multiple components can write concurrently without locks
**Design**: Well-encapsulated; could be reused in other CLI projects.
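The producer/consumer core, sketched with a bare channel (rendering reduced to a callback; the real class adds the spinner task and ANSI handling):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string>();

// Producer: fire-and-forget, TryWrite never blocks the pipeline
void UpdateStatus(string message) => channel.Writer.TryWrite(message);

// Consumer: background task drains the channel and renders each message
// (spinner status line in compact mode, full WriteLine in verbose mode)
async Task ProcessStatusUpdatesAsync(Action<string> render)
{
    await foreach (var message in channel.Reader.ReadAllAsync())
        render(message);
}
```

Completing the writer (as `Dispose()` does) lets the consumer loop finish draining and exit cleanly.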
### ConfigManager
**Purpose**: Load/save configuration from XDG-compliant location
**Config Path**:
- Built from `Environment.SpecialFolder.UserProfile`: `~/.config/openquery/config`
**Schema** (`AppConfig`):
```csharp
public class AppConfig
{
    public string ApiKey { get; set; } = "";
    public string Model { get; set; } = "qwen/qwen3.5-flash-02-23";
    public int DefaultQueries { get; set; } = 3;
    public int DefaultChunks { get; set; } = 3;
    public int DefaultResults { get; set; } = 5;
}
```
**Format**: Simple `key=value` (no INI parser, manual line split)
**Methods**:
- `Load()` → reads file if exists, returns `AppConfig` (with defaults)
- `Save(AppConfig)` → writes all 5 keys, overwrites existing
**Design**:
- Static class (no instances)
- Creates directory if missing
- No validation (writes whatever values given)
- Could be migrated to JSON, at the cost of the current format's simplicity
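The manual `key=value` line split can be sketched as follows (an illustrative helper, not the verbatim `Load()` code; keys match the `AppConfig` fields):

```csharp
using System.Collections.Generic;

// Parse key=value lines into a dictionary; blank and malformed lines are
// skipped so a hand-edited config file degrades gracefully.
static Dictionary<string, string> ParseConfig(string fileContents)
{
    var values = new Dictionary<string, string>();
    foreach (var line in fileContents.Split('\n'))
    {
        var separator = line.IndexOf('=');
        if (separator <= 0) continue;              // no '=' or empty key
        var key = line[..separator].Trim();
        var value = line[(separator + 1)..].Trim();
        values[key] = value;                       // last occurrence wins
    }
    return values;
}
```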
## Data Models
### OpenQueryOptions
**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object passed through workflow
**Properties**:
- `int Chunks` - top N chunks for context
- `int Results` - search results per query
- `int Queries` - number of expanded queries to generate
- `bool Short` - concise answer flag
- `bool Long` - detailed answer flag
- `bool Verbose` - verbose logging flag
- `string Question` - original user question
**Created**: In `Program.cs` from CLI options + config defaults
**Used By**: `OpenQueryApp.RunAsync()`
### Chunk
**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata and embedding
**Properties**:
- `string Content` - extracted text (~500 chars)
- `string SourceUrl` - article URL
- `string? Title` - article title (nullable)
- `float[]? Embedding` - vector embedding (populated by EmbeddingService)
- `float Score` - relevance score (populated during ranking)
**Lifecycle**:
1. Instantiated in `SearchTool.ExecuteParallelArticleFetchingAsync` with content, url, title
2. `Embedding` set in `ExecuteParallelEmbeddingsAsync` after batch processing
3. `Score` set in `RankAndSelectTopChunks` after cosine similarity
4. Serialized into context string for final answer
**Equality**: Records provide value equality (based on all properties)
### ParallelProcessingOptions
**Location**: `Models/ParallelOptions.cs`
**Type**: `class` (mutable)
**Purpose**: Concurrency settings for parallel operations
**Properties** (with defaults):
- `MaxConcurrentArticleFetches` = 10
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300
**Used By**: `EmbeddingService` (for embeddings), `SearchTool` (for article fetching)
**Currently**: Hardcoded in `SearchTool` constructor; could be made configurable
### OpenRouter Models (Models/OpenRouter.cs)
**Purpose**: DTOs for OpenRouter API (JSON serializable)
**Chat Completion**:
- `ChatCompletionRequest` (model, messages, tools, stream)
- `ChatCompletionResponse` (choices[], usage)
- `Message` (role, content, tool_calls, tool_call_id)
- `ToolDefinition`, `ToolFunction`, `ToolCall`, `FunctionCall`
- `Choice`, `Usage`
**Embedding**:
- `EmbeddingRequest` (model, input[])
- `EmbeddingResponse` (data[], usage)
- `EmbeddingData` (embedding[], index)
**Streaming**:
- `StreamChunk` (TextDelta, Tool)
- `ChatCompletionChunk`, `ChunkChoice`, `ChunkDelta`
**JSON Properties**: Uses `[JsonPropertyName]` to match API
**Serialization**: System.Text.Json with source generation (AppJsonContext)
### Searxng Models (Models/Searxng.cs)
**Purpose**: DTOs for SearxNG search results
**Records**:
- `SearxngRoot` with `List<SearxngResult> Results`
- `SearxngResult` with `Title`, `Url`, `Content` (snippet)
**Usage**: Deserialized from SearxNG's JSON response
### JsonContexts
**Location**: `Models/JsonContexts.cs`
**Purpose**: Source-generated JSON serializer context for AOT compatibility
**Pattern**:
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
// ... one attribute per serialized type ...
internal partial class AppJsonContext : JsonSerializerContext
{
}
```
**Generated**: Partial class compiled by source generator
**Used By**: All `JsonSerializer.Serialize/Deserialize` calls with `AppJsonContext.Default.{Type}`
**Benefits**:
- AOT-compatible (no reflection)
- Faster serialization (compiled delegates)
- Smaller binary (trimming-safe)
## Component Interactions
### Dependencies Graph
```
Program.cs
├── ConfigManager (load/save)
├── OpenRouterClient ──┐
├── SearxngClient ─────┤
├── EmbeddingService ──┤
└── SearchTool ────────┤
OpenQueryApp ◄──────────┘
├── OpenRouterClient (query gen + answer streaming)
├── SearchTool (pipeline)
│   ├── SearxngClient (searches)
│   ├── ArticleService (fetch)
│   ├── ChunkingService (split)
│   ├── EmbeddingService (embeddings)
│   ├── RateLimiter (concurrency)
│   └── StatusReporter (progress via callback)
└── StatusReporter (UI)
```
### Data Flow Between Components
```
OpenQueryOptions
      ↓
OpenQueryApp
├─ Query Generation
│    └─ OpenRouterClient.CompleteAsync()
│         → List<string> generatedQueries
├─ Search Pipeline
│    └─ SearchTool.ExecuteAsync(originalQuery, generatedQueries, ...)
│         ↓
│       Phase 1: SearxngClient.SearchAsync(query) × N
│         → ConcurrentBag<SearxngResult>
│         → List<SearxngResult> (unique)
│         ↓
│       Phase 2: ArticleService.FetchArticleAsync(url) × M
│         → ChunkingService.ChunkText(article.TextContent)
│         → ConcurrentBag<Chunk> (content, url, title)
│         ↓
│       Phase 3: EmbeddingService.GetEmbeddingsAsync(chunkContents)
│         → (queryEmbedding, chunkEmbeddings)
│         ↓
│       Phase 4: CosineSimilarity + Rank
│         → List<Chunk> topChunks (with Score, Embedding set)
│         ↓
│       Format: context string with [Source N: Title](Url)
│         → return context string
└─ Final Answer
     └─ OpenRouterClient.StreamAsync(prompt with context)
          → stream deltas to Console
```
### Interface Contracts
**SearchTool → Progress**:
```csharp
// Invoked as: onProgress?.Invoke("[Fetching article 1/10: example.com]")
Action<string>? onProgress
```
**StatusReporter ← Progress**:
```csharp
// Handler in OpenQueryApp:
(progress) => {
    if (options.Verbose) reporter.WriteLine(progress);
    else reporter.UpdateStatus(parsedShorterMessage);
}
```
**SearchTool → ArticleService**:
```csharp
Article article = await ArticleService.FetchArticleAsync(url);
```
**SearchTool → EmbeddingService**:
```csharp
(float[] queryEmbedding, float[][] chunkEmbeddings) = await ExecuteParallelEmbeddingsAsync(...);
// Also: embeddingService.GetEmbeddingAsync(text), GetEmbeddingsWithRateLimitAsync(...)
```
**SearchTool → ChunkingService**:
```csharp
List<string> chunks = ChunkingService.ChunkText(article.TextContent);
```
**SearchTool → RateLimiter**:
```csharp
await _rateLimiter.ExecuteAsync(async () => await _client.EmbedAsync(...), ct);
```
---
## Next Steps
- [OpenQueryApp](openquery-app.md) - Main orchestrator details
- [SearchTool](search-tool.md) - Pipeline implementation
- [Services](services.md) - All service classes documented
- [Models](models.md) - Complete data model reference