docs: add comprehensive documentation with README and detailed guides

- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
OpenQuery Documentation
2026-03-19 10:01:58 +01:00
parent b28d8998f7
commit 65ca2401ae
16 changed files with 7073 additions and 0 deletions

docs/components/models.md Normal file

@@ -0,0 +1,528 @@
# Models Reference
Complete reference for all data models, DTOs, and records in OpenQuery.
## 📋 Table of Contents
1. [Core Data Models](#core-data-models)
2. [OpenRouter API Models](#openrouter-api-models)
3. [SearxNG API Models](#searxng-api-models)
4. [JSON Serialization](#json-serialization)
5. [Model Relationships](#model-relationships)
## Core Data Models
### OpenQueryOptions
**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object for a single query execution
```csharp
public record OpenQueryOptions(
    int Chunks,      // Number of top chunks to include in context
    int Results,     // Search results per generated query
    int Queries,     // Number of search queries to generate (if >1)
    bool Short,      // Request concise answer
    bool Long,       // Request detailed answer
    bool Verbose,    // Enable verbose logging
    string Question  // Original user question (required)
);
```
**Lifecycle**:
- Created in `Program.cs` by combining CLI options, config defaults, and environment variables
- Passed to `OpenQueryApp.RunAsync(options)`
**Validation**: None (assumes valid values from CLI parser/config)
**Example**:
```csharp
var options = new OpenQueryOptions(
    Chunks: 3,
    Results: 5,
    Queries: 3,
    Short: false,
    Long: false,
    Verbose: true,
    Question: "What is quantum entanglement?"
);
```
---
### Chunk
**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata, embedding, and relevance score
```csharp
public record Chunk(
    string Content,       // Text content (typically ~500 chars)
    string SourceUrl,     // Original article URL
    string? Title = null  // Article title (optional, may be null)
)
{
    public float[]? Embedding { get; set; }  // Vector embedding (1536-dim for text-embedding-3-small)
    public float Score { get; set; }         // Relevance score (0-1, higher = more relevant)
}
```
**Lifecycle**:
1. **Created** in `SearchTool.ExecuteParallelArticleFetchingAsync`:
```csharp
chunks.Add(new Chunk(chunkText, result.Url, article.Title));
```
At this point: `Embedding = null`, `Score = 0`
2. **Embedded** in `SearchTool.ExecuteParallelEmbeddingsAsync`:
```csharp
validChunks[i].Embedding = validEmbeddings[i];
```
3. **Scored** in `SearchTool.RankAndSelectTopChunks`:
```csharp
chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
```
4. **Formatted** into context string:
```csharp
$"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"
```
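The scoring in steps 3–4 is plain cosine similarity between the query embedding and each chunk embedding, followed by a sort. A language-neutral sketch in Python (function names and the dict shape are illustrative, not the actual C# API):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); assumes non-zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_chunks(query_embedding, chunks, top_n):
    # chunks: list of dicts with an "embedding" key, mirroring Chunk.Embedding
    for c in chunks:
        c["score"] = cosine_similarity(query_embedding, c["embedding"])
    return sorted(chunks, key=lambda c: c["score"], reverse=True)[:top_n]
```

With normalized embeddings (as OpenAI-style models produce), cosine similarity reduces to a dot product, but the full formula is safe either way.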
**Properties**:
- `Content`: Never null/empty (filters empty chunks in `ChunkingService`)
- `SourceUrl`: Always provided (from `SearxngResult.Url`)
- `Title`: May be null if article extraction failed to get title
- `Embedding`: Null until phase 3; may remain null if embedding failed
- `Score`: 0 until phase 4; irrelevant for non-embedded chunks
**Equality**: Records use value equality, and the compiler-generated `Equals` compares *all* instance fields, including the mutable `Embedding` and `Score` properties. Note that `float[]` is compared by reference, not element-by-element, so two chunks with identical content/url/title but separately allocated embedding arrays are not equal.
---
### ParallelProcessingOptions
**Location**: `Models/ParallelOptions.cs`
**Type**: `class`
**Purpose**: Configuration for parallel/concurrent operations
```csharp
public class ParallelProcessingOptions
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
    public int EmbeddingBatchSize { get; set; } = 300;
}
```
**Usage**:
- Instantiated in `SearchTool` constructor (hardcoded new)
- Passed to `EmbeddingService` constructor
- Read by `SearchTool` for article fetching semaphore
**Default Values**:
| Property | Default | Effect |
|----------|---------|--------|
| `MaxConcurrentArticleFetches` | 10 | Up to 10 articles fetched simultaneously |
| `MaxConcurrentEmbeddingRequests` | 4 | Up to 4 embedding batches in parallel |
| `EmbeddingBatchSize` | 300 | Each embedding API call handles up to 300 texts |
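How these defaults combine: for K chunk texts, the embedding phase issues `ceil(K / 300)` API calls, with at most 4 in flight at once. A hypothetical sketch of the batching arithmetic in Python (illustrative only; the real logic lives in `EmbeddingService`):

```python
import math

def plan_batches(num_texts, batch_size=300, max_concurrent=4):
    """Return (API calls needed, sequential 'waves' given the concurrency cap)."""
    num_batches = math.ceil(num_texts / batch_size)
    waves = math.ceil(num_batches / max_concurrent)
    return num_batches, waves
```

For example, 1000 chunks fit in 4 batches, all of which can run in a single concurrent wave.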
**Current Limitation**: These are **compile-time defaults** (hardcoded in `SearchTool.cs`). To make them configurable:
1. Add to `AppConfig`
2. Read in `ConfigManager`
3. Pass through `SearchTool` constructor
---
## OpenRouter API Models
**Location**: `Models/OpenRouter.cs`
**Purpose**: DTOs for OpenRouter's REST API (JSON serialization)
### Chat Completion
#### `ChatCompletionRequest`
```csharp
public record ChatCompletionRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] List<Message> Messages,
    [property: JsonPropertyName("tools")] List<ToolDefinition>? Tools = null,
    [property: JsonPropertyName("stream")] bool Stream = false
);
```
**Example**:
```json
{
  "model": "qwen/qwen3.5-flash-02-23",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 2+2?" }
  ],
  "stream": true
}
```
#### `Message`
```csharp
public record Message(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
    [property: JsonPropertyName("tool_call_id")] string? ToolCallId = null
)
{
    // Factory method for tool responses
    public static Message FromTool(string content, string toolCallId) =>
        new Message("tool", content, null, toolCallId);
}
```
**Roles**: `"system"`, `"user"`, `"assistant"`, `"tool"`
**Usage**:
- `Content` for text messages
- `ToolCalls` when assistant requests tool use
- `ToolCallId` when responding to tool call
#### `ChatCompletionResponse`
```csharp
public record ChatCompletionResponse(
    [property: JsonPropertyName("choices")] List<Choice> Choices,
    [property: JsonPropertyName("usage")] Usage? Usage = null
);
public record Choice(
    [property: JsonPropertyName("message")] Message Message,
    [property: JsonPropertyName("finish_reason")] string? FinishReason = null
);
```
**Response Example**:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Answer text..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
```
#### `Usage`
```csharp
public record Usage(
    [property: JsonPropertyName("prompt_tokens")] int PromptTokens,
    [property: JsonPropertyName("completion_tokens")] int CompletionTokens,
    [property: JsonPropertyName("total_tokens")] int TotalTokens
);
```
### Tool Calling (Not Currently Used)
#### `ToolDefinition` / `ToolFunction`
```csharp
public record ToolDefinition(
    [property: JsonPropertyName("type")] string Type,  // e.g., "function"
    [property: JsonPropertyName("function")] ToolFunction Function
);
public record ToolFunction(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("description")] string Description,
    [property: JsonPropertyName("parameters")] JsonElement Parameters  // JSON Schema
);
```
#### `ToolCall` / `FunctionCall`
```csharp
public record ToolCall(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("function")] FunctionCall Function
);
public record FunctionCall(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("arguments")] string Arguments  // JSON string
);
```
**Note**: OpenQuery doesn't use tools currently, but models are defined for future tool-calling capability.
### Streaming
#### `StreamChunk`
```csharp
public record StreamChunk(
    string? TextDelta = null,
    ClientToolCall? Tool = null
);
```
Yielded by `OpenRouterClient.StreamAsync()` for each SSE event.
#### `ChatCompletionChunk` (Server Response)
```csharp
public record ChatCompletionChunk(
    [property: JsonPropertyName("choices")] List<ChunkChoice> Choices
);
public record ChunkChoice(
    [property: JsonPropertyName("delta")] ChunkDelta Delta
);
public record ChunkDelta(
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null
);
```
**Streaming Response Example** (SSE):
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
```
`OpenRouterClient.StreamAsync` parses and yields `StreamChunk` with non-null `TextDelta` for content.
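Per-line SSE parsing like this comes down to a prefix check, a `[DONE]` sentinel, and a JSON decode. A simplified sketch in Python of the logic (field names follow the chunk DTOs above; this is an illustration, not the actual C# implementation):

```python
import json

def parse_sse_line(line):
    """Return the text delta from one SSE line, or None if there is nothing to emit."""
    if not line.startswith("data: "):
        return None  # comments, keep-alives, blank lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    choices = chunk.get("choices", [])
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")
```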
### Embeddings
#### `EmbeddingRequest`
```csharp
public record EmbeddingRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("input")] List<string> Input
);
```
**Example**:
```json
{
  "model": "openai/text-embedding-3-small",
  "input": ["text 1", "text 2", ...]
}
```
#### `EmbeddingResponse`
```csharp
public record EmbeddingResponse(
    [property: JsonPropertyName("data")] List<EmbeddingData> Data,
    [property: JsonPropertyName("usage")] Usage Usage
);
public record EmbeddingData(
    [property: JsonPropertyName("embedding")] float[] Embedding,
    [property: JsonPropertyName("index")] int Index
);
```
**Response Example**:
```json
{
  "data": [
    { "embedding": [0.1, 0.2, ...], "index": 0 },
    { "embedding": [0.3, 0.4, ...], "index": 1 }
  ],
  "usage": {
    "prompt_tokens": 100,
    "total_tokens": 100
  }
}
```
**Note**: `_client.EmbedAsync` orders by `index` to match input order.
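Reordering by `index` matters because the API does not guarantee `data` arrives in input order. The equivalent operation, sketched in Python for illustration:

```python
def order_embeddings(data):
    # data: list of {"embedding": [...], "index": n}, as in EmbeddingResponse
    return [d["embedding"] for d in sorted(data, key=lambda d: d["index"])]
```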
---
## SearxNG API Models
**Location**: `Models/Searxng.cs`
**Purpose**: DTOs for SearxNG's JSON response format
### `SearxngRoot`
```csharp
public record SearxngRoot(
    [property: JsonPropertyName("results")] List<SearxngResult> Results
);
```
Top-level response object.
### `SearxngResult`
```csharp
public record SearxngResult(
    [property: JsonPropertyName("title")] string Title,
    [property: JsonPropertyName("url")] string Url,
    [property: JsonPropertyName("content")] string Content  // Snippet/description
);
```
**Fields**:
- `Title`: Result title (from page `<title>` or OpenGraph)
- `Url`: Absolute URL to article
- `Content`: Short snippet (~200 chars) from search engine
**Usage**:
- `Url` passed to `ArticleService.FetchArticleAsync`
- `Title` used as fallback if article extraction fails
- `Content` currently unused (could be used for a quick answer without fetching the full article)
**Example Response**:
```json
{
  "results": [
    {
      "title": "Quantum Entanglement - Wikipedia",
      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
      "content": "Quantum entanglement is a physical phenomenon..."
    }
  ]
}
```
---
## JSON Serialization
### JsonContext (Source Generation)
**Location**: `Models/JsonContexts.cs`
**Purpose**: Provide source-generated JSON serializer context for AOT compatibility
#### Declaration
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
[JsonSerializable(typeof(ChatCompletionChunk))]
[JsonSerializable(typeof(EmbeddingRequest))]
[JsonSerializable(typeof(EmbeddingResponse))]
[JsonSerializable(typeof(SearxngRoot))]
[JsonSerializable(typeof(List<string>))]
internal partial class AppJsonContext : JsonSerializerContext
{
}
```
**Usage**:
```csharp
var json = JsonSerializer.Serialize(request, AppJsonContext.Default.ChatCompletionRequest);
var response = JsonSerializer.Deserialize(json, AppJsonContext.Default.ChatCompletionResponse);
```
**Benefits**:
- **AOT-compatible**: No reflection, works with PublishAot=true
- **Performance**: Pre-compiled serializers are faster
- **Trimming safe**: Unused serializers trimmed automatically
**Generated**: Partial class compiled by source generator (no manual implementation)
**Important**: Must include ALL types that will be serialized/deserialized in `[JsonSerializable]` attributes, otherwise runtime exception in AOT.
---
## Model Relationships
### Object Graph (Typical Execution)
```
OpenQueryOptions
  ↓
OpenQueryApp.RunAsync()
├─ queryGenerationMessages (List<Message>)
│    ├─ system: "You are an expert researcher..."
│    └─ user: "Generate N queries for: {question}"
│    ↓
│  ChatCompletionRequest → OpenRouter → ChatCompletionResponse
│    ↓
│  List<string> generatedQueries
├─ SearchTool.ExecuteAsync()
│    Phase 1: Parallel Searches
│      SearxngClient.SearchAsync(query) × N
│      → List<SearxngResult> (Title, Url, Content)
│    Phase 2: Article Fetch & Chunking
│      ArticleService.FetchAsync(Url) × M
│      → Article (TextContent, Title)
│      → ChunkingService.ChunkText → List<string> chunks
│      → Chunk(content, url, title) × K
│    Phase 3: Embeddings
│      EmbeddingService.GetEmbeddingsAsync(chunkContents) → float[][] chunkEmbeddings
│      → Set chunk.Embedding for each
│      Also: GetEmbeddingAsync(question) → float[] queryEmbedding
│    Phase 4: Ranking
│      Score = CosineSimilarity(queryEmbedding, chunk.Embedding) per chunk
│      → OrderByDescending(Score) → Take(topChunksLimit) → topChunks (List<Chunk>)
│    ↓
│  Context string: formatted topChunks
│    ↓
└─ OpenQueryApp → final ChatCompletionRequest
     System: "Answer based on context..."
     User: "Context:\n{context}\n\nQuestion: {question}"
     StreamAsync() → StreamChunk.TextDelta → Console
```
### Record Immutability
Most DTOs are `record` types:
- **Immutable**: Properties are init-only (`{ get; init; }`)
- **Value semantics**: Equality based on content
- **Thread-safe**: Can be shared across threads
**Exception**:
- `Chunk`: Has mutable properties `Embedding` and `Score` (set during pipeline)
- `ParallelProcessingOptions`: Class with mutable setters
- `AppConfig`: Class with mutable setters
---
## Next Steps
- **[API Reference](../../api/cli.md)** - How these models are used in CLI commands
- **[OpenRouterClient](../../services/OpenRouterClient.md)** - Uses OpenRouter models
- **[SearxngClient](../../services/SearxngClient.md)** - Uses Searxng models
- **[SearchTool](../../components/search-tool.md)** - Orchestrates all models
---
**Quick Reference Table**
| Model | Category | Purpose | Mutable? |
|-------|----------|---------|----------|
| `OpenQueryOptions` | Core | CLI options | No (record) |
| `Chunk` | Core | Content + metadata + ranking | Partially (Embedding, Score) |
| `ParallelProcessingOptions` | Config | Concurrency settings | Yes (class) |
| `ChatCompletionRequest/Response` | OpenRouter | LLM API | No |
| `EmbeddingRequest/Response` | OpenRouter | Embeddings API | No |
| `SearxngRoot/Result` | SearxNG | Search results | No |
| `AppJsonContext` | Internal | JSON serialization | No (generated partial) |


@@ -0,0 +1,395 @@
# OpenQueryApp Component
Deep dive into the `OpenQueryApp` class - the main application orchestrator.
## Overview
`OpenQueryApp` is the heart of OpenQuery. It coordinates all components, manages the workflow from question to answer, and handles progress reporting.
## Location
`OpenQuery.cs` in project root
## Class Definition
```csharp
public class OpenQueryApp
{
    private readonly OpenRouterClient _client;
    private readonly SearchTool _searchTool;
    private readonly string _model;

    public OpenQueryApp(
        OpenRouterClient client,
        SearchTool searchTool,
        string model);

    public async Task RunAsync(OpenQueryOptions options);
}
```
**Dependencies**:
- `OpenRouterClient` - for query generation and final answer streaming
- `SearchTool` - for search-retrieve-rank pipeline
- `string _model` - model identifier to use for LLM calls
**Lifecycle**: Instantiated once per query execution in `Program.cs`, then `RunAsync()` called once.
## RunAsync Workflow
```csharp
public async Task RunAsync(OpenQueryOptions options)
{
    // 1. Setup
    using var reporter = new StatusReporter(options.Verbose);
    reporter.StartSpinner();

    // 2. Query Generation (if needed)
    List<string> queries = await GenerateQueriesIfNeededAsync(options, reporter);

    // 3. Search Pipeline
    string searchResult = await ExecuteSearchPipelineAsync(options, queries, reporter);

    // 4. Final Answer Streaming
    await StreamFinalAnswerAsync(options, searchResult, reporter);
}
```
### Step 1: Status Reporter Setup
```csharp
using var reporter = new StatusReporter(options.Verbose);
reporter.StartSpinner();
```
- Creates `StatusReporter` (implements `IDisposable`)
- Starts spinner animation (unless verbose)
- `using` ensures disposal on exit
### Step 2: Query Generation
**When**: `options.Queries > 1` (user wants multiple search queries)
**Purpose**: Use LLM to generate diverse, optimized search queries from the original question
**System Prompt** (hardcoded in `OpenQuery.cs`):
```
You are an expert researcher. The user will ask a question. Your task is to
generate optimal search queries to gather comprehensive information.
Instructions:
1. Break down complex questions.
2. Use synonyms and alternative phrasing.
3. Target different aspects (entities, mechanisms, pros/cons, history).
CRITICAL: Output must be a valid JSON array of strings ONLY. No markdown,
explanations, or other text.
```
**Request**:
```csharp
var queryGenMessages = new List<Message>
{
    new Message("system", systemPrompt),
    new Message("user", $"Generate {options.Queries} distinct search queries for:\n{options.Question}")
};
var request = new ChatCompletionRequest(_model, queryGenMessages);
var response = await _client.CompleteAsync(request);
```
**Response Parsing**:
```csharp
var content = response.Choices.FirstOrDefault()?.Message.Content;
if (!string.IsNullOrEmpty(content))
{
    // Remove markdown code fences if present
    content = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();

    // Deserialize to List<string>
    var generatedQueries = JsonSerializer.Deserialize(content, AppJsonContext.Default.ListString);
    if (generatedQueries != null && generatedQueries.Count > 0)
    {
        queries = generatedQueries;
    }
}
```
**Fallback**: If any step fails (exception, null, empty, invalid JSON), use `new List<string> { options.Question }` (single query = original)
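The fence-stripping and fallback behaviour can be sketched independently of C#; the Python below is an illustration of the same logic (function name is hypothetical), not the project's code:

```python
import json
import re

def parse_generated_queries(content, original_question):
    """Parse an LLM JSON-array reply; fall back to the original question on any failure."""
    try:
        # Strip markdown code fences ("```json ... ```") if the model added them
        cleaned = re.sub(r"`{3}json\s*|\s*`{3}", "", content or "").strip()
        queries = json.loads(cleaned)
        if isinstance(queries, list) and queries:
            return [str(q) for q in queries]
    except (json.JSONDecodeError, TypeError):
        pass
    return [original_question]  # safe fallback: single query = original question
```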
**Note**: Query generation reuses the same model as final answer. This could be optimized:
- Use cheaper/faster model for query gen
- Separate model configuration
- Cache query generation results
### Step 3: Search Pipeline Execution
```csharp
var searchResult = await _searchTool.ExecuteAsync(
    options.Question,
    queries,
    options.Results,
    options.Chunks,
    (progress) =>
    {
        if (options.Verbose)
            reporter.WriteLine(progress);
        else
            reporter.UpdateStatus(progress);  // compact mode condenses this into a short status line
    },
    options.Verbose);
```
**Parameters**:
- `originalQuery`: User's original question (used for final embedding)
- `generatedQueries`: From step 2 (or fallback)
- `maxResults`: `options.Results` (search results per query)
- `topChunksLimit`: `options.Chunks` (top N chunks to return)
- `onProgress`: Callback to update UI
- `verbose`: Passed through to `SearchTool`
**Returns**: `string context` - formatted context with source citations
**Progress Handling**:
- In verbose mode: all progress printed as lines (via `reporter.WriteLine()`)
- In compact mode: parse progress messages to show concise status (e.g., "Fetching articles 3/10...")
### Step 4: Final Answer Streaming
**Status Update**:
```csharp
if (!options.Verbose)
    reporter.UpdateStatus("Asking AI...");
else
{
    reporter.ClearStatus();
    Console.WriteLine();
}
```
**Build System Prompt**:
```csharp
var systemPrompt = "You are a helpful AI assistant. Answer the user's question in depth, based on the provided context. Be precise and accurate. You can mention sources or citations.";
if (options.Short) systemPrompt += " Give a very short concise answer.";
if (options.Long) systemPrompt += " Give a long elaborate detailed answer.";
```
**Prompt Structure**:
```
System: {systemPrompt}
User: Context:
{searchResult}
Question: {options.Question}
```
Where `searchResult` is:
```
[Source 1: Title](URL)
Content chunk 1
[Source 2: Title](URL)
Content chunk 2
...
```
**Streaming**:
```csharp
var requestStream = new ChatCompletionRequest(_model, messages);
var assistantResponse = new StringBuilder();
var isFirstChunk = true;
using var streamCts = new CancellationTokenSource();

await foreach (var chunk in _client.StreamAsync(requestStream, streamCts.Token))
{
    if (chunk.TextDelta == null) continue;

    if (isFirstChunk)
    {
        reporter.StopSpinner();
        if (!options.Verbose) reporter.ClearStatus();
        else Console.Write("Assistant: ");
        isFirstChunk = false;
    }

    Console.Write(chunk.TextDelta);
    assistantResponse.Append(chunk.TextDelta);
}
```
**Key Points**:
- `StreamAsync` yields `StreamChunk` objects (text deltas)
- First chunk stops spinner and clears status line
- Each delta written to Console immediately (real-time feel)
- Entire response accumulated in `assistantResponse` (though not used elsewhere)
- `CancellationTokenSource` passed but not canceled (Ctrl+C would cancel from outside)
**Finally Block**:
```csharp
finally
{
    reporter.StopSpinner();
}
```
Ensures spinner stops even if streaming fails.
**End**:
```csharp
Console.WriteLine(); // Newline after complete answer
```
## Error Handling
`RunAsync` itself does not catch exceptions. All exceptions propagate to `Program.cs`:
```csharp
try
{
    var openQuery = new OpenQueryApp(client, searchTool, model);
    await openQuery.RunAsync(options);
}
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```
**Common Exceptions**:
- `HttpRequestException` - network failures, API errors
- `JsonException` - malformed JSON from API
- `TaskCanceledException` - timeout or user interrupt
- `Exception` - anything else
**No Retries at This Level**: Fail fast; user sees error immediately. Lower-level retries exist (embedding service).
## Performance Characteristics
**Query Generation**:
- One non-streaming LLM call
- Takes 2-5 seconds depending on model
- Typically <1000 tokens
**Search Pipeline** (`SearchTool.ExecuteAsync`):
- See `SearchTool.md` for detailed timing breakdown
- Total 10-30 seconds typically
**Final Answer Streaming**:
- Streaming LLM call
- Time depends on answer length (typically 5-20 seconds)
- User sees words appear progressively
**Total End-to-End**: 15-50 seconds for typical query
## Design Decisions
### Why Not Stream Query Generation?
Query generation currently uses `CompleteAsync` (non-streaming). Could be streamed but:
- Queries are short (JSON array)
- Streaming offers no UX benefit (user doesn't see intermediate queries)
- Simpler to wait for all queries before proceeding
### Why Build Prompt Manually Instead of Templates?
Simple string concatenation is fine for few prompts. Pros:
- No template dependencies
- Easy to read and modify
- No runtime compilation overhead
Cons:
- No validation
- Could benefit from prompt engineering framework
### Why Accumulate `assistantResponse` StringBuilder?
Currently built but not used. Could be:
- Saved to file (future feature: `--output file.md`)
- Analyzed for token counting
- Removed if not needed
### Could Query Generation Be Cached?
Yes! For repeated questions (common in scripts), cache query results:
- `Dictionary<string, List<string>>` cache in memory
- Or persistent cache (Redis, file)
- Not implemented (low priority)
### Single Responsibility Violation?
`OpenQueryApp` does:
- Query generation
- Pipeline orchestration
- Answer streaming
That's 3 responsibilities, but they're tightly coupled to the "query → answer" workflow. Separating them would add complexity without clear benefit. Acceptable as "application coordinator".
## Extension Points
### Adding New Model for Query Generation
Currently uses same `_model` for queries and answer. To use different models:
1. Add `queryGenerationModel` parameter to constructor
2. Use it for query gen: `new ChatCompletionRequest(queryGenerationModel, queryGenMessages)`
3. Keep `_model` for final answer
Or make it configurable via environment variable: `OPENROUTER_QUERY_MODEL`
### Post-Processing Answer
Opportunity to add:
- Source citation formatting (footnotes, clickable links)
- Answer summarization
- Export to Markdown/JSON
- Text-to-speech
Add after streaming loop, before final newline.
### Progress UI Enhancement
Current `StatusReporter` is basic. Could add:
- Progress bar with percentage
- ETA calculation
- Colors (ANSI) for different message types
- Logging to file
- Web dashboard
Would require extending `StatusReporter` or replacing it.
## Testing Considerations
**Challenges**:
- `RunAsync` is one long workflow method (hard to unit test in isolation)
- Depends on many services (need mocks)
- Asynchronous and streaming
**Recommended Approach**:
1. Extract interfaces:
- `ISearchTool` (wrapper around `SearchTool`)
- `IOpenRouterClient` (wrapper around `OpenRouterClient`)
2. Mock interfaces in tests
3. Test query generation parsing separately
4. Test progress callback counting
5. Test final answer prompt construction
**Integration Tests**:
- End-to-end with real/mocked APIs
- Automated tests with test SearxNG/OpenRouter instances
## Related Components
- **[SearchTool](search-tool.md)** - pipeline executed by `OpenQueryApp`
- **[Program.cs](../Program.md)** - creates `OpenQueryApp`
- **[StatusReporter](../services/StatusReporter.md)** - progress UI used by `OpenQueryApp`
---
## Next Steps
- [SearchTool](search-tool.md) - See the pipeline in detail
- [Services](../services/overview.md) - Understand each service
- [CLI Reference](../../api/cli.md) - How users invoke this

docs/components/overview.md Normal file

@@ -0,0 +1,603 @@
# Components Overview
Detailed documentation for each major component in the OpenQuery system.
## 📋 Table of Contents
1. [Component Hierarchy](#component-hierarchy)
2. [Core Components](#core-components)
3. [Services](#services)
4. [Data Models](#data-models)
5. [Component Interactions](#component-interactions)
## Component Hierarchy
```
OpenQuery/
├── Program.cs               [Entry Point, CLI]
├── OpenQuery.cs             [OpenQueryApp - Orchestrator]
├── Tools/
│   └── SearchTool.cs        [Pipeline Orchestration]
├── Services/
│   ├── OpenRouterClient.cs  [LLM & Embedding API]
│   ├── SearxngClient.cs     [Search API]
│   ├── EmbeddingService.cs  [Embedding Generation + Math]
│   ├── ChunkingService.cs   [Text Splitting]
│   ├── ArticleService.cs    [Content Extraction]
│   ├── RateLimiter.cs       [Concurrency Control]
│   └── StatusReporter.cs    [Progress Display]
├── Models/
│   ├── OpenQueryOptions.cs  [CLI Options Record]
│   ├── Chunk.cs             [Content + Metadata]
│   ├── ParallelOptions.cs   [Concurrency Settings]
│   ├── OpenRouter.cs        [API DTOs]
│   ├── Searxng.cs           [Search Result DTOs]
│   └── JsonContexts.cs      [JSON Context]
└── ConfigManager.cs         [Configuration Persistence]
```
## Core Components
### 1. Program.cs
**Type**: Console Application Entry Point
**Responsibilities**: CLI parsing, dependency wiring, error handling
**Key Elements**:
- `RootCommand` from System.CommandLine
- Options: `--chunks`, `--results`, `--queries`, `--short`, `--long`, `--verbose`
- Subcommand: `configure` (with interactive mode)
- Configuration loading via `ConfigManager.Load()`
- Environment variable resolution
- Service instantiation and coordination
- Top-level try-catch for error reporting
**Code Flow**:
1. Load config file
2. Define CLI options and commands
3. Set handler for root command
4. Handler: resolve API key/model → instantiate services → call `OpenQueryApp.RunAsync()`
5. Set handler for configure command (writes config file)
6. Invoke command parser: `await rootCommand.InvokeAsync(args)`
**Exit Codes**:
- 0 = success
- 1 = error
### 2. OpenQueryApp (OpenQuery.cs)
**Type**: Main Application Class
**Responsibilities**: Workflow orchestration, query generation, answer streaming
**Constructor Parameters**:
- `OpenRouterClient client` - for query gen and final answer
- `SearchTool searchTool` - for search-retrieve-rank pipeline
- `string model` - LLM model identifier
**Main Method**: `RunAsync(OpenQueryOptions options)`
**Workflow Steps**:
1. Create `StatusReporter` (for progress UI)
2. **Optional Query Generation** (if `options.Queries > 1`):
- Create system message instructing JSON array output
- Create user message with `options.Question`
- Call `client.CompleteAsync()` with query gen model
- Parse JSON response; fall back to original question on failure
- Result: `List<string> queries` (1 or many)
3. **Execute Search Pipeline**:
- Call `_searchTool.ExecuteAsync()` with queries, options
- Receive `string context` (formatted context with source citations)
- Progress reported via callback to `StatusReporter`
4. **Generate Final Answer**:
- Build system prompt (append "short" or "long" modifier)
- Create user message with `Context:\n{context}\n\nQuestion: {options.Question}`
- Stream answer via `client.StreamAsync()`
- Write each `chunk.TextDelta` to Console as it arrives
- Stop spinner on first chunk, continue streaming
5. Dispose reporter
**Error Handling**:
- Exceptions propagate to `Program.cs` top-level handler
- `HttpRequestException` vs generic `Exception`
**Note**: Query generation uses the same model as final answer; could be separated for cost/performance.
### 3. SearchTool (Tools/SearchTool.cs)
**Type**: Pipeline Orchestrator
**Responsibilities**: Execute 4-phase search-retrieve-rank-return workflow
**Constructor Parameters**:
- `SearxngClient searxngClient`
- `EmbeddingService embeddingService`
**Main Method**: `ExecuteAsync(originalQuery, generatedQueries, maxResults, topChunksLimit, onProgress, verbose)`
**Returns**: `Task<string>` - formatted context string with source citations
**Pipeline Phases**:
#### Phase 1: ExecuteParallelSearchesAsync
- Parallelize `searxngClient.SearchAsync(query, maxResults)` for each query
- Collect all results in `ConcurrentBag<SearxngResult>`
- Deduplicate by `DistinctBy(r => r.Url)`
**Output**: `List<SearxngResult>` (aggregated, unique)
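`DistinctBy(r => r.Url)` keeps the first result seen per URL, preserving order. The equivalent, sketched in Python for illustration (dict keys are stand-ins for the `SearxngResult` properties):

```python
def dedupe_by_url(results):
    # Keep the first result per URL, preserving order (mirrors LINQ DistinctBy)
    seen = set()
    unique = []
    for r in results:
        if r["url"] not in seen:
            seen.add(r["url"])
            unique.append(r)
    return unique
```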
#### Phase 2: ExecuteParallelArticleFetchingAsync
- Semaphore: `MaxConcurrentArticleFetches` (default 10)
- For each `SearxngResult`: fetch URL via `ArticleService.FetchArticleAsync()`
- Extract article text, title
- Chunk via `ChunkingService.ChunkText(article.TextContent)`
- Add each chunk as new `Chunk(content, url, title)`
**Output**: `List<Chunk>` (potentially 50-100 chunks)
#### Phase 3: ExecuteParallelEmbeddingsAsync
- Start two parallel tasks:
1. Query embedding: `embeddingService.GetEmbeddingAsync(originalQuery)`
2. Chunk embeddings: `embeddingService.GetEmbeddingsWithRateLimitAsync(chunkTexts, onProgress)`
- `Parallel.ForEachAsync` with `MaxConcurrentEmbeddingRequests` (default 4)
- Batch size: 300 chunks per embedding API call
- Filter chunks with empty embeddings (failed batches)
**Output**: `(float[] queryEmbedding, float[][] chunkEmbeddings)`
#### Phase 4: RankAndSelectTopChunks
- Calculate cosine similarity for each chunk vs query
- Assign `chunk.Score`
- Order by descending score
- Take `topChunksLimit` (from `--chunks` option)
- Return `List<Chunk>` (top N)
**Formatting**:
```csharp
string context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
```
**Progress Callbacks**: Invoked at each major step for UI feedback
## Services
### OpenRouterClient
**Purpose**: HTTP client for OpenRouter API (chat completions + embeddings)
**Base URL**: `https://openrouter.ai/api/v1`
**Authentication**: `Authorization: Bearer {apiKey}`
**Methods**:
#### `StreamAsync(ChatCompletionRequest request, CancellationToken)`
- Sets `request.Stream = true`
- POST to `/chat/completions`
- Reads SSE stream line-by-line
- Parses `data: {json}` chunks
- Yields `StreamChunk` (text delta or tool call)
- Supports cancellation
#### `CompleteAsync(ChatCompletionRequest request)`
- Sets `request.Stream = false`
- POST to `/chat/completions`
- Deserializes full response
- Returns `ChatCompletionResponse`
#### `EmbedAsync(string model, List<string> inputs)`
- POST to `/embeddings`
- Returns `float[][]` (ordered by input index)
**Error Handling**: `EnsureSuccessStatusCode()` throws `HttpRequestException` on failure
**Design**: Thin wrapper; no retry logic (delegated to EmbeddingService)
### SearxngClient
**Purpose**: HTTP client for SearxNG metasearch
**Base URL**: Configurable (default `http://localhost:8002`)
**Methods**:
#### `SearchAsync(string query, int limit = 10)`
- GET `{baseUrl}/search?q={query}&format=json`
- Deserializes to `SearxngRoot`
- Returns `Results.Take(limit).ToList()`
- On failure: returns empty `List<SearxngResult>` (no exception)
**Design**: Very simple; failures are tolerated (OpenQuery continues with other queries)
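The request shape is concrete enough to sketch. This hypothetical helper builds the URL described above, with `Uri.EscapeDataString` handling query encoding (the real client's internals may differ):

```csharp
using System;

// Sketch of the URL the client issues; base URL and parameters as documented above.
string BuildSearchUrl(string baseUrl, string query) =>
    $"{baseUrl.TrimEnd('/')}/search?q={Uri.EscapeDataString(query)}&format=json";

Console.WriteLine(BuildSearchUrl("http://localhost:8002", "openquery rag"));
// http://localhost:8002/search?q=openquery%20rag&format=json
```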
### EmbeddingService
**Purpose**: Batch embedding generation with rate limiting, parallelization, and retries
**Configuration** (from `ParallelProcessingOptions`):
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300
**Default Embedding Model**: `openai/text-embedding-3-small`
**Methods**:
#### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallelizes batches with `Parallel.ForEachAsync` + `MaxConcurrentEmbeddingRequests`
- Each batch: rate-limited + retry-wrapped `client.EmbedAsync(model, batch)`
- Collects results in order (by batch index)
- Returns `float[][]` (same order as input texts)
- Failed batches return empty `float[]` for each text
#### `GetEmbeddingAsync(string text, CancellationToken)`
- Wraps single-text call in rate limiter + retry
- Returns `float[]`
#### `CosineSimilarity(float[] v1, float[] v2)`
- Static method using `TensorPrimitives.CosineSimilarity`
- Returns float between -1 and 1 (typically 0-1 for normalized embeddings)
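`TensorPrimitives.CosineSimilarity` computes the standard dot-product-over-magnitudes formula; a plain (non-SIMD) sketch for reference:

```csharp
using System;

// Cosine similarity without SIMD: dot(a, b) / (|a| * |b|).
float Cosine(float[] a, float[] b)
{
    float dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}

Console.WriteLine(Cosine(new float[] { 1, 2 }, new float[] { 2, 4 })); // 1 (parallel vectors)
Console.WriteLine(Cosine(new float[] { 1, 0 }, new float[] { 0, 1 })); // 0 (orthogonal)
```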
**Retry Policy** (Polly):
- Max 3 attempts
- 1s base delay, exponential backoff
- Only `HttpRequestException`
**Rate Limiting**: `RateLimiter` semaphore with `MaxConcurrentEmbeddingRequests`
**Design Notes**:
- Two similar methods (`GetEmbeddingsAsync` and `GetEmbeddingsWithRateLimitAsync`) - could be consolidated
- Uses Polly for resilience (good pattern)
- Concurrency control prevents overwhelming OpenRouter
### ChunkingService
**Purpose**: Split long text into manageable pieces
**Static Class** (no dependencies, pure function)
**Algorithm** (in `ChunkText(string text)`):
- Constant `MAX_CHUNK_SIZE = 500`
- While remaining text:
- Take up to 500 chars
- If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']`
- Trim and add non-empty chunk
- Advance start position
**Rationale**: 500 chars is a sweet spot for embeddings - long enough for context, short enough for semantic coherence.
**Edge Cases**: Handles text shorter than 500 chars, empty text, text with no natural breaks.
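The loop described above can be sketched directly; this is a hypothetical reconstruction of the algorithm as documented (the real `ChunkingService` may differ in details such as the exact break-character handling):

```csharp
using System;
using System.Collections.Generic;

const int MaxChunkSize = 500;
char[] breakChars = { ' ', '\n', '\r', '.', '!' };

List<string> ChunkText(string text)
{
    var chunks = new List<string>();
    var start = 0;
    while (start < text.Length)
    {
        var length = Math.Min(MaxChunkSize, text.Length - start);
        // Not at the end: backtrack to the last natural break so words stay intact.
        if (start + length < text.Length)
        {
            var lastBreak = text.LastIndexOfAny(breakChars, start + length - 1, length);
            if (lastBreak > start)
                length = lastBreak - start + 1;
        }
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0)
            chunks.Add(chunk);
        start += length;
    }
    return chunks;
}

Console.WriteLine(ChunkText(new string('x', 1200)).Count); // 3 chunks: 500 + 500 + 200
```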
### ArticleService
**Purpose**: Extract clean article content from URLs
**Method**: `FetchArticleAsync(string url)`
**Implementation**: Delegates to `SmartReader.ParseArticleAsync(url)`
**Returns**: `Article` object (from SmartReader)
- `Title` (string)
- `TextContent` (string) - cleaned article body
- `IsReadable` (bool) - quality indicator
- Other metadata (author, date, etc.)
**Error Handling**: Exceptions propagate (handled by `SearchTool`)
**Design**: Thin wrapper around third-party library. Could be extended to add caching, custom extraction rules, etc.
### RateLimiter
**Purpose**: Limit concurrent operations via semaphore
**Interface**:
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken);
public async Task ExecuteAsync(Func<Task> action, CancellationToken);
```
**Implementation**: `SemaphoreSlim` with `WaitAsync` and `Release`
**Disposal**: `IAsyncDisposable` (awaits semaphore disposal)
**Usage**: Wrap API calls that need concurrency control
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
    await _client.EmbedAsync(model, batch), cancellationToken);
```
**Design**: Simple, reusable. Could be replaced with `Polly.RateLimiting` policy but this is lightweight.
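The pattern is small enough to demonstrate end-to-end. This top-level sketch uses a bare `SemaphoreSlim` (the real `RateLimiter` wraps the same idea in an `IAsyncDisposable` class); the counter confirms concurrency never exceeds the limit:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var semaphore = new SemaphoreSlim(2); // allow at most 2 concurrent operations
var gate = new object();
int current = 0, peak = 0;

async Task ExecuteAsync(Func<Task> action)
{
    await semaphore.WaitAsync();
    try { await action(); }
    finally { semaphore.Release(); }
}

var tasks = new Task[8];
for (var i = 0; i < 8; i++)
    tasks[i] = ExecuteAsync(async () =>
    {
        lock (gate) { current++; peak = Math.Max(peak, current); }
        await Task.Delay(25); // simulate an API call
        lock (gate) { current--; }
    });
await Task.WhenAll(tasks);

Console.WriteLine($"Peak concurrency: {peak}"); // never more than 2
```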
### StatusReporter
**Purpose**: Real-time progress UI with spinner and verbose modes
**Architecture**:
- Producer: UpdateStatus(text) → writes to `Channel<string>`
- Consumer: Background task `ProcessStatusUpdatesAsync()` reads from channel
- Spinner: Separate task animates Braille characters every 100ms
**Modes**:
**Verbose Mode** (`_verbose = true`):
- All progress messages written as `Console.WriteLine()`
- No spinner
- Full audit trail
**Compact Mode** (default):
- Status line with spinner (overwrites same line)
- Only latest status visible
- Example: `⠋ Fetching articles 3/10...`
**Key Methods**:
- `UpdateStatus(message)` - fire-and-forget, non-blocking
- `WriteLine(text)` - stops spinner temporarily, writes full line
- `StartSpinner()` / `StopSpinner()` - manual control
- `ClearStatus()` - ANSI escape `\r\x1b[K` to clear line
- `Dispose()` - completes channel, waits for background tasks
**Spinner Chars**: `['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']` (Braille patterns, smooth animation)
**ANSI Codes**: `\r` (carriage return), `\x1b[K` (erase to end of line)
**Thread Safety**: Channel is thread-safe; multiple components can write concurrently without locks
**Design**: Well-encapsulated; could be reused in other CLI projects.
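The producer/consumer core reduces to a bounded channel; a minimal, names-are-illustrative sketch (the real `StatusReporter` layers the spinner animation and ANSI clearing on top of this):

```csharp
using System;
using System.Threading.Channels;

// Bounded channel: producers fire-and-forget; DropWrite sheds updates under load.
var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(100)
{
    FullMode = BoundedChannelFullMode.DropWrite
});

void UpdateStatus(string message) => channel.Writer.TryWrite(message);

UpdateStatus("Searching web...");
UpdateStatus("Fetching articles 3/10...");
channel.Writer.Complete();

// Consumer (compact mode): only the latest status line is shown.
string? last = null;
await foreach (var msg in channel.Reader.ReadAllAsync())
    last = msg;
Console.WriteLine(last); // Fetching articles 3/10...
```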
### ConfigManager
**Purpose**: Load/save configuration from XDG-compliant location
**Config Path**:
- `Environment.SpecialFolder.UserProfile` → `~/.config/openquery/config`
**Schema** (`AppConfig`):
```csharp
public class AppConfig
{
    public string ApiKey { get; set; } = "";
    public string Model { get; set; } = "qwen/qwen3.5-flash-02-23";
    public int DefaultQueries { get; set; } = 3;
    public int DefaultChunks { get; set; } = 3;
    public int DefaultResults { get; set; } = 5;
}
```
**Format**: Simple `key=value` (no INI parser, manual line split)
**Methods**:
- `Load()` → reads file if exists, returns `AppConfig` (with defaults)
- `Save(AppConfig)` → writes all 5 keys, overwrites existing
**Design**:
- Static class (no instances)
- Creates directory if missing
- No validation (writes whatever values given)
- Could be improved with JSON format (but keep simple)
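Since the format is plain `key=value` lines, loading amounts to a manual line split; a hypothetical sketch of the parse step (the real `ConfigManager` may handle edge cases differently):

```csharp
using System;
using System.Collections.Generic;

// Parses "key=value" lines; blank or malformed lines are skipped.
Dictionary<string, string> Parse(string fileContents)
{
    var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var line in fileContents.Split('\n'))
    {
        var idx = line.IndexOf('=');
        if (idx <= 0) continue;
        values[line[..idx].Trim()] = line[(idx + 1)..].Trim();
    }
    return values;
}

var config = Parse("ApiKey=sk-xxxx\nModel=some/model\nDefaultQueries=3\n");
Console.WriteLine(config["DefaultQueries"]); // 3
```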
## Data Models
### OpenQueryOptions
**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object passed through workflow
**Properties**:
- `int Chunks` - top N chunks for context
- `int Results` - search results per query
- `int Queries` - number of expanded queries to generate
- `bool Short` - concise answer flag
- `bool Long` - detailed answer flag
- `bool Verbose` - verbose logging flag
- `string Question` - original user question
**Created**: In `Program.cs` from CLI options + config defaults
**Used By**: `OpenQueryApp.RunAsync()`
### Chunk
**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata and embedding
**Properties**:
- `string Content` - extracted text (~500 chars)
- `string SourceUrl` - article URL
- `string? Title` - article title (nullable)
- `float[]? Embedding` - vector embedding (populated by EmbeddingService)
- `float Score` - relevance score (populated during ranking)
**Lifecycle**:
1. Instantiated in `SearchTool.ExecuteParallelArticleFetchingAsync` with content, url, title
2. `Embedding` set in `ExecuteParallelEmbeddingsAsync` after batch processing
3. `Score` set in `RankAndSelectTopChunks` after cosine similarity
4. Serialized into context string for final answer
**Equality**: Records provide value equality (based on all properties)
### ParallelProcessingOptions
**Location**: `Models/ParallelOptions.cs`
**Type**: `class` (mutable)
**Purpose**: Concurrency settings for parallel operations
**Properties** (with defaults):
- `MaxConcurrentArticleFetches` = 10
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300
**Used By**: `EmbeddingService` (for embeddings), `SearchTool` (for article fetching)
**Currently**: Hardcoded in `SearchTool` constructor; could be made configurable
### OpenRouter Models (Models/OpenRouter.cs)
**Purpose**: DTOs for OpenRouter API (JSON serializable)
**Chat Completion**:
- `ChatCompletionRequest` (model, messages, tools, stream)
- `ChatCompletionResponse` (choices[], usage)
- `Message` (role, content, tool_calls, tool_call_id)
- `ToolDefinition`, `ToolFunction`, `ToolCall`, `FunctionCall`
- `Choice`, `Usage`
**Embedding**:
- `EmbeddingRequest` (model, input[])
- `EmbeddingResponse` (data[], usage)
- `EmbeddingData` (embedding[], index)
**Streaming**:
- `StreamChunk` (TextDelta, Tool)
- `ChatCompletionChunk`, `ChunkChoice`, `ChunkDelta`
**JSON Properties**: Uses `[JsonPropertyName]` to match API
**Serialization**: System.Text.Json with source generation (AppJsonContext)
### Searxng Models (Models/Searxng.cs)
**Purpose**: DTOs for SearxNG search results
**Records**:
- `SearxngRoot` with `List<SearxngResult> Results`
- `SearxngResult` with `Title`, `Url`, `Content` (snippet)
**Usage**: Deserialized from SearxNG's JSON response
### JsonContexts
**Location**: `Models/JsonContexts.cs`
**Purpose**: Source-generated JSON serializer context for AOT compatibility
**Pattern**:
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
// ...additional [JsonSerializable] attributes for every serialized type...
internal partial class AppJsonContext : JsonSerializerContext
{
}
```
**Generated**: Partial class compiled by source generator
**Used By**: All `JsonSerializer.Serialize/Deserialize` calls with `AppJsonContext.Default.{Type}`
**Benefits**:
- AOT-compatible (no reflection)
- Faster serialization (compiled delegates)
- Smaller binary (trimming-safe)
## Component Interactions
### Dependencies Graph
```
Program.cs
├── ConfigManager (load/save)
├── OpenRouterClient ──┐
├── SearxngClient ─────┤
├── EmbeddingService ──┤
└── SearchTool ────────┤
OpenQueryApp ◄──────────┘
├── OpenRouterClient (query gen + answer streaming)
├── SearchTool (pipeline)
│ ├── SearxngClient (searches)
│ ├── ArticleService (fetch)
│ ├── ChunkingService (split)
│ ├── EmbeddingService (embeddings)
│ ├── RateLimiter (concurrency)
│ └── StatusReporter (progress via callback)
└── StatusReporter (UI)
```
### Data Flow Between Components
```
OpenQueryOptions
OpenQueryApp
├─ Query Generation
│ └─ OpenRouterClient.CompleteAsync()
│ → List<string> generatedQueries
├─ Search Pipeline
│ └─ SearchTool.ExecuteAsync(originalQuery, generatedQueries, ...)
│ ↓
│ Phase 1: SearxngClient.SearchAsync(query) × N
│ → ConcurrentBag<SearxngResult>
│ → List<SearxngResult> (unique)
│ ↓
│ Phase 2: ArticleService.FetchArticleAsync(url) × M
│ → ChunkingService.ChunkText(article.TextContent)
│ → ConcurrentBag<Chunk> (content, url, title)
│ ↓
│ Phase 3: EmbeddingService.GetEmbeddingsAsync(chunkContents)
│ → (queryEmbedding, chunkEmbeddings)
│ ↓
│ Phase 4: CosineSimilarity + Rank
│ → List<Chunk> topChunks (with Score, Embedding set)
│ ↓
│ Format: context string with [Source N: Title](Url)
│ → return context string
└─ Final Answer
└─ OpenRouterClient.StreamAsync(prompt with context)
→ stream deltas to Console
```
### Interface Contracts
**SearchTool → Progress**:
```csharp
// Invoked as: onProgress?.Invoke("[Fetching article 1/10: example.com]")
Action<string>? onProgress
```
**StatusReporter ← Progress**:
```csharp
// Handler in OpenQueryApp:
(progress) => {
    if (options.Verbose) reporter.WriteLine(progress);
    else reporter.UpdateStatus(parsedShorterMessage);
}
```
**SearchTool → ArticleService**:
```csharp
Article article = await ArticleService.FetchArticleAsync(url);
```
**SearchTool → EmbeddingService**:
```csharp
(float[] queryEmbedding, float[][] chunkEmbeddings) = await ExecuteParallelEmbeddingsAsync(...);
// Also: embeddingService.GetEmbeddingAsync(text), GetEmbeddingsWithRateLimitAsync(...)
```
**SearchTool → ChunkingService**:
```csharp
List<string> chunks = ChunkingService.ChunkText(article.TextContent);
```
**SearchTool → RateLimiter**:
```csharp
await _rateLimiter.ExecuteAsync(async () => await _client.EmbedAsync(...), ct);
```
---
## Next Steps
- [OpenQueryApp](openquery-app.md) - Main orchestrator details
- [SearchTool](search-tool.md) - Pipeline implementation
- [Services](services.md) - All service classes documented
- [Models](models.md) - Complete data model reference

# SearchTool Component
Deep dive into `SearchTool` - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.
## Overview
`SearchTool` is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks by relevance, and returns formatted context for the final AI answer.
## Location
`Tools/SearchTool.cs`
## Class Definition
```csharp
public class SearchTool
{
    private readonly SearxngClient _searxngClient;
    private readonly EmbeddingService _embeddingService;
    private readonly ParallelProcessingOptions _options;

    public static string Name => "search";
    public static string Description => "Search the web for information on a topic";

    public SearchTool(
        SearxngClient searxngClient,
        EmbeddingService embeddingService);

    public Task<string> ExecuteAsync(
        string originalQuery,
        List<string> generatedQueries,
        int maxResults,
        int topChunksLimit,
        Action<string>? onProgress = null,
        bool verbose = true);
}
```
**Dependencies**:
- `SearxngClient` - for web searches
- `EmbeddingService` - for vector generation
- `ParallelProcessingOptions` - concurrency settings (hardcoded new instance)
**Static Properties**:
- `Name` - tool identifier (currently "search")
- `Description` - tool description
## ExecuteAsync Method
**Signature**:
```csharp
public async Task<string> ExecuteAsync(
    string originalQuery,           // User's original question
    List<string> generatedQueries,  // Expanded search queries
    int maxResults,                 // Results per query
    int topChunksLimit,             // Top N chunks to return
    Action<string>? onProgress,     // Progress callback
    bool verbose)                   // Verbose mode flag
**Returns**: `Task<string>` - formatted context with source citations
**Contract**:
- Never returns `null` (returns "No search results found." on zero results)
- Progress callback may be invoked frequently (many phases)
- `verbose` passed to sub-components for their own logging
## The 4-Phase Pipeline
```
ExecuteAsync()
├─ Phase 1: ExecuteParallelSearchesAsync
│ Input: generatedQueries × maxResults
│ Output: List<SearxngResult> (deduplicated)
├─ Phase 2: ExecuteParallelArticleFetchingAsync
│ Input: List<SearxngResult>
│ Output: List<Chunk> (with content, url, title)
├─ Phase 3: ExecuteParallelEmbeddingsAsync
│ Input: originalQuery + List<Chunk>
│ Output: (queryEmbedding, chunkEmbeddings)
│ (also sets Chunk.Embedding for valid chunks)
├─ Phase 4: RankAndSelectTopChunks
│ Input: List<Chunk> + queryEmbedding + chunkEmbeddings
│ Output: List<Chunk> topChunks (with Score set)
└─ Format Context → return string
```
### Phase 1: ExecuteParallelSearchesAsync
**Purpose**: Execute all search queries in parallel, collect and deduplicate results.
**Implementation**:
```csharp
var allResults = new ConcurrentBag<SearxngResult>();
var searchTasks = generatedQueries.Select(async query =>
{
    onProgress?.Invoke($"[Searching web for '{query}'...]");
    try
    {
        var results = await _searxngClient.SearchAsync(query, maxResults);
        foreach (var result in results)
        {
            allResults.Add(result);
        }
    }
    catch (Exception ex)
    {
        if (verbose)
            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
    }
});
await Task.WhenAll(searchTasks);
var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
return uniqueResults;
```
**Details**:
- `ConcurrentBag<SearxngResult>` collects results thread-safely
- `Task.WhenAll` - unbounded parallelism (parallel to `generatedQueries.Count`)
- Each task: calls `_searxngClient.SearchAsync(query, maxResults)`
- Errors caught and logged (verbose only); other queries continue
- `DistinctBy(r => r.Url)` removes duplicates
**Return**: `List<SearxngResult>` (unique URLs only)
**Progress**: `[Searching web for '{query}'...]`
**Potential Issues**:
- Could overwhelm local SearxNG if `generatedQueries` is large (100+)
- SearxNG itself may have its own rate limiting
**Future Enhancement**:
- Add semaphore to limit search concurrency
- Add timeout per search task
- Cache search results (same query across runs)
### Phase 2: ExecuteParallelArticleFetchingAsync
**Purpose**: Fetch each search result URL, extract article content, split into chunks.
**Implementation**:
```csharp
var chunks = new ConcurrentBag<Chunk>();
var completedFetches = 0;
var totalFetches = searchResults.Count;
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10

var fetchTasks = searchResults.Select(async result =>
{
    await semaphore.WaitAsync();
    try
    {
        var current = Interlocked.Increment(ref completedFetches);
        var uri = new Uri(result.Url);
        var domain = uri.Host;
        onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]");
        try
        {
            var article = await ArticleService.FetchArticleAsync(result.Url);
            if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent))
                return;
            var textChunks = ChunkingService.ChunkText(article.TextContent);
            foreach (var chunkText in textChunks)
            {
                chunks.Add(new Chunk(chunkText, result.Url, article.Title));
            }
        }
        catch (Exception ex)
        {
            if (verbose)
                Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}");
        }
    }
    finally
    {
        semaphore.Release();
    }
});
await Task.WhenAll(fetchTasks);
return chunks.ToList();
```
**Details**:
- `SemaphoreSlim` limits concurrency to `MaxConcurrentArticleFetches` (10)
- `Interlocked.Increment` for thread-safe progress counting
- Progress: `[Fetching article X/Y: domain]` (extracts host from URL)
- `ArticleService.FetchArticleAsync` uses SmartReader
- Article must be `IsReadable` and have `TextContent`
- `ChunkingService.ChunkText` splits into ~500-char pieces
- Each chunk becomes a `Chunk(content, url, article.Title)`
- Errors logged (verbose only); failed URLs yield no chunks
**Return**: `List<Chunk>` (potentially many per article)
**Chunk Count Estimate**:
- 15 articles × average 3000 chars/article = 45,000 chars
- With 500-char chunks ≈ 90 chunks
- With natural breaks → maybe 70-80 chunks
**Potential Issues**:
- Some sites block SmartReader (JS-heavy, paywalls)
- Slow article fetches may cause long tail latency
- Large articles create many chunks → memory + embedding cost
**Future Enhancements**:
- Add per-URL timeout
- Filter chunks by length threshold (skip tiny chunks)
- Deduplicate chunks across articles (same content on different sites)
- Cache article fetches by URL
### Phase 3: ExecuteParallelEmbeddingsAsync
**Purpose**: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.
**Implementation**:
```csharp
onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");

// Start query embedding (single) and chunk embeddings (batch) concurrently
var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
    chunkTexts, onProgress);

await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);
var queryEmbedding = await queryEmbeddingTask;
var chunkEmbeddings = await chunkEmbeddingsTask;

// Filter out chunks with empty embeddings
var validChunks = new List<Chunk>();
var validEmbeddings = new List<float[]>();
for (var i = 0; i < chunks.Count; i++)
{
    if (chunkEmbeddings[i].Length > 0)
    {
        validChunks.Add(chunks[i]);
        validEmbeddings.Add(chunkEmbeddings[i]);
    }
}

// Update chunks with embeddings
for (var i = 0; i < validChunks.Count; i++)
{
    validChunks[i].Embedding = validEmbeddings[i];
}

return (queryEmbedding, validEmbeddings.ToArray());
```
**Details**:
- **Query embedding**: Single request for original question (one embedding)
- **Chunk embeddings**: Batch processing of all chunk texts
- Both run concurrently via `Task.WhenAll`
- `_embeddingService.GetEmbeddingsWithRateLimitAsync` uses:
- Batch size: 300 (default)
- Max concurrent batches: 4 (default)
- Polly retry (3 attempts, exponential backoff)
- `RateLimiter` (semaphore) for API concurrency
- Failed batches return empty `float[]` (length 0)
- Filters out failed chunks (won't be ranked)
- `validChunks[i].Embedding = validEmbeddings[i]` attaches embedding to chunk
**Return**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` where:
- `chunkEmbeddings` length = `validChunks.Count` (filtered)
- Order matches `validChunks` order (since we filtered parallel arrays)
**Progress**: Interleaved from embedding service's own progress callbacks (batch X/Y)
**Potential Issues**:
- `GetEmbeddingsWithRateLimitAsync` writes `results[batchIndex] = ...` from parallel tasks. This is safe: each task writes a distinct array index, so the writes never overlap.
- The filtering loop assumes `chunkEmbeddings` has the same count as `chunks`. `GetEmbeddingsWithRateLimitAsync` returns `results.SelectMany(r => r).ToArray()`, which preserves the input count (failed batches contribute an empty array per text), so no out-of-range indexing occurs.
**Memory Consideration**:
- `chunkTexts` list holds all chunk strings (may be large, but still in memory)
- `chunkEmbeddings` holds all float arrays (600KB for 100 chunks)
- Total: modest (~few MB)
**Future Enhancements**:
- Stream embeddings? (No benefit, need all for ranking)
- Cache embeddings by content hash (cross-run)
- Support different embedding model per query
### Phase 4: RankAndSelectTopChunks
**Purpose**: Score chunks by semantic relevance to query, sort, and select top N.
**Implementation**:
```csharp
var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList();
foreach (var chunk in chunksWithEmbeddings)
{
    chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
}

var topChunks = chunksWithEmbeddings
    .OrderByDescending(c => c.Score)
    .Take(topChunksLimit)
    .ToList();

return topChunks;
```
**Details**:
- Filters to chunks that have embeddings (successful phase 3)
- For each: `Score = CosineSimilarity(queryEmbedding, chunkEmbedding)`
- Uses `TensorPrimitives.CosineSimilarity` (SIMD-accelerated)
- Returns float typically 0-1 (higher = more relevant)
- `OrderByDescending` - highest scores first
- `Take(topChunksLimit)` - select top N (from `--chunks` option)
- Returns `List<Chunk>` (now with `Score` set)
**Return**: Top N chunks ready for context formatting
**Complexity**:
- O(n) for scoring (where n = valid chunks, typically 50-100)
- O(n log n) for sorting (fast for n=100)
- Negligible CPU time
**Edge Cases**:
- If `topChunksLimit` > `chunksWithEmbeddings.Count`, returns all (no padding)
- If all embeddings failed, returns empty list
- Should handle `topChunksLimit == 0` (returns empty)
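Tying scoring and selection together, the whole phase is score → sort → take. A self-contained toy run over hypothetical 3-dimensional vectors (real embeddings have ~1536 dimensions):

```csharp
using System;
using System.Linq;

float Cosine(float[] a, float[] b)
{
    float dot = 0, ma = 0, mb = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        ma  += a[i] * a[i];
        mb  += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(ma) * MathF.Sqrt(mb));
}

var queryEmbedding = new float[] { 1f, 0f, 0f };
var chunks = new (string Content, float[] Embedding)[]
{
    ("chunk about the query topic", new float[] { 0.9f, 0.1f, 0.0f }),
    ("unrelated chunk",             new float[] { 0.1f, 0.9f, 0.0f }),
    ("partially related chunk",     new float[] { 0.5f, 0.5f, 0.0f }),
};

var topChunks = chunks
    .Select(c => (c.Content, Score: Cosine(queryEmbedding, c.Embedding)))
    .OrderByDescending(c => c.Score)
    .Take(2)
    .ToList();

Console.WriteLine(topChunks[0].Content); // chunk about the query topic
```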
### Context Formatting (After Phase 4)
**Location**: In `ExecuteAsync`, after ranking:
```csharp
var context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
return context;
```
**Format**:
```
[Source 1: Article Title](https://example.com/article)
Chunk content text...
[Source 2: Another Title](https://example.com/another)
Chunk content text...
[Source 3: Third Title](https://example.com/third)
Chunk content text...
```
**Features**:
- Each source numbered 1, 2, 3... (matches order of topChunks = descending relevance)
- Title or "Unknown" if null
- Title is markdown link to original URL
- Chunk content as plain text (may contain its own formatting)
- Double newline between sources
**Rationale**:
- Markdown links allow copy-pasting to browsers
- Numbers allow LLM to cite `[Source 1]` in answer
- Original title helps user recognize source
**Potential Issues**:
- LLM might misinterpret "Source 1" as literal citation required
- If chunks contain markdown, may conflict (no escaping)
- Some titles may have markdown special chars (unlikely but possible)
**Alternative**: Could use XML-style tags or more robust citation format.
## Error Handling & Edge Cases
### Empty Results Handling
At end of `ExecuteAsync`:
```csharp
if (searchResults.Count == 0)
    return "No search results found.";

if (chunks.Count == 0)
    return "Found search results but could not extract readable content.";
```
These messages appear in final answer (LLM will respond to these contexts).
### Partial Failures
- Some search queries fail → proceed with others
- Some articles fail to fetch → continue
- Some embedding batches fail → those chunks filtered out
- Ranking proceeds with whatever valid embeddings exist
### Verbose vs Compact Progress
`verbose` parameter affects what's passed to phases:
- **Article fetching**: errors only shown if `verbose`
- **Embeddings**: always shows batch progress via `onProgress` (from EmbeddingService)
- **Searches**: errors logged to Console only when `verbose` (warnings go to Console directly, not through the callback)
### Progress Callback Pattern
`onProgress` is invoked at major milestones:
- Searching: `[Searching web for '{query}'...]`
- Article fetch: `[Fetching article X/Y: domain]`
- Embeddings: `[Generating embeddings: batch X/Y]`
- Final: `[Found top X most relevant chunks overall. Generating answer...]`
Each phase may invoke many times (e.g., embedding batches). `StatusReporter` handles these appropriately.
## Performance Characteristics
### Time Estimate per Phase (for typical 3 queries, 5 results each, ~15 articles):
| Phase | Time | Dominated By |
|-------|------|--------------|
| Searches | 3-8s | Network latency to SearxNG |
| Article Fetching | 5-15s | Network + SmartReader CPU |
| Embeddings | 2-4s | OpenRouter API latency (4 concurrent batches) |
| Ranking | <0.1s | CPU (O(n log n) sort, n~100) |
| **Total Pipeline** | **10-30s** | Articles + Searches |
### Concurrency Limits Effect
**Article Fetching** (`MaxConcurrentArticleFetches` = 10):
- 15 articles → 2 waves (10 then 5)
- If each takes 2s → ~4s total (vs 30s sequential)
**Embedding Batching** (`MaxConcurrentEmbeddingRequests` = 4, `EmbeddingBatchSize` = 300):
- 80 chunks fit in a single batch of up to 300, so no batch-level parallelism is needed
- Parallelism (up to 4 concurrent embedding requests) only kicks in once the chunk count exceeds 300 and the texts split into multiple batches
### Memory Usage
- `searchResults` (15 items) → ~30KB
- `chunks` (80 items × 500 chars) → ~40KB text + embeddings ~400KB (80 × 1536 × 4)
- Total ≈ 500KB excluding temporary HTTP buffers
## Design Decisions
### Why Use ConcurrentBag for Results/Chunks?
Thread-safe collections allow parallel tasks to add items without explicit locks. `ConcurrentBag` does not guarantee enumeration order, so the order after `ToList()` is nondeterministic and `DistinctBy` keeps whichever occurrence it enumerates first. This is acceptable here because order is irrelevant: ranking is semantic. If insertion order mattered, a `ConcurrentQueue` (or post-hoc sorting by source) would be needed instead.
### Why Not Use Parallel.ForEach for Article Fetching?
We use `Task.WhenAll` over `Select`-generated tasks plus a semaphore. `Parallel.ForEachAsync` (available since .NET 6) would also work; the explicit semaphore keeps the concurrency limit visible and the pattern consistent with Phase 1's `Task.WhenAll` approach.
### Why Separate Query Embedding from Chunk Embeddings?
`GetEmbeddingAsync` is called directly (not batched) because there's only one query. Could be batched with chunks but:
- Query is small (single string)
- Batch API has overhead (request structure)
- Separate call allows independent completion (no need to wait for chunks to start query embedding)
### Why Two Different Embedding Methods?
`EmbeddingService` has:
- `GetEmbeddingsWithRateLimitAsync` (used in SearchTool)
- `GetEmbeddingsAsync` (similar but different implementation)
Probably legacy/refactor artifact. Could consolidate.
### Why Not Deduplicate URLs Earlier?
Deduplication happens after search aggregation. Could also deduplicate within each search result (SearxNG might already dedupe across engines). But global dedupe is necessary.
### Why Not Early Filtering (e.g., by domain, length)?
Possibly could improve quality:
- Filter by domain reputation
- Filter articles too short (<200 chars) or too long (>50KB)
- Not implemented (keep simple)
## Testing Considerations
**Unit Testability**: `SearchTool` is fairly testable with mocks:
- Mock `SearxngClient` to return predetermined results
- Mock `ArticleService` via `EmbeddingService` (or mock that too)
- Verify progress callback invocations
- Verify final context format
**Integration Testing**:
- End-to-end with real/mocked external services
- Need test SearxNG instance and test OpenRouter key (or mock responses)
**Performance Testing**:
- Benchmark with different concurrency settings
- Profile memory for large result sets (1000+ articles)
- Measure embedding API latency impact
## Known Issues
### Chunk Text Selection in ExecuteParallelEmbeddingsAsync (Not a Bug)
The actual source of `ExecuteParallelEmbeddingsAsync` correctly passes chunk *content* (not the still-unset `Embedding` property) to the embedding service:
```csharp
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
    chunkTexts, onProgress);
```
Earlier drafts of this documentation mistakenly showed `chunks.Select(c => c.Embedding)`; the code above is what ships.
### Parallel Array Writes in GetEmbeddingsWithRateLimitAsync
```csharp
results[batchIndex] = batchResults;
```
This writes to an array index from multiple parallel tasks. It is safe: each task writes a distinct index, so the writes never overlap, and reference assignments in .NET are atomic.
### Progress Callback May Overwhelm
If invoked synchronously from many parallel tasks, could saturate the channel. `Channel.TryWrite` will return false if buffer full; we ignore return value. Could drop messages under heavy load. Acceptable for CLI UI (some messages may be lost but overall progress visible).
## Related Components
- **[OpenQueryApp](openquery-app.md)** - calls this
- **[SearxngClient](services.md#searxngclient)** - phase 1
- **[ArticleService](services.md#articleservice)** - phase 2a
- **[ChunkingService](services.md#chunkingservice)** - phase 2b
- **[EmbeddingService](services.md#embeddingservice)** - phase 3
- **[Ranking](services.md#embeddingservice)** - cosine similarity
---
## Next Steps
- [Services Overview](services.md) - See supporting services
- [CLI Reference](../api/cli.md) - How users trigger this pipeline
- [Performance](../performance.md) - Optimize pipeline settings

# Services Overview
Comprehensive reference for all service classes in OpenQuery.
## 📋 Table of Contents
1. [Service Catalog](#service-catalog)
2. [Client Services](#client-services)
3. [Processing Services](#processing-services)
4. [Infrastructure Services](#infrastructure-services)
5. [Service Interactions](#service-interactions)
## Service Catalog
OpenQuery's services are organized into three categories:
| Category | Services | Purpose |
|-----------|----------|---------|
| **Clients** | `OpenRouterClient`, `SearxngClient` | External API communication |
| **Processors** | `EmbeddingService`, `ChunkingService`, `ArticleService` | Data transformation & extraction |
| **Infrastructure** | `RateLimiter`, `StatusReporter` | Cross-cutting concerns |
All services are **stateless** (except for internal configuration) and can be safely reused across multiple operations.
---
## Client Services
### OpenRouterClient
**Location**: `Services/OpenRouterClient.cs`
**Purpose**: HTTP client for OpenRouter AI APIs (chat completions & embeddings)
#### API Endpoints
| Method | Endpoint | Purpose |
|--------|----------|---------|
| POST | `/chat/completions` | Chat completion (streaming or non-streaming) |
| POST | `/embeddings` | Embedding generation for text inputs |
#### Authentication
```
Authorization: Bearer {apiKey}
Accept: application/json
```
#### Public Methods
##### `StreamAsync(ChatCompletionRequest request, CancellationToken cancellationToken)`
- **Returns**: `IAsyncEnumerable<StreamChunk>`
- **Behavior**: Sets `request.Stream = true`, posts, reads Server-Sent Events stream
- **Use Case**: Final answer streaming, real-time responses
- **Stream Format**: SSE lines `data: {json}`; yields `TextDelta` or `ToolCall`
##### `CompleteAsync(ChatCompletionRequest request)`
- **Returns**: `Task<ChatCompletionResponse>`
- **Behavior**: Sets `request.Stream = false`, posts, returns full response
- **Use Case**: Query generation (non-streaming)
##### `EmbedAsync(string model, List<string> inputs)`
- **Returns**: `Task<float[][]>`
- **Behavior**: POST `/embeddings`, returns array of vectors (ordered by input index)
- **Use Case**: Batch embedding generation
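For orientation, this is the wire shape `EmbedAsync` presumably exchanges — the payload follows the OpenAI-compatible embeddings contract, and the property names here are assumptions, not confirmed from the source:

```csharp
// Hypothetical wire shape for EmbedAsync (names assumed, OpenAI-compatible contract):
var payload = new
{
    model = "openai/text-embedding-3-small",
    input = new List<string> { "first text", "second text" }
};
// POST {baseUrl}/embeddings with the payload above; the response carries one
// vector per input, ordered by input index:
// { "data": [ { "index": 0, "embedding": [0.0123, -0.0456, ...] }, ... ] }
```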
##### `HttpClient`
- **Property**: Internal `_httpClient` (created per instance)
- **Note**: Could use `IHttpClientFactory` for pooling (not needed for CLI)
#### Error Handling
- `EnsureSuccessStatusCode()` throws `HttpRequestException` on 4xx/5xx
- No retry logic (handled by `EmbeddingService`)
#### Configuration
```csharp
public OpenRouterClient(string apiKey)
{
_apiKey = apiKey;
_httpClient = new HttpClient();
_httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
_httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
}
```
#### Example Usage
```csharp
var client = new OpenRouterClient("sk-or-...");
var request = new ChatCompletionRequest("model", new List<Message> { ... });
await foreach (var chunk in client.StreamAsync(request))
{
Console.Write(chunk.TextDelta);
}
```
---
### SearxngClient
**Location**: `Services/SearxngClient.cs`
**Purpose**: HTTP client for SearxNG metasearch engine
#### API Endpoint
```
GET /search?q={query}&format=json
```
#### Constructor
```csharp
public SearxngClient(string baseUrl) // e.g., "http://localhost:8002"
```
- `baseUrl` trimmed of trailing `/`
#### Public Methods
##### `SearchAsync(string query, int limit = 10)`
- **Returns**: `Task<List<SearxngResult>>`
- **Behavior**: GET request, deserialize JSON, take up to `limit` results
- **On Failure**: Returns empty `List<SearxngResult>` (no exception)
#### Error Handling
- The code never calls `response.EnsureSuccessStatusCode()`, so HTTP error statuses do not throw
- If invalid JSON or missing `Results`, returns empty list
- Failures are **tolerated** - individual search queries may fail without aborting whole operation
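A minimal sketch of the tolerant behavior described above — the DTO shape, serializer call, and field names are assumptions, not the project's exact code:

```csharp
// Sketch only: any failure (network, bad JSON, missing Results) yields an empty list.
public async Task<List<SearxngResult>> SearchAsync(string query, int limit = 10)
{
    try
    {
        var url = $"{_baseUrl}/search?q={Uri.EscapeDataString(query)}&format=json";
        var json = await _httpClient.GetStringAsync(url);
        var parsed = JsonSerializer.Deserialize<SearxngResponse>(json);
        return parsed?.Results?.Take(limit).ToList() ?? new List<SearxngResult>();
    }
    catch
    {
        return new List<SearxngResult>(); // tolerated: one failed query must not abort the run
    }
}
```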
#### Example Searxng Response
```json
{
"results": [
{
"title": "Quantum Entanglement - Wikipedia",
"url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
"content": "Quantum entanglement is a physical phenomenon..."
},
...
]
}
```
---
## Processing Services
### EmbeddingService
**Location**: `Services/EmbeddingService.cs`
**Purpose**: Generate embeddings with batching, rate limiting, and retry logic
#### Configuration
**Embedding Model**: `openai/text-embedding-3-small` (default, configurable via constructor)
**ParallelProcessingOptions** (hardcoded defaults):
```csharp
public class ParallelProcessingOptions
{
public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
public int EmbeddingBatchSize { get; set; } = 300;
}
```
#### Public Methods
##### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- **Returns**: `Task<float[][]>`
- **Behavior**:
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallel executes batches (max `MaxConcurrentEmbeddingRequests` concurrent)
- Each batch: rate-limited, retry-wrapped `client.EmbedAsync(model, batch)`
- Reassembles in original order
- Failed batches → empty `float[]` for each text
- **Progress**: Invokes `onProgress` for each batch: `"[Generating embeddings: batch X/Y]"`
- **Thread-Safe**: Uses lock for collecting results
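The batching scheme above can be sketched as follows (simplified: retry and failed-batch handling are omitted, and helper names are assumed):

```csharp
// Split into batches, embed up to N batches concurrently, keep input order.
var batches = texts.Chunk(options.EmbeddingBatchSize).ToArray();
var results = new float[batches.Length][][];

await Parallel.ForAsync(0, batches.Length,
    new ParallelOptions { MaxDegreeOfParallelism = options.MaxConcurrentEmbeddingRequests },
    async (i, ct) =>
    {
        onProgress?.Invoke($"[Generating embeddings: batch {i + 1}/{batches.Length}]");
        results[i] = await _rateLimiter.ExecuteAsync(
            () => _client.EmbedAsync(_model, batches[i].ToList()), ct);
    });

// Each task wrote to a distinct index, so flattening restores the original order.
var flat = results.SelectMany(batch => batch).ToArray();
```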
##### `GetEmbeddingAsync(string text, CancellationToken)`
- **Returns**: `Task<float[]>`
- **Behavior**: Single embedding with rate limiting and retry
- **Use Case**: Query embedding
##### `static float CosineSimilarity(float[] vector1, float[] vector2)`
- **Returns**: `float` between -1 and 1 (typically 0-1 for normalized embeddings)
- **Implementation**: Single line calling the SIMD-accelerated `System.Numerics.Tensors.TensorPrimitives.CosineSimilarity`
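Ranking then reduces to scoring each chunk against the query embedding — a sketch, with `topK` and the variable names assumed:

```csharp
// Score every chunk against the query, keep the best topK (hypothetical names).
var ranked = chunks
    .Select((text, i) => (Text: text,
        Score: EmbeddingService.CosineSimilarity(queryEmbedding, chunkEmbeddings[i])))
    .OrderByDescending(x => x.Score)
    .Take(topK)
    .ToList();
```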
---
### ArticleService
**Location**: `Services/ArticleService.cs`
**Purpose**: Extract clean article content from web URLs
#### Public Methods
##### `FetchArticleAsync(string url)`
- **Returns**: `Task<Article>`
- **Behavior**: Delegates to `SmartReader.ParseArticleAsync(url)`
- **Result**: `Article` with `Title`, `TextContent`, `IsReadable`, and metadata
#### Errors
- Propagates exceptions (SmartReader may throw on network failures, malformed HTML)
- `SearchTool` catches and logs
#### SmartReader Notes
- Open-source article extraction library (bundled via NuGet)
- Uses Readability algorithm (similar to Firefox Reader View)
- Removes ads, navigation, boilerplate
- `IsReadable` indicates quality (e.g., not a 404 page, not too short)
---
### ChunkingService
**Location**: `Services/ChunkingService.cs`
**Purpose**: Split text into 500-character chunks at natural boundaries
#### Public Methods
##### `ChunkText(string text)`
- **Returns**: `List<string>`
- **Algorithm**:
- Constant `MAX_CHUNK_SIZE = 500`
- While remaining text:
- Take up to 500 chars
- If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']`
- Trim, add if non-empty
- Advance start
- Returns all chunks
#### Characteristics
- Static class (no instances)
- Pure function (no side effects)
- Zero dependencies
- Handles edge cases (empty text, short text, text without breaks)
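The algorithm above renders roughly to the following — a sketch consistent with the description, not the project's exact code:

```csharp
public static List<string> ChunkText(string text)
{
    const int MAX_CHUNK_SIZE = 500;
    char[] breakChars = { ' ', '\n', '\r', '.', '!' };
    var chunks = new List<string>();
    var start = 0;
    while (start < text.Length)
    {
        var length = Math.Min(MAX_CHUNK_SIZE, text.Length - start);
        if (start + length < text.Length) // not at end: backtrack to a natural break
        {
            var lastBreak = text.LastIndexOfAny(breakChars, start + length - 1, length);
            if (lastBreak > start) length = lastBreak - start + 1;
        }
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0) chunks.Add(chunk);
        start += length;
    }
    return chunks;
}
```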
---
## Infrastructure Services
### RateLimiter
**Location**: `Services/RateLimiter.cs`
**Purpose**: Limit concurrent operations using semaphore
#### Constructor
```csharp
public RateLimiter(int maxConcurrentRequests)
```
Creates `SemaphoreSlim` with `maxConcurrentRequests`
#### Public Methods
##### `ExecuteAsync<T>(Func<Task<T>> action, CancellationToken)`
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken cancellationToken = default)
{
await _semaphore.WaitAsync(cancellationToken);
try
{
return await action();
}
finally
{
_semaphore.Release();
}
}
```
- Waits for semaphore slot
- Executes `action` (typically an API call)
- Releases semaphore (even if exception)
- Returns result from `action`
##### `ExecuteAsync(Func<Task> action, CancellationToken)`
- Non-generic version (for void-returning actions)
#### Disposal
```csharp
public async ValueTask DisposeAsync()
{
_semaphore.Dispose();
}
```
Implements `IAsyncDisposable` for async cleanup
#### Usage Pattern
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
{
return await SomeApiCall();
}, cancellationToken);
```
#### Where Used
- `EmbeddingService`: Limits concurrent embedding batch requests (default 4)
---
### StatusReporter
**Location**: `Services/StatusReporter.cs`
**Purpose**: Real-time progress display with spinner (compact) or verbose lines
#### Constructor
```csharp
public StatusReporter(bool verbose)
```
- `verbose = true`: all progress via `WriteLine()` (no spinner)
- `verbose = false`: spinner with latest status
#### Architecture
**Components**:
- `Channel<string> _statusChannel` - producer-consumer queue
- `Task _statusProcessor` - background task reading from channel
- `CancellationTokenSource _spinnerCts` - spinner task cancellation
- `Task _spinnerTask` - spinner animation task
- `char[] _spinnerChars` - Braille spinner pattern
**Spinner Animation**:
- Runs at 10 FPS (100ms interval)
- Cycles through `['⠋','⠙','⠹','⠸','⠼','⠴','⠦','⠧','⠇','⠏']`
- Displays: `⠋ Fetching articles...`
- Updates in place using ANSI: `\r\x1b[K` (carriage return + erase line)
#### Public Methods
##### `UpdateStatus(string message)`
- Fire-and-forget: writes to channel via `TryWrite` (non-blocking)
- If channel full, message dropped (acceptable loss for UI)
##### `WriteLine(string text)`
- Stops spinner temporarily
- Clears current status line
- Writes `text` with newline
- In verbose mode: just `Console.WriteLine(text)`
##### `ClearStatus()`
- In compact mode: `Console.Write("\r\x1b[K")` (erase line)
- In verbose: no-op
- Sets `_currentMessage = null`
##### `StartSpinner()` / `StopSpinner()`
- Manual control; in practice `StartSpinner` is invoked from the constructor and `StopSpinner` from `Dispose`
##### `Dispose()`
- Completes channel writer
- Awaits `_statusProcessor` completion
- Calls `StopSpinner()`
#### Background Processing
**Status Processor**:
```csharp
private async Task ProcessStatusUpdatesAsync()
{
await foreach (var message in _statusChannel.Reader.ReadAllAsync())
{
if (_verbose)
{
Console.WriteLine(message);
continue;
}
Console.Write("\r\x1b[K"); // Clear line
Console.Write($"{_spinnerChars[0]} {message}"); // Static spinner
_currentMessage = message;
}
}
```
**Spinner Task**:
```csharp
_spinnerTask = Task.Run(async () =>
{
while (_spinnerCts is { Token.IsCancellationRequested: false })
{
if (_currentMessage != null)
{
Console.Write("\r\x1b[K");
            var charIndex = index++ % _spinnerChars.Length;
            Console.Write($"{_spinnerChars[charIndex]} {_currentMessage}");
}
await Task.Delay(100, _spinnerCts.Token);
}
});
```
#### Thread Safety
- `UpdateStatus` (producer) writes to channel
- `ProcessStatusUpdatesAsync` (consumer) reads from channel
- `_spinnerTask` runs concurrently
- All UI writes happen in consumer/spinner task context (single-threaded UI)
#### Design Notes
- Could be simplified: just use `Console.CursorLeft` for spinner, no channel
- Channel allows random `UpdateStatus` calls from any thread without blocking
- The Braille spinner requires a terminal that supports Unicode (most modern terminals do)
---
## Service Interactions
### Dependency Graph
```
OpenQueryApp
├── OpenRouterClient ← (used for query gen + final answer)
└── SearchTool
├── SearxngClient
├── ArticleService (uses SmartReader)
├── ChunkingService (static)
├── EmbeddingService
│ └── OpenRouterClient (different instance)
│ └── RateLimiter
└── ParallelProcessingOptions (config)
```
### Service Lifetimes
All services are **transient** (new instance per query execution):
- `OpenRouterClient` → 1 instance for query gen + answer
- `SearxngClient` → 1 instance for all searches
- `EmbeddingService` → 1 instance with its own `OpenRouterClient` and `RateLimiter`
- `SearchTool` → 1 instance per query (constructed in `Program.cs`)
No singleton or static state (except static utility classes like `ChunkingService`).
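Per-query wiring in `Program.cs` presumably looks something like this (constructor signatures are assumed for illustration):

```csharp
// Hypothetical composition root: every service is constructed fresh per query.
var openRouter = new OpenRouterClient(apiKey);            // query gen + final answer
var searxng = new SearxngClient("http://localhost:8002");
var embeddings = new EmbeddingService(new OpenRouterClient(apiKey)); // own client + RateLimiter
var searchTool = new SearchTool(searxng, embeddings, new ParallelProcessingOptions());
var app = new OpenQueryApp(openRouter, searchTool);
```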
### Data Flow Through Services
```
OpenQueryApp
├─ OpenRouterClient.CompleteAsync() → query generation
│ Messages → JSON → HTTP request → response → JSON → Messages
└─ SearchTool.ExecuteAsync()
├─ SearxngClient.SearchAsync() × N
│ query → URL encode → GET → JSON → SearxngResult[]
├─ ArticleService.FetchArticleAsync() × M
│ URL → HTTP GET → SmartReader → Article
├─ ChunkingService.ChunkText() × M
│ Article.TextContent → List<string> chunks
├─ EmbeddingService.GetEmbeddingAsync(query) + GetEmbeddingsAsync(chunks[])
│ texts → batches → rate-limited HTTP POST → JSON → float[][]
├─ CosineSimilarity(queryEmbedding, chunkEmbedding) × M
│ Vectors → dot product → magnitude → score
└─ return context string (formatted chunks)
```
---
## Next Steps
- **[OpenQueryApp](../components/openquery-app.md)** - Orchestrates services
- **[SearchTool](../components/search-tool.md)** - Coordinates pipeline
- **[Models](../components/models.md)** - Data structures passed between services
- **[API Reference](../../api/cli.md)** - CLI that uses these services
---
**Service Design Principles**:
- Single Responsibility: Each service does one thing well
- Stateless: No instance state beyond constructor args
- Composable: Services receive their dependencies through constructors, so they can be wired together freely
- Testable: Can mock dependencies for unit testing