docs: add comprehensive documentation with README and detailed guides

- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
This commit is contained in:
OpenQuery Documentation
2026-03-19 10:01:58 +01:00
parent b28d8998f7
commit 65ca2401ae
16 changed files with 7073 additions and 0 deletions

# SearchTool Component
Deep dive into `SearchTool` - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.
## Overview
`SearchTool` is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks chunks by relevance, and returns formatted context for the final AI answer.
## Location
`Tools/SearchTool.cs`
## Class Definition
```csharp
public class SearchTool
{
private readonly SearxngClient _searxngClient;
private readonly EmbeddingService _embeddingService;
private readonly ParallelProcessingOptions _options;
public static string Name => "search";
public static string Description => "Search the web for information on a topic";
public SearchTool(
SearxngClient searxngClient,
EmbeddingService embeddingService);
public Task<string> ExecuteAsync(
string originalQuery,
List<string> generatedQueries,
int maxResults,
int topChunksLimit,
Action<string>? onProgress = null,
bool verbose = true);
}
```
**Dependencies**:
- `SearxngClient` - for web searches
- `EmbeddingService` - for vector generation
- `ParallelProcessingOptions` - concurrency settings (hardcoded new instance)
**Static Properties**:
- `Name` - tool identifier (currently "search")
- `Description` - tool description
## ExecuteAsync Method
**Signature**:
```csharp
public async Task<string> ExecuteAsync(
string originalQuery, // User's original question
List<string> generatedQueries, // Expanded search queries
int maxResults, // Results per query
int topChunksLimit, // Top N chunks to return
Action<string>? onProgress, // Progress callback
bool verbose) // Verbose mode flag
```
**Returns**: `Task<string>` - formatted context with source citations
**Contract**:
- Never returns `null` (returns "No search results found." on zero results)
- Progress callback may be invoked frequently (many phases)
- `verbose` passed to sub-components for their own logging
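A hypothetical call site might look like this (the `searxngClient`/`embeddingService` variables and the query strings are illustrative, not from the source):

```csharp
// Sketch of a caller; assumes searxngClient and embeddingService
// were constructed and configured elsewhere.
var tool = new SearchTool(searxngClient, embeddingService);

string context = await tool.ExecuteAsync(
    originalQuery: "How does HTTP/3 differ from HTTP/2?",
    generatedQueries: new List<string>
    {
        "HTTP/3 vs HTTP/2 differences",
        "QUIC transport protocol benefits"
    },
    maxResults: 5,
    topChunksLimit: 10,
    onProgress: msg => Console.WriteLine(msg),
    verbose: true);

// context is never null; with zero results it is the literal string
// "No search results found."
```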
## The 4-Phase Pipeline
```
ExecuteAsync()
├─ Phase 1: ExecuteParallelSearchesAsync
│ Input: generatedQueries × maxResults
│ Output: List<SearxngResult> (deduplicated)
├─ Phase 2: ExecuteParallelArticleFetchingAsync
│ Input: List<SearxngResult>
│ Output: List<Chunk> (with content, url, title)
├─ Phase 3: ExecuteParallelEmbeddingsAsync
│ Input: originalQuery + List<Chunk>
│ Output: (queryEmbedding, chunkEmbeddings)
│ (also sets Chunk.Embedding for valid chunks)
├─ Phase 4: RankAndSelectTopChunks
│ Input: List<Chunk> + queryEmbedding + chunkEmbeddings
│ Output: List<Chunk> topChunks (with Score set)
└─ Format Context → return string
```
### Phase 1: ExecuteParallelSearchesAsync
**Purpose**: Execute all search queries in parallel, collect and deduplicate results.
**Implementation**:
```csharp
var allResults = new ConcurrentBag<SearxngResult>();
var searchTasks = generatedQueries.Select(async query =>
{
onProgress?.Invoke($"[Searching web for '{query}'...]");
try
{
var results = await _searxngClient.SearchAsync(query, maxResults);
foreach (var result in results)
{
allResults.Add(result);
}
}
catch (Exception ex)
{
if (verbose)
Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
}
});
await Task.WhenAll(searchTasks);
var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
return uniqueResults;
```
**Details**:
- `ConcurrentBag<SearxngResult>` collects results thread-safely
- `Task.WhenAll` - unbounded parallelism (one task per query, so up to `generatedQueries.Count` concurrent searches)
- Each task: calls `_searxngClient.SearchAsync(query, maxResults)`
- Errors caught and logged (verbose only); other queries continue
- `DistinctBy(r => r.Url)` removes duplicates
**Return**: `List<SearxngResult>` (unique URLs only)
**Progress**: `[Searching web for '{query}'...]`
**Potential Issues**:
- Could overwhelm local SearxNG if `generatedQueries` is large (100+)
- SearxNG itself may have its own rate limiting
**Future Enhancement**:
- Add semaphore to limit search concurrency
- Add timeout per search task
- Cache search results (same query across runs)
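The semaphore enhancement could reuse the pattern Phase 2 already uses; a sketch, where `maxConcurrentSearches` is a hypothetical new option that does not exist in the current code:

```csharp
var searchSemaphore = new SemaphoreSlim(maxConcurrentSearches); // hypothetical, e.g. 5
var searchTasks = generatedQueries.Select(async query =>
{
    await searchSemaphore.WaitAsync();
    try
    {
        onProgress?.Invoke($"[Searching web for '{query}'...]");
        var results = await _searxngClient.SearchAsync(query, maxResults);
        foreach (var result in results)
            allResults.Add(result);
    }
    catch (Exception ex)
    {
        if (verbose)
            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
    }
    finally
    {
        searchSemaphore.Release();
    }
});
await Task.WhenAll(searchTasks);
```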
### Phase 2: ExecuteParallelArticleFetchingAsync
**Purpose**: Fetch each search result URL, extract article content, split into chunks.
**Implementation**:
```csharp
var chunks = new ConcurrentBag<Chunk>();
var completedFetches = 0;
var totalFetches = searchResults.Count;
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
var fetchTasks = searchResults.Select(async result =>
{
await semaphore.WaitAsync();
try
{
var current = Interlocked.Increment(ref completedFetches);
var uri = new Uri(result.Url);
var domain = uri.Host;
onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]");
try
{
var article = await ArticleService.FetchArticleAsync(result.Url);
if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent))
return;
var textChunks = ChunkingService.ChunkText(article.TextContent);
foreach (var chunkText in textChunks)
{
chunks.Add(new Chunk(chunkText, result.Url, article.Title));
}
}
catch (Exception ex)
{
if (verbose)
Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}");
}
}
finally
{
semaphore.Release();
}
});
await Task.WhenAll(fetchTasks);
return chunks.ToList();
```
**Details**:
- `SemaphoreSlim` limits concurrency to `MaxConcurrentArticleFetches` (10)
- `Interlocked.Increment` for thread-safe progress counting
- Progress: `[Fetching article X/Y: domain]` (extracts host from URL)
- `ArticleService.FetchArticleAsync` uses SmartReader
- Article must be `IsReadable` and have `TextContent`
- `ChunkingService.ChunkText` splits into ~500-char pieces
- Each chunk becomes a `Chunk(content, url, article.Title)`
- Errors logged (verbose only); failed URLs yield no chunks
**Return**: `List<Chunk>` (potentially many per article)
**Chunk Count Estimate**:
- 15 articles × average 3000 chars/article = 45,000 chars
- With 500-char chunks ≈ 90 chunks
- With natural breaks → maybe 70-80 chunks
**Potential Issues**:
- Some sites block SmartReader (JS-heavy, paywalls)
- Slow article fetches may cause long tail latency
- Large articles create many chunks → memory + embedding cost
**Future Enhancements**:
- Add per-URL timeout
- Filter chunks by length threshold (skip tiny chunks)
- Deduplicate chunks across articles (same content on different sites)
- Cache article fetches by URL
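The per-URL timeout could be sketched with `Task.WaitAsync` (.NET 6+) without changing `ArticleService` at all; the 10-second limit is an arbitrary illustrative value:

```csharp
// Inside the fetch task: cap each article fetch at 10 seconds.
// Note: the underlying HTTP request is abandoned, not cancelled.
try
{
    var article = await ArticleService.FetchArticleAsync(result.Url)
        .WaitAsync(TimeSpan.FromSeconds(10));
    // ... chunk the article as before ...
}
catch (TimeoutException)
{
    if (verbose)
        Console.WriteLine($"Warning: Timed out fetching {result.Url}");
}
```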
### Phase 3: ExecuteParallelEmbeddingsAsync
**Purpose**: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.
**Implementation**:
```csharp
onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");
// Start query embedding (single) and chunk embeddings (batch) concurrently
var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
chunkTexts, onProgress);
await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);
var queryEmbedding = await queryEmbeddingTask;
var chunkEmbeddings = await chunkEmbeddingsTask;
// Filter out chunks with empty embeddings
var validChunks = new List<Chunk>();
var validEmbeddings = new List<float[]>();
for (var i = 0; i < chunks.Count; i++)
{
if (chunkEmbeddings[i].Length > 0)
{
validChunks.Add(chunks[i]);
validEmbeddings.Add(chunkEmbeddings[i]);
}
}
// Update chunks with embeddings
for (var i = 0; i < validChunks.Count; i++)
{
validChunks[i].Embedding = validEmbeddings[i];
}
return (queryEmbedding, validEmbeddings.ToArray());
```
**Details**:
- **Query embedding**: Single request for original question (one embedding)
- **Chunk embeddings**: Batch processing of all chunk texts
- Both run concurrently via `Task.WhenAll`
- `_embeddingService.GetEmbeddingsWithRateLimitAsync` uses:
- Batch size: 300 (default)
- Max concurrent batches: 4 (default)
- Polly retry (3 attempts, exponential backoff)
- `RateLimiter` (semaphore) for API concurrency
- Failed batches return empty `float[]` (length 0)
- Filters out failed chunks (won't be ranked)
- `validChunks[i].Embedding = validEmbeddings[i]` attaches embedding to chunk
**Return**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` where:
- `chunkEmbeddings` length = `validChunks.Count` (filtered)
- Order matches `validChunks` order (since we filtered parallel arrays)
**Progress**: Interleaved from embedding service's own progress callbacks (batch X/Y)
**Potential Issues**:
- `GetEmbeddingsWithRateLimitAsync` writes `results[batchIndex] = ...` from multiple parallel tasks, but each task targets a distinct array index, so the writes never overlap and no extra synchronization is needed - not a bug
- The filtering loop assumes `chunkEmbeddings` has the same count as `chunks`; `GetEmbeddingsWithRateLimitAsync` returns `results.SelectMany(r => r).ToArray()`, which preserves the input count (failed batches contribute empty arrays rather than missing entries), so the indexing is safe
**Memory Consideration**:
- `chunkTexts` list holds all chunk strings (may be large, but still in memory)
- `chunkEmbeddings` holds all float arrays (600KB for 100 chunks)
- Total: modest (~few MB)
**Future Enhancements**:
- Stream embeddings? (No benefit, need all for ranking)
- Cache embeddings by content hash (cross-run)
- Support different embedding model per query
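The content-hash cache idea could start as a simple in-process dictionary keyed by a SHA-256 of the chunk text; everything below is a hypothetical sketch (a cross-run cache would additionally persist this to disk):

```csharp
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory embedding cache, keyed by content hash.
static class EmbeddingCache
{
    private static readonly ConcurrentDictionary<string, float[]> Cache = new();

    private static string KeyFor(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));

    public static bool TryGet(string text, out float[]? embedding) =>
        Cache.TryGetValue(KeyFor(text), out embedding);

    public static void Put(string text, float[] embedding) =>
        Cache[KeyFor(text)] = embedding;
}
```

The embedding service would consult `TryGet` before issuing a batch and call `Put` afterwards, so repeated runs over the same pages could skip the API entirely.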
### Phase 4: RankAndSelectTopChunks
**Purpose**: Score chunks by semantic relevance to query, sort, and select top N.
**Implementation**:
```csharp
var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList();
foreach (var chunk in chunksWithEmbeddings)
{
chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
}
var topChunks = chunksWithEmbeddings
.OrderByDescending(c => c.Score)
.Take(topChunksLimit)
.ToList();
return topChunks;
```
**Details**:
- Filters to chunks that have embeddings (successful phase 3)
- For each: `Score = CosineSimilarity(queryEmbedding, chunkEmbedding)`
- Uses `TensorPrimitives.CosineSimilarity` (SIMD-accelerated)
- Returns float typically 0-1 (higher = more relevant)
- `OrderByDescending` - highest scores first
- `Take(topChunksLimit)` - select top N (from `--chunks` option)
- Returns `List<Chunk>` (now with `Score` set)
**Return**: Top N chunks ready for context formatting
**Complexity**:
- O(n) for scoring (where n = valid chunks, typically 50-100)
- O(n log n) for sorting (fast for n=100)
- Negligible CPU time
**Edge Cases**:
- If `topChunksLimit` > `chunksWithEmbeddings.Count`, returns all (no padding)
- If all embeddings failed, returns empty list
- Should handle `topChunksLimit == 0` (returns empty)
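The similarity function itself can be exercised in isolation; `TensorPrimitives.CosineSimilarity` comes from the `System.Numerics.Tensors` NuGet package:

```csharp
using System.Numerics.Tensors;

float[] query  = { 1f, 0f, 0f };
float[] chunkA = { 1f, 0f, 0f }; // same direction as query
float[] chunkB = { 0f, 1f, 0f }; // orthogonal to query

float scoreA = TensorPrimitives.CosineSimilarity(query, chunkA); // 1.0
float scoreB = TensorPrimitives.CosineSimilarity(query, chunkB); // 0.0
```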
### Context Formatting (After Phase 4)
**Location**: In `ExecuteAsync`, after ranking:
```csharp
var context = string.Join("\n\n", topChunks.Select((c, i) =>
$"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
return context;
```
**Format**:
```
[Source 1: Article Title](https://example.com/article)
Chunk content text...
[Source 2: Another Title](https://example.com/another)
Chunk content text...
[Source 3: Third Title](https://example.com/third)
Chunk content text...
```
**Features**:
- Each source numbered 1, 2, 3... (matches order of topChunks = descending relevance)
- Title or "Unknown" if null
- Title is markdown link to original URL
- Chunk content as plain text (may contain its own formatting)
- Double newline between sources
**Rationale**:
- Markdown links allow copy-pasting to browsers
- Numbers allow LLM to cite `[Source 1]` in answer
- Original title helps user recognize source
**Potential Issues**:
- LLM might misinterpret "Source 1" as literal citation required
- If chunks contain markdown, may conflict (no escaping)
- Some titles may have markdown special chars (unlikely but possible)
**Alternative**: Could use XML-style tags or more robust citation format.
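The XML-style alternative might look like the sketch below (illustrative only - note that real titles would need escaping, e.g. via `SecurityElement.Escape`, before being placed in attributes):

```csharp
var context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"<source id=\"{i + 1}\" title=\"{c.Title ?? "Unknown"}\" url=\"{c.SourceUrl}\">\n" +
    $"{c.Content}\n" +
    "</source>"));
```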
## Error Handling & Edge Cases
### Empty Results Handling
At end of `ExecuteAsync`:
```csharp
if (searchResults.Count == 0)
return "No search results found.";
if (chunks.Count == 0)
return "Found search results but could not extract readable content.";
```
These messages appear in final answer (LLM will respond to these contexts).
### Partial Failures
- Some search queries fail → proceed with others
- Some articles fail to fetch → continue
- Some embedding batches fail → those chunks filtered out
- Ranking proceeds with whatever valid embeddings exist
### Verbose vs Compact Progress
`verbose` parameter affects what's passed to phases:
- **Article fetching**: errors only shown if `verbose`
- **Embeddings**: always shows batch progress via `onProgress` (from EmbeddingService)
- **Searches**: no error suppression (warning always logged to Console, not through callback)
### Progress Callback Pattern
`onProgress` is invoked at major milestones:
- Searching: `[Searching web for '{query}'...]`
- Article fetch: `[Fetching article X/Y: domain]`
- Embeddings: `[Generating embeddings: batch X/Y]`
- Final: `[Found top X most relevant chunks overall. Generating answer...]`
Each phase may invoke many times (e.g., embedding batches). `StatusReporter` handles these appropriately.
## Performance Characteristics
### Time Estimate per Phase (for typical 3 queries, 5 results each, ~15 articles):
| Phase | Time | Dominated By |
|-------|------|--------------|
| Searches | 3-8s | Network latency to SearxNG |
| Article Fetching | 5-15s | Network + SmartReader CPU |
| Embeddings | 2-4s | OpenRouter API latency (4 concurrent batches) |
| Ranking | <0.1s | CPU (O(n log n) sort, n~100) |
| **Total Pipeline** | **10-30s** | Articles + Searches |
### Concurrency Limits Effect
**Article Fetching** (`MaxConcurrentArticleFetches` = 10):
- 15 articles → 2 waves (10 then 5)
- If each takes 2s → ~4s total (vs 30s sequential)
**Embedding Batching** (`MaxConcurrentEmbeddingRequests` = 4, `EmbeddingBatchSize` = 300):
- 80 chunks → a single batch (everything fits within the 300-item batch size)
- Only runs with more than 300 chunks would produce multiple batches, at most 4 processed concurrently
- Here: a single embedding call with 80 items = 1 batch (no batch-level parallelism needed)
### Memory Usage
- `searchResults` (15 items) → ~30KB
- `chunks` (80 items × 500 chars) → ~40KB text + embeddings ~480KB (80 × 1536 × 4 bytes)
- Total ≈ 550KB excluding temporary HTTP buffers
## Design Decisions
### Why Use ConcurrentBag for Results/Chunks?
Thread-safe collection allows parallel tasks to add items without explicit locks. Note that `ConcurrentBag` does not guarantee enumeration order, so the order seen by `ToList()` and `DistinctBy` is nondeterministic (`DistinctBy` keeps the first occurrence it encounters). This is acceptable because order does not matter here - ranking is semantic, not positional. If order mattered, a `ConcurrentQueue` or an explicit sort by source would be needed.
### Why Not Use Parallel.ForEach for Article Fetching?
We use `Task.WhenAll` with `Select` + semaphore. `Parallel.ForEachAsync` could also work, but it requires .NET 6+ and we want to keep the same pattern as the other phases. The semaphore gives explicit concurrency control.
### Why Separate Query Embedding from Chunk Embeddings?
`GetEmbeddingAsync` is called directly (not batched) because there's only one query. Could be batched with chunks but:
- Query is small (single string)
- Batch API has overhead (request structure)
- Separate call allows independent completion (no need to wait for chunks to start query embedding)
### Why Two Different Embedding Methods?
`EmbeddingService` has:
- `GetEmbeddingsWithRateLimitAsync` (used in SearchTool)
- `GetEmbeddingsAsync` (similar but different implementation)
Probably legacy/refactor artifact. Could consolidate.
### Why Not Deduplicate URLs Earlier?
Deduplication happens after search aggregation. Could also deduplicate within each search result (SearxNG might already dedupe across engines). But global dedupe is necessary.
### Why Not Early Filtering (e.g., by domain, length)?
Possibly could improve quality:
- Filter by domain reputation
- Filter articles too short (<200 chars) or too long (>50KB)
- Not implemented (keep simple)
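If the length filter were added, it could be a small guard in Phase 2 before chunking; the 200/50,000-character thresholds below are the hypothetical values from the list above:

```csharp
// Hypothetical guard before ChunkingService.ChunkText(...):
var text = article.TextContent;
if (text.Length < 200 || text.Length > 50_000)
    return; // skip articles too short or too long to be useful
```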
## Testing Considerations
**Unit Testability**: `SearchTool` is fairly testable with mocks:
- Mock `SearxngClient` to return predetermined results
- Mock `EmbeddingService` to return deterministic vectors (note that `ArticleService` and `ChunkingService` are invoked statically, which limits how much can be mocked)
- Verify progress callback invocations
- Verify final context format
**Integration Testing**:
- End-to-end with real/mocked external services
- Need test SearxNG instance and test OpenRouter key (or mock responses)
**Performance Testing**:
- Benchmark with different concurrency settings
- Profile memory for large result sets (1000+ articles)
- Measure embedding API latency impact
## Known Issues
### Chunk Selector in ExecuteParallelEmbeddingsAsync
The actual source of `ExecuteParallelEmbeddingsAsync` builds the embedding input from chunk *content*:
```csharp
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
    chunkTexts, onProgress);
```
Selecting `c.Content` (rather than the not-yet-populated `Embedding` property) is correct; there is no bug here.
### Potential Race Condition in GetEmbeddingsWithRateLimitAsync
```csharp
results[batchIndex] = batchResults;
```
This writes to an array index from multiple parallel tasks. Each task writes a distinct index, and a single reference-type write is atomic in .NET, so the writes cannot corrupt each other. No synchronization is needed; this is fine.
### Progress Callback May Overwhelm
If invoked synchronously from many parallel tasks, could saturate the channel. `Channel.TryWrite` will return false if buffer full; we ignore return value. Could drop messages under heavy load. Acceptable for CLI UI (some messages may be lost but overall progress visible).
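The drop-on-full behaviour described above is the standard bounded-channel pattern; a minimal sketch (this shows the general `System.Threading.Channels` API, not the actual `StatusReporter` code):

```csharp
using System.Threading.Channels;

var channel = Channel.CreateBounded<string>(capacity: 64);

// Producer side (progress callbacks): TryWrite never blocks.
// With the default FullMode (Wait), it returns false when the buffer
// is full - ignoring that return value silently drops the message.
bool accepted = channel.Writer.TryWrite("[Fetching article 3/15: example.com]");
```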
## Related Components
- **[OpenQueryApp](openquery-app.md)** - calls this
- **[SearxngClient](../../services/SearxngClient.md)** - phase 1
- **[ArticleService](../../services/ArticleService.md)** - phase 2a
- **[ChunkingService](../../services/ChunkingService.md)** - phase 2b
- **[EmbeddingService](../../services/EmbeddingService.md)** - phase 3
- **[Ranking](../../services/EmbeddingService.md#cosinesimilarity)** - cosine similarity
---
## Next Steps
- [Services Overview](../services/overview.md) - See supporting services
- [CLI Reference](../../api/cli.md) - How users trigger this pipeline
- [Performance](../performance.md) - Optimize pipeline settings