docs: add comprehensive documentation with README and detailed guides

- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
This commit is contained in:
OpenQuery Documentation
2026-03-19 10:01:58 +01:00
parent b28d8998f7
commit 65ca2401ae
16 changed files with 7073 additions and 0 deletions

docs/architecture.md
# Architecture
Deep dive into OpenQuery's system design, architectural patterns, and data flow.
## 📋 Table of Contents
1. [System Overview](#system-overview)
2. [Architectural Patterns](#architectural-patterns)
3. [Component Architecture](#component-architecture)
4. [Data Flow](#data-flow)
5. [Concurrency Model](#concurrency-model)
6. [Error Handling & Resilience](#error-handling--resilience)
7. [Performance Considerations](#performance-considerations)
8. [Design Decisions](#design-decisions)
## System Overview
OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.
### Core Design Principles
1. **Separation of Concerns** - Each component has a single, well-defined responsibility
2. **Parallel First** - Wherever possible, operations are parallelized for speed
3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
4. **Configurable** - Most parameters can be adjusted without code changes
5. **Observable** - Progress reporting and verbose mode for debugging
### High-Level Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│           CLI (System.CommandLine) → OpenQueryApp           │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                     Orchestration Layer                     │
│         OpenQueryApp → SearchTool (4-phase pipeline)        │
└─────────────────────────────┬───────────────────────────────┘
       ┌──────────────────────┼─────────────────┐
       │                      │                 │
       ▼                      ▼                 ▼
┌──────────────┐  ┌──────────────────┐  ┌────────────────┐
│ Search Layer │  │ Processing Layer │  │    AI Layer    │
│              │  │                  │  │                │
│ SearxngClient│  │ ArticleService   │  │OpenRouterClient│
│              │  │ ChunkingService  │  │                │
│              │  │ EmbeddingService │  │                │
└──────────────┘  └──────────────────┘  └────────────────┘
```
## Architectural Patterns
### 1. Pipeline Pattern
The main workflow (SearchTool.ExecuteAsync) implements a multi-stage pipeline:
```
Phase 1: ExecuteParallelSearchesAsync
         → List<SearxngResult>
Phase 2: ExecuteParallelArticleFetchingAsync
         → List<Chunk>
Phase 3: ExecuteParallelEmbeddingsAsync
         → (queryEmbedding, chunkEmbeddings)
Phase 4: RankAndSelectTopChunks
         → List<Chunk> topChunks
         → formatted context string returned
```
Each phase:
- Accepts input from previous phase
- Processes in parallel where applicable
- Returns output to next phase
- Reports progress via callbacks
### 2. Service Layer Pattern
Services (`Services/` directory) are stateless classes that encapsulate specific operations:
- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
- **Extractors**: `ArticleService` (content extraction)
- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)
All dependencies are explicit (constructor injection), making services easily testable.
### 3. Dependency Injection (Manual)
While not using a DI container, OpenQuery follows DI principles:
```csharp
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
```
Benefits:
- Clear dependency graph
- Easy to substitute mocks for testing
- No magic; visible construction
### 4. Observer Pattern (Progress Reporting)
`StatusReporter` and progress callbacks implement observer pattern:
```csharp
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)
// Components invoke callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");
// Caller (OpenQueryApp) wires the reporter in as the callback
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
```
### 5. Resilience Patterns (Polly)
`EmbeddingService` uses Polly's retry policy:
```csharp
_retryPipeline = new ResiliencePipelineBuilder()
.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
ShouldHandle = new PredicateBuilder()
.Handle<HttpRequestException>()
})
.Build();
```
This automatically retries failed embedding requests with exponential backoff.
### 6. Producer-Consumer Pattern (Channel-based)
`StatusReporter` uses `System.Threading.Channels.Channel<string>` for asynchronous progress updates:
- Producer: `UpdateStatus()` writes messages to channel
- Consumer: Background task `ProcessStatusUpdatesAsync()` reads and displays
- Benefit: No blocking between progress generation and display
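A minimal sketch of this producer-consumer setup (illustrative class and method names, simplified from the real `StatusReporter`):

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Minimal channel-based reporter: producers never block on console output.
public sealed class MiniStatusReporter : IAsyncDisposable
{
    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private readonly Task _consumer;

    public MiniStatusReporter()
    {
        // Consumer: background task drains the channel and displays messages.
        _consumer = Task.Run(async () =>
        {
            await foreach (var message in _channel.Reader.ReadAllAsync())
                Console.WriteLine(message);
        });
    }

    // Producer: non-blocking write; TryWrite never waits on an unbounded channel.
    public void UpdateStatus(string message) => _channel.Writer.TryWrite(message);

    public async ValueTask DisposeAsync()
    {
        _channel.Writer.Complete(); // signal no more messages
        await _consumer;            // let the consumer drain remaining items
    }
}
```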
### 7. Disposable Pattern
Components that hold unmanaged resources implement `IDisposable` or `IAsyncDisposable`:
- `StatusReporter` - stops background spinner task
- `RateLimiter` - disposes semaphore
Used via `using` statements for deterministic cleanup.
## Component Architecture
### OpenQueryApp (OpenQuery.cs)
**Role**: Main orchestrator; coordinates the entire workflow
**Responsibilities**:
- Parse CLI options into `OpenQueryOptions`
- Load configuration and resolve API keys/models
- Optionally generate expanded search queries via LLM
- Invoke `SearchTool` with progress callbacks
- Stream final answer from LLM
**Key Methods**:
- `RunAsync(OpenQueryOptions)` - Main entry point
**Interactions**:
- Uses `OpenRouterClient` (for both query generation and the final answer)
- Uses `SearxngClient` (passed to `SearchTool`)
- Uses `EmbeddingService` (passed to `SearchTool`)
- Uses `SearchTool` (pipeline orchestration)
- `StatusReporter` for UI updates
### SearchTool (Tools/SearchTool.cs)
**Role**: Core search-retrieve-rank pipeline orchestrator
**Responsibilities**:
- Execute 4-phase pipeline (search → fetch → embed → rank)
- Manage concurrency limits (via semaphores)
- Coordinate parallel operations
- Generate context string for final answer
**Interactions**:
- Uses `SearxngClient` for Phase 1
- Uses `ArticleService` + `ChunkingService` for Phase 2
- Uses `EmbeddingService` for Phase 3
- Has no external UI dependency (pure logic)
**Parallelization Strategy**:
- **Phase 1**: `Task.WhenAll` on search tasks (unbounded but limited by SearxNG instance)
- **Phase 2**: Semaphore (max 10 concurrent fetches)
- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)
### EmbeddingService (Services/EmbeddingService.cs)
**Role**: Generate vector embeddings with batching, rate limiting, and retries
**Responsibilities**:
- Batch embedding requests (default: 300 per batch)
- Parallelize batches (default: 4 concurrent)
- Apply rate limiting (via `RateLimiter`)
- Retry failed requests (Polly)
- Calculate cosine similarity
**Key Methods**:
- `GetEmbeddingsAsync(List<string> texts, ...)` - batch with progress
- `GetEmbeddingAsync(string text)` - single embedding
- `CosineSimilarity(float[], float[])` - static vector math
**Design Notes**:
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
- Batches of 300 reduce API overhead
- Polly retry handles transient failures (429, 500, network blips)
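The static cosine helper amounts to a dot product normalized by the two vector magnitudes; a minimal standalone version (a sketch, not the exact implementation):

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0f || normB == 0f) return 0f; // degenerate: no direction
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```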
### OpenRouterClient (Services/OpenRouterClient.cs)
**Role**: HTTP client for OpenRouter API (completions + embeddings)
**Responsibilities**:
- Serialize requests to JSON (source-generated)
- Send HTTP with authorization header
- Stream responses for chat completions (IAsyncEnumerable)
- Return full responses for non-streaming
- Throw on non-2xx status codes
**Endpoints**:
- POST `/chat/completions` (stream and non-stream)
- POST `/embeddings`
**Configuration**:
- Base URL: `https://openrouter.ai/api/v1`
- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json`
**Design**:
- Low-level client; no retry logic (retry is in `EmbeddingService`)
- Thin wrapper around `HttpClient`
- Could be replaced with `HttpClientFactory` in larger apps
### SearxngClient (Services/SearxngClient.cs)
**Role**: HTTP client for SearxNG metasearch
**Responsibilities**:
- Construct search URL with query param
- GET request and deserialize JSON
- Limit results (`.Take(limit)`)
- Return empty list on failure (no exceptions)
**Endpoint**: `GET /search?q={query}&format=json`
**Design**:
- Very simple; no retry (failures are acceptable, OpenQuery continues with other queries)
- `DistinctBy(r => r.Url)` deduplication happens in the caller (`SearchTool`), not in the client
### ArticleService (Services/ArticleService.cs)
**Role**: Extract clean article content from URLs
**Responsibilities**:
- Call `SmartReader.ParseArticleAsync(url)`
- Return `Article` object with `Title`, `TextContent`, `IsReadable`
**Design**:
- Single responsibility: extraction only (no fetching, no chunking)
- SmartReader handles all complexity (HTML parsing, boilerplate removal)
- Exceptions propagate to `SearchTool` (handled there)
### ChunkingService (Services/ChunkingService.cs)
**Role**: Split long text into 500-char chunks at natural boundaries
**Algorithm**:
1. Start at index 0
2. Take up to 500 chars
3. If not at end, backtrack to last space/newline/period
4. Add chunk, advance start
5. Repeat until done
**Design**:
- Static class (stateless utility)
- No dependencies
- Pure function (input text → output chunks)
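The five steps above can be sketched as follows (a simplified version; the real boundary handling may differ in detail):

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Split text into chunks of at most maxLen chars, preferring to break
    // at the last space, newline, or period inside the window.
    public static List<string> ChunkText(string text, int maxLen = 500)
    {
        var chunks = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int len = Math.Min(maxLen, text.Length - start);
            if (start + len < text.Length)
            {
                // Backtrack to a natural boundary so words stay intact.
                int boundary = text.LastIndexOfAny(
                    new[] { ' ', '\n', '.' }, start + len - 1, len);
                if (boundary > start) len = boundary - start + 1;
            }
            chunks.Add(text.Substring(start, len).Trim());
            start += len; // advance past the emitted chunk
        }
        return chunks;
    }
}
```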
### RateLimiter (Services/RateLimiter.cs)
**Role**: Limit concurrent operations via semaphore
**Responsibilities**:
- Wrap actions with semaphore acquisition
- Support both sync and async actions
- Implement `IAsyncDisposable` for cleanup
**Usage Pattern**:
```csharp
await _rateLimiter.ExecuteAsync(async () =>
{
// operation limited by semaphore
return await SomeApiCall();
});
```
**Design**:
- Generic wrapper (can return TResult)
- `SemaphoreSlim` initialized at construction
- Used in `EmbeddingService` for parallel embedding batches
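A minimal sketch of this wrapper (illustrative; the real `RateLimiter` also supports synchronous actions and other overloads):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class MiniRateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public MiniRateLimiter(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    // Run the operation once a slot is free; release the slot afterwards,
    // even if the operation throws.
    public async Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> operation)
    {
        await _semaphore.WaitAsync();
        try { return await operation(); }
        finally { _semaphore.Release(); }
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```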
### StatusReporter (Services/StatusReporter.cs)
**Role**: Show real-time progress with spinner or verbose mode
**Responsibilities**:
- Maintain spinner animation (background task)
- Receive status updates via channel
- Display updates with appropriate formatting
- Stop spinner on completion
**Features**:
- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille chars
- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
- **Thread-safe**: Channel is safe for concurrent writes
- **Non-blocking**: Background spinner doesn't block updates
**Design**:
- `Channel<string>` for asynchronous producer-consumer
- Background task (`_statusProcessor`) reads from channel
- Spinner runs on its own task with 100ms delay per frame
- `IDisposable` ensures proper cleanup
### ConfigManager (ConfigManager.cs)
**Role**: Load and save configuration from/to file
**Responsibilities**:
- Get config path (XDG: `~/.config/openquery/config`)
- Parse key-value pairs (no INI library, manual parsing)
- Provide `AppConfig` object with defaults
- Save settings back to file
**Design**:
- Static class (no instances)
- Creates config directory if missing
- Line-by-line parsing (simple, no dependencies)
- Could be improved with proper INI parser or JSON
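The line-by-line parsing described above amounts to something like this (a sketch; comment handling and exact key names are assumptions, not the real implementation):

```csharp
using System;
using System.Collections.Generic;

public static class MiniConfigParser
{
    // Parse "key = value" lines; blanks, '#' comments, and malformed
    // lines are skipped rather than treated as errors.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line.StartsWith('#')) continue;

            int eq = line.IndexOf('=');
            if (eq <= 0) continue; // malformed line: skip, don't throw

            config[line[..eq].Trim()] = line[(eq + 1)..].Trim();
        }
        return config;
    }
}
```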
## Data Flow
### End-to-End Data Flow
```
User: "What is quantum entanglement?"
1. OpenQueryOptions created
{ Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }
2. Query Generation (if Queries > 1)
→ ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
→ Deserialize to List<string> (generatedQueries)
3. Search Phase
generatedQueries → Parallel.SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
→ DistinctBy(Url) → List<SearxngResult> (up to 15 results: 3 queries × 5 results, minus duplicates)
4. Fetch Phase
searchResults → Parallel.ArticleService.FetchArticleAsync → Article
→ ChunkingService.ChunkText (split into ~500-char pieces)
→ ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)
5. Embedding Phase
originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)
6. Ranking Phase
For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)
7. Answer Phase
context = string.Join("\n\n", topChunks.Select(...))
→ ChatCompletionRequest to OpenRouter with context + question
→ StreamAsync → Console.Write(delta) (real-time display)
Result: User sees answer with [Source N] citations
```
### Data Structures
**Chunk** - The core data structure flowing through the pipeline:
```csharp
public record Chunk(
string Content, // Text content (~500 chars)
string SourceUrl, // Where it came from
string? Title = null // Article title
)
{
public float[]? Embedding { get; set; } // Added in Phase 3
public float Score { get; set; } // Added in Phase 4
}
```
**Data Flow State**:
- Phase 1-2: `Chunk` without embedding
- Phase 3: `Chunk.Embedding` populated
- Phase 4: `Chunk.Score` populated
- Phase 5: Serialized into context string
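Once `Score` is populated, Phase 4's rank-and-select step reduces to an `OrderByDescending` + `Take` (the `ScoredChunk` record below is a simplified stand-in for the real `Chunk`):

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the real Chunk record, keeping only the fields used here.
public record ScoredChunk(string Content, float Score);

public static class Ranker
{
    // Phase 4: sort by similarity score, highest first, and keep the top N.
    public static List<ScoredChunk> TopChunks(IEnumerable<ScoredChunk> chunks, int take) =>
        chunks.OrderByDescending(c => c.Score).Take(take).ToList();
}
```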
### Memory Footprint
**Per 15-article run (approximate)**:
- Raw HTML (fetched): ~5MB (transient, discarded after extract)
- Articles: ~500KB (15 articles × ~30KB extracted text)
- Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
- Total peak: ~1-2MB (excluding OpenRouter's memory usage)
**Note**: AOT compilation reduces runtime memory compared to JIT.
## Concurrency Model
OpenQuery uses multiple parallelization strategies:
### Unbounded Parallelism (Task.WhenAll)
**Where**: Search queries (Phase 1)
```csharp
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
```
**Rationale**: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.
**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). Default 3 is safe.
### Semaphore-Controlled Parallelism
**Where**: Article fetching (Phase 2)
```csharp
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
await Task.WhenAll(fetchTasks); // Each task waits on semaphore
```
**Rationale**: Prevent flooding target websites with requests (DOS-like behavior). 10 concurrent is polite but fast.
**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (compile-time constant currently).
### Parallel.ForEachAsync with MaxDegreeOfParallelism
**Where**: Embedding batch processing (Phase 3)
```csharp
await Parallel.ForEachAsync(
batchIndices,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
async (batchIndex, ct) => { ... }
);
```
**Rationale**: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.
**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (compile-time).
### Progress Reporting (Channel)
**Where**: All phases pass `onProgress` callback
**Implementation**:
- `StatusReporter.UpdateStatus()` → writes to channel
- Background task reads channel and displays
- Non-blocking; callbacks are fire-and-forget (TryWrite)
**Thread Safety**: Channel is thread-safe; multiple phases may write concurrently.
## Error Handling & Resilience
### HTTP Errors
**OpenRouterClient**:
- Calls `response.EnsureSuccessStatusCode()` → throws `HttpRequestException` on 4xx/5xx
- No retry (handled at higher level in `EmbeddingService`)
**SearxngClient**:
- Returns empty `List<SearxngResult>` on non-success
- No exception thrown (searches are non-critical; if some queries fail, others proceed)
### Retry Policy (Polly)
**Location**: `EmbeddingService` constructor
**Scope**: Only embedding requests (`_client.EmbedAsync`)
**Policy**:
- Max 3 attempts
- Exponential backoff: 1s, 2s, 4s
- Only retries `HttpRequestException` (network errors, 429, 5xx)
**Why not on chat completions?**
- Query generation and final answer are critical; failures should surface immediately
- Could be added in future if transient failures are common
### Graceful Degradation
**Query Generation Failure**:
```csharp
try { ... generate queries ... }
catch (Exception ex)
{
// Fall back to original question as sole query
if (options.Verbose) reporter.WriteLine("[Failed to generate queries, falling back to original question]");
}
```
**Embedding Batch Failure**:
```csharp
catch
{
// Return empty embeddings for this batch (those chunks are filtered out later)
var emptyBatch = new float[batch.Count][];
for (int i = 0; i < emptyBatch.Length; i++) emptyBatch[i] = Array.Empty<float>();
results.Add((batchIndex, emptyBatch));
}
```
**Article Fetch Failure**:
```csharp
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
// Chunk not added; continue with others
}
```
### User-Facing Errors
Top-level exception handler in `Program.cs`:
```csharp
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
Environment.Exit(1);
}
catch (Exception ex)
{
Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
Environment.Exit(1);
}
```
### Cancellation Support
`OpenRouterClient.StreamAsync` and `EmbeddingService` methods accept `CancellationToken`.
Used in:
- Streaming answer (Ctrl+C stops immediately)
- Parallel embeddings (can be cancelled)
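The Ctrl+C behaviour is typically wired up like this (a sketch of the usual pattern; the actual wiring in `Program.cs` may differ):

```csharp
using System;
using System.Threading;

// Sketch of Ctrl+C wiring: translate the console signal into a
// CancellationToken that downstream async calls can observe.
using var cts = new CancellationTokenSource();

Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true; // don't kill the process outright; let cleanup run
    cts.Cancel();    // propagate cancellation into streaming/embedding calls
};

// Anything given cts.Token (e.g. StreamAsync, GetEmbeddingsAsync) observes
// cancellation at its next await and stops promptly.
```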
## Performance Considerations
### Latency Breakdown (Typical)
| Stage | Time | Description |
|-------|------|-------------|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4-parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| **Total** | **15-50s** | Varies widely based on network & content |
### Bottlenecks
1. **Network I/O** (article fetching, API calls) - the dominant cost; these stages are I/O-bound, not CPU-bound
2. **OpenRouter API latency** - varies by model and load
3. **SmartReader parsing** - CPU-bound for large HTML
4. **Embedding API rate** - OpenRouter may rate limit if too many concurrent
### Optimization Strategies
- **Parallelism**: Already maximized within API constraints
- **Caching**: Not implemented; future enhancement could cache embeddings per URL
- **Batching**: 300-chunk batches reduce API overhead
- **AOT**: Native compilation reduces startup overhead vs JIT
### Scalability Limits
- **Memory**: Scales with number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Similarity scoring is a linear scan over all chunks (O(n) dot products), fine for thousands of chunks but not millions.
- **API Limits**: OpenRouter rate limits per API key; may need to reduce concurrency if hitting 429s.
- **SearxNG Limits**: Single SearxNG instance can handle ~10-50 QPS; above that may need load balancing (not in scope).
## Design Decisions
### Why Not Use a DI Container?
OpenQuery manually wires dependencies in `Program.cs`. For a small CLI app, this is:
- Simpler (no container configuration)
- More explicit (easy to trace dependency graph)
- No runtime overhead
- Easier to understand for contributors
Container-based DI would be overengineering.
### Why AOT?
.NET 10 AOT provides:
- **Fast startup** (<100ms vs ~500ms JIT)
- **Smaller footprint** (trimmed, no JIT)
- **No runtime dependencies** (self-contained)
- Better for CLI tools distributed to users
Trade-offs:
- Longer build time
- Some reflection-based APIs not supported (not needed here)
- Less flexible (can't load dynamic assemblies, but not needed)
### Why SmartReader for Article Extraction?
SmartReader uses Readability-based algorithm similar to Firefox Reader View:
- Removes ads, navigation, comments, boilerplate
- Extracts main article content
- Handles malformed HTML gracefully
- Zero dependencies (pure .NET)
Alternatives considered:
- `HtmlAgilityPack` (too low-level, need to implement extraction logic)
- `AngleSharp` (similar, still need extraction)
- External services (like diffbot) - require API keys, costs money
SmartReader is the sweet spot: free, good quality, easy integration.
### Why Embeddings + Cosine Similarity vs Full-Text Search?
Full-text search (like Lucene) would:
- Require inverted index, more complex
- Be faster for exact keyword matching
- Not understand semantic similarity
Embeddings provide:
- Semantic similarity (understand meaning, not just keywords)
- Simple math (cosine similarity of float arrays)
- No index to maintain (just compute on-the-fly)
Trade-off: every run pays embedding API cost and latency; a per-URL embedding cache (e.g. LRU) could amortize repeated queries.
### Why Not RAG (Retrieval Augmented Generation) Framework?
OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
- Add dependency bloat
- Reduce control
- Increase abstraction complexity
Custom implementation is ~1000 LOC and perfectly matches needs.
### Why System.CommandLine?
Provides:
- Native-like CLI help (`openquery --help`)
- Strongly-typed options
- Command hierarchy (main + subcommands)
- Good error messages
Alternative: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.
---
## Next Steps
- [Components](components/overview.md) - Deep dive into each module
- [API Reference](api/cli.md) - Complete command documentation
- [Troubleshooting](troubleshooting.md) - Debug issues
---
**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece.