OpenQuery/docs/architecture.md
OpenQuery Documentation 65ca2401ae docs: add comprehensive documentation with README and detailed guides
2026-03-19 10:01:58 +01:00

# Architecture
Deep dive into OpenQuery's system design, architectural patterns, and data flow.
## 📋 Table of Contents
1. [System Overview](#system-overview)
2. [Architectural Patterns](#architectural-patterns)
3. [Component Architecture](#component-architecture)
4. [Data Flow](#data-flow)
5. [Concurrency Model](#concurrency-model)
6. [Error Handling & Resilience](#error-handling--resilience)
7. [Performance Considerations](#performance-considerations)
8. [Design Decisions](#design-decisions)
## System Overview
OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.
### Core Design Principles
1. **Separation of Concerns** - Each component has a single, well-defined responsibility
2. **Parallel First** - Wherever possible, operations are parallelized for speed
3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
4. **Configurable** - Most parameters can be adjusted without code changes
5. **Observable** - Progress reporting and verbose mode for debugging
### High-Level Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│  CLI (System.CommandLine) → OpenQueryApp                    │
└─────────────────────────────┬───────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                     Orchestration Layer                     │
│  OpenQueryApp → SearchTool (4-phase pipeline)               │
└─────────────────────────────┬───────────────────────────────┘
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌──────────────┐    ┌──────────────────┐  ┌──────────────────┐
│ Search Layer │    │ Processing Layer │  │     AI Layer     │
│              │    │                  │  │                  │
│ SearxngClient│    │ ArticleService   │  │ OpenRouterClient │
│              │    │ ChunkingService  │  │                  │
│              │    │ EmbeddingService │  │                  │
└──────────────┘    └──────────────────┘  └──────────────────┘
```
## Architectural Patterns
### 1. Pipeline Pattern
The main workflow (SearchTool.ExecuteAsync) implements a multi-stage pipeline:
```
Phase 1: ExecuteParallelSearchesAsync
    ↓ (List<SearxngResult>)
Phase 2: ExecuteParallelArticleFetchingAsync
    ↓ (List<Chunk>)
Phase 3: ExecuteParallelEmbeddingsAsync
    ↓ ((queryEmbedding, chunkEmbeddings))
Phase 4: RankAndSelectTopChunks
    ↓ (List<Chunk> topChunks)
Formatted context string returned
```
Each phase:
- Accepts input from previous phase
- Processes in parallel where applicable
- Returns output to next phase
- Reports progress via callbacks
### 2. Service Layer Pattern
Services (`Services/` directory) are stateless classes that encapsulate specific operations:
- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
- **Extractors**: `ArticleService` (content extraction)
- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)
All dependencies are explicit (constructor injection), making services easily testable.
### 3. Dependency Injection (Manual)
While not using a DI container, OpenQuery follows DI principles:
```csharp
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
```
Benefits:
- Clear dependency graph
- Easy to substitute mocks for testing
- No magic; visible construction
### 4. Observer Pattern (Progress Reporting)
`StatusReporter` and progress callbacks implement the observer pattern:
```csharp
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)
// Components invoke callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");
// Caller (OpenQueryApp) supplies a lambda that forwards each message to the reporter
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
```
### 5. Resilience Patterns (Polly)
`EmbeddingService` uses Polly's retry policy:
```csharp
_retryPipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
    })
    .Build();
```
This automatically retries failed embedding requests with exponential backoff.
### 6. Producer-Consumer Pattern (Channel-based)
`StatusReporter` uses `System.Threading.Channels.Channel<string>` for asynchronous progress updates:
- Producer: `UpdateStatus()` writes messages to channel
- Consumer: Background task `ProcessStatusUpdatesAsync()` reads and displays
- Benefit: No blocking between progress generation and display
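
The producer-consumer split described above can be sketched with a plain unbounded channel. This is a minimal illustration, not the actual `StatusReporter`: the `displayed` list stands in for console rendering, and the local `UpdateStatus` function only mirrors the method name mentioned in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for StatusReporter's channel plumbing; names are illustrative.
var channel = Channel.CreateUnbounded<string>();
var displayed = new List<string>();

// Consumer: drains the channel on a background task until the writer completes.
Task consumer = Task.Run(async () =>
{
    await foreach (string message in channel.Reader.ReadAllAsync())
    {
        displayed.Add(message);   // a real reporter would render to the console here
    }
});

// Producer: TryWrite does not block on an unbounded channel.
void UpdateStatus(string message) => channel.Writer.TryWrite(message);

UpdateStatus("[Searching...]");
UpdateStatus("[Fetching article 1/10: example.com]");

channel.Writer.Complete();        // signal that no more updates will arrive
await consumer;                   // wait until every queued message was handled

Console.WriteLine(string.Join(Environment.NewLine, displayed));
```

Completing the writer before awaiting the consumer is what lets the `await foreach` loop end; without it the background task would wait forever.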
### 7. Disposable Pattern
Components that own resources needing explicit cleanup implement `IDisposable` or `IAsyncDisposable`:
- `StatusReporter` - stops background spinner task
- `RateLimiter` - disposes semaphore
Used via `using` statements (`await using` for `IAsyncDisposable`) for deterministic cleanup.
## Component Architecture
### OpenQueryApp (OpenQuery.cs)
**Role**: Main orchestrator; coordinates the entire workflow
**Responsibilities**:
- Parse CLI options into `OpenQueryOptions`
- Load configuration and resolve API keys/models
- Optionally generate expanded search queries via LLM
- Invoke `SearchTool` with progress callbacks
- Stream final answer from LLM
**Key Methods**:
- `RunAsync(OpenQueryOptions)` - Main entry point
**Interactions**:
- Instantiates `OpenRouterClient` (for both query gen and final answer)
- Instantiates `SearxngClient` (passed to `SearchTool`)
- Instantiates `EmbeddingService` (passed to `SearchTool`)
- Instantiates `SearchTool` (orchestration)
- `StatusReporter` for UI updates
### SearchTool (Tools/SearchTool.cs)
**Role**: Core search-retrieve-rank pipeline orchestrator
**Responsibilities**:
- Execute 4-phase pipeline (search → fetch → embed → rank)
- Manage concurrency limits (via semaphores)
- Coordinate parallel operations
- Generate context string for final answer
**Interactions**:
- Uses `SearxngClient` for Phase 1
- Uses `ArticleService` + `ChunkingService` for Phase 2
- Uses `EmbeddingService` for Phase 3
- Has no external UI dependency (pure logic)
**Parallelization Strategy**:
- **Phase 1**: `Task.WhenAll` on search tasks (unbounded but limited by SearxNG instance)
- **Phase 2**: Semaphore (max 10 concurrent fetches)
- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)
### EmbeddingService (Services/EmbeddingService.cs)
**Role**: Generate vector embeddings with batching, rate limiting, and retries
**Responsibilities**:
- Batch embedding requests (default: 300 per batch)
- Parallelize batches (default: 4 concurrent)
- Apply rate limiting (via `RateLimiter`)
- Retry failed requests (Polly)
- Calculate cosine similarity
**Key Methods**:
- `GetEmbeddingsAsync(List<string> texts, ...)` - batch with progress
- `GetEmbeddingAsync(string text)` - single embedding
- `CosineSimilarity(float[], float[])` - static vector math
**Design Notes**:
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
- Batches of 300 reduce API overhead
- Polly retry handles transient failures (429, 500, network blips)
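
As a rough illustration of the similarity step, the sketch below computes cosine similarity and ranks a few toy vectors. It is an assumption-laden example, not the project's `CosineSimilarity` implementation: the zero-vector handling and the tuple-based "chunks" are choices made here for brevity.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 0 for mismatched, empty, or all-zero vectors (a choice made for this sketch).
static float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length || a.Length == 0) return 0f;

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    if (normA == 0 || normB == 0) return 0f;
    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
}

float[] query = { 1f, 0f };
var chunks = new (string Text, float[] Embedding)[]
{
    ("orthogonal", new[] { 0f, 1f }),
    ("same direction", new[] { 2f, 0f }),
    ("diagonal", new[] { 1f, 1f }),
};

// Rank chunks by similarity to the query and keep the best two.
var top = chunks
    .Select(c => (c.Text, Score: CosineSimilarity(query, c.Embedding)))
    .OrderByDescending(c => c.Score)
    .Take(2)
    .ToList();

foreach (var (text, score) in top)
    Console.WriteLine($"{score:F3}  {text}");
```

Because the measure depends only on direction, the vector `(2, 0)` scores the same against the query as `(1, 0)` would.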
### OpenRouterClient (Services/OpenRouterClient.cs)
**Role**: HTTP client for OpenRouter API (completions + embeddings)
**Responsibilities**:
- Serialize requests to JSON (source-generated)
- Send HTTP with authorization header
- Stream responses for chat completions (IAsyncEnumerable)
- Return full responses for non-streaming
- Throw on non-2xx status codes
**Endpoints**:
- POST `/chat/completions` (stream and non-stream)
- POST `/embeddings`
**Configuration**:
- Base URL: `https://openrouter.ai/api/v1`
- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json`
**Design**:
- Low-level client; no retry logic (retry is in `EmbeddingService`)
- Thin wrapper around `HttpClient`
- Could be replaced with `IHttpClientFactory` in larger apps
### SearxngClient (Services/SearxngClient.cs)
**Role**: HTTP client for SearxNG metasearch
**Responsibilities**:
- Construct search URL with query param
- GET request and deserialize JSON
- Limit results (`.Take(limit)`)
- Return empty list on failure (no exceptions)
**Endpoint**: `GET /search?q={query}&format=json`
**Design**:
- Very simple; no retry (failures are acceptable, OpenQuery continues with other queries)
- Results are deduplicated by the caller (`DistinctBy(r => r.Url)`), not by this client
### ArticleService (Services/ArticleService.cs)
**Role**: Extract clean article content from URLs
**Responsibilities**:
- Call SmartReader's `Reader.ParseArticleAsync(url)`
- Return `Article` object with `Title`, `TextContent`, `IsReadable`
**Design**:
- Single responsibility: fetch-and-extract only (no chunking or ranking); the HTTP download itself is delegated to SmartReader
- SmartReader handles all complexity (HTML parsing, boilerplate removal)
- Exceptions propagate to `SearchTool` (handled there)
### ChunkingService (Services/ChunkingService.cs)
**Role**: Split long text into chunks of up to 500 characters at natural boundaries
**Algorithm**:
1. Start at index 0
2. Take up to 500 chars
3. If not at end, backtrack to last space/newline/period
4. Add chunk, advance start
5. Repeat until done
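
The steps above can be sketched as follows. This is one plausible reading of the algorithm rather than the actual `ChunkingService` code: the function name `ChunkText` matches the data-flow diagram, but the trimming, the hard-cut fallback, and keeping the boundary character in the chunk are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Splits text into chunks of at most maxChunkSize characters, preferring to break
// after the last space, newline, or period inside the current window.
static List<string> ChunkText(string text, int maxChunkSize = 500)
{
    if (maxChunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxChunkSize));

    var chunks = new List<string>();
    int start = 0;

    while (start < text.Length)
    {
        int length = Math.Min(maxChunkSize, text.Length - start);

        if (start + length < text.Length)
        {
            // Not at the end: backtrack to the last natural boundary in the window.
            int boundary = text.LastIndexOfAny(new[] { ' ', '\n', '.' }, start + length - 1, length);
            if (boundary > start)
                length = boundary - start + 1;   // keep the boundary character in this chunk
            // No boundary found: fall back to a hard cut at maxChunkSize.
        }

        string chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0)
            chunks.Add(chunk);

        start += length;                          // always advances, so the loop terminates
    }

    return chunks;
}

var parts = ChunkText("alpha beta gamma delta", maxChunkSize: 12);
Console.WriteLine(string.Join(" | ", parts));     // alpha beta | gamma delta
```

The hard-cut fallback matters for inputs without any boundary character (for example a long URL or minified text); without it the loop could not make progress.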
**Design**:
- Static class (stateless utility)
- No dependencies
- Pure function (input text → output chunks)
### RateLimiter (Services/RateLimiter.cs)
**Role**: Limit concurrent operations via semaphore
**Responsibilities**:
- Wrap actions with semaphore acquisition
- Support both sync and async actions
- Implement `IAsyncDisposable` for cleanup
**Usage Pattern**:
```csharp
await _rateLimiter.ExecuteAsync(async () =>
{
    // operation limited by semaphore
    return await SomeApiCall();
});
```
**Design**:
- Generic wrapper (can return TResult)
- `SemaphoreSlim` initialized at construction
- Used in `EmbeddingService` for parallel embedding batches
### StatusReporter (Services/StatusReporter.cs)
**Role**: Show real-time progress with spinner or verbose mode
**Responsibilities**:
- Maintain spinner animation (background task)
- Receive status updates via channel
- Display updates with appropriate formatting
- Stop spinner on completion
**Features**:
- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille chars
- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
- **Thread-safe**: Channel is safe for concurrent writes
- **Non-blocking**: Background spinner doesn't block updates
**Design**:
- `Channel<string>` for asynchronous producer-consumer
- Background task (`_statusProcessor`) reads from channel
- Spinner runs on its own task with 100ms delay per frame
- `IDisposable` ensures proper cleanup
### ConfigManager (ConfigManager.cs)
**Role**: Load and save configuration from/to file
**Responsibilities**:
- Get config path (XDG: `~/.config/openquery/config`)
- Parse key-value pairs (no INI library, manual parsing)
- Provide `AppConfig` object with defaults
- Save settings back to file
**Design**:
- Static class (no instances)
- Creates config directory if missing
- Line-by-line parsing (simple, no dependencies)
- Could be improved with proper INI parser or JSON
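
A line-by-line key-value parser of the kind described can be very small. The sketch below is illustrative only: the `=` separator, `#` comment lines, case-insensitive keys, and the key names in the sample are all assumptions, since the actual file format is not specified here.

```csharp
using System;
using System.Collections.Generic;

// Parses "key=value" lines; blank lines and lines starting with '#' are skipped.
// The '=' separator and '#' comments are assumptions, not confirmed by the source.
static Dictionary<string, string> ParseConfig(IEnumerable<string> lines)
{
    var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    foreach (string rawLine in lines)
    {
        string line = rawLine.Trim();
        if (line.Length == 0 || line.StartsWith('#'))
            continue;

        int separator = line.IndexOf('=');
        if (separator <= 0)
            continue;                                // no key or no '=': ignore the line

        string key = line[..separator].Trim();
        string value = line[(separator + 1)..].Trim();
        values[key] = value;                         // later lines override earlier ones
    }

    return values;
}

var config = ParseConfig(new[]
{
    "# hypothetical keys for illustration",
    "ApiKey = sk-example",
    "",
    "Model=some/model",
    "not a setting",
});

Console.WriteLine(config["ApiKey"]);   // sk-example
Console.WriteLine(config.Count);       // 2
```

Silently skipping malformed lines keeps startup robust, at the cost of hiding typos; a stricter parser could report them instead.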
## Data Flow
### End-to-End Data Flow
```
User: "What is quantum entanglement?"

1. OpenQueryOptions created
   { Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }

2. Query Generation (if Queries > 1)
   → ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
   → Deserialize to List<string> (generatedQueries)

3. Search Phase
   generatedQueries → parallel SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
   → DistinctBy(Url) → List<SearxngResult> (up to 15 results = 3 queries × 5 results, fewer after deduplication)

4. Fetch Phase
   searchResults → parallel ArticleService.FetchArticleAsync → Article
   → ChunkingService.ChunkText (split into ~500-char pieces)
   → ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)

5. Embedding Phase
   originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
   chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)

6. Ranking Phase
   For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
   OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)

7. Answer Phase
   context = string.Join("\n\n", topChunks.Select(...))
   → ChatCompletionRequest to OpenRouter with context + question
   → StreamAsync → Console.Write(delta) (real-time display)

Result: User sees answer with [Source N] citations
```
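
To make step 7 more concrete, here is a sketch of how the top chunks might be joined into a numbered context string. The exact layout OpenQuery uses is not shown in this document, so the `[Source N]` line format, the `"Untitled"` fallback, and the sample data are assumptions.

```csharp
using System;
using System.Linq;

// Sample data; in the real pipeline these would be the ranked Chunk records.
var topChunks = new (string Content, string SourceUrl, string? Title)[]
{
    ("Entangled particles share a joint quantum state.", "https://example.com/a", "Entanglement basics"),
    ("Measuring one particle constrains the other.", "https://example.com/b", null),
};

// Number each chunk so the model can cite it as [Source N]; the layout is an assumption.
string context = string.Join("\n\n", topChunks.Select((chunk, index) =>
    $"[Source {index + 1}] {chunk.Title ?? "Untitled"} ({chunk.SourceUrl})\n{chunk.Content}"));

Console.WriteLine(context);
```

Numbering the chunks in rank order gives the model stable labels to cite, which is what allows the final answer to carry `[Source N]` references.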
### Data Structures
**Chunk** - The core data structure flowing through the pipeline:
```csharp
public record Chunk(
    string Content,      // Text content (~500 chars)
    string SourceUrl,    // Where it came from
    string? Title = null // Article title
)
{
    public float[]? Embedding { get; set; } // Added in Phase 3
    public float Score { get; set; }        // Added in Phase 4
}
```
**Data Flow State**:
- Phase 2: `Chunk` created without embedding (Phase 1 produces only `SearxngResult` items)
- Phase 3: `Chunk.Embedding` populated
- Phase 4: `Chunk.Score` populated
- After Phase 4: top chunks serialized into the context string
### Memory Footprint
**Per 15-article run (approximate)**:
- Raw HTML (fetched): ~5MB (transient, discarded after extract)
- Articles: ~500KB (15 articles × ~30KB extracted text)
- Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
- Total peak: ~1-2MB of retained data (the ~5MB of raw HTML is transient and not counted; excludes OpenRouter's memory usage)
**Note**: AOT compilation reduces runtime memory compared to JIT.
## Concurrency Model
OpenQuery uses multiple parallelization strategies:
### Unbounded Parallelism (Task.WhenAll)
**Where**: Search queries (Phase 1)
```csharp
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
```
**Rationale**: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.
**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). Default 3 is safe.
### Semaphore-Controlled Parallelism
**Where**: Article fetching (Phase 2)
```csharp
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
await Task.WhenAll(fetchTasks); // Each task waits on semaphore
```
**Rationale**: Prevent flooding target websites with requests (DoS-like behavior). A limit of 10 concurrent fetches is polite but still fast.
**Configurable**: Only in code, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (currently a compile-time constant).
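
The two-line excerpt above omits how each task acquires and releases the semaphore. A fuller sketch of the technique follows; `RunBoundedAsync` is a hypothetical helper written for this example, and the simulated "fetch" only counts overlapping calls instead of touching the network.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Runs `work` for every item, with at most maxConcurrency invocations in flight.
static async Task<TResult[]> RunBoundedAsync<TItem, TResult>(
    IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
{
    using var semaphore = new SemaphoreSlim(maxConcurrency);

    var tasks = items.Select(async item =>
    {
        await semaphore.WaitAsync();
        try { return await work(item); }
        finally { semaphore.Release(); }   // release even if the work throws
    }).ToList();                           // ToList starts every task exactly once

    return await Task.WhenAll(tasks);
}

int inFlight = 0, peak = 0;
var gate = new object();

// Simulated fetch: records how many calls overlap, then returns a dummy result.
int[] doubled = await RunBoundedAsync(Enumerable.Range(1, 20), 3, async n =>
{
    lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
    await Task.Delay(20);
    lock (gate) { inFlight--; }
    return n * 2;
});

Console.WriteLine($"results: {doubled.Length}, peak concurrency: {peak}");
```

All tasks are created up front, but each one waits on the semaphore before doing any work, so the observed peak never exceeds the configured limit.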
### Parallel.ForEachAsync with MaxDegreeOfParallelism
**Where**: Embedding batch processing (Phase 3)
```csharp
await Parallel.ForEachAsync(
    batchIndices,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (batchIndex, ct) => { ... }
);
```
**Rationale**: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.
**Configurable**: Only in code, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (currently a compile-time constant).
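
A sketch of how batching and bounded parallelism can fit together is shown below. It is not the `EmbeddingService` implementation: `EmbedInBatchesAsync` and the `embedBatch` delegate are hypothetical, and the fake embedder simply returns each text's length so the ordering can be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Embeds texts in fixed-size batches with bounded parallelism, then restores input order.
// `embedBatch` is a hypothetical stand-in for the real embedding API call.
static async Task<float[][]> EmbedInBatchesAsync(
    IReadOnlyList<string> texts,
    int batchSize,
    int maxParallel,
    Func<string[], CancellationToken, Task<float[][]>> embedBatch,
    CancellationToken cancellationToken = default)
{
    string[][] batches = texts.Chunk(batchSize).ToArray();
    var results = new float[batches.Length][][];

    await Parallel.ForEachAsync(
        Enumerable.Range(0, batches.Length),
        new ParallelOptions { MaxDegreeOfParallelism = maxParallel, CancellationToken = cancellationToken },
        async (batchIndex, ct) =>
        {
            // Each batch writes to its own slot, so no locking is needed.
            results[batchIndex] = await embedBatch(batches[batchIndex], ct);
        });

    return results.SelectMany(batch => batch).ToArray();
}

// Fake embedder for the demo: a one-dimensional "embedding" equal to the text length.
var inputs = Enumerable.Range(0, 10).Select(i => new string('x', i + 1)).ToList();
float[][] embeddings = await EmbedInBatchesAsync(
    inputs, batchSize: 3, maxParallel: 2,
    (batch, ct) => Task.FromResult(batch.Select(t => new[] { (float)t.Length }).ToArray()));

Console.WriteLine(string.Join(",", embeddings.Select(e => e[0])));   // 1,2,3,4,5,6,7,8,9,10
```

Indexing results by batch number keeps the output aligned with the input even when batches finish out of order, which is what later lets each embedding be paired with its chunk.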
### Progress Reporting (Channel)
**Where**: All phases pass `onProgress` callback
**Implementation**:
- `StatusReporter.UpdateStatus()` → writes to channel
- Background task reads channel and displays
- Non-blocking; callbacks are fire-and-forget (TryWrite)
**Thread Safety**: Channel is thread-safe; multiple phases may write concurrently.
## Error Handling & Resilience
### HTTP Errors
**OpenRouterClient**:
- Calls `response.EnsureSuccessStatusCode()` → throws `HttpRequestException` on 4xx/5xx
- No retry (handled at higher level in `EmbeddingService`)
**SearxngClient**:
- Returns empty `List<SearxngResult>` on non-success
- No exception thrown (searches are non-critical; if some queries fail, others proceed)
### Retry Policy (Polly)
**Location**: `EmbeddingService` constructor
**Scope**: Only embedding requests (`_client.EmbedAsync`)
**Policy**:
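- Up to 3 retries after the initial call (`MaxRetryAttempts = 3`, so at most 4 attempts in total)
- Exponential backoff: 1s, 2s, 4s
- Only retries `HttpRequestException` (network errors and any non-2xx status surfaced by `EnsureSuccessStatusCode()`, including 429 and 5xx)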
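
The 1s, 2s, 4s schedule follows from doubling a 1-second base delay for each retry. The snippet below only reproduces that arithmetic for un-jittered exponential backoff; `BackoffDelay` is a helper written for this example and is not part of Polly.

```csharp
using System;
using System.Linq;

// Delay before retry n (0-based) under un-jittered exponential backoff: baseDelay * 2^n.
static TimeSpan BackoffDelay(TimeSpan baseDelay, int retryIndex) =>
    TimeSpan.FromTicks(baseDelay.Ticks * (1L << retryIndex));

var delays = Enumerable.Range(0, 3)
    .Select(i => BackoffDelay(TimeSpan.FromSeconds(1), i))
    .ToList();

Console.WriteLine(string.Join(", ", delays.Select(d => $"{d.TotalSeconds}s")));   // 1s, 2s, 4s
Console.WriteLine($"worst-case added wait: {delays.Sum(d => d.TotalSeconds)}s");  // 7s
```

In the worst case a failing embedding batch therefore adds about 7 seconds of waiting, on top of the request time itself, before the failure is handed to the fallback path.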
**Why not on chat completions?**
- Query generation and final answer are critical; failures should surface immediately
- Could be added in future if transient failures are common
### Graceful Degradation
**Query Generation Failure**:
```csharp
try
{
    // ... generate queries via the LLM ...
}
catch (Exception)
{
    // Fall back to the original question as the sole query
    if (options.Verbose)
        reporter.WriteLine("[Failed to generate queries, falling back to original question]");
}
```
**Embedding Batch Failure**:
```csharp
catch
{
    // Return empty embeddings for this batch (chunk will be filtered out)
    var emptyBatch = new float[batch.Count][];
    // fill with empty arrays
    results.Add((batchIndex, emptyBatch));
}
```
**Article Fetch Failure**:
```csharp
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
    if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
    // Chunk not added; continue with others
}
```
### User-Facing Errors
Top-level exception handler in `Program.cs`:
```csharp
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```
### Cancellation Support
`OpenRouterClient.StreamAsync` and `EmbeddingService` methods accept `CancellationToken`.
Used in:
- Streaming answer (Ctrl+C stops immediately)
- Parallel embeddings (can be cancelled)
## Performance Considerations
### Latency Breakdown (Typical)
| Stage | Time | Description |
|-------|------|-------------|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 fetches, up to 10 concurrent (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4-parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| **Total** | **15-50s** | Varies widely based on network & content |
### Bottlenecks
1. **Network I/O** (article fetching, API calls) - I/O-bound, so extra CPU does not help
2. **OpenRouter API latency** - varies by model and load
3. **SmartReader parsing** - CPU-bound for large HTML
4. **Embedding API rate** - OpenRouter may rate limit if too many concurrent
### Optimization Strategies
- **Parallelism**: Already maximized within API constraints
- **Caching**: Not implemented; future enhancement could cache embeddings per URL
- **Batching**: 300-chunk batches reduce API overhead
- **AOT**: Native compilation reduces startup overhead vs JIT
### Scalability Limits
- **Memory**: Scales with number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Thousands of chunks are feasible; similarity scoring grows linearly with the chunk count, O(n), and the final sort adds O(n log n).
- **API Limits**: OpenRouter rate limits per API key; may need to reduce concurrency if hitting 429s.
- **SearxNG Limits**: Single SearxNG instance can handle ~10-50 QPS; above that may need load balancing (not in scope).
## Design Decisions
### Why Not Use a DI Container?
OpenQuery manually wires dependencies in `Program.cs`. For a small CLI app, this is:
- Simpler (no container configuration)
- More explicit (easy to trace dependency graph)
- No runtime overhead
- Easier to understand for contributors
Container-based DI would be overengineering.
### Why AOT?
.NET 10 AOT provides:
- **Fast startup** (<100ms vs ~500ms JIT)
- **Smaller footprint** (trimmed, no JIT)
- **No runtime dependencies** (self-contained)
- Better for CLI tools distributed to users
Trade-offs:
- Longer build time
- Some reflection-based APIs not supported (not needed here)
- Less flexible (can't load dynamic assemblies, but not needed)
### Why SmartReader for Article Extraction?
SmartReader uses a Readability-based algorithm similar to Firefox Reader View:
- Removes ads, navigation, comments, boilerplate
- Extracts main article content
- Handles malformed HTML gracefully
- Pure .NET library with no native dependencies (it builds on AngleSharp for HTML parsing)
Alternatives considered:
- `HtmlAgilityPack` (too low-level, need to implement extraction logic)
- `AngleSharp` (similar, still need extraction)
- External services (like diffbot) - require API keys, costs money
SmartReader is the sweet spot: free, good quality, easy integration.
### Why Embeddings + Cosine Similarity vs Full-Text Search?
Full-text search (like Lucene) would:
- Require inverted index, more complex
- Be faster for exact keyword matching
- Not understand semantic similarity
Embeddings provide:
- Semantic similarity (understand meaning, not just keywords)
- Simple math (cosine similarity of float arrays)
- No index to maintain (just compute on-the-fly)
Trade-off: every run pays embedding API cost and latency; caching embeddings (for example in an LRU cache) could offset this but is not implemented.
### Why Not a RAG (Retrieval-Augmented Generation) Framework?
OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
- Add dependency bloat
- Reduce control
- Increase abstraction complexity
Custom implementation is ~1000 LOC and perfectly matches needs.
### Why System.CommandLine?
Provides:
- Native-like CLI help (`openquery --help`)
- Strongly-typed options
- Command hierarchy (main + subcommands)
- Good error messages
Alternative: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.
---
## Next Steps
- [Components](components/overview.md) - Deep dive into each module
- [API Reference](api/cli.md) - Complete command documentation
- [Troubleshooting](troubleshooting.md) - Debug issues
---
**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece.