Architecture
Deep dive into OpenQuery's system design, architectural patterns, and data flow.
📋 Table of Contents
- System Overview
- Architectural Patterns
- Component Architecture
- Data Flow
- Concurrency Model
- Error Handling & Resilience
- Performance Considerations
- Design Decisions
System Overview
OpenQuery is a pipeline-based AI application that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.
Core Design Principles
- Separation of Concerns - Each component has a single, well-defined responsibility
- Parallel First - Wherever possible, operations are parallelized for speed
- Resilient by Default - Built-in retries, rate limiting, and graceful degradation
- Configurable - Most parameters can be adjusted without code changes
- Observable - Progress reporting and verbose mode for debugging
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ User Layer │
│ CLI (System.CommandLine) → OpenQueryApp │
└─────────────────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Orchestration Layer │
│ OpenQueryApp → SearchTool (4-phase pipeline) │
└─────────────────────────────┬───────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐  ┌──────────────────┐  ┌────────────────┐
│ Search Layer │  │ Processing Layer │  │    AI Layer    │
│              │  │                  │  │                │
│ SearxngClient│  │ ArticleService   │  │OpenRouterClient│
│              │  │ ChunkingService  │  │                │
│              │  │ EmbeddingService │  │                │
└──────────────┘  └──────────────────┘  └────────────────┘
Architectural Patterns
1. Pipeline Pattern
The main workflow (SearchTool.ExecuteAsync) implements a multi-stage pipeline:
Phase 1: ExecuteParallelSearchesAsync
↓ (List<SearxngResult>)
Phase 2: ExecuteParallelArticleFetchingAsync
↓ (List<Chunk>)
Phase 3: ExecuteParallelEmbeddingsAsync
↓ ((queryEmbedding, chunkEmbeddings))
Phase 4: RankAndSelectTopChunks
↓ (List<Chunk> topChunks)
→ Formatted context string returned
Each phase:
- Accepts input from previous phase
- Processes in parallel where applicable
- Returns output to next phase
- Reports progress via callbacks
2. Service Layer Pattern
Services (Services/ directory) are stateless classes that encapsulate specific operations:
- Clients: OpenRouterClient, SearxngClient (HTTP communication)
- Processors: EmbeddingService, ChunkingService (data transformation)
- Extractors: ArticleService (content extraction)
- Infrastructure: RateLimiter, StatusReporter (cross-cutting concerns)
All dependencies are explicit (constructor injection), making services easily testable.
3. Dependency Injection (Manual)
While not using a DI container, OpenQuery follows DI principles:
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
Benefits:
- Clear dependency graph
- Easy to substitute mocks for testing
- No magic; visible construction
4. Observer Pattern (Progress Reporting)
StatusReporter and progress callbacks implement observer pattern:
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)
// Components invoke callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");
// Caller (OpenQueryApp) provides reporter.StatusUpdate() as callback
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
5. Resilience Patterns (Polly)
EmbeddingService uses Polly's retry policy:
_retryPipeline = new ResiliencePipelineBuilder()
.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
ShouldHandle = new PredicateBuilder()
.Handle<HttpRequestException>()
})
.Build();
This automatically retries failed embedding requests with exponential backoff.
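Executing a call through that pipeline is then a one-liner. A hedged sketch follows: the _client.EmbedAsync call and the batch variable are assumptions based on the client description elsewhere in this document; only the Polly ExecuteAsync API is real.

```csharp
// Sketch: route an embedding request through the retry pipeline.
// On HttpRequestException, Polly retries up to 3 times with 1s/2s/4s delays.
var embeddings = await _retryPipeline.ExecuteAsync(
    async ct => await _client.EmbedAsync(batch, ct),  // hypothetical client call
    cancellationToken);
```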
6. Producer-Consumer Pattern (Channel-based)
StatusReporter uses System.Threading.Channels.Channel<string> for asynchronous progress updates:
- Producer: UpdateStatus() writes messages to channel
- Consumer: Background task ProcessStatusUpdatesAsync() reads and displays
- Benefit: No blocking between progress generation and display
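The producer-consumer shape can be shown in a self-contained sketch; the names here are illustrative, not the project's actual members:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Minimal sketch of the channel-based producer-consumer pattern
// StatusReporter is described as using. Names are illustrative.
class ChannelDemo
{
    static async Task Main()
    {
        var channel = Channel.CreateUnbounded<string>();

        // Consumer: drains the channel until the writer completes it
        var consumer = Task.Run(async () =>
        {
            await foreach (var msg in channel.Reader.ReadAllAsync())
                Console.WriteLine(msg);
        });

        // Producer: TryWrite never blocks on an unbounded channel
        channel.Writer.TryWrite("[Searching...]");
        channel.Writer.TryWrite("[Fetching article 1/10]");
        channel.Writer.Complete();   // signal no more updates

        await consumer;              // wait for the display task to drain
    }
}
```

Because TryWrite is non-blocking, progress generation in the pipeline never stalls on the display side.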
7. Disposable Pattern
Components that hold unmanaged resources implement IDisposable or IAsyncDisposable:
- StatusReporter - stops background spinner task
- RateLimiter - disposes semaphore
Used via using statements for deterministic cleanup.
Component Architecture
OpenQueryApp (OpenQuery.cs)
Role: Main orchestrator; coordinates the entire workflow
Responsibilities:
- Parse CLI options into OpenQueryOptions
- Load configuration and resolve API keys/models
- Optionally generate expanded search queries via LLM
- Invoke SearchTool with progress callbacks
- Stream final answer from LLM
Key Methods:
- RunAsync(OpenQueryOptions) - Main entry point
Interactions:
- Instantiates OpenRouterClient (for both query gen and final answer)
- Instantiates SearxngClient (passed to SearchTool)
- Instantiates EmbeddingService (passed to SearchTool)
- Instantiates SearchTool (orchestration)
- Uses StatusReporter for UI updates
SearchTool (Tools/SearchTool.cs)
Role: Core search-retrieve-rank pipeline orchestrator
Responsibilities:
- Execute 4-phase pipeline (search → fetch → embed → rank)
- Manage concurrency limits (via semaphores)
- Coordinate parallel operations
- Generate context string for final answer
Interactions:
- Uses SearxngClient for Phase 1
- Uses ArticleService + ChunkingService for Phase 2
- Uses EmbeddingService for Phase 3
- Has no external UI dependency (pure logic)
Parallelization Strategy:
- Phase 1: Task.WhenAll on search tasks (unbounded but limited by the SearxNG instance)
- Phase 2: Semaphore (max 10 concurrent fetches)
- Phase 3: Parallel.ForEachAsync (max 4 concurrent embedding batches)
EmbeddingService (Services/EmbeddingService.cs)
Role: Generate vector embeddings with batching, rate limiting, and retries
Responsibilities:
- Batch embedding requests (default: 300 per batch)
- Parallelize batches (default: 4 concurrent)
- Apply rate limiting (via RateLimiter)
- Retry failed requests (Polly)
- Calculate cosine similarity
Key Methods:
- GetEmbeddingsAsync(List<string> texts, ...) - batch with progress
- GetEmbeddingAsync(string text) - single embedding
- CosineSimilarity(float[], float[]) - static vector math
Design Notes:
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
- Batches of 300 reduce API overhead
- Polly retry handles transient failures (429, 500, network blips)
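Cosine similarity itself is a few lines of vector math. Below is a self-contained version matching the CosineSimilarity(float[], float[]) shape listed above; it is an illustrative sketch, not the project's actual code:

```csharp
using System;

static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    // Accumulates in double for precision, returns float to match the
    // float[] embeddings used throughout the pipeline.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0f;  // degenerate vectors rank last
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Identical vectors score 1.0, orthogonal vectors 0, which is why ranking can simply sort descending by score.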
OpenRouterClient (Services/OpenRouterClient.cs)
Role: HTTP client for OpenRouter API (completions + embeddings)
Responsibilities:
- Serialize requests to JSON (source-generated)
- Send HTTP with authorization header
- Stream responses for chat completions (IAsyncEnumerable)
- Return full responses for non-streaming
- Throw on non-2xx status codes
Endpoints:
- POST /chat/completions (stream and non-stream)
- POST /embeddings
Configuration:
- Base URL: https://openrouter.ai/api/v1
- Headers: Authorization: Bearer {apiKey}, Accept: application/json
Design:
- Low-level client; no retry logic (retry is in EmbeddingService)
- Thin wrapper around HttpClient
- Could be replaced with HttpClientFactory in larger apps
SearxngClient (Services/SearxngClient.cs)
Role: HTTP client for SearxNG metasearch
Responsibilities:
- Construct search URL with query param
- GET request and deserialize JSON
- Limit results (.Take(limit))
- Return empty list on failure (no exceptions)
Endpoint: GET /search?q={query}&format=json
Design:
- Very simple; no retry (failures are acceptable, OpenQuery continues with other queries)
- DistinctBy(r => r.Url) deduplication happens upstream
ArticleService (Services/ArticleService.cs)
Role: Extract clean article content from URLs
Responsibilities:
- Call SmartReader.ParseArticleAsync(url)
- Return Article object with Title, TextContent, IsReadable
Design:
- Single responsibility: extraction only (no fetching, no chunking)
- SmartReader handles all complexity (HTML parsing, boilerplate removal)
- Exceptions propagate to SearchTool (handled there)
ChunkingService (Services/ChunkingService.cs)
Role: Split long text into 500-char chunks at natural boundaries
Algorithm:
- Start at index 0
- Take up to 500 chars
- If not at end, backtrack to last space/newline/period
- Add chunk, advance start
- Repeat until done
Design:
- Static class (stateless utility)
- No dependencies
- Pure function (input text → output chunks)
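The five algorithm steps above, as a self-contained sketch. The boundary characters and 500-char limit come from the description; the real ChunkingService may differ in details:

```csharp
using System;
using System.Collections.Generic;

static class Chunker
{
    // Split text into chunks of at most maxLen chars, backtracking to the
    // last space/newline/period so chunks end at natural boundaries.
    public static List<string> ChunkText(string text, int maxLen = 500)
    {
        var chunks = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int len = Math.Min(maxLen, text.Length - start);

            // Not at the end of the text: backtrack to a natural boundary
            if (start + len < text.Length)
            {
                int cut = text.LastIndexOfAny(
                    new[] { ' ', '\n', '.' }, start + len - 1, len);
                if (cut > start) len = cut - start + 1;  // keep the boundary char
            }

            chunks.Add(text.Substring(start, len).Trim());
            start += len;
        }
        return chunks;
    }
}
```

If no boundary exists in the window (one unbroken 500+ char token), the sketch falls back to a hard cut so it always makes progress.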
RateLimiter (Services/RateLimiter.cs)
Role: Limit concurrent operations via semaphore
Responsibilities:
- Wrap actions with semaphore acquisition
- Support both sync and async actions
- Implement IAsyncDisposable for cleanup
Usage Pattern:
await _rateLimiter.ExecuteAsync(async () =>
{
// operation limited by semaphore
return await SomeApiCall();
});
Design:
- Generic wrapper (can return TResult)
- SemaphoreSlim initialized at construction
- Used in EmbeddingService for parallel embedding batches
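A minimal RateLimiter consistent with the usage pattern and design notes above might look like this; the member names are illustrative, not the project's actual code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of a semaphore-based limiter matching the ExecuteAsync
// usage shown above. Illustrative only.
sealed class RateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public RateLimiter(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    public async Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> action)
    {
        await _semaphore.WaitAsync();      // wait for a free slot
        try { return await action(); }
        finally { _semaphore.Release(); }  // always free the slot
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```

The try/finally guarantees a slot is released even when the wrapped action throws, which is what makes the wrapper safe to combine with Polly retries.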
StatusReporter (Services/StatusReporter.cs)
Role: Show real-time progress with spinner or verbose mode
Responsibilities:
- Maintain spinner animation (background task)
- Receive status updates via channel
- Display updates with appropriate formatting
- Stop spinner on completion
Features:
- Spinner mode (non-verbose): ⠋ Fetching... with animated Braille chars
- Verbose mode: [Fetching article 1/10: example.com] on separate lines
- Thread-safe: Channel is safe for concurrent writes
- Non-blocking: Background spinner doesn't block updates
Design:
- Channel<string> for asynchronous producer-consumer
- Background task (_statusProcessor) reads from channel
- Spinner runs on its own task with 100ms delay per frame
- IDisposable ensures proper cleanup
ConfigManager (ConfigManager.cs)
Role: Load and save configuration from/to file
Responsibilities:
- Get config path (XDG: ~/.config/openquery/config)
- Parse key-value pairs (no INI library, manual parsing)
- Provide AppConfig object with defaults
- Save settings back to file
Design:
- Static class (no instances)
- Creates config directory if missing
- Line-by-line parsing (simple, no dependencies)
- Could be improved with proper INI parser or JSON
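A minimal sketch of that line-by-line parsing, assuming a `key = value` syntax with `#` comment lines (the actual separator and comment rules are not documented here):

```csharp
using System;
using System.Collections.Generic;

static class ConfigParser
{
    // Parse simple "key = value" lines; '#' lines and blanks are skipped.
    // Separator and comment syntax are assumptions, not confirmed details.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#")) continue;

            int eq = line.IndexOf('=');
            if (eq <= 0) continue;                 // skip malformed lines

            var key = line.Substring(0, eq).Trim();
            var value = line.Substring(eq + 1).Trim();
            config[key] = value;                   // last value wins
        }
        return config;
    }
}
```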
Data Flow
End-to-End Data Flow
User: "What is quantum entanglement?"
1. OpenQueryOptions created
{ Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }
2. Query Generation (if Queries > 1)
→ ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
→ Deserialize to List<string> (generatedQueries)
3. Search Phase
generatedQueries → parallel SearxngClient.SearchAsync calls → ConcurrentBag<SearxngResult>
→ DistinctBy(Url) → List<SearxngResult> (15 results = 3 queries × 5 results)
4. Fetch Phase
searchResults → parallel ArticleService.FetchArticleAsync calls → Article
→ ChunkingService.ChunkText (split into ~500-char pieces)
→ ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)
5. Embedding Phase
originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)
6. Ranking Phase
For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)
7. Answer Phase
context = string.Join("\n\n", topChunks.Select(...))
→ ChatCompletionRequest to OpenRouter with context + question
→ StreamAsync → Console.Write(delta) (real-time display)
Result: User sees answer with [Source N] citations
Data Structures
Chunk - The core data structure flowing through the pipeline:
public record Chunk(
string Content, // Text content (~500 chars)
string SourceUrl, // Where it came from
string? Title = null // Article title
)
{
public float[]? Embedding { get; set; } // Added in Phase 3
public float Score { get; set; } // Added in Phase 4
}
Data Flow State:
- Phase 1-2: Chunk without embedding
- Phase 3: Chunk.Embedding populated
- Phase 4: Chunk.Score populated
- Phase 5: Serialized into context string
Memory Footprint
Per 15-article run (approximate):
- Raw HTML (fetched): ~5MB (transient, discarded after extraction)
- Articles: ~500KB (15 articles × ~30KB extracted text)
- Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
- Total peak: ~1-2MB (excluding OpenRouter's memory usage)
Note: AOT compilation reduces runtime memory compared to JIT.
Concurrency Model
OpenQuery uses multiple parallelization strategies:
Unbounded Parallelism (Task.WhenAll)
Where: Search queries (Phase 1)
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
Rationale: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.
Risk: Could overwhelm SearxNG if --queries is set very high (100+). Default 3 is safe.
Semaphore-Controlled Parallelism
Where: Article fetching (Phase 2)
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
await Task.WhenAll(fetchTasks); // Each task waits on semaphore
Rationale: Prevent flooding target websites with requests (DOS-like behavior). 10 concurrent is polite but fast.
Configurable: Yes, via ParallelProcessingOptions.MaxConcurrentArticleFetches (compile-time constant currently).
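The pattern above can be shown end to end in a self-contained illustration; the Task.Delay body stands in for the real fetch-and-parse work:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class FetchDemo
{
    // Sketch of the Phase 2 pattern: Task.WhenAll over tasks that each
    // gate themselves on a shared SemaphoreSlim (here max 10 in flight).
    static async Task Main()
    {
        var urls = Enumerable.Range(1, 25).Select(i => $"https://example.com/{i}");
        using var semaphore = new SemaphoreSlim(10);

        var fetchTasks = urls.Select(async url =>
        {
            await semaphore.WaitAsync();   // at most 10 run concurrently
            try
            {
                await Task.Delay(50);      // stand-in for fetch + parse
                Console.WriteLine($"fetched {url}");
            }
            finally { semaphore.Release(); }
        });

        await Task.WhenAll(fetchTasks);
    }
}
```

All 25 tasks are created eagerly, but only 10 ever pass the semaphore at once; the rest queue on WaitAsync without burning threads.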
Parallel.ForEachAsync with MaxDegreeOfParallelism
Where: Embedding batch processing (Phase 3)
await Parallel.ForEachAsync(
batchIndices,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
async (batchIndex, ct) => { ... }
);
Rationale: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.
Configurable: Yes, via ParallelProcessingOptions.MaxConcurrentEmbeddingRequests (compile-time).
Progress Reporting (Channel)
Where: All phases pass onProgress callback
Implementation:
- StatusReporter.UpdateStatus() → writes to channel
- Background task reads channel and displays
- Non-blocking; callbacks are fire-and-forget (TryWrite)
Thread Safety: Channel is thread-safe; multiple phases may write concurrently.
Error Handling & Resilience
HTTP Errors
OpenRouterClient:
- Calls response.EnsureSuccessStatusCode() → throws HttpRequestException on 4xx/5xx
- No retry (handled at higher level in EmbeddingService)
SearxngClient:
- Returns empty List<SearxngResult> on non-success
- No exception thrown (searches are non-critical; if some queries fail, others proceed)
Retry Policy (Polly)
Location: EmbeddingService constructor
Scope: Only embedding requests (_client.EmbedAsync)
Policy:
- Max 3 attempts
- Exponential backoff: 1s, 2s, 4s
- Only retries HttpRequestException (network errors, 429, 5xx)
Why not on chat completions?
- Query generation and final answer are critical; failures should surface immediately
- Could be added in future if transient failures are common
Graceful Degradation
Query Generation Failure:
try { ... generate queries ... }
catch (Exception)
{
    // Fall back to the original question as the sole query
    if (options.Verbose) reporter.WriteLine("[Failed to generate queries, falling back to original question]");
}
Embedding Batch Failure:
catch
{
// Return empty embeddings for this batch (chunk will be filtered out)
var emptyBatch = new float[batch.Count][];
// fill with empty arrays
results.Add((batchIndex, emptyBatch));
}
Article Fetch Failure:
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
// Chunk not added; continue with others
}
User-Facing Errors
Top-level exception handler in Program.cs:
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
Environment.Exit(1);
}
catch (Exception ex)
{
Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
Environment.Exit(1);
}
Cancellation Support
OpenRouterClient.StreamAsync and EmbeddingService methods accept CancellationToken.
Used in:
- Streaming answer (Ctrl+C stops immediately)
- Parallel embeddings (can be cancelled)
Performance Considerations
Latency Breakdown (Typical)
| Stage | Time | Description |
|---|---|---|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4-parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| Total | 15-50s | Varies widely based on network & content |
Bottlenecks
- Network I/O (article fetching, API calls) - the dominant cost; the workload is I/O-bound, not CPU-bound
- OpenRouter API latency - varies by model and load
- SmartReader parsing - CPU-bound for large HTML
- Embedding API rate - OpenRouter may rate limit if too many concurrent
Optimization Strategies
- Parallelism: Already maximized within API constraints
- Caching: Not implemented; future enhancement could cache embeddings per URL
- Batching: 300-chunk batches reduce API overhead
- AOT: Native compilation reduces startup overhead vs JIT
Scalability Limits
- Memory: Scales with number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Thousands of chunks are feasible, but ranking cost grows linearly (O(n) cosine-similarity computations per query).
- API Limits: OpenRouter rate limits per API key; may need to reduce concurrency if hitting 429s.
- SearxNG Limits: Single SearxNG instance can handle ~10-50 QPS; above that may need load balancing (not in scope).
Design Decisions
Why Not Use a DI Container?
OpenQuery manually wires dependencies in Program.cs. For a small CLI app, this is:
- Simpler (no container configuration)
- More explicit (easy to trace dependency graph)
- No runtime overhead
- Easier to understand for contributors
Container-based DI would be overengineering.
Why AOT?
.NET 10 AOT provides:
- Fast startup (<100ms vs ~500ms JIT)
- Smaller footprint (trimmed, no JIT)
- No runtime dependencies (self-contained)
- Better for CLI tools distributed to users
Trade-offs:
- Longer build time
- Some reflection-based APIs not supported (not needed here)
- Less flexible (can't load dynamic assemblies, but not needed)
Why SmartReader for Article Extraction?
SmartReader uses Readability-based algorithm similar to Firefox Reader View:
- Removes ads, navigation, comments, boilerplate
- Extracts main article content
- Handles malformed HTML gracefully
- Pure .NET implementation (no native dependencies)
Alternatives considered:
- HtmlAgilityPack (too low-level; extraction logic would have to be written by hand)
- AngleSharp (similar; still need extraction logic)
- External services (e.g. Diffbot) - require API keys and cost money
SmartReader is the sweet spot: free, good quality, easy integration.
Why Embeddings + Cosine Similarity vs Full-Text Search?
Full-text search (like Lucene) would:
- Require inverted index, more complex
- Be faster for exact keyword matching
- Not understand semantic similarity
Embeddings provide:
- Semantic similarity (understand meaning, not just keywords)
- Simple math (cosine similarity of float arrays)
- No index to maintain (just compute on-the-fly)
Trade-off: embedding API cost and latency on every run; a future cache (e.g. an LRU keyed by chunk text) could amortize repeated lookups.
Why Not RAG (Retrieval Augmented Generation) Framework?
OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
- Add dependency bloat
- Reduce control
- Increase abstraction complexity
Custom implementation is ~1000 LOC and perfectly matches needs.
Why System.CommandLine?
Provides:
- Native-like CLI help (openquery --help)
- Strongly-typed options
- Command hierarchy (main + subcommands)
- Good error messages
Alternative: CommandLineParser (older) or manual parsing. System.CommandLine is modern and actively developed.
Next Steps
- Components - Deep dive into each module
- API Reference - Complete command documentation
- Troubleshooting - Debug issues
Want to extend OpenQuery? Check the Components guide to understand each piece.