Architecture
Deep dive into OpenQuery's system design, architectural patterns, and data flow.
📋 Table of Contents
- System Overview
- Architectural Patterns
- Component Architecture
- Data Flow
- Concurrency Model
- Error Handling & Resilience
- Performance Considerations
- Design Decisions
System Overview
OpenQuery is a pipeline-based AI application that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.
Core Design Principles
- Separation of Concerns - Each component has a single, well-defined responsibility
- Parallel First - Wherever possible, operations are parallelized for speed
- Resilient by Default - Built-in retries, rate limiting, and graceful degradation
- Configurable - Most parameters can be adjusted without code changes
- Observable - Progress reporting and verbose mode for debugging
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ User Layer │
│ CLI (System.CommandLine) → OpenQueryApp │
└─────────────────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Orchestration Layer │
│ OpenQueryApp → SearchTool (4-phase pipeline) │
└─────────────────────────────┬───────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐  ┌──────────────────┐  ┌────────────────┐
│ Search Layer │  │ Processing Layer │  │    AI Layer    │
│              │  │                  │  │                │
│ SearxngClient│  │ ArticleService   │  │OpenRouterClient│
│              │  │ ChunkingService  │  │                │
│              │  │ EmbeddingService │  │                │
└──────────────┘  └──────────────────┘  └────────────────┘
Architectural Patterns
1. Pipeline Pattern
The main workflow (SearchTool.ExecuteAsync) implements a multi-stage pipeline:
Phase 1: ExecuteParallelSearchesAsync
↓ (List<SearxngResult>)
Phase 2: ExecuteParallelArticleFetchingAsync
↓ (List<Chunk>)
Phase 3: ExecuteParallelEmbeddingsAsync
↓ ((queryEmbedding, chunkEmbeddings))
Phase 4: RankAndSelectTopChunks
↓ (List<Chunk> topChunks)
→ Formatted context string returned
Each phase:
- Accepts input from previous phase
- Processes in parallel where applicable
- Returns output to next phase
- Reports progress via callbacks
2. Service Layer Pattern
Services (Services/ directory) are stateless classes that encapsulate specific operations:
- Clients: OpenRouterClient, SearxngClient (HTTP communication)
- Processors: EmbeddingService, ChunkingService (data transformation)
- Extractors: ArticleService (content extraction)
- Infrastructure: RateLimiter, StatusReporter (cross-cutting concerns)
All dependencies are explicit (constructor injection), making services easily testable.
3. Dependency Injection (Manual)
While not using a DI container, OpenQuery follows DI principles:
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
Benefits:
- Clear dependency graph
- Easy to substitute mocks for testing
- No magic; visible construction
4. Observer Pattern (Progress Reporting)
StatusReporter and progress callbacks implement observer pattern:
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)
// Components invoke callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");
// Caller (OpenQueryApp) provides reporter.StatusUpdate() as callback
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
5. Resilience Patterns (Polly)
EmbeddingService uses Polly's retry policy:
_retryPipeline = new ResiliencePipelineBuilder()
.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
ShouldHandle = new PredicateBuilder()
.Handle<HttpRequestException>()
})
.Build();
This automatically retries failed embedding requests with exponential backoff.
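Executing a call through that pipeline is then a one-liner. A hedged sketch follows: the _client.EmbedAsync call and the batch variable are assumptions based on the client description elsewhere in this document; only the Polly ExecuteAsync API is real.

```csharp
// Sketch: route an embedding request through the retry pipeline.
// On HttpRequestException, Polly retries up to 3 times with 1s/2s/4s delays.
var embeddings = await _retryPipeline.ExecuteAsync(
    async ct => await _client.EmbedAsync(batch, ct),  // hypothetical client call
    cancellationToken);
```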
6. Producer-Consumer Pattern (Channel-based)
StatusReporter uses System.Threading.Channels.Channel<string> for asynchronous progress updates:
- Producer: UpdateStatus() writes messages to channel
- Consumer: Background task ProcessStatusUpdatesAsync() reads and displays
- Benefit: No blocking between progress generation and display
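The producer-consumer shape can be shown in a self-contained sketch; the names here are illustrative, not the project's actual members:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Minimal sketch of the channel-based producer-consumer pattern
// StatusReporter is described as using. Names are illustrative.
class ChannelDemo
{
    static async Task Main()
    {
        var channel = Channel.CreateUnbounded<string>();

        // Consumer: drains the channel until the writer completes it
        var consumer = Task.Run(async () =>
        {
            await foreach (var msg in channel.Reader.ReadAllAsync())
                Console.WriteLine(msg);
        });

        // Producer: TryWrite never blocks on an unbounded channel
        channel.Writer.TryWrite("[Searching...]");
        channel.Writer.TryWrite("[Fetching article 1/10]");
        channel.Writer.Complete();   // signal no more updates

        await consumer;              // wait for the display task to drain
    }
}
```

Because TryWrite is non-blocking, progress generation in the pipeline never stalls on the display side.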
7. Disposable Pattern
Components that hold unmanaged resources implement IDisposable or IAsyncDisposable:
- StatusReporter - stops background spinner task
- RateLimiter - disposes semaphore
Used via using statements for deterministic cleanup.
Component Architecture
OpenQueryApp (OpenQuery.cs)
Role: Main orchestrator; coordinates the entire workflow
Responsibilities:
- Parse CLI options into OpenQueryOptions
- Load configuration and resolve API keys/models
- Optionally generate expanded search queries via LLM
- Invoke SearchTool with progress callbacks
- Stream final answer from LLM
Key Methods:
- RunAsync(OpenQueryOptions) - Main entry point
Interactions:
- Instantiates OpenRouterClient (for both query gen and final answer)
- Instantiates SearxngClient (passed to SearchTool)
- Instantiates EmbeddingService (passed to SearchTool)
- Instantiates SearchTool (orchestration)
- Uses StatusReporter for UI updates
SearchTool (Tools/SearchTool.cs)
Role: Core search-retrieve-rank pipeline orchestrator
Responsibilities:
- Execute 4-phase pipeline (search → fetch → embed → rank)
- Manage concurrency limits (via semaphores)
- Coordinate parallel operations
- Generate context string for final answer
Interactions:
- Uses SearxngClient for Phase 1
- Uses ArticleService + ChunkingService for Phase 2
- Uses EmbeddingService for Phase 3
- Has no external UI dependency (pure logic)
Parallelization Strategy:
- Phase 1: Task.WhenAll on search tasks (unbounded but limited by the SearxNG instance)
- Phase 2: Semaphore (max 10 concurrent fetches)
- Phase 3: Parallel.ForEachAsync (max 4 concurrent embedding batches)
EmbeddingService (Services/EmbeddingService.cs)
Role: Generate vector embeddings with batching, rate limiting, and retries
Responsibilities:
- Batch embedding requests (default: 300 per batch)
- Parallelize batches (default: 4 concurrent)
- Apply rate limiting (via RateLimiter)
- Retry failed requests (Polly)
- Calculate cosine similarity
Key Methods:
- GetEmbeddingsAsync(List<string> texts, ...) - batch with progress
- GetEmbeddingAsync(string text) - single embedding
- CosineSimilarity(float[], float[]) - static vector math
Design Notes:
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
- Batches of 300 reduce API overhead
- Polly retry handles transient failures (429, 500, network blips)
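Cosine similarity itself is a few lines of vector math. Below is a self-contained version matching the CosineSimilarity(float[], float[]) shape listed above; it is an illustrative sketch, not the project's actual code:

```csharp
using System;

static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    // Accumulates in double for precision, returns float to match the
    // float[] embeddings used throughout the pipeline.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0f;  // degenerate vectors rank last
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Identical vectors score 1.0, orthogonal vectors 0, which is why ranking can simply sort descending by score.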
OpenRouterClient (Services/OpenRouterClient.cs)
Role: HTTP client for OpenRouter API (completions + embeddings)
Responsibilities:
- Serialize requests to JSON (source-generated)
- Send HTTP with authorization header
- Stream responses for chat completions (IAsyncEnumerable)
- Return full responses for non-streaming
- Throw on non-2xx status codes
Endpoints:
- POST /chat/completions (stream and non-stream)
- POST /embeddings
Configuration:
- Base URL: https://openrouter.ai/api/v1
- Headers: Authorization: Bearer {apiKey}, Accept: application/json
Design:
- Low-level client; no retry logic (retry is in EmbeddingService)
- Thin wrapper around HttpClient
- Could be replaced with HttpClientFactory in larger apps
SearxngClient (Services/SearxngClient.cs)
Role: HTTP client for SearxNG metasearch
Responsibilities:
- Construct search URL with query param
- GET request and deserialize JSON
- Limit results (.Take(limit))
- Return empty list on failure (no exceptions)
Endpoint: GET /search?q={query}&format=json
Design:
- Very simple; no retry (failures are acceptable, OpenQuery continues with other queries)
- DistinctBy(r => r.Url) deduplication happens upstream
ArticleService (Services/ArticleService.cs)
Role: Extract clean article content from URLs
Responsibilities:
- Call SmartReader.ParseArticleAsync(url)
- Return Article object with Title, TextContent, IsReadable
Design:
- Single responsibility: extraction only (no fetching, no chunking)
- SmartReader handles all complexity (HTML parsing, boilerplate removal)
- Exceptions propagate to SearchTool (handled there)
ChunkingService (Services/ChunkingService.cs)
Role: Split long text into 500-char chunks at natural boundaries
Algorithm:
- Start at index 0
- Take up to 500 chars
- If not at end, backtrack to last space/newline/period
- Add chunk, advance start
- Repeat until done
Design:
- Static class (stateless utility)
- No dependencies
- Pure function (input text → output chunks)
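The five algorithm steps above, as a self-contained sketch. The boundary characters and 500-char limit come from the description; the real ChunkingService may differ in details:

```csharp
using System;
using System.Collections.Generic;

static class Chunker
{
    // Split text into chunks of at most maxLen chars, backtracking to the
    // last space/newline/period so chunks end at natural boundaries.
    public static List<string> ChunkText(string text, int maxLen = 500)
    {
        var chunks = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int len = Math.Min(maxLen, text.Length - start);

            // Not at the end of the text: backtrack to a natural boundary
            if (start + len < text.Length)
            {
                int cut = text.LastIndexOfAny(
                    new[] { ' ', '\n', '.' }, start + len - 1, len);
                if (cut > start) len = cut - start + 1;  // keep the boundary char
            }

            chunks.Add(text.Substring(start, len).Trim());
            start += len;
        }
        return chunks;
    }
}
```

If no boundary exists in the window (one unbroken 500+ char token), the sketch falls back to a hard cut so it always makes progress.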
RateLimiter (Services/RateLimiter.cs)
Role: Limit concurrent operations via semaphore
Responsibilities:
- Wrap actions with semaphore acquisition
- Support both sync and async actions
- Implement IAsyncDisposable for cleanup
Usage Pattern:
await _rateLimiter.ExecuteAsync(async () =>
{
// operation limited by semaphore
return await SomeApiCall();
});
Design:
- Generic wrapper (can return TResult)
- SemaphoreSlim initialized at construction
- Used in EmbeddingService for parallel embedding batches
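A minimal RateLimiter consistent with the usage pattern and design notes above might look like this; the member names are illustrative, not the project's actual code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of a semaphore-based limiter matching the ExecuteAsync
// usage shown above. Illustrative only.
sealed class RateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public RateLimiter(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    public async Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> action)
    {
        await _semaphore.WaitAsync();      // wait for a free slot
        try { return await action(); }
        finally { _semaphore.Release(); }  // always free the slot
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```

The try/finally guarantees a slot is released even when the wrapped action throws, which is what makes the wrapper safe to combine with Polly retries.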
StatusReporter (Services/StatusReporter.cs)
Role: Show real-time progress with spinner or verbose mode
Responsibilities:
- Maintain spinner animation (background task)
- Receive status updates via channel
- Display updates with appropriate formatting
- Stop spinner on completion
Features:
- Spinner mode (non-verbose): ⠋ Fetching... with animated Braille chars
- Verbose mode: [Fetching article 1/10: example.com] on separate lines
- Thread-safe: Channel is safe for concurrent writes
- Non-blocking: Background spinner doesn't block updates
Design:
- Channel<string> for asynchronous producer-consumer
- Background task (_statusProcessor) reads from channel
- Spinner runs on its own task with 100ms delay per frame
- IDisposable ensures proper cleanup
ConfigManager (ConfigManager.cs)
Role: Load and save configuration from/to file
Responsibilities:
- Get config path (XDG: ~/.config/openquery/config)
- Parse key-value pairs (no INI library, manual parsing)
- Provide AppConfig object with defaults
- Save settings back to file
Design:
- Static class (no instances)
- Creates config directory if missing
- Line-by-line parsing (simple, no dependencies)
- Could be improved with proper INI parser or JSON
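A minimal sketch of that line-by-line parsing, assuming a `key = value` syntax with `#` comment lines (the actual separator and comment rules are not documented here):

```csharp
using System;
using System.Collections.Generic;

static class ConfigParser
{
    // Parse simple "key = value" lines; '#' lines and blanks are skipped.
    // Separator and comment syntax are assumptions, not confirmed details.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#")) continue;

            int eq = line.IndexOf('=');
            if (eq <= 0) continue;                 // skip malformed lines

            var key = line.Substring(0, eq).Trim();
            var value = line.Substring(eq + 1).Trim();
            config[key] = value;                   // last value wins
        }
        return config;
    }
}
```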
Data Flow
End-to-End Data Flow
User: "What is quantum entanglement?"
1. OpenQueryOptions created
{ Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }
2. Query Generation (if Queries > 1)
→ ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
→ Deserialize to List<string> (generatedQueries)
3. Search Phase
generatedQueries → parallel SearxngClient.SearchAsync calls → ConcurrentBag<SearxngResult>
→ DistinctBy(Url) → List<SearxngResult> (15 results = 3 queries × 5 results)
4. Fetch Phase
searchResults → parallel ArticleService.FetchArticleAsync calls → Article
→ ChunkingService.ChunkText (split into ~500-char pieces)
→ ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)
5. Embedding Phase
originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)
6. Ranking Phase
For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)
7. Answer Phase
context = string.Join("\n\n", topChunks.Select(...))
→ ChatCompletionRequest to OpenRouter with context + question
→ StreamAsync → Console.Write(delta) (real-time display)
Result: User sees answer with [Source N] citations
Data Structures
Chunk - The core data structure flowing through the pipeline:
public record Chunk(
string Content, // Text content (~500 chars)
string SourceUrl, // Where it came from
string? Title = null // Article title
)
{
public float[]? Embedding { get; set; } // Added in Phase 3
public float Score { get; set; } // Added in Phase 4
}
Data Flow State:
- Phase 1-2: Chunk without embedding
- Phase 3: Chunk.Embedding populated
- Phase 4: Chunk.Score populated
- Phase 5: Serialized into context string
Memory Footprint
Per 15-article run (approximate):
- Raw HTML (fetched): ~5MB (transient, discarded after extraction)
- Articles: ~500KB (15 articles × ~30KB extracted text)
- Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
- Total peak: ~1-2MB (excluding OpenRouter's memory usage)
Note: AOT compilation reduces runtime memory compared to JIT.
Concurrency Model
OpenQuery uses multiple parallelization strategies:
Unbounded Parallelism (Task.WhenAll)
Where: Search queries (Phase 1)
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
Rationale: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.
Risk: Could overwhelm SearxNG if --queries is set very high (100+). Default 3 is safe.
Semaphore-Controlled Parallelism
Where: Article fetching (Phase 2)
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
await Task.WhenAll(fetchTasks); // Each task waits on semaphore
Rationale: Prevent flooding target websites with requests (DOS-like behavior). 10 concurrent is polite but fast.
Configurable: Yes, via ParallelProcessingOptions.MaxConcurrentArticleFetches (compile-time constant currently).
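The pattern above can be shown end to end in a self-contained illustration; the Task.Delay body stands in for the real fetch-and-parse work:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class FetchDemo
{
    // Sketch of the Phase 2 pattern: Task.WhenAll over tasks that each
    // gate themselves on a shared SemaphoreSlim (here max 10 in flight).
    static async Task Main()
    {
        var urls = Enumerable.Range(1, 25).Select(i => $"https://example.com/{i}");
        using var semaphore = new SemaphoreSlim(10);

        var fetchTasks = urls.Select(async url =>
        {
            await semaphore.WaitAsync();   // at most 10 run concurrently
            try
            {
                await Task.Delay(50);      // stand-in for fetch + parse
                Console.WriteLine($"fetched {url}");
            }
            finally { semaphore.Release(); }
        });

        await Task.WhenAll(fetchTasks);
    }
}
```

All 25 tasks are created eagerly, but only 10 ever pass the semaphore at once; the rest queue on WaitAsync without burning threads.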
Parallel.ForEachAsync with MaxDegreeOfParallelism
Where: Embedding batch processing (Phase 3)
await Parallel.ForEachAsync(
batchIndices,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
async (batchIndex, ct) => { ... }
);
Rationale: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.
Configurable: Yes, via ParallelProcessingOptions.MaxConcurrentEmbeddingRequests (compile-time).
Progress Reporting (Channel)
Where: All phases pass onProgress callback
Implementation:
- StatusReporter.UpdateStatus() → writes to channel
- Background task reads channel and displays
- Non-blocking; callbacks are fire-and-forget (TryWrite)
Thread Safety: Channel is thread-safe; multiple phases may write concurrently.
Error Handling & Resilience
HTTP Errors
OpenRouterClient:
- Calls response.EnsureSuccessStatusCode() → throws HttpRequestException on 4xx/5xx
- No retry (handled at higher level in EmbeddingService)
SearxngClient:
- Returns empty List<SearxngResult> on non-success
- No exception thrown (searches are non-critical; if some queries fail, others proceed)
Retry Policy (Polly)
Location: EmbeddingService constructor
Scope: Only embedding requests (_client.EmbedAsync)
Policy:
- Max 3 attempts
- Exponential backoff: 1s, 2s, 4s
- Only retries HttpRequestException (network errors, 429, 5xx)
Why not on chat completions?
- Query generation and final answer are critical; failures should surface immediately
- Could be added in future if transient failures are common
Graceful Degradation
Query Generation Failure:
try { ... generate queries ... }
catch (Exception)
{
    // Fall back to the original question as the sole query
    if (options.Verbose) reporter.WriteLine("[Failed to generate queries, falling back to original question]");
}
Embedding Batch Failure:
catch
{
// Return empty embeddings for this batch (chunk will be filtered out)
var emptyBatch = new float[batch.Count][];
// fill with empty arrays
results.Add((batchIndex, emptyBatch));
}
Article Fetch Failure:
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
// Chunk not added; continue with others
}
User-Facing Errors
Top-level exception handler in Program.cs:
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
Environment.Exit(1);
}
catch (Exception ex)
{
Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
Environment.Exit(1);
}
Cancellation Support
OpenRouterClient.StreamAsync and EmbeddingService methods accept CancellationToken.
Used in:
- Streaming answer (Ctrl+C stops immediately)
- Parallel embeddings (can be cancelled)
Performance Considerations
Latency Breakdown (Typical)
| Stage | Time | Description |
|---|---|---|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4-parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| Total | 15-50s | Varies widely based on network & content |
Bottlenecks
- Network I/O (article fetching, API calls) - the dominant cost; the workload is I/O-bound, not CPU-bound
- OpenRouter API latency - varies by model and load
- SmartReader parsing - CPU-bound for large HTML
- Embedding API rate - OpenRouter may rate limit if too many concurrent
Optimization Strategies
- Parallelism: Already maximized within API constraints
- Caching: Not implemented; future enhancement could cache embeddings per URL
- Batching: 300-chunk batches reduce API overhead
- AOT: Native compilation reduces startup overhead vs JIT
Scalability Limits
- Memory: Scales with number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Thousands of chunks are feasible, but ranking cost grows linearly (O(n) cosine-similarity computations per query).
- API Limits: OpenRouter rate limits per API key; may need to reduce concurrency if hitting 429s.
- SearxNG Limits: Single SearxNG instance can handle ~10-50 QPS; above that may need load balancing (not in scope).
Design Decisions
Why Not Use a DI Container?
OpenQuery manually wires dependencies in Program.cs. For a small CLI app, this is:
- Simpler (no container configuration)
- More explicit (easy to trace dependency graph)
- No runtime overhead
- Easier to understand for contributors
Container-based DI would be overengineering.
Why AOT?
.NET 10 AOT provides:
- Fast startup (<100ms vs ~500ms JIT)
- Smaller footprint (trimmed, no JIT)
- No runtime dependencies (self-contained)
- Better for CLI tools distributed to users
Trade-offs:
- Longer build time
- Some reflection-based APIs not supported (not needed here)
- Less flexible (can't load dynamic assemblies, but not needed)
Why SmartReader for Article Extraction?
SmartReader uses Readability-based algorithm similar to Firefox Reader View:
- Removes ads, navigation, comments, boilerplate
- Extracts main article content
- Handles malformed HTML gracefully
- Pure .NET implementation (no native dependencies)
Alternatives considered:
- HtmlAgilityPack (too low-level; extraction logic would have to be written by hand)
- AngleSharp (similar; still need extraction logic)
- External services (e.g. Diffbot) - require API keys and cost money
SmartReader is the sweet spot: free, good quality, easy integration.
Why Embeddings + Cosine Similarity vs Full-Text Search?
Full-text search (like Lucene) would:
- Require inverted index, more complex
- Be faster for exact keyword matching
- Not understand semantic similarity
Embeddings provide:
- Semantic similarity (understand meaning, not just keywords)
- Simple math (cosine similarity of float arrays)
- No index to maintain (just compute on-the-fly)
Trade-off: embedding API cost and latency on every run; a future cache (e.g. an LRU keyed by chunk text) could amortize repeated lookups.
Why Not RAG (Retrieval Augmented Generation) Framework?
OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
- Add dependency bloat
- Reduce control
- Increase abstraction complexity
Custom implementation is ~1000 LOC and perfectly matches needs.
Why System.CommandLine?
Provides:
- Native-like CLI help (openquery --help)
- Strongly-typed options
- Command hierarchy (main + subcommands)
- Good error messages
Alternative: CommandLineParser (older) or manual parsing. System.CommandLine is modern and actively developed.
Next Steps
- Components - Deep dive into each module
- API Reference - Complete command documentation
- Troubleshooting - Debug issues
Want to extend OpenQuery? Check the Components guide to understand each piece.