

Architecture

Deep dive into OpenQuery's system design, architectural patterns, and data flow.

📋 Table of Contents

  1. System Overview
  2. Architectural Patterns
  3. Component Architecture
  4. Data Flow
  5. Concurrency Model
  6. Error Handling & Resilience
  7. Performance Considerations
  8. Design Decisions

System Overview

OpenQuery is a pipeline-based AI application that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.

Core Design Principles

  1. Separation of Concerns - Each component has a single, well-defined responsibility
  2. Parallel First - Wherever possible, operations are parallelized for speed
  3. Resilient by Default - Built-in retries, rate limiting, and graceful degradation
  4. Configurable - Most parameters can be adjusted without code changes
  5. Observable - Progress reporting and verbose mode for debugging

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│  CLI (System.CommandLine) → OpenQueryApp                    │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Orchestration Layer                      │
│  OpenQueryApp → SearchTool (4-phase pipeline)               │
└─────────────────────────────┬───────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌──────────────┐   ┌──────────────────┐   ┌──────────────────┐
│ Search Layer │   │ Processing Layer │   │ AI Layer         │
│              │   │                  │   │                  │
│ SearxngClient│   │ ArticleService   │   │ OpenRouterClient │
│              │   │ ChunkingService  │   │                  │
│              │   │ EmbeddingService │   │                  │
└──────────────┘   └──────────────────┘   └──────────────────┘

Architectural Patterns

1. Pipeline Pattern

The main workflow (SearchTool.ExecuteAsync) implements a multi-stage pipeline:

Phase 1: ExecuteParallelSearchesAsync
    → List<SearxngResult>
Phase 2: ExecuteParallelArticleFetchingAsync
    → List<Chunk>
Phase 3: ExecuteParallelEmbeddingsAsync
    → (queryEmbedding, chunkEmbeddings)
Phase 4: RankAndSelectTopChunks
    → List<Chunk> topChunks
    → Formatted context string returned

Each phase:

  • Accepts input from previous phase
  • Processes in parallel where applicable
  • Returns output to next phase
  • Reports progress via callbacks
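
The phase hand-off above can be sketched as a plain async method chain. The sketch below uses simplified stand-in types (`SearchResult`, `PipelineChunk`) and placeholder phase bodies — it illustrates the pipeline shape, not the real SearchTool code:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-ins for the real SearxngResult/Chunk models (illustrative only)
public record SearchResult(string Url);
public record PipelineChunk(string Content, float Score = 0f);

public static class PipelineSketch
{
    public static async Task<string> ExecuteAsync(string query, Action<string>? onProgress = null)
    {
        onProgress?.Invoke("[Phase 1: searching]");
        var results = await SearchAsync(query);           // Phase 1: search

        onProgress?.Invoke("[Phase 2: fetching + chunking]");
        var chunks = await FetchAndChunkAsync(results);   // Phase 2: fetch + chunk

        onProgress?.Invoke("[Phase 3: embedding]");       // Phase 3 would embed here

        onProgress?.Invoke("[Phase 4: ranking]");
        var top = Rank(chunks, take: 3);                  // Phase 4: rank + select

        // Final output: formatted context string for the answer prompt
        return string.Join("\n\n", top.ConvertAll(c => c.Content));
    }

    // Placeholder phase bodies
    static Task<List<SearchResult>> SearchAsync(string q) =>
        Task.FromResult(new List<SearchResult> { new($"https://example.com/?q={q}") });

    static Task<List<PipelineChunk>> FetchAndChunkAsync(List<SearchResult> r) =>
        Task.FromResult(new List<PipelineChunk> { new("chunk text") });

    static List<PipelineChunk> Rank(List<PipelineChunk> chunks, int take) =>
        chunks.GetRange(0, Math.Min(take, chunks.Count));
}
```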

2. Service Layer Pattern

Services (Services/ directory) are stateless classes that encapsulate specific operations:

  • Clients: OpenRouterClient, SearxngClient (HTTP communication)
  • Processors: EmbeddingService, ChunkingService (data transformation)
  • Extractors: ArticleService (content extraction)
  • Infrastructure: RateLimiter, StatusReporter (cross-cutting concerns)

All dependencies are explicit (constructor injection), making services easily testable.

3. Dependency Injection (Manual)

While not using a DI container, OpenQuery follows DI principles:

// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);

Benefits:

  • Clear dependency graph
  • Easy to substitute mocks for testing
  • No magic; visible construction

4. Observer Pattern (Progress Reporting)

StatusReporter and progress callbacks implement observer pattern:

// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)

// Components invoke callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");

// Caller (OpenQueryApp) wires the reporter's output in as the callback
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);

5. Resilience Patterns (Polly)

EmbeddingService uses Polly's retry policy:

_retryPipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
    })
    .Build();

This automatically retries failed embedding requests with exponential backoff.

6. Producer-Consumer Pattern (Channel-based)

StatusReporter uses System.Threading.Channels.Channel<string> for asynchronous progress updates:

  • Producer: UpdateStatus() writes messages to channel
  • Consumer: Background task ProcessStatusUpdatesAsync() reads and displays
  • Benefit: No blocking between progress generation and display
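
A minimal, self-contained sketch of this pattern with System.Threading.Channels (illustrative — not the actual StatusReporter implementation):

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Producer-consumer over an unbounded channel, as the pattern describes
var channel = Channel.CreateUnbounded<string>();

// Consumer: a background task drains the channel and displays messages
var consumer = Task.Run(async () =>
{
    await foreach (var message in channel.Reader.ReadAllAsync())
        Console.WriteLine(message);
});

// Producers: any phase can report without blocking (TryWrite is fire-and-forget)
channel.Writer.TryWrite("[Fetching article 1/10: example.com]");
channel.Writer.TryWrite("[Fetching article 2/10: example.org]");

channel.Writer.Complete();  // Signal no more messages
await consumer;             // Let the consumer finish draining
```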

7. Disposable Pattern

Components that hold unmanaged resources implement IDisposable or IAsyncDisposable:

  • StatusReporter - stops background spinner task
  • RateLimiter - disposes semaphore

Used via using statements for deterministic cleanup.

Component Architecture

OpenQueryApp (OpenQuery.cs)

Role: Main orchestrator; coordinates the entire workflow

Responsibilities:

  • Parse CLI options into OpenQueryOptions
  • Load configuration and resolve API keys/models
  • Optionally generate expanded search queries via LLM
  • Invoke SearchTool with progress callbacks
  • Stream final answer from LLM

Key Methods:

  • RunAsync(OpenQueryOptions) - Main entry point

Interactions:

  • Instantiates OpenRouterClient (for both query gen and final answer)
  • Instantiates SearxngClient (passed to SearchTool)
  • Instantiates EmbeddingService (passed to SearchTool)
  • Instantiates SearchTool (orchestration)
  • StatusReporter for UI updates

SearchTool (Tools/SearchTool.cs)

Role: Core search-retrieve-rank pipeline orchestrator

Responsibilities:

  • Execute 4-phase pipeline (search → fetch → embed → rank)
  • Manage concurrency limits (via semaphores)
  • Coordinate parallel operations
  • Generate context string for final answer

Interactions:

  • Uses SearxngClient for Phase 1
  • Uses ArticleService + ChunkingService for Phase 2
  • Uses EmbeddingService for Phase 3
  • Has no external UI dependency (pure logic)

Parallelization Strategy:

  • Phase 1: Task.WhenAll on search tasks (unbounded but limited by SearxNG instance)
  • Phase 2: Semaphore (max 10 concurrent fetches)
  • Phase 3: Parallel.ForEachAsync (max 4 concurrent embedding batches)

EmbeddingService (Services/EmbeddingService.cs)

Role: Generate vector embeddings with batching, rate limiting, and retries

Responsibilities:

  • Batch embedding requests (default: 300 per batch)
  • Parallelize batches (default: 4 concurrent)
  • Apply rate limiting (via RateLimiter)
  • Retry failed requests (Polly)
  • Calculate cosine similarity

Key Methods:

  • GetEmbeddingsAsync(List<string> texts, ...) - batch with progress
  • GetEmbeddingAsync(string text) - single embedding
  • CosineSimilarity(float[], float[]) - static vector math

Design Notes:

  • Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
  • Batches of 300 reduce API overhead
  • Polly retry handles transient failures (429, 500, network blips)
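
The cosine-similarity math itself is simple vector arithmetic. A standalone sketch (the real helper is a static method on EmbeddingService, whose exact signature may differ):

```csharp
using System;

public static class VectorMath
{
    // cos(a, b) = (a · b) / (|a| × |b|)
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        float dot = 0f, normA = 0f, normB = 0f;
        for (var i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0f || normB == 0f) return 0f;  // Degenerate (empty) embedding
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```

Identical vectors score 1, orthogonal vectors 0 — which is why chunks are ranked by `OrderByDescending(Score)` in Phase 4.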

OpenRouterClient (Services/OpenRouterClient.cs)

Role: HTTP client for OpenRouter API (completions + embeddings)

Responsibilities:

  • Serialize requests to JSON (source-generated)
  • Send HTTP with authorization header
  • Stream responses for chat completions (IAsyncEnumerable)
  • Return full responses for non-streaming
  • Throw on non-2xx status codes

Endpoints:

  • POST /chat/completions (stream and non-stream)
  • POST /embeddings

Configuration:

  • Base URL: https://openrouter.ai/api/v1
  • Headers: Authorization: Bearer {apiKey}, Accept: application/json

Design:

  • Low-level client; no retry logic (retry is in EmbeddingService)
  • Thin wrapper around HttpClient
  • Could be replaced with HttpClientFactory in larger apps

SearxngClient (Services/SearxngClient.cs)

Role: HTTP client for SearxNG metasearch

Responsibilities:

  • Construct search URL with query param
  • GET request and deserialize JSON
  • Limit results (.Take(limit))
  • Return empty list on failure (no exceptions)

Endpoint: GET /search?q={query}&format=json

Design:

  • Very simple; no retry (failures are acceptable, OpenQuery continues with other queries)
  • Deduplication (DistinctBy(r => r.Url)) happens in the caller (SearchTool), not in this client

ArticleService (Services/ArticleService.cs)

Role: Extract clean article content from URLs

Responsibilities:

  • Call SmartReader.ParseArticleAsync(url)
  • Return Article object with Title, TextContent, IsReadable

Design:

  • Single responsibility: extraction only (no fetching, no chunking)
  • SmartReader handles all complexity (HTML parsing, boilerplate removal)
  • Exceptions propagate to SearchTool (handled there)

ChunkingService (Services/ChunkingService.cs)

Role: Split long text into 500-char chunks at natural boundaries

Algorithm:

  1. Start at index 0
  2. Take up to 500 chars
  3. If not at end, backtrack to last space/newline/period
  4. Add chunk, advance start
  5. Repeat until done

Design:

  • Static class (stateless utility)
  • No dependencies
  • Pure function (input text → output chunks)
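
The algorithm steps above can be sketched as follows (illustrative — the real ChunkingService may differ in boundary details):

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingSketch
{
    public static List<string> ChunkText(string text, int maxChunkSize = 500)
    {
        var chunks = new List<string>();
        var start = 0;
        while (start < text.Length)
        {
            var length = Math.Min(maxChunkSize, text.Length - start);

            // If not at the end, backtrack to the last natural boundary
            if (start + length < text.Length)
            {
                var slice = text.AsSpan(start, length);
                var boundary = slice.LastIndexOfAny(' ', '\n', '.');
                if (boundary > 0) length = boundary + 1;  // Keep the boundary char
            }

            chunks.Add(text.Substring(start, length).Trim());
            start += length;  // Advance past the emitted chunk
        }
        return chunks;
    }
}
```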

RateLimiter (Services/RateLimiter.cs)

Role: Limit concurrent operations via semaphore

Responsibilities:

  • Wrap actions with semaphore acquisition
  • Support both sync and async actions
  • Implement IAsyncDisposable for cleanup

Usage Pattern:

await _rateLimiter.ExecuteAsync(async () =>
{
    // operation limited by semaphore
    return await SomeApiCall();
});

Design:

  • Generic wrapper (can return TResult)
  • SemaphoreSlim initialized at construction
  • Used in EmbeddingService for parallel embedding batches
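
A hypothetical minimal version of this wrapper, assuming nothing beyond SemaphoreSlim (the real RateLimiter's API may differ):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class SimpleRateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public SimpleRateLimiter(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    // Generic wrapper: acquire a slot, run the action, always release
    public async Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> action)
    {
        await _semaphore.WaitAsync();
        try { return await action(); }
        finally { _semaphore.Release(); }
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```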

StatusReporter (Services/StatusReporter.cs)

Role: Show real-time progress with spinner or verbose mode

Responsibilities:

  • Maintain spinner animation (background task)
  • Receive status updates via channel
  • Display updates with appropriate formatting
  • Stop spinner on completion

Features:

  • Spinner mode (non-verbose): ⠋ Fetching... with animated Braille chars
  • Verbose mode: [Fetching article 1/10: example.com] on separate lines
  • Thread-safe: Channel is safe for concurrent writes
  • Non-blocking: Background spinner doesn't block updates

Design:

  • Channel<string> for asynchronous producer-consumer
  • Background task (_statusProcessor) reads from channel
  • Spinner runs on its own task with 100ms delay per frame
  • IDisposable ensures proper cleanup

ConfigManager (ConfigManager.cs)

Role: Load and save configuration from/to file

Responsibilities:

  • Get config path (XDG: ~/.config/openquery/config)
  • Parse key-value pairs (no INI library, manual parsing)
  • Provide AppConfig object with defaults
  • Save settings back to file

Design:

  • Static class (no instances)
  • Creates config directory if missing
  • Line-by-line parsing (simple, no dependencies)
  • Could be improved with proper INI parser or JSON
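
The line-by-line parsing style might look like this sketch (key names and the `#`-comment convention are assumptions, not the actual config format):

```csharp
using System;
using System.Collections.Generic;

public static class ConfigParserSketch
{
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line.StartsWith('#')) continue;  // Skip blanks/comments

            var eq = line.IndexOf('=');
            if (eq <= 0) continue;  // Malformed line: ignore rather than fail

            config[line[..eq].Trim()] = line[(eq + 1)..].Trim();
        }
        return config;
    }
}
```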

Data Flow

End-to-End Data Flow

User: "What is quantum entanglement?"

1. OpenQueryOptions created
   { Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }

2. Query Generation (if Queries > 1)
   → ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
   → Deserialize to List<string> (generatedQueries)

3. Search Phase
   generatedQueries → parallel SearxngClient.SearchAsync calls → ConcurrentBag<SearxngResult>
   → DistinctBy(Url) → List<SearxngResult> (up to 15 results = 3 queries × 5 results, minus duplicates)

4. Fetch Phase
   searchResults → parallel ArticleService.FetchArticleAsync calls → Article
   → ChunkingService.ChunkText (split into ~500-char pieces)
   → ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)

5. Embedding Phase
   originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
   chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)

6. Ranking Phase
   For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
   OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)

7. Answer Phase
   context = string.Join("\n\n", topChunks.Select(...)) 
   → ChatCompletionRequest to OpenRouter with context + question
   → StreamAsync → Console.Write(delta) (real-time display)

Result: User sees answer with [Source N] citations

Data Structures

Chunk - The core data structure flowing through the pipeline:

public record Chunk(
    string Content,      // Text content (~500 chars)
    string SourceUrl,    // Where it came from
    string? Title = null // Article title
)
{
    public float[]? Embedding { get; set; }  // Added in Phase 3
    public float Score { get; set; }         // Added in Phase 4
}

Data Flow State:

  • Phase 1-2: Chunk without embedding
  • Phase 3: Chunk.Embedding populated
  • Phase 4: Chunk.Score populated
  • Phase 5: Serialized into context string

Memory Footprint

Per 15-article run (approximate):

  • Raw HTML (fetched): ~5MB (transient, discarded after extraction)
  • Articles: ~500KB (15 articles × ~30KB extracted text)
  • Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
  • Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
  • Total peak: ~1-2MB (excluding OpenRouter's memory usage)

Note: AOT compilation reduces runtime memory compared to JIT.

Concurrency Model

OpenQuery uses multiple parallelization strategies:

Unbounded Parallelism (Task.WhenAll)

Where: Search queries (Phase 1)

var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);

Rationale: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.

Risk: Could overwhelm SearxNG if --queries is set very high (100+). Default 3 is safe.

Semaphore-Controlled Parallelism

Where: Article fetching (Phase 2)

var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
await Task.WhenAll(fetchTasks); // Each task waits on semaphore

Rationale: Prevent flooding target websites with requests (DOS-like behavior). 10 concurrent is polite but fast.

Configurable: Yes, via ParallelProcessingOptions.MaxConcurrentArticleFetches (compile-time constant currently).
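
Expanded, the gated-fetch pattern looks roughly like this (the URLs and the FetchAsync body are placeholders for the real article fetching):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var urls = Enumerable.Range(1, 25).Select(i => $"https://example.com/{i}").ToArray();
using var semaphore = new SemaphoreSlim(10);  // Max 10 concurrent fetches

// Every task is started, but at most 10 run the fetch at any moment
var fetchTasks = urls.Select(async url =>
{
    await semaphore.WaitAsync();
    try
    {
        return await FetchAsync(url);  // Placeholder for the real article fetch
    }
    finally
    {
        semaphore.Release();  // Free the slot for the next waiting task
    }
});

var articles = await Task.WhenAll(fetchTasks);
Console.WriteLine($"Fetched {articles.Length} articles");

static Task<string> FetchAsync(string url) => Task.FromResult($"content of {url}");
```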

Parallel.ForEachAsync with MaxDegreeOfParallelism

Where: Embedding batch processing (Phase 3)

await Parallel.ForEachAsync(
    batchIndices,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (batchIndex, ct) => { ... }
);

Rationale: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.

Configurable: Yes, via ParallelProcessingOptions.MaxConcurrentEmbeddingRequests (compile-time).

Progress Reporting (Channel)

Where: All phases pass onProgress callback

Implementation:

  • StatusReporter.UpdateStatus() → writes to channel
  • Background task reads channel and displays
  • Non-blocking; callbacks are fire-and-forget (TryWrite)

Thread Safety: Channel is thread-safe; multiple phases may write concurrently.

Error Handling & Resilience

HTTP Errors

OpenRouterClient:

  • Calls response.EnsureSuccessStatusCode() → throws HttpRequestException on 4xx/5xx
  • No retry (handled at higher level in EmbeddingService)

SearxngClient:

  • Returns empty List<SearxngResult> on non-success
  • No exception thrown (searches are non-critical; if some queries fail, others proceed)

Retry Policy (Polly)

Location: EmbeddingService constructor

Scope: Only embedding requests (_client.EmbedAsync)

Policy:

  • Max 3 attempts
  • Exponential backoff: 1s, 2s, 4s
  • Only retries HttpRequestException (network errors, 429, 5xx)

Why not on chat completions?

  • Query generation and final answer are critical; failures should surface immediately
  • Could be added in future if transient failures are common

Graceful Degradation

Query Generation Failure:

try { ... generate queries ... }
catch (Exception ex)
{
    // Fall back to original question as sole query
    if (options.Verbose) reporter.WriteLine($"[Failed to generate queries ({ex.Message}), falling back to original question]");
}

Embedding Batch Failure:

catch
{
    // Return empty embeddings for this batch (these chunks will be filtered out)
    var emptyBatch = new float[batch.Count][];
    for (var i = 0; i < emptyBatch.Length; i++)
        emptyBatch[i] = Array.Empty<float>();
    results.Add((batchIndex, emptyBatch));
}

Article Fetch Failure:

try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
    if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
    // Chunk not added; continue with others
}

User-Facing Errors

Top-level exception handler in Program.cs:

try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}

Cancellation Support

OpenRouterClient.StreamAsync and EmbeddingService methods accept CancellationToken.

Used in:

  • Streaming answer (Ctrl+C stops immediately)
  • Parallel embeddings (can be cancelled)
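
A sketch of how Ctrl+C can feed a CancellationToken into the pipeline (the wiring shown is illustrative; `Task.Delay` stands in for StreamAsync or an embedding call):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();

// Ctrl+C: cancel the token instead of killing the process outright
Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true;   // Keep the process alive so cleanup can run
    cts.Cancel();      // Propagate cancellation into the pipeline
};

try
{
    await Task.Delay(10, cts.Token);  // Stand-in for a cancellable pipeline stage
    Console.WriteLine("completed");
}
catch (OperationCanceledException)
{
    Console.WriteLine("cancelled by user");
}
```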

Performance Considerations

Latency Breakdown (Typical)

Stage              Time     Description
Query generation   2-5s     LLM generates 3-5 queries
Searches           3-8s     3-5 parallel SearxNG queries
Article fetching   5-15s    10-20 parallel fetches (network + parse)
Embeddings         2-4s     50-100 chunks in 4 parallel batches
Final answer       5-20s    Depends on answer length (streaming)
Total              15-50s   Varies widely based on network & content

Bottlenecks

  1. Network I/O (article fetching, API calls) - latency-bound, not CPU-bound
  2. OpenRouter API latency - varies by model and load
  3. SmartReader parsing - CPU-bound for large HTML
  4. Embedding API rate - OpenRouter may rate limit if too many concurrent

Optimization Strategies

  • Parallelism: Already maximized within API constraints
  • Caching: Not implemented; future enhancement could cache embeddings per URL
  • Batching: 300-chunk batches reduce API overhead
  • AOT: Native compilation reduces startup overhead vs JIT

Scalability Limits

  • Memory: Scales with the number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Thousands of chunks are feasible, but similarity scoring is O(n) in the chunk count (times the embedding dimension).
  • API Limits: OpenRouter rate limits per API key; may need to reduce concurrency if hitting 429s.
  • SearxNG Limits: Single SearxNG instance can handle ~10-50 QPS; above that may need load balancing (not in scope).

Design Decisions

Why Not Use a DI Container?

OpenQuery manually wires dependencies in Program.cs. For a small CLI app, this is:

  • Simpler (no container configuration)
  • More explicit (easy to trace dependency graph)
  • No runtime overhead
  • Easier to understand for contributors

Container-based DI would be overengineering.

Why AOT?

.NET 10 AOT provides:

  • Fast startup (<100ms vs ~500ms JIT)
  • Smaller footprint (trimmed, no JIT)
  • No runtime dependencies (self-contained)
  • Better for CLI tools distributed to users

Trade-offs:

  • Longer build time
  • Some reflection-based APIs not supported (not needed here)
  • Less flexible (can't load dynamic assemblies, but not needed)

Why SmartReader for Article Extraction?

SmartReader uses Readability-based algorithm similar to Firefox Reader View:

  • Removes ads, navigation, comments, boilerplate
  • Extracts main article content
  • Handles malformed HTML gracefully
  • Zero dependencies (pure .NET)

Alternatives considered:

  • HtmlAgilityPack (too low-level, need to implement extraction logic)
  • AngleSharp (similar, still need extraction)
  • External services (like diffbot) - require API keys, costs money

SmartReader is the sweet spot: free, good quality, easy integration.

Why Embeddings Instead of Full-Text Search?

Full-text search (like Lucene) would:

  • Require inverted index, more complex
  • Be faster for exact keyword matching
  • Not understand semantic similarity

Embeddings provide:

  • Semantic similarity (understand meaning, not just keywords)
  • Simple math (cosine similarity of float arrays)
  • No index to maintain (just compute on-the-fly)

Trade-off: embedding API cost and latency on every run; caching embeddings (e.g. an LRU keyed by URL) is a potential future optimization.

Why Not RAG (Retrieval Augmented Generation) Framework?

OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:

  • Add dependency bloat
  • Reduce control
  • Increase abstraction complexity

Custom implementation is ~1000 LOC and perfectly matches needs.

Why System.CommandLine?

Provides:

  • Native-like CLI help (openquery --help)
  • Strongly-typed options
  • Command hierarchy (main + subcommands)
  • Good error messages

Alternative: CommandLineParser (older) or manual parsing. System.CommandLine is modern and actively developed.


Next Steps

Want to extend OpenQuery? Check the Components guide to understand each piece.