# Architecture

Deep dive into OpenQuery's system design, architectural patterns, and data flow.

## 📋 Table of Contents

1. [System Overview](#system-overview)
2. [Architectural Patterns](#architectural-patterns)
3. [Component Architecture](#component-architecture)
4. [Data Flow](#data-flow)
5. [Concurrency Model](#concurrency-model)
6. [Error Handling & Resilience](#error-handling--resilience)
7. [Performance Considerations](#performance-considerations)
8. [Design Decisions](#design-decisions)

## System Overview

OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically ranked content.

### Core Design Principles

1. **Separation of Concerns** - Each component has a single, well-defined responsibility
2. **Parallel First** - Wherever possible, operations are parallelized for speed
3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
4. **Configurable** - Most parameters can be adjusted without code changes
5. **Observable** - Progress reporting and verbose mode for debugging

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│           CLI (System.CommandLine) → OpenQueryApp           │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Orchestration Layer                     │
│        OpenQueryApp → SearchTool (4-phase pipeline)         │
└─────────────────────────────┬───────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌──────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ Search Layer │  │ Processing Layer │  │     AI Layer     │
│              │  │                  │  │                  │
│ SearxngClient│  │ ArticleService   │  │ OpenRouterClient │
│              │  │ ChunkingService  │  │                  │
│              │  │ EmbeddingService │  │                  │
└──────────────┘  └──────────────────┘  └──────────────────┘
```

## Architectural Patterns

### 1. Pipeline Pattern

The main workflow (`SearchTool.ExecuteAsync`) implements a multi-stage pipeline:

```csharp
Phase 1: ExecuteParallelSearchesAsync
    ↓ (List<SearxngResult>)
Phase 2: ExecuteParallelArticleFetchingAsync
    ↓ (List<Chunk>)
Phase 3: ExecuteParallelEmbeddingsAsync
    ↓ ((queryEmbedding, chunkEmbeddings))
Phase 4: RankAndSelectTopChunks
    ↓ (List<Chunk> topChunks)
→ Formatted context string returned
```

Each phase:
- Accepts input from the previous phase
- Processes in parallel where applicable
- Returns output to the next phase
- Reports progress via callbacks

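Conceptually, the phases chain like this. This is a sketch only: the phase method names come from the diagram above, but the signatures and the `FormatContext` helper are illustrative, not the actual `SearchTool` implementation.

```csharp
// Illustrative skeleton of the 4-phase pipeline (not the real code).
public async Task<string> ExecuteAsync(string query, Action<string>? onProgress = null)
{
    var results = await ExecuteParallelSearchesAsync(query, onProgress);           // Phase 1
    var chunks = await ExecuteParallelArticleFetchingAsync(results, onProgress);   // Phase 2
    var (queryEmb, chunkEmbs) = await ExecuteParallelEmbeddingsAsync(query, chunks, onProgress); // Phase 3
    var topChunks = RankAndSelectTopChunks(queryEmb, chunkEmbs, chunks);           // Phase 4
    return FormatContext(topChunks); // hypothetical helper: builds the context string
}
```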
### 2. Service Layer Pattern

### 2. Service Layer Pattern

Services (`Services/` directory) are stateless classes that encapsulate specific operations:

- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
- **Extractors**: `ArticleService` (content extraction)
- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)

All dependencies are explicit (constructor injection), making services easy to test.

### 3. Dependency Injection (Manual)

While it does not use a DI container, OpenQuery follows DI principles:

```csharp
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
```

Benefits:
- Clear dependency graph
- Easy to substitute mocks for testing
- No magic; construction is visible

### 4. Observer Pattern (Progress Reporting)

`StatusReporter` and progress callbacks implement the observer pattern:

```csharp
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)

// Components invoke the callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");

// Caller (OpenQueryApp) supplies the reporter as the callback
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
```

### 5. Resilience Patterns (Polly)

`EmbeddingService` uses Polly's retry policy:

```csharp
_retryPipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
    })
    .Build();
```

This automatically retries failed embedding requests with exponential backoff.

### 6. Producer-Consumer Pattern (Channel-based)

`StatusReporter` uses `System.Threading.Channels.Channel<string>` for asynchronous progress updates:

- Producer: `UpdateStatus()` writes messages to the channel
- Consumer: a background task, `ProcessStatusUpdatesAsync()`, reads and displays them
- Benefit: no blocking between progress generation and display

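A minimal, self-contained sketch of this channel-based handoff (illustrative shape only, not the actual `StatusReporter` code; the sample messages are made up):

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string>();

// Consumer: a background task drains the channel and displays messages.
var consumer = Task.Run(async () =>
{
    await foreach (var message in channel.Reader.ReadAllAsync())
        Console.WriteLine(message);
});

// Producers: fire-and-forget writes that never block the pipeline.
channel.Writer.TryWrite("[Fetching article 1/10: example.com]");
channel.Writer.TryWrite("[Fetching article 2/10: example.org]");

channel.Writer.Complete(); // signal that no more messages are coming
await consumer;            // wait for the consumer to drain the channel
```

Because `TryWrite` on an unbounded channel always succeeds immediately, the worker phases never wait on the console.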
### 7. Disposable Pattern

Components that hold resources needing deterministic cleanup implement `IDisposable` or `IAsyncDisposable`:

- `StatusReporter` - stops the background spinner task
- `RateLimiter` - disposes its semaphore

Both are used via `using` statements for deterministic cleanup.

## Component Architecture

### OpenQueryApp (OpenQuery.cs)

**Role**: Main orchestrator; coordinates the entire workflow.

**Responsibilities**:
- Parse CLI options into `OpenQueryOptions`
- Load configuration and resolve API keys/models
- Optionally generate expanded search queries via LLM
- Invoke `SearchTool` with progress callbacks
- Stream the final answer from the LLM

**Key Methods**:
- `RunAsync(OpenQueryOptions)` - Main entry point

**Interactions**:
- Instantiates `OpenRouterClient` (for both query generation and the final answer)
- Instantiates `SearxngClient` (passed to `SearchTool`)
- Instantiates `EmbeddingService` (passed to `SearchTool`)
- Instantiates `SearchTool` (orchestration)
- Uses `StatusReporter` for UI updates

### SearchTool (Tools/SearchTool.cs)

**Role**: Core search-retrieve-rank pipeline orchestrator.

**Responsibilities**:
- Execute the 4-phase pipeline (search → fetch → embed → rank)
- Manage concurrency limits (via semaphores)
- Coordinate parallel operations
- Generate the context string for the final answer

**Interactions**:
- Uses `SearxngClient` for Phase 1
- Uses `ArticleService` + `ChunkingService` for Phase 2
- Uses `EmbeddingService` for Phase 3
- Has no external UI dependency (pure logic)

**Parallelization Strategy**:
- **Phase 1**: `Task.WhenAll` on search tasks (unbounded, but limited by the SearxNG instance)
- **Phase 2**: Semaphore (max 10 concurrent fetches)
- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)

### EmbeddingService (Services/EmbeddingService.cs)

**Role**: Generate vector embeddings with batching, rate limiting, and retries.

**Responsibilities**:
- Batch embedding requests (default: 300 per batch)
- Parallelize batches (default: 4 concurrent)
- Apply rate limiting (via `RateLimiter`)
- Retry failed requests (Polly)
- Calculate cosine similarity

**Key Methods**:
- `GetEmbeddingsAsync(List<string> texts, ...)` - batch with progress
- `GetEmbeddingAsync(string text)` - single embedding
- `CosineSimilarity(float[], float[])` - static vector math

**Design Notes**:
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
- Batches of 300 reduce API overhead
- Polly retry handles transient failures (429s, 500s, network blips)

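For reference, cosine similarity over two `float[]` vectors can be computed as below. This mirrors the role of `CosineSimilarity(float[], float[])`; the actual implementation may differ in detail (the zero-vector guard is an assumption).

```csharp
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
public static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Guard: an all-zero embedding (e.g. from a failed batch) scores 0.
    if (normA == 0f || normB == 0f) return 0f;
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```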
### OpenRouterClient (Services/OpenRouterClient.cs)

### OpenRouterClient (Services/OpenRouterClient.cs)

**Role**: HTTP client for the OpenRouter API (completions + embeddings).

**Responsibilities**:
- Serialize requests to JSON (source-generated)
- Send HTTP requests with the authorization header
- Stream responses for chat completions (`IAsyncEnumerable`)
- Return full responses for non-streaming calls
- Throw on non-2xx status codes

**Endpoints**:
- POST `/chat/completions` (streaming and non-streaming)
- POST `/embeddings`

**Configuration**:
- Base URL: `https://openrouter.ai/api/v1`
- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json`

**Design**:
- Low-level client; no retry logic (retries live in `EmbeddingService`)
- Thin wrapper around `HttpClient`
- Could be replaced with `IHttpClientFactory` in larger apps

### SearxngClient (Services/SearxngClient.cs)

**Role**: HTTP client for SearxNG metasearch.

**Responsibilities**:
- Construct the search URL with the query parameter
- Issue a GET request and deserialize the JSON response
- Limit results (`.Take(limit)`)
- Return an empty list on failure (no exceptions)

**Endpoint**: `GET /search?q={query}&format=json`

**Design**:
- Deliberately simple; no retry (failures are acceptable, and OpenQuery continues with other queries)
- `DistinctBy(r => r.Url)` deduplication happens upstream

### ArticleService (Services/ArticleService.cs)

**Role**: Extract clean article content from URLs.

**Responsibilities**:
- Call `SmartReader.ParseArticleAsync(url)`
- Return an `Article` object with `Title`, `TextContent`, and `IsReadable`

**Design**:
- Single responsibility: extraction only (no fetching, no chunking)
- SmartReader handles all the complexity (HTML parsing, boilerplate removal)
- Exceptions propagate to `SearchTool`, which handles them

### ChunkingService (Services/ChunkingService.cs)

**Role**: Split long text into ~500-character chunks at natural boundaries.

**Algorithm**:
1. Start at index 0
2. Take up to 500 characters
3. If not at the end of the text, backtrack to the last space, newline, or period
4. Add the chunk and advance the start index
5. Repeat until done

**Design**:
- Static class (stateless utility)
- No dependencies
- Pure function (input text → output chunks)

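The algorithm above can be sketched as follows. The boundary characters and the exact backtracking rule are assumptions based on the description, not the real `ChunkingService` code.

```csharp
using System;
using System.Collections.Generic;

// Sketch of boundary-aware chunking: take up to maxChunkSize characters,
// then backtrack to the last space/newline/period so words stay whole.
public static List<string> ChunkText(string text, int maxChunkSize = 500)
{
    var chunks = new List<string>();
    int start = 0;
    while (start < text.Length)
    {
        int length = Math.Min(maxChunkSize, text.Length - start);
        // If we're not at the end, backtrack to the last natural boundary.
        if (start + length < text.Length)
        {
            int boundary = text.LastIndexOfAny(
                new[] { ' ', '\n', '.' }, start + length - 1, length);
            if (boundary > start) length = boundary - start + 1;
        }
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0) chunks.Add(chunk);
        start += length; // advance past the emitted chunk
    }
    return chunks;
}
```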
### RateLimiter (Services/RateLimiter.cs)

**Role**: Limit concurrent operations via a semaphore.

**Responsibilities**:
- Wrap actions with semaphore acquisition
- Support both sync and async actions
- Implement `IAsyncDisposable` for cleanup

**Usage Pattern**:
```csharp
await _rateLimiter.ExecuteAsync(async () =>
{
    // operation limited by the semaphore
    return await SomeApiCall();
});
```

**Design**:
- Generic wrapper (can return a `TResult`)
- `SemaphoreSlim` initialized at construction
- Used in `EmbeddingService` for parallel embedding batches

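A minimal sketch of a semaphore-based limiter in the shape described above; the real `RateLimiter` may differ in detail (e.g. its sync overload and cancellation support are not shown).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class RateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public RateLimiter(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    public async Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> action)
    {
        await _semaphore.WaitAsync();
        try { return await action(); }
        finally { _semaphore.Release(); } // always release, even if the action throws
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```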
### StatusReporter (Services/StatusReporter.cs)

**Role**: Show real-time progress with a spinner or in verbose mode.

**Responsibilities**:
- Maintain the spinner animation (background task)
- Receive status updates via a channel
- Display updates with appropriate formatting
- Stop the spinner on completion

**Features**:
- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille characters
- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
- **Thread-safe**: the channel is safe for concurrent writes
- **Non-blocking**: the background spinner doesn't block updates

**Design**:
- `Channel<string>` for asynchronous producer-consumer handoff
- A background task (`_statusProcessor`) reads from the channel
- The spinner runs on its own task with a 100ms delay per frame
- `IDisposable` ensures proper cleanup

### ConfigManager (ConfigManager.cs)

**Role**: Load and save configuration from/to a file.

**Responsibilities**:
- Resolve the config path (XDG: `~/.config/openquery/config`)
- Parse key-value pairs (manual parsing, no INI library)
- Provide an `AppConfig` object with defaults
- Save settings back to the file

**Design**:
- Static class (no instances)
- Creates the config directory if missing
- Line-by-line parsing (simple, no dependencies)
- Could be improved with a proper INI parser or JSON

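The line-by-line parsing might look like the sketch below. This is illustrative only: the `#`-comment handling and the return type are assumptions, and the real `ConfigManager` maps values into an `AppConfig` with defaults.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: parse key=value lines into a dictionary, skipping blanks
// and malformed lines. Missing file means defaults apply.
static Dictionary<string, string> ParseConfig(string path)
{
    var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    if (!File.Exists(path)) return config;

    foreach (var line in File.ReadAllLines(path))
    {
        var trimmed = line.Trim();
        if (trimmed.Length == 0 || trimmed.StartsWith("#")) continue; // blank or comment
        int eq = trimmed.IndexOf('=');
        if (eq <= 0) continue; // malformed: no key before '='
        config[trimmed[..eq].Trim()] = trimmed[(eq + 1)..].Trim();
    }
    return config;
}
```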
## Data Flow

### End-to-End Data Flow

```
User: "What is quantum entanglement?"

1. OpenQueryOptions created
   { Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }

2. Query Generation (if Queries > 1)
   → ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
   → Deserialize to List<string> (generatedQueries)

3. Search Phase
   generatedQueries → parallel SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
   → DistinctBy(Url) → List<SearxngResult> (up to 15 results = 3 queries × 5 results)

4. Fetch Phase
   searchResults → parallel ArticleService.FetchArticleAsync → Article
   → ChunkingService.ChunkText (split into ~500-char pieces)
   → ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)

5. Embedding Phase
   originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
   chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)

6. Ranking Phase
   For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
   OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)

7. Answer Phase
   context = string.Join("\n\n", topChunks.Select(...))
   → ChatCompletionRequest to OpenRouter with context + question
   → StreamAsync → Console.Write(delta) (real-time display)

Result: User sees the answer with [Source N] citations
```

### Data Structures

**Chunk** - the core data structure flowing through the pipeline:
```csharp
public record Chunk(
    string Content,        // Text content (~500 chars)
    string SourceUrl,      // Where it came from
    string? Title = null   // Article title
)
{
    public float[]? Embedding { get; set; }  // Added in Phase 3
    public float Score { get; set; }         // Added in Phase 4
}
```

**Data Flow State**:
- Phases 1-2: `Chunk` without an embedding
- Phase 3: `Chunk.Embedding` populated
- Phase 4: `Chunk.Score` populated
- Answer phase: serialized into the context string

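The ranking step (Phase 4, step 6 above) can be sketched as follows, assuming the `Chunk` record above and a `CosineSimilarity` helper; variable names are illustrative.

```csharp
// Score each chunk against the query embedding, then keep the top N.
// Chunks whose embedding batch failed (null or empty) score 0 and
// effectively drop out of the top results.
foreach (var chunk in chunks)
    chunk.Score = chunk.Embedding is { Length: > 0 }
        ? EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding)
        : 0f;

var topChunks = chunks
    .OrderByDescending(c => c.Score)
    .Take(3)   // the --chunks option; 3 is the documented default
    .ToList();
```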
### Memory Footprint

**Per 15-article run (approximate)**:
- Raw HTML (fetched): ~5MB (transient; discarded after extraction)
- Articles: ~500KB (15 articles × ~30KB extracted text)
- Chunks: 50-100 items × 500 chars ≈ 25-50KB of text
- Embeddings: 50-100 × 1536 floats × 4 bytes ≈ 300-600KB
- Total peak: ~1-2MB of working data (excluding baseline process memory)

**Note**: AOT compilation reduces runtime memory compared to JIT.

## Concurrency Model

OpenQuery uses multiple parallelization strategies:

### Unbounded Parallelism (Task.WhenAll)

**Where**: Search queries (Phase 1)

```csharp
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
```

**Rationale**: SearxNG can handle concurrent queries, so there is no need to limit them for a local, single-user tool. SearxNG itself may throttle internally.

**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). The default of 3 is safe.

### Semaphore-Controlled Parallelism

**Where**: Article fetching (Phase 2)

```csharp
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
// each fetch task brackets its work with the semaphore:
//   await semaphore.WaitAsync(); try { ... } finally { semaphore.Release(); }
await Task.WhenAll(fetchTasks);
```

**Rationale**: Prevent flooding target websites with requests (DoS-like behavior). Ten concurrent fetches is polite but fast.

**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (currently a compile-time constant).

### Parallel.ForEachAsync with MaxDegreeOfParallelism

**Where**: Embedding batch processing (Phase 3)

```csharp
await Parallel.ForEachAsync(
    batchIndices,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (batchIndex, ct) => { ... }
);
```

**Rationale**: Limit API concurrency to respect OpenRouter rate limits. Four concurrent embedding requests is a safe default.

**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (compile-time).

### Progress Reporting (Channel)

**Where**: All phases receive an `onProgress` callback

**Implementation**:
- `StatusReporter.UpdateStatus()` writes to the channel
- A background task reads the channel and displays updates
- Non-blocking; callbacks are fire-and-forget (`TryWrite`)

**Thread Safety**: The channel is thread-safe; multiple phases may write concurrently.

## Error Handling & Resilience

### HTTP Errors

**OpenRouterClient**:
- Calls `response.EnsureSuccessStatusCode()`, which throws `HttpRequestException` on 4xx/5xx
- No retry (handled at a higher level in `EmbeddingService`)

**SearxngClient**:
- Returns an empty `List<SearxngResult>` on non-success
- No exception thrown (searches are non-critical; if some queries fail, the others proceed)

### Retry Policy (Polly)

**Location**: `EmbeddingService` constructor

**Scope**: Only embedding requests (`_client.EmbedAsync`)

**Policy**:
- Max 3 attempts
- Exponential backoff: 1s, 2s, 4s
- Only retries `HttpRequestException` (network errors, 429s, 5xx)

**Why not on chat completions?**
- Query generation and the final answer are critical; failures should surface immediately
- Retries could be added in the future if transient failures prove common

### Graceful Degradation

**Query Generation Failure**:
```csharp
try { /* ... generate queries ... */ }
catch (Exception ex)
{
    // Fall back to the original question as the sole query
    if (options.Verbose) reporter.WriteLine("[Failed to generate queries, falling back to original question]");
}
```

**Embedding Batch Failure**:
```csharp
catch
{
    // Return empty embeddings for this batch (affected chunks are filtered out)
    var emptyBatch = new float[batch.Count][];
    // fill with empty arrays
    results.Add((batchIndex, emptyBatch));
}
```

**Article Fetch Failure**:
```csharp
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
    if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
    // Chunk not added; continue with the others
}
```

### User-Facing Errors

Top-level exception handler in `Program.cs`:
```csharp
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```

### Cancellation Support

`OpenRouterClient.StreamAsync` and the `EmbeddingService` methods accept a `CancellationToken`.

Used in:
- Streaming the answer (Ctrl+C stops immediately)
- Parallel embeddings (can be cancelled)

## Performance Considerations

### Latency Breakdown (Typical)

| Stage | Time | Description |
|-------|------|-------------|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4 parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| **Total** | **15-50s** | Varies widely with network and content |

### Bottlenecks

### Bottlenecks

1. **Network I/O** (article fetching, API calls) - the pipeline is I/O-bound, so more CPU doesn't help
2. **OpenRouter API latency** - varies by model and load
3. **SmartReader parsing** - CPU-bound for large HTML documents
4. **Embedding API rate** - OpenRouter may rate-limit if too many requests run concurrently

### Optimization Strategies

- **Parallelism**: Already maximized within API constraints
- **Caching**: Not implemented; a future enhancement could cache embeddings per URL
- **Batching**: 300-chunk batches reduce API overhead
- **AOT**: Native compilation reduces startup overhead vs JIT

### Scalability Limits

- **Memory**: Scales with the number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Thousands of chunks are feasible, but similarity scoring is O(n) in the number of chunks.
- **API Limits**: OpenRouter rate-limits per API key; concurrency may need to be reduced if 429s appear.
- **SearxNG Limits**: A single SearxNG instance can handle roughly 10-50 QPS; beyond that, load balancing would be needed (out of scope).

## Design Decisions

### Why Not Use a DI Container?

OpenQuery wires dependencies manually in `Program.cs`. For a small CLI app, this is:
- Simpler (no container configuration)
- More explicit (easy to trace the dependency graph)
- Free of runtime overhead
- Easier for contributors to understand

Container-based DI would be over-engineering here.

### Why AOT?

.NET 10 AOT provides:
- **Fast startup** (<100ms vs ~500ms with JIT)
- **Smaller footprint** (trimmed, no JIT)
- **No runtime dependencies** (self-contained)
- A better fit for CLI tools distributed to users

Trade-offs:
- Longer build time
- Some reflection-based APIs are unsupported (not needed here)
- Less flexible (can't load dynamic assemblies, also not needed)

### Why SmartReader for Article Extraction?

SmartReader uses a Readability-based algorithm similar to Firefox Reader View:
- Removes ads, navigation, comments, and boilerplate
- Extracts the main article content
- Handles malformed HTML gracefully
- Pure .NET library

Alternatives considered:
- `HtmlAgilityPack` (too low-level; extraction logic would have to be implemented by hand)
- `AngleSharp` (similar; still needs extraction logic)
- External services (like Diffbot): require API keys and cost money

SmartReader is the sweet spot: free, good quality, easy to integrate.

### Why Embeddings + Cosine Similarity vs Full-Text Search?

Full-text search (e.g., Lucene) would:
- Require an inverted index; more complex
- Be faster for exact keyword matching
- Not understand semantic similarity

Embeddings provide:
- Semantic similarity (matching meaning, not just keywords)
- Simple math (cosine similarity of float arrays)
- No index to maintain (everything is computed on the fly)

Trade-off: embeddings add API cost and latency on every run, though caching embeddings (e.g., per URL) could amortize this in the future.

### Why Not a RAG (Retrieval-Augmented Generation) Framework?

OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
- Add dependency bloat
- Reduce control
- Increase abstraction complexity

The custom implementation is ~1000 LOC and matches the project's needs exactly.

### Why System.CommandLine?

It provides:
- Native-like CLI help (`openquery --help`)
- Strongly-typed options
- A command hierarchy (main command + subcommands)
- Good error messages

Alternatives: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.

---

## Next Steps

- [Components](components/overview.md) - Deep dive into each module
- [API Reference](api/cli.md) - Complete command documentation
- [Troubleshooting](troubleshooting.md) - Debug issues

---

**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece.