# Architecture

Deep dive into OpenQuery's system design, architectural patterns, and data flow.

## 📋 Table of Contents

1. [System Overview](#system-overview)
2. [Architectural Patterns](#architectural-patterns)
3. [Component Architecture](#component-architecture)
4. [Data Flow](#data-flow)
5. [Concurrency Model](#concurrency-model)
6. [Error Handling & Resilience](#error-handling--resilience)
7. [Performance Considerations](#performance-considerations)
8. [Design Decisions](#design-decisions)

## System Overview

OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically ranked content.

### Core Design Principles

1. **Separation of Concerns** - Each component has a single, well-defined responsibility
2. **Parallel First** - Wherever possible, operations are parallelized for speed
3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
4. **Configurable** - Most parameters can be adjusted without code changes
5. **Observable** - Progress reporting and verbose mode for debugging

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│           CLI (System.CommandLine) → OpenQueryApp           │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Orchestration Layer                     │
│        OpenQueryApp → SearchTool (4-phase pipeline)         │
└─────────────────────────────┬───────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌──────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ Search Layer │  │ Processing Layer │  │     AI Layer     │
│              │  │                  │  │                  │
│ SearxngClient│  │ ArticleService   │  │ OpenRouterClient │
│              │  │ ChunkingService  │  │                  │
│              │  │ EmbeddingService │  │                  │
└──────────────┘  └──────────────────┘  └──────────────────┘
```

## Architectural Patterns

### 1. Pipeline Pattern

The main workflow (`SearchTool.ExecuteAsync`) implements a multi-stage pipeline:

```csharp
Phase 1: ExecuteParallelSearchesAsync
    ↓ (List<SearxngResult>)
Phase 2: ExecuteParallelArticleFetchingAsync
    ↓ (List<Chunk>)
Phase 3: ExecuteParallelEmbeddingsAsync
    ↓ ((queryEmbedding, chunkEmbeddings))
Phase 4: RankAndSelectTopChunks
    ↓ (List<Chunk> topChunks)
→ Formatted context string returned
```

Each phase:
- Accepts input from the previous phase
- Processes in parallel where applicable
- Returns output to the next phase
- Reports progress via callbacks

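Conceptually, the phases chain like this. This is a sketch only: the phase method names come from the diagram above, but the signatures and the `FormatContext` helper are illustrative, not the actual `SearchTool` implementation.

```csharp
// Illustrative skeleton of the 4-phase pipeline (not the real code).
public async Task<string> ExecuteAsync(string query, Action<string>? onProgress = null)
{
    var results = await ExecuteParallelSearchesAsync(query, onProgress);           // Phase 1
    var chunks = await ExecuteParallelArticleFetchingAsync(results, onProgress);   // Phase 2
    var (queryEmb, chunkEmbs) = await ExecuteParallelEmbeddingsAsync(query, chunks, onProgress); // Phase 3
    var topChunks = RankAndSelectTopChunks(queryEmb, chunkEmbs, chunks);           // Phase 4
    return FormatContext(topChunks); // hypothetical helper: builds the context string
}
```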
### 2. Service Layer Pattern

### 2. Service Layer Pattern

Services (`Services/` directory) are stateless classes that encapsulate specific operations:

- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
- **Extractors**: `ArticleService` (content extraction)
- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)

All dependencies are explicit (constructor injection), making services easy to test.

### 3. Dependency Injection (Manual)

While it does not use a DI container, OpenQuery follows DI principles:

```csharp
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
```

Benefits:
- Clear dependency graph
- Easy to substitute mocks for testing
- No magic; construction is visible

### 4. Observer Pattern (Progress Reporting)

`StatusReporter` and progress callbacks implement the observer pattern:

```csharp
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)

// Components invoke the callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");

// Caller (OpenQueryApp) supplies the reporter as the callback
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
```

### 5. Resilience Patterns (Polly)

`EmbeddingService` uses Polly's retry policy:

```csharp
_retryPipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
    })
    .Build();
```

This automatically retries failed embedding requests with exponential backoff.

### 6. Producer-Consumer Pattern (Channel-based)

`StatusReporter` uses `System.Threading.Channels.Channel<string>` for asynchronous progress updates:

- Producer: `UpdateStatus()` writes messages to the channel
- Consumer: a background task, `ProcessStatusUpdatesAsync()`, reads and displays them
- Benefit: no blocking between progress generation and display

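A minimal, self-contained sketch of this channel-based handoff (illustrative shape only, not the actual `StatusReporter` code; the sample messages are made up):

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string>();

// Consumer: a background task drains the channel and displays messages.
var consumer = Task.Run(async () =>
{
    await foreach (var message in channel.Reader.ReadAllAsync())
        Console.WriteLine(message);
});

// Producers: fire-and-forget writes that never block the pipeline.
channel.Writer.TryWrite("[Fetching article 1/10: example.com]");
channel.Writer.TryWrite("[Fetching article 2/10: example.org]");

channel.Writer.Complete(); // signal that no more messages are coming
await consumer;            // wait for the consumer to drain the channel
```

Because `TryWrite` on an unbounded channel always succeeds immediately, the worker phases never wait on the console.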
### 7. Disposable Pattern

Components that hold resources needing deterministic cleanup implement `IDisposable` or `IAsyncDisposable`:

- `StatusReporter` - stops the background spinner task
- `RateLimiter` - disposes its semaphore

Both are used via `using` statements for deterministic cleanup.

## Component Architecture

### OpenQueryApp (OpenQuery.cs)

**Role**: Main orchestrator; coordinates the entire workflow.

**Responsibilities**:
- Parse CLI options into `OpenQueryOptions`
- Load configuration and resolve API keys/models
- Optionally generate expanded search queries via LLM
- Invoke `SearchTool` with progress callbacks
- Stream the final answer from the LLM

**Key Methods**:
- `RunAsync(OpenQueryOptions)` - Main entry point

**Interactions**:
- Instantiates `OpenRouterClient` (for both query generation and the final answer)
- Instantiates `SearxngClient` (passed to `SearchTool`)
- Instantiates `EmbeddingService` (passed to `SearchTool`)
- Instantiates `SearchTool` (orchestration)
- Uses `StatusReporter` for UI updates

### SearchTool (Tools/SearchTool.cs)

**Role**: Core search-retrieve-rank pipeline orchestrator.

**Responsibilities**:
- Execute the 4-phase pipeline (search → fetch → embed → rank)
- Manage concurrency limits (via semaphores)
- Coordinate parallel operations
- Generate the context string for the final answer

**Interactions**:
- Uses `SearxngClient` for Phase 1
- Uses `ArticleService` + `ChunkingService` for Phase 2
- Uses `EmbeddingService` for Phase 3
- Has no external UI dependency (pure logic)

**Parallelization Strategy**:
- **Phase 1**: `Task.WhenAll` on search tasks (unbounded, but limited by the SearxNG instance)
- **Phase 2**: Semaphore (max 10 concurrent fetches)
- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)

### EmbeddingService (Services/EmbeddingService.cs)

**Role**: Generate vector embeddings with batching, rate limiting, and retries.

**Responsibilities**:
- Batch embedding requests (default: 300 per batch)
- Parallelize batches (default: 4 concurrent)
- Apply rate limiting (via `RateLimiter`)
- Retry failed requests (Polly)
- Calculate cosine similarity

**Key Methods**:
- `GetEmbeddingsAsync(List<string> texts, ...)` - batch with progress
- `GetEmbeddingAsync(string text)` - single embedding
- `CosineSimilarity(float[], float[])` - static vector math

**Design Notes**:
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
- Batches of 300 reduce API overhead
- Polly retry handles transient failures (429s, 500s, network blips)

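For reference, cosine similarity over two `float[]` vectors can be computed as below. This mirrors the role of `CosineSimilarity(float[], float[])`; the actual implementation may differ in detail (the zero-vector guard is an assumption).

```csharp
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
public static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Guard: an all-zero embedding (e.g. from a failed batch) scores 0.
    if (normA == 0f || normB == 0f) return 0f;
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```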
### OpenRouterClient (Services/OpenRouterClient.cs)

### OpenRouterClient (Services/OpenRouterClient.cs)

**Role**: HTTP client for the OpenRouter API (completions + embeddings).

**Responsibilities**:
- Serialize requests to JSON (source-generated)
- Send HTTP requests with the authorization header
- Stream responses for chat completions (`IAsyncEnumerable`)
- Return full responses for non-streaming calls
- Throw on non-2xx status codes

**Endpoints**:
- POST `/chat/completions` (streaming and non-streaming)
- POST `/embeddings`

**Configuration**:
- Base URL: `https://openrouter.ai/api/v1`
- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json`

**Design**:
- Low-level client; no retry logic (retries live in `EmbeddingService`)
- Thin wrapper around `HttpClient`
- Could be replaced with `IHttpClientFactory` in larger apps

### SearxngClient (Services/SearxngClient.cs)

**Role**: HTTP client for SearxNG metasearch.

**Responsibilities**:
- Construct the search URL with the query parameter
- Issue a GET request and deserialize the JSON response
- Limit results (`.Take(limit)`)
- Return an empty list on failure (no exceptions)

**Endpoint**: `GET /search?q={query}&format=json`

**Design**:
- Deliberately simple; no retry (failures are acceptable, and OpenQuery continues with other queries)
- `DistinctBy(r => r.Url)` deduplication happens upstream

### ArticleService (Services/ArticleService.cs)

**Role**: Extract clean article content from URLs.

**Responsibilities**:
- Call `SmartReader.ParseArticleAsync(url)`
- Return an `Article` object with `Title`, `TextContent`, and `IsReadable`

**Design**:
- Single responsibility: extraction only (no fetching, no chunking)
- SmartReader handles all the complexity (HTML parsing, boilerplate removal)
- Exceptions propagate to `SearchTool`, which handles them

### ChunkingService (Services/ChunkingService.cs)

**Role**: Split long text into ~500-character chunks at natural boundaries.

**Algorithm**:
1. Start at index 0
2. Take up to 500 characters
3. If not at the end of the text, backtrack to the last space, newline, or period
4. Add the chunk and advance the start index
5. Repeat until done

**Design**:
- Static class (stateless utility)
- No dependencies
- Pure function (input text → output chunks)

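The algorithm above can be sketched as follows. The boundary characters and the exact backtracking rule are assumptions based on the description, not the real `ChunkingService` code.

```csharp
using System;
using System.Collections.Generic;

// Sketch of boundary-aware chunking: take up to maxChunkSize characters,
// then backtrack to the last space/newline/period so words stay whole.
public static List<string> ChunkText(string text, int maxChunkSize = 500)
{
    var chunks = new List<string>();
    int start = 0;
    while (start < text.Length)
    {
        int length = Math.Min(maxChunkSize, text.Length - start);
        // If we're not at the end, backtrack to the last natural boundary.
        if (start + length < text.Length)
        {
            int boundary = text.LastIndexOfAny(
                new[] { ' ', '\n', '.' }, start + length - 1, length);
            if (boundary > start) length = boundary - start + 1;
        }
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0) chunks.Add(chunk);
        start += length; // advance past the emitted chunk
    }
    return chunks;
}
```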
### RateLimiter (Services/RateLimiter.cs)

**Role**: Limit concurrent operations via a semaphore.

**Responsibilities**:
- Wrap actions with semaphore acquisition
- Support both sync and async actions
- Implement `IAsyncDisposable` for cleanup

**Usage Pattern**:
```csharp
await _rateLimiter.ExecuteAsync(async () =>
{
    // operation limited by the semaphore
    return await SomeApiCall();
});
```

**Design**:
- Generic wrapper (can return a `TResult`)
- `SemaphoreSlim` initialized at construction
- Used in `EmbeddingService` for parallel embedding batches

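A minimal sketch of a semaphore-based limiter in the shape described above; the real `RateLimiter` may differ in detail (e.g. its sync overload and cancellation support are not shown).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class RateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public RateLimiter(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    public async Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> action)
    {
        await _semaphore.WaitAsync();
        try { return await action(); }
        finally { _semaphore.Release(); } // always release, even if the action throws
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```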
### StatusReporter (Services/StatusReporter.cs)

**Role**: Show real-time progress with a spinner or in verbose mode.

**Responsibilities**:
- Maintain the spinner animation (background task)
- Receive status updates via a channel
- Display updates with appropriate formatting
- Stop the spinner on completion

**Features**:
- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille characters
- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
- **Thread-safe**: the channel is safe for concurrent writes
- **Non-blocking**: the background spinner doesn't block updates

**Design**:
- `Channel<string>` for asynchronous producer-consumer handoff
- A background task (`_statusProcessor`) reads from the channel
- The spinner runs on its own task with a 100ms delay per frame
- `IDisposable` ensures proper cleanup

### ConfigManager (ConfigManager.cs)

**Role**: Load and save configuration from/to a file.

**Responsibilities**:
- Resolve the config path (XDG: `~/.config/openquery/config`)
- Parse key-value pairs (manual parsing, no INI library)
- Provide an `AppConfig` object with defaults
- Save settings back to the file

**Design**:
- Static class (no instances)
- Creates the config directory if missing
- Line-by-line parsing (simple, no dependencies)
- Could be improved with a proper INI parser or JSON

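The line-by-line parsing might look like the sketch below. This is illustrative only: the `#`-comment handling and the return type are assumptions, and the real `ConfigManager` maps values into an `AppConfig` with defaults.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: parse key=value lines into a dictionary, skipping blanks
// and malformed lines. Missing file means defaults apply.
static Dictionary<string, string> ParseConfig(string path)
{
    var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    if (!File.Exists(path)) return config;

    foreach (var line in File.ReadAllLines(path))
    {
        var trimmed = line.Trim();
        if (trimmed.Length == 0 || trimmed.StartsWith("#")) continue; // blank or comment
        int eq = trimmed.IndexOf('=');
        if (eq <= 0) continue; // malformed: no key before '='
        config[trimmed[..eq].Trim()] = trimmed[(eq + 1)..].Trim();
    }
    return config;
}
```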
## Data Flow

### End-to-End Data Flow

```
User: "What is quantum entanglement?"

1. OpenQueryOptions created
   { Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }

2. Query Generation (if Queries > 1)
   → ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
   → Deserialize to List<string> (generatedQueries)

3. Search Phase
   generatedQueries → parallel SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
   → DistinctBy(Url) → List<SearxngResult> (up to 15 results = 3 queries × 5 results)

4. Fetch Phase
   searchResults → parallel ArticleService.FetchArticleAsync → Article
   → ChunkingService.ChunkText (split into ~500-char pieces)
   → ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)

5. Embedding Phase
   originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
   chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)

6. Ranking Phase
   For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
   OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)

7. Answer Phase
   context = string.Join("\n\n", topChunks.Select(...))
   → ChatCompletionRequest to OpenRouter with context + question
   → StreamAsync → Console.Write(delta) (real-time display)

Result: User sees the answer with [Source N] citations
```

### Data Structures

**Chunk** - the core data structure flowing through the pipeline:
```csharp
public record Chunk(
    string Content,        // Text content (~500 chars)
    string SourceUrl,      // Where it came from
    string? Title = null   // Article title
)
{
    public float[]? Embedding { get; set; }  // Added in Phase 3
    public float Score { get; set; }         // Added in Phase 4
}
```

**Data Flow State**:
- Phases 1-2: `Chunk` without an embedding
- Phase 3: `Chunk.Embedding` populated
- Phase 4: `Chunk.Score` populated
- Answer phase: serialized into the context string

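The ranking step (Phase 4, step 6 above) can be sketched as follows, assuming the `Chunk` record above and a `CosineSimilarity` helper; variable names are illustrative.

```csharp
// Score each chunk against the query embedding, then keep the top N.
// Chunks whose embedding batch failed (null or empty) score 0 and
// effectively drop out of the top results.
foreach (var chunk in chunks)
    chunk.Score = chunk.Embedding is { Length: > 0 }
        ? EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding)
        : 0f;

var topChunks = chunks
    .OrderByDescending(c => c.Score)
    .Take(3)   // the --chunks option; 3 is the documented default
    .ToList();
```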
### Memory Footprint

**Per 15-article run (approximate)**:
- Raw HTML (fetched): ~5MB (transient; discarded after extraction)
- Articles: ~500KB (15 articles × ~30KB extracted text)
- Chunks: 50-100 items × 500 chars ≈ 25-50KB of text
- Embeddings: 50-100 × 1536 floats × 4 bytes ≈ 300-600KB
- Total peak: ~1-2MB of working data (excluding baseline process memory)

**Note**: AOT compilation reduces runtime memory compared to JIT.

## Concurrency Model

OpenQuery uses multiple parallelization strategies:

### Unbounded Parallelism (Task.WhenAll)

**Where**: Search queries (Phase 1)

```csharp
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
```

**Rationale**: SearxNG can handle concurrent queries, so there is no need to limit them for a local, single-user tool. SearxNG itself may throttle internally.

**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). The default of 3 is safe.

### Semaphore-Controlled Parallelism

**Where**: Article fetching (Phase 2)

```csharp
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
// each fetch task brackets its work with the semaphore:
//   await semaphore.WaitAsync(); try { ... } finally { semaphore.Release(); }
await Task.WhenAll(fetchTasks);
```

**Rationale**: Prevent flooding target websites with requests (DoS-like behavior). Ten concurrent fetches is polite but fast.

**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (currently a compile-time constant).

### Parallel.ForEachAsync with MaxDegreeOfParallelism

**Where**: Embedding batch processing (Phase 3)

```csharp
await Parallel.ForEachAsync(
    batchIndices,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (batchIndex, ct) => { ... }
);
```

**Rationale**: Limit API concurrency to respect OpenRouter rate limits. Four concurrent embedding requests is a safe default.

**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (compile-time).

### Progress Reporting (Channel)

**Where**: All phases receive an `onProgress` callback

**Implementation**:
- `StatusReporter.UpdateStatus()` writes to the channel
- A background task reads the channel and displays updates
- Non-blocking; callbacks are fire-and-forget (`TryWrite`)

**Thread Safety**: The channel is thread-safe; multiple phases may write concurrently.

## Error Handling & Resilience

### HTTP Errors

**OpenRouterClient**:
- Calls `response.EnsureSuccessStatusCode()`, which throws `HttpRequestException` on 4xx/5xx
- No retry (handled at a higher level in `EmbeddingService`)

**SearxngClient**:
- Returns an empty `List<SearxngResult>` on non-success
- No exception thrown (searches are non-critical; if some queries fail, the others proceed)

### Retry Policy (Polly)

**Location**: `EmbeddingService` constructor

**Scope**: Only embedding requests (`_client.EmbedAsync`)

**Policy**:
- Max 3 attempts
- Exponential backoff: 1s, 2s, 4s
- Only retries `HttpRequestException` (network errors, 429s, 5xx)

**Why not on chat completions?**
- Query generation and the final answer are critical; failures should surface immediately
- Retries could be added in the future if transient failures prove common

### Graceful Degradation

**Query Generation Failure**:
```csharp
try { /* ... generate queries ... */ }
catch (Exception ex)
{
    // Fall back to the original question as the sole query
    if (options.Verbose) reporter.WriteLine("[Failed to generate queries, falling back to original question]");
}
```

**Embedding Batch Failure**:
```csharp
catch
{
    // Return empty embeddings for this batch (affected chunks are filtered out)
    var emptyBatch = new float[batch.Count][];
    // fill with empty arrays
    results.Add((batchIndex, emptyBatch));
}
```

**Article Fetch Failure**:
```csharp
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
    if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
    // Chunk not added; continue with the others
}
```

### User-Facing Errors

Top-level exception handler in `Program.cs`:
```csharp
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```

### Cancellation Support

`OpenRouterClient.StreamAsync` and the `EmbeddingService` methods accept a `CancellationToken`.

Used in:
- Streaming the answer (Ctrl+C stops immediately)
- Parallel embeddings (can be cancelled)

## Performance Considerations

### Latency Breakdown (Typical)

| Stage | Time | Description |
|-------|------|-------------|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4 parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| **Total** | **15-50s** | Varies widely with network and content |

### Bottlenecks

### Bottlenecks

1. **Network I/O** (article fetching, API calls) - the pipeline is I/O-bound, so more CPU doesn't help
2. **OpenRouter API latency** - varies by model and load
3. **SmartReader parsing** - CPU-bound for large HTML documents
4. **Embedding API rate** - OpenRouter may rate-limit if too many requests run concurrently

### Optimization Strategies

- **Parallelism**: Already maximized within API constraints
- **Caching**: Not implemented; a future enhancement could cache embeddings per URL
- **Batching**: 300-chunk batches reduce API overhead
- **AOT**: Native compilation reduces startup overhead vs JIT

### Scalability Limits

- **Memory**: Scales with the number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Thousands of chunks are feasible, but similarity scoring is O(n) in the number of chunks.
- **API Limits**: OpenRouter rate-limits per API key; concurrency may need to be reduced if 429s appear.
- **SearxNG Limits**: A single SearxNG instance can handle roughly 10-50 QPS; beyond that, load balancing would be needed (out of scope).

## Design Decisions

### Why Not Use a DI Container?

OpenQuery wires dependencies manually in `Program.cs`. For a small CLI app, this is:
- Simpler (no container configuration)
- More explicit (easy to trace the dependency graph)
- Free of runtime overhead
- Easier for contributors to understand

Container-based DI would be over-engineering here.

### Why AOT?

.NET 10 AOT provides:
- **Fast startup** (<100ms vs ~500ms with JIT)
- **Smaller footprint** (trimmed, no JIT)
- **No runtime dependencies** (self-contained)
- A better fit for CLI tools distributed to users

Trade-offs:
- Longer build time
- Some reflection-based APIs are unsupported (not needed here)
- Less flexible (can't load dynamic assemblies, also not needed)

### Why SmartReader for Article Extraction?

SmartReader uses a Readability-based algorithm similar to Firefox Reader View:
- Removes ads, navigation, comments, and boilerplate
- Extracts the main article content
- Handles malformed HTML gracefully
- Pure .NET library

Alternatives considered:
- `HtmlAgilityPack` (too low-level; extraction logic would have to be implemented by hand)
- `AngleSharp` (similar; still needs extraction logic)
- External services (like Diffbot): require API keys and cost money

SmartReader is the sweet spot: free, good quality, easy to integrate.

### Why Embeddings + Cosine Similarity vs Full-Text Search?

Full-text search (e.g., Lucene) would:
- Require an inverted index; more complex
- Be faster for exact keyword matching
- Not understand semantic similarity

Embeddings provide:
- Semantic similarity (matching meaning, not just keywords)
- Simple math (cosine similarity of float arrays)
- No index to maintain (everything is computed on the fly)

Trade-off: embeddings add API cost and latency on every run, though caching embeddings (e.g., per URL) could amortize this in the future.

### Why Not a RAG (Retrieval-Augmented Generation) Framework?

OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
- Add dependency bloat
- Reduce control
- Increase abstraction complexity

The custom implementation is ~1000 LOC and matches the project's needs exactly.

### Why System.CommandLine?

It provides:
- Native-like CLI help (`openquery --help`)
- Strongly-typed options
- A command hierarchy (main command + subcommands)
- Good error messages

Alternatives: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.

---

## Next Steps

- [Components](components/overview.md) - Deep dive into each module
- [API Reference](api/cli.md) - Complete command documentation
- [Troubleshooting](troubleshooting.md) - Debug issues

---

**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece.