docs: add comprehensive documentation with README and detailed guides

- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
OpenQuery Documentation
2026-03-19 10:01:58 +01:00
parent b28d8998f7
commit 65ca2401ae
16 changed files with 7073 additions and 0 deletions

docs/components/models.md Normal file

@@ -0,0 +1,528 @@
# Models Reference
Complete reference for all data models, DTOs, and records in OpenQuery.
## 📋 Table of Contents
1. [Core Data Models](#core-data-models)
2. [OpenRouter API Models](#openrouter-api-models)
3. [SearxNG API Models](#searxng-api-models)
4. [JSON Serialization](#json-serialization)
5. [Model Relationships](#model-relationships)
## Core Data Models
### OpenQueryOptions
**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object for a single query execution
```csharp
public record OpenQueryOptions(
    int Chunks,      // Number of top chunks to include in context
    int Results,     // Search results per generated query
    int Queries,     // Number of search queries to generate (if >1)
    bool Short,      // Request concise answer
    bool Long,       // Request detailed answer
    bool Verbose,    // Enable verbose logging
    string Question  // Original user question (required)
);
```
**Lifecycle**:
- Created in `Program.cs` by combining CLI options, config defaults, and environment variables
- Passed to `OpenQueryApp.RunAsync(options)`
**Validation**: None (assumes valid values from CLI parser/config)
**Example**:
```csharp
var options = new OpenQueryOptions(
    Chunks: 3,
    Results: 5,
    Queries: 3,
    Short: false,
    Long: false,
    Verbose: true,
    Question: "What is quantum entanglement?"
);
```
---
### Chunk
**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata, embedding, and relevance score
```csharp
public record Chunk(
    string Content,       // Text content (typically ~500 chars)
    string SourceUrl,     // Original article URL
    string? Title = null  // Article title (optional, may be null)
)
{
    public float[]? Embedding { get; set; }  // Vector embedding (1536-dim for text-embedding-3-small)
    public float Score { get; set; }         // Relevance score (0-1, higher = more relevant)
}
```
**Lifecycle**:
1. **Created** in `SearchTool.ExecuteParallelArticleFetchingAsync`:
```csharp
chunks.Add(new Chunk(chunkText, result.Url, article.Title));
```
At this point: `Embedding = null`, `Score = 0`
2. **Embedded** in `SearchTool.ExecuteParallelEmbeddingsAsync`:
```csharp
validChunks[i].Embedding = validEmbeddings[i];
```
3. **Scored** in `SearchTool.RankAndSelectTopChunks`:
```csharp
chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
```
4. **Formatted** into context string:
```csharp
$"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"
```
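The scoring in steps 3–4 is plain cosine similarity between the query embedding and each chunk embedding, followed by a sort. A language-neutral sketch in Python (function names and the dict shape are illustrative, not the actual C# API):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); assumes non-zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_chunks(query_embedding, chunks, top_n):
    # chunks: list of dicts with an "embedding" key, mirroring Chunk.Embedding
    for c in chunks:
        c["score"] = cosine_similarity(query_embedding, c["embedding"])
    return sorted(chunks, key=lambda c: c["score"], reverse=True)[:top_n]
```

With normalized embeddings (as OpenAI-style models produce), cosine similarity reduces to a dot product, but the full formula is safe either way.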
**Properties**:
- `Content`: Never null/empty (filters empty chunks in `ChunkingService`)
- `SourceUrl`: Always provided (from `SearxngResult.Url`)
- `Title`: May be null if article extraction failed to get title
- `Embedding`: Null until phase 3; may remain null if embedding failed
- `Score`: 0 until phase 4; irrelevant for non-embedded chunks
**Equality**: Records use value equality, and the compiler-generated `Equals` compares *all* instance fields, including the mutable `Embedding` and `Score` properties. Note that `float[]` is compared by reference, not element-by-element, so two chunks with identical content/url/title but separately allocated embedding arrays are not equal.
---
### ParallelProcessingOptions
**Location**: `Models/ParallelOptions.cs`
**Type**: `class`
**Purpose**: Configuration for parallel/concurrent operations
```csharp
public class ParallelProcessingOptions
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
    public int EmbeddingBatchSize { get; set; } = 300;
}
```
**Usage**:
- Instantiated in `SearchTool` constructor (hardcoded new)
- Passed to `EmbeddingService` constructor
- Read by `SearchTool` for article fetching semaphore
**Default Values**:
| Property | Default | Effect |
|----------|---------|--------|
| `MaxConcurrentArticleFetches` | 10 | Up to 10 articles fetched simultaneously |
| `MaxConcurrentEmbeddingRequests` | 4 | Up to 4 embedding batches in parallel |
| `EmbeddingBatchSize` | 300 | Each embedding API call handles up to 300 texts |
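How these defaults combine: for K chunk texts, the embedding phase issues `ceil(K / 300)` API calls, with at most 4 in flight at once. A hypothetical sketch of the batching arithmetic in Python (illustrative only; the real logic lives in `EmbeddingService`):

```python
import math

def plan_batches(num_texts, batch_size=300, max_concurrent=4):
    """Return (API calls needed, sequential 'waves' given the concurrency cap)."""
    num_batches = math.ceil(num_texts / batch_size)
    waves = math.ceil(num_batches / max_concurrent)
    return num_batches, waves
```

For example, 1000 chunks fit in 4 batches, all of which can run in a single concurrent wave.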
**Current Limitation**: These are **compile-time defaults** (hardcoded in `SearchTool.cs`). To make them configurable:
1. Add to `AppConfig`
2. Read in `ConfigManager`
3. Pass through `SearchTool` constructor
---
## OpenRouter API Models
**Location**: `Models/OpenRouter.cs`
**Purpose**: DTOs for OpenRouter's REST API (JSON serialization)
### Chat Completion
#### `ChatCompletionRequest`
```csharp
public record ChatCompletionRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] List<Message> Messages,
    [property: JsonPropertyName("tools")] List<ToolDefinition>? Tools = null,
    [property: JsonPropertyName("stream")] bool Stream = false
);
```
**Example**:
```json
{
  "model": "qwen/qwen3.5-flash-02-23",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 2+2?" }
  ],
  "stream": true
}
```
#### `Message`
```csharp
public record Message(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
    [property: JsonPropertyName("tool_call_id")] string? ToolCallId = null
)
{
    // Factory method for tool responses
    public static Message FromTool(string content, string toolCallId) =>
        new Message("tool", content, null, toolCallId);
}
```
**Roles**: `"system"`, `"user"`, `"assistant"`, `"tool"`
**Usage**:
- `Content` for text messages
- `ToolCalls` when assistant requests tool use
- `ToolCallId` when responding to tool call
#### `ChatCompletionResponse`
```csharp
public record ChatCompletionResponse(
    [property: JsonPropertyName("choices")] List<Choice> Choices,
    [property: JsonPropertyName("usage")] Usage? Usage = null
);
public record Choice(
    [property: JsonPropertyName("message")] Message Message,
    [property: JsonPropertyName("finish_reason")] string? FinishReason = null
);
```
**Response Example**:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Answer text..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
```
#### `Usage`
```csharp
public record Usage(
    [property: JsonPropertyName("prompt_tokens")] int PromptTokens,
    [property: JsonPropertyName("completion_tokens")] int CompletionTokens,
    [property: JsonPropertyName("total_tokens")] int TotalTokens
);
```
### Tool Calling (Not Currently Used)
#### `ToolDefinition` / `ToolFunction`
```csharp
public record ToolDefinition(
    [property: JsonPropertyName("type")] string Type,  // e.g., "function"
    [property: JsonPropertyName("function")] ToolFunction Function
);
public record ToolFunction(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("description")] string Description,
    [property: JsonPropertyName("parameters")] JsonElement Parameters  // JSON Schema
);
```
#### `ToolCall` / `FunctionCall`
```csharp
public record ToolCall(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("function")] FunctionCall Function
);
public record FunctionCall(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("arguments")] string Arguments  // JSON string
);
```
**Note**: OpenQuery doesn't use tools currently, but models are defined for future tool-calling capability.
### Streaming
#### `StreamChunk`
```csharp
public record StreamChunk(
    string? TextDelta = null,
    ClientToolCall? Tool = null
);
```
Yielded by `OpenRouterClient.StreamAsync()` for each SSE event.
#### `ChatCompletionChunk` (Server Response)
```csharp
public record ChatCompletionChunk(
    [property: JsonPropertyName("choices")] List<ChunkChoice> Choices
);
public record ChunkChoice(
    [property: JsonPropertyName("delta")] ChunkDelta Delta
);
public record ChunkDelta(
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null
);
```
**Streaming Response Example** (SSE):
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
```
`OpenRouterClient.StreamAsync` parses and yields `StreamChunk` with non-null `TextDelta` for content.
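Per-line SSE parsing like this comes down to a prefix check, a `[DONE]` sentinel, and a JSON decode. A simplified sketch in Python of the logic (field names follow the chunk DTOs above; this is an illustration, not the actual C# implementation):

```python
import json

def parse_sse_line(line):
    """Return the text delta from one SSE line, or None if there is nothing to emit."""
    if not line.startswith("data: "):
        return None  # comments, keep-alives, blank lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    choices = chunk.get("choices", [])
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")
```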
### Embeddings
#### `EmbeddingRequest`
```csharp
public record EmbeddingRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("input")] List<string> Input
);
```
**Example**:
```json
{
  "model": "openai/text-embedding-3-small",
  "input": ["text 1", "text 2", ...]
}
```
#### `EmbeddingResponse`
```csharp
public record EmbeddingResponse(
    [property: JsonPropertyName("data")] List<EmbeddingData> Data,
    [property: JsonPropertyName("usage")] Usage Usage
);
public record EmbeddingData(
    [property: JsonPropertyName("embedding")] float[] Embedding,
    [property: JsonPropertyName("index")] int Index
);
```
**Response Example**:
```json
{
  "data": [
    { "embedding": [0.1, 0.2, ...], "index": 0 },
    { "embedding": [0.3, 0.4, ...], "index": 1 }
  ],
  "usage": {
    "prompt_tokens": 100,
    "total_tokens": 100
  }
}
```
**Note**: `_client.EmbedAsync` orders by `index` to match input order.
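Reordering by `index` matters because the API does not guarantee `data` arrives in input order. The equivalent operation, sketched in Python for illustration:

```python
def order_embeddings(data):
    # data: list of {"embedding": [...], "index": n}, as in EmbeddingResponse
    return [d["embedding"] for d in sorted(data, key=lambda d: d["index"])]
```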
---
## SearxNG API Models
**Location**: `Models/Searxng.cs`
**Purpose**: DTOs for SearxNG's JSON response format
### `SearxngRoot`
```csharp
public record SearxngRoot(
    [property: JsonPropertyName("results")] List<SearxngResult> Results
);
```
Top-level response object.
### `SearxngResult`
```csharp
public record SearxngResult(
    [property: JsonPropertyName("title")] string Title,
    [property: JsonPropertyName("url")] string Url,
    [property: JsonPropertyName("content")] string Content  // Snippet/description
);
```
**Fields**:
- `Title`: Result title (from page `<title>` or OpenGraph)
- `Url`: Absolute URL to article
- `Content`: Short snippet (~200 chars) from search engine
**Usage**:
- `Url` passed to `ArticleService.FetchArticleAsync`
- `Title` used as fallback if article extraction fails
- `Content` currently unused (could be used for a quick answer without fetching the full article)
**Example Response**:
```json
{
  "results": [
    {
      "title": "Quantum Entanglement - Wikipedia",
      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
      "content": "Quantum entanglement is a physical phenomenon..."
    }
  ]
}
```
---
## JSON Serialization
### JsonContext (Source Generation)
**Location**: `Models/JsonContexts.cs`
**Purpose**: Provide source-generated JSON serializer context for AOT compatibility
#### Declaration
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
[JsonSerializable(typeof(ChatCompletionChunk))]
[JsonSerializable(typeof(EmbeddingRequest))]
[JsonSerializable(typeof(EmbeddingResponse))]
[JsonSerializable(typeof(SearxngRoot))]
[JsonSerializable(typeof(List<string>))]
internal partial class AppJsonContext : JsonSerializerContext
{
}
```
**Usage**:
```csharp
var json = JsonSerializer.Serialize(request, AppJsonContext.Default.ChatCompletionRequest);
var response = JsonSerializer.Deserialize(json, AppJsonContext.Default.ChatCompletionResponse);
```
**Benefits**:
- **AOT-compatible**: No reflection, works with PublishAot=true
- **Performance**: Pre-compiled serializers are faster
- **Trimming safe**: Unused serializers trimmed automatically
**Generated**: Partial class compiled by source generator (no manual implementation)
**Important**: Must include ALL types that will be serialized/deserialized in `[JsonSerializable]` attributes, otherwise runtime exception in AOT.
---
## Model Relationships
### Object Graph (Typical Execution)
```
OpenQueryOptions
  ↓
OpenQueryApp.RunAsync()
├─ queryGenerationMessages (List<Message>)
│    ├─ system: "You are an expert researcher..."
│    └─ user: "Generate N queries for: {question}"
│    ↓
│  ChatCompletionRequest → OpenRouter → ChatCompletionResponse
│    ↓
│  List<string> generatedQueries
├─ SearchTool.ExecuteAsync()
│    Phase 1: Parallel Searches
│      SearxngClient.SearchAsync(query) × N
│      → List<SearxngResult> (Title, Url, Content)
│    Phase 2: Article Fetch & Chunking
│      ArticleService.FetchAsync(Url) × M
│      → Article (TextContent, Title)
│      → ChunkingService.ChunkText → List<string> chunks
│      → Chunk(content, url, title) × K
│    Phase 3: Embeddings
│      EmbeddingService.GetEmbeddingsAsync(chunkContents) → float[][] chunkEmbeddings
│      → Set chunk.Embedding for each
│      Also: GetEmbeddingAsync(question) → float[] queryEmbedding
│    Phase 4: Ranking
│      Score = CosineSimilarity(queryEmbedding, chunk.Embedding) per chunk
│      → OrderByDescending(Score) → Take(topChunksLimit) → topChunks (List<Chunk>)
│    ↓
│  Context string: formatted topChunks
│    ↓
└─ OpenQueryApp → final ChatCompletionRequest
     System: "Answer based on context..."
     User: "Context:\n{context}\n\nQuestion: {question}"
     StreamAsync() → StreamChunk.TextDelta → Console
```
### Record Immutability
Most DTOs are `record` types:
- **Immutable**: Properties are init-only (`{ get; init; }`)
- **Value semantics**: Equality based on content
- **Thread-safe**: Can be shared across threads
**Exception**:
- `Chunk`: Has mutable properties `Embedding` and `Score` (set during pipeline)
- `ParallelProcessingOptions`: Class with mutable setters
- `AppConfig`: Class with mutable setters
---
## Next Steps
- **[API Reference](../../api/cli.md)** - How these models are used in CLI commands
- **[OpenRouterClient](../../services/OpenRouterClient.md)** - Uses OpenRouter models
- **[SearxngClient](../../services/SearxngClient.md)** - Uses Searxng models
- **[SearchTool](../../components/search-tool.md)** - Orchestrates all models
---
**Quick Reference Table**
| Model | Category | Purpose | Mutable? |
|-------|----------|---------|----------|
| `OpenQueryOptions` | Core | CLI options | No (record) |
| `Chunk` | Core | Content + metadata + ranking | Partially (Embedding, Score) |
| `ParallelProcessingOptions` | Config | Concurrency settings | Yes (class) |
| `ChatCompletionRequest/Response` | OpenRouter | LLM API | No |
| `EmbeddingRequest/Response` | OpenRouter | Embeddings API | No |
| `SearxngRoot/Result` | SearxNG | Search results | No |
| `AppJsonContext` | Internal | JSON serialization | No (generated partial) |


@@ -0,0 +1,395 @@
# OpenQueryApp Component
Deep dive into the `OpenQueryApp` class - the main application orchestrator.
## Overview
`OpenQueryApp` is the heart of OpenQuery. It coordinates all components, manages the workflow from question to answer, and handles progress reporting.
## Location
`OpenQuery.cs` in project root
## Class Definition
```csharp
public class OpenQueryApp
{
    private readonly OpenRouterClient _client;
    private readonly SearchTool _searchTool;
    private readonly string _model;

    public OpenQueryApp(
        OpenRouterClient client,
        SearchTool searchTool,
        string model);

    public async Task RunAsync(OpenQueryOptions options);
}
```
**Dependencies**:
- `OpenRouterClient` - for query generation and final answer streaming
- `SearchTool` - for search-retrieve-rank pipeline
- `string _model` - model identifier to use for LLM calls
**Lifecycle**: Instantiated once per query execution in `Program.cs`, then `RunAsync()` called once.
## RunAsync Workflow
```csharp
public async Task RunAsync(OpenQueryOptions options)
{
    // 1. Setup
    using var reporter = new StatusReporter(options.Verbose);
    reporter.StartSpinner();

    // 2. Query Generation (if needed)
    List<string> queries = await GenerateQueriesIfNeededAsync(options, reporter);

    // 3. Search Pipeline
    string searchResult = await ExecuteSearchPipelineAsync(options, queries, reporter);

    // 4. Final Answer Streaming
    await StreamFinalAnswerAsync(options, searchResult, reporter);
}
```
### Step 1: Status Reporter Setup
```csharp
using var reporter = new StatusReporter(options.Verbose);
reporter.StartSpinner();
```
- Creates `StatusReporter` (implements `IDisposable`)
- Starts spinner animation (unless verbose)
- `using` ensures disposal on exit
### Step 2: Query Generation
**When**: `options.Queries > 1` (user wants multiple search queries)
**Purpose**: Use LLM to generate diverse, optimized search queries from the original question
**System Prompt** (hardcoded in `OpenQuery.cs`):
```
You are an expert researcher. The user will ask a question. Your task is to
generate optimal search queries to gather comprehensive information.
Instructions:
1. Break down complex questions.
2. Use synonyms and alternative phrasing.
3. Target different aspects (entities, mechanisms, pros/cons, history).
CRITICAL: Output must be a valid JSON array of strings ONLY. No markdown,
explanations, or other text.
```
**Request**:
```csharp
var queryGenMessages = new List<Message>
{
    new Message("system", systemPrompt),
    new Message("user", $"Generate {options.Queries} distinct search queries for:\n{options.Question}")
};
var request = new ChatCompletionRequest(_model, queryGenMessages);
var response = await _client.CompleteAsync(request);
```
**Response Parsing**:
```csharp
var content = response.Choices.FirstOrDefault()?.Message.Content;
if (!string.IsNullOrEmpty(content))
{
    // Remove markdown code fences if present
    content = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();

    // Deserialize to List<string>
    var generatedQueries = JsonSerializer.Deserialize(content, AppJsonContext.Default.ListString);
    if (generatedQueries != null && generatedQueries.Count > 0)
    {
        queries = generatedQueries;
    }
}
```
**Fallback**: If any step fails (exception, null, empty, invalid JSON), use `new List<string> { options.Question }` (single query = original)
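The fence-stripping and fallback behaviour can be sketched independently of C#; the Python below is an illustration of the same logic (function name is hypothetical), not the project's code:

```python
import json
import re

def parse_generated_queries(content, original_question):
    """Parse an LLM JSON-array reply; fall back to the original question on any failure."""
    try:
        # Strip markdown code fences ("```json ... ```") if the model added them
        cleaned = re.sub(r"`{3}json\s*|\s*`{3}", "", content or "").strip()
        queries = json.loads(cleaned)
        if isinstance(queries, list) and queries:
            return [str(q) for q in queries]
    except (json.JSONDecodeError, TypeError):
        pass
    return [original_question]  # safe fallback: single query = original question
```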
**Note**: Query generation reuses the same model as final answer. This could be optimized:
- Use cheaper/faster model for query gen
- Separate model configuration
- Cache query generation results
### Step 3: Search Pipeline Execution
```csharp
var searchResult = await _searchTool.ExecuteAsync(
    options.Question,
    queries,
    options.Results,
    options.Chunks,
    (progress) =>
    {
        if (options.Verbose)
            reporter.WriteLine(progress);
        else
            reporter.UpdateStatus(progress);  // compact mode condenses this into a short status line
    },
    options.Verbose);
```
**Parameters**:
- `originalQuery`: User's original question (used for final embedding)
- `generatedQueries`: From step 2 (or fallback)
- `maxResults`: `options.Results` (search results per query)
- `topChunksLimit`: `options.Chunks` (top N chunks to return)
- `onProgress`: Callback to update UI
- `verbose`: Passed through to `SearchTool`
**Returns**: `string context` - formatted context with source citations
**Progress Handling**:
- In verbose mode: all progress printed as lines (via `reporter.WriteLine()`)
- In compact mode: parse progress messages to show concise status (e.g., "Fetching articles 3/10...")
### Step 4: Final Answer Streaming
**Status Update**:
```csharp
if (!options.Verbose)
    reporter.UpdateStatus("Asking AI...");
else
{
    reporter.ClearStatus();
    Console.WriteLine();
}
```
**Build System Prompt**:
```csharp
var systemPrompt = "You are a helpful AI assistant. Answer the user's question in depth, based on the provided context. Be precise and accurate. You can mention sources or citations.";
if (options.Short) systemPrompt += " Give a very short concise answer.";
if (options.Long) systemPrompt += " Give a long elaborate detailed answer.";
```
**Prompt Structure**:
```
System: {systemPrompt}
User: Context:
{searchResult}
Question: {options.Question}
```
Where `searchResult` is:
```
[Source 1: Title](URL)
Content chunk 1
[Source 2: Title](URL)
Content chunk 2
...
```
**Streaming**:
```csharp
var requestStream = new ChatCompletionRequest(_model, messages);
var assistantResponse = new StringBuilder();
var isFirstChunk = true;
using var streamCts = new CancellationTokenSource();

await foreach (var chunk in _client.StreamAsync(requestStream, streamCts.Token))
{
    if (chunk.TextDelta == null) continue;

    if (isFirstChunk)
    {
        reporter.StopSpinner();
        if (!options.Verbose) reporter.ClearStatus();
        else Console.Write("Assistant: ");
        isFirstChunk = false;
    }

    Console.Write(chunk.TextDelta);
    assistantResponse.Append(chunk.TextDelta);
}
```
**Key Points**:
- `StreamAsync` yields `StreamChunk` objects (text deltas)
- First chunk stops spinner and clears status line
- Each delta written to Console immediately (real-time feel)
- Entire response accumulated in `assistantResponse` (though not used elsewhere)
- `CancellationTokenSource` passed but not canceled (Ctrl+C would cancel from outside)
**Finally Block**:
```csharp
finally
{
    reporter.StopSpinner();
}
```
Ensures spinner stops even if streaming fails.
**End**:
```csharp
Console.WriteLine(); // Newline after complete answer
```
## Error Handling
`RunAsync` itself does not catch exceptions. All exceptions propagate to `Program.cs`:
```csharp
try
{
    var openQuery = new OpenQueryApp(client, searchTool, model);
    await openQuery.RunAsync(options);
}
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```
**Common Exceptions**:
- `HttpRequestException` - network failures, API errors
- `JsonException` - malformed JSON from API
- `TaskCanceledException` - timeout or user interrupt
- `Exception` - anything else
**No Retries at This Level**: Fail fast; user sees error immediately. Lower-level retries exist (embedding service).
## Performance Characteristics
**Query Generation**:
- One non-streaming LLM call
- Takes 2-5 seconds depending on model
- Typically <1000 tokens
**Search Pipeline** (`SearchTool.ExecuteAsync`):
- See `SearchTool.md` for detailed timing breakdown
- Total 10-30 seconds typically
**Final Answer Streaming**:
- Streaming LLM call
- Time depends on answer length (typically 5-20 seconds)
- User sees words appear progressively
**Total End-to-End**: 15-50 seconds for typical query
## Design Decisions
### Why Not Stream Query Generation?
Query generation currently uses `CompleteAsync` (non-streaming). Could be streamed but:
- Queries are short (JSON array)
- Streaming offers no UX benefit (user doesn't see intermediate queries)
- Simpler to wait for all queries before proceeding
### Why Build Prompt Manually Instead of Templates?
Simple string concatenation is fine for few prompts. Pros:
- No template dependencies
- Easy to read and modify
- No runtime compilation overhead
Cons:
- No validation
- Could benefit from prompt engineering framework
### Why Accumulate `assistantResponse` StringBuilder?
Currently built but not used. Could be:
- Saved to file (future feature: `--output file.md`)
- Analyzed for token counting
- Removed if not needed
### Could Query Generation Be Cached?
Yes! For repeated questions (common in scripts), cache query results:
- `Dictionary<string, List<string>>` cache in memory
- Or persistent cache (Redis, file)
- Not implemented (low priority)
### Single Responsibility Violation?
`OpenQueryApp` does:
- Query generation
- Pipeline orchestration
- Answer streaming
That's 3 responsibilities, but they're tightly coupled to the "query → answer" workflow. Separating them would add complexity without clear benefit. Acceptable as "application coordinator".
## Extension Points
### Adding New Model for Query Generation
Currently uses same `_model` for queries and answer. To use different models:
1. Add `queryGenerationModel` parameter to constructor
2. Use it for query gen: `new ChatCompletionRequest(queryGenerationModel, queryGenMessages)`
3. Keep `_model` for final answer
Or make it configurable via environment variable: `OPENROUTER_QUERY_MODEL`
### Post-Processing Answer
Opportunity to add:
- Source citation formatting (footnotes, clickable links)
- Answer summarization
- Export to Markdown/JSON
- Text-to-speech
Add after streaming loop, before final newline.
### Progress UI Enhancement
Current `StatusReporter` is basic. Could add:
- Progress bar with percentage
- ETA calculation
- Colors (ANSI) for different message types
- Logging to file
- Web dashboard
Would require extending `StatusReporter` or replacing it.
## Testing Considerations
**Challenges**:
- `RunAsync` is one long workflow method (hard to unit test in isolation)
- Depends on many services (need mocks)
- Asynchronous and streaming
**Recommended Approach**:
1. Extract interfaces:
- `ISearchTool` (wrapper around `SearchTool`)
- `IOpenRouterClient` (wrapper around `OpenRouterClient`)
2. Mock interfaces in tests
3. Test query generation parsing separately
4. Test progress callback counting
5. Test final answer prompt construction
**Integration Tests**:
- End-to-end with real/mocked APIs
- Automated tests with test SearxNG/OpenRouter instances
## Related Components
- **[SearchTool](search-tool.md)** - pipeline executed by `OpenQueryApp`
- **[Program.cs](../Program.md)** - creates `OpenQueryApp`
- **[StatusReporter](../services/StatusReporter.md)** - progress UI used by `OpenQueryApp`
---
## Next Steps
- [SearchTool](search-tool.md) - See the pipeline in detail
- [Services](../services/overview.md) - Understand each service
- [CLI Reference](../../api/cli.md) - How users invoke this

docs/components/overview.md Normal file

@@ -0,0 +1,603 @@
# Components Overview
Detailed documentation for each major component in the OpenQuery system.
## 📋 Table of Contents
1. [Component Hierarchy](#component-hierarchy)
2. [Core Components](#core-components)
3. [Services](#services)
4. [Data Models](#data-models)
5. [Component Interactions](#component-interactions)
## Component Hierarchy
```
OpenQuery/
├── Program.cs               [Entry Point, CLI]
├── OpenQuery.cs             [OpenQueryApp - Orchestrator]
├── Tools/
│   └── SearchTool.cs        [Pipeline Orchestration]
├── Services/
│   ├── OpenRouterClient.cs  [LLM & Embedding API]
│   ├── SearxngClient.cs     [Search API]
│   ├── EmbeddingService.cs  [Embedding Generation + Math]
│   ├── ChunkingService.cs   [Text Splitting]
│   ├── ArticleService.cs    [Content Extraction]
│   ├── RateLimiter.cs       [Concurrency Control]
│   └── StatusReporter.cs    [Progress Display]
├── Models/
│   ├── OpenQueryOptions.cs  [CLI Options Record]
│   ├── Chunk.cs             [Content + Metadata]
│   ├── ParallelOptions.cs   [Concurrency Settings]
│   ├── OpenRouter.cs        [API DTOs]
│   ├── Searxng.cs           [Search Result DTOs]
│   └── JsonContexts.cs      [JSON Context]
└── ConfigManager.cs         [Configuration Persistence]
```
## Core Components
### 1. Program.cs
**Type**: Console Application Entry Point
**Responsibilities**: CLI parsing, dependency wiring, error handling
**Key Elements**:
- `RootCommand` from System.CommandLine
- Options: `--chunks`, `--results`, `--queries`, `--short`, `--long`, `--verbose`
- Subcommand: `configure` (with interactive mode)
- Configuration loading via `ConfigManager.Load()`
- Environment variable resolution
- Service instantiation and coordination
- Top-level try-catch for error reporting
**Code Flow**:
1. Load config file
2. Define CLI options and commands
3. Set handler for root command
4. Handler: resolve API key/model → instantiate services → call `OpenQueryApp.RunAsync()`
5. Set handler for configure command (writes config file)
6. Invoke command parser: `await rootCommand.InvokeAsync(args)`
**Exit Codes**:
- 0 = success
- 1 = error
### 2. OpenQueryApp (OpenQuery.cs)
**Type**: Main Application Class
**Responsibilities**: Workflow orchestration, query generation, answer streaming
**Constructor Parameters**:
- `OpenRouterClient client` - for query gen and final answer
- `SearchTool searchTool` - for search-retrieve-rank pipeline
- `string model` - LLM model identifier
**Main Method**: `RunAsync(OpenQueryOptions options)`
**Workflow Steps**:
1. Create `StatusReporter` (for progress UI)
2. **Optional Query Generation** (if `options.Queries > 1`):
- Create system message instructing JSON array output
- Create user message with `options.Question`
- Call `client.CompleteAsync()` with query gen model
- Parse JSON response; fall back to original question on failure
- Result: `List<string> queries` (1 or many)
3. **Execute Search Pipeline**:
- Call `_searchTool.ExecuteAsync()` with queries, options
- Receive `string context` (formatted context with source citations)
- Progress reported via callback to `StatusReporter`
4. **Generate Final Answer**:
- Build system prompt (append "short" or "long" modifier)
- Create user message with `Context:\n{context}\n\nQuestion: {options.Question}`
- Stream answer via `client.StreamAsync()`
- Write each `chunk.TextDelta` to Console as it arrives
- Stop spinner on first chunk, continue streaming
5. Dispose reporter
**Error Handling**:
- Exceptions propagate to `Program.cs` top-level handler
- `HttpRequestException` vs generic `Exception`
**Note**: Query generation uses the same model as final answer; could be separated for cost/performance.
### 3. SearchTool (Tools/SearchTool.cs)
**Type**: Pipeline Orchestrator
**Responsibilities**: Execute 4-phase search-retrieve-rank-return workflow
**Constructor Parameters**:
- `SearxngClient searxngClient`
- `EmbeddingService embeddingService`
**Main Method**: `ExecuteAsync(originalQuery, generatedQueries, maxResults, topChunksLimit, onProgress, verbose)`
**Returns**: `Task<string>` - formatted context string with source citations
**Pipeline Phases**:
#### Phase 1: ExecuteParallelSearchesAsync
- Parallelize `searxngClient.SearchAsync(query, maxResults)` for each query
- Collect all results in `ConcurrentBag<SearxngResult>`
- Deduplicate by `DistinctBy(r => r.Url)`
**Output**: `List<SearxngResult>` (aggregated, unique)
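`DistinctBy(r => r.Url)` keeps the first result seen per URL, preserving order. The equivalent, sketched in Python for illustration (dict keys are stand-ins for the `SearxngResult` properties):

```python
def dedupe_by_url(results):
    # Keep the first result per URL, preserving order (mirrors LINQ DistinctBy)
    seen = set()
    unique = []
    for r in results:
        if r["url"] not in seen:
            seen.add(r["url"])
            unique.append(r)
    return unique
```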
#### Phase 2: ExecuteParallelArticleFetchingAsync
- Semaphore: `MaxConcurrentArticleFetches` (default 10)
- For each `SearxngResult`: fetch URL via `ArticleService.FetchArticleAsync()`
- Extract article text, title
- Chunk via `ChunkingService.ChunkText(article.TextContent)`
- Add each chunk as new `Chunk(content, url, title)`
**Output**: `List<Chunk>` (potentially 50-100 chunks)
#### Phase 3: ExecuteParallelEmbeddingsAsync
- Start two parallel tasks:
1. Query embedding: `embeddingService.GetEmbeddingAsync(originalQuery)`
2. Chunk embeddings: `embeddingService.GetEmbeddingsWithRateLimitAsync(chunkTexts, onProgress)`
- `Parallel.ForEachAsync` with `MaxConcurrentEmbeddingRequests` (default 4)
- Batch size: 300 chunks per embedding API call
- Filter chunks with empty embeddings (failed batches)
**Output**: `(float[] queryEmbedding, float[][] chunkEmbeddings)`
#### Phase 4: RankAndSelectTopChunks
- Calculate cosine similarity for each chunk vs query
- Assign `chunk.Score`
- Order by descending score
- Take `topChunksLimit` (from `--chunks` option)
- Return `List<Chunk>` (top N)
**Formatting**:
```csharp
string context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
```
**Progress Callbacks**: Invoked at each major step for UI feedback
## Services
### OpenRouterClient
**Purpose**: HTTP client for OpenRouter API (chat completions + embeddings)
**Base URL**: `https://openrouter.ai/api/v1`
**Authentication**: `Authorization: Bearer {apiKey}`
**Methods**:
#### `StreamAsync(ChatCompletionRequest request, CancellationToken)`
- Sets `request.Stream = true`
- POST to `/chat/completions`
- Reads SSE stream line-by-line
- Parses `data: {json}` chunks
- Yields `StreamChunk` (text delta or tool call)
- Supports cancellation
#### `CompleteAsync(ChatCompletionRequest request)`
- Sets `request.Stream = false`
- POST to `/chat/completions`
- Deserializes full response
- Returns `ChatCompletionResponse`
#### `EmbedAsync(string model, List<string> inputs)`
- POST to `/embeddings`
- Returns `float[][]` (ordered by input index)
**Error Handling**: `EnsureSuccessStatusCode()` throws `HttpRequestException` on failure
**Design**: Thin wrapper; no retry logic (delegated to EmbeddingService)
### SearxngClient
**Purpose**: HTTP client for SearxNG metasearch
**Base URL**: Configurable (default `http://localhost:8002`)
**Methods**:
#### `SearchAsync(string query, int limit = 10)`
- GET `{baseUrl}/search?q={query}&format=json`
- Deserializes to `SearxngRoot`
- Returns `Results.Take(limit).ToList()`
- On failure: returns empty `List<SearxngResult>` (no exception)
**Design**: Very simple; failures are tolerated (OpenQuery continues with other queries)
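The request shape is concrete enough to sketch. This hypothetical helper builds the URL described above, with `Uri.EscapeDataString` handling query encoding (the real client's internals may differ):

```csharp
using System;

// Sketch of the URL the client issues; base URL and parameters as documented above.
string BuildSearchUrl(string baseUrl, string query) =>
    $"{baseUrl.TrimEnd('/')}/search?q={Uri.EscapeDataString(query)}&format=json";

Console.WriteLine(BuildSearchUrl("http://localhost:8002", "openquery rag"));
// http://localhost:8002/search?q=openquery%20rag&format=json
```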
### EmbeddingService
**Purpose**: Batch embedding generation with rate limiting, parallelization, and retries
**Configuration** (from `ParallelProcessingOptions`):
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300
**Default Embedding Model**: `openai/text-embedding-3-small`
**Methods**:
#### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallelizes batches with `Parallel.ForEachAsync` + `MaxConcurrentEmbeddingRequests`
- Each batch: rate-limited + retry-wrapped `client.EmbedAsync(model, batch)`
- Collects results in order (by batch index)
- Returns `float[][]` (same order as input texts)
- Failed batches return empty `float[]` for each text
#### `GetEmbeddingAsync(string text, CancellationToken)`
- Wraps single-text call in rate limiter + retry
- Returns `float[]`
#### `CosineSimilarity(float[] v1, float[] v2)`
- Static method using `TensorPrimitives.CosineSimilarity`
- Returns float between -1 and 1 (typically 0-1 for normalized embeddings)
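`TensorPrimitives.CosineSimilarity` computes the standard dot-product-over-magnitudes formula; a plain (non-SIMD) sketch for reference:

```csharp
using System;

// Cosine similarity without SIMD: dot(a, b) / (|a| * |b|).
float Cosine(float[] a, float[] b)
{
    float dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}

Console.WriteLine(Cosine(new float[] { 1, 2 }, new float[] { 2, 4 })); // 1 (parallel vectors)
Console.WriteLine(Cosine(new float[] { 1, 0 }, new float[] { 0, 1 })); // 0 (orthogonal)
```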
**Retry Policy** (Polly):
- Max 3 attempts
- 1s base delay, exponential backoff
- Only `HttpRequestException`
**Rate Limiting**: `RateLimiter` semaphore with `MaxConcurrentEmbeddingRequests`
**Design Notes**:
- Two similar methods (`GetEmbeddingsAsync` and `GetEmbeddingsWithRateLimitAsync`) - could be consolidated
- Uses Polly for resilience (good pattern)
- Concurrency control prevents overwhelming OpenRouter
### ChunkingService
**Purpose**: Split long text into manageable pieces
**Static Class** (no dependencies, pure function)
**Algorithm** (in `ChunkText(string text)`):
- Constant `MAX_CHUNK_SIZE = 500`
- While remaining text:
- Take up to 500 chars
- If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']`
- Trim and add non-empty chunk
- Advance start position
**Rationale**: 500 chars is a sweet spot for embeddings - long enough for context, short enough for semantic coherence.
**Edge Cases**: Handles text shorter than 500 chars, empty text, text with no natural breaks.
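The loop described above can be sketched directly; this is a hypothetical reconstruction of the algorithm as documented (the real `ChunkingService` may differ in details such as the exact break-character handling):

```csharp
using System;
using System.Collections.Generic;

const int MaxChunkSize = 500;
char[] breakChars = { ' ', '\n', '\r', '.', '!' };

List<string> ChunkText(string text)
{
    var chunks = new List<string>();
    var start = 0;
    while (start < text.Length)
    {
        var length = Math.Min(MaxChunkSize, text.Length - start);
        // Not at the end: backtrack to the last natural break so words stay intact.
        if (start + length < text.Length)
        {
            var lastBreak = text.LastIndexOfAny(breakChars, start + length - 1, length);
            if (lastBreak > start)
                length = lastBreak - start + 1;
        }
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0)
            chunks.Add(chunk);
        start += length;
    }
    return chunks;
}

Console.WriteLine(ChunkText(new string('x', 1200)).Count); // 3 chunks: 500 + 500 + 200
```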
### ArticleService
**Purpose**: Extract clean article content from URLs
**Method**: `FetchArticleAsync(string url)`
**Implementation**: Delegates to `SmartReader.ParseArticleAsync(url)`
**Returns**: `Article` object (from SmartReader)
- `Title` (string)
- `TextContent` (string) - cleaned article body
- `IsReadable` (bool) - quality indicator
- Other metadata (author, date, etc.)
**Error Handling**: Exceptions propagate (handled by `SearchTool`)
**Design**: Thin wrapper around third-party library. Could be extended to add caching, custom extraction rules, etc.
### RateLimiter
**Purpose**: Limit concurrent operations via semaphore
**Interface**:
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken);
public async Task ExecuteAsync(Func<Task> action, CancellationToken);
```
**Implementation**: `SemaphoreSlim` with `WaitAsync` and `Release`
**Disposal**: `IAsyncDisposable` (awaits semaphore disposal)
**Usage**: Wrap API calls that need concurrency control
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
    await _client.EmbedAsync(model, batch), cancellationToken);
```
**Design**: Simple, reusable. Could be replaced with `Polly.RateLimiting` policy but this is lightweight.
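The pattern is small enough to demonstrate end-to-end. This top-level sketch uses a bare `SemaphoreSlim` (the real `RateLimiter` wraps the same idea in an `IAsyncDisposable` class); the counter confirms concurrency never exceeds the limit:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var semaphore = new SemaphoreSlim(2); // allow at most 2 concurrent operations
var gate = new object();
int current = 0, peak = 0;

async Task ExecuteAsync(Func<Task> action)
{
    await semaphore.WaitAsync();
    try { await action(); }
    finally { semaphore.Release(); }
}

var tasks = new Task[8];
for (var i = 0; i < 8; i++)
    tasks[i] = ExecuteAsync(async () =>
    {
        lock (gate) { current++; peak = Math.Max(peak, current); }
        await Task.Delay(25); // simulate an API call
        lock (gate) { current--; }
    });
await Task.WhenAll(tasks);

Console.WriteLine($"Peak concurrency: {peak}"); // never more than 2
```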
### StatusReporter
**Purpose**: Real-time progress UI with spinner and verbose modes
**Architecture**:
- Producer: UpdateStatus(text) → writes to `Channel<string>`
- Consumer: Background task `ProcessStatusUpdatesAsync()` reads from channel
- Spinner: Separate task animates Braille characters every 100ms
**Modes**:
**Verbose Mode** (`_verbose = true`):
- All progress messages written as `Console.WriteLine()`
- No spinner
- Full audit trail
**Compact Mode** (default):
- Status line with spinner (overwrites same line)
- Only latest status visible
- Example: `⠋ Fetching articles 3/10...`
**Key Methods**:
- `UpdateStatus(message)` - fire-and-forget, non-blocking
- `WriteLine(text)` - stops spinner temporarily, writes full line
- `StartSpinner()` / `StopSpinner()` - manual control
- `ClearStatus()` - ANSI escape `\r\x1b[K` to clear line
- `Dispose()` - completes channel, waits for background tasks
**Spinner Chars**: `['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']` (Braille patterns, smooth animation)
**ANSI Codes**: `\r` (carriage return), `\x1b[K` (erase to end of line)
**Thread Safety**: Channel is thread-safe; multiple components can write concurrently without locks
**Design**: Well-encapsulated; could be reused in other CLI projects.
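The producer/consumer core reduces to a bounded channel; a minimal, names-are-illustrative sketch (the real `StatusReporter` layers the spinner animation and ANSI clearing on top of this):

```csharp
using System;
using System.Threading.Channels;

// Bounded channel: producers fire-and-forget; DropWrite sheds updates under load.
var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(100)
{
    FullMode = BoundedChannelFullMode.DropWrite
});

void UpdateStatus(string message) => channel.Writer.TryWrite(message);

UpdateStatus("Searching web...");
UpdateStatus("Fetching articles 3/10...");
channel.Writer.Complete();

// Consumer (compact mode): only the latest status line is shown.
string? last = null;
await foreach (var msg in channel.Reader.ReadAllAsync())
    last = msg;
Console.WriteLine(last); // Fetching articles 3/10...
```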
### ConfigManager
**Purpose**: Load/save configuration from XDG-compliant location
**Config Path**:
- `Environment.SpecialFolder.UserProfile` → `~/.config/openquery/config`
**Schema** (`AppConfig`):
```csharp
public class AppConfig
{
    public string ApiKey { get; set; } = "";
    public string Model { get; set; } = "qwen/qwen3.5-flash-02-23";
    public int DefaultQueries { get; set; } = 3;
    public int DefaultChunks { get; set; } = 3;
    public int DefaultResults { get; set; } = 5;
}
```
**Format**: Simple `key=value` (no INI parser, manual line split)
**Methods**:
- `Load()` → reads file if exists, returns `AppConfig` (with defaults)
- `Save(AppConfig)` → writes all 5 keys, overwrites existing
**Design**:
- Static class (no instances)
- Creates directory if missing
- No validation (writes whatever values given)
- Could be improved with JSON format (but keep simple)
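Since the format is plain `key=value` lines, loading amounts to a manual line split; a hypothetical sketch of the parse step (the real `ConfigManager` may handle edge cases differently):

```csharp
using System;
using System.Collections.Generic;

// Parses "key=value" lines; blank or malformed lines are skipped.
Dictionary<string, string> Parse(string fileContents)
{
    var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var line in fileContents.Split('\n'))
    {
        var idx = line.IndexOf('=');
        if (idx <= 0) continue;
        values[line[..idx].Trim()] = line[(idx + 1)..].Trim();
    }
    return values;
}

var config = Parse("ApiKey=sk-xxxx\nModel=some/model\nDefaultQueries=3\n");
Console.WriteLine(config["DefaultQueries"]); // 3
```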
## Data Models
### OpenQueryOptions
**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object passed through workflow
**Properties**:
- `int Chunks` - top N chunks for context
- `int Results` - search results per query
- `int Queries` - number of expanded queries to generate
- `bool Short` - concise answer flag
- `bool Long` - detailed answer flag
- `bool Verbose` - verbose logging flag
- `string Question` - original user question
**Created**: In `Program.cs` from CLI options + config defaults
**Used By**: `OpenQueryApp.RunAsync()`
### Chunk
**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata and embedding
**Properties**:
- `string Content` - extracted text (~500 chars)
- `string SourceUrl` - article URL
- `string? Title` - article title (nullable)
- `float[]? Embedding` - vector embedding (populated by EmbeddingService)
- `float Score` - relevance score (populated during ranking)
**Lifecycle**:
1. Instantiated in `SearchTool.ExecuteParallelArticleFetchingAsync` with content, url, title
2. `Embedding` set in `ExecuteParallelEmbeddingsAsync` after batch processing
3. `Score` set in `RankAndSelectTopChunks` after cosine similarity
4. Serialized into context string for final answer
**Equality**: Records provide value equality (based on all properties)
### ParallelProcessingOptions
**Location**: `Models/ParallelOptions.cs`
**Type**: `class` (mutable)
**Purpose**: Concurrency settings for parallel operations
**Properties** (with defaults):
- `MaxConcurrentArticleFetches` = 10
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300
**Used By**: `EmbeddingService` (for embeddings), `SearchTool` (for article fetching)
**Currently**: Hardcoded in `SearchTool` constructor; could be made configurable
### OpenRouter Models (Models/OpenRouter.cs)
**Purpose**: DTOs for OpenRouter API (JSON serializable)
**Chat Completion**:
- `ChatCompletionRequest` (model, messages, tools, stream)
- `ChatCompletionResponse` (choices[], usage)
- `Message` (role, content, tool_calls, tool_call_id)
- `ToolDefinition`, `ToolFunction`, `ToolCall`, `FunctionCall`
- `Choice`, `Usage`
**Embedding**:
- `EmbeddingRequest` (model, input[])
- `EmbeddingResponse` (data[], usage)
- `EmbeddingData` (embedding[], index)
**Streaming**:
- `StreamChunk` (TextDelta, Tool)
- `ChatCompletionChunk`, `ChunkChoice`, `ChunkDelta`
**JSON Properties**: Uses `[JsonPropertyName]` to match API
**Serialization**: System.Text.Json with source generation (AppJsonContext)
### Searxng Models (Models/Searxng.cs)
**Purpose**: DTOs for SearxNG search results
**Records**:
- `SearxngRoot` with `List<SearxngResult> Results`
- `SearxngResult` with `Title`, `Url`, `Content` (snippet)
**Usage**: Deserialized from SearxNG's JSON response
### JsonContexts
**Location**: `Models/JsonContexts.cs`
**Purpose**: Source-generated JSON serializer context for AOT compatibility
**Pattern**:
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
// ...additional [JsonSerializable] attributes for every serialized type...
internal partial class AppJsonContext : JsonSerializerContext
{
}
```
**Generated**: Partial class compiled by source generator
**Used By**: All `JsonSerializer.Serialize/Deserialize` calls with `AppJsonContext.Default.{Type}`
**Benefits**:
- AOT-compatible (no reflection)
- Faster serialization (compiled delegates)
- Smaller binary (trimming-safe)
## Component Interactions
### Dependencies Graph
```
Program.cs
├── ConfigManager (load/save)
├── OpenRouterClient ──┐
├── SearxngClient ─────┤
├── EmbeddingService ──┤
└── SearchTool ────────┤
OpenQueryApp ◄──────────┘
├── OpenRouterClient (query gen + answer streaming)
├── SearchTool (pipeline)
│ ├── SearxngClient (searches)
│ ├── ArticleService (fetch)
│ ├── ChunkingService (split)
│ ├── EmbeddingService (embeddings)
│ ├── RateLimiter (concurrency)
│ └── StatusReporter (progress via callback)
└── StatusReporter (UI)
```
### Data Flow Between Components
```
OpenQueryOptions
OpenQueryApp
├─ Query Generation
│ └─ OpenRouterClient.CompleteAsync()
│ → List<string> generatedQueries
├─ Search Pipeline
│ └─ SearchTool.ExecuteAsync(originalQuery, generatedQueries, ...)
│ ↓
│ Phase 1: SearxngClient.SearchAsync(query) × N
│ → ConcurrentBag<SearxngResult>
│ → List<SearxngResult> (unique)
│ ↓
│ Phase 2: ArticleService.FetchArticleAsync(url) × M
│ → ChunkingService.ChunkText(article.TextContent)
│ → ConcurrentBag<Chunk> (content, url, title)
│ ↓
│ Phase 3: EmbeddingService.GetEmbeddingsAsync(chunkContents)
│ → (queryEmbedding, chunkEmbeddings)
│ ↓
│ Phase 4: CosineSimilarity + Rank
│ → List<Chunk> topChunks (with Score, Embedding set)
│ ↓
│ Format: context string with [Source N: Title](Url)
│ → return context string
└─ Final Answer
└─ OpenRouterClient.StreamAsync(prompt with context)
→ stream deltas to Console
```
### Interface Contracts
**SearchTool → Progress**:
```csharp
// Invoked as: onProgress?.Invoke("[Fetching article 1/10: example.com]")
Action<string>? onProgress
```
**StatusReporter ← Progress**:
```csharp
// Handler in OpenQueryApp:
(progress) => {
    if (options.Verbose) reporter.WriteLine(progress);
    else reporter.UpdateStatus(parsedShorterMessage);
}
```
**SearchTool → ArticleService**:
```csharp
Article article = await ArticleService.FetchArticleAsync(url);
```
**SearchTool → EmbeddingService**:
```csharp
(float[] queryEmbedding, float[][] chunkEmbeddings) = await ExecuteParallelEmbeddingsAsync(...);
// Also: embeddingService.GetEmbeddingAsync(text), GetEmbeddingsWithRateLimitAsync(...)
```
**SearchTool → ChunkingService**:
```csharp
List<string> chunks = ChunkingService.ChunkText(article.TextContent);
```
**SearchTool → RateLimiter**:
```csharp
await _rateLimiter.ExecuteAsync(async () => await _client.EmbedAsync(...), ct);
```
---
## Next Steps
- [OpenQueryApp](openquery-app.md) - Main orchestrator details
- [SearchTool](search-tool.md) - Pipeline implementation
- [Services](services.md) - All service classes documented
- [Models](models.md) - Complete data model reference

# SearchTool Component
Deep dive into `SearchTool` - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.
## Overview
`SearchTool` is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks by relevance, and returns formatted context for the final AI answer.
## Location
`Tools/SearchTool.cs`
## Class Definition
```csharp
public class SearchTool
{
    private readonly SearxngClient _searxngClient;
    private readonly EmbeddingService _embeddingService;
    private readonly ParallelProcessingOptions _options;

    public static string Name => "search";
    public static string Description => "Search the web for information on a topic";

    public SearchTool(
        SearxngClient searxngClient,
        EmbeddingService embeddingService);

    public Task<string> ExecuteAsync(
        string originalQuery,
        List<string> generatedQueries,
        int maxResults,
        int topChunksLimit,
        Action<string>? onProgress = null,
        bool verbose = true);
}
```
**Dependencies**:
- `SearxngClient` - for web searches
- `EmbeddingService` - for vector generation
- `ParallelProcessingOptions` - concurrency settings (hardcoded new instance)
**Static Properties**:
- `Name` - tool identifier (currently "search")
- `Description` - tool description
## ExecuteAsync Method
**Signature**:
```csharp
public async Task<string> ExecuteAsync(
    string originalQuery,           // User's original question
    List<string> generatedQueries,  // Expanded search queries
    int maxResults,                 // Results per query
    int topChunksLimit,             // Top N chunks to return
    Action<string>? onProgress,     // Progress callback
    bool verbose)                   // Verbose mode flag
**Returns**: `Task<string>` - formatted context with source citations
**Contract**:
- Never returns `null` (returns "No search results found." on zero results)
- Progress callback may be invoked frequently (many phases)
- `verbose` passed to sub-components for their own logging
## The 4-Phase Pipeline
```
ExecuteAsync()
├─ Phase 1: ExecuteParallelSearchesAsync
│ Input: generatedQueries × maxResults
│ Output: List<SearxngResult> (deduplicated)
├─ Phase 2: ExecuteParallelArticleFetchingAsync
│ Input: List<SearxngResult>
│ Output: List<Chunk> (with content, url, title)
├─ Phase 3: ExecuteParallelEmbeddingsAsync
│ Input: originalQuery + List<Chunk>
│ Output: (queryEmbedding, chunkEmbeddings)
│ (also sets Chunk.Embedding for valid chunks)
├─ Phase 4: RankAndSelectTopChunks
│ Input: List<Chunk> + queryEmbedding + chunkEmbeddings
│ Output: List<Chunk> topChunks (with Score set)
└─ Format Context → return string
```
### Phase 1: ExecuteParallelSearchesAsync
**Purpose**: Execute all search queries in parallel, collect and deduplicate results.
**Implementation**:
```csharp
var allResults = new ConcurrentBag<SearxngResult>();
var searchTasks = generatedQueries.Select(async query =>
{
    onProgress?.Invoke($"[Searching web for '{query}'...]");
    try
    {
        var results = await _searxngClient.SearchAsync(query, maxResults);
        foreach (var result in results)
        {
            allResults.Add(result);
        }
    }
    catch (Exception ex)
    {
        if (verbose)
            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
    }
});
await Task.WhenAll(searchTasks);
var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
return uniqueResults;
```
**Details**:
- `ConcurrentBag<SearxngResult>` collects results thread-safely
- `Task.WhenAll` - unbounded parallelism (parallel to `generatedQueries.Count`)
- Each task: calls `_searxngClient.SearchAsync(query, maxResults)`
- Errors caught and logged (verbose only); other queries continue
- `DistinctBy(r => r.Url)` removes duplicates
**Return**: `List<SearxngResult>` (unique URLs only)
**Progress**: `[Searching web for '{query}'...]`
**Potential Issues**:
- Could overwhelm local SearxNG if `generatedQueries` is large (100+)
- SearxNG itself may have its own rate limiting
**Future Enhancement**:
- Add semaphore to limit search concurrency
- Add timeout per search task
- Cache search results (same query across runs)
### Phase 2: ExecuteParallelArticleFetchingAsync
**Purpose**: Fetch each search result URL, extract article content, split into chunks.
**Implementation**:
```csharp
var chunks = new ConcurrentBag<Chunk>();
var completedFetches = 0;
var totalFetches = searchResults.Count;
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10

var fetchTasks = searchResults.Select(async result =>
{
    await semaphore.WaitAsync();
    try
    {
        var current = Interlocked.Increment(ref completedFetches);
        var uri = new Uri(result.Url);
        var domain = uri.Host;
        onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]");
        try
        {
            var article = await ArticleService.FetchArticleAsync(result.Url);
            if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent))
                return;
            var textChunks = ChunkingService.ChunkText(article.TextContent);
            foreach (var chunkText in textChunks)
            {
                chunks.Add(new Chunk(chunkText, result.Url, article.Title));
            }
        }
        catch (Exception ex)
        {
            if (verbose)
                Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}");
        }
    }
    finally
    {
        semaphore.Release();
    }
});
await Task.WhenAll(fetchTasks);
return chunks.ToList();
```
**Details**:
- `SemaphoreSlim` limits concurrency to `MaxConcurrentArticleFetches` (10)
- `Interlocked.Increment` for thread-safe progress counting
- Progress: `[Fetching article X/Y: domain]` (extracts host from URL)
- `ArticleService.FetchArticleAsync` uses SmartReader
- Article must be `IsReadable` and have `TextContent`
- `ChunkingService.ChunkText` splits into ~500-char pieces
- Each chunk becomes a `Chunk(content, url, article.Title)`
- Errors logged (verbose only); failed URLs yield no chunks
**Return**: `List<Chunk>` (potentially many per article)
**Chunk Count Estimate**:
- 15 articles × average 3000 chars/article = 45,000 chars
- With 500-char chunks ≈ 90 chunks
- With natural breaks → maybe 70-80 chunks
**Potential Issues**:
- Some sites block SmartReader (JS-heavy, paywalls)
- Slow article fetches may cause long tail latency
- Large articles create many chunks → memory + embedding cost
**Future Enhancements**:
- Add per-URL timeout
- Filter chunks by length threshold (skip tiny chunks)
- Deduplicate chunks across articles (same content on different sites)
- Cache article fetches by URL
### Phase 3: ExecuteParallelEmbeddingsAsync
**Purpose**: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.
**Implementation**:
```csharp
onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");

// Start query embedding (single) and chunk embeddings (batch) concurrently
var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
    chunkTexts, onProgress);

await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);
var queryEmbedding = await queryEmbeddingTask;
var chunkEmbeddings = await chunkEmbeddingsTask;

// Filter out chunks with empty embeddings
var validChunks = new List<Chunk>();
var validEmbeddings = new List<float[]>();
for (var i = 0; i < chunks.Count; i++)
{
    if (chunkEmbeddings[i].Length > 0)
    {
        validChunks.Add(chunks[i]);
        validEmbeddings.Add(chunkEmbeddings[i]);
    }
}

// Update chunks with embeddings
for (var i = 0; i < validChunks.Count; i++)
{
    validChunks[i].Embedding = validEmbeddings[i];
}

return (queryEmbedding, validEmbeddings.ToArray());
```
**Details**:
- **Query embedding**: Single request for original question (one embedding)
- **Chunk embeddings**: Batch processing of all chunk texts
- Both run concurrently via `Task.WhenAll`
- `_embeddingService.GetEmbeddingsWithRateLimitAsync` uses:
- Batch size: 300 (default)
- Max concurrent batches: 4 (default)
- Polly retry (3 attempts, exponential backoff)
- `RateLimiter` (semaphore) for API concurrency
- Failed batches return empty `float[]` (length 0)
- Filters out failed chunks (won't be ranked)
- `validChunks[i].Embedding = validEmbeddings[i]` attaches embedding to chunk
**Return**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` where:
- `chunkEmbeddings` length = `validChunks.Count` (filtered)
- Order matches `validChunks` order (since we filtered parallel arrays)
**Progress**: Interleaved from embedding service's own progress callbacks (batch X/Y)
**Potential Issues**:
- `GetEmbeddingsWithRateLimitAsync` writes `results[batchIndex] = ...` from parallel tasks. This is safe: each task writes a distinct array index, so the writes never overlap.
- The filtering loop assumes `chunkEmbeddings` has the same count as `chunks`. `GetEmbeddingsWithRateLimitAsync` returns `results.SelectMany(r => r).ToArray()`, which preserves the input count (failed batches contribute an empty array per text), so no out-of-range indexing occurs.
**Memory Consideration**:
- `chunkTexts` list holds all chunk strings (may be large, but still in memory)
- `chunkEmbeddings` holds all float arrays (600KB for 100 chunks)
- Total: modest (~few MB)
**Future Enhancements**:
- Stream embeddings? (No benefit, need all for ranking)
- Cache embeddings by content hash (cross-run)
- Support different embedding model per query
### Phase 4: RankAndSelectTopChunks
**Purpose**: Score chunks by semantic relevance to query, sort, and select top N.
**Implementation**:
```csharp
var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList();
foreach (var chunk in chunksWithEmbeddings)
{
    chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
}

var topChunks = chunksWithEmbeddings
    .OrderByDescending(c => c.Score)
    .Take(topChunksLimit)
    .ToList();

return topChunks;
```
**Details**:
- Filters to chunks that have embeddings (successful phase 3)
- For each: `Score = CosineSimilarity(queryEmbedding, chunkEmbedding)`
- Uses `TensorPrimitives.CosineSimilarity` (SIMD-accelerated)
- Returns float typically 0-1 (higher = more relevant)
- `OrderByDescending` - highest scores first
- `Take(topChunksLimit)` - select top N (from `--chunks` option)
- Returns `List<Chunk>` (now with `Score` set)
**Return**: Top N chunks ready for context formatting
**Complexity**:
- O(n) for scoring (where n = valid chunks, typically 50-100)
- O(n log n) for sorting (fast for n=100)
- Negligible CPU time
**Edge Cases**:
- If `topChunksLimit` > `chunksWithEmbeddings.Count`, returns all (no padding)
- If all embeddings failed, returns empty list
- Should handle `topChunksLimit == 0` (returns empty)
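Tying scoring and selection together, the whole phase is score → sort → take. A self-contained toy run over hypothetical 3-dimensional vectors (real embeddings have ~1536 dimensions):

```csharp
using System;
using System.Linq;

float Cosine(float[] a, float[] b)
{
    float dot = 0, ma = 0, mb = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        ma  += a[i] * a[i];
        mb  += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(ma) * MathF.Sqrt(mb));
}

var queryEmbedding = new float[] { 1f, 0f, 0f };
var chunks = new (string Content, float[] Embedding)[]
{
    ("chunk about the query topic", new float[] { 0.9f, 0.1f, 0.0f }),
    ("unrelated chunk",             new float[] { 0.1f, 0.9f, 0.0f }),
    ("partially related chunk",     new float[] { 0.5f, 0.5f, 0.0f }),
};

var topChunks = chunks
    .Select(c => (c.Content, Score: Cosine(queryEmbedding, c.Embedding)))
    .OrderByDescending(c => c.Score)
    .Take(2)
    .ToList();

Console.WriteLine(topChunks[0].Content); // chunk about the query topic
```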
### Context Formatting (After Phase 4)
**Location**: In `ExecuteAsync`, after ranking:
```csharp
var context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
return context;
```
**Format**:
```
[Source 1: Article Title](https://example.com/article)
Chunk content text...
[Source 2: Another Title](https://example.com/another)
Chunk content text...
[Source 3: Third Title](https://example.com/third)
Chunk content text...
```
**Features**:
- Each source numbered 1, 2, 3... (matches order of topChunks = descending relevance)
- Title or "Unknown" if null
- Title is markdown link to original URL
- Chunk content as plain text (may contain its own formatting)
- Double newline between sources
**Rationale**:
- Markdown links allow copy-pasting to browsers
- Numbers allow LLM to cite `[Source 1]` in answer
- Original title helps user recognize source
**Potential Issues**:
- LLM might misinterpret "Source 1" as literal citation required
- If chunks contain markdown, may conflict (no escaping)
- Some titles may have markdown special chars (unlikely but possible)
**Alternative**: Could use XML-style tags or more robust citation format.
## Error Handling & Edge Cases
### Empty Results Handling
At end of `ExecuteAsync`:
```csharp
if (searchResults.Count == 0)
    return "No search results found.";

if (chunks.Count == 0)
    return "Found search results but could not extract readable content.";
```
These messages appear in final answer (LLM will respond to these contexts).
### Partial Failures
- Some search queries fail → proceed with others
- Some articles fail to fetch → continue
- Some embedding batches fail → those chunks filtered out
- Ranking proceeds with whatever valid embeddings exist
### Verbose vs Compact Progress
`verbose` parameter affects what's passed to phases:
- **Article fetching**: errors only shown if `verbose`
- **Embeddings**: always shows batch progress via `onProgress` (from EmbeddingService)
- **Searches**: errors logged to Console only when `verbose` (warnings go to Console directly, not through the callback)
### Progress Callback Pattern
`onProgress` is invoked at major milestones:
- Searching: `[Searching web for '{query}'...]`
- Article fetch: `[Fetching article X/Y: domain]`
- Embeddings: `[Generating embeddings: batch X/Y]`
- Final: `[Found top X most relevant chunks overall. Generating answer...]`
Each phase may invoke many times (e.g., embedding batches). `StatusReporter` handles these appropriately.
## Performance Characteristics
### Time Estimate per Phase (for typical 3 queries, 5 results each, ~15 articles):
| Phase | Time | Dominated By |
|-------|------|--------------|
| Searches | 3-8s | Network latency to SearxNG |
| Article Fetching | 5-15s | Network + SmartReader CPU |
| Embeddings | 2-4s | OpenRouter API latency (4 concurrent batches) |
| Ranking | <0.1s | CPU (O(n log n) sort, n~100) |
| **Total Pipeline** | **10-30s** | Articles + Searches |
### Concurrency Limits Effect
**Article Fetching** (`MaxConcurrentArticleFetches` = 10):
- 15 articles → 2 waves (10 then 5)
- If each takes 2s → ~4s total (vs 30s sequential)
**Embedding Batching** (`MaxConcurrentEmbeddingRequests` = 4, `EmbeddingBatchSize` = 300):
- 80 chunks fit in a single batch of up to 300, so no batch-level parallelism is needed
- Parallelism (up to 4 concurrent embedding requests) only kicks in once the chunk count exceeds 300 and the texts split into multiple batches
### Memory Usage
- `searchResults` (15 items) → ~30KB
- `chunks` (80 items × 500 chars) → ~40KB text + embeddings ~400KB (80 × 1536 × 4)
- Total ≈ 500KB excluding temporary HTTP buffers
## Design Decisions
### Why Use ConcurrentBag for Results/Chunks?
Thread-safe collections allow parallel tasks to add items without explicit locks. `ConcurrentBag` does not guarantee enumeration order, so the order after `ToList()` is nondeterministic and `DistinctBy` keeps whichever occurrence it enumerates first. This is acceptable here because order is irrelevant: ranking is semantic. If insertion order mattered, a `ConcurrentQueue` (or post-hoc sorting by source) would be needed instead.
### Why Not Use Parallel.ForEach for Article Fetching?
We use `Task.WhenAll` over `Select`-generated tasks plus a semaphore. `Parallel.ForEachAsync` (available since .NET 6) would also work; the explicit semaphore keeps the concurrency limit visible and the pattern consistent with Phase 1's `Task.WhenAll` approach.
### Why Separate Query Embedding from Chunk Embeddings?
`GetEmbeddingAsync` is called directly (not batched) because there's only one query. Could be batched with chunks but:
- Query is small (single string)
- Batch API has overhead (request structure)
- Separate call allows independent completion (no need to wait for chunks to start query embedding)
### Why Two Different Embedding Methods?
`EmbeddingService` has:
- `GetEmbeddingsWithRateLimitAsync` (used in SearchTool)
- `GetEmbeddingsAsync` (similar but different implementation)
Probably legacy/refactor artifact. Could consolidate.
### Why Not Deduplicate URLs Earlier?
Deduplication happens after search aggregation. Could also deduplicate within each search result (SearxNG might already dedupe across engines). But global dedupe is necessary.
### Why Not Early Filtering (e.g., by domain, length)?
Possibly could improve quality:
- Filter by domain reputation
- Filter articles too short (<200 chars) or too long (>50KB)
- Not implemented (keep simple)
## Testing Considerations
**Unit Testability**: `SearchTool` is fairly testable with mocks:
- Mock `SearxngClient` to return predetermined results
- Mock `ArticleService` via `EmbeddingService` (or mock that too)
- Verify progress callback invocations
- Verify final context format
**Integration Testing**:
- End-to-end with real/mocked external services
- Need test SearxNG instance and test OpenRouter key (or mock responses)
**Performance Testing**:
- Benchmark with different concurrency settings
- Profile memory for large result sets (1000+ articles)
- Measure embedding API latency impact
## Known Issues
### Chunk Text Selection in ExecuteParallelEmbeddingsAsync (Not a Bug)
The actual source of `ExecuteParallelEmbeddingsAsync` correctly passes chunk *content* (not the still-unset `Embedding` property) to the embedding service:
```csharp
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
    chunkTexts, onProgress);
```
Earlier drafts of this documentation mistakenly showed `chunks.Select(c => c.Embedding)`; the code above is what ships.
### Parallel Array Writes in GetEmbeddingsWithRateLimitAsync
```csharp
results[batchIndex] = batchResults;
```
This writes to an array index from multiple parallel tasks. It is safe: each task writes a distinct index, so the writes never overlap, and reference assignments in .NET are atomic.
### Progress Callback May Overwhelm
If invoked synchronously from many parallel tasks, could saturate the channel. `Channel.TryWrite` will return false if buffer full; we ignore return value. Could drop messages under heavy load. Acceptable for CLI UI (some messages may be lost but overall progress visible).
## Related Components
- **[OpenQueryApp](openquery-app.md)** - calls this
- **[SearxngClient](services.md#searxngclient)** - phase 1
- **[ArticleService](services.md#articleservice)** - phase 2a
- **[ChunkingService](services.md#chunkingservice)** - phase 2b
- **[EmbeddingService](services.md#embeddingservice)** - phase 3
- **[Ranking](services.md#embeddingservice)** - cosine similarity
---
## Next Steps
- [Services Overview](services.md) - See supporting services
- [CLI Reference](../api/cli.md) - How users trigger this pipeline
- [Performance](../performance.md) - Optimize pipeline settings

# Services Overview
Comprehensive reference for all service classes in OpenQuery.
## 📋 Table of Contents
1. [Service Catalog](#service-catalog)
2. [Client Services](#client-services)
3. [Processing Services](#processing-services)
4. [Infrastructure Services](#infrastructure-services)
5. [Service Interactions](#service-interactions)
## Service Catalog
OpenQuery's services are organized into three categories:
| Category | Services | Purpose |
|-----------|----------|---------|
| **Clients** | `OpenRouterClient`, `SearxngClient` | External API communication |
| **Processors** | `EmbeddingService`, `ChunkingService`, `ArticleService` | Data transformation & extraction |
| **Infrastructure** | `RateLimiter`, `StatusReporter` | Cross-cutting concerns |
All services are **stateless** (except for internal configuration) and can be safely reused across multiple operations.
---
## Client Services
### OpenRouterClient
**Location**: `Services/OpenRouterClient.cs`
**Purpose**: HTTP client for OpenRouter AI APIs (chat completions & embeddings)
#### API Endpoints
| Method | Endpoint | Purpose |
|--------|----------|---------|
| POST | `/chat/completions` | Chat completion (streaming or non-streaming) |
| POST | `/embeddings` | Embedding generation for text inputs |
#### Authentication
```
Authorization: Bearer {apiKey}
Accept: application/json
```
#### Public Methods
##### `StreamAsync(ChatCompletionRequest request, CancellationToken cancellationToken)`
- **Returns**: `IAsyncEnumerable<StreamChunk>`
- **Behavior**: Sets `request.Stream = true`, posts, reads Server-Sent Events stream
- **Use Case**: Final answer streaming, real-time responses
- **Stream Format**: SSE lines `data: {json}`; yields `TextDelta` or `ToolCall`
##### `CompleteAsync(ChatCompletionRequest request)`
- **Returns**: `Task<ChatCompletionResponse>`
- **Behavior**: Sets `request.Stream = false`, posts, returns full response
- **Use Case**: Query generation (non-streaming)
##### `EmbedAsync(string model, List<string> inputs)`
- **Returns**: `Task<float[][]>`
- **Behavior**: POST `/embeddings`, returns array of vectors (ordered by input index)
- **Use Case**: Batch embedding generation
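For orientation, this is the wire shape `EmbedAsync` presumably exchanges — the payload follows the OpenAI-compatible embeddings contract, and the property names here are assumptions, not confirmed from the source:

```csharp
// Hypothetical wire shape for EmbedAsync (names assumed, OpenAI-compatible contract):
var payload = new
{
    model = "openai/text-embedding-3-small",
    input = new List<string> { "first text", "second text" }
};
// POST {baseUrl}/embeddings with the payload above; the response carries one
// vector per input, ordered by input index:
// { "data": [ { "index": 0, "embedding": [0.0123, -0.0456, ...] }, ... ] }
```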
##### `HttpClient`
- **Property**: Internal `_httpClient` (created per instance)
- **Note**: Could use `IHttpClientFactory` for pooling (not needed for CLI)
#### Error Handling
- `EnsureSuccessStatusCode()` throws `HttpRequestException` on 4xx/5xx
- No retry logic (handled by `EmbeddingService`)
#### Configuration
```csharp
public OpenRouterClient(string apiKey)
{
_apiKey = apiKey;
_httpClient = new HttpClient();
_httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
_httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
}
```
#### Example Usage
```csharp
var client = new OpenRouterClient("sk-or-...");
var request = new ChatCompletionRequest("model", new List<Message> { ... });
await foreach (var chunk in client.StreamAsync(request))
{
Console.Write(chunk.TextDelta);
}
```
---
### SearxngClient
**Location**: `Services/SearxngClient.cs`
**Purpose**: HTTP client for SearxNG metasearch engine
#### API Endpoint
```
GET /search?q={query}&format=json
```
#### Constructor
```csharp
public SearxngClient(string baseUrl) // e.g., "http://localhost:8002"
```
- `baseUrl` trimmed of trailing `/`
#### Public Methods
##### `SearchAsync(string query, int limit = 10)`
- **Returns**: `Task<List<SearxngResult>>`
- **Behavior**: GET request, deserialize JSON, take up to `limit` results
- **On Failure**: Returns empty `List<SearxngResult>` (no exception)
#### Error Handling
- The code never calls `response.EnsureSuccessStatusCode()`, so HTTP error statuses do not throw
- If invalid JSON or missing `Results`, returns empty list
- Failures are **tolerated** - individual search queries may fail without aborting whole operation
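A minimal sketch of the tolerant behavior described above — the DTO shape, serializer call, and field names are assumptions, not the project's exact code:

```csharp
// Sketch only: any failure (network, bad JSON, missing Results) yields an empty list.
public async Task<List<SearxngResult>> SearchAsync(string query, int limit = 10)
{
    try
    {
        var url = $"{_baseUrl}/search?q={Uri.EscapeDataString(query)}&format=json";
        var json = await _httpClient.GetStringAsync(url);
        var parsed = JsonSerializer.Deserialize<SearxngResponse>(json);
        return parsed?.Results?.Take(limit).ToList() ?? new List<SearxngResult>();
    }
    catch
    {
        return new List<SearxngResult>(); // tolerated: one failed query must not abort the run
    }
}
```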
#### Example Searxng Response
```json
{
"results": [
{
"title": "Quantum Entanglement - Wikipedia",
"url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
"content": "Quantum entanglement is a physical phenomenon..."
},
...
]
}
```
---
## Processing Services
### EmbeddingService
**Location**: `Services/EmbeddingService.cs`
**Purpose**: Generate embeddings with batching, rate limiting, and retry logic
#### Configuration
**Embedding Model**: `openai/text-embedding-3-small` (default, configurable via constructor)
**ParallelProcessingOptions** (hardcoded defaults):
```csharp
public class ParallelProcessingOptions
{
public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
public int EmbeddingBatchSize { get; set; } = 300;
}
```
#### Public Methods
##### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- **Returns**: `Task<float[][]>`
- **Behavior**:
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallel executes batches (max `MaxConcurrentEmbeddingRequests` concurrent)
- Each batch: rate-limited, retry-wrapped `client.EmbedAsync(model, batch)`
- Reassembles in original order
- Failed batches → empty `float[]` for each text
- **Progress**: Invokes `onProgress` for each batch: `"[Generating embeddings: batch X/Y]"`
- **Thread-Safe**: Uses lock for collecting results
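The batching scheme above can be sketched as follows (simplified: retry and failed-batch handling are omitted, and helper names are assumed):

```csharp
// Split into batches, embed up to N batches concurrently, keep input order.
var batches = texts.Chunk(options.EmbeddingBatchSize).ToArray();
var results = new float[batches.Length][][];

await Parallel.ForAsync(0, batches.Length,
    new ParallelOptions { MaxDegreeOfParallelism = options.MaxConcurrentEmbeddingRequests },
    async (i, ct) =>
    {
        onProgress?.Invoke($"[Generating embeddings: batch {i + 1}/{batches.Length}]");
        results[i] = await _rateLimiter.ExecuteAsync(
            () => _client.EmbedAsync(_model, batches[i].ToList()), ct);
    });

// Each task wrote to a distinct index, so flattening restores the original order.
var flat = results.SelectMany(batch => batch).ToArray();
```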
##### `GetEmbeddingAsync(string text, CancellationToken)`
- **Returns**: `Task<float[]>`
- **Behavior**: Single embedding with rate limiting and retry
- **Use Case**: Query embedding
##### `static float CosineSimilarity(float[] vector1, float[] vector2)`
- **Returns**: `float` between -1 and 1 (typically 0-1 for normalized embeddings)
- **Implementation**: Single line calling the SIMD-accelerated `System.Numerics.Tensors.TensorPrimitives.CosineSimilarity`
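Ranking then reduces to scoring each chunk against the query embedding — a sketch, with `topK` and the variable names assumed:

```csharp
// Score every chunk against the query, keep the best topK (hypothetical names).
var ranked = chunks
    .Select((text, i) => (Text: text,
        Score: EmbeddingService.CosineSimilarity(queryEmbedding, chunkEmbeddings[i])))
    .OrderByDescending(x => x.Score)
    .Take(topK)
    .ToList();
```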
---
### ArticleService
**Location**: `Services/ArticleService.cs`
**Purpose**: Extract clean article content from web URLs
#### Public Methods
##### `FetchArticleAsync(string url)`
- **Returns**: `Task<Article>`
- **Behavior**: Delegates to `SmartReader.ParseArticleAsync(url)`
- **Result**: `Article` with `Title`, `TextContent`, `IsReadable`, and metadata
#### Errors
- Propagates exceptions (SmartReader may throw on network failures, malformed HTML)
- `SearchTool` catches and logs
#### SmartReader Notes
- Open-source article extraction library (bundled via NuGet)
- Uses Readability algorithm (similar to Firefox Reader View)
- Removes ads, navigation, boilerplate
- `IsReadable` indicates quality (e.g., not a 404 page, not too short)
---
### ChunkingService
**Location**: `Services/ChunkingService.cs`
**Purpose**: Split text into 500-character chunks at natural boundaries
#### Public Methods
##### `ChunkText(string text)`
- **Returns**: `List<string>`
- **Algorithm**:
- Constant `MAX_CHUNK_SIZE = 500`
- While remaining text:
- Take up to 500 chars
- If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']`
- Trim, add if non-empty
- Advance start
- Returns all chunks
#### Characteristics
- Static class (no instances)
- Pure function (no side effects)
- Zero dependencies
- Handles edge cases (empty text, short text, text without breaks)
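The algorithm above renders roughly to the following — a sketch consistent with the description, not the project's exact code:

```csharp
public static List<string> ChunkText(string text)
{
    const int MAX_CHUNK_SIZE = 500;
    char[] breakChars = { ' ', '\n', '\r', '.', '!' };
    var chunks = new List<string>();
    var start = 0;
    while (start < text.Length)
    {
        var length = Math.Min(MAX_CHUNK_SIZE, text.Length - start);
        if (start + length < text.Length) // not at end: backtrack to a natural break
        {
            var lastBreak = text.LastIndexOfAny(breakChars, start + length - 1, length);
            if (lastBreak > start) length = lastBreak - start + 1;
        }
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0) chunks.Add(chunk);
        start += length;
    }
    return chunks;
}
```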
---
## Infrastructure Services
### RateLimiter
**Location**: `Services/RateLimiter.cs`
**Purpose**: Limit concurrent operations using semaphore
#### Constructor
```csharp
public RateLimiter(int maxConcurrentRequests)
```
Creates `SemaphoreSlim` with `maxConcurrentRequests`
#### Public Methods
##### `ExecuteAsync<T>(Func<Task<T>> action, CancellationToken)`
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken cancellationToken = default)
{
await _semaphore.WaitAsync(cancellationToken);
try
{
return await action();
}
finally
{
_semaphore.Release();
}
}
```
- Waits for semaphore slot
- Executes `action` (typically an API call)
- Releases semaphore (even if exception)
- Returns result from `action`
##### `ExecuteAsync(Func<Task> action, CancellationToken)`
- Non-generic version (for void-returning actions)
#### Disposal
```csharp
public async ValueTask DisposeAsync()
{
_semaphore.Dispose();
}
```
Implements `IAsyncDisposable` for async cleanup
#### Usage Pattern
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
{
return await SomeApiCall();
}, cancellationToken);
```
#### Where Used
- `EmbeddingService`: Limits concurrent embedding batch requests (default 4)
---
### StatusReporter
**Location**: `Services/StatusReporter.cs`
**Purpose**: Real-time progress display with spinner (compact) or verbose lines
#### Constructor
```csharp
public StatusReporter(bool verbose)
```
- `verbose = true`: all progress via `WriteLine()` (no spinner)
- `verbose = false`: spinner with latest status
#### Architecture
**Components**:
- `Channel<string> _statusChannel` - producer-consumer queue
- `Task _statusProcessor` - background task reading from channel
- `CancellationTokenSource _spinnerCts` - spinner task cancellation
- `Task _spinnerTask` - spinner animation task
- `char[] _spinnerChars` - Braille spinner pattern
**Spinner Animation**:
- Runs at 10 FPS (100ms interval)
- Cycles through `['⠋','⠙','⠹','⠸','⠼','⠴','⠦','⠧','⠇','⠏']`
- Displays: `⠋ Fetching articles...`
- Updates in place using ANSI: `\r\x1b[K` (carriage return + erase line)
#### Public Methods
##### `UpdateStatus(string message)`
- Fire-and-forget: writes to channel via `TryWrite` (non-blocking)
- If channel full, message dropped (acceptable loss for UI)
##### `WriteLine(string text)`
- Stops spinner temporarily
- Clears current status line
- Writes `text` with newline
- In verbose mode: just `Console.WriteLine(text)`
##### `ClearStatus()`
- In compact mode: `Console.Write("\r\x1b[K")` (erase line)
- In verbose: no-op
- Sets `_currentMessage = null`
##### `StartSpinner()` / `StopSpinner()`
- Manual control; in practice `StartSpinner` is invoked from the constructor and `StopSpinner` from `Dispose`
##### `Dispose()`
- Completes channel writer
- Awaits `_statusProcessor` completion
- Calls `StopSpinner()`
#### Background Processing
**Status Processor**:
```csharp
private async Task ProcessStatusUpdatesAsync()
{
await foreach (var message in _statusChannel.Reader.ReadAllAsync())
{
if (_verbose)
{
Console.WriteLine(message);
continue;
}
Console.Write("\r\x1b[K"); // Clear line
Console.Write($"{_spinnerChars[0]} {message}"); // Static spinner
_currentMessage = message;
}
}
```
**Spinner Task**:
```csharp
_spinnerTask = Task.Run(async () =>
{
while (_spinnerCts is { Token.IsCancellationRequested: false })
{
if (_currentMessage != null)
{
Console.Write("\r\x1b[K");
            var charIndex = index++ % _spinnerChars.Length;
            Console.Write($"{_spinnerChars[charIndex]} {_currentMessage}");
}
await Task.Delay(100, _spinnerCts.Token);
}
});
```
#### Thread Safety
- `UpdateStatus` (producer) writes to channel
- `ProcessStatusUpdatesAsync` (consumer) reads from channel
- `_spinnerTask` runs concurrently
- All UI writes happen in consumer/spinner task context (single-threaded UI)
#### Design Notes
- Could be simplified: just use `Console.CursorLeft` for spinner, no channel
- Channel allows random `UpdateStatus` calls from any thread without blocking
- The Braille spinner requires a terminal that supports Unicode (most modern terminals do)
---
## Service Interactions
### Dependency Graph
```
OpenQueryApp
├── OpenRouterClient ← (used for query gen + final answer)
└── SearchTool
├── SearxngClient
├── ArticleService (uses SmartReader)
├── ChunkingService (static)
├── EmbeddingService
│ └── OpenRouterClient (different instance)
│ └── RateLimiter
└── ParallelProcessingOptions (config)
```
### Service Lifetimes
All services are **transient** (new instance per query execution):
- `OpenRouterClient` → 1 instance for query gen + answer
- `SearxngClient` → 1 instance for all searches
- `EmbeddingService` → 1 instance with its own `OpenRouterClient` and `RateLimiter`
- `SearchTool` → 1 instance per query (constructed in `Program.cs`)
No singleton or static state (except static utility classes like `ChunkingService`).
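Per-query wiring in `Program.cs` presumably looks something like this (constructor signatures are assumed for illustration):

```csharp
// Hypothetical composition root: every service is constructed fresh per query.
var openRouter = new OpenRouterClient(apiKey);            // query gen + final answer
var searxng = new SearxngClient("http://localhost:8002");
var embeddings = new EmbeddingService(new OpenRouterClient(apiKey)); // own client + RateLimiter
var searchTool = new SearchTool(searxng, embeddings, new ParallelProcessingOptions());
var app = new OpenQueryApp(openRouter, searchTool);
```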
### Data Flow Through Services
```
OpenQueryApp
├─ OpenRouterClient.CompleteAsync() → query generation
│ Messages → JSON → HTTP request → response → JSON → Messages
└─ SearchTool.ExecuteAsync()
├─ SearxngClient.SearchAsync() × N
│ query → URL encode → GET → JSON → SearxngResult[]
├─ ArticleService.FetchArticleAsync() × M
│ URL → HTTP GET → SmartReader → Article
├─ ChunkingService.ChunkText() × M
│ Article.TextContent → List<string> chunks
├─ EmbeddingService.GetEmbeddingAsync(query) + GetEmbeddingsAsync(chunks[])
│ texts → batches → rate-limited HTTP POST → JSON → float[][]
├─ CosineSimilarity(queryEmbedding, chunkEmbedding) × M
│ Vectors → dot product → magnitude → score
└─ return context string (formatted chunks)
```
---
## Next Steps
- **[OpenQueryApp](../components/openquery-app.md)** - Orchestrates services
- **[SearchTool](../components/search-tool.md)** - Coordinates pipeline
- **[Models](../components/models.md)** - Data structures passed between services
- **[API Reference](../../api/cli.md)** - CLI that uses these services
---
**Service Design Principles**:
- Single Responsibility: Each service does one thing well
- Stateless: No instance state beyond constructor args
- Composable: Services receive their dependencies through constructors, so they can be wired together freely
- Testable: Can mock dependencies for unit testing