docs: add comprehensive documentation with README and detailed guides
- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
528  docs/components/models.md  Normal file
@@ -0,0 +1,528 @@
# Models Reference

Complete reference for all data models, DTOs, and records in OpenQuery.

## 📋 Table of Contents

1. [Core Data Models](#core-data-models)
2. [OpenRouter API Models](#openrouter-api-models)
3. [SearxNG API Models](#searxng-api-models)
4. [JSON Serialization](#json-serialization)
5. [Model Relationships](#model-relationships)

## Core Data Models

### OpenQueryOptions

**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object for a single query execution

```csharp
public record OpenQueryOptions(
    int Chunks,      // Number of top chunks to include in context
    int Results,     // Search results per generated query
    int Queries,     // Number of search queries to generate (if >1)
    bool Short,      // Request concise answer
    bool Long,       // Request detailed answer
    bool Verbose,    // Enable verbose logging
    string Question  // Original user question (required)
);
```

**Lifecycle**:
- Created in `Program.cs` by combining CLI options, config defaults, and environment variables
- Passed to `OpenQueryApp.RunAsync(options)`

**Validation**: None (assumes valid values from CLI parser/config)

**Example**:
```csharp
var options = new OpenQueryOptions(
    Chunks: 3,
    Results: 5,
    Queries: 3,
    Short: false,
    Long: false,
    Verbose: true,
    Question: "What is quantum entanglement?"
);
```

---

### Chunk

**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata, embedding, and relevance score

```csharp
public record Chunk(
    string Content,       // Text content (typically ~500 chars)
    string SourceUrl,     // Original article URL
    string? Title = null  // Article title (optional, may be null)
)
{
    public float[]? Embedding { get; set; }  // Vector embedding (1536-dim for text-embedding-3-small)
    public float Score { get; set; }         // Relevance score (0-1, higher = more relevant)
}
```

**Lifecycle**:
1. **Created** in `SearchTool.ExecuteParallelArticleFetchingAsync`:
   ```csharp
   chunks.Add(new Chunk(chunkText, result.Url, article.Title));
   ```
   At this point: `Embedding = null`, `Score = 0`

2. **Embedded** in `SearchTool.ExecuteParallelEmbeddingsAsync`:
   ```csharp
   validChunks[i].Embedding = validEmbeddings[i];
   ```

3. **Scored** in `SearchTool.RankAndSelectTopChunks`:
   ```csharp
   chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
   ```

4. **Formatted** into the context string:
   ```csharp
   $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"
   ```

**Properties**:
- `Content`: Never null/empty (empty chunks are filtered out in `ChunkingService`)
- `SourceUrl`: Always provided (from `SearxngResult.Url`)
- `Title`: May be null if article extraction failed to get a title
- `Embedding`: Null until phase 3; may remain null if embedding failed
- `Score`: 0 until phase 4; irrelevant for non-embedded chunks

**Equality**: Records use value equality, comparing every member, including the mutable `Embedding` (compared by array reference, not element values) and `Score`. Two chunks with the same content/url/title are therefore equal only while their embeddings and scores also match.

---

### ParallelProcessingOptions

**Location**: `Models/ParallelOptions.cs`
**Type**: `class`
**Purpose**: Configuration for parallel/concurrent operations

```csharp
public class ParallelProcessingOptions
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
    public int EmbeddingBatchSize { get; set; } = 300;
}
```

**Usage**:
- Instantiated with `new` (hardcoded defaults) in the `SearchTool` constructor
- Passed to the `EmbeddingService` constructor
- Read by `SearchTool` for the article-fetching semaphore

**Default Values**:

| Property | Default | Effect |
|----------|---------|--------|
| `MaxConcurrentArticleFetches` | 10 | Up to 10 articles fetched simultaneously |
| `MaxConcurrentEmbeddingRequests` | 4 | Up to 4 embedding batches in parallel |
| `EmbeddingBatchSize` | 300 | Each embedding API call handles up to 300 texts |

**Current Limitation**: These are **compile-time defaults** (hardcoded in `SearchTool.cs`). To make them configurable:
1. Add them to `AppConfig`
2. Read them in `ConfigManager`
3. Pass them through the `SearchTool` constructor
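The steps above can be sketched as follows. This is hypothetical wiring, assuming `AppConfig` follows the mutable-property style used elsewhere in the codebase; the property names are illustrative, not taken from the actual source:

```csharp
public class ParallelProcessingOptions
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
    public int EmbeddingBatchSize { get; set; } = 300;
}

// Hypothetical additions to AppConfig, defaulting to today's hardcoded values
public class AppConfig
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
    public int EmbeddingBatchSize { get; set; } = 300;
}

public class SearchTool
{
    public ParallelProcessingOptions ParallelOptions { get; }

    // Instead of `new ParallelProcessingOptions()` with fixed defaults,
    // copy the values resolved from config.
    public SearchTool(AppConfig config)
    {
        ParallelOptions = new ParallelProcessingOptions
        {
            MaxConcurrentArticleFetches = config.MaxConcurrentArticleFetches,
            MaxConcurrentEmbeddingRequests = config.MaxConcurrentEmbeddingRequests,
            EmbeddingBatchSize = config.EmbeddingBatchSize,
        };
    }
}
```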
---

## OpenRouter API Models

**Location**: `Models/OpenRouter.cs`
**Purpose**: DTOs for OpenRouter's REST API (JSON serialization)

### Chat Completion

#### `ChatCompletionRequest`
```csharp
public record ChatCompletionRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] List<Message> Messages,
    [property: JsonPropertyName("tools")] List<ToolDefinition>? Tools = null,
    [property: JsonPropertyName("stream")] bool Stream = false
);
```

**Example**:
```json
{
  "model": "qwen/qwen3.5-flash-02-23",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 2+2?" }
  ],
  "stream": true
}
```

#### `Message`
```csharp
public record Message(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
    [property: JsonPropertyName("tool_call_id")] string? ToolCallId = null
)
{
    // Factory method for tool responses
    public static Message FromTool(string content, string toolCallId) =>
        new Message("tool", content, null, toolCallId);
}
```

**Roles**: `"system"`, `"user"`, `"assistant"`, `"tool"`

**Usage**:
- `Content` for text messages
- `ToolCalls` when the assistant requests tool use
- `ToolCallId` when responding to a tool call

#### `ChatCompletionResponse`
```csharp
public record ChatCompletionResponse(
    [property: JsonPropertyName("choices")] List<Choice> Choices,
    [property: JsonPropertyName("usage")] Usage? Usage = null
);

public record Choice(
    [property: JsonPropertyName("message")] Message Message,
    [property: JsonPropertyName("finish_reason")] string? FinishReason = null
);
```

**Response Example**:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Answer text..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
```

#### `Usage`
```csharp
public record Usage(
    [property: JsonPropertyName("prompt_tokens")] int PromptTokens,
    [property: JsonPropertyName("completion_tokens")] int CompletionTokens,
    [property: JsonPropertyName("total_tokens")] int TotalTokens
);
```

### Tool Calling (Not Currently Used)

#### `ToolDefinition` / `ToolFunction`
```csharp
public record ToolDefinition(
    [property: JsonPropertyName("type")] string Type,  // e.g., "function"
    [property: JsonPropertyName("function")] ToolFunction Function
);

public record ToolFunction(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("description")] string Description,
    [property: JsonPropertyName("parameters")] JsonElement Parameters  // JSON Schema
);
```

#### `ToolCall` / `FunctionCall`
```csharp
public record ToolCall(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("function")] FunctionCall Function
);

public record FunctionCall(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("arguments")] string Arguments  // JSON string
);
```

**Note**: OpenQuery doesn't use tools currently, but the models are defined for future tool-calling capability.

### Streaming

#### `StreamChunk`
```csharp
public record StreamChunk(
    string? TextDelta = null,
    ClientToolCall? Tool = null
);
```

Yielded by `OpenRouterClient.StreamAsync()` for each SSE event.

#### `ChatCompletionChunk` (Server Response)
```csharp
public record ChatCompletionChunk(
    [property: JsonPropertyName("choices")] List<ChunkChoice> Choices
);

public record ChunkChoice(
    [property: JsonPropertyName("delta")] ChunkDelta Delta
);

public record ChunkDelta(
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null
);
```

**Streaming Response Example** (SSE):
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
```

`OpenRouterClient.StreamAsync` parses each event and yields a `StreamChunk` with a non-null `TextDelta` for content.
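That parsing can be sketched as a pure function over SSE lines. This is a simplified stand-in for the real `StreamAsync` (which streams over HTTP and uses the app's source-generated JSON context); here plain `JsonDocument` is used instead:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

static class SseParser
{
    public static IEnumerable<string> TextDeltas(IEnumerable<string> sseLines)
    {
        foreach (var line in sseLines)
        {
            if (!line.StartsWith("data: ")) continue;   // skip blanks/keep-alives
            var payload = line["data: ".Length..];
            if (payload == "[DONE]") yield break;       // end-of-stream sentinel

            using var doc = JsonDocument.Parse(payload);
            var delta = doc.RootElement.GetProperty("choices")[0].GetProperty("delta");
            if (delta.TryGetProperty("content", out var content) &&
                content.ValueKind == JsonValueKind.String)
                yield return content.GetString()!;
        }
    }
}
```

Fed the three example lines above, this yields `"Hello"` then `" world"` and stops at the `[DONE]` sentinel.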
### Embeddings

#### `EmbeddingRequest`
```csharp
public record EmbeddingRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("input")] List<string> Input
);
```

**Example**:
```json
{
  "model": "openai/text-embedding-3-small",
  "input": ["text 1", "text 2", ...]
}
```

#### `EmbeddingResponse`
```csharp
public record EmbeddingResponse(
    [property: JsonPropertyName("data")] List<EmbeddingData> Data,
    [property: JsonPropertyName("usage")] Usage Usage
);

public record EmbeddingData(
    [property: JsonPropertyName("embedding")] float[] Embedding,
    [property: JsonPropertyName("index")] int Index
);
```

**Response Example**:
```json
{
  "data": [
    { "embedding": [0.1, 0.2, ...], "index": 0 },
    { "embedding": [0.3, 0.4, ...], "index": 1 }
  ],
  "usage": {
    "prompt_tokens": 100,
    "total_tokens": 100
  }
}
```

**Note**: `_client.EmbedAsync` orders results by `index` to match input order.
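That reordering matters because a provider may return batch items out of order. The idea, sketched independently of the real `EmbedAsync` body (which this doc doesn't show):

```csharp
using System.Collections.Generic;
using System.Linq;

public record EmbeddingData(float[] Embedding, int Index);

static class EmbeddingOrder
{
    // `Index` is the position of the corresponding text in the request's input list,
    // so sorting by it restores input order regardless of response order.
    public static float[][] ToInputOrder(IEnumerable<EmbeddingData> data) =>
        data.OrderBy(d => d.Index).Select(d => d.Embedding).ToArray();
}
```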
---

## SearxNG API Models

**Location**: `Models/Searxng.cs`
**Purpose**: DTOs for SearxNG's JSON response format

### `SearxngRoot`
```csharp
public record SearxngRoot(
    [property: JsonPropertyName("results")] List<SearxngResult> Results
);
```

Top-level response object.

### `SearxngResult`
```csharp
public record SearxngResult(
    [property: JsonPropertyName("title")] string Title,
    [property: JsonPropertyName("url")] string Url,
    [property: JsonPropertyName("content")] string Content  // Snippet/description
);
```

**Fields**:
- `Title`: Result title (from the page `<title>` or OpenGraph)
- `Url`: Absolute URL to the article
- `Content`: Short snippet (~200 chars) from the search engine

**Usage**:
- `Url` is passed to `ArticleService.FetchArticleAsync`
- `Title` is used as a fallback if article extraction fails
- `Content` is currently unused (could be used for a quick answer without fetching)

**Example Response**:
```json
{
  "results": [
    {
      "title": "Quantum Entanglement - Wikipedia",
      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
      "content": "Quantum entanglement is a physical phenomenon..."
    }
  ]
}
```

---

## JSON Serialization

### JsonContext (Source Generation)

**Location**: `Models/JsonContexts.cs`
**Purpose**: Provide a source-generated JSON serializer context for AOT compatibility

#### Declaration
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
[JsonSerializable(typeof(ChatCompletionChunk))]
[JsonSerializable(typeof(EmbeddingRequest))]
[JsonSerializable(typeof(EmbeddingResponse))]
[JsonSerializable(typeof(SearxngRoot))]
[JsonSerializable(typeof(List<string>))]
internal partial class AppJsonContext : JsonSerializerContext
{
}
```

**Usage**:
```csharp
var json = JsonSerializer.Serialize(request, AppJsonContext.Default.ChatCompletionRequest);
var response = JsonSerializer.Deserialize(json, AppJsonContext.Default.ChatCompletionResponse);
```

**Benefits**:
- **AOT-compatible**: No reflection; works with `PublishAot=true`
- **Performance**: Pre-compiled serializers are faster
- **Trimming safe**: Unused serializers are trimmed automatically

**Generated**: The partial class is completed by the source generator (no manual implementation)

**Important**: ALL types that will be serialized/deserialized must appear in `[JsonSerializable]` attributes; otherwise a runtime exception occurs under AOT.

---

## Model Relationships

### Object Graph (Typical Execution)

```
OpenQueryOptions
        ↓
OpenQueryApp.RunAsync()
 │
 ├─ queryGenerationMessages (List<Message>)
 │    ├─ system: "You are an expert researcher..."
 │    └─ user: "Generate N queries for: {question}"
 │         ↓
 │    ChatCompletionRequest → OpenRouter → ChatCompletionResponse
 │         ↓
 │    List<string> generatedQueries
 │
 ├─ SearchTool.ExecuteAsync()
 │    ↓
 │    ┌─────────────────────────────────────┐
 │    │ Phase 1: Parallel Searches          │
 │    │ SearxngClient.SearchAsync(query) × N│
 │    │   → List<SearxngResult>             │
 │    │     (Title, Url, Content)           │
 │    └─────────────────────────────────────┘
 │    ↓
 │    ┌─────────────────────────────────────┐
 │    │ Phase 2: Article Fetch & Chunking   │
 │    │ ArticleService.FetchAsync(Url) × M  │
 │    │   → Article (TextContent, Title)    │
 │    │   → ChunkingService.ChunkText → List<string> chunks
 │    │   → Chunk(content, url, title) × K  │
 │    └─────────────────────────────────────┘
 │    ↓
 │    ┌─────────────────────────────────────┐
 │    │ Phase 3: Embeddings                 │
 │    │ EmbeddingService.GetEmbeddingsAsync(chunkContents)
 │    │   → float[][] chunkEmbeddings       │
 │    │   → Set chunk.Embedding for each    │
 │    │ Also: GetEmbeddingAsync(question) → float[] queryEmbedding
 │    └─────────────────────────────────────┘
 │    ↓
 │    ┌─────────────────────────────────────┐
 │    │ Phase 4: Ranking                    │
 │    │ For each chunk: Score = CosineSimilarity(queryEmbedding, chunk.Embedding)
 │    │   → Set chunk.Score                 │
 │    │   → OrderByDescending(Score)        │
 │    │   → Take(topChunksLimit) → topChunks (List<Chunk>)
 │    └─────────────────────────────────────┘
 │    ↓
 │    Context string: formatted topChunks
 │    ↓
 └─ OpenQueryApp → final ChatCompletionRequest
      System: "Answer based on context..."
      User: "Context:\n{context}\n\nQuestion: {question}"
        ↓
      StreamAsync() → StreamChunk.TextDelta → Console
```
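Phase 4's cosine similarity is standard vector math. A self-contained version for reference (the doc doesn't show `EmbeddingService`'s actual body, so treat this as illustrative):

```csharp
using System;

static class VectorMath
{
    // dot(a, b) / (|a| * |b|); assumes equal-length, non-zero vectors
    public static float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```

A vector compared with itself scores 1, orthogonal vectors score 0, which is why higher scores mean "more relevant" when comparing chunk embeddings against the query embedding.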
### Record Immutability

Most DTOs are `record` types:
- **Immutable**: Positional properties are init-only (`{ get; init; }`)
- **Value semantics**: Equality is based on content
- **Thread-safe**: Can be shared across threads

**Exceptions**:
- `Chunk`: Has mutable properties `Embedding` and `Score` (set during the pipeline)
- `ParallelProcessingOptions`: Class with mutable setters
- `AppConfig`: Class with mutable setters

---

## Next Steps

- **[API Reference](../../api/cli.md)** - How these models are used in CLI commands
- **[OpenRouterClient](../../services/OpenRouterClient.md)** - Uses the OpenRouter models
- **[SearxngClient](../../services/SearxngClient.md)** - Uses the SearxNG models
- **[SearchTool](../../components/search-tool.md)** - Orchestrates all models

---

**Quick Reference Table**

| Model | Category | Purpose | Mutable? |
|-------|----------|---------|----------|
| `OpenQueryOptions` | Core | CLI options | No (record) |
| `Chunk` | Core | Content + metadata + ranking | Partially (`Embedding`, `Score`) |
| `ParallelProcessingOptions` | Config | Concurrency settings | Yes (class) |
| `ChatCompletionRequest/Response` | OpenRouter | LLM API | No |
| `EmbeddingRequest/Response` | OpenRouter | Embeddings API | No |
| `SearxngRoot/Result` | SearxNG | Search results | No |
| `AppJsonContext` | Internal | JSON serialization | No (generated partial) |
395  docs/components/openquery-app.md  Normal file
@@ -0,0 +1,395 @@
# OpenQueryApp Component

Deep dive into the `OpenQueryApp` class, the main application orchestrator.

## Overview

`OpenQueryApp` is the heart of OpenQuery. It coordinates all components, manages the workflow from question to answer, and handles progress reporting.

## Location
`OpenQuery.cs` in the project root

## Class Definition

```csharp
public class OpenQueryApp
{
    private readonly OpenRouterClient _client;
    private readonly SearchTool _searchTool;
    private readonly string _model;

    public OpenQueryApp(
        OpenRouterClient client,
        SearchTool searchTool,
        string model);

    public async Task RunAsync(OpenQueryOptions options);
}
```

**Dependencies**:
- `OpenRouterClient` - for query generation and final answer streaming
- `SearchTool` - for the search-retrieve-rank pipeline
- `string _model` - model identifier to use for LLM calls

**Lifecycle**: Instantiated once per query execution in `Program.cs`, then `RunAsync()` is called once.

## RunAsync Workflow

```csharp
public async Task RunAsync(OpenQueryOptions options)
{
    // 1. Setup
    using var reporter = new StatusReporter(options.Verbose);
    reporter.StartSpinner();

    // 2. Query Generation (if needed)
    List<string> queries = await GenerateQueriesIfNeededAsync(options, reporter);

    // 3. Search Pipeline
    string searchResult = await ExecuteSearchPipelineAsync(options, queries, reporter);

    // 4. Final Answer Streaming
    await StreamFinalAnswerAsync(options, searchResult, reporter);
}
```

### Step 1: Status Reporter Setup

```csharp
using var reporter = new StatusReporter(options.Verbose);
reporter.StartSpinner();
```

- Creates a `StatusReporter` (implements `IDisposable`)
- Starts the spinner animation (unless verbose)
- `using` ensures disposal on exit

### Step 2: Query Generation

**When**: `options.Queries > 1` (the user wants multiple search queries)

**Purpose**: Use the LLM to generate diverse, optimized search queries from the original question

**System Prompt** (hardcoded in `OpenQuery.cs`):
```
You are an expert researcher. The user will ask a question. Your task is to
generate optimal search queries to gather comprehensive information.

Instructions:
1. Break down complex questions.
2. Use synonyms and alternative phrasing.
3. Target different aspects (entities, mechanisms, pros/cons, history).

CRITICAL: Output must be a valid JSON array of strings ONLY. No markdown,
explanations, or other text.
```

**Request**:
```csharp
var queryGenMessages = new List<Message>
{
    new Message("system", systemPrompt),
    new Message("user", $"Generate {options.Queries} distinct search queries for:\n{options.Question}")
};
var request = new ChatCompletionRequest(_model, queryGenMessages);
var response = await _client.CompleteAsync(request);
```

**Response Parsing**:
```csharp
var content = response.Choices.FirstOrDefault()?.Message.Content;
if (!string.IsNullOrEmpty(content))
{
    // Remove markdown code fences if present
    content = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();

    // Deserialize to List<string>
    var generatedQueries = JsonSerializer.Deserialize(content, AppJsonContext.Default.ListString);
    if (generatedQueries != null && generatedQueries.Count > 0)
    {
        queries = generatedQueries;
    }
}
```

**Fallback**: If any step fails (exception, null, empty, invalid JSON), fall back to `new List<string> { options.Question }` (a single query: the original question)

**Note**: Query generation reuses the same model as the final answer. This could be optimized:
- Use a cheaper/faster model for query generation
- Separate the model configuration
- Cache query generation results

### Step 3: Search Pipeline Execution

```csharp
var searchResult = await _searchTool.ExecuteAsync(
    options.Question,
    queries,
    options.Results,
    options.Chunks,
    (progress) => {
        if (options.Verbose)
            reporter.WriteLine(progress);
        else
            reporter.UpdateStatus(parsedMessage);  // parsedMessage: condensed form of `progress`
    },
    options.Verbose);
```
**Parameters**:
- `originalQuery`: The user's original question (used for the final embedding)
- `generatedQueries`: From step 2 (or the fallback)
- `maxResults`: `options.Results` (search results per query)
- `topChunksLimit`: `options.Chunks` (top N chunks to return)
- `onProgress`: Callback to update the UI
- `verbose`: Passed through to `SearchTool`

**Returns**: `string context` - formatted context with source citations

**Progress Handling**:
- In verbose mode: all progress is printed as lines (via `reporter.WriteLine()`)
- In compact mode: progress messages are parsed into a concise status (e.g., "Fetching articles 3/10...")

### Step 4: Final Answer Streaming

**Status Update**:
```csharp
if (!options.Verbose)
    reporter.UpdateStatus("Asking AI...");
else
{
    reporter.ClearStatus();
    Console.WriteLine();
}
```

**Build System Prompt**:
```csharp
var systemPrompt = "You are a helpful AI assistant. Answer the user's question in depth, based on the provided context. Be precise and accurate. You can mention sources or citations.";
if (options.Short) systemPrompt += " Give a very short concise answer.";
if (options.Long) systemPrompt += " Give a long elaborate detailed answer.";
```

**Prompt Structure**:
```
System: {systemPrompt}
User: Context:
{searchResult}

Question: {options.Question}
```

Where `searchResult` is:
```
[Source 1: Title](URL)
Content chunk 1

[Source 2: Title](URL)
Content chunk 2

...
```

**Streaming**:
```csharp
var requestStream = new ChatCompletionRequest(_model, messages);
var assistantResponse = new StringBuilder();
var isFirstChunk = true;

using var streamCts = new CancellationTokenSource();
await foreach (var chunk in _client.StreamAsync(requestStream, streamCts.Token))
{
    if (chunk.TextDelta == null) continue;

    if (isFirstChunk)
    {
        reporter.StopSpinner();
        if (!options.Verbose) reporter.ClearStatus();
        else Console.Write("Assistant: ");
        isFirstChunk = false;
    }

    Console.Write(chunk.TextDelta);
    assistantResponse.Append(chunk.TextDelta);
}
```

**Key Points**:
- `StreamAsync` yields `StreamChunk` objects (text deltas)
- The first chunk stops the spinner and clears the status line
- Each delta is written to the console immediately (real-time feel)
- The entire response is accumulated in `assistantResponse` (though not used elsewhere)
- A `CancellationTokenSource` is passed but never canceled here (Ctrl+C would cancel from outside)

**Finally Block**:
```csharp
finally
{
    reporter.StopSpinner();
}
```
Ensures the spinner stops even if streaming fails.

**End**:
```csharp
Console.WriteLine(); // Newline after the complete answer
```

## Error Handling

`RunAsync` itself does not catch exceptions. All exceptions propagate to `Program.cs`:

```csharp
try
{
    var openQuery = new OpenQueryApp(client, searchTool, model);
    await openQuery.RunAsync(options);
}
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```

**Common Exceptions**:
- `HttpRequestException` - network failures, API errors
- `JsonException` - malformed JSON from the API
- `TaskCanceledException` - timeout or user interrupt
- `Exception` - anything else

**No Retries at This Level**: Fail fast; the user sees the error immediately. Lower-level retries exist (embedding service).

## Performance Characteristics

**Query Generation**:
- One non-streaming LLM call
- Takes 2-5 seconds depending on the model
- Typically <1000 tokens

**Search Pipeline** (`SearchTool.ExecuteAsync`):
- See `SearchTool.md` for a detailed timing breakdown
- Typically 10-30 seconds total

**Final Answer Streaming**:
- Streaming LLM call
- Time depends on answer length (typically 5-20 seconds)
- The user sees words appear progressively

**Total End-to-End**: 15-50 seconds for a typical query

## Design Decisions

### Why Not Stream Query Generation?

Query generation currently uses `CompleteAsync` (non-streaming). It could be streamed, but:
- Queries are short (a JSON array)
- Streaming offers no UX benefit (the user doesn't see intermediate queries)
- It is simpler to wait for all queries before proceeding

### Why Build Prompts Manually Instead of with Templates?

Simple string concatenation is fine for a small number of prompts. Pros:
- No template dependencies
- Easy to read and modify
- No runtime compilation overhead

Cons:
- No validation
- Could benefit from a prompt engineering framework

### Why Accumulate the `assistantResponse` StringBuilder?

It is currently built but not used. It could be:
- Saved to a file (future feature: `--output file.md`)
- Analyzed for token counting
- Removed if not needed

### Could Query Generation Be Cached?

Yes! For repeated questions (common in scripts), cache the generated queries:
- A `Dictionary<string, List<string>>` cache in memory
- Or a persistent cache (Redis, file)
- Not implemented (low priority)
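The in-memory variant is only a few lines. A hypothetical sketch, using a thread-safe dictionary rather than the plain `Dictionary` above (none of these names exist in the codebase):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical cache keyed by (question, query count), safe for concurrent reuse.
public class QueryCache
{
    private readonly ConcurrentDictionary<(string, int), List<string>> _cache = new();

    public async Task<List<string>> GetOrGenerateAsync(
        string question, int count, Func<Task<List<string>>> generate)
    {
        if (_cache.TryGetValue((question, count), out var cached))
            return cached;                       // cache hit: skip the LLM call
        var queries = await generate();          // cache miss: one LLM round-trip
        _cache[(question, count)] = queries;
        return queries;
    }
}
```

A caller would pass the existing query-generation call as the `generate` delegate, so repeated questions in a script pay for only one LLM round-trip.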
### Single Responsibility Violation?

`OpenQueryApp` does:
- Query generation
- Pipeline orchestration
- Answer streaming

That's three responsibilities, but they're tightly coupled to the "query → answer" workflow. Separating them would add complexity without clear benefit. Acceptable as an "application coordinator".

## Extension Points

### Using a Different Model for Query Generation

Currently the same `_model` is used for queries and the answer. To use different models:

1. Add a `queryGenerationModel` parameter to the constructor
2. Use it for query generation: `new ChatCompletionRequest(queryGenerationModel, queryGenMessages)`
3. Keep `_model` for the final answer

Or make it configurable via an environment variable such as `OPENROUTER_QUERY_MODEL`.

### Post-Processing the Answer

Opportunities to add:
- Source citation formatting (footnotes, clickable links)
- Answer summarization
- Export to Markdown/JSON
- Text-to-speech

Add after the streaming loop, before the final newline.

### Progress UI Enhancement

The current `StatusReporter` is basic. It could gain:
- A progress bar with percentage
- ETA calculation
- Colors (ANSI) for different message types
- Logging to a file
- A web dashboard

This would require extending `StatusReporter` or replacing it.

## Testing Considerations

**Challenges**:
- `RunAsync` is monolithic (hard to unit test in isolation)
- It depends on many services (mocks needed)
- It is asynchronous and streaming

**Recommended Approach**:
1. Extract interfaces:
   - `ISearchTool` (wrapper around `SearchTool`)
   - `IOpenRouterClient` (wrapper around `OpenRouterClient`)
2. Mock the interfaces in tests
3. Test query generation parsing separately
4. Test progress callback counting
5. Test final answer prompt construction

**Integration Tests**:
- End-to-end with real/mocked APIs
- Automated tests against test SearxNG/OpenRouter instances

## Related Components

- **[SearchTool](search-tool.md)** - the pipeline executed by `OpenQueryApp`
- **[Program.cs](../Program.md)** - creates `OpenQueryApp`
- **[StatusReporter](../services/StatusReporter.md)** - progress UI used by `OpenQueryApp`

---

## Next Steps

- [SearchTool](search-tool.md) - See the pipeline in detail
- [Services](../services/overview.md) - Understand each service
- [CLI Reference](../../api/cli.md) - How users invoke this
603
docs/components/overview.md
Normal file
@@ -0,0 +1,603 @@

# Components Overview

Detailed documentation for each major component in the OpenQuery system.

## 📋 Table of Contents

1. [Component Hierarchy](#component-hierarchy)
2. [Core Components](#core-components)
3. [Services](#services)
4. [Data Models](#data-models)
5. [Component Interactions](#component-interactions)

## Component Hierarchy

```
OpenQuery/
├── Program.cs              [Entry Point, CLI]
├── OpenQuery.cs            [OpenQueryApp - Orchestrator]
├── Tools/
│   └── SearchTool.cs       [Pipeline Orchestration]
├── Services/
│   ├── OpenRouterClient.cs [LLM & Embedding API]
│   ├── SearxngClient.cs    [Search API]
│   ├── EmbeddingService.cs [Embedding Generation + Math]
│   ├── ChunkingService.cs  [Text Splitting]
│   ├── ArticleService.cs   [Content Extraction]
│   ├── RateLimiter.cs      [Concurrency Control]
│   └── StatusReporter.cs   [Progress Display]
├── Models/
│   ├── OpenQueryOptions.cs [CLI Options Record]
│   ├── Chunk.cs            [Content + Metadata]
│   ├── ParallelOptions.cs  [Concurrency Settings]
│   ├── OpenRouter.cs       [API DTOs]
│   ├── Searxng.cs          [Search Result DTOs]
│   └── JsonContexts.cs     [JSON Context]
└── ConfigManager.cs        [Configuration Persistence]
```

## Core Components

### 1. Program.cs

**Type**: Console Application Entry Point
**Responsibilities**: CLI parsing, dependency wiring, error handling

**Key Elements**:
- `RootCommand` from System.CommandLine
- Options: `--chunks`, `--results`, `--queries`, `--short`, `--long`, `--verbose`
- Subcommand: `configure` (with interactive mode)
- Configuration loading via `ConfigManager.Load()`
- Environment variable resolution
- Service instantiation and coordination
- Top-level try-catch for error reporting

**Code Flow**:
1. Load config file
2. Define CLI options and commands
3. Set handler for the root command
4. Handler: resolve API key/model → instantiate services → call `OpenQueryApp.RunAsync()`
5. Set handler for the configure command (writes config file)
6. Invoke the command parser: `await rootCommand.InvokeAsync(args)`

**Exit Codes**:
- 0 = success
- 1 = error

### 2. OpenQueryApp (OpenQuery.cs)

**Type**: Main Application Class
**Responsibilities**: Workflow orchestration, query generation, answer streaming

**Constructor Parameters**:
- `OpenRouterClient client` - for query generation and the final answer
- `SearchTool searchTool` - for the search-retrieve-rank pipeline
- `string model` - LLM model identifier

**Main Method**: `RunAsync(OpenQueryOptions options)`

**Workflow Steps**:
1. Create `StatusReporter` (for progress UI)
2. **Optional Query Generation** (if `options.Queries > 1`):
   - Create system message instructing JSON array output
   - Create user message with `options.Question`
   - Call `client.CompleteAsync()` with the query generation model
   - Parse the JSON response; fall back to the original question on failure
   - Result: `List<string> queries` (1 or many)
3. **Execute Search Pipeline**:
   - Call `_searchTool.ExecuteAsync()` with queries and options
   - Receive `string context` (formatted context with source citations)
   - Progress reported via callback to `StatusReporter`
4. **Generate Final Answer**:
   - Build system prompt (append "short" or "long" modifier)
   - Create user message with `Context:\n{context}\n\nQuestion: {options.Question}`
   - Stream the answer via `client.StreamAsync()`
   - Write each `chunk.TextDelta` to Console as it arrives
   - Stop the spinner on the first chunk, continue streaming
5. Dispose reporter

**Error Handling**:
- Exceptions propagate to the `Program.cs` top-level handler
- `HttpRequestException` vs. generic `Exception`

**Note**: Query generation uses the same model as final answer; could be separated for cost/performance.
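
The fallback in workflow step 2 can be sketched as follows. This is an illustrative helper, not the exact method in `OpenQuery.cs`: the model reply is parsed as a JSON string array, and any failure falls back to the original question.

```csharp
using System.Text.Json;

// Sketch of the query-generation fallback: treat the model reply as a JSON
// string array; on any parse failure, return just the original question.
static List<string> ParseGeneratedQueries(string reply, string originalQuestion)
{
    try
    {
        var queries = JsonSerializer.Deserialize<List<string>>(reply);
        if (queries is { Count: > 0 })
            return queries;
    }
    catch (JsonException)
    {
        // Model returned prose or malformed JSON - use the fallback below.
    }
    return new List<string> { originalQuestion };
}
```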

### 3. SearchTool (Tools/SearchTool.cs)

**Type**: Pipeline Orchestrator
**Responsibilities**: Execute the 4-phase search-retrieve-rank-return workflow

**Constructor Parameters**:
- `SearxngClient searxngClient`
- `EmbeddingService embeddingService`

**Main Method**: `ExecuteAsync(originalQuery, generatedQueries, maxResults, topChunksLimit, onProgress, verbose)`

**Returns**: `Task<string>` - formatted context string with source citations

**Pipeline Phases**:

#### Phase 1: ExecuteParallelSearchesAsync
- Parallelize `searxngClient.SearchAsync(query, maxResults)` for each query
- Collect all results in a `ConcurrentBag<SearxngResult>`
- Deduplicate by `DistinctBy(r => r.Url)`

**Output**: `List<SearxngResult>` (aggregated, unique)

#### Phase 2: ExecuteParallelArticleFetchingAsync
- Semaphore: `MaxConcurrentArticleFetches` (default 10)
- For each `SearxngResult`: fetch the URL via `ArticleService.FetchArticleAsync()`
- Extract the article text and title
- Chunk via `ChunkingService.ChunkText(article.TextContent)`
- Add each chunk as a new `Chunk(content, url, title)`

**Output**: `List<Chunk>` (potentially 50-100 chunks)

#### Phase 3: ExecuteParallelEmbeddingsAsync
- Start two parallel tasks:
  1. Query embedding: `embeddingService.GetEmbeddingAsync(originalQuery)`
  2. Chunk embeddings: `embeddingService.GetEmbeddingsWithRateLimitAsync(chunkTexts, onProgress)`
- `Parallel.ForEachAsync` with `MaxConcurrentEmbeddingRequests` (default 4)
- Batch size: 300 chunks per embedding API call
- Filter out chunks with empty embeddings (failed batches)

**Output**: `(float[] queryEmbedding, float[][] chunkEmbeddings)`

#### Phase 4: RankAndSelectTopChunks
- Calculate cosine similarity for each chunk vs. the query
- Assign `chunk.Score`
- Order by descending score
- Take `topChunksLimit` (from the `--chunks` option)
- Return `List<Chunk>` (top N)

**Formatting**:
```csharp
string context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
```

**Progress Callbacks**: Invoked at each major step for UI feedback

## Services

### OpenRouterClient

**Purpose**: HTTP client for the OpenRouter API (chat completions + embeddings)

**Base URL**: `https://openrouter.ai/api/v1`

**Authentication**: `Authorization: Bearer {apiKey}`

**Methods**:

#### `StreamAsync(ChatCompletionRequest request, CancellationToken)`
- Sets `request.Stream = true`
- POST to `/chat/completions`
- Reads the SSE stream line by line
- Parses `data: {json}` chunks
- Yields `StreamChunk` (text delta or tool call)
- Supports cancellation

#### `CompleteAsync(ChatCompletionRequest request)`
- Sets `request.Stream = false`
- POST to `/chat/completions`
- Deserializes the full response
- Returns `ChatCompletionResponse`

#### `EmbedAsync(string model, List<string> inputs)`
- POST to `/embeddings`
- Returns `float[][]` (ordered by input index)

**Error Handling**: `EnsureSuccessStatusCode()` throws `HttpRequestException` on failure

**Design**: Thin wrapper; no retry logic (delegated to EmbeddingService)

### SearxngClient

**Purpose**: HTTP client for SearxNG metasearch

**Base URL**: Configurable (default `http://localhost:8002`)

**Methods**:

#### `SearchAsync(string query, int limit = 10)`
- GET `{baseUrl}/search?q={query}&format=json`
- Deserializes to `SearxngRoot`
- Returns `Results.Take(limit).ToList()`
- On failure: returns an empty `List<SearxngResult>` (no exception)

**Design**: Very simple; failures are tolerated (OpenQuery continues with other queries)
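
The failure-tolerance contract reduces to the pattern below. The `fetch` delegate is a stand-in for the real HTTP + JSON call in `SearxngClient`; names are illustrative.

```csharp
// Any exception from the underlying search yields an empty list, so one
// failing query never aborts the pipeline. 'fetch' stands in for the real
// HTTP + deserialization call.
static async Task<List<string>> SearchTolerantAsync(Func<Task<List<string>>> fetch)
{
    try
    {
        return await fetch();
    }
    catch (Exception)
    {
        return new List<string>(); // swallow the failure; caller sees zero results
    }
}
```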

### EmbeddingService

**Purpose**: Batch embedding generation with rate limiting, parallelization, and retries

**Configuration** (from `ParallelProcessingOptions`):
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300

**Default Embedding Model**: `openai/text-embedding-3-small`

**Methods**:

#### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallelizes batches with `Parallel.ForEachAsync` + `MaxConcurrentEmbeddingRequests`
- Each batch: rate-limited + retry-wrapped `client.EmbedAsync(model, batch)`
- Collects results in order (by batch index)
- Returns `float[][]` (same order as the input texts)
- Failed batches return an empty `float[]` for each text

#### `GetEmbeddingAsync(string text, CancellationToken)`
- Wraps the single-text call in the rate limiter + retry
- Returns `float[]`

#### `CosineSimilarity(float[] v1, float[] v2)`
- Static method using `TensorPrimitives.CosineSimilarity`
- Returns float between -1 and 1 (typically 0-1 for normalized embeddings)
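
The underlying formula, written out for clarity (the service itself delegates to `TensorPrimitives.CosineSimilarity`; this manual version is equivalent):

```csharp
// Cosine similarity = dot(v1, v2) / (|v1| * |v2|).
static float CosineSimilarity(float[] v1, float[] v2)
{
    float dot = 0, norm1 = 0, norm2 = 0;
    for (var i = 0; i < v1.Length; i++)
    {
        dot   += v1[i] * v2[i];
        norm1 += v1[i] * v1[i];
        norm2 += v2[i] * v2[i];
    }
    return dot / (MathF.Sqrt(norm1) * MathF.Sqrt(norm2));
}
```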

**Retry Policy** (Polly):
- Max 3 attempts
- 1s base delay, exponential backoff
- Retries only `HttpRequestException`

**Rate Limiting**: `RateLimiter` semaphore with `MaxConcurrentEmbeddingRequests`

**Design Notes**:
- Two similar methods (`GetEmbeddingsAsync` and `GetEmbeddingsWithRateLimitAsync`) that could be consolidated
- Uses Polly for resilience (a good pattern)
- Concurrency control prevents overwhelming OpenRouter

### ChunkingService

**Purpose**: Split long text into manageable pieces

**Static Class** (no dependencies, pure function)

**Algorithm** (in `ChunkText(string text)`):
- Constant `MAX_CHUNK_SIZE = 500`
- While text remains:
  - Take up to 500 chars
  - If not at the end, backtrack to the last of `[' ', '\n', '\r', '.', '!']`
  - Trim and add the non-empty chunk
  - Advance the start position

**Rationale**: 500 characters is a sweet spot for embeddings - long enough for context, short enough for semantic coherence.

**Edge Cases**: Handles text shorter than 500 chars, empty text, text with no natural breaks.
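
The described algorithm can be sketched as a self-contained function (a sketch mirroring the documented behavior, not the verbatim source):

```csharp
// Take up to 500 chars, backtrack to the last natural break, trim, advance.
static List<string> ChunkText(string text)
{
    const int MaxChunkSize = 500;
    var breakChars = new[] { ' ', '\n', '\r', '.', '!' };
    var chunks = new List<string>();
    var start = 0;
    while (start < text.Length)
    {
        var length = Math.Min(MaxChunkSize, text.Length - start);
        if (start + length < text.Length)
        {
            // Backtrack within the window to the last break character.
            var lastBreak = text.LastIndexOfAny(breakChars, start + length - 1, length);
            if (lastBreak > start)
                length = lastBreak - start + 1;
        }
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0)
            chunks.Add(chunk);
        start += length;
    }
    return chunks;
}
```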

### ArticleService

**Purpose**: Extract clean article content from URLs

**Method**: `FetchArticleAsync(string url)`

**Implementation**: Delegates to `SmartReader.ParseArticleAsync(url)`

**Returns**: `Article` object (from SmartReader)
- `Title` (string)
- `TextContent` (string) - cleaned article body
- `IsReadable` (bool) - quality indicator
- Other metadata (author, date, etc.)

**Error Handling**: Exceptions propagate (handled by `SearchTool`)

**Design**: Thin wrapper around a third-party library. Could be extended to add caching, custom extraction rules, etc.

### RateLimiter

**Purpose**: Limit concurrent operations via a semaphore

**Interface**:
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken);
public async Task ExecuteAsync(Func<Task> action, CancellationToken);
```

**Implementation**: `SemaphoreSlim` with `WaitAsync` and `Release`

**Disposal**: `IAsyncDisposable` (awaits semaphore disposal)

**Usage**: Wrap API calls that need concurrency control
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
    await _client.EmbedAsync(model, batch), cancellationToken);
```

**Design**: Simple, reusable. Could be replaced with `Polly.RateLimiting` policy but this is lightweight.
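
A minimal semaphore-based implementation matching the interface above (a sketch, not the verbatim source):

```csharp
// At most maxConcurrency wrapped actions run at once; the semaphore is
// always released, even when the action throws.
sealed class RateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public RateLimiter(int maxConcurrency) => _semaphore = new SemaphoreSlim(maxConcurrency);

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct = default)
    {
        await _semaphore.WaitAsync(ct);
        try { return await action(); }
        finally { _semaphore.Release(); }
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```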

### StatusReporter

**Purpose**: Real-time progress UI with spinner and verbose modes

**Architecture**:
- Producer: `UpdateStatus(text)` → writes to a `Channel<string>`
- Consumer: background task `ProcessStatusUpdatesAsync()` reads from the channel
- Spinner: a separate task animates Braille characters every 100 ms

**Modes**:

**Verbose Mode** (`_verbose = true`):
- All progress messages written via `Console.WriteLine()`
- No spinner
- Full audit trail

**Compact Mode** (default):
- Status line with spinner (overwrites the same line)
- Only the latest status is visible
- Example: `⠋ Fetching articles 3/10...`

**Key Methods**:
- `UpdateStatus(message)` - fire-and-forget, non-blocking
- `WriteLine(text)` - stops the spinner temporarily, writes a full line
- `StartSpinner()` / `StopSpinner()` - manual control
- `ClearStatus()` - ANSI escape `\r\x1b[K` to clear the line
- `Dispose()` - completes the channel, waits for background tasks

**Spinner Chars**: `['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']` (Braille patterns, smooth animation)

**ANSI Codes**: `\r` (carriage return), `\x1b[K` (erase to end of line)

**Thread Safety**: The channel is thread-safe; multiple components can write concurrently without locks

**Design**: Well-encapsulated; could be reused in other CLI projects.
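
The producer/consumer core can be sketched with `System.Threading.Channels` (an illustrative shape, not the exact `StatusReporter` source; `render` stands in for the console redraw):

```csharp
using System.Threading.Channels;

// UpdateStatus writes to an unbounded channel; a background task drains it
// and invokes the render callback (e.g. Console.Write($"\r\x1b[K⠋ {msg}")).
sealed class MiniReporter : IAsyncDisposable
{
    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private readonly Task _consumer;

    public MiniReporter(Action<string> render)
    {
        _consumer = Task.Run(async () =>
        {
            await foreach (var message in _channel.Reader.ReadAllAsync())
                render(message);
        });
    }

    public void UpdateStatus(string message) => _channel.Writer.TryWrite(message);

    public async ValueTask DisposeAsync()
    {
        _channel.Writer.Complete(); // drain remaining messages, then stop
        await _consumer;
    }
}
```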

### ConfigManager

**Purpose**: Load/save configuration from an XDG-compliant location

**Config Path**:
- `Environment.SpecialFolder.UserProfile` → `~/.config/openquery/config`

**Schema** (`AppConfig`):
```csharp
public class AppConfig
{
    public string ApiKey { get; set; } = "";
    public string Model { get; set; } = "qwen/qwen3.5-flash-02-23";
    public int DefaultQueries { get; set; } = 3;
    public int DefaultChunks { get; set; } = 3;
    public int DefaultResults { get; set; } = 5;
}
```

**Format**: Simple `key=value` (no INI parser; manual line split)

**Methods**:
- `Load()` → reads the file if it exists, returns `AppConfig` (with defaults)
- `Save(AppConfig)` → writes all 5 keys, overwriting the existing file

**Design**:
- Static class (no instances)
- Creates the directory if missing
- No validation (writes whatever values it is given)
- Could be improved with JSON format (but keep simple)
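
The `key=value` parsing can be sketched as a plain dictionary pass (illustrative; the real `ConfigManager` additionally maps the keys onto `AppConfig` properties):

```csharp
// Split each line on the first '=', skipping blank or malformed lines.
static Dictionary<string, string> ParseConfig(string fileContents)
{
    var values = new Dictionary<string, string>();
    foreach (var line in fileContents.Split('\n'))
    {
        var idx = line.IndexOf('=');
        if (idx <= 0) continue; // no key before '=' - ignore the line
        values[line[..idx].Trim()] = line[(idx + 1)..].Trim();
    }
    return values;
}
```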

## Data Models

### OpenQueryOptions

**Location**: `Models/OpenQueryOptions.cs`

**Type**: `record`

**Purpose**: Immutable options object passed through the workflow

**Properties**:
- `int Chunks` - top N chunks for context
- `int Results` - search results per query
- `int Queries` - number of expanded queries to generate
- `bool Short` - concise answer flag
- `bool Long` - detailed answer flag
- `bool Verbose` - verbose logging flag
- `string Question` - original user question

**Created**: In `Program.cs` from CLI options + config defaults

**Used By**: `OpenQueryApp.RunAsync()`

### Chunk

**Location**: `Models/Chunk.cs`

**Type**: `record`

**Purpose**: Content chunk with metadata and embedding

**Properties**:
- `string Content` - extracted text (~500 chars)
- `string SourceUrl` - article URL
- `string? Title` - article title (nullable)
- `float[]? Embedding` - vector embedding (populated by EmbeddingService)
- `float Score` - relevance score (populated during ranking)

**Lifecycle**:
1. Instantiated in `SearchTool.ExecuteParallelArticleFetchingAsync` with content, URL, and title
2. `Embedding` set in `ExecuteParallelEmbeddingsAsync` after batch processing
3. `Score` set in `RankAndSelectTopChunks` after cosine similarity
4. Serialized into the context string for the final answer

**Equality**: Records provide value equality (based on all properties)
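
The shape implied by the lifecycle above can be sketched as a positional record with mutable `Embedding`/`Score` (a sketch of the documented shape, not necessarily the verbatim declaration):

```csharp
// Positional record: Content, SourceUrl, and Title are set at creation;
// Embedding and Score are populated later by the pipeline phases.
record Chunk(string Content, string SourceUrl, string? Title)
{
    public float[]? Embedding { get; set; }
    public float Score { get; set; }
}
```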

### ParallelProcessingOptions

**Location**: `Models/ParallelOptions.cs`

**Type**: `class` (mutable)

**Purpose**: Concurrency settings for parallel operations

**Properties** (with defaults):
- `MaxConcurrentArticleFetches` = 10
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300

**Used By**: `EmbeddingService` (for embeddings), `SearchTool` (for article fetching)

**Currently**: Hardcoded in the `SearchTool` constructor; could be made configurable

### OpenRouter Models (Models/OpenRouter.cs)

**Purpose**: DTOs for the OpenRouter API (JSON serializable)

**Chat Completion**:
- `ChatCompletionRequest` (model, messages, tools, stream)
- `ChatCompletionResponse` (choices[], usage)
- `Message` (role, content, tool_calls, tool_call_id)
- `ToolDefinition`, `ToolFunction`, `ToolCall`, `FunctionCall`
- `Choice`, `Usage`

**Embedding**:
- `EmbeddingRequest` (model, input[])
- `EmbeddingResponse` (data[], usage)
- `EmbeddingData` (embedding[], index)

**Streaming**:
- `StreamChunk` (TextDelta, Tool)
- `ChatCompletionChunk`, `ChunkChoice`, `ChunkDelta`

**JSON Properties**: Uses `[JsonPropertyName]` to match the API

**Serialization**: System.Text.Json with source generation (AppJsonContext)

### Searxng Models (Models/Searxng.cs)

**Purpose**: DTOs for SearxNG search results

**Records**:
- `SearxngRoot` with `List<SearxngResult> Results`
- `SearxngResult` with `Title`, `Url`, `Content` (snippet)

**Usage**: Deserialized from SearxNG's JSON response

### JsonContexts

**Location**: `Models/JsonContexts.cs`

**Purpose**: Source-generated JSON serializer context for AOT compatibility

**Pattern**:
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
// ... etc ...
internal partial class AppJsonContext : JsonSerializerContext
{
}
```

**Generated**: Partial class completed by the source generator

**Used By**: All `JsonSerializer.Serialize/Deserialize` calls with `AppJsonContext.Default.{Type}`

**Benefits**:
- AOT-compatible (no reflection)
- Faster serialization (compiled delegates)
- Smaller binary (trimming-safe)

## Component Interactions

### Dependencies Graph

```
Program.cs
├── ConfigManager (load/save)
├── OpenRouterClient ──┐
├── SearxngClient ─────┤
├── EmbeddingService ──┤
└── SearchTool ────────┤
                       │
OpenQueryApp ◄─────────┘
│
├── OpenRouterClient (query gen + answer streaming)
├── SearchTool (pipeline)
│   ├── SearxngClient (searches)
│   ├── ArticleService (fetch)
│   ├── ChunkingService (split)
│   ├── EmbeddingService (embeddings)
│   ├── RateLimiter (concurrency)
│   └── StatusReporter (progress via callback)
└── StatusReporter (UI)
```

### Data Flow Between Components

```
OpenQueryOptions
    ↓
OpenQueryApp
├─ Query Generation
│    └─ OpenRouterClient.CompleteAsync()
│       → List<string> generatedQueries
│
├─ Search Pipeline
│    └─ SearchTool.ExecuteAsync(originalQuery, generatedQueries, ...)
│       ↓
│       Phase 1: SearxngClient.SearchAsync(query) × N
│         → ConcurrentBag<SearxngResult>
│         → List<SearxngResult> (unique)
│       ↓
│       Phase 2: ArticleService.FetchArticleAsync(url) × M
│         → ChunkingService.ChunkText(article.TextContent)
│         → ConcurrentBag<Chunk> (content, url, title)
│       ↓
│       Phase 3: EmbeddingService.GetEmbeddingsAsync(chunkContents)
│         → (queryEmbedding, chunkEmbeddings)
│       ↓
│       Phase 4: CosineSimilarity + Rank
│         → List<Chunk> topChunks (with Score, Embedding set)
│       ↓
│       Format: context string with [Source N: Title](Url)
│       → return context string
│
└─ Final Answer
     └─ OpenRouterClient.StreamAsync(prompt with context)
        → stream deltas to Console
```

### Interface Contracts

**SearchTool → Progress**:
```csharp
// Invoked as: onProgress?.Invoke("[Fetching article 1/10: example.com]")
Action<string>? onProgress
```

**StatusReporter ← Progress**:
```csharp
// Handler in OpenQueryApp:
(progress) => {
    if (options.Verbose) reporter.WriteLine(progress);
    else reporter.UpdateStatus(parsedShorterMessage);
}
```

**SearchTool → ArticleService**:
```csharp
Article article = await ArticleService.FetchArticleAsync(url);
```

**SearchTool → EmbeddingService**:
```csharp
(float[] queryEmbedding, float[][] chunkEmbeddings) = await ExecuteParallelEmbeddingsAsync(...);
// Also: embeddingService.GetEmbeddingAsync(text), GetEmbeddingsWithRateLimitAsync(...)
```

**SearchTool → ChunkingService**:
```csharp
List<string> chunks = ChunkingService.ChunkText(article.TextContent);
```

**SearchTool → RateLimiter**:
```csharp
await _rateLimiter.ExecuteAsync(async () => await _client.EmbedAsync(...), ct);
```

---

## Next Steps

- [OpenQueryApp](openquery-app.md) - Main orchestrator details
- [SearchTool](search-tool.md) - Pipeline implementation
- [Services](services.md) - All service classes documented
- [Models](models.md) - Complete data model reference

555
docs/components/search-tool.md
Normal file
@@ -0,0 +1,555 @@

# SearchTool Component

Deep dive into `SearchTool` - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.

## Overview

`SearchTool` is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks by relevance, and returns formatted context for the final AI answer.

## Location

`Tools/SearchTool.cs`

## Class Definition

```csharp
public class SearchTool
{
    private readonly SearxngClient _searxngClient;
    private readonly EmbeddingService _embeddingService;
    private readonly ParallelProcessingOptions _options;

    public static string Name => "search";
    public static string Description => "Search the web for information on a topic";

    public SearchTool(
        SearxngClient searxngClient,
        EmbeddingService embeddingService);

    public Task<string> ExecuteAsync(
        string originalQuery,
        List<string> generatedQueries,
        int maxResults,
        int topChunksLimit,
        Action<string>? onProgress = null,
        bool verbose = true);
}
```

**Dependencies**:
- `SearxngClient` - for web searches
- `EmbeddingService` - for vector generation
- `ParallelProcessingOptions` - concurrency settings (hardcoded new instance)

**Static Properties**:
- `Name` - tool identifier (currently "search")
- `Description` - tool description

## ExecuteAsync Method

**Signature**:
```csharp
public async Task<string> ExecuteAsync(
    string originalQuery,           // User's original question
    List<string> generatedQueries,  // Expanded search queries
    int maxResults,                 // Results per query
    int topChunksLimit,             // Top N chunks to return
    Action<string>? onProgress,     // Progress callback
    bool verbose)                   // Verbose mode flag
```

**Returns**: `Task<string>` - formatted context with source citations

**Contract**:
- Never returns `null` (returns "No search results found." on zero results)
- The progress callback may be invoked frequently (many phases)
- `verbose` is passed to sub-components for their own logging

## The 4-Phase Pipeline

```
ExecuteAsync()
│
├─ Phase 1: ExecuteParallelSearchesAsync
│    Input:  generatedQueries × maxResults
│    Output: List<SearxngResult> (deduplicated)
│
├─ Phase 2: ExecuteParallelArticleFetchingAsync
│    Input:  List<SearxngResult>
│    Output: List<Chunk> (with content, url, title)
│
├─ Phase 3: ExecuteParallelEmbeddingsAsync
│    Input:  originalQuery + List<Chunk>
│    Output: (queryEmbedding, chunkEmbeddings)
│    (also sets Chunk.Embedding for valid chunks)
│
├─ Phase 4: RankAndSelectTopChunks
│    Input:  List<Chunk> + queryEmbedding + chunkEmbeddings
│    Output: List<Chunk> topChunks (with Score set)
│
└─ Format Context → return string
```

### Phase 1: ExecuteParallelSearchesAsync

**Purpose**: Execute all search queries in parallel, collect and deduplicate results.

**Implementation**:
```csharp
var allResults = new ConcurrentBag<SearxngResult>();

var searchTasks = generatedQueries.Select(async query =>
{
    onProgress?.Invoke($"[Searching web for '{query}'...]");
    try
    {
        var results = await _searxngClient.SearchAsync(query, maxResults);
        foreach (var result in results)
        {
            allResults.Add(result);
        }
    }
    catch (Exception ex)
    {
        if (verbose)
            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
    }
});

await Task.WhenAll(searchTasks);

var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
return uniqueResults;
```

**Details**:
- `ConcurrentBag<SearxngResult>` collects results thread-safely
- `Task.WhenAll` - unbounded parallelism (one task per query in `generatedQueries`)
- Each task calls `_searxngClient.SearchAsync(query, maxResults)`
- Errors are caught and logged (verbose only); the other queries continue
- `DistinctBy(r => r.Url)` removes duplicates

**Return**: `List<SearxngResult>` (unique URLs only)

**Progress**: `[Searching web for '{query}'...]`

**Potential Issues**:
- Could overwhelm a local SearxNG instance if `generatedQueries` is large (100+)
- SearxNG itself may have its own rate limiting

**Future Enhancements**:
- Add a semaphore to limit search concurrency
- Add a timeout per search task
- Cache search results (same query across runs)

### Phase 2: ExecuteParallelArticleFetchingAsync

**Purpose**: Fetch each search result URL, extract the article content, split it into chunks.

**Implementation**:
```csharp
var chunks = new ConcurrentBag<Chunk>();
var completedFetches = 0;
var totalFetches = searchResults.Count;
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10

var fetchTasks = searchResults.Select(async result =>
{
    await semaphore.WaitAsync();
    try
    {
        var current = Interlocked.Increment(ref completedFetches);
        var uri = new Uri(result.Url);
        var domain = uri.Host;
        onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]");

        try
        {
            var article = await ArticleService.FetchArticleAsync(result.Url);
            if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent))
                return;

            var textChunks = ChunkingService.ChunkText(article.TextContent);
            foreach (var chunkText in textChunks)
            {
                chunks.Add(new Chunk(chunkText, result.Url, article.Title));
            }
        }
        catch (Exception ex)
        {
            if (verbose)
                Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}");
        }
    }
    finally
    {
        semaphore.Release();
    }
});

await Task.WhenAll(fetchTasks);
return chunks.ToList();
```

**Details**:
- `SemaphoreSlim` limits concurrency to `MaxConcurrentArticleFetches` (10)
- `Interlocked.Increment` for thread-safe progress counting
- Progress: `[Fetching article X/Y: domain]` (extracts the host from the URL)
- `ArticleService.FetchArticleAsync` uses SmartReader
- The article must be `IsReadable` and have `TextContent`
- `ChunkingService.ChunkText` splits into ~500-char pieces
- Each chunk becomes a `Chunk(content, url, article.Title)`
- Errors are logged (verbose only); failed URLs yield no chunks

**Return**: `List<Chunk>` (potentially many per article)

**Chunk Count Estimate**:
- 15 articles × an average of 3,000 chars/article = 45,000 chars
- With 500-char chunks ≈ 90 chunks
- With natural breaks → maybe 70-80 chunks

**Potential Issues**:
- Some sites block SmartReader (JS-heavy pages, paywalls)
- Slow article fetches can cause long-tail latency
- Large articles create many chunks → memory + embedding cost

**Future Enhancements**:
- Add a per-URL timeout
- Filter chunks by a length threshold (skip tiny chunks)
- Deduplicate chunks across articles (same content on different sites)
- Cache article fetches by URL

### Phase 3: ExecuteParallelEmbeddingsAsync

**Purpose**: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.

**Implementation**:
```csharp
onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");

// Start query embedding (single) and chunk embeddings (batch) concurrently
var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);

var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
    chunkTexts, onProgress);

await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);

var queryEmbedding = await queryEmbeddingTask;
var chunkEmbeddings = await chunkEmbeddingsTask;

// Filter out chunks with empty embeddings
var validChunks = new List<Chunk>();
var validEmbeddings = new List<float[]>();

for (var i = 0; i < chunks.Count; i++)
{
    if (chunkEmbeddings[i].Length > 0)
    {
        validChunks.Add(chunks[i]);
        validEmbeddings.Add(chunkEmbeddings[i]);
    }
}

// Update chunks with embeddings
for (var i = 0; i < validChunks.Count; i++)
{
    validChunks[i].Embedding = validEmbeddings[i];
}

return (queryEmbedding, validEmbeddings.ToArray());
```

**Details**:
- **Query embedding**: a single request for the original question (one embedding)
- **Chunk embeddings**: batch processing of all chunk texts
- Both run concurrently via `Task.WhenAll`
- `_embeddingService.GetEmbeddingsWithRateLimitAsync` uses:
  - Batch size: 300 (default)
  - Max concurrent batches: 4 (default)
  - Polly retry (3 attempts, exponential backoff)
  - `RateLimiter` (semaphore) for API concurrency
- Failed batches return an empty `float[]` (length 0)
- Chunks whose embeddings failed are filtered out (they won't be ranked)
- `validChunks[i].Embedding = validEmbeddings[i]` attaches each embedding to its chunk

**Return**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` where:
- `chunkEmbeddings` length = `validChunks.Count` (filtered)
- Order matches `validChunks` order (the parallel arrays are filtered together)

**Progress**: Interleaved from the embedding service's own progress callbacks (batch X/Y)

**Potential Issues**:
- `GetEmbeddingsWithRateLimitAsync` assigns `results[batchIndex] = ...` from parallel tasks; this is safe without extra synchronization because each task writes to a distinct array index
- The filtering loop assumes `chunkEmbeddings` has the same count as `chunks`; `GetEmbeddingsWithRateLimitAsync` returns `results.SelectMany(r => r).ToArray()`, which matches the input count (failed batches contribute empty arrays), so no out-of-range access occurs

**Memory Consideration**:
- `chunkTexts` holds all chunk strings (possibly large, but comfortably in memory)
- `chunkEmbeddings` holds all float arrays (~600 KB for 100 chunks)
- Total: modest (a few MB)

**Future Enhancements**:
- Stream embeddings? (No benefit; all are needed for ranking)
- Cache embeddings by content hash (across runs)
- Support a different embedding model per query
### Phase 4: RankAndSelectTopChunks

**Purpose**: Score chunks by semantic relevance to the query, sort, and select the top N.

**Implementation**:
```csharp
var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList();

foreach (var chunk in chunksWithEmbeddings)
{
    chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
}

var topChunks = chunksWithEmbeddings
    .OrderByDescending(c => c.Score)
    .Take(topChunksLimit)
    .ToList();

return topChunks;
```

**Details**:
- Filters to chunks that have embeddings (i.e., succeeded in phase 3)
- For each: `Score = CosineSimilarity(queryEmbedding, chunkEmbedding)`
  - Uses `TensorPrimitives.CosineSimilarity` (SIMD-accelerated)
  - Returns a float, typically 0-1 (higher = more relevant)
- `OrderByDescending` puts the highest scores first
- `Take(topChunksLimit)` selects the top N (from the `--chunks` option)
- Returns `List<Chunk>` (now with `Score` set)

**Return**: Top N chunks, ready for context formatting

**Complexity**:
- O(n) for scoring (n = valid chunks, typically 50-100)
- O(n log n) for sorting (fast for n = 100)
- Negligible CPU time

**Edge Cases**:
- If `topChunksLimit` > `chunksWithEmbeddings.Count`, all chunks are returned (no padding)
- If all embeddings failed, an empty list is returned
- `topChunksLimit == 0` yields an empty list
### Context Formatting (After Phase 4)

**Location**: In `ExecuteAsync`, after ranking:

```csharp
var context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));

return context;
```

**Format**:
```
[Source 1: Article Title](https://example.com/article)
Chunk content text...

[Source 2: Another Title](https://example.com/another)
Chunk content text...

[Source 3: Third Title](https://example.com/third)
Chunk content text...
```

**Features**:
- Each source is numbered 1, 2, 3... (matching the order of `topChunks`, i.e., descending relevance)
- Title, or "Unknown" if null
- The title is a markdown link to the original URL
- Chunk content as plain text (it may contain its own formatting)
- Double newline between sources

**Rationale**:
- Markdown links can be copy-pasted into browsers
- Numbers allow the LLM to cite `[Source 1]` in its answer
- The original title helps the user recognize the source

**Potential Issues**:
- The LLM might misinterpret "Source 1" as a required literal citation format
- If chunks contain markdown, it may conflict with the wrapper (no escaping)
- Some titles may contain markdown special characters (unlikely but possible)

**Alternative**: XML-style tags or a more robust citation format could be used instead.
## Error Handling & Edge Cases

### Empty Results Handling

At the end of `ExecuteAsync`:
```csharp
if (searchResults.Count == 0)
    return "No search results found.";

if (chunks.Count == 0)
    return "Found search results but could not extract readable content.";
```

These messages become the tool's context, so they surface in the final answer (the LLM responds to them).

### Partial Failures

- Some search queries fail → proceed with the others
- Some articles fail to fetch → continue
- Some embedding batches fail → those chunks are filtered out
- Ranking proceeds with whatever valid embeddings exist

### Verbose vs. Compact Progress

The `verbose` parameter affects what is passed to each phase:
- **Article fetching**: errors are shown only if `verbose`
- **Embeddings**: batch progress is always shown via `onProgress` (from `EmbeddingService`)
- **Searches**: no error suppression (the warning is always logged to the console, not through the callback)

### Progress Callback Pattern

`onProgress` is invoked at major milestones:
- Searching: `[Searching web for '{query}'...]`
- Article fetch: `[Fetching article X/Y: domain]`
- Embeddings: `[Generating embeddings: batch X/Y]`
- Final: `[Found top X most relevant chunks overall. Generating answer...]`

Each phase may invoke the callback many times (e.g., embedding batches). `StatusReporter` handles these appropriately.
## Performance Characteristics

### Time Estimate per Phase (typical run: 3 queries, 5 results each, ~15 articles)

| Phase | Time | Dominated By |
|-------|------|--------------|
| Searches | 3-8 s | Network latency to SearxNG |
| Article fetching | 5-15 s | Network + SmartReader CPU |
| Embeddings | 2-4 s | OpenRouter API latency (4 concurrent batches) |
| Ranking | <0.1 s | CPU (O(n log n) sort, n ≈ 100) |
| **Total pipeline** | **10-30 s** | Articles + searches |

### Concurrency Limits Effect

**Article Fetching** (`MaxConcurrentArticleFetches` = 10):
- 15 articles → 2 waves (10, then 5)
- If each takes 2 s → ~4 s total (vs. ~30 s sequential)

**Embedding Batching** (`MaxConcurrentEmbeddingRequests` = 4, `EmbeddingBatchSize` = 300):
- 80 chunks → a single batch (everything fits within the 300-text batch size)
- 300 chunks → still 1 batch (300 fits exactly); the concurrency limit of 4 only matters when there are multiple batches in flight
- Here: a single embedding call with ~80 items = 1 batch (no batch-level parallelism needed)
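The wave behavior above can be sketched with the same `SemaphoreSlim` + `Task.WhenAll` pattern the pipeline uses. This is a standalone sketch with simulated delays; the peak-tracking counters are illustrative and not part of the real fetcher:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ConcurrencyDemo
{
    // Runs `items` simulated fetches, never more than `maxConcurrent` at once,
    // and reports the highest concurrency actually observed.
    public static async Task<int> RunAsync(int items, int maxConcurrent)
    {
        using var semaphore = new SemaphoreSlim(maxConcurrent);
        var peak = 0;
        var current = 0;

        var tasks = Enumerable.Range(0, items).Select(async _ =>
        {
            await semaphore.WaitAsync();
            try
            {
                var now = Interlocked.Increment(ref current);
                InterlockedMax(ref peak, now);
                await Task.Delay(50); // simulated article fetch
            }
            finally
            {
                Interlocked.Decrement(ref current);
                semaphore.Release();
            }
        });

        await Task.WhenAll(tasks);
        return peak; // never exceeds maxConcurrent
    }

    // Lock-free max update via compare-and-swap
    static void InterlockedMax(ref int location, int value)
    {
        int snapshot;
        while (value > (snapshot = Volatile.Read(ref location)))
            Interlocked.CompareExchange(ref location, value, snapshot);
    }
}
```

With 15 items and a limit of 10, the second "wave" of 5 starts only as slots free up, which is exactly why 15 fetches at 2 s each finish in roughly two waves rather than 30 s.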
### Memory Usage

- `searchResults` (15 items) → ~30 KB
- `chunks` (80 items × 500 chars) → ~40 KB of text plus ~400 KB of embeddings (80 × 1,536 × 4 bytes)
- Total ≈ 500 KB, excluding temporary HTTP buffers
## Design Decisions

### Why Use ConcurrentBag for Results/Chunks?

A thread-safe collection lets parallel tasks add items without explicit locks. `ConcurrentBag` does not guarantee enumeration order, and the subsequent `DistinctBy` keeps the first occurrence in whatever (nondeterministic) order the bag enumerates. This is acceptable because order does not matter here: ranking is semantic. If order mattered, a `ConcurrentQueue` or an explicit sort by source would be needed.
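A small illustration of the `DistinctBy` first-occurrence behavior mentioned above (requires .NET 6+ LINQ; the URLs are made up):

```csharp
using System;
using System.Linq;

var results = new[]
{
    (Url: "https://a.example/post", Engine: "google"),
    (Url: "https://b.example/page", Engine: "bing"),
    (Url: "https://a.example/post", Engine: "duckduckgo"), // duplicate URL
};

// DistinctBy keeps the first element seen for each key; later duplicates are dropped
var unique = results.DistinctBy(r => r.Url).ToList();

Console.WriteLine(unique.Count);     // 2
Console.WriteLine(unique[0].Engine); // google
```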
### Why Not Use Parallel.ForEach for Article Fetching?

The code uses `Task.WhenAll` with `Select` plus a semaphore. `Parallel.ForEachAsync` (available since .NET 6) would also work, but the semaphore pattern matches the other phases and gives explicit concurrency control.

### Why Separate Query Embedding from Chunk Embeddings?

`GetEmbeddingAsync` is called directly (not batched) because there is only one query. It could be batched with the chunks, but:
- The query is small (a single string)
- The batch API has overhead (request structure)
- A separate call completes independently (the query embedding does not need to wait for chunk collection)

### Why Two Different Embedding Methods?

`EmbeddingService` has:
- `GetEmbeddingsWithRateLimitAsync` (used in SearchTool)
- `GetEmbeddingsAsync` (similar, but a different implementation)

This is probably a legacy/refactor artifact; the two could be consolidated.

### Why Not Deduplicate URLs Earlier?

Deduplication happens after search aggregation. It could also happen within each search result (SearxNG may already deduplicate across engines), but the global dedupe is what is actually necessary.

### Why Not Early Filtering (e.g., by Domain or Length)?

Filtering could improve quality:
- Filter by domain reputation
- Skip articles that are too short (<200 chars) or too long (>50 KB)

Neither is implemented, to keep the pipeline simple.
## Testing Considerations

**Unit Testability**: `SearchTool` is fairly testable with mocks:
- Mock `SearxngClient` to return predetermined results
- Mock `ArticleService` and `EmbeddingService` to avoid network calls
- Verify progress callback invocations
- Verify the final context format

**Integration Testing**:
- End-to-end with real or mocked external services
- Requires a test SearxNG instance and a test OpenRouter key (or mocked responses)

**Performance Testing**:
- Benchmark with different concurrency settings
- Profile memory for large result sets (1,000+ articles)
- Measure the impact of embedding API latency
## Known Issues

### Chunk Text Selection in ExecuteParallelEmbeddingsAsync

The actual source of `ExecuteParallelEmbeddingsAsync` in the core `SearchTool` embeds chunk *content*:

```csharp
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
    chunkTexts, onProgress);
```

This is correct; earlier drafts of this documentation mistakenly showed `c.Embedding` here.

### Potential Race Condition in GetEmbeddingsWithRateLimitAsync

```csharp
results[batchIndex] = batchResults;
```

Multiple parallel tasks write to this array, but each task writes to a distinct index, so the writes never overlap and no synchronization is required. This is safe.

### Progress Callback May Overwhelm

If invoked synchronously from many parallel tasks, progress updates could saturate the status channel. `Channel.TryWrite` returns false when the buffer is full, and the return value is ignored, so messages may be dropped under heavy load. This is acceptable for a CLI UI: some messages may be lost, but overall progress remains visible.
## Related Components

- **[OpenQueryApp](openquery-app.md)** - calls this component
- **[SearxngClient](../../services/SearxngClient.md)** - phase 1
- **[ArticleService](../../services/ArticleService.md)** - phase 2a
- **[ChunkingService](../../services/ChunkingService.md)** - phase 2b
- **[EmbeddingService](../../services/EmbeddingService.md)** - phase 3
- **[Ranking](../../services/EmbeddingService.md#cosinesimilarity)** - cosine similarity

---

## Next Steps

- [Services Overview](../services/overview.md) - see the supporting services
- [CLI Reference](../../api/cli.md) - how users trigger this pipeline
- [Performance](../performance.md) - optimize pipeline settings
471
docs/components/services.md
Normal file
@@ -0,0 +1,471 @@
# Services Overview

Comprehensive reference for all service classes in OpenQuery.

## 📋 Table of Contents

1. [Service Catalog](#service-catalog)
2. [Client Services](#client-services)
3. [Processing Services](#processing-services)
4. [Infrastructure Services](#infrastructure-services)
5. [Service Interactions](#service-interactions)

## Service Catalog

OpenQuery's services are organized into three categories:

| Category | Services | Purpose |
|----------|----------|---------|
| **Clients** | `OpenRouterClient`, `SearxngClient` | External API communication |
| **Processors** | `EmbeddingService`, `ChunkingService`, `ArticleService` | Data transformation & extraction |
| **Infrastructure** | `RateLimiter`, `StatusReporter` | Cross-cutting concerns |

All services are **stateless** (except for internal configuration) and can be safely reused across multiple operations.

---
## Client Services

### OpenRouterClient

**Location**: `Services/OpenRouterClient.cs`
**Purpose**: HTTP client for OpenRouter AI APIs (chat completions & embeddings)

#### API Endpoints

| Method | Endpoint | Purpose |
|--------|----------|---------|
| POST | `/chat/completions` | Chat completion (streaming or non-streaming) |
| POST | `/embeddings` | Embedding generation for text inputs |

#### Authentication
```
Authorization: Bearer {apiKey}
Accept: application/json
```

#### Public Methods

##### `StreamAsync(ChatCompletionRequest request, CancellationToken cancellationToken)`
- **Returns**: `IAsyncEnumerable<StreamChunk>`
- **Behavior**: Sets `request.Stream = true`, posts, reads the Server-Sent Events stream
- **Use Case**: Final answer streaming, real-time responses
- **Stream Format**: SSE lines `data: {json}`; yields `TextDelta` or `ToolCall`

##### `CompleteAsync(ChatCompletionRequest request)`
- **Returns**: `Task<ChatCompletionResponse>`
- **Behavior**: Sets `request.Stream = false`, posts, returns the full response
- **Use Case**: Query generation (non-streaming)

##### `EmbedAsync(string model, List<string> inputs)`
- **Returns**: `Task<float[][]>`
- **Behavior**: POSTs to `/embeddings`, returns an array of vectors (ordered by input index)
- **Use Case**: Batch embedding generation

##### `HttpClient`
- **Property**: Internal `_httpClient` (created per instance)
- **Note**: Could use `IHttpClientFactory` for pooling (not needed for a CLI)

#### Error Handling
- `EnsureSuccessStatusCode()` throws `HttpRequestException` on 4xx/5xx
- No retry logic (retries are handled by `EmbeddingService`)

#### Configuration
```csharp
public OpenRouterClient(string apiKey)
{
    _apiKey = apiKey;
    _httpClient = new HttpClient();
    _httpClient.DefaultRequestHeaders.Authorization =
        new AuthenticationHeaderValue("Bearer", apiKey);
    _httpClient.DefaultRequestHeaders.Accept.Add(
        new MediaTypeWithQualityHeaderValue("application/json"));
}
```

#### Example Usage
```csharp
var client = new OpenRouterClient("sk-or-...");
var request = new ChatCompletionRequest("model", new List<Message> { ... });
await foreach (var chunk in client.StreamAsync(request))
{
    Console.Write(chunk.TextDelta);
}
```

---
### SearxngClient

**Location**: `Services/SearxngClient.cs`
**Purpose**: HTTP client for the SearxNG metasearch engine

#### API Endpoint
```
GET /search?q={query}&format=json
```

#### Constructor
```csharp
public SearxngClient(string baseUrl) // e.g., "http://localhost:8002"
```
- `baseUrl` is trimmed of any trailing `/`

#### Public Methods

##### `SearchAsync(string query, int limit = 10)`
- **Returns**: `Task<List<SearxngResult>>`
- **Behavior**: Sends the GET request, deserializes the JSON, takes up to `limit` results
- **On Failure**: Returns an empty `List<SearxngResult>` (no exception)

#### Error Handling
- The client does not call `response.EnsureSuccessStatusCode()`, so HTTP error statuses do not throw
- Invalid JSON or a missing `Results` field yields an empty list
- Failures are **tolerated**: an individual search query may fail without aborting the whole operation
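A sketch of this tolerant lookup, showing the swallow-and-return-empty shape described above. The DTO property names and the exact request construction here are illustrative assumptions; the real models live in the SearxNG API model classes:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

public record SketchResult(
    [property: JsonPropertyName("title")] string Title,
    [property: JsonPropertyName("url")] string Url,
    [property: JsonPropertyName("content")] string Content);

public record SketchResponse(
    [property: JsonPropertyName("results")] List<SketchResult>? Results);

public class SearxngClientSketch
{
    private readonly HttpClient _http = new();
    private readonly string _baseUrl;

    public SearxngClientSketch(string baseUrl) => _baseUrl = baseUrl.TrimEnd('/');

    public async Task<List<SketchResult>> SearchAsync(string query, int limit = 10)
    {
        try
        {
            var url = $"{_baseUrl}/search?q={Uri.EscapeDataString(query)}&format=json";
            var json = await _http.GetStringAsync(url);
            var response = JsonSerializer.Deserialize<SketchResponse>(json);
            var results = response?.Results ?? new List<SketchResult>();
            return results.GetRange(0, Math.Min(limit, results.Count));
        }
        catch
        {
            // Failures are tolerated: a failed query simply returns no results
            return new List<SketchResult>();
        }
    }
}
```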
#### Example Searxng Response
```json
{
  "results": [
    {
      "title": "Quantum Entanglement - Wikipedia",
      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
      "content": "Quantum entanglement is a physical phenomenon..."
    },
    ...
  ]
}
```

---
## Processing Services

### EmbeddingService

**Location**: `Services/EmbeddingService.cs`
**Purpose**: Generate embeddings with batching, rate limiting, and retry logic

#### Configuration

**Embedding Model**: `openai/text-embedding-3-small` (default, configurable via constructor)

**ParallelProcessingOptions** (hardcoded defaults):
```csharp
public class ParallelProcessingOptions
{
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
    public int EmbeddingBatchSize { get; set; } = 300;
}
```

#### Public Methods

##### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- **Returns**: `Task<float[][]>`
- **Behavior**:
  - Splits `texts` into batches of `EmbeddingBatchSize`
  - Executes batches in parallel (at most `MaxConcurrentEmbeddingRequests` concurrent)
  - Each batch: rate-limited, retry-wrapped `client.EmbedAsync(model, batch)`
  - Reassembles results in the original order
  - Failed batches → one empty `float[]` per text
- **Progress**: Invokes `onProgress` for each batch: `"[Generating embeddings: batch X/Y]"`
- **Thread-Safe**: Uses a lock when collecting results
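The split/reassemble behavior can be sketched independently of the HTTP, rate-limiting, and retry layers. In this simplified sketch, `embedBatch` is a stand-in for the rate-limited, retry-wrapped API call, and a failing batch yields one empty vector per input text:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchingSketch
{
    public static async Task<float[][]> EmbedAllAsync(
        List<string> texts,
        int batchSize,
        Func<List<string>, Task<float[][]>> embedBatch)
    {
        // Split into batches of batchSize, preserving input order
        var batches = texts
            .Select((text, i) => (text, batch: i / batchSize))
            .GroupBy(x => x.batch)
            .Select(g => g.Select(x => x.text).ToList())
            .ToList();

        var results = new float[batches.Count][][];

        // Each batch writes its own array slot, so order survives parallel execution
        await Task.WhenAll(batches.Select(async (batch, index) =>
        {
            try
            {
                results[index] = await embedBatch(batch);
            }
            catch
            {
                // Failed batch → one empty vector per input text
                results[index] = batch.Select(_ => Array.Empty<float>()).ToArray();
            }
        }));

        // Flatten back to one vector per original text
        return results.SelectMany(r => r).ToArray();
    }
}
```

Because every slot is filled (possibly with empty vectors), the flattened output always has the same count as the input list, which is what lets the caller filter by `Length > 0` safely.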
##### `GetEmbeddingAsync(string text, CancellationToken)`
- **Returns**: `Task<float[]>`
- **Behavior**: Single embedding with rate limiting and retry
- **Use Case**: Query embedding

##### `CosineSimilarity(float[] vector1, float[] vector2)`
```csharp
public static float CosineSimilarity(float[] vector1, float[] vector2)
```
- Uses `System.Numerics.Tensors.TensorPrimitives.CosineSimilarity`
- Returns a float between -1 and 1 (typically 0-1 for normalized embeddings)

**Implementation**: A single line calling the SIMD-accelerated tensor primitive
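For reference, the same score can be computed by hand. This sketch mirrors the mathematical definition that `TensorPrimitives.CosineSimilarity` implements (without the SIMD acceleration):

```csharp
using System;

public static class CosineSketch
{
    // cos(a, b) = (a · b) / (|a| * |b|)
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Identical vectors score 1, orthogonal vectors score 0, which is why higher scores mean greater semantic relevance in the ranking phase.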
---
### ArticleService

**Location**: `Services/ArticleService.cs`
**Purpose**: Extract clean article content from web URLs

#### Public Methods

##### `FetchArticleAsync(string url)`
- **Returns**: `Task<Article>`
- **Behavior**: Delegates to `SmartReader.ParseArticleAsync(url)`
- **Result**: `Article` with `Title`, `TextContent`, `IsReadable`, and metadata

#### Errors
- Propagates exceptions (SmartReader may throw on network failures or malformed HTML)
- `SearchTool` catches and logs them

#### SmartReader Notes
- Open-source article-extraction library (bundled via NuGet)
- Uses the Readability algorithm (similar to Firefox Reader View)
- Removes ads, navigation, and boilerplate
- `IsReadable` indicates quality (e.g., not a 404 page, not too short)

---
### ChunkingService

**Location**: `Services/ChunkingService.cs`
**Purpose**: Split text into ~500-character chunks at natural boundaries

#### Public Methods

##### `ChunkText(string text)`
- **Returns**: `List<string>`
- **Algorithm**:
  - Constant `MAX_CHUNK_SIZE = 500`
  - While text remains:
    - Take up to 500 chars
    - If not at the end, backtrack to the last of `[' ', '\n', '\r', '.', '!']`
    - Trim; add the chunk if non-empty
    - Advance the start position
  - Returns all chunks
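The steps above can be sketched as follows. This is an illustrative reimplementation under the stated assumptions (500-char limit, the listed break characters), not the project's exact source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkingSketch
{
    private const int MaxChunkSize = 500;
    private static readonly char[] BreakChars = { ' ', '\n', '\r', '.', '!' };

    public static List<string> ChunkText(string text)
    {
        var chunks = new List<string>();
        var start = 0;

        while (start < text.Length)
        {
            var length = Math.Min(MaxChunkSize, text.Length - start);

            // If not at the end of the text, backtrack to the last natural break
            if (start + length < text.Length)
            {
                var slice = text.Substring(start, length);
                var lastBreak = slice.LastIndexOfAny(BreakChars);
                if (lastBreak > 0)
                    length = lastBreak + 1;
            }

            var chunk = text.Substring(start, length).Trim();
            if (chunk.Length > 0)
                chunks.Add(chunk);

            start += length;
        }

        return chunks;
    }
}
```

Backtracking to a break character keeps chunks from splitting mid-word, which is why real-world chunk counts come in slightly under the naive `length / 500` estimate.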
#### Characteristics
- Static class (no instances)
- Pure function (no side effects)
- Zero dependencies
- Handles edge cases (empty text, short text, text without break characters)

---
## Infrastructure Services

### RateLimiter

**Location**: `Services/RateLimiter.cs`
**Purpose**: Limit concurrent operations using a semaphore

#### Constructor
```csharp
public RateLimiter(int maxConcurrentRequests)
```
Creates a `SemaphoreSlim` with `maxConcurrentRequests` slots

#### Public Methods

##### `ExecuteAsync<T>(Func<Task<T>> action, CancellationToken)`
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken cancellationToken = default)
{
    await _semaphore.WaitAsync(cancellationToken);
    try
    {
        return await action();
    }
    finally
    {
        _semaphore.Release();
    }
}
```
- Waits for a semaphore slot
- Executes `action` (typically an API call)
- Releases the semaphore (even on exception)
- Returns the result from `action`

##### `ExecuteAsync(Func<Task> action, CancellationToken)`
- Non-generic version (for void-returning actions)

#### Disposal
```csharp
public async ValueTask DisposeAsync()
{
    _semaphore.Dispose();
}
```
Implements `IAsyncDisposable` for async cleanup

#### Usage Pattern
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
{
    return await SomeApiCall();
}, cancellationToken);
```

#### Where Used
- `EmbeddingService`: limits concurrent embedding batch requests (default 4)

---
### StatusReporter

**Location**: `Services/StatusReporter.cs`
**Purpose**: Real-time progress display with a spinner (compact mode) or verbose lines

#### Constructor
```csharp
public StatusReporter(bool verbose)
```
- `verbose = true`: all progress via `WriteLine()` (no spinner)
- `verbose = false`: spinner showing the latest status

#### Architecture

**Components**:
- `Channel<string> _statusChannel` - producer-consumer queue
- `Task _statusProcessor` - background task reading from the channel
- `CancellationTokenSource _spinnerCts` - spinner task cancellation
- `Task _spinnerTask` - spinner animation task
- `char[] _spinnerChars` - Braille spinner pattern

**Spinner Animation**:
- Runs at 10 FPS (100 ms interval)
- Cycles through `['⠋','⠙','⠹','⠸','⠼','⠴','⠦','⠧','⠇','⠏']`
- Displays: `⠋ Fetching articles...`
- Updates in place using ANSI: `\r\x1b[K` (carriage return + erase line)

#### Public Methods

##### `UpdateStatus(string message)`
- Fire-and-forget: writes to the channel via `TryWrite` (non-blocking)
- If the channel is full, the message is dropped (acceptable loss for UI)

##### `WriteLine(string text)`
- Stops the spinner temporarily
- Clears the current status line
- Writes `text` with a newline
- In verbose mode: just `Console.WriteLine(text)`

##### `ClearStatus()`
- In compact mode: `Console.Write("\r\x1b[K")` (erase line)
- In verbose mode: no-op
- Sets `_currentMessage = null`

##### `StartSpinner()` / `StopSpinner()`
- Manual control (`StartSpinner` is usually called from the constructor, `StopSpinner` from `Dispose`)

##### `Dispose()`
- Completes the channel writer
- Awaits `_statusProcessor` completion
- Calls `StopSpinner()`

#### Background Processing

**Status Processor**:
```csharp
private async Task ProcessStatusUpdatesAsync()
{
    await foreach (var message in _statusChannel.Reader.ReadAllAsync())
    {
        if (_verbose)
        {
            Console.WriteLine(message);
            continue;
        }
        Console.Write("\r\x1b[K");                      // Clear line
        Console.Write($"{_spinnerChars[0]} {message}"); // Static spinner
        _currentMessage = message;
    }
}
```

**Spinner Task**:
```csharp
_spinnerTask = Task.Run(async () =>
{
    while (_spinnerCts is { Token.IsCancellationRequested: false })
    {
        if (_currentMessage != null)
        {
            Console.Write("\r\x1b[K");
            var charIndex = index++ % spinner.Length;
            Console.Write($"{spinner[charIndex]} {_currentMessage}");
        }
        await Task.Delay(100, _spinnerCts.Token);
    }
});
```

#### Thread Safety
- `UpdateStatus` (producer) writes to the channel
- `ProcessStatusUpdatesAsync` (consumer) reads from the channel
- `_spinnerTask` runs concurrently
- All UI writes happen in the consumer/spinner task context (single-threaded UI)

#### Design Notes
- Could be simplified: just use `Console.CursorLeft` for the spinner, with no channel
- The channel allows `UpdateStatus` calls from any thread without blocking
- The Braille spinner requires a terminal that supports Unicode (most modern terminals do)

---
## Service Interactions

### Dependency Graph

```
OpenQueryApp
├── OpenRouterClient ← (used for query gen + final answer)
└── SearchTool
    ├── SearxngClient
    ├── ArticleService (uses SmartReader)
    ├── ChunkingService (static)
    ├── EmbeddingService
    │   ├── OpenRouterClient (different instance)
    │   └── RateLimiter
    └── ParallelProcessingOptions (config)
```

### Service Lifetimes

All services are **transient** (a new instance per query execution):
- `OpenRouterClient` → 1 instance for query generation + the final answer
- `SearxngClient` → 1 instance for all searches
- `EmbeddingService` → 1 instance with its own `OpenRouterClient` and `RateLimiter`
- `SearchTool` → 1 instance per query (constructed in `Program.cs`)

There is no singleton or static state (except static utility classes like `ChunkingService`).

### Data Flow Through Services

```
OpenQueryApp
│
├─ OpenRouterClient.CompleteAsync() → query generation
│    Messages → JSON → HTTP request → response → JSON → Messages
│
└─ SearchTool.ExecuteAsync()
   │
   ├─ SearxngClient.SearchAsync() × N
   │    query → URL encode → GET → JSON → SearxngResult[]
   │
   ├─ ArticleService.FetchArticleAsync() × M
   │    URL → HTTP GET → SmartReader → Article
   │
   ├─ ChunkingService.ChunkText() × M
   │    Article.TextContent → List<string> chunks
   │
   ├─ EmbeddingService.GetEmbeddingAsync(query) + GetEmbeddingsAsync(chunks[])
   │    texts → batches → rate-limited HTTP POST → JSON → float[][]
   │
   ├─ CosineSimilarity(queryEmbedding, chunkEmbedding) × M
   │    vectors → dot product → magnitude → score
   │
   └─ return context string (formatted chunks)
```

---

## Next Steps

- **[OpenQueryApp](../components/openquery-app.md)** - orchestrates the services
- **[SearchTool](../components/search-tool.md)** - coordinates the pipeline
- **[Models](../components/models.md)** - data structures passed between services
- **[API Reference](../../api/cli.md)** - CLI that uses these services

---

**Service Design Principles**:
- Single responsibility: each service does one thing well
- Stateless: no instance state beyond constructor arguments
- Composable: services depend on abstractions (other services), not implementations
- Testable: dependencies can be mocked for unit testing