OpenQuery/docs/components/models.md
OpenQuery Documentation 65ca2401ae docs: add comprehensive documentation with README and detailed guides
- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
2026-03-19 10:01:58 +01:00

# Models Reference
Complete reference for all data models, DTOs, and records in OpenQuery.
## 📋 Table of Contents
1. [Core Data Models](#core-data-models)
2. [OpenRouter API Models](#openrouter-api-models)
3. [SearxNG API Models](#searxng-api-models)
4. [JSON Serialization](#json-serialization)
5. [Model Relationships](#model-relationships)
## Core Data Models
### OpenQueryOptions
**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object for a single query execution
```csharp
public record OpenQueryOptions(
int Chunks, // Number of top chunks to include in context
int Results, // Search results per generated query
int Queries, // Number of search queries to generate (if >1)
bool Short, // Request concise answer
bool Long, // Request detailed answer
bool Verbose, // Enable verbose logging
string Question // Original user question (required)
);
```
**Lifecycle**:
- Created in `Program.cs` by combining CLI options, config defaults, and environment variables
- Passed to `OpenQueryApp.RunAsync(options)`
**Validation**: None (assumes valid values from CLI parser/config)
**Example**:
```csharp
var options = new OpenQueryOptions(
Chunks: 3,
Results: 5,
Queries: 3,
Short: false,
Long: false,
Verbose: true,
Question: "What is quantum entanglement?"
);
```
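Because `OpenQueryOptions` is a positional record, a modified copy can be derived with a `with` expression instead of mutating the original. The sketch below redeclares the record so that it compiles on its own; `OptionsDemo` and `MakeConcise` are hypothetical names used only for illustration.

```csharp
using System;

// Redeclared so the sketch is self-contained; mirrors the record shown above.
public record OpenQueryOptions(
    int Chunks, int Results, int Queries,
    bool Short, bool Long, bool Verbose, string Question);

public static class OptionsDemo
{
    // Hypothetical helper: returns a copy that requests a concise answer.
    // The original instance is left untouched.
    public static OpenQueryOptions MakeConcise(OpenQueryOptions options) =>
        options with { Short = true, Long = false };
}
```

Calling `OptionsDemo.MakeConcise(options)` leaves `options` unchanged and returns a new instance that differs only in `Short` and `Long`.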
---
### Chunk
**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata, embedding, and relevance score
```csharp
public record Chunk(
string Content, // Text content (typically ~500 chars)
string SourceUrl, // Original article URL
string? Title = null // Article title (optional, may be null)
)
{
public float[]? Embedding { get; set; } // Vector embedding (1536-dim for text-embedding-3-small)
public float Score { get; set; } // Relevance score (cosine similarity; higher = more relevant, typically 0-1)
}
```
**Lifecycle**:
1. **Created** in `SearchTool.ExecuteParallelArticleFetchingAsync`:
```csharp
chunks.Add(new Chunk(chunkText, result.Url, article.Title));
```
At this point: `Embedding = null`, `Score = 0`
2. **Embedded** in `SearchTool.ExecuteParallelEmbeddingsAsync`:
```csharp
validChunks[i].Embedding = validEmbeddings[i];
```
3. **Scored** in `SearchTool.RankAndSelectTopChunks`:
```csharp
chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
```
4. **Formatted** into context string:
```csharp
$"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"
```
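The scoring step relies on cosine similarity between the query embedding and each chunk embedding. The following is a minimal sketch of the standard formula, not the actual body of `EmbeddingService.CosineSimilarity`; the class name `SimilaritySketch` and the zero-vector handling are assumptions.

```csharp
using System;

public static class SimilaritySketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
    // Assumption: a zero-magnitude vector yields 0 instead of NaN.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Identical directions score close to 1, orthogonal vectors score 0, and opposite directions score close to -1.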
**Properties**:
- `Content`: Never null/empty (filters empty chunks in `ChunkingService`)
- `SourceUrl`: Always provided (from `SearxngResult.Url`)
- `Title`: May be null if article extraction failed to get title
- `Embedding`: Null until pipeline phase 3 (Embeddings); remains null if the embedding request failed
- `Score`: 0 until pipeline phase 4 (Ranking); not meaningful for chunks without an embedding
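**Equality**: Records use value equality, and the compiler-generated `Equals` compares every instance field, including the backing fields of `Embedding` and `Score`. Two chunks with the same content/url/title are therefore equal only while their `Score` values match and their `Embedding` properties are both null or reference the same array (arrays compare by reference). Because these properties are mutable and also feed `GetHashCode`, avoid using `Chunk` as a dictionary key or `HashSet` element while the pipeline is still updating it.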
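How equality behaves for a record with mutable body properties can be checked with a small experiment. The sketch below redeclares `Chunk` as shown above; `ChunkEqualityDemo` is a hypothetical helper. It relies on the C# rule that the synthesized `Equals` compares all instance fields of the record, including the backing fields of auto-properties declared in the body.

```csharp
#nullable enable
using System;

// Redeclared so the sketch is self-contained; mirrors the record shown above.
public record Chunk(string Content, string SourceUrl, string? Title = null)
{
    public float[]? Embedding { get; set; }
    public float Score { get; set; }
}

public static class ChunkEqualityDemo
{
    // Compares two chunks before and after one of them receives a score.
    public static (bool BeforeScoring, bool AfterScoring) Compare()
    {
        var a = new Chunk("text", "https://example.com", "Title");
        var b = new Chunk("text", "https://example.com", "Title");

        bool before = a.Equals(b); // all fields match (Embedding null, Score 0)
        b.Score = 0.9f;
        bool after = a.Equals(b);  // Score now differs between a and b

        return (before, after);
    }
}
```

`Compare()` returns `(true, false)`: the two chunks stop being equal as soon as one of them is scored.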
---
### ParallelProcessingOptions
**Location**: `Models/ParallelOptions.cs`
**Type**: `class`
**Purpose**: Configuration for parallel/concurrent operations
```csharp
public class ParallelProcessingOptions
{
public int MaxConcurrentArticleFetches { get; set; } = 10;
public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
public int EmbeddingBatchSize { get; set; } = 300;
}
```
**Usage**:
- Instantiated in the `SearchTool` constructor with a hard-coded `new ParallelProcessingOptions()`
- Passed to `EmbeddingService` constructor
- Read by `SearchTool` for article fetching semaphore
**Default Values**:
| Property | Default | Effect |
|----------|---------|--------|
| `MaxConcurrentArticleFetches` | 10 | Up to 10 articles fetched simultaneously |
| `MaxConcurrentEmbeddingRequests` | 4 | Up to 4 embedding batches in parallel |
| `EmbeddingBatchSize` | 300 | Each embedding API call handles up to 300 texts |
**Current Limitation**: These are **compile-time defaults** (hardcoded in `SearchTool.cs`). To make them configurable:
1. Add to `AppConfig`
2. Read in `ConfigManager`
3. Pass through `SearchTool` constructor
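A limit such as `MaxConcurrentArticleFetches` is typically enforced with a `SemaphoreSlim`. The sketch below shows one way to do that under stated assumptions: `ThrottleSketch` and `RunThrottledAsync` are hypothetical names, and the way `SearchTool` actually wires up its semaphore is not shown in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Redeclared so the sketch is self-contained; mirrors the class shown above.
public class ParallelProcessingOptions
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
    public int EmbeddingBatchSize { get; set; } = 300;
}

public static class ThrottleSketch
{
    // Runs `work` for every item with at most `maxConcurrency` calls in flight.
    // Results are returned in the same order as the input items.
    public static async Task<TResult[]> RunThrottledAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();          // wait for a free slot
            try { return await work(item); }
            finally { gate.Release(); }      // free the slot even on failure
        }).ToList();

        return await Task.WhenAll(tasks);
    }
}
```

With configurable options, the limit would come from the injected instance, for example `RunThrottledAsync(urls, options.MaxConcurrentArticleFetches, FetchAsync)`, where `urls` and `FetchAsync` stand in for the real inputs.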
---
## OpenRouter API Models
**Location**: `Models/OpenRouter.cs`
**Purpose**: DTOs for OpenRouter's REST API (JSON serialization)
### Chat Completion
#### `ChatCompletionRequest`
```csharp
public record ChatCompletionRequest(
[property: JsonPropertyName("model")] string Model,
[property: JsonPropertyName("messages")] List<Message> Messages,
[property: JsonPropertyName("tools")] List<ToolDefinition>? Tools = null,
[property: JsonPropertyName("stream")] bool Stream = false
);
```
**Example**:
```json
{
"model": "qwen/qwen3.5-flash-02-23",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "What is 2+2?" }
],
"stream": true
}
```
#### `Message`
```csharp
public record Message(
[property: JsonPropertyName("role")] string Role,
[property: JsonPropertyName("content")] string? Content = null,
[property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
[property: JsonPropertyName("tool_call_id")] string? ToolCallId = null
)
{
// Factory method for tool responses
public static Message FromTool(string content, string toolCallId) =>
new Message("tool", content, null, toolCallId);
}
```
**Roles**: `"system"`, `"user"`, `"assistant"`, `"tool"`
**Usage**:
- `Content` for text messages
- `ToolCalls` when assistant requests tool use
- `ToolCallId` when responding to tool call
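The three usage patterns can be illustrated by building a short conversation and serializing it. This is a sketch only: it redeclares the records from this page, uses the reflection-based `JsonSerializer.Serialize` overload for brevity (not the source-generated context), and `MessageDemo`, the tool name `add`, and the id `call_1` are made up for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public record FunctionCall(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("arguments")] string Arguments);

public record ToolCall(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("function")] FunctionCall Function);

public record Message(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
    [property: JsonPropertyName("tool_call_id")] string? ToolCallId = null)
{
    public static Message FromTool(string content, string toolCallId) =>
        new Message("tool", content, null, toolCallId);
}

public static class MessageDemo
{
    public static string SerializeConversation()
    {
        var messages = new List<Message>
        {
            new("user", "Add two and two"),                       // Content only
            new("assistant", ToolCalls: new List<ToolCall>        // ToolCalls only
            {
                new("call_1", "function", new FunctionCall("add", "{}")),
            }),
            Message.FromTool("4", "call_1"),                      // Content + ToolCallId
        };
        return JsonSerializer.Serialize(messages);
    }
}
```

With default serializer options, unset properties are written as `null`; whether OpenQuery suppresses them is not stated here.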
#### `ChatCompletionResponse`
```csharp
public record ChatCompletionResponse(
[property: JsonPropertyName("choices")] List<Choice> Choices,
[property: JsonPropertyName("usage")] Usage? Usage = null
);
public record Choice(
[property: JsonPropertyName("message")] Message Message,
[property: JsonPropertyName("finish_reason")] string? FinishReason = null
);
```
**Response Example**:
```json
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Answer text..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150
}
}
```
#### `Usage`
```csharp
public record Usage(
[property: JsonPropertyName("prompt_tokens")] int PromptTokens,
[property: JsonPropertyName("completion_tokens")] int CompletionTokens,
[property: JsonPropertyName("total_tokens")] int TotalTokens
);
```
### Tool Calling (Not Currently Used)
#### `ToolDefinition` / `ToolFunction`
```csharp
public record ToolDefinition(
[property: JsonPropertyName("type")] string Type, // e.g., "function"
[property: JsonPropertyName("function")] ToolFunction Function
);
public record ToolFunction(
[property: JsonPropertyName("name")] string Name,
[property: JsonPropertyName("description")] string Description,
[property: JsonPropertyName("parameters")] JsonElement Parameters // JSON Schema
);
```
#### `ToolCall` / `FunctionCall`
```csharp
public record ToolCall(
[property: JsonPropertyName("id")] string Id,
[property: JsonPropertyName("type")] string Type,
[property: JsonPropertyName("function")] FunctionCall Function
);
public record FunctionCall(
[property: JsonPropertyName("name")] string Name,
[property: JsonPropertyName("arguments")] string Arguments // JSON string
);
```
**Note**: OpenQuery doesn't use tools currently, but models are defined for future tool-calling capability.
### Streaming
#### `StreamChunk`
```csharp
public record StreamChunk(
string? TextDelta = null,
ClientToolCall? Tool = null
);
```
Yielded by `OpenRouterClient.StreamAsync()` for each SSE event.
#### `ChatCompletionChunk` (Server Response)
```csharp
public record ChatCompletionChunk(
[property: JsonPropertyName("choices")] List<ChunkChoice> Choices
);
public record ChunkChoice(
[property: JsonPropertyName("delta")] ChunkDelta Delta
);
public record ChunkDelta(
[property: JsonPropertyName("content")] string? Content = null,
[property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null
);
```
**Streaming Response Example** (SSE):
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
```
`OpenRouterClient.StreamAsync` parses each `data:` event and yields a `StreamChunk` whose `TextDelta` is non-null for content deltas.
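The parsing step can be sketched as follows. This is not the `OpenRouterClient` implementation: `SseSketch.ReadTextDeltas` is a hypothetical helper that works on lines already read from the response, the chunk records are simplified (no `tool_calls`), and the reflection-based deserializer is used for brevity.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Simplified copies of the server-side chunk records (tool calls omitted).
public record ChunkDelta([property: JsonPropertyName("content")] string? Content = null);
public record ChunkChoice([property: JsonPropertyName("delta")] ChunkDelta Delta);
public record ChatCompletionChunk([property: JsonPropertyName("choices")] List<ChunkChoice> Choices);

public static class SseSketch
{
    // Extracts text deltas from raw SSE lines and stops at the [DONE] sentinel.
    public static IEnumerable<string> ReadTextDeltas(IEnumerable<string> lines)
    {
        const string prefix = "data: ";
        foreach (var line in lines)
        {
            // Skip blank separator lines and anything that is not a data event.
            if (!line.StartsWith(prefix, StringComparison.Ordinal)) continue;

            var payload = line.Substring(prefix.Length);
            if (payload == "[DONE]") yield break;

            var chunk = JsonSerializer.Deserialize<ChatCompletionChunk>(payload);
            var choices = chunk?.Choices;
            if (choices is null || choices.Count == 0) continue;

            var text = choices[0].Delta?.Content;
            if (!string.IsNullOrEmpty(text)) yield return text;
        }
    }
}
```

Fed the three example lines above, the helper yields `Hello` and then ` world`.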
### Embeddings
#### `EmbeddingRequest`
```csharp
public record EmbeddingRequest(
[property: JsonPropertyName("model")] string Model,
[property: JsonPropertyName("input")] List<string> Input
);
```
**Example**:
```json
{
"model": "openai/text-embedding-3-small",
"input": ["text 1", "text 2", ...]
}
```
#### `EmbeddingResponse`
```csharp
public record EmbeddingResponse(
[property: JsonPropertyName("data")] List<EmbeddingData> Data,
[property: JsonPropertyName("usage")] Usage Usage
);
public record EmbeddingData(
[property: JsonPropertyName("embedding")] float[] Embedding,
[property: JsonPropertyName("index")] int Index
);
```
**Response Example**:
```json
{
"data": [
{ "embedding": [0.1, 0.2, ...], "index": 0 },
{ "embedding": [0.3, 0.4, ...], "index": 1 }
],
"usage": {
"prompt_tokens": 100,
"total_tokens": 100
}
}
```
**Note**: `_client.EmbedAsync` orders by `index` to match input order.
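Restoring input order from the `index` field is a one-line LINQ operation. The sketch below illustrates the idea with a simplified `EmbeddingData` (JSON attributes omitted); `EmbeddingOrderSketch.InInputOrder` is a hypothetical helper, not the code inside `EmbedAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified copy of the record shown above (JSON attributes omitted).
public record EmbeddingData(float[] Embedding, int Index);

public static class EmbeddingOrderSketch
{
    // Sorts by Index so that result[i] is the embedding of input[i].
    public static float[][] InInputOrder(IEnumerable<EmbeddingData> data) =>
        data.OrderBy(d => d.Index).Select(d => d.Embedding).ToArray();
}
```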
---
## SearxNG API Models
**Location**: `Models/Searxng.cs`
**Purpose**: DTOs for SearxNG's JSON response format
### `SearxngRoot`
```csharp
public record SearxngRoot(
[property: JsonPropertyName("results")] List<SearxngResult> Results
);
```
Top-level response object.
### `SearxngResult`
```csharp
public record SearxngResult(
[property: JsonPropertyName("title")] string Title,
[property: JsonPropertyName("url")] string Url,
[property: JsonPropertyName("content")] string Content // Snippet/description
);
```
**Fields**:
- `Title`: Result title (from page `<title>` or OpenGraph)
- `Url`: Absolute URL to article
- `Content`: Short snippet (~200 chars) from search engine
**Usage**:
- `Url` passed to `ArticleService.FetchArticleAsync`
- `Title` used as fallback if article extraction fails
- `Content` is currently unused (it could support a quick answer without fetching the article)
**Example Response**:
```json
{
"results": [
{
"title": "Quantum Entanglement - Wikipedia",
"url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
"content": "Quantum entanglement is a physical phenomenon..."
}
]
}
```
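Deserializing a response like the one above could look like the following sketch. It redeclares the two records and uses the reflection-based `JsonSerializer.Deserialize<T>` overload for brevity; `SearxngDemo.Parse` is a hypothetical helper. By default `System.Text.Json` ignores JSON properties that have no matching member, so extra fields in a real response do not cause errors.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public record SearxngResult(
    [property: JsonPropertyName("title")] string Title,
    [property: JsonPropertyName("url")] string Url,
    [property: JsonPropertyName("content")] string Content);

public record SearxngRoot(
    [property: JsonPropertyName("results")] List<SearxngResult> Results);

public static class SearxngDemo
{
    // Returns null only when the JSON document is the literal `null`.
    public static SearxngRoot? Parse(string json) =>
        JsonSerializer.Deserialize<SearxngRoot>(json);
}
```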
---
## JSON Serialization
### JsonContext (Source Generation)
**Location**: `Models/JsonContexts.cs`
**Purpose**: Provide source-generated JSON serializer context for AOT compatibility
#### Declaration
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
[JsonSerializable(typeof(ChatCompletionChunk))]
[JsonSerializable(typeof(EmbeddingRequest))]
[JsonSerializable(typeof(EmbeddingResponse))]
[JsonSerializable(typeof(SearxngRoot))]
[JsonSerializable(typeof(List<string>))]
internal partial class AppJsonContext : JsonSerializerContext
{
}
```
**Usage**:
```csharp
var json = JsonSerializer.Serialize(request, AppJsonContext.Default.ChatCompletionRequest);
var response = JsonSerializer.Deserialize(json, AppJsonContext.Default.ChatCompletionResponse);
```
**Benefits**:
- **AOT-compatible**: No reflection, works with PublishAot=true
- **Performance**: Pre-compiled serializers are faster
- **Trimming safe**: Unused serializers trimmed automatically
**Generated**: Partial class compiled by source generator (no manual implementation)
**Important**: Every type that is serialized or deserialized must be listed in a `[JsonSerializable]` attribute; otherwise serialization fails with a runtime exception under AOT.
---
## Model Relationships
### Object Graph (Typical Execution)
```
OpenQueryOptions
    ↓
OpenQueryApp.RunAsync()
├─ queryGenerationMessages (List<Message>)
│ ├─ system: "You are an expert researcher..."
│ └─ user: "Generate N queries for: {question}"
│ ↓
│ ChatCompletionRequest → OpenRouter → ChatCompletionResponse
│ ↓
│ List<string> generatedQueries
├─ SearchTool.ExecuteAsync()
│ ↓
│ ┌─────────────────────────────────────┐
│ │ Phase 1: Parallel Searches │
│ │ SearxngClient.SearchAsync(query) × N
│ │ → List<SearxngResult> │
│ │ (Title, Url, Content) │
│ └─────────────────────────────────────┘
│ ↓
│ ┌─────────────────────────────────────┐
│ │ Phase 2: Article Fetch & Chunking │
│ │ ArticleService.FetchAsync(Url) × M
│ │ → Article (TextContent, Title)
│ │ → ChunkingService.ChunkText → List<string> chunks
│ │ → Chunk(content, url, title) × K │
│ └─────────────────────────────────────┘
│ ↓
│ ┌─────────────────────────────────────┐
│ │ Phase 3: Embeddings │
│ │ EmbeddingService.GetEmbeddingsAsync(chunkContents)
│ │ → float[][] chunkEmbeddings │
│ │ → Set chunk.Embedding for each │
│ │ Also: GetEmbeddingAsync(question) → float[] queryEmbedding
│ └─────────────────────────────────────┘
│ ↓
│ ┌─────────────────────────────────────┐
│ │ Phase 4: Ranking │
│ │ For each chunk: Score = CosineSimilarity(queryEmbedding, chunk.Embedding)
│ │ → Set chunk.Score │
│ │ → OrderByDescending(Score) │
│ │ → Take(topChunksLimit) → topChunks (List<Chunk>)
│ └─────────────────────────────────────┘
│ ↓
│ Context string: formatted topChunks
│ ↓
└─ OpenQueryApp → final ChatCompletionRequest
System: "Answer based on context..."
User: "Context:\n{context}\n\nQuestion: {question}"
StreamAsync() → StreamChunk.TextDelta → Console
```
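Phase 4 and the context-building step can be sketched together. The code below assumes the chunks have already been scored; `ContextSketch.BuildContext`, the filter on missing embeddings, and the blank-line separator between sources are assumptions, while the per-chunk format string is the one shown in the `Chunk` lifecycle.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Redeclared so the sketch is self-contained; mirrors the Chunk record.
public record Chunk(string Content, string SourceUrl, string? Title = null)
{
    public float[]? Embedding { get; set; }
    public float Score { get; set; }
}

public static class ContextSketch
{
    // Keeps the highest-scoring embedded chunks and formats them as context.
    public static string BuildContext(IEnumerable<Chunk> scoredChunks, int topChunksLimit)
    {
        var top = scoredChunks
            .Where(c => c.Embedding is not null)   // assumption: skip chunks that were never embedded
            .OrderByDescending(c => c.Score)
            .Take(topChunksLimit);

        // Assumption: sources are separated by a blank line.
        return string.Join("\n\n", top.Select((c, i) =>
            $"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
    }
}
```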
### Record Immutability
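Most DTOs are positional `record` types:
- **Immutable**: Positional properties are init-only (`{ get; init; }`)
- **Value semantics**: Equality compares property values (collection and array properties compare by reference)
- **Thread-safe**: Can be shared across threads for reading; immutability is shallow, so a collection such as `List<Message>` inside a record can still be modified
**Exceptions**:
- `Chunk`: Has mutable properties `Embedding` and `Score` (set during pipeline)
- `ParallelProcessingOptions`: Class with mutable setters
- `AppConfig`: Class with mutable setters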
---
## Next Steps
- **[API Reference](../api/cli.md)** - How these models are used in CLI commands
- **[OpenRouterClient](../../services/OpenRouterClient.md)** - Uses OpenRouter models
- **[SearxngClient](../../services/SearxngClient.md)** - Uses Searxng models
- **[SearchTool](search-tool.md)** - Orchestrates all models
---
**Quick Reference Table**
| Model | Category | Purpose | Mutable? |
|-------|----------|---------|----------|
| `OpenQueryOptions` | Core | CLI options | No (record) |
| `Chunk` | Core | Content + metadata + ranking | Partially (Embedding, Score) |
| `ParallelProcessingOptions` | Config | Concurrency settings | Yes (class) |
| `ChatCompletionRequest/Response` | OpenRouter | LLM API | No |
| `EmbeddingRequest/Response` | OpenRouter | Embeddings API | No |
| `SearxngRoot/Result` | SearxNG | Search results | No |
| `AppJsonContext` | Internal | JSON serialization | No (generated partial) |