diff --git a/README.md b/README.md
new file mode 100644
index 0000000..829c68e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,196 @@
+# OpenQuery
+
+**AI-powered search and answer system** that finds accurate, well-sourced answers to your questions by searching the web, extracting relevant content, and synthesizing intelligent responses.
+
+![.NET](https://img.shields.io/badge/.NET-10.0-blue)
+![AOT](https://img.shields.io/badge/AOT-Compiled-green)
+![License](https://img.shields.io/badge/license-MIT-green)
+
+## ✨ Features
+
+- πŸ€– **Smart Query Generation** - Automatically creates multiple diverse search queries from your question
+- ⚑ **Parallel Processing** - Fast concurrent searches, article fetching, and embedding generation
+- 🎯 **Semantic Search** - Uses vector embeddings to find the most relevant information
+- πŸ“š **Clean Article Extraction** - Intelligently extracts article content using SmartReader
+- πŸ”„ **Streaming Responses** - Watch the AI answer generate in real time
+- βš™οΈ **Fully Configurable** - Control queries, results, and context chunks
+- πŸ›‘οΈ **Production Ready** - Built with rate limiting, retries, and error handling
+
+## πŸš€ Quick Start
+
+### 1. Prerequisites
+
+- A **SearxNG** instance (Docker recommended):
+  ```bash
+  docker run -d --name searxng -p 8002:8080 searxng/searxng:latest
+  ```
+
+- An **OpenRouter API key** from [openrouter.ai](https://openrouter.ai)
+
+### 2. Installation
+
+```bash
+# Clone and build
+git clone
+cd OpenQuery
+chmod +x install.sh
+./install.sh
+
+# Or build manually
+dotnet publish -c Release -r linux-x64 --self-contained true /p:PublishAot=true
+```
+
+### 3. Configuration
+
+```bash
+# Interactive setup
+openquery configure -i
+
+# Or set environment variables
+export OPENROUTER_API_KEY="sk-or-..."
+export SEARXNG_URL="http://localhost:8002" # default
+```
+
+### 4. Ask a Question
+
+```bash
+openquery "What is quantum entanglement and how does it work?"
+```
+
+That's it! The system will:
+1. 
Generate 3 search queries (configurable) +2. Search the web via SearxNG +3. Extract and chunk relevant articles +4. Rank content by semantic relevance +5. Stream a comprehensive answer with citations + +## πŸ“– Usage Examples + +```bash +# Concise answer +openquery -s "Who won the 2024 US presidential election?" + +# Detailed research +openquery -l -q 5 -r 10 "Explain quantum computing and its applications" + +# See everything +openquery -v "What are the health benefits of meditation?" + +# Customize +openquery -c 5 -r 8 "Current state of SpaceX Starship development" +``` + +## πŸ”§ Options + +``` +-c, --chunks N Number of top context chunks (default: 3) +-r, --results N Search results per query (default: 5) +-q, --queries N Number of search queries to generate (default: 3) +-s, --short Give a concise answer +-l, --long Give a detailed answer +-v, --verbose Show detailed progress +``` + +## 🌐 Supported Models + +OpenQuery works with any OpenRouter model. Popular choices: + +- `qwen/qwen3.5-flash-02-23` (default, fast & affordable) +- `google/gemini-3-flash-preview` +- `deepseek/deepseek-v3.2` +- `moonshotai/kimi-k2.5` + +Configure your preferred model: +```bash +openquery configure --model "google/gemini-3-flash-preview" +``` + +## πŸ“ Project Structure + +``` +OpenQuery/ +β”œβ”€β”€ README.md # This file +β”œβ”€β”€ docs/ # Detailed documentation +β”‚ β”œβ”€β”€ installation.md +β”‚ β”œβ”€β”€ configuration.md +β”‚ β”œβ”€β”€ usage.md +β”‚ β”œβ”€β”€ architecture.md +β”‚ β”œβ”€β”€ components/ +β”‚ └── troubleshooting.md +β”œβ”€β”€ Program.cs # CLI entry point +β”œβ”€β”€ OpenQuery.cs # Main application logic +β”œβ”€β”€ Services/ # Business logic services +β”œβ”€β”€ Models/ # Data models +β”œβ”€β”€ Tools/ # Search orchestration +└── ConfigManager.cs # Configuration management +``` + +## πŸ—οΈ Architecture + +OpenQuery uses a multi-stage pipeline: + +``` +Query β†’ Multiple Searches β†’ Article Fetching β†’ Embeddings β†’ Ranking β†’ AI Answer +``` + +1. 
**Query Expansion**: LLM generates diverse search queries +2. **Parallel Search**: SearxNG executes all queries simultaneously +3. **Content Extraction**: SmartReader pulls clean article text +4. **Embedding Generation**: Vectorize query and chunks +5. **Semantic Ranking**: Cosine similarity scoring +6. **Answer Synthesis**: Final LLM response with sources + +## πŸ” How It Works + +1. **You ask a question** β†’ OpenQuery generates 3 optimized search queries +2. **Searches the web** β†’ All queries run in parallel via SearxNG +3. **Fetches articles** β†’ Extracts clean content from top results +4. **Splits into chunks** β†’ ~500 character pieces for embedding +5. **Ranks by relevance** β†’ Semantic similarity to your question +6. **Synthesizes answer** β†’ LLM reviews top 3 chunks and responds with citations + +## πŸ› Troubleshooting + +**Common issues** and solutions: + +| Problem | Solution | +|---------|----------| +| "API Key is missing" | Run `openquery configure -i` or set `OPENROUTER_API_KEY` | +| No search results | Check your SearxNG instance is running (`curl http://localhost:8002`) | +| Slow performance | Reduce `--results` or `--queries` count | +| Articles failing to fetch | Some sites block scrapers; try different queries | + +See [docs/troubleshooting.md](docs/troubleshooting.md) for detailed help. + +## πŸ“š Documentation + +- **[Installation Guide](docs/installation.md)** - Build and setup instructions +- **[Configuration](docs/configuration.md)** - All config options and environment variables +- **[Usage Guide](docs/usage.md)** - Complete CLI reference and examples +- **[Architecture](docs/architecture.md)** - System design and patterns +- **[Components](docs/components/)** - Deep dive into each module +- **[Troubleshooting](docs/troubleshooting.md)** - Solve common problems +- **[API Reference](docs/api-reference.md)** - Programmatic interfaces + +## 🀝 Contributing + +Contributions welcome! Please: +1. Fork the repository +2. 
Create a feature branch +3. Make your changes +4. Submit a pull request + +## πŸ“„ License + +MIT License - see LICENSE file for details. + +## πŸ™ Acknowledgments + +- [OpenRouter](https://openrouter.ai) - LLM API aggregation +- [SearxNG](https://searx.space) - Privacy-respecting metasearch +- [SmartReader](https://github.com/kfasten/SmartReader) - Article extraction +- Built with [.NET](https://dotnet.microsoft.com) + +--- + +**Need more details?** Check the comprehensive documentation in the [docs/](docs/) folder. diff --git a/docs/api/cli.md b/docs/api/cli.md new file mode 100644 index 0000000..5e9da38 --- /dev/null +++ b/docs/api/cli.md @@ -0,0 +1,309 @@ +# CLI Reference + +Complete command-line interface reference for OpenQuery. + +## πŸ“‹ Table of Contents + +1. [Command Structure](#command-structure) +2. [Main Command: `openquery`](#main-command-openquery) +3. [Configure Command: `openquery configure`](#configure-command-openquery-configure) +4. [Exit Codes](#exit-codes) +5. [Examples by Use Case](#examples-by-use-case) +6. [Shell Integration](#shell-integration) + +## Command Structure + +OpenQuery uses [System.CommandLine](https://learn.microsoft.com/dotnet/standard/commandline/) for CLI parsing. + +### Syntax +```bash +openquery [GLOBAL-OPTIONS] [COMMAND-OPTIONS] [ARGUMENTS] +``` + +If no command specified, `openquery` (main command) is assumed. + +### Help +```bash +openquery --help +openquery configure --help +``` + +Shows usage, options, examples. + +### Version +```bash +openquery --version # if implemented +``` + +--- + +## Main Command: `openquery` + +Ask a question and get an AI-powered answer. 
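Arguments are split by the shell before System.CommandLine ever sees them, which is worth keeping in mind when a question contains spaces or glob characters. A quick way to see exactly what the parser will receive (POSIX shell; `show_args` is a throwaway helper for illustration, not part of OpenQuery):

```shell
# show_args is an illustrative helper, not part of OpenQuery:
# it prints each argument it receives in brackets, which is
# exactly the argument vector the CLI parser would see.
show_args() { printf '[%s]' "$@"; printf '\n'; }

show_args what is quantum      # -> [what][is][quantum]  (three arguments)
show_args "what is quantum"    # -> [what is quantum]    (one argument)
```

This is why questions containing `?`, `'`, or `*` should be quoted: unquoted, the shell may expand or mangle them before OpenQuery runs.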
+ +### Synopsis +```bash +openquery [OPTIONS] +``` + +### Arguments + +| Name | Arity | Type | Description | +|------|-------|------|-------------| +| `question` | ZeroOrMore | `string[]` | The question to ask (positional, concatenated with spaces) | + +**Notes**: +- `ZeroOrMore` means you can omit the question (shows help) +- Multiple words are combined: `openquery what is quantum` β†’ `"what is quantum"` +- Use quotes for questions with special characters: `openquery "what's the weather?"` + +### Options + +| Option | Aliases | Type | Default | Description | +|--------|---------|------|---------|-------------| +| `--chunks` | `-c` | `int` | `DefaultChunks` (config) | Number of top context chunks to pass to LLM | +| `--results` | `-r` | `int` | `DefaultResults` (config) | Number of search results per query | +| `--queries` | `-q` | `int` | `DefaultQueries` (config) | Number of search queries to generate | +| `--short` | `-s` | `bool` | `false` | Request a concise answer | +| `--long` | `-l` | `bool` | `false` | Request a detailed answer | +| `--verbose` | `-v` | `bool` | `false` | Show detailed progress information | + +**Option Notes**: +- `--short` and `--long` are flags; if both specified, `--long` takes precedence +- Integer options validate as positive numbers (parsed by System.CommandLine) +- Defaults come from config file or hardcoded (3, 5, 3 respectively) + +### Behavior + +1. Loads API key (env `OPENROUTER_API_KEY` or config file) +2. Loads model (env `OPENROUTER_MODEL` or config) +3. Executes workflow: + - Generate queries (if `--queries > 1`) + - Run search pipeline + - Stream final answer +4. Exits with code 0 on success, 1 on error + +### Examples + +```bash +# Basic +openquery "What is the capital of France?" + +# With options +openquery -q 5 -r 10 -c 4 "Explain quantum computing" + +# Short answer +openquery -s "Who won the 2024 election?" + +# Verbose mode +openquery -v "How does photosynthesis work?" 
+ +# Combined +openquery -l -v -q 8 "History of the internet" +``` + +--- + +## Configure Command: `openquery configure` + +Configure OpenQuery settings (API key, model, defaults). + +### Synopsis +```bash +openquery configure [OPTIONS] +``` + +### Options + +| Option | Type | Description | +|--------|------|-------------| +| `--interactive` / `-i` | `bool` | Launch interactive configuration wizard | +| `--key` | `string` | Set OpenRouter API key | +| `--model` | `string` | Set default LLM model | +| `--queries` | `int?` | Set default number of queries | +| `--chunks` | `int?` | Set default number of chunks | +| `--results` | `int?` | Set default number of results | + +**Note**: Nullable options (`int?`) only update if provided. + +### Behavior + +- **Interactive mode** (`-i`): Prompts for each setting with current defaults shown in brackets +- **Non-interactive**: Only updates provided options, leaves others untouched +- Writes to `~/.config/openquery/config` (creates directory if missing) +- Overwrites entire file (not incremental) + +### Interactive Mode Details + +Models presented with numbered menu: + +``` +Available models: +1. qwen/qwen3.5-flash-02-23 +2. qwen/qwen3.5-122b-a10b +3. minimax/minimax-m2.5 +4. google/gemini-3-flash-preview +5. deepseek/deepseek-v3.2 +6. 
moonshotai/kimi-k2.5
+Model [qwen/qwen3.5-flash-02-23]:
+```
+
+- Enter a number (1-6) to select a preset
+- Or enter a custom model string (any OpenRouter model)
+
+### Examples
+
+```bash
+# Interactive wizard
+openquery configure -i
+
+# Set just the API key
+openquery configure --key "sk-or-xxxxxxxxxxxx"
+
+# Set multiple defaults
+openquery configure --model "google/gemini-3-flash-preview" --queries 5 --chunks 4
+
+# Update model only
+openquery configure --model "deepseek/deepseek-v3.2"
+```
+
+---
+
+## Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| `0` | Success - answer generated and streamed |
+| `1` | Error - API key missing, network failure, or exception |
+
+**Usage in scripts**:
+```bash
+openquery "question"
+if [ $? -eq 0 ]; then
+  echo "Success"
+else
+  echo "Failed" >&2
+fi
+```
+
+---
+
+## Examples by Use Case
+
+### Quick Facts
+```bash
+openquery -s "capital of France"
+```
+Fast, concise, minimal tokens.
+
+### Research Paper
+```bash
+openquery -l -q 5 -r 10 -c 4 "quantum entanglement experiments"
+```
+Multiple angles, deep sources, detailed synthesis.
+
+### News & Current Events
+```bash
+openquery -v "latest news about OpenAI"
+```
+See everything: queries, results, and which sources were fetched.
### Troubleshooting
+```bash
+# Reduce scope if errors
+openquery -q 1 -r 2 "test question"
+```
+
+### Save Answer to File
+```bash
+openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md
+```
+
+(Removes spinner characters)
+
+### Batch Processing
+```bash
+# Read one question per line (read -r avoids word-splitting each question)
+while IFS= read -r q; do
+  echo "## $q" >> all-answers.md
+  openquery -s "$q" 2>/dev/null | sed 's/.\x08//g' >> all-answers.md
+  echo "" >> all-answers.md
+done < questions.txt
+```
+
+---
+
+## Shell Integration
+
+### Aliases (add to ~/.bashrc or ~/.zshrc)
+
+```bash
+# Short alias
+alias oq='openquery'
+
+# Presets
+alias oqs='openquery -s'              # short
+alias oql='openquery -l'              # long
+alias oqv='openquery -v'              # verbose
+alias oqr='openquery -q 5 -r 10 -c 4' # research mode
+
+# Config shortcuts
+alias oqcfg='openquery configure -i'
+```
+
+### Functions
+
+```bash
+# Save answer cleanly (removes spinner chars)
+oqsave() {
+  local query="$*"
+  local filename="answer-$(date +%Y%m%d-%H%M%S).md"
+  openquery "$query" 2>/dev/null | sed 's/.\x08//g' > "$filename"
+  echo "Saved to $filename"
+}
+
+# Search and grep results
+oqgrep() {
+  openquery "$1" 2>/dev/null | sed 's/.\x08//g' | grep -i "$2"
+}
+```
+
+### Environment Setup Script
+
+```bash
+# ~/.local/bin/openquery-env.sh
+export OPENROUTER_API_KEY="sk-or-..."
+export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23"
+export SEARXNG_URL="http://localhost:8002"
+```
+
+Source it: `source ~/.local/bin/openquery-env.sh`
+
+---
+
+## Next Steps
+
+- **[Configuration](configuration.md)** - Set up your environment
+- **[Usage](usage.md)** - Learn usage patterns and tips
+- **[Troubleshooting](troubleshooting.md)** - Fix common problems
+
+---
+
+**Quick Reference Card**
+
+```
+# Ask
+openquery "question"
+openquery -s "quick fact"
+openquery -l -q 5 "deep research"
+
+# Configure
+openquery configure -i
+openquery configure --key "..."
+openquery configure --model "..."
+ +# Debug +openquery -v "question" + +# Help +openquery --help +``` diff --git a/docs/api/environment-variables.md b/docs/api/environment-variables.md new file mode 100644 index 0000000..3cb39d2 --- /dev/null +++ b/docs/api/environment-variables.md @@ -0,0 +1,235 @@ +# Environment Variables + +Reference for all environment variables used by OpenQuery. + +## πŸ“‹ Summary + +| Variable | Purpose | Required | Default | Example | +|----------|---------|----------|---------|---------| +| `OPENROUTER_API_KEY` | OpenRouter authentication | **Yes** | (none) | `sk-or-...` | +| `OPENROUTER_MODEL` | Override default LLM model | No | `qwen/qwen3.5-flash-02-23` | `google/gemini-3-flash-preview` | +| `SEARXNG_URL` | SearxNG instance URL | No | `http://localhost:8002` | `https://searx.example.com` | + +## Detailed Reference + +### `OPENROUTER_API_KEY` + +**Purpose**: Your OpenRouter API authentication token. + +**Required**: Yes, unless you have `ApiKey` set in config file. + +**How to Obtain**: +1. Sign up at https://openrouter.ai +2. Go to Dashboard β†’ API Keys +3. Copy your key (starts with `sk-or-`) + +**Priority**: Overrides config file `ApiKey`. + +**Setting**: + +```bash +# Bash/Zsh +export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + +# Fish +set -x OPENROUTER_API_KEY "sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + +# PowerShell +$env:OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + +# Windows CMD +set OPENROUTER_API_KEY=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +``` + +**Security**: +- Never commit API key to version control +- Don't share key publicly +- Use environment variables or config file with restrictive permissions (600) +- Rotate key if accidentally exposed + +**Validation**: OpenQuery checks if key is empty string and exits with error if missing: + +``` +[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up. 
+``` + +### `OPENROUTER_MODEL` + +**Purpose**: Override the default LLM model used for both query generation and final answer. + +**Required**: No. + +**Default**: `qwen/qwen3.5-flash-02-23` + +**Available Models** (from OpenRouter): + +| Model | Provider | Context | Cost (Input/Output per 1M tokens) | +|-------|----------|---------|-----------------------------------| +| `qwen/qwen3.5-flash-02-23` | Alibaba | 200K | \$0.10 / \$0.20 | +| `qwen/qwen3.5-122b-a10b` | Alibaba | 200K | ~\$0.20 / ~\$0.40 | +| `minimax/minimax-m2.5` | MiniMax | 200K | ~\$0.20 / ~\$0.40 | +| `google/gemini-3-flash-preview` | Google | 1M | ~\$0.10 / ~\$0.40 | +| `deepseek/deepseek-v3.2` | DeepSeek | 200K | ~\$0.10 / ~\$0.30 | +| `moonshotai/kimi-k2.5` | Moonshot AI | 200K | ~\$0.10 / ~\$0.30 | + +(See OpenRouter for current pricing.) + +**Setting**: + +```bash +export OPENROUTER_MODEL="google/gemini-3-flash-preview" +``` + +**Interactive Config Models**: The `configure -i` wizard shows only these 6 models for convenience, but you can set any OpenRouter model via environment variable or non-interactive configure. + +**Note**: Different models have different: +- Speed (Flash models faster) +- Cost (check pricing) +- Quality (may vary by task) +- Context window size (Gemini 3 Flash has 1M tokens, others ~200K) + +### `SEARXNG_URL` + +**Purpose**: URL of the SearxNG metasearch instance. + +**Required**: No. + +**Default**: `http://localhost:8002` + +**Format**: Must include protocol (`http://` or `https://`) and host:port. 
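A small guard for this format requirement can catch misconfiguration before the first request fails. A sketch in POSIX shell (`check_searxng_url` is an illustrative helper; OpenQuery itself does not ship this check):

```shell
# check_searxng_url: reject values that lack an explicit protocol.
# Illustrative helper only - OpenQuery does not perform this validation itself.
check_searxng_url() {
  case "$1" in
    http://*|https://*) echo "ok" ;;
    *) echo "error: SEARXNG_URL must start with http:// or https://" >&2; return 1 ;;
  esac
}

check_searxng_url "http://localhost:8002"    # -> ok
check_searxng_url "localhost:8002" || true   # -> error message on stderr
```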
**Setting**:
+
+```bash
+# Local Docker instance
+export SEARXNG_URL="http://localhost:8002"
+
+# Remote instance with HTTPS
+export SEARXNG_URL="https://searx.example.com"
+
+# Custom port
+export SEARXNG_URL="http://localhost:8080"
+```
+
+**Finding a Public Instance**:
+- Visit https://searx.space for a list of public instances
+- Choose one with HTTPS and low latency
+- Note: Public instances may have rate limits or require attribution
+
+**Priority**: Overrides the built-in default. There is currently no config-file setting for the SearxNG URL (environment variable only); one could be added in the future.
+
+**Test Your Instance**:
+```bash
+curl "$SEARXNG_URL/search?q=test&format=json" | head
+```
+
+Expected: JSON with `"results": [...]`.
+
+---
+
+## Configuration Priority Recap
+
+When OpenQuery needs a value:
+
+1. **Command-line option** (`--model`, `--key` on `configure`) - highest
+2. **Environment variable** (`OPENROUTER_MODEL`, `OPENROUTER_API_KEY`, `SEARXNG_URL`)
+3. **Configuration file** (`~/.config/openquery/config`: `Model`, `ApiKey`)
+4. **Hard-coded default** (only for model)
+
+**Example**:
+```bash
+# Config file: Model=qwen/qwen3.5-flash-02-23
+export OPENROUTER_MODEL="deepseek/deepseek-v3.2"
+openquery "question"
+# Uses deepseek/deepseek-v3.2: the environment variable overrides the config file
+```
+
+---
+
+## Troubleshooting Environment Variables
+
+### Variable Not Taking Effect
+
+**Symptom**: `openquery` still uses the old value after `export`.
+
+**Causes**:
+- Exported in a different shell session
+- Exported after running `openquery`
+- Shell profile not reloaded
+
+**Check**:
+```bash
+echo $OPENROUTER_API_KEY
+# Should print the key (or blank if unset)
+```
+
+**Fix**:
+```bash
+# Export in current session
+export OPENROUTER_API_KEY="sk-or-..."
+ +# Or add to ~/.bashrc / ~/.zshrc and restart terminal +``` + +### Special Characters in Values + +If your API key contains special characters (`$`, `!`, etc.), quote properly: + +```bash +export OPENROUTER_API_KEY='sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' +# Single quotes prevent shell expansion +``` + +### Variable Name Typos + +`OPENROUTER_API_KEY` is all caps with underscores. `openrouter_api_key` (lowercase) won't work. + +**Check spelling**: +```bash +env | grep -i openrouter +``` + +### Windows Environment Variables + +On Windows, environment variables are set per-session or user-level: + +**PowerShell** (current session): +```powershell +$env:OPENROUTER_API_KEY="sk-or-..." +``` + +**Persistent** (PowerShell): +```powershell +[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User") +``` + +**CMD**: +```cmd +set OPENROUTER_API_KEY=sk-or-... +``` + +**System Properties** β†’ Advanced β†’ Environment Variables (GUI) + +--- + +## Next Steps + +- **[Configuration File](../configuration.md)** - Persistent configuration +- **[Usage Guide](../usage.md)** - How to use these variables +- **[Troubleshooting](../troubleshooting.md)** - Diagnose environment issues + +--- + +**Quick Reference** + +```bash +# Required +export OPENROUTER_API_KEY="sk-or-..." + +# Optional (override defaults) +export OPENROUTER_MODEL="google/gemini-3-flash-preview" +export SEARXNG_URL="https://searx.example.com" + +# Run +openquery "your question" +``` diff --git a/docs/api/programmatic.md b/docs/api/programmatic.md new file mode 100644 index 0000000..f40ddac --- /dev/null +++ b/docs/api/programmatic.md @@ -0,0 +1,508 @@ +# Programmatic API Reference + +How to use OpenQuery components programmatically in your own C# code. + +## πŸ“‹ Table of Contents + +1. [Overview](#overview) +2. [Using OpenQueryApp Programmatically](#using-openqueryapp-programmatically) +3. [Using Individual Services](#using-individual-services) +4. 
[Custom Implementations](#custom-implementations) +5. [Thread Safety](#thread-safety) +6. [Error Handling](#error-handling) + +## Overview + +OpenQuery is designed as a library of composable services, not just a CLI tool. You can reference the project (or extract the core classes) and use them in your own applications. + +### Core Interfaces + +Currently, OpenQuery uses concrete classes rather than interfaces. To use programmatically: + +1. Reference the `OpenQuery` project/dll +2. Add `using OpenQuery.Services;` and `using OpenQuery.Tools;` +3. Instantiate dependencies +4. Call methods + +### Dependency Chain + +``` +Your Code + β”œβ”€β”€ OpenRouterClient (LLM API) + β”œβ”€β”€ SearxngClient (Search API) + β”œβ”€β”€ EmbeddingService (requires OpenRouterClient) + └── SearchTool (requires SearxngClient + EmbeddingService) + └── (internally uses ArticleService, ChunkingService, RateLimiter) +``` + +--- + +## Using OpenQueryApp Programmatically + +### Minimal Example + +```csharp +using OpenQuery; +using OpenQuery.Services; +using OpenQuery.Tools; +using OpenQuery.Models; + +// 1. Configure +string apiKey = Environment.GetEnvironmentVariable("OPENROUTER_API_KEY") + ?? throw new InvalidOperationException("API key required"); +string searxngUrl = Environment.GetEnvironmentVariable("SEARXNG_URL") + ?? "http://localhost:8002"; +string model = Environment.GetEnvironmentVariable("OPENROUTER_MODEL") + ?? "qwen/qwen3.5-flash-02-23"; + +// 2. Instantiate services +var openRouterClient = new OpenRouterClient(apiKey); +var searxngClient = new SearxngClient(searxngUrl); +var embeddingService = new EmbeddingService(openRouterClient); +var searchTool = new SearchTool(searxngClient, embeddingService); +var openQuery = new OpenQueryApp(openRouterClient, searchTool, model); + +// 3. Execute +var options = new OpenQueryOptions( + Chunks: 3, + Results: 5, + Queries: 3, + Short: false, + Long: false, + Verbose: false, + Question: "What is quantum entanglement?" 
+);
+
+await openQuery.RunAsync(options);
+```
+
+**Output**: Streams answer to `Console.Out` (hardcoded in `OpenQueryApp`). To capture output, modify `OpenQueryApp` or redirect console.
+
+### Capturing Output
+
+`OpenQueryApp.RunAsync` writes directly to `Console`. To capture:
+
+**Option 1**: Redirect Console (hacky)
+```csharp
+var sw = new StringWriter();
+Console.SetOut(sw);
+await openQuery.RunAsync(options);
+string answer = sw.ToString();
+```
+
+**Option 2**: Modify OpenQueryApp to accept TextWriter (not currently supported)
+
+**Option 3**: Reimplement using OpenQuery components without `OpenQueryApp`
+
+```csharp
+public async Task<string> GetAnswerAsync(string question, OpenQueryOptions options)
+{
+    var sb = new StringBuilder();
+    var reporter = new StatusReporter(options.Verbose);
+
+    // Replicate OpenQueryApp.RunAsync but collect output
+    // ... (copy logic from OpenQuery.cs)
+
+    return sb.ToString();
+}
+```
+
+---
+
+## Using Individual Services
+
+### OpenRouterClient
+
+```csharp
+var client = new OpenRouterClient("your-api-key");
+
+// Non-streaming chat completion
+var request = new ChatCompletionRequest(
+    model: "qwen/qwen3.5-flash-02-23",
+    messages: new List<Message>
+    {
+        new Message("system", "You are a helpful assistant."),
+        new Message("user", "What is 2+2?")
+    }
+);
+
+var response = await client.CompleteAsync(request);
+Console.WriteLine(response.Choices[0].Message.Content);
+
+// Streaming chat completion
+var streamRequest = request with { Stream = true };
+await foreach (var chunk in client.StreamAsync(streamRequest))
+{
+    if (chunk.TextDelta != null)
+        Console.Write(chunk.TextDelta);
+}
+
+// Embeddings
+var embeddingRequest = new EmbeddingRequest(
+    model: "openai/text-embedding-3-small",
+    input: new List<string> { "text 1", "text 2" }
+);
+float[][] embeddings = await client.EmbedAsync(embeddingRequest.Model, embeddingRequest.Input);
+// embeddings[0] is the vector for "text 1"
+```
+
+### SearxngClient
+
+```csharp
+var searxng = new 
SearxngClient("http://localhost:8002");
+
+var results = await searxng.SearchAsync("quantum physics", limit: 5);
+
+foreach (var result in results)
+{
+    Console.WriteLine($"{result.Title}");
+    Console.WriteLine($"{result.Url}");
+    Console.WriteLine($"{result.Content}");
+    Console.WriteLine();
+}
+```
+
+### EmbeddingService
+
+```csharp
+var client = new OpenRouterClient("your-api-key");
+var embeddingService = new EmbeddingService(client); // default model: openai/text-embedding-3-small
+
+// Single embedding
+float[] embedding = await embeddingService.GetEmbeddingAsync("Hello world");
+
+// Batch embeddings (with progress)
+List<string> texts = new() { "text 1", "text 2", "text 3" };
+float[][] embeddings = await embeddingService.GetEmbeddingsAsync(
+    texts,
+    onProgress: msg => Console.WriteLine(msg)
+);
+
+// Cosine similarity
+float similarity = EmbeddingService.CosineSimilarity(embedding1, embedding2);
+```
+
+### ArticleService
+
+```csharp
+var article = await ArticleService.FetchArticleAsync("https://example.com/article");
+Console.WriteLine(article.Title);
+Console.WriteLine(article.TextContent);
+Console.WriteLine($"Readable: {article.IsReadable}");
+```
+
+Note: `Article` type comes from SmartReader library (not OpenQuery-specific).
### ChunkingService
+
+```csharp
+List<string> chunks = ChunkingService.ChunkText("Long article text...");
+
+foreach (var chunk in chunks)
+{
+    Console.WriteLine($"Chunk ({chunk.Length} chars): {chunk.Substring(0, 50)}...");
+}
+```
+
+### SearchTool (Orchestration)
+
+```csharp
+var searxngClient = new SearxngClient("http://localhost:8002");
+var embeddingService = new EmbeddingService(openRouterClient);
+var searchTool = new SearchTool(searxngClient, embeddingService);
+
+string context = await searchTool.ExecuteAsync(
+    originalQuery: "What is quantum entanglement?",
+    generatedQueries: new List<string>
+    {
+        "quantum entanglement definition",
+        "how quantum entanglement works"
+    },
+    maxResults: 5,
+    topChunksLimit: 3,
+    onProgress: msg => Console.WriteLine(msg),
+    verbose: true
+);
+
+Console.WriteLine("Context:");
+Console.WriteLine(context);
+```
+
+Output is a formatted string:
+```
+[Source 1: Title](https://example.com/1)
+Content chunk...
+
+[Source 2: Title](https://example.com/2)
+Content chunk...
+```
+
+---
+
+## Custom Implementations
+
+### Custom Progress Reporter
+
+`SearchTool.ExecuteAsync` accepts `Action<string>? onProgress`.
Provide your own:
+
+```csharp
+public class MyProgressReporter
+{
+    public void Report(string message)
+    {
+        // Log to file
+        File.AppendAllText("log.txt", $"{DateTime.UtcNow}: {message}\n");
+
+        // Update UI
+        myLabel.Text = message;
+
+        // Send to telemetry
+        Telemetry.TrackEvent("OpenQueryProgress", new { message });
+    }
+}
+
+// Usage
+var reporter = new MyProgressReporter();
+await searchTool.ExecuteAsync(..., reporter.Report, verbose: false);
+```
+
+### Custom Chunking Strategy
+
+Extend `ChunkingService` or implement your own:
+
+```csharp
+public static class MyChunkingService
+{
+    public static List<string> ChunkText(string text, int maxSize = 500, int overlap = 50)
+    {
+        // Overlapping chunks for better context retrieval
+        var chunks = new List<string>();
+        int start = 0;
+        while (start < text.Length)
+        {
+            int end = Math.Min(start + maxSize, text.Length);
+            var chunk = text.Substring(start, end - start);
+            chunks.Add(chunk);
+            start += maxSize - overlap; // Slide window
+        }
+        return chunks;
+    }
+}
+```
+
+### Custom Rate Limiter
+
+Implement `IAsyncDisposable` with your own strategy (token bucket, leaky bucket):
+
+```csharp
+public class TokenBucketRateLimiter : IAsyncDisposable
+{
+    private readonly SemaphoreSlim _semaphore;
+    private readonly TimeSpan _refillPeriod;
+    private int _tokens;
+    private readonly int _maxTokens;
+
+    // Implementation details...
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct)
+    {
+        await WaitForTokenAsync(ct);
+        try
+        {
+            return await action();
+        }
+        finally
+        {
+            // Return tokens or replenish bucket
+        }
+    }
+}
+```
+
+---
+
+## Thread Safety
+
+**Thread-Safe Components**:
+- `RateLimiter` - `SemaphoreSlim` is thread-safe
+- `StatusReporter` - Channel is thread-safe
+- Static utility classes (`ChunkingService`) - no state
+
+**Not Thread-Safe** (instances should not be shared across threads):
+- `OpenRouterClient` - wraps `HttpClient` (which is thread-safe but the instance may have state)
+- `SearxngClient` - `HttpClient` (thread-safe but reuse recommendations apply)
+- `EmbeddingService` - has mutable fields (`_rateLimiter`, `_retryPipeline`)
+- `SearchTool` - has mutable `_options`
+
+**Recommendation**: Create new instances per operation or use locks if sharing.
+
+### Example: Parallel Queries
+
+```csharp
+var tasks = questions.Select(async question =>
+{
+    var options = new OpenQueryOptions(..., question: question);
+    var query = new OpenQueryApp(client, searchTool, model);
+    await query.RunAsync(options);
+    // Separate instances per task
+});
+
+await Task.WhenAll(tasks);
+```
+
+**Better**: Create a factory that spawns fresh instances.
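If you drive OpenQuery from the shell rather than in-process, one OS process per question gives the same isolation with no code: separate processes share no `OpenRouterClient` or `SearchTool` instances, so the caveats above do not apply. A sketch (`echo` stands in for `openquery -s` so it runs anywhere; output order is nondeterministic under `-P`):

```shell
# One process per question, up to 3 in parallel. In real use, replace
# `echo "answered: {}"` with `openquery -s "{}"`.
printf '%s\n' "question 1" "question 2" "question 3" \
  | xargs -P 3 -I {} echo "answered: {}"
```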
+ +--- + +## Error Handling + +All public async methods may throw: + +- `HttpRequestException` - network errors, non-2xx responses +- `TaskCanceledException` - timeout or cancellation +- `JsonException` - malformed JSON +- `Argument*Exception` - invalid arguments +- `Exception` - any other error + +### Pattern: Try-Catch + +```csharp +try +{ + var response = await client.CompleteAsync(request); + Console.WriteLine(response.Choices[0].Message.Content); +} +catch (HttpRequestException ex) +{ + Console.Error.WriteLine($"Network error: {ex.Message}"); +} +catch (Exception ex) +{ + Console.Error.WriteLine($"Unexpected error: {ex.Message}"); +} +``` + +### Pattern: Resilience with Polly + +`EmbeddingService` already wraps `client.EmbedAsync` with Polly retry. For other calls, you can add your own: + +```csharp +var retryPolicy = Policy + .Handle() + .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))); + +await retryPolicy.ExecuteAsync(async () => +{ + var response = await client.CompleteAsync(request); + // ... 
+}); +``` + +--- + +## Advanced Usage + +### Streaming Responses to Network + +```csharp +var request = new ChatCompletionRequest(model, messages) { Stream = true }; +var response = await client.StreamAsync(request); + +await foreach (var chunk in response) +{ + if (chunk.TextDelta != null) + { + await networkStream.WriteAsync(Encoding.UTF8.GetBytes(chunk.TextDelta)); + } +} +``` + +### Parallel Embedding Batches with Progress + +```csharp +var texts = Enumerable.Range(0, 1000).Select(i => $"Text {i}").ToList(); + +await embeddingService.GetEmbeddingsAsync(texts, + onProgress: progress => + { + Console.WriteLine(progress); // "[Generating embeddings: batch 5/4]" + }); +``` + +### Custom Embedding Service with Different Model + +```csharp +var client = new OpenRouterClient(apiKey); +var customService = new EmbeddingService(client, "your-embedding-model"); + +float[] embedding = await customService.GetEmbeddingAsync("text"); +``` + +--- + +## Limitations + +### No Interface-based Design + +OpenQuery uses concrete classes. For mocking in tests, you'd need to create wrappers or use tools like JustMock/Moq that can mock non-virtual methods (not recommended). Better: define interfaces like `IOpenRouterClient` and have implementations. + +### Hardcoded Concurrency Settings + +`ParallelProcessingOptions` is instantiated in `SearchTool` with hardcoded defaults. To customize, you'd need to: + +1. Subclass `SearchTool` and override access to `_options` +2. Or modify source to accept `ParallelProcessingOptions` in constructor +3. Or use reflection (hacky) + +Suggested improvement: Add constructor parameter. + +### Single Responsibility Blur + +`OpenQueryApp` does query generation + pipeline + streaming. Could split: +- `IQueryGenerator` (for expanding queries) +- `IPipelineExecutor` (for search tool) +- `IAnswerStreamer` (for final LLM streaming) + +Currently, `OpenQueryApp` is the facade. 
+
+---
+
+## Next Steps
+
+- **[Components](../components/overview.md)** - Understand architecture
+- **[CLI Reference](../api/cli.md)** - CLI that uses these APIs
+- **[Source Code](../)** - Read implementation details
+
+---
+
+**Code Snippet: Full Programmatic Flow**
+
+```csharp
+using OpenQuery.Services;
+using OpenQuery.Tools;
+using OpenQuery.Models;
+
+async Task<string> Research(string question)
+{
+    var apiKey = GetApiKey(); // your method
+    var client = new OpenRouterClient(apiKey);
+    var searxng = new SearxngClient("http://localhost:8002");
+    var embeddings = new EmbeddingService(client);
+    var search = new SearchTool(searxng, embeddings);
+    var app = new OpenQueryApp(client, search, "qwen/qwen3.5-flash-02-23");
+
+    var options = new OpenQueryOptions(
+        Chunks: 3,
+        Results: 5,
+        Queries: 3,
+        Short: false,
+        Long: false,
+        Verbose: false,
+        Question: question
+    );
+
+    // Capture output by redirecting Console or modifying OpenQueryApp
+    await app.RunAsync(options);
+    return "streamed to console"; // would need custom capture
+}
+```
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..215c9a2
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,682 @@
+# Architecture
+
+Deep dive into OpenQuery's system design, architectural patterns, and data flow.
+
+## 📋 Table of Contents
+
+1. [System Overview](#system-overview)
+2. [Architectural Patterns](#architectural-patterns)
+3. [Component Architecture](#component-architecture)
+4. [Data Flow](#data-flow)
+5. [Concurrency Model](#concurrency-model)
+6. [Error Handling & Resilience](#error-handling--resilience)
+7. [Performance Considerations](#performance-considerations)
+8. [Design Decisions](#design-decisions)
+
+## System Overview
+
+OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.
+
+### Core Design Principles
+
+1. 
**Separation of Concerns** - Each component has a single, well-defined responsibility
+2. **Parallel First** - Wherever possible, operations are parallelized for speed
+3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
+4. **Configurable** - Most parameters can be adjusted without code changes
+5. **Observable** - Progress reporting and verbose mode for debugging
+
+### High-Level Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                         User Layer                          │
+│  CLI (System.CommandLine) → OpenQueryApp                    │
+└─────────────────────────────┬───────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────┐
+│                     Orchestration Layer                     │
+│  OpenQueryApp → SearchTool (4-phase pipeline)               │
+└─────────────────────────────┬───────────────────────────────┘
+                              │
+        ┌─────────────────────┼─────────────────────┐
+        │                     │                     │
+        ▼                     ▼                     ▼
+┌──────────────┐    ┌──────────────────┐    ┌────────────────┐
+│ Search Layer │    │ Processing Layer │    │    AI Layer    │
+│              │    │                  │    │                │
+│ SearxngClient│    │ ArticleService   │    │OpenRouterClient│
+│              │    │ ChunkingService  │    │                │
+│              │    │ EmbeddingService │    │                │
+└──────────────┘
    └──────────────────┘    └────────────────┘
+```
+
+## Architectural Patterns
+
+### 1. Pipeline Pattern
+
+The main workflow (`SearchTool.ExecuteAsync`) implements a multi-stage pipeline:
+
+```
+Phase 1: ExecuteParallelSearchesAsync
+   ↓ (List<SearxngResult>)
+Phase 2: ExecuteParallelArticleFetchingAsync
+   ↓ (List<Chunk>)
+Phase 3: ExecuteParallelEmbeddingsAsync
+   ↓ ((queryEmbedding, chunkEmbeddings))
+Phase 4: RankAndSelectTopChunks
+   ↓ (List<Chunk> topChunks)
+→ Formatted context string returned
+```
+
+Each phase:
+- Accepts input from the previous phase
+- Processes in parallel where applicable
+- Returns output to the next phase
+- Reports progress via callbacks
+
+### 2. Service Layer Pattern
+
+Services (`Services/` directory) are stateless classes that encapsulate specific operations:
+
+- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
+- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
+- **Extractors**: `ArticleService` (content extraction)
+- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)
+
+All dependencies are explicit (constructor injection), making services easily testable.
+
+### 3. Dependency Injection (Manual)
+
+While not using a DI container, OpenQuery follows DI principles:
+
+```csharp
+// Program.cs: instantiate dependencies with explicit parameters
+var client = new OpenRouterClient(apiKey);
+var searxngClient = new SearxngClient(searxngUrl);
+var embeddingService = new EmbeddingService(client);
+var searchTool = new SearchTool(searxngClient, embeddingService);
+var openQuery = new OpenQueryApp(client, searchTool, model);
+```
+
+Benefits:
+- Clear dependency graph
+- Easy to substitute mocks for testing
+- No magic; visible construction
+
+### 4. 
Observer Pattern (Progress Reporting)
+
+`StatusReporter` and progress callbacks implement the observer pattern:
+
+```csharp
+// SearchTool receives a progress callback
+public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)
+
+// Components invoke the callback at key milestones
+onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");
+
+// Caller (OpenQueryApp) provides reporter.UpdateStatus as the callback
+_searchTool.ExecuteAsync(..., progress => reporter.UpdateStatus(progress), ...);
+```
+
+### 5. Resilience Patterns (Polly)
+
+`EmbeddingService` uses Polly's retry policy:
+
+```csharp
+_retryPipeline = new ResiliencePipelineBuilder()
+    .AddRetry(new RetryStrategyOptions
+    {
+        MaxRetryAttempts = 3,
+        Delay = TimeSpan.FromSeconds(1),
+        BackoffType = DelayBackoffType.Exponential,
+        ShouldHandle = new PredicateBuilder()
+            .Handle<HttpRequestException>()
+    })
+    .Build();
+```
+
+This automatically retries failed embedding requests with exponential backoff.
+
+### 6. Producer-Consumer Pattern (Channel-based)
+
+`StatusReporter` uses `System.Threading.Channels.Channel` for asynchronous progress updates:
+
+- Producer: `UpdateStatus()` writes messages to the channel
+- Consumer: Background task `ProcessStatusUpdatesAsync()` reads and displays them
+- Benefit: No blocking between progress generation and display
+
+### 7. Disposable Pattern
+
+Components that hold unmanaged resources implement `IDisposable` or `IAsyncDisposable`:
+
+- `StatusReporter` - stops the background spinner task
+- `RateLimiter` - disposes its semaphore
+
+Used via `using` statements for deterministic cleanup. 
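+The channel-based flow in pattern 6 can be sketched as follows — a simplified stand-in for `StatusReporter`, not its actual implementation:
+
+```csharp
+using System.Threading.Channels;
+
+// Producers call UpdateStatus from any thread; a single background
+// consumer drains the channel and writes to the console.
+public sealed class MiniStatusReporter : IAsyncDisposable
+{
+    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
+    private readonly Task _consumer;
+
+    public MiniStatusReporter()
+    {
+        _consumer = Task.Run(async () =>
+        {
+            await foreach (var message in _channel.Reader.ReadAllAsync())
+                Console.WriteLine(message);
+        });
+    }
+
+    // Producer side: non-blocking, fire-and-forget write.
+    public void UpdateStatus(string message) => _channel.Writer.TryWrite(message);
+
+    public async ValueTask DisposeAsync()
+    {
+        _channel.Writer.Complete(); // signal: no more messages
+        await _consumer;            // let the consumer finish draining
+    }
+}
+```
+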
+
+## Component Architecture
+
+### OpenQueryApp (OpenQuery.cs)
+
+**Role**: Main orchestrator; coordinates the entire workflow
+
+**Responsibilities**:
+- Parse CLI options into `OpenQueryOptions`
+- Load configuration and resolve API keys/models
+- Optionally generate expanded search queries via LLM
+- Invoke `SearchTool` with progress callbacks
+- Stream final answer from LLM
+
+**Key Methods**:
+- `RunAsync(OpenQueryOptions)` - Main entry point
+
+**Interactions**:
+- Instantiates `OpenRouterClient` (for both query gen and final answer)
+- Instantiates `SearxngClient` (passed to `SearchTool`)
+- Instantiates `EmbeddingService` (passed to `SearchTool`)
+- Instantiates `SearchTool` (orchestration)
+- `StatusReporter` for UI updates
+
+### SearchTool (Tools/SearchTool.cs)
+
+**Role**: Core search-retrieve-rank pipeline orchestrator
+
+**Responsibilities**:
+- Execute the 4-phase pipeline (search → fetch → embed → rank)
+- Manage concurrency limits (via semaphores)
+- Coordinate parallel operations
+- Generate the context string for the final answer
+
+**Interactions**:
+- Uses `SearxngClient` for Phase 1
+- Uses `ArticleService` + `ChunkingService` for Phase 2
+- Uses `EmbeddingService` for Phase 3
+- Has no external UI dependency (pure logic)
+
+**Parallelization Strategy**:
+- **Phase 1**: `Task.WhenAll` on search tasks (unbounded but limited by the SearxNG instance)
+- **Phase 2**: Semaphore (max 10 concurrent fetches)
+- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)
+
+### EmbeddingService (Services/EmbeddingService.cs)
+
+**Role**: Generate vector embeddings with batching, rate limiting, and retries
+
+**Responsibilities**:
+- Batch embedding requests (default: 300 per batch)
+- Parallelize batches (default: 4 concurrent)
+- Apply rate limiting (via `RateLimiter`)
+- Retry failed requests (Polly)
+- Calculate cosine similarity
+
+**Key Methods**:
+- `GetEmbeddingsAsync(List<string> texts, ...)` - batch with progress
+- `GetEmbeddingAsync(string 
text)` - single embedding +- `CosineSimilarity(float[], float[])` - static vector math + +**Design Notes**: +- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint +- Batches of 300 reduce API overhead +- Polly retry handles transient failures (429, 500, network blips) + +### OpenRouterClient (Services/OpenRouterClient.cs) + +**Role**: HTTP client for OpenRouter API (completions + embeddings) + +**Responsibilities**: +- Serialize requests to JSON (source-generated) +- Send HTTP with authorization header +- Stream responses for chat completions (IAsyncEnumerable) +- Return full responses for non-streaming +- Throw on non-2xx status codes + +**Endpoints**: +- POST `/chat/completions` (stream and non-stream) +- POST `/embeddings` + +**Configuration**: +- Base URL: `https://openrouter.ai/api/v1` +- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json` + +**Design**: +- Low-level client; no retry logic (retry is in `EmbeddingService`) +- Thin wrapper around `HttpClient` +- Could be replaced with `HttpClientFactory` in larger apps + +### SearxngClient (Services/SearxngClient.cs) + +**Role**: HTTP client for SearxNG metasearch + +**Responsibilities**: +- Construct search URL with query param +- GET request and deserialize JSON +- Limit results (`.Take(limit)`) +- Return empty list on failure (no exceptions) + +**Endpoint**: `GET /search?q={query}&format=json` + +**Design**: +- Very simple; no retry (failures are acceptable, OpenQuery continues with other queries) +- `DistinctBy(r => r.Url)` deduplication happens upstream + +### ArticleService (Services/ArticleService.cs) + +**Role**: Extract clean article content from URLs + +**Responsibilities**: +- Call `SmartReader.ParseArticleAsync(url)` +- Return `Article` object with `Title`, `TextContent`, `IsReadable` + +**Design**: +- Single responsibility: extraction only (no fetching, no chunking) +- SmartReader handles all complexity (HTML parsing, boilerplate removal) +- Exceptions 
propagate to `SearchTool` (handled there)
+
+### ChunkingService (Services/ChunkingService.cs)
+
+**Role**: Split long text into 500-char chunks at natural boundaries
+
+**Algorithm**:
+1. Start at index 0
+2. Take up to 500 chars
+3. If not at the end, backtrack to the last space/newline/period
+4. Add the chunk, advance the start index
+5. Repeat until done
+
+**Design**:
+- Static class (stateless utility)
+- No dependencies
+- Pure function (input text → output chunks)
+
+### RateLimiter (Services/RateLimiter.cs)
+
+**Role**: Limit concurrent operations via semaphore
+
+**Responsibilities**:
+- Wrap actions with semaphore acquisition
+- Support both sync and async actions
+- Implement `IAsyncDisposable` for cleanup
+
+**Usage Pattern**:
+```csharp
+await _rateLimiter.ExecuteAsync(async () =>
+{
+    // operation limited by semaphore
+    return await SomeApiCall();
+});
+```
+
+**Design**:
+- Generic wrapper (can return `TResult`)
+- `SemaphoreSlim` initialized at construction
+- Used in `EmbeddingService` for parallel embedding batches
+
+### StatusReporter (Services/StatusReporter.cs)
+
+**Role**: Show real-time progress with a spinner or verbose mode
+
+**Responsibilities**:
+- Maintain spinner animation (background task)
+- Receive status updates via channel
+- Display updates with appropriate formatting
+- Stop the spinner on completion
+
+**Features**:
+- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille chars
+- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
+- **Thread-safe**: The channel is safe for concurrent writes
+- **Non-blocking**: The background spinner doesn't block updates
+
+**Design**:
+- `Channel<string>` for asynchronous producer-consumer
+- Background task (`_statusProcessor`) reads from the channel
+- Spinner runs on its own task with a 100ms delay per frame
+- `IDisposable` ensures proper cleanup
+
+### ConfigManager (ConfigManager.cs)
+
+**Role**: Load and save configuration from/to file
+
+**Responsibilities**:
+- Get config path (XDG: 
`~/.config/openquery/config`)
+- Parse key-value pairs (no INI library; manual parsing)
+- Provide an `AppConfig` object with defaults
+- Save settings back to the file
+
+**Design**:
+- Static class (no instances)
+- Creates the config directory if missing
+- Line-by-line parsing (simple, no dependencies)
+- Could be improved with a proper INI parser or JSON
+
+## Data Flow
+
+### End-to-End Data Flow
+
+```
+User: "What is quantum entanglement?"
+
+1. OpenQueryOptions created
+   { Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }
+
+2. Query Generation (if Queries > 1)
+   → ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
+   → Deserialize to List<string> (generatedQueries)
+
+3. Search Phase
+   generatedQueries → parallel SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
+   → DistinctBy(Url) → List<SearxngResult> (15 results = 3 queries × 5 results)
+
+4. Fetch Phase
+   searchResults → parallel ArticleService.FetchArticleAsync → Article
+   → ChunkingService.ChunkText (split into ~500-char pieces)
+   → ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)
+
+5. Embedding Phase
+   originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
+   chunk Content values → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)
+
+6. Ranking Phase
+   For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
+   OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)
+
+7. Answer Phase
+   context = string.Join("\n\n", topChunks.Select(...))
+   → ChatCompletionRequest to OpenRouter with context + question
+   → StreamAsync → Console.Write(delta) (real-time display)
+
+Result: User sees the answer with [Source N] citations
+```
+
+### Data Structures
+
+**Chunk** - The core data structure flowing through the pipeline:
+```csharp
+public record Chunk(
+    string Content,      // Text content (~500 chars)
+    string SourceUrl,    // Where it came from
+    string? Title = null // Article title
+)
+{
+    public float[]? 
Embedding { get; set; } // Added in Phase 3
+    public float Score { get; set; }        // Added in Phase 4
+}
+```
+
+**Data Flow State**:
+- Phase 1-2: `Chunk` without embedding
+- Phase 3: `Chunk.Embedding` populated
+- Phase 4: `Chunk.Score` populated
+- Phase 5: Serialized into the context string
+
+### Memory Footprint
+
+**Per 15-article run (approximate)**:
+- Raw HTML (fetched): ~5MB (transient, discarded after extraction)
+- Articles: ~500KB (15 articles × ~30KB extracted text)
+- Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
+- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
+- Total peak: ~1-2MB (excluding OpenRouter's memory usage)
+
+**Note**: AOT compilation reduces runtime memory compared to JIT.
+
+## Concurrency Model
+
+OpenQuery uses multiple parallelization strategies:
+
+### Unbounded Parallelism (Task.WhenAll)
+
+**Where**: Search queries (Phase 1)
+
+```csharp
+var searchTasks = generatedQueries.Select(async query => { ... });
+await Task.WhenAll(searchTasks);
+```
+
+**Rationale**: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.
+
+**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). The default of 3 is safe.
+
+### Semaphore-Controlled Parallelism
+
+**Where**: Article fetching (Phase 2)
+
+```csharp
+var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
+await Task.WhenAll(fetchTasks); // Each task waits on the semaphore
+```
+
+**Rationale**: Prevent flooding target websites with requests (DoS-like behavior). 10 concurrent is polite but fast.
+
+**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (compile-time constant currently).
+
+### Parallel.ForEachAsync with MaxDegreeOfParallelism
+
+**Where**: Embedding batch processing (Phase 3)
+
+```csharp
+await Parallel.ForEachAsync(
+    batchIndices,
+    new ParallelOptions { MaxDegreeOfParallelism = 4 },
+    async (batchIndex, ct) => { ... 
}
+);
+```
+
+**Rationale**: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.
+
+**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (compile-time).
+
+### Progress Reporting (Channel)
+
+**Where**: All phases pass an `onProgress` callback
+
+**Implementation**:
+- `StatusReporter.UpdateStatus()` → writes to the channel
+- Background task reads the channel and displays updates
+- Non-blocking; callbacks are fire-and-forget (TryWrite)
+
+**Thread Safety**: The channel is thread-safe; multiple phases may write concurrently.
+
+## Error Handling & Resilience
+
+### HTTP Errors
+
+**OpenRouterClient**:
+- Calls `response.EnsureSuccessStatusCode()` → throws `HttpRequestException` on 4xx/5xx
+- No retry (handled at a higher level in `EmbeddingService`)
+
+**SearxngClient**:
+- Returns an empty `List<SearxngResult>` on non-success
+- No exception thrown (searches are non-critical; if some queries fail, others proceed)
+
+### Retry Policy (Polly)
+
+**Location**: `EmbeddingService` constructor
+
+**Scope**: Only embedding requests (`_client.EmbedAsync`)
+
+**Policy**:
+- Max 3 attempts
+- Exponential backoff: 1s, 2s, 4s
+- Only retries `HttpRequestException` (network errors, 429, 5xx)
+
+**Why not on chat completions?**
+- Query generation and the final answer are critical; failures should surface immediately
+- Could be added in the future if transient failures are common
+
+### Graceful Degradation
+
+**Query Generation Failure**:
+```csharp
+try { ... generate queries ... 
}
+catch (Exception ex)
+{
+    // Fall back to the original question as the sole query
+    if (options.Verbose) reporter.WriteLine($"[Failed to generate queries, falling back to original question]");
+}
+```
+
+**Embedding Batch Failure**:
+```csharp
+catch
+{
+    // Return empty embeddings for this batch (chunk will be filtered out)
+    var emptyBatch = new float[batch.Count][];
+    for (int i = 0; i < emptyBatch.Length; i++)
+        emptyBatch[i] = Array.Empty<float>();
+    results.Add((batchIndex, emptyBatch));
+}
+```
+
+**Article Fetch Failure**:
+```csharp
+try { await ArticleService.FetchArticleAsync(url); }
+catch (Exception ex)
+{
+    if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
+    // Chunk not added; continue with others
+}
+```
+
+### User-Facing Errors
+
+Top-level exception handler in `Program.cs`:
+```csharp
+try { await openQuery.RunAsync(options); }
+catch (HttpRequestException ex)
+{
+    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
+    Environment.Exit(1);
+}
+catch (Exception ex)
+{
+    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
+    Environment.Exit(1);
+}
+```
+
+### Cancellation Support
+
+`OpenRouterClient.StreamAsync` and `EmbeddingService` methods accept a `CancellationToken`.
+
+Used in:
+- Streaming the answer (Ctrl+C stops immediately)
+- Parallel embeddings (can be cancelled)
+
+## Performance Considerations
+
+### Latency Breakdown (Typical)
+
+| Stage | Time | Description |
+|-------|------|-------------|
+| Query generation | 2-5s | LLM generates 3-5 queries |
+| Searches | 3-8s | 3-5 parallel SearxNG queries |
+| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
+| Embeddings | 2-4s | 50-100 chunks in 4-parallel batches |
+| Final answer | 5-20s | Depends on answer length (streaming) |
+| **Total** | **15-50s** | Varies widely based on network & content |
+
+### Bottlenecks
+
+1. **Network I/O** (article fetching, API calls) - I/O-bound, not CPU-bound
+2. 
**OpenRouter API latency** - varies by model and load
+3. **SmartReader parsing** - CPU-bound for large HTML
+4. **Embedding API rate** - OpenRouter may rate limit if too many concurrent requests
+
+### Optimization Strategies
+
+- **Parallelism**: Already maximized within API constraints
+- **Caching**: Not implemented; a future enhancement could cache embeddings per URL
+- **Batching**: 300-chunk batches reduce API overhead
+- **AOT**: Native compilation reduces startup overhead vs JIT
+
+### Scalability Limits
+
+- **Memory**: Scales with the number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Can handle thousands, but similarity scoring is O(n) in the number of chunks.
+- **API Limits**: OpenRouter rate limits per API key; you may need to reduce concurrency if hitting 429s.
+- **SearxNG Limits**: A single SearxNG instance can handle ~10-50 QPS; above that you may need load balancing (not in scope).
+
+## Design Decisions
+
+### Why Not Use a DI Container?
+
+OpenQuery manually wires dependencies in `Program.cs`. For a small CLI app, this is:
+- Simpler (no container configuration)
+- More explicit (easy to trace the dependency graph)
+- No runtime overhead
+- Easier to understand for contributors
+
+Container-based DI would be overengineering.
+
+### Why AOT?
+
+.NET 10 AOT provides:
+- **Fast startup** (<100ms vs ~500ms JIT)
+- **Smaller footprint** (trimmed, no JIT)
+- **No runtime dependencies** (self-contained)
+- Better for CLI tools distributed to users
+
+Trade-offs:
+- Longer build time
+- Some reflection-based APIs not supported (not needed here)
+- Less flexible (can't load dynamic assemblies, but not needed)
+
+### Why SmartReader for Article Extraction? 
+
+SmartReader uses a Readability-based algorithm similar to Firefox Reader View:
+- Removes ads, navigation, comments, and boilerplate
+- Extracts the main article content
+- Handles malformed HTML gracefully
+- Pure .NET (no external services or API keys required)
+
+Alternatives considered:
+- `HtmlAgilityPack` (too low-level; you'd need to implement the extraction logic yourself)
+- `AngleSharp` (similar; still need extraction logic)
+- External services (like Diffbot) - require API keys and cost money
+
+SmartReader is the sweet spot: free, good quality, easy integration.
+
+### Why Embeddings + Cosine Similarity vs Full-Text Search?
+
+Full-text search (like Lucene) would:
+- Require an inverted index; more complex
+- Be faster for exact keyword matching
+- Not understand semantic similarity
+
+Embeddings provide:
+- Semantic similarity (understand meaning, not just keywords)
+- Simple math (cosine similarity of float arrays)
+- No index to maintain (just compute on the fly)
+
+Trade-off: embedding API cost and latency; caching embeddings (e.g., per URL) could offset repeat costs.
+
+### Why Not a RAG (Retrieval Augmented Generation) Framework?
+
+OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
+- Add dependency bloat
+- Reduce control
+- Increase abstraction complexity
+
+The custom implementation is ~1000 LOC and matches the project's needs exactly.
+
+### Why System.CommandLine?
+
+Provides:
+- Native-like CLI help (`openquery --help`)
+- Strongly-typed options
+- Command hierarchy (main + subcommands)
+- Good error messages
+
+Alternative: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.
+
+---
+
+## Next Steps
+
+- [Components](components/overview.md) - Deep dive into each module
+- [API Reference](api/cli.md) - Complete command documentation
+- [Troubleshooting](troubleshooting.md) - Debug issues
+
+---
+
+**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece. 
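+---
+
+**Appendix: the "simple math".** The cosine-similarity ranking discussed above is just a normalized dot product; a self-contained sketch (not the actual `EmbeddingService.CosineSimilarity` source):
+
+```csharp
+// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
+// Higher values mean the vectors point in more similar directions,
+// i.e. the texts they embed are more semantically similar.
+static float CosineSimilarity(float[] a, float[] b)
+{
+    if (a.Length != b.Length)
+        throw new ArgumentException("Vectors must have the same length.");
+
+    double dot = 0, normA = 0, normB = 0;
+    for (int i = 0; i < a.Length; i++)
+    {
+        dot += a[i] * b[i];
+        normA += a[i] * a[i];
+        normB += b[i] * b[i];
+    }
+    if (normA == 0 || normB == 0) return 0f; // guard: zero vector
+    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
+}
+```
+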
diff --git a/docs/components/models.md b/docs/components/models.md new file mode 100644 index 0000000..8db2cf6 --- /dev/null +++ b/docs/components/models.md @@ -0,0 +1,528 @@ +# Models Reference + +Complete reference for all data models, DTOs, and records in OpenQuery. + +## πŸ“‹ Table of Contents + +1. [Core Data Models](#core-data-models) +2. [OpenRouter API Models](#openrouter-api-models) +3. [SearxNG API Models](#searxng-api-models) +4. [JSON Serialization](#json-serialization) +5. [Model Relationships](#model-relationships) + +## Core Data Models + +### OpenQueryOptions + +**Location**: `Models/OpenQueryOptions.cs` +**Type**: `record` +**Purpose**: Immutable options object for a single query execution + +```csharp +public record OpenQueryOptions( + int Chunks, // Number of top chunks to include in context + int Results, // Search results per generated query + int Queries, // Number of search queries to generate (if >1) + bool Short, // Request concise answer + bool Long, // Request detailed answer + bool Verbose, // Enable verbose logging + string Question // Original user question (required) +); +``` + +**Lifecycle**: +- Created in `Program.cs` by combining CLI options, config defaults, and environment variables +- Passed to `OpenQueryApp.RunAsync(options)` + +**Validation**: None (assumes valid values from CLI parser/config) + +**Example**: +```csharp +var options = new OpenQueryOptions( + Chunks: 3, + Results: 5, + Queries: 3, + Short: false, + Long: false, + Verbose: true, + Question: "What is quantum entanglement?" +); +``` + +--- + +### Chunk + +**Location**: `Models/Chunk.cs` +**Type**: `record` +**Purpose**: Content chunk with metadata, embedding, and relevance score + +```csharp +public record Chunk( + string Content, // Text content (typically ~500 chars) + string SourceUrl, // Original article URL + string? Title = null // Article title (optional, may be null) +) +{ + public float[]? 
Embedding { get; set; } // Vector embedding (1536-dim for text-embedding-3-small)
+    public float Score { get; set; }        // Relevance score (0-1, higher = more relevant)
+}
+```
+
+**Lifecycle**:
+1. **Created** in `SearchTool.ExecuteParallelArticleFetchingAsync`:
+   ```csharp
+   chunks.Add(new Chunk(chunkText, result.Url, article.Title));
+   ```
+   At this point: `Embedding = null`, `Score = 0`
+
+2. **Embedded** in `SearchTool.ExecuteParallelEmbeddingsAsync`:
+   ```csharp
+   validChunks[i].Embedding = validEmbeddings[i];
+   ```
+
+3. **Scored** in `SearchTool.RankAndSelectTopChunks`:
+   ```csharp
+   chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
+   ```
+
+4. **Formatted** into the context string:
+   ```csharp
+   $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"
+   ```
+
+**Properties**:
+- `Content`: Never null/empty (empty chunks are filtered in `ChunkingService`)
+- `SourceUrl`: Always provided (from `SearxngResult.Url`)
+- `Title`: May be null if article extraction failed to get a title
+- `Embedding`: Null until Phase 3; may remain null if embedding failed
+- `Score`: 0 until Phase 4; irrelevant for non-embedded chunks
+
+**Equality**: Records use value equality, and the compiler-generated `Equals` compares *all* instance fields — including the backing fields of `Embedding` and `Score` declared in the body. Note that `float[]` is compared by reference, not element-wise, so two chunks with the same content/URL/title are only equal if they also share the same `Embedding` array reference and the same `Score`. 
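+The lifecycle above can be compressed into one hypothetical snippet (the literal values and the `queryEmbedding` variable are invented for illustration):
+
+```csharp
+// Phase 2: created from an extracted article.
+var chunk = new Chunk(
+    Content: "Quantum entanglement is a physical phenomenon...",
+    SourceUrl: "https://en.wikipedia.org/wiki/Quantum_entanglement",
+    Title: "Quantum Entanglement - Wikipedia");
+
+// Phase 3: embedding attached (dimension depends on the embedding model).
+chunk.Embedding = new float[] { 0.12f, -0.03f, 0.44f /* ... */ };
+
+// Phase 4: scored against the query embedding.
+float[] queryEmbedding = { 0.10f, -0.01f, 0.40f /* ... */ };
+chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding);
+
+// Phase 5: formatted into the context string.
+var line = $"[Source 1: {chunk.Title ?? "Unknown"}]({chunk.SourceUrl})\n{chunk.Content}";
+```
+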
+
+---
+
+### ParallelProcessingOptions
+
+**Location**: `Models/ParallelOptions.cs`
+**Type**: `class`
+**Purpose**: Configuration for parallel/concurrent operations
+
+```csharp
+public class ParallelProcessingOptions
+{
+    public int MaxConcurrentArticleFetches { get; set; } = 10;
+    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
+    public int EmbeddingBatchSize { get; set; } = 300;
+}
+```
+
+**Usage**:
+- Instantiated in the `SearchTool` constructor (hardcoded `new`)
+- Passed to the `EmbeddingService` constructor
+- Read by `SearchTool` for the article-fetching semaphore
+
+**Default Values**:
+| Property | Default | Effect |
+|----------|---------|--------|
+| `MaxConcurrentArticleFetches` | 10 | Up to 10 articles fetched simultaneously |
+| `MaxConcurrentEmbeddingRequests` | 4 | Up to 4 embedding batches in parallel |
+| `EmbeddingBatchSize` | 300 | Each embedding API call handles up to 300 texts |
+
+**Current Limitation**: These are **compile-time defaults** (hardcoded in `SearchTool.cs`). To make them configurable:
+1. Add them to `AppConfig`
+2. Read them in `ConfigManager`
+3. Pass them through the `SearchTool` constructor
+
+---
+
+## OpenRouter API Models
+
+**Location**: `Models/OpenRouter.cs`
+**Purpose**: DTOs for OpenRouter's REST API (JSON serialization)
+
+### Chat Completion
+
+#### `ChatCompletionRequest`
+```csharp
+public record ChatCompletionRequest(
+    [property: JsonPropertyName("model")] string Model,
+    [property: JsonPropertyName("messages")] List<Message> Messages,
+    [property: JsonPropertyName("tools")] List<ToolDefinition>? Tools = null,
+    [property: JsonPropertyName("stream")] bool Stream = false
+);
+```
+
+**Example**:
+```json
+{
+  "model": "qwen/qwen3.5-flash-02-23",
+  "messages": [
+    { "role": "system", "content": "You are a helpful assistant." },
+    { "role": "user", "content": "What is 2+2?" 
}
+  ],
+  "stream": true
+}
+```
+
+#### `Message`
+```csharp
+public record Message(
+    [property: JsonPropertyName("role")] string Role,
+    [property: JsonPropertyName("content")] string? Content = null,
+    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
+    [property: JsonPropertyName("tool_call_id")] string? ToolCallId = null
+)
+{
+    // Factory method for tool responses
+    public static Message FromTool(string content, string toolCallId) =>
+        new Message("tool", content, null, toolCallId);
+}
+```
+
+**Roles**: `"system"`, `"user"`, `"assistant"`, `"tool"`
+
+**Usage**:
+- `Content` for text messages
+- `ToolCalls` when the assistant requests tool use
+- `ToolCallId` when responding to a tool call
+
+#### `ChatCompletionResponse`
+```csharp
+public record ChatCompletionResponse(
+    [property: JsonPropertyName("choices")] List<Choice> Choices,
+    [property: JsonPropertyName("usage")] Usage? Usage = null
+);
+
+public record Choice(
+    [property: JsonPropertyName("message")] Message Message,
+    [property: JsonPropertyName("finish_reason")] string? FinishReason = null
+);
+```
+
+**Response Example**:
+```json
+{
+  "choices": [
+    {
+      "message": {
+        "role": "assistant",
+        "content": "Answer text..." 
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 100,
+    "completion_tokens": 50,
+    "total_tokens": 150
+  }
+}
+```
+
+#### `Usage`
+```csharp
+public record Usage(
+    [property: JsonPropertyName("prompt_tokens")] int PromptTokens,
+    [property: JsonPropertyName("completion_tokens")] int CompletionTokens,
+    [property: JsonPropertyName("total_tokens")] int TotalTokens
+);
+```
+
+### Tool Calling (Not Currently Used)
+
+#### `ToolDefinition` / `ToolFunction`
+```csharp
+public record ToolDefinition(
+    [property: JsonPropertyName("type")] string Type, // e.g., "function"
+    [property: JsonPropertyName("function")] ToolFunction Function
+);
+
+public record ToolFunction(
+    [property: JsonPropertyName("name")] string Name,
+    [property: JsonPropertyName("description")] string Description,
+    [property: JsonPropertyName("parameters")] JsonElement Parameters // JSON Schema
+);
+```
+
+#### `ToolCall` / `FunctionCall`
+```csharp
+public record ToolCall(
+    [property: JsonPropertyName("id")] string Id,
+    [property: JsonPropertyName("type")] string Type,
+    [property: JsonPropertyName("function")] FunctionCall Function
+);
+
+public record FunctionCall(
+    [property: JsonPropertyName("name")] string Name,
+    [property: JsonPropertyName("arguments")] string Arguments // JSON string
+);
+```
+
+**Note**: OpenQuery doesn't use tools currently, but the models are defined for future tool-calling capability.
+
+### Streaming
+
+#### `StreamChunk`
+```csharp
+public record StreamChunk(
+    string? TextDelta = null,
+    ClientToolCall? Tool = null
+);
+```
+
+Yielded by `OpenRouterClient.StreamAsync()` for each SSE event.
+
+#### `ChatCompletionChunk` (Server Response)
+```csharp
+public record ChatCompletionChunk(
+    [property: JsonPropertyName("choices")] List<ChunkChoice> Choices
+);
+
+public record ChunkChoice(
+    [property: JsonPropertyName("delta")] ChunkDelta Delta
+);
+
+public record ChunkDelta(
+    [property: JsonPropertyName("content")] string? 
Content = null,
+    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null
+);
+```
+
+**Streaming Response Example** (SSE):
+```
+data: {"choices":[{"delta":{"content":"Hello"}}]}
+data: {"choices":[{"delta":{"content":" world"}}]}
+data: [DONE]
+```
+
+`OpenRouterClient.StreamAsync` parses and yields `StreamChunk` with non-null `TextDelta` for content.
+
+### Embeddings
+
+#### `EmbeddingRequest`
+```csharp
+public record EmbeddingRequest(
+    [property: JsonPropertyName("model")] string Model,
+    [property: JsonPropertyName("input")] List<string> Input
+);
+```
+
+**Example**:
+```json
+{
+  "model": "openai/text-embedding-3-small",
+  "input": ["text 1", "text 2", ...]
+}
+```
+
+#### `EmbeddingResponse`
+```csharp
+public record EmbeddingResponse(
+    [property: JsonPropertyName("data")] List<EmbeddingData> Data,
+    [property: JsonPropertyName("usage")] Usage Usage
+);
+
+public record EmbeddingData(
+    [property: JsonPropertyName("embedding")] float[] Embedding,
+    [property: JsonPropertyName("index")] int Index
+);
+```
+
+**Response Example**:
+```json
+{
+  "data": [
+    { "embedding": [0.1, 0.2, ...], "index": 0 },
+    { "embedding": [0.3, 0.4, ...], "index": 1 }
+  ],
+  "usage": {
+    "prompt_tokens": 100,
+    "total_tokens": 100
+  }
+}
+```
+
+**Note**: `_client.EmbedAsync` orders by `index` to match input order.
+
+---
+
+## SearxNG API Models
+
+**Location**: `Models/Searxng.cs`
+**Purpose**: DTOs for SearxNG's JSON response format
+
+### `SearxngRoot`
+```csharp
+public record SearxngRoot(
+    [property: JsonPropertyName("results")] List<SearxngResult> Results
+);
+```
+
+Top-level response object.
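+
+To make the mapping concrete, here is a minimal, runnable sketch of deserializing a SearxNG payload into these records. The sample JSON values are illustrative, and plain reflection-based `Deserialize` stands in for the source-generated `AppJsonContext` that the app itself uses for AOT compatibility:
+
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+
+// Illustrative payload; the field shapes match the records above
+var json = """
+{
+  "results": [
+    {
+      "title": "Quantum Entanglement - Wikipedia",
+      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
+      "content": "Quantum entanglement is a physical phenomenon..."
+    }
+  ]
+}
+""";
+
+// Reflection-based deserialization for brevity; the real app goes through
+// AppJsonContext (source generation) so this also works under PublishAot
+var root = JsonSerializer.Deserialize<SearxngRoot>(json);
+Console.WriteLine(root!.Results[0].Title); // Quantum Entanglement - Wikipedia
+
+public record SearxngRoot(
+    [property: JsonPropertyName("results")] List<SearxngResult> Results);
+
+public record SearxngResult(
+    [property: JsonPropertyName("title")] string Title,
+    [property: JsonPropertyName("url")] string Url,
+    [property: JsonPropertyName("content")] string Content);
+```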
+
+### `SearxngResult`
+```csharp
+public record SearxngResult(
+    [property: JsonPropertyName("title")] string Title,
+    [property: JsonPropertyName("url")] string Url,
+    [property: JsonPropertyName("content")] string Content // Snippet/description
+);
+```
+
+**Fields**:
+- `Title`: Result title (from the page's `<title>` tag or OpenGraph)
+- `Url`: Absolute URL to article
+- `Content`: Short snippet (~200 chars) from search engine
+
+**Usage**:
+- `Url` passed to `ArticleService.FetchArticleAsync`
+- `Title` used as fallback if article extraction fails
+- `Content` currently unused (could be used for a quick answer without fetching)
+
+**Example Response**:
+```json
+{
+  "results": [
+    {
+      "title": "Quantum Entanglement - Wikipedia",
+      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
+      "content": "Quantum entanglement is a physical phenomenon..."
+    }
+  ]
+}
+```
+
+---
+
+## JSON Serialization
+
+### JsonContext (Source Generation)
+
+**Location**: `Models/JsonContexts.cs`
+**Purpose**: Provide source-generated JSON serializer context for AOT compatibility
+
+#### Declaration
+```csharp
+[JsonSerializable(typeof(ChatCompletionRequest))]
+[JsonSerializable(typeof(ChatCompletionResponse))]
+[JsonSerializable(typeof(ChatCompletionChunk))]
+[JsonSerializable(typeof(EmbeddingRequest))]
+[JsonSerializable(typeof(EmbeddingResponse))]
+[JsonSerializable(typeof(SearxngRoot))]
+[JsonSerializable(typeof(List<string>))]
+internal partial class AppJsonContext : JsonSerializerContext
+{
+}
+```
+
+**Usage**:
+```csharp
+var json = JsonSerializer.Serialize(request, AppJsonContext.Default.ChatCompletionRequest);
+var response = JsonSerializer.Deserialize(json, AppJsonContext.Default.ChatCompletionResponse);
+```
+
+**Benefits**:
+- **AOT-compatible**: No reflection, works with PublishAot=true
+- **Performance**: Pre-compiled serializers are faster
+- **Trimming safe**: Unused serializers trimmed automatically
+
+**Generated**: Partial class compiled by source generator (no manual 
implementation) + +**Important**: Must include ALL types that will be serialized/deserialized in `[JsonSerializable]` attributes, otherwise runtime exception in AOT. + +--- + +## Model Relationships + +### Object Graph (Typical Execution) + +``` +OpenQueryOptions + ↓ +OpenQueryApp.RunAsync() + β”‚ + β”œβ”€ queryGenerationMessages (List<Message>) + β”‚ β”œβ”€ system: "You are an expert researcher..." + β”‚ └─ user: "Generate N queries for: {question}" + β”‚ ↓ + β”‚ ChatCompletionRequest β†’ OpenRouter β†’ ChatCompletionResponse + β”‚ ↓ + β”‚ List<string> generatedQueries + β”‚ + β”œβ”€ SearchTool.ExecuteAsync() + β”‚ ↓ + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ Phase 1: Parallel Searches β”‚ + β”‚ β”‚ SearxngClient.SearchAsync(query) Γ— N + β”‚ β”‚ β†’ List<SearxngResult> β”‚ + β”‚ β”‚ (Title, Url, Content) β”‚ + β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ ↓ + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ Phase 2: Article Fetch & Chunking β”‚ + β”‚ β”‚ ArticleService.FetchAsync(Url) Γ— M + β”‚ β”‚ β†’ Article (TextContent, Title) + β”‚ β”‚ β†’ ChunkingService.ChunkText β†’ List<string> chunks + β”‚ β”‚ β†’ Chunk(content, url, title) Γ— K β”‚ + β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ ↓ + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ Phase 3: Embeddings β”‚ + β”‚ β”‚ EmbeddingService.GetEmbeddingsAsync(chunkContents) + β”‚ β”‚ β†’ float[][] chunkEmbeddings β”‚ + β”‚ β”‚ β†’ Set chunk.Embedding for each β”‚ + β”‚ β”‚ Also: GetEmbeddingAsync(question) β†’ float[] queryEmbedding + β”‚ 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ ↓ + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ Phase 4: Ranking β”‚ + β”‚ β”‚ For each chunk: Score = CosineSimilarity(queryEmbedding, chunk.Embedding) + β”‚ β”‚ β†’ Set chunk.Score β”‚ + β”‚ β”‚ β†’ OrderByDescending(Score) β”‚ + β”‚ β”‚ β†’ Take(topChunksLimit) β†’ topChunks (List<Chunk>) + β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ ↓ + β”‚ Context string: formatted topChunks + β”‚ ↓ + └─ OpenQueryApp β†’ final ChatCompletionRequest + System: "Answer based on context..." + User: "Context:\n{context}\n\nQuestion: {question}" + ↓ + StreamAsync() β†’ StreamChunk.TextDelta β†’ Console +``` + +### Record Immutability + +Most DTOs are `record` types: +- **Immutable**: Properties are init-only (`{ get; init; }`) +- **Value semantics**: Equality based on content +- **Thread-safe**: Can be shared across threads + +**Exception**: +- `Chunk`: Has mutable properties `Embedding` and `Score` (set during pipeline) +- `ParallelProcessingOptions`: Class with mutable setters +- `AppConfig`: Class with mutable setters + +--- + +## Next Steps + +- **[API Reference](../../api/cli.md)** - How these models are used in CLI commands +- **[OpenRouterClient](../../services/OpenRouterClient.md)** - Uses OpenRouter models +- **[SearxngClient](../../services/SearxngClient.md)** - Uses Searxng models +- **[SearchTool](../../components/search-tool.md)** - Orchestrates all models + +--- + +**Quick Reference Table** + +| Model | Category | Purpose | Mutable? 
| +|-------|----------|---------|----------| +| `OpenQueryOptions` | Core | CLI options | No (record) | +| `Chunk` | Core | Content + metadata + ranking | Partially (Embedding, Score) | +| `ParallelProcessingOptions` | Config | Concurrency settings | Yes (class) | +| `ChatCompletionRequest/Response` | OpenRouter | LLM API | No | +| `EmbeddingRequest/Response` | OpenRouter | Embeddings API | No | +| `SearxngRoot/Result` | SearxNG | Search results | No | +| `AppJsonContext` | Internal | JSON serialization | No (generated partial) | diff --git a/docs/components/openquery-app.md b/docs/components/openquery-app.md new file mode 100644 index 0000000..41cdeec --- /dev/null +++ b/docs/components/openquery-app.md @@ -0,0 +1,395 @@ +# OpenQueryApp Component + +Deep dive into the `OpenQueryApp` class - the main application orchestrator. + +## Overview + +`OpenQueryApp` is the heart of OpenQuery. It coordinates all components, manages the workflow from question to answer, and handles progress reporting. + +## Location +`OpenQuery.cs` in project root + +## Class Definition + +```csharp +public class OpenQueryApp +{ + private readonly OpenRouterClient _client; + private readonly SearchTool _searchTool; + private readonly string _model; + + public OpenQueryApp( + OpenRouterClient client, + SearchTool searchTool, + string model); + + public async Task RunAsync(OpenQueryOptions options); +} +``` + +**Dependencies**: +- `OpenRouterClient` - for query generation and final answer streaming +- `SearchTool` - for search-retrieve-rank pipeline +- `string _model` - model identifier to use for LLM calls + +**Lifecycle**: Instantiated once per query execution in `Program.cs`, then `RunAsync()` called once. + +## RunAsync Workflow + +```csharp +public async Task RunAsync(OpenQueryOptions options) +{ + // 1. Setup + using var reporter = new StatusReporter(options.Verbose); + reporter.StartSpinner(); + + // 2. 
Query Generation (if needed) + List<string> queries = await GenerateQueriesIfNeededAsync(options, reporter); + + // 3. Search Pipeline + string searchResult = await ExecuteSearchPipelineAsync(options, queries, reporter); + + // 4. Final Answer Streaming + await StreamFinalAnswerAsync(options, searchResult, reporter); +} +``` + +### Step 1: Status Reporter Setup + +```csharp +using var reporter = new StatusReporter(options.Verbose); +reporter.StartSpinner(); +``` + +- Creates `StatusReporter` (implements `IDisposable`) +- Starts spinner animation (unless verbose) +- `using` ensures disposal on exit + +### Step 2: Query Generation + +**When**: `options.Queries > 1` (user wants multiple search queries) + +**Purpose**: Use LLM to generate diverse, optimized search queries from the original question + +**System Prompt** (hardcoded in `OpenQuery.cs`): +``` +You are an expert researcher. The user will ask a question. Your task is to +generate optimal search queries to gather comprehensive information. + +Instructions: +1. Break down complex questions. +2. Use synonyms and alternative phrasing. +3. Target different aspects (entities, mechanisms, pros/cons, history). + +CRITICAL: Output must be a valid JSON array of strings ONLY. No markdown, +explanations, or other text. 
+``` + +**Request**: +```csharp +var queryGenMessages = new List<Message> +{ + new Message("system", systemPrompt), + new Message("user", $"Generate {options.Queries} distinct search queries for:\n{options.Question}") +}; +var request = new ChatCompletionRequest(_model, queryGenMessages); +var response = await _client.CompleteAsync(request); +``` + +**Response Parsing**: +```csharp +var content = response.Choices.FirstOrDefault()?.Message.Content; +if (!string.IsNullOrEmpty(content)) +{ + // Remove markdown code fences if present + content = Regex.Replace(content, @"```json\s*|\s*```", "").Trim(); + + // Deserialize to List<string> + var generatedQueries = JsonSerializer.Deserialize(content, AppJsonContext.Default.ListString); + if (generatedQueries != null && generatedQueries.Count > 0) + { + queries = generatedQueries; + } +} +``` + +**Fallback**: If any step fails (exception, null, empty, invalid JSON), use `new List<string> { options.Question }` (single query = original) + +**Note**: Query generation reuses the same model as final answer. 
This could be optimized:
+- Use cheaper/faster model for query gen
+- Separate model configuration
+- Cache query generation results
+
+### Step 3: Search Pipeline Execution
+
+```csharp
+var searchResult = await _searchTool.ExecuteAsync(
+    options.Question,
+    queries,
+    options.Results,
+    options.Chunks,
+    (progress) => {
+        if (options.Verbose)
+            reporter.WriteLine(progress);
+        else
+            reporter.UpdateStatus(progress);
+    },
+    options.Verbose);
+```
+
+**Parameters**:
+- `originalQuery`: User's original question (used for final embedding)
+- `generatedQueries`: From step 2 (or fallback)
+- `maxResults`: `options.Results` (search results per query)
+- `topChunksLimit`: `options.Chunks` (top N chunks to return)
+- `onProgress`: Callback to update UI
+- `verbose`: Passed through to `SearchTool`
+
+**Returns**: `string context` - formatted context with source citations
+
+**Progress Handling**:
+- In verbose mode: all progress printed as lines (via `reporter.WriteLine()`)
+- In compact mode: progress messages are condensed into a single concise status line (e.g., "Fetching articles 3/10...")
+
+### Step 4: Final Answer Streaming
+
+**Status Update**:
+```csharp
+if (!options.Verbose)
+    reporter.UpdateStatus("Asking AI...");
+else
+{
+    reporter.ClearStatus();
+    Console.WriteLine();
+}
+```
+
+**Build System Prompt**:
+```csharp
+var systemPrompt = "You are a helpful AI assistant. Answer the user's question in depth, based on the provided context. Be precise and accurate. You can mention sources or citations.";
+if (options.Short) systemPrompt += " Give a very short concise answer.";
+if (options.Long) systemPrompt += " Give a long elaborate detailed answer.";
+```
+
+**Prompt Structure**:
+```
+System: {systemPrompt}
+User: Context:
+{searchResult}
+
+Question: {options.Question}
+```
+
+Where `searchResult` is:
+```
+[Source 1: Title](URL)
+Content chunk 1
+
+[Source 2: Title](URL)
+Content chunk 2
+
+...
+``` + +**Streaming**: +```csharp +var requestStream = new ChatCompletionRequest(_model, messages); +var assistantResponse = new StringBuilder(); +var isFirstChunk = true; + +using var streamCts = new CancellationTokenSource(); +await foreach (var chunk in _client.StreamAsync(requestStream, streamCts.Token)) +{ + if (chunk.TextDelta == null) continue; + + if (isFirstChunk) + { + reporter.StopSpinner(); + if (!options.Verbose) reporter.ClearStatus(); + else Console.Write("Assistant: "); + isFirstChunk = false; + } + + Console.Write(chunk.TextDelta); + assistantResponse.Append(chunk.TextDelta); +} +``` + +**Key Points**: +- `StreamAsync` yields `StreamChunk` objects (text deltas) +- First chunk stops spinner and clears status line +- Each delta written to Console immediately (real-time feel) +- Entire response accumulated in `assistantResponse` (though not used elsewhere) +- `CancellationTokenSource` passed but not canceled (Ctrl+C would cancel from outside) + +**Finally Block**: +```csharp +finally +{ + reporter.StopSpinner(); +} +``` +Ensures spinner stops even if streaming fails. + +**End**: +```csharp +Console.WriteLine(); // Newline after complete answer +``` + +## Error Handling + +`RunAsync` itself does not catch exceptions. All exceptions propagate to `Program.cs`: + +```csharp +try +{ + var openQuery = new OpenQueryApp(client, searchTool, model); + await openQuery.RunAsync(options); +} +catch (HttpRequestException ex) +{ + Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}"); + Environment.Exit(1); +} +catch (Exception ex) +{ + Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}"); + Environment.Exit(1); +} +``` + +**Common Exceptions**: +- `HttpRequestException` - network failures, API errors +- `JsonException` - malformed JSON from API +- `TaskCanceledException` - timeout or user interrupt +- `Exception` - anything else + +**No Retries at This Level**: Fail fast; user sees error immediately. 
Lower-level retries exist (embedding service). + +## Performance Characteristics + +**Query Generation**: +- One non-streaming LLM call +- Takes 2-5 seconds depending on model +- Typically <1000 tokens + +**Search Pipeline** (`SearchTool.ExecuteAsync`): +- See `SearchTool.md` for detailed timing breakdown +- Total 10-30 seconds typically + +**Final Answer Streaming**: +- Streaming LLM call +- Time depends on answer length (typically 5-20 seconds) +- User sees words appear progressively + +**Total End-to-End**: 15-50 seconds for typical query + +## Design Decisions + +### Why Not Stream Query Generation? + +Query generation currently uses `CompleteAsync` (non-streaming). Could be streamed but: +- Queries are short (JSON array) +- Streaming offers no UX benefit (user doesn't see intermediate queries) +- Simpler to wait for all queries before proceeding + +### Why Build Prompt Manually Instead of Templates? + +Simple string concatenation is fine for few prompts. Pros: +- No template dependencies +- Easy to read and modify +- No runtime compilation overhead + +Cons: +- No validation +- Could benefit from prompt engineering framework + +### Why Accumulate `assistantResponse` StringBuilder? + +Currently built but not used. Could be: +- Saved to file (future feature: `--output file.md`) +- Analyzed for token counting +- Removed if not needed + +### Could Query Generation Be Cached? + +Yes! For repeated questions (common in scripts), cache query results: +- `Dictionary<string, List<string>>` cache in memory +- Or persistent cache (Redis, file) +- Not implemented (low priority) + +### Single Responsibility Violation? + +`OpenQueryApp` does: +- Query generation +- Pipeline orchestration +- Answer streaming + +That's 3 responsibilities, but they're tightly coupled to the "query β†’ answer" workflow. Separating them would add complexity without clear benefit. Acceptable as "application coordinator". 
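+
+The fence-stripping-and-fallback parsing from step 2 is easy to exercise in isolation. A minimal sketch — `ParseQueriesOrFallback` is a hypothetical helper name, and reflection-based deserialization stands in for `AppJsonContext.Default.ListString`:
+
+```csharp
+using System.Text.Json;
+using System.Text.RegularExpressions;
+
+// Hypothetical helper mirroring the parse-with-fallback logic described above
+static List<string> ParseQueriesOrFallback(string? content, string originalQuestion)
+{
+    if (!string.IsNullOrEmpty(content))
+    {
+        // Strip markdown code fences the model may wrap around the JSON array
+        var cleaned = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();
+        try
+        {
+            var queries = JsonSerializer.Deserialize<List<string>>(cleaned);
+            if (queries is { Count: > 0 }) return queries;
+        }
+        catch (JsonException) { /* fall through to fallback */ }
+    }
+    // Fallback: single query = the original question
+    return new List<string> { originalQuestion };
+}
+
+Console.WriteLine(string.Join(" | ",
+    ParseQueriesOrFallback("```json\n[\"a\", \"b\"]\n```", "q"))); // a | b
+Console.WriteLine(string.Join(" | ",
+    ParseQueriesOrFallback("not json", "q"))); // q
+```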
+ +## Extension Points + +### Adding New Model for Query Generation + +Currently uses same `_model` for queries and answer. To use different models: + +1. Add `queryGenerationModel` parameter to constructor +2. Use it for query gen: `new ChatCompletionRequest(queryGenerationModel, queryGenMessages)` +3. Keep `_model` for final answer + +Or make it configurable via environment variable: `OPENROUTER_QUERY_MODEL` + +### Post-Processing Answer + +Opportunity to add: +- Source citation formatting (footnotes, clickable links) +- Answer summarization +- Export to Markdown/JSON +- Text-to-speech + +Add after streaming loop, before final newline. + +### Progress UI Enhancement + +Current `StatusReporter` is basic. Could add: +- Progress bar with percentage +- ETA calculation +- Colors (ANSI) for different message types +- Logging to file +- Web dashboard + +Would require extending `StatusReporter` or replacing it. + +## Testing Considerations + +**Challenges**: +- `RunAsync` is cohesive (hard to unit test in isolation) +- Depends on many services (need mocks) +- Asynchronous and streaming + +**Recommended Approach**: +1. Extract interfaces: + - `ISearchTool` (wrapper around `SearchTool`) + - `IOpenRouterClient` (wrapper around `OpenRouterClient`) +2. Mock interfaces in tests +3. Test query generation parsing separately +4. Test progress callback counting +5. 
Test final answer prompt construction + +**Integration Tests**: +- End-to-end with real/mocked APIs +- Automated tests with test SearxNG/OpenRouter instances + +## Related Components + +- **[SearchTool](search-tool.md)** - pipeline executed by `OpenQueryApp` +- **[Program.cs](../Program.md)** - creates `OpenQueryApp` +- **[StatusReporter](../services/StatusReporter.md)** - progress UI used by `OpenQueryApp` + +--- + +## Next Steps + +- [SearchTool](search-tool.md) - See the pipeline in detail +- [Services](../services/overview.md) - Understand each service +- [CLI Reference](../../api/cli.md) - How users invoke this diff --git a/docs/components/overview.md b/docs/components/overview.md new file mode 100644 index 0000000..6ee0edb --- /dev/null +++ b/docs/components/overview.md @@ -0,0 +1,603 @@ +# Components Overview + +Detailed documentation for each major component in the OpenQuery system. + +## πŸ“‹ Table of Contents + +1. [Component Hierarchy](#component-hierarchy) +2. [Core Components](#core-components) +3. [Services](#services) +4. [Data Models](#data-models) +5. 
[Component Interactions](#component-interactions) + +## Component Hierarchy + +``` +OpenQuery/ +β”œβ”€β”€ Program.cs [Entry Point, CLI] +β”œβ”€β”€ OpenQuery.cs [OpenQueryApp - Orchestrator] +β”œβ”€β”€ Tools/ +β”‚ └── SearchTool.cs [Pipeline Orchestration] +β”œβ”€β”€ Services/ +β”‚ β”œβ”€β”€ OpenRouterClient.cs [LLM & Embedding API] +β”‚ β”œβ”€β”€ SearxngClient.cs [Search API] +β”‚ β”œβ”€β”€ EmbeddingService.cs [Embedding Generation + Math] +β”‚ β”œβ”€β”€ ChunkingService.cs [Text Splitting] +β”‚ β”œβ”€β”€ ArticleService.cs [Content Extraction] +β”‚ β”œβ”€β”€ RateLimiter.cs [Concurrency Control] +β”‚ └── StatusReporter.cs [Progress Display] +β”œβ”€β”€ Models/ +β”‚ β”œβ”€β”€ OpenQueryOptions.cs [CLI Options Record] +β”‚ β”œβ”€β”€ Chunk.cs [Content + Metadata] +β”‚ β”œβ”€β”€ ParallelOptions.cs [Concurrency Settings] +β”‚ β”œβ”€β”€ OpenRouter.cs [API DTOs] +β”‚ β”œβ”€β”€ Searxng.cs [Search Result DTOs] +β”‚ └── JsonContexts.cs [JSON Context] +└── ConfigManager.cs [Configuration Persistence] +``` + +## Core Components + +### 1. Program.cs + +**Type**: Console Application Entry Point +**Responsibilities**: CLI parsing, dependency wiring, error handling + +**Key Elements**: +- `RootCommand` from System.CommandLine +- Options: `--chunks`, `--results`, `--queries`, `--short`, `--long`, `--verbose` +- Subcommand: `configure` (with interactive mode) +- Configuration loading via `ConfigManager.Load()` +- Environment variable resolution +- Service instantiation and coordination +- Top-level try-catch for error reporting + +**Code Flow**: +1. Load config file +2. Define CLI options and commands +3. Set handler for root command +4. Handler: resolve API key/model β†’ instantiate services β†’ call `OpenQueryApp.RunAsync()` +5. Set handler for configure command (writes config file) +6. Invoke command parser: `await rootCommand.InvokeAsync(args)` + +**Exit Codes**: +- 0 = success +- 1 = error + +### 2. 
OpenQueryApp (OpenQuery.cs) + +**Type**: Main Application Class +**Responsibilities**: Workflow orchestration, query generation, answer streaming + +**Constructor Parameters**: +- `OpenRouterClient client` - for query gen and final answer +- `SearchTool searchTool` - for search-retrieve-rank pipeline +- `string model` - LLM model identifier + +**Main Method**: `RunAsync(OpenQueryOptions options)` + +**Workflow Steps**: +1. Create `StatusReporter` (for progress UI) +2. **Optional Query Generation** (if `options.Queries > 1`): + - Create system message instructing JSON array output + - Create user message with `options.Question` + - Call `client.CompleteAsync()` with query gen model + - Parse JSON response; fall back to original question on failure + - Result: `List<string> queries` (1 or many) +3. **Execute Search Pipeline**: + - Call `_searchTool.ExecuteAsync()` with queries, options + - Receive `string context` (formatted context with source citations) + - Progress reported via callback to `StatusReporter` +4. **Generate Final Answer**: + - Build system prompt (append "short" or "long" modifier) + - Create user message with `Context:\n{context}\n\nQuestion: {options.Question}` + - Stream answer via `client.StreamAsync()` + - Write each `chunk.TextDelta` to Console as it arrives + - Stop spinner on first chunk, continue streaming +5. Dispose reporter + +**Error Handling**: +- Exceptions propagate to `Program.cs` top-level handler +- `HttpRequestException` vs generic `Exception` + +**Note**: Query generation uses the same model as final answer; could be separated for cost/performance. + +### 3. 
SearchTool (Tools/SearchTool.cs) + +**Type**: Pipeline Orchestrator +**Responsibilities**: Execute 4-phase search-retrieve-rank-return workflow + +**Constructor Parameters**: +- `SearxngClient searxngClient` +- `EmbeddingService embeddingService` + +**Main Method**: `ExecuteAsync(originalQuery, generatedQueries, maxResults, topChunksLimit, onProgress, verbose)` + +**Returns**: `Task<string>` - formatted context string with source citations + +**Pipeline Phases**: + +#### Phase 1: ExecuteParallelSearchesAsync +- Parallelize `searxngClient.SearchAsync(query, maxResults)` for each query +- Collect all results in `ConcurrentBag<SearxngResult>` +- Deduplicate by `DistinctBy(r => r.Url)` + +**Output**: `List<SearxngResult>` (aggregated, unique) + +#### Phase 2: ExecuteParallelArticleFetchingAsync +- Semaphore: `MaxConcurrentArticleFetches` (default 10) +- For each `SearxngResult`: fetch URL via `ArticleService.FetchArticleAsync()` +- Extract article text, title +- Chunk via `ChunkingService.ChunkText(article.TextContent)` +- Add each chunk as new `Chunk(content, url, title)` + +**Output**: `List<Chunk>` (potentially 50-100 chunks) + +#### Phase 3: ExecuteParallelEmbeddingsAsync +- Start two parallel tasks: + 1. Query embedding: `embeddingService.GetEmbeddingAsync(originalQuery)` + 2. Chunk embeddings: `embeddingService.GetEmbeddingsWithRateLimitAsync(chunkTexts, onProgress)` +- `Parallel.ForEachAsync` with `MaxConcurrentEmbeddingRequests` (default 4) +- Batch size: 300 chunks per embedding API call +- Filter chunks with empty embeddings (failed batches) + +**Output**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` + +#### Phase 4: RankAndSelectTopChunks +- Calculate cosine similarity for each chunk vs query +- Assign `chunk.Score` +- Order by descending score +- Take `topChunksLimit` (from `--chunks` option) +- Return `List<Chunk>` (top N) + +**Formatting**: +```csharp +string context = string.Join("\n\n", topChunks.Select((c, i) => + $"[Source {i+1}: {c.Title ?? 
"Unknown"}]({c.SourceUrl})\n{c.Content}")); +``` + +**Progress Callbacks**: Invoked at each major step for UI feedback + +## Services + +### OpenRouterClient + +**Purpose**: HTTP client for OpenRouter API (chat completions + embeddings) + +**Base URL**: `https://openrouter.ai/api/v1` + +**Authentication**: `Authorization: Bearer {apiKey}` + +**Methods**: + +#### `StreamAsync(ChatCompletionRequest request, CancellationToken)` +- Sets `request.Stream = true` +- POST to `/chat/completions` +- Reads SSE stream line-by-line +- Parses `data: {json}` chunks +- Yields `StreamChunk` (text delta or tool call) +- Supports cancellation + +#### `CompleteAsync(ChatCompletionRequest request)` +- Sets `request.Stream = false` +- POST to `/chat/completions` +- Deserializes full response +- Returns `ChatCompletionResponse` + +#### `EmbedAsync(string model, List<string> inputs)` +- POST to `/embeddings` +- Returns `float[][]` (ordered by input index) + +**Error Handling**: `EnsureSuccessStatusCode()` throws `HttpRequestException` on failure + +**Design**: Thin wrapper; no retry logic (delegated to EmbeddingService) + +### SearxngClient + +**Purpose**: HTTP client for SearxNG metasearch + +**Base URL**: Configurable (default `http://localhost:8002`) + +**Methods**: + +#### `SearchAsync(string query, int limit = 10)` +- GET `{baseUrl}/search?q={query}&format=json` +- Deserializes to `SearxngRoot` +- Returns `Results.Take(limit).ToList()` +- On failure: returns empty `List<SearxngResult>` (no exception) + +**Design**: Very simple; failures are tolerated (OpenQuery continues with other queries) + +### EmbeddingService + +**Purpose**: Batch embedding generation with rate limiting, parallelization, and retries + +**Configuration** (from `ParallelProcessingOptions`): +- `MaxConcurrentEmbeddingRequests` = 4 +- `EmbeddingBatchSize` = 300 + +**Default Embedding Model**: `openai/text-embedding-3-small` + +**Methods**: + +#### `GetEmbeddingsAsync(List<string> texts, Action<string>? 
onProgress, CancellationToken)` +- Splits `texts` into batches of `EmbeddingBatchSize` +- Parallelizes batches with `Parallel.ForEachAsync` + `MaxConcurrentEmbeddingRequests` +- Each batch: rate-limited + retry-wrapped `client.EmbedAsync(model, batch)` +- Collects results in order (by batch index) +- Returns `float[][]` (same order as input texts) +- Failed batches return empty `float[]` for each text + +#### `GetEmbeddingAsync(string text, CancellationToken)` +- Wraps single-text call in rate limiter + retry +- Returns `float[]` + +#### `CosineSimilarity(float[] v1, float[] v2)` +- Static method using `TensorPrimitives.CosineSimilarity` +- Returns float between -1 and 1 (typically 0-1 for normalized embeddings) + +**Retry Policy** (Polly): +- Max 3 attempts +- 1s base delay, exponential backoff +- Only `HttpRequestException` + +**Rate Limiting**: `RateLimiter` semaphore with `MaxConcurrentEmbeddingRequests` + +**Design Notes**: +- Two similar methods (`GetEmbeddingsAsync` and `GetEmbeddingsWithRateLimitAsync`) - could be consolidated +- Uses Polly for resilience (good pattern) +- Concurrency control prevents overwhelming OpenRouter + +### ChunkingService + +**Purpose**: Split long text into manageable pieces + +**Static Class** (no dependencies, pure function) + +**Algorithm** (in `ChunkText(string text)`): +- Constant `MAX_CHUNK_SIZE = 500` +- While remaining text: + - Take up to 500 chars + - If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']` + - Trim and add non-empty chunk + - Advance start position + +**Rationale**: 500 chars is a sweet spot for embeddings - long enough for context, short enough for semantic coherence. + +**Edge Cases**: Handles text shorter than 500 chars, empty text, text with no natural breaks. 
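+
+The algorithm above can be sketched as a runnable function. The constant and break characters come from the docs; the method shape and backtracking details are an assumption, not the verbatim implementation:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Sketch of ChunkText: cap chunks at 500 chars, backtracking to the last
+// natural break so chunks end on a word or sentence boundary
+static List<string> ChunkText(string text)
+{
+    const int MAX_CHUNK_SIZE = 500;
+    char[] breakChars = { ' ', '\n', '\r', '.', '!' };
+    var chunks = new List<string>();
+    int start = 0;
+    while (start < text.Length)
+    {
+        int length = Math.Min(MAX_CHUNK_SIZE, text.Length - start);
+        if (start + length < text.Length)
+        {
+            // Backtrack within the window to the last natural break
+            int lastBreak = text.LastIndexOfAny(breakChars, start + length - 1, length);
+            if (lastBreak > start) length = lastBreak - start + 1;
+        }
+        var chunk = text.Substring(start, length).Trim();
+        if (chunk.Length > 0) chunks.Add(chunk);
+        start += length;
+    }
+    return chunks;
+}
+
+var sample = string.Join(" ", Enumerable.Repeat("The quick brown fox jumps.", 60));
+var result = ChunkText(sample);
+Console.WriteLine($"{result.Count} chunks, longest {result.Max(c => c.Length)} chars");
+```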
+ +### ArticleService + +**Purpose**: Extract clean article content from URLs + +**Method**: `FetchArticleAsync(string url)` + +**Implementation**: Delegates to `SmartReader.ParseArticleAsync(url)` + +**Returns**: `Article` object (from SmartReader) +- `Title` (string) +- `TextContent` (string) - cleaned article body +- `IsReadable` (bool) - quality indicator +- Other metadata (author, date, etc.) + +**Error Handling**: Exceptions propagate (handled by `SearchTool`) + +**Design**: Thin wrapper around third-party library. Could be extended to add caching, custom extraction rules, etc. + +### RateLimiter + +**Purpose**: Limit concurrent operations via semaphore + +**Interface**: +```csharp +public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken); +public async Task ExecuteAsync(Func<Task> action, CancellationToken); +``` + +**Implementation**: `SemaphoreSlim` with `WaitAsync` and `Release` + +**Disposal**: `IAsyncDisposable` (awaits semaphore disposal) + +**Usage**: Wrap API calls that need concurrency control +```csharp +var result = await _rateLimiter.ExecuteAsync(async () => + await _client.EmbedAsync(model, batch), cancellationToken); +``` + +**Design**: Simple, reusable. Could be replaced with `Polly.RateLimiting` policy but this is lightweight. 
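+
+A self-contained sketch of the semaphore pattern above — the `ExecuteAsync` interface comes from the docs, the internals are an assumption. The demo verifies the cap holds under load:
+
+```csharp
+using System.Linq;
+
+int current = 0, peak = 0;
+await using var limiter = new RateLimiter(2);
+
+// Launch 8 tasks through a limiter capped at 2; track peak concurrency
+var tasks = Enumerable.Range(0, 8).Select(_ => limiter.ExecuteAsync(async () =>
+{
+    int now = Interlocked.Increment(ref current);
+    InterlockedMax(ref peak, now);
+    await Task.Delay(50);
+    Interlocked.Decrement(ref current);
+    return true;
+})).ToArray();
+await Task.WhenAll(tasks);
+Console.WriteLine($"Peak concurrency: {peak}"); // never exceeds 2
+
+// Lock-free max update for the peak counter
+static void InterlockedMax(ref int target, int value)
+{
+    int old;
+    do { old = target; if (old >= value) return; }
+    while (Interlocked.CompareExchange(ref target, value, old) != old);
+}
+
+public sealed class RateLimiter : IAsyncDisposable
+{
+    private readonly SemaphoreSlim _semaphore;
+    public RateLimiter(int maxConcurrency) => _semaphore = new SemaphoreSlim(maxConcurrency);
+
+    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct = default)
+    {
+        await _semaphore.WaitAsync(ct);
+        try { return await action(); }
+        finally { _semaphore.Release(); }
+    }
+
+    public ValueTask DisposeAsync()
+    {
+        _semaphore.Dispose();
+        return ValueTask.CompletedTask;
+    }
+}
+```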
+ +### StatusReporter + +**Purpose**: Real-time progress UI with spinner and verbose modes + +**Architecture**: +- Producer: UpdateStatus(text) β†’ writes to `Channel<string>` +- Consumer: Background task `ProcessStatusUpdatesAsync()` reads from channel +- Spinner: Separate task animates Braille characters every 100ms + +**Modes**: + +**Verbose Mode** (`_verbose = true`): +- All progress messages written as `Console.WriteLine()` +- No spinner +- Full audit trail + +**Compact Mode** (default): +- Status line with spinner (overwrites same line) +- Only latest status visible +- Example: `β ‹ Fetching articles 3/10...` + +**Key Methods**: +- `UpdateStatus(message)` - fire-and-forget, non-blocking +- `WriteLine(text)` - stops spinner temporarily, writes full line +- `StartSpinner()` / `StopSpinner()` - manual control +- `ClearStatus()` - ANSI escape `\r\x1b[K` to clear line +- `Dispose()` - completes channel, waits for background tasks + +**Spinner Chars**: `['β ‹', 'β ™', 'β Ή', 'β Έ', 'β Ό', 'β ΄', 'β ¦', 'β §', 'β ‡', '⠏']` (Braille patterns, smooth animation) + +**ANSI Codes**: `\r` (carriage return), `\x1b[K` (erase to end of line) + +**Thread Safety**: Channel is thread-safe; multiple components can write concurrently without locks + +**Design**: Well-encapsulated; could be reused in other CLI projects. 
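+
+The producer/consumer core of this pattern can be sketched in a few lines: status updates flow through an unbounded channel to a single background consumer (which, in `StatusReporter`, repaints the status line). The message strings are illustrative:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Threading.Channels;
+using System.Threading.Tasks;
+
+var channel = Channel.CreateUnbounded<string>();
+var seen = new List<string>();
+
+// Consumer: a single background task drains the channel in order
+var consumer = Task.Run(async () =>
+{
+    await foreach (var status in channel.Reader.ReadAllAsync())
+        seen.Add(status);
+});
+
+// Producers: fire-and-forget, non-blocking writes from any thread
+channel.Writer.TryWrite("Searching...");
+channel.Writer.TryWrite("Fetching articles 3/10...");
+channel.Writer.TryWrite("Asking AI...");
+
+channel.Writer.Complete();   // Dispose() in StatusReporter does the equivalent
+await consumer;              // drain remaining messages before exit
+Console.WriteLine(string.Join(" -> ", seen));
+```
+
+Because the channel is FIFO and there is one consumer, messages arrive in write order, which is what lets the compact mode always show the latest status.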
+ +### ConfigManager + +**Purpose**: Load/save configuration from XDG-compliant location + +**Config Path**: +- `Environment.SpecialFolder.UserProfile` β†’ `~/.config/openquery/config` + +**Schema** (`AppConfig`): +```csharp +public class AppConfig +{ + public string ApiKey { get; set; } = ""; + public string Model { get; set; } = "qwen/qwen3.5-flash-02-23"; + public int DefaultQueries { get; set; } = 3; + public int DefaultChunks { get; set; } = 3; + public int DefaultResults { get; set; } = 5; +} +``` + +**Format**: Simple `key=value` (no INI parser, manual line split) + +**Methods**: +- `Load()` β†’ reads file if exists, returns `AppConfig` (with defaults) +- `Save(AppConfig)` β†’ writes all 5 keys, overwrites existing + +**Design**: +- Static class (no instances) +- Creates directory if missing +- No validation (writes whatever values given) +- Could be improved with JSON format (but keep simple) + +## Data Models + +### OpenQueryOptions + +**Location**: `Models/OpenQueryOptions.cs` + +**Type**: `record` + +**Purpose**: Immutable options object passed through workflow + +**Properties**: +- `int Chunks` - top N chunks for context +- `int Results` - search results per query +- `int Queries` - number of expanded queries to generate +- `bool Short` - concise answer flag +- `bool Long` - detailed answer flag +- `bool Verbose` - verbose logging flag +- `string Question` - original user question + +**Created**: In `Program.cs` from CLI options + config defaults + +**Used By**: `OpenQueryApp.RunAsync()` + +### Chunk + +**Location**: `Models/Chunk.cs` + +**Type**: `record` + +**Purpose**: Content chunk with metadata and embedding + +**Properties**: +- `string Content` - extracted text (~500 chars) +- `string SourceUrl` - article URL +- `string? Title` - article title (nullable) +- `float[]? Embedding` - vector embedding (populated by EmbeddingService) +- `float Score` - relevance score (populated during ranking) + +**Lifecycle**: +1. 
Instantiated in `SearchTool.ExecuteParallelArticleFetchingAsync` with content, url, title +2. `Embedding` set in `ExecuteParallelEmbeddingsAsync` after batch processing +3. `Score` set in `RankAndSelectTopChunks` after cosine similarity +4. Serialized into context string for final answer + +**Equality**: Records provide value equality (based on all properties) + +### ParallelProcessingOptions + +**Location**: `Models/ParallelOptions.cs` + +**Type**: `class` (mutable) + +**Purpose**: Concurrency settings for parallel operations + +**Properties** (with defaults): +- `MaxConcurrentArticleFetches` = 10 +- `MaxConcurrentEmbeddingRequests` = 4 +- `EmbeddingBatchSize` = 300 + +**Used By**: `EmbeddingService` (for embeddings), `SearchTool` (for article fetching) + +**Currently**: Hardcoded in `SearchTool` constructor; could be made configurable + +### OpenRouter Models (Models/OpenRouter.cs) + +**Purpose**: DTOs for OpenRouter API (JSON serializable) + +**Chat Completion**: +- `ChatCompletionRequest` (model, messages, tools, stream) +- `ChatCompletionResponse` (choices[], usage[]) +- `Message` (role, content, tool_calls, tool_call_id) +- `ToolDefinition`, `ToolFunction`, `ToolCall`, `FunctionCall` +- `Choice`, `Usage` + +**Embedding**: +- `EmbeddingRequest` (model, input[]) +- `EmbeddingResponse` (data[], usage) +- `EmbeddingData` (embedding[], index) + +**Streaming**: +- `StreamChunk` (TextDelta, Tool) +- `ChatCompletionChunk`, `ChunkChoice`, `ChunkDelta` + +**JSON Properties**: Uses `[JsonPropertyName]` to match API + +**Serialization**: System.Text.Json with source generation (AppJsonContext) + +### Searxng Models (Models/Searxng.cs) + +**Purpose**: DTOs for SearxNG search results + +**Records**: +- `SearxngRoot` with `List<SearxngResult> Results` +- `SearxngResult` with `Title`, `Url`, `Content` (snippet) + +**Usage**: Deserialized from SearxNG's JSON response + +### JsonContexts + +**Location**: `Models/JsonContexts.cs` + +**Purpose**: Source-generated JSON serializer 
context for AOT compatibility + +**Pattern**: +```csharp +[JsonSerializable(typeof(ChatCompletionRequest))] +[JsonSerializable(typeof(ChatCompletionResponse))] +... etc ... +internal partial class AppJsonContext : JsonSerializerContext +{ +} +``` + +**Generated**: Partial class compiled by source generator + +**Used By**: All `JsonSerializer.Serialize/Deserialize` calls with `AppJsonContext.Default.{Type}` + +**Benefits**: +- AOT-compatible (no reflection) +- Faster serialization (compiled delegates) +- Smaller binary (trimming-safe) + +## Component Interactions + +### Dependencies Graph + +``` +Program.cs +β”œβ”€β”€ ConfigManager (load/save) +β”œβ”€β”€ OpenRouterClient ──┐ +β”œβ”€β”€ SearxngClient ────── +β”œβ”€β”€ EmbeddingService ─── +└── SearchTool ───────── + β”‚ +OpenQueryApp β—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β”œβ”€β”€ OpenRouterClient (query gen + answer streaming) + β”œβ”€β”€ SearchTool (pipeline) + β”‚ β”œβ”€β”€ SearxngClient (searches) + β”‚ β”œβ”€β”€ ArticleService (fetch) + β”‚ β”œβ”€β”€ ChunkingService (split) + β”‚ β”œβ”€β”€ EmbeddingService (embeddings) + β”‚ β”œβ”€β”€ RateLimiter (concurrency) + β”‚ └── StatusReporter (progress via callback) + └── StatusReporter (UI) +``` + +### Data Flow Between Components + +``` +OpenQueryOptions + ↓ +OpenQueryApp + β”œβ”€ Query Generation + β”‚ └─ OpenRouterClient.CompleteAsync() + β”‚ β†’ List<string> generatedQueries + β”‚ + β”œβ”€ Search Pipeline + β”‚ └─ SearchTool.ExecuteAsync(originalQuery, generatedQueries, ...) 
+ β”‚ ↓ + β”‚ Phase 1: SearxngClient.SearchAsync(query) Γ— N + β”‚ β†’ ConcurrentBag<SearxngResult> + β”‚ β†’ List<SearxngResult> (unique) + β”‚ ↓ + β”‚ Phase 2: ArticleService.FetchArticleAsync(url) Γ— M + β”‚ β†’ ChunkingService.ChunkText(article.TextContent) + β”‚ β†’ ConcurrentBag<Chunk> (content, url, title) + β”‚ ↓ + β”‚ Phase 3: EmbeddingService.GetEmbeddingsAsync(chunkContents) + β”‚ β†’ (queryEmbedding, chunkEmbeddings) + β”‚ ↓ + β”‚ Phase 4: CosineSimilarity + Rank + β”‚ β†’ List<Chunk> topChunks (with Score, Embedding set) + β”‚ ↓ + β”‚ Format: context string with [Source N: Title](Url) + β”‚ β†’ return context string + β”‚ + └─ Final Answer + └─ OpenRouterClient.StreamAsync(prompt with context) + β†’ stream deltas to Console +``` + +### Interface Contracts + +**SearchTool β†’ Progress**: +```csharp +// Invoked as: onProgress?.Invoke("[Fetching article 1/10: example.com]") +Action<string>? onProgress +``` + +**StatusReporter ← Progress**: +```csharp +// Handler in OpenQueryApp: +(progress) => { + if (options.Verbose) reporter.WriteLine(progress); + else reporter.UpdateStatus(parsedShorterMessage); +} +``` + +**SearchTool β†’ ArticleService**: +```csharp +Article article = await ArticleService.FetchArticleAsync(url); +``` + +**SearchTool β†’ EmbeddingService**: +```csharp +(float[] queryEmbedding, float[][] chunkEmbeddings) = await ExecuteParallelEmbeddingsAsync(...); +// Also: embeddingService.GetEmbeddingAsync(text), GetEmbeddingsWithRateLimitAsync(...) 
+```
+
+**SearchTool β†’ ChunkingService**:
+```csharp
+List<string> chunks = ChunkingService.ChunkText(article.TextContent);
+```
+
+**SearchTool β†’ RateLimiter**:
+```csharp
+await _rateLimiter.ExecuteAsync(async () => await _client.EmbedAsync(...), ct);
+```
+
+---
+
+## Next Steps
+
+- [OpenQueryApp](openquery-app.md) - Main orchestrator details
+- [SearchTool](search-tool.md) - Pipeline implementation
+- [Services](services.md) - All service classes documented
+- [Models](models.md) - Complete data model reference
diff --git a/docs/components/search-tool.md b/docs/components/search-tool.md
new file mode 100644
index 0000000..3cd0157
--- /dev/null
+++ b/docs/components/search-tool.md
@@ -0,0 +1,555 @@
+# SearchTool Component
+
+Deep dive into `SearchTool` - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.
+
+## Overview
+
+`SearchTool` is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks by relevance, and returns formatted context for the final AI answer.
+
+## Location
+`Tools/SearchTool.cs`
+
+## Class Definition
+
+```csharp
+public class SearchTool
+{
+    private readonly SearxngClient _searxngClient;
+    private readonly EmbeddingService _embeddingService;
+    private readonly ParallelProcessingOptions _options;
+
+    public static string Name => "search";
+    public static string Description => "Search the web for information on a topic";
+
+    public SearchTool(
+        SearxngClient searxngClient,
+        EmbeddingService embeddingService);
+
+    public Task<string> ExecuteAsync(
+        string originalQuery,
+        List<string> generatedQueries,
+        int maxResults,
+        int topChunksLimit,
+        Action<string>? 
onProgress = null, + bool verbose = true); +} +``` + +**Dependencies**: +- `SearxngClient` - for web searches +- `EmbeddingService` - for vector generation +- `ParallelProcessingOptions` - concurrency settings (hardcoded new instance) + +**Static Properties**: +- `Name` - tool identifier (currently "search") +- `Description` - tool description + +## ExecuteAsync Method + +**Signature**: +```csharp +public async Task<string> ExecuteAsync( + string originalQuery, // User's original question + List<string> generatedQueries, // Expanded search queries + int maxResults, // Results per query + int topChunksLimit, // Top N chunks to return + Action<string>? onProgress, // Progress callback + bool verbose) // Verbose mode flag +``` + +**Returns**: `Task<string>` - formatted context with source citations + +**Contract**: +- Never returns `null` (returns "No search results found." on zero results) +- Progress callback may be invoked frequently (many phases) +- `verbose` passed to sub-components for their own logging + +## The 4-Phase Pipeline + +``` +ExecuteAsync() + β”‚ + β”œβ”€ Phase 1: ExecuteParallelSearchesAsync + β”‚ Input: generatedQueries Γ— maxResults + β”‚ Output: List<SearxngResult> (deduplicated) + β”‚ + β”œβ”€ Phase 2: ExecuteParallelArticleFetchingAsync + β”‚ Input: List<SearxngResult> + β”‚ Output: List<Chunk> (with content, url, title) + β”‚ + β”œβ”€ Phase 3: ExecuteParallelEmbeddingsAsync + β”‚ Input: originalQuery + List<Chunk> + β”‚ Output: (queryEmbedding, chunkEmbeddings) + β”‚ (also sets Chunk.Embedding for valid chunks) + β”‚ + β”œβ”€ Phase 4: RankAndSelectTopChunks + β”‚ Input: List<Chunk> + queryEmbedding + chunkEmbeddings + β”‚ Output: List<Chunk> topChunks (with Score set) + β”‚ + └─ Format Context β†’ return string +``` + +### Phase 1: ExecuteParallelSearchesAsync + +**Purpose**: Execute all search queries in parallel, collect and deduplicate results. 
+
+**Implementation**:
+```csharp
+var allResults = new ConcurrentBag<SearxngResult>();
+
+var searchTasks = generatedQueries.Select(async query =>
+{
+    onProgress?.Invoke($"[Searching web for '{query}'...]");
+    try
+    {
+        var results = await _searxngClient.SearchAsync(query, maxResults);
+        foreach (var result in results)
+        {
+            allResults.Add(result);
+        }
+    }
+    catch (Exception ex)
+    {
+        if (verbose)
+            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
+    }
+});
+
+await Task.WhenAll(searchTasks);
+
+var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
+return uniqueResults;
+```
+
+**Details**:
+- `ConcurrentBag<SearxngResult>` collects results thread-safely
+- `Task.WhenAll` - unbounded parallelism (degree of parallelism equals `generatedQueries.Count`)
+- Each task calls `_searxngClient.SearchAsync(query, maxResults)`
+- Errors caught and logged (verbose only); other queries continue
+- `DistinctBy(r => r.Url)` removes duplicates
+
+**Return**: `List<SearxngResult>` (unique URLs only)
+
+**Progress**: `[Searching web for '{query}'...]`
+
+**Potential Issues**:
+- Could overwhelm local SearxNG if `generatedQueries` is large (100+)
+- SearxNG itself may have its own rate limiting
+
+**Future Enhancement**:
+- Add semaphore to limit search concurrency
+- Add timeout per search task
+- Cache search results (same query across runs)
+
+### Phase 2: ExecuteParallelArticleFetchingAsync
+
+**Purpose**: Fetch each search result URL, extract article content, split into chunks.
+ +**Implementation**: +```csharp +var chunks = new ConcurrentBag<Chunk>(); +var completedFetches = 0; +var totalFetches = searchResults.Count; +var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10 + +var fetchTasks = searchResults.Select(async result => +{ + await semaphore.WaitAsync(); + try + { + var current = Interlocked.Increment(ref completedFetches); + var uri = new Uri(result.Url); + var domain = uri.Host; + onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]"); + + try + { + var article = await ArticleService.FetchArticleAsync(result.Url); + if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent)) + return; + + var textChunks = ChunkingService.ChunkText(article.TextContent); + foreach (var chunkText in textChunks) + { + chunks.Add(new Chunk(chunkText, result.Url, article.Title)); + } + } + catch (Exception ex) + { + if (verbose) + Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}"); + } + } + finally + { + semaphore.Release(); + } +}); + +await Task.WhenAll(fetchTasks); +return chunks.ToList(); +``` + +**Details**: +- `SemaphoreSlim` limits concurrency to `MaxConcurrentArticleFetches` (10) +- `Interlocked.Increment` for thread-safe progress counting +- Progress: `[Fetching article X/Y: domain]` (extracts host from URL) +- `ArticleService.FetchArticleAsync` uses SmartReader +- Article must be `IsReadable` and have `TextContent` +- `ChunkingService.ChunkText` splits into ~500-char pieces +- Each chunk becomes a `Chunk(content, url, article.Title)` +- Errors logged (verbose only); failed URLs yield no chunks + +**Return**: `List<Chunk>` (potentially many per article) + +**Chunk Count Estimate**: +- 15 articles Γ— average 3000 chars/article = 45,000 chars +- With 500-char chunks β‰ˆ 90 chunks +- With natural breaks β†’ maybe 70-80 chunks + +**Potential Issues**: +- Some sites block SmartReader (JS-heavy, paywalls) +- Slow article fetches may cause long tail latency +- 
Large articles create many chunks β†’ memory + embedding cost
+
+**Future Enhancements**:
+- Add per-URL timeout
+- Filter chunks by length threshold (skip tiny chunks)
+- Deduplicate chunks across articles (same content on different sites)
+- Cache article fetches by URL
+
+### Phase 3: ExecuteParallelEmbeddingsAsync
+
+**Purpose**: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.
+
+**Implementation**:
+```csharp
+onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");
+
+// Start query embedding (single) and chunk embeddings (batch) concurrently
+var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);
+
+var chunkTexts = chunks.Select(c => c.Content).ToList();
+var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
+    chunkTexts, onProgress);
+
+await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);
+
+var queryEmbedding = await queryEmbeddingTask;
+var chunkEmbeddings = await chunkEmbeddingsTask;
+
+// Filter out chunks with empty embeddings
+var validChunks = new List<Chunk>();
+var validEmbeddings = new List<float[]>();
+
+for (var i = 0; i < chunks.Count; i++)
+{
+    if (chunkEmbeddings[i].Length > 0)
+    {
+        validChunks.Add(chunks[i]);
+        validEmbeddings.Add(chunkEmbeddings[i]);
+    }
+}
+
+// Update chunks with embeddings
+for (var i = 0; i < validChunks.Count; i++)
+{
+    validChunks[i].Embedding = validEmbeddings[i];
+}
+
+return (queryEmbedding, validEmbeddings.ToArray());
+```
+
+**Details**:
+- **Query embedding**: Single request for the original question (one embedding)
+- **Chunk embeddings**: Batch processing of all chunk texts
+- Both run concurrently via `Task.WhenAll`
+- `_embeddingService.GetEmbeddingsWithRateLimitAsync` uses:
+  - Batch size: 300 (default)
+  - Max concurrent batches: 4 (default)
+  - Polly retry (3 attempts, exponential backoff)
+  - `RateLimiter` (semaphore) for API concurrency
+- Failed batches return empty `float[]` (length 0)
+- Filters out failed chunks (won't be ranked)
+- `validChunks[i].Embedding = validEmbeddings[i]` attaches embedding to chunk
+
+**Return**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` where:
+- `chunkEmbeddings` length = `validChunks.Count` (filtered)
+- Order matches `validChunks` order (since we filtered parallel arrays)
+
+**Progress**: Interleaved from embedding service's own progress callbacks (batch X/Y)
+
+**Potential Issues**:
+- `GetEmbeddingsWithRateLimitAsync` writes `results[batchIndex] = ...` from multiple parallel tasks; this is safe without locks because each task writes a distinct array index, so the writes never overlap
+- The filtering loop assumes `chunkEmbeddings` has the same count as `chunks`; `GetEmbeddingsWithRateLimitAsync` returns `results.SelectMany(r => r).ToArray()`, which preserves the input count (failed batches contribute empty arrays), so the assumption holds
+
+**Memory Consideration**:
+- `chunkTexts` list holds all chunk strings (may be large, but still in memory)
+- `chunkEmbeddings` holds all float arrays (~600KB for 100 chunks)
+- Total: modest (~few MB)
+
+**Future Enhancements**:
+- Stream embeddings? (No benefit; all are needed for ranking)
+- Cache embeddings by content hash (cross-run)
+- Support different embedding model per query
+
+### Phase 4: RankAndSelectTopChunks
+
+**Purpose**: Score chunks by semantic relevance to query, sort, and select top N.
+ +**Implementation**: +```csharp +var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList(); + +foreach (var chunk in chunksWithEmbeddings) +{ + chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!); +} + +var topChunks = chunksWithEmbeddings + .OrderByDescending(c => c.Score) + .Take(topChunksLimit) + .ToList(); + +return topChunks; +``` + +**Details**: +- Filters to chunks that have embeddings (successful phase 3) +- For each: `Score = CosineSimilarity(queryEmbedding, chunkEmbedding)` + - Uses `TensorPrimitives.CosineSimilarity` (SIMD-accelerated) + - Returns float typically 0-1 (higher = more relevant) +- `OrderByDescending` - highest scores first +- `Take(topChunksLimit)` - select top N (from `--chunks` option) +- Returns `List<Chunk>` (now with `Score` set) + +**Return**: Top N chunks ready for context formatting + +**Complexity**: +- O(n) for scoring (where n = valid chunks, typically 50-100) +- O(n log n) for sorting (fast for n=100) +- Negligible CPU time + +**Edge Cases**: +- If `topChunksLimit` > `chunksWithEmbeddings.Count`, returns all (no padding) +- If all embeddings failed, returns empty list +- Should handle `topChunksLimit == 0` (returns empty) + +### Context Formatting (After Phase 4) + +**Location**: In `ExecuteAsync`, after ranking: + +```csharp +var context = string.Join("\n\n", topChunks.Select((c, i) => + $"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}")); + +return context; +``` + +**Format**: +``` +[Source 1: Article Title](https://example.com/article) +Chunk content text... + +[Source 2: Another Title](https://example.com/another) +Chunk content text... + +[Source 3: Third Title](https://example.com/third) +Chunk content text... +``` + +**Features**: +- Each source numbered 1, 2, 3... 
(matches order of topChunks = descending relevance) +- Title or "Unknown" if null +- Title is markdown link to original URL +- Chunk content as plain text (may contain its own formatting) +- Double newline between sources + +**Rationale**: +- Markdown links allow copy-pasting to browsers +- Numbers allow LLM to cite `[Source 1]` in answer +- Original title helps user recognize source + +**Potential Issues**: +- LLM might misinterpret "Source 1" as literal citation required +- If chunks contain markdown, may conflict (no escaping) +- Some titles may have markdown special chars (unlikely but possible) + +**Alternative**: Could use XML-style tags or more robust citation format. + +## Error Handling & Edge Cases + +### Empty Results Handling + +At end of `ExecuteAsync`: +```csharp +if (searchResults.Count == 0) + return "No search results found."; + +if (chunks.Count == 0) + return "Found search results but could not extract readable content."; +``` + +These messages appear in final answer (LLM will respond to these contexts). + +### Partial Failures + +- Some search queries fail β†’ proceed with others +- Some articles fail to fetch β†’ continue +- Some embedding batches fail β†’ those chunks filtered out +- Ranking proceeds with whatever valid embeddings exist + +### Verbose vs Compact Progress + +`verbose` parameter affects what's passed to phases: +- **Article fetching**: errors only shown if `verbose` +- **Embeddings**: always shows batch progress via `onProgress` (from EmbeddingService) +- **Searches**: no error suppression (warning always logged to Console, not through callback) + +### Progress Callback Pattern + +`onProgress` is invoked at major milestones: +- Searching: `[Searching web for '{query}'...]` +- Article fetch: `[Fetching article X/Y: domain]` +- Embeddings: `[Generating embeddings: batch X/Y]` +- Final: `[Found top X most relevant chunks overall. Generating answer...]` + +Each phase may invoke many times (e.g., embedding batches). 
`StatusReporter` handles these appropriately.
+
+## Performance Characteristics
+
+### Time Estimate per Phase (for typical 3 queries, 5 results each, ~15 articles):
+
+| Phase | Time | Dominated By |
+|-------|------|--------------|
+| Searches | 3-8s | Network latency to SearxNG |
+| Article Fetching | 5-15s | Network + SmartReader CPU |
+| Embeddings | 2-4s | OpenRouter API latency (4 concurrent batches) |
+| Ranking | <0.1s | CPU (O(n log n) sort, n~100) |
+| **Total Pipeline** | **10-30s** | Articles + Searches |
+
+### Concurrency Limits Effect
+
+**Article Fetching** (`MaxConcurrentArticleFetches` = 10):
+- 15 articles β†’ 2 waves (10 then 5)
+- If each takes 2s β†’ ~4s total (vs 30s sequential)
+
+**Embedding Batching** (`MaxConcurrentEmbeddingRequests` = 4, `EmbeddingBatchSize` = 300):
+- A typical run of ~80 chunks fits in a single batch of 300, so batch-level parallelism goes unused
+- The concurrency limit (4) only matters when the chunk count exceeds the batch size and multiple batches are in flight
+
+### Memory Usage
+
+- `searchResults` (15 items) β†’ ~30KB
+- `chunks` (80 items Γ— 500 chars) β†’ ~40KB text + embeddings ~400KB (80 Γ— 1536 Γ— 4 bytes)
+- Total β‰ˆ 500KB excluding temporary HTTP buffers
+
+## Design Decisions
+
+### Why Use ConcurrentBag for Results/Chunks?
+
+A thread-safe collection lets parallel tasks add items without locks. `ConcurrentBag` does not guarantee enumeration order, so the order in which results reach `DistinctBy` (which keeps the first occurrence) is nondeterministic. This is acceptable here because ordering does not matter: relevance is decided later by semantic ranking. If order mattered, a `ConcurrentQueue` or an explicit sort by source would be needed.
+
+### Why Not Use Parallel.ForEach for Article Fetching?
+
+We use `Task.WhenAll` with `Select` + semaphore. `Parallel.ForEachAsync` would also work, but the chosen pattern matches the other phases.
Semaphore gives explicit concurrency control. + +### Why Separate Query Embedding from Chunk Embeddings? + +`GetEmbeddingAsync` is called directly (not batched) because there's only one query. Could be batched with chunks but: +- Query is small (single string) +- Batch API has overhead (request structure) +- Separate call allows independent completion (no need to wait for chunks to start query embedding) + +### Why Two Different Embedding Methods? + +`EmbeddingService` has: +- `GetEmbeddingsWithRateLimitAsync` (used in SearchTool) +- `GetEmbeddingsAsync` (similar but different implementation) + +Probably legacy/refactor artifact. Could consolidate. + +### Why Not Deduplicate URLs Earlier? + +Deduplication happens after search aggregation. Could also deduplicate within each search result (SearxNG might already dedupe across engines). But global dedupe is necessary. + +### Why Not Early Filtering (e.g., by domain, length)? + +Possibly could improve quality: +- Filter by domain reputation +- Filter articles too short (<200 chars) or too long (>50KB) +- Not implemented (keep simple) + +## Testing Considerations + +**Unit Testability**: `SearchTool` is fairly testable with mocks: +- Mock `SearxngClient` to return predetermined results +- Mock `ArticleService` via `EmbeddingService` (or mock that too) +- Verify progress callback invocations +- Verify final context format + +**Integration Testing**: +- End-to-end with real/mocked external services +- Need test SearxNG instance and test OpenRouter key (or mock responses) + +**Performance Testing**: +- Benchmark with different concurrency settings +- Profile memory for large result sets (1000+ articles) +- Measure embedding API latency impact + +## Known Issues + +### Bug in ExecuteParallelEmbeddingsAsync? 
+
+The actual source of `ExecuteParallelEmbeddingsAsync` embeds the chunk *content*:
+
+```csharp
+var chunkTexts = chunks.Select(c => c.Content).ToList();
+var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
+    chunkTexts, onProgress);
+```
+
+This is correct: the embedding input is the chunk text, not a previously computed embedding. No bug here.
+
+### Potential Race Condition in GetEmbeddingsWithRateLimitAsync
+
+```csharp
+results[batchIndex] = batchResults;
+```
+
+This writes to an array index from multiple parallel tasks. The writes never overlap because each task owns a distinct index, so no synchronization is required.
+
+### Progress Callback May Overwhelm
+
+If invoked synchronously from many parallel tasks, callbacks could saturate the status channel. `Channel.TryWrite` returns `false` when the buffer is full, and the return value is ignored, so messages can be dropped under heavy load. This is acceptable for a CLI UI: some messages may be lost, but overall progress stays visible.
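+
+The drop semantics of `TryWrite` on a bounded channel can be demonstrated directly (a small stand-alone example, assuming the default `Wait` full mode):
+
+```csharp
+using System;
+using System.Threading.Channels;
+
+// With a bounded channel, TryWrite succeeds until the buffer is full, then returns false.
+var ch = Channel.CreateBounded<string>(new BoundedChannelOptions(2)
+{
+    FullMode = BoundedChannelFullMode.Wait // TryWrite returns false instead of blocking
+});
+
+Console.WriteLine(ch.Writer.TryWrite("a")); // True
+Console.WriteLine(ch.Writer.TryWrite("b")); // True
+Console.WriteLine(ch.Writer.TryWrite("c")); // False - buffer full, message dropped
+```
+
+Ignoring the `false` return is exactly the "drop under load" behavior described above.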
+ +## Related Components + +- **[OpenQueryApp](openquery-app.md)** - calls this +- **[SearxngClient](../../services/SearxngClient.md)** - phase 1 +- **[ArticleService](../../services/ArticleService.md)** - phase 2a +- **[ChunkingService](../../services/ChunkingService.md)** - phase 2b +- **[EmbeddingService](../../services/EmbeddingService.md)** - phase 3 +- **[Ranking](../../services/EmbeddingService.md#cosinesimilarity)** - cosine similarity + +--- + +## Next Steps + +- [Services Overview](../services/overview.md) - See supporting services +- [CLI Reference](../../api/cli.md) - How users trigger this pipeline +- [Performance](../performance.md) - Optimize pipeline settings diff --git a/docs/components/services.md b/docs/components/services.md new file mode 100644 index 0000000..c13d7d6 --- /dev/null +++ b/docs/components/services.md @@ -0,0 +1,471 @@ +# Services Overview + +Comprehensive reference for all service classes in OpenQuery. + +## πŸ“‹ Table of Contents + +1. [Service Catalog](#service-catalog) +2. [Client Services](#client-services) +3. [Processing Services](#processing-services) +4. [Infrastructure Services](#infrastructure-services) +5. [Service Interactions](#service-interactions) + +## Service Catalog + +OpenQuery's services are organized into three categories: + +| Category | Services | Purpose | +|-----------|----------|---------| +| **Clients** | `OpenRouterClient`, `SearxngClient` | External API communication | +| **Processors** | `EmbeddingService`, `ChunkingService`, `ArticleService` | Data transformation & extraction | +| **Infrastructure** | `RateLimiter`, `StatusReporter` | Cross-cutting concerns | + +All services are **stateless** (except for internal configuration) and can be safely reused across multiple operations. 
+ +--- + +## Client Services + +### OpenRouterClient + +**Location**: `Services/OpenRouterClient.cs` +**Purpose**: HTTP client for OpenRouter AI APIs (chat completions & embeddings) + +#### API Endpoints + +| Method | Endpoint | Purpose | +|--------|----------|---------| +| POST | `/chat/completions` | Chat completion (streaming or non-streaming) | +| POST | `/embeddings` | Embedding generation for text inputs | + +#### Authentication +``` +Authorization: Bearer {apiKey} +Accept: application/json +``` + +#### Public Methods + +##### `StreamAsync(ChatCompletionRequest request, CancellationToken cancellationToken)` +- **Returns**: `IAsyncEnumerable<StreamChunk>` +- **Behavior**: Sets `request.Stream = true`, posts, reads Server-Sent Events stream +- **Use Case**: Final answer streaming, real-time responses +- **Stream Format**: SSE lines `data: {json}`; yields `TextDelta` or `ToolCall` + +##### `CompleteAsync(ChatCompletionRequest request)` +- **Returns**: `Task<ChatCompletionResponse>` +- **Behavior**: Sets `request.Stream = false`, posts, returns full response +- **Use Case**: Query generation (non-streaming) + +##### `EmbedAsync(string model, List<string> inputs)` +- **Returns**: `Task<float[][]>` +- **Behavior**: POST `/embeddings`, returns array of vectors (ordered by input index) +- **Use Case**: Batch embedding generation + +##### `HttpClient` +- **Property**: Internal `_httpClient` (created per instance) +- **Note**: Could use `IHttpClientFactory` for pooling (not needed for CLI) + +#### Error Handling +- `EnsureSuccessStatusCode()` throws `HttpRequestException` on 4xx/5xx +- No retry logic (handled by `EmbeddingService`) + +#### Configuration +```csharp +public OpenRouterClient(string apiKey) +{ + _apiKey = apiKey; + _httpClient = new HttpClient(); + _httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey); + _httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json")); +} +``` 
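+
+The SSE framing that `StreamAsync` consumes can be illustrated with a self-contained parse of a captured stream body. The `delta` payload shape here is a deliberate simplification; the real chunk DTOs live in `Models/OpenRouter.cs`:
+
+```csharp
+using System;
+using System.Text;
+using System.Text.Json;
+
+// Simplified SSE body; real OpenRouter chunks are full ChatCompletionChunk JSON objects.
+var sseBody = "data: {\"delta\":\"Hello\"}\ndata: {\"delta\":\" world\"}\ndata: [DONE]\n";
+
+var answer = new StringBuilder();
+foreach (var line in sseBody.Split('\n'))
+{
+    if (!line.StartsWith("data: ")) continue;        // SSE payload lines only
+    var payload = line["data: ".Length..];
+    if (payload == "[DONE]") break;                  // OpenAI-style stream terminator
+    using var doc = JsonDocument.Parse(payload);
+    answer.Append(doc.RootElement.GetProperty("delta").GetString());
+}
+
+Console.WriteLine(answer); // Hello world
+```
+
+The real implementation does the same line-by-line scan over the HTTP response stream, yielding each delta as a `StreamChunk` instead of accumulating it.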
+
+#### Example Usage
+```csharp
+var client = new OpenRouterClient("sk-or-...");
+var request = new ChatCompletionRequest("model", new List<Message> { ... });
+await foreach (var chunk in client.StreamAsync(request))
+{
+    Console.Write(chunk.TextDelta);
+}
+```
+
+---
+
+### SearxngClient
+
+**Location**: `Services/SearxngClient.cs`
+**Purpose**: HTTP client for SearxNG metasearch engine
+
+#### API Endpoint
+```
+GET /search?q={query}&format=json
+```
+
+#### Constructor
+```csharp
+public SearxngClient(string baseUrl) // e.g., "http://localhost:8002"
+```
+- `baseUrl` trimmed of trailing `/`
+
+#### Public Methods
+
+##### `SearchAsync(string query, int limit = 10)`
+- **Returns**: `Task<List<SearxngResult>>`
+- **Behavior**: GET request, deserialize JSON, take up to `limit` results
+- **On Failure**: Returns empty `List<SearxngResult>` (no exception)
+
+#### Error Handling
+- The client does not call `EnsureSuccessStatusCode()`, so HTTP error statuses do not throw
+- If the JSON is invalid or `Results` is missing, returns an empty list
+- Failures are **tolerated** - individual search queries may fail without aborting the whole operation
+
+#### Example Searxng Response
+```json
+{
+  "results": [
+    {
+      "title": "Quantum Entanglement - Wikipedia",
+      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
+      "content": "Quantum entanglement is a physical phenomenon..."
+    },
+    ... 
+
+  ]
+}
+```
+
+---
+
+## Processing Services
+
+### EmbeddingService
+
+**Location**: `Services/EmbeddingService.cs`
+**Purpose**: Generate embeddings with batching, rate limiting, and retry logic
+
+#### Configuration
+
+**Embedding Model**: `openai/text-embedding-3-small` (default, configurable via constructor)
+
+**ParallelProcessingOptions** (hardcoded defaults):
+```csharp
+public class ParallelProcessingOptions
+{
+    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
+    public int EmbeddingBatchSize { get; set; } = 300;
+}
+```
+
+#### Public Methods
+
+##### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
+- **Returns**: `Task<float[][]>`
+- **Behavior**:
+  - Splits `texts` into batches of `EmbeddingBatchSize`
+  - Parallel executes batches (max `MaxConcurrentEmbeddingRequests` concurrent)
+  - Each batch: rate-limited, retry-wrapped `client.EmbedAsync(model, batch)`
+  - Reassembles in original order
+  - Failed batches β†’ empty `float[]` for each text
+- **Progress**: Invokes `onProgress` for each batch: `"[Generating embeddings: batch X/Y]"`
+- **Thread-Safe**: Uses lock for collecting results
+
+##### `GetEmbeddingAsync(string text, CancellationToken)`
+- **Returns**: `Task<float[]>`
+- **Behavior**: Single embedding with rate limiting and retry
+- **Use Case**: Query embedding
+
+##### `CosineSimilarity(float[] vector1, float[] vector2)`
+```csharp
+public static float CosineSimilarity(float[] vector1, float[] vector2)
+```
+Uses `System.Numerics.Tensors.TensorPrimitives.CosineSimilarity`
+
+Returns float between -1 and 1 (typically 0-1 for normalized embeddings)
+
+**Implementation**: Single line calling SIMD-accelerated tensor primitive
+
+---
+
+### ArticleService
+
+**Location**: `Services/ArticleService.cs`
+**Purpose**: Extract clean article content from web URLs
+
+#### Public Methods
+
+##### `FetchArticleAsync(string url)`
+- **Returns**: `Task<Article>`
+- **Behavior**: Delegates to `SmartReader.ParseArticleAsync(url)`
+- **Result**: `Article` with `Title`, 
`TextContent`, `IsReadable`, and metadata + +#### Errors +- Propagates exceptions (SmartReader may throw on network failures, malformed HTML) +- `SearchTool` catches and logs + +#### SmartReader Notes +- Open-source article extraction library (bundled via NuGet) +- Uses Readability algorithm (similar to Firefox Reader View) +- Removes ads, navigation, boilerplate +- `IsReadable` indicates quality (e.g., not a 404 page, not too short) + +--- + +### ChunkingService + +**Location**: `Services/ChunkingService.cs` +**Purpose**: Split text into 500-character chunks at natural boundaries + +#### Public Methods + +##### `ChunkText(string text)` +- **Returns**: `List<string>` +- **Algorithm**: + - Constant `MAX_CHUNK_SIZE = 500` + - While remaining text: + - Take up to 500 chars + - If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']` + - Trim, add if non-empty + - Advance start + - Returns all chunks + +#### Characteristics +- Static class (no instances) +- Pure function (no side effects) +- Zero dependencies +- Handles edge cases (empty text, short text, text without breaks) + +--- + +## Infrastructure Services + +### RateLimiter + +**Location**: `Services/RateLimiter.cs` +**Purpose**: Limit concurrent operations using semaphore + +#### Constructor +```csharp +public RateLimiter(int maxConcurrentRequests) +``` +Creates `SemaphoreSlim` with `maxConcurrentRequests` + +#### Public Methods + +##### `ExecuteAsync<T>(Func<Task<T>> action, CancellationToken)` +```csharp +public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken cancellationToken = default) +{ + await _semaphore.WaitAsync(cancellationToken); + try + { + return await action(); + } + finally + { + _semaphore.Release(); + } +} +``` +- Waits for semaphore slot +- Executes `action` (typically an API call) +- Releases semaphore (even if exception) +- Returns result from `action` + +##### `ExecuteAsync(Func<Task> action, CancellationToken)` +- Non-generic version (for void-returning actions) 
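+
+Both overloads fit in a small self-contained class. The following sketch mirrors the documented surface; it is illustrative, and details such as how the non-generic overload is implemented may differ from the actual source:
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Minimal stand-in mirroring the documented RateLimiter surface (illustrative, not the actual source).
+public sealed class MiniRateLimiter : IAsyncDisposable
+{
+    private readonly SemaphoreSlim _semaphore;
+
+    public MiniRateLimiter(int maxConcurrentRequests) =>
+        _semaphore = new SemaphoreSlim(maxConcurrentRequests);
+
+    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct = default)
+    {
+        await _semaphore.WaitAsync(ct);   // wait for a free slot
+        try { return await action(); }
+        finally { _semaphore.Release(); } // always release, even on exception
+    }
+
+    // The void overload can simply delegate to the generic one with a dummy result.
+    public Task ExecuteAsync(Func<Task> action, CancellationToken ct = default) =>
+        ExecuteAsync(async () => { await action(); return 0; }, ct);
+
+    public ValueTask DisposeAsync()
+    {
+        _semaphore.Dispose();
+        return ValueTask.CompletedTask;
+    }
+}
+```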
+ +#### Disposal +```csharp +public async ValueTask DisposeAsync() +{ + _semaphore.Dispose(); +} +``` +Implements `IAsyncDisposable` for async cleanup + +#### Usage Pattern +```csharp +var result = await _rateLimiter.ExecuteAsync(async () => +{ + return await SomeApiCall(); +}, cancellationToken); +``` + +#### Where Used +- `EmbeddingService`: Limits concurrent embedding batch requests (default 4) + +--- + +### StatusReporter + +**Location**: `Services/StatusReporter.cs` +**Purpose**: Real-time progress display with spinner (compact) or verbose lines + +#### Constructor +```csharp +public StatusReporter(bool verbose) +``` +- `verbose = true`: all progress via `WriteLine()` (no spinner) +- `verbose = false`: spinner with latest status + +#### Architecture + +**Components**: +- `Channel<string> _statusChannel` - producer-consumer queue +- `Task _statusProcessor` - background task reading from channel +- `CancellationTokenSource _spinnerCts` - spinner task cancellation +- `Task _spinnerTask` - spinner animation task +- `char[] _spinnerChars` - Braille spinner pattern + +**Spinner Animation**: +- Runs at 10 FPS (100ms interval) +- Cycles through `['β ‹','β ™','β Ή','β Έ','β Ό','β ΄','β ¦','β §','β ‡','⠏']` +- Displays: `β ‹ Fetching articles...` +- Updates in place using ANSI: `\r\x1b[K` (carriage return + erase line) + +#### Public Methods + +##### `UpdateStatus(string message)` +- Fire-and-forget: writes to channel via `TryWrite` (non-blocking) +- If channel full, message dropped (acceptable loss for UI) + +##### `WriteLine(string text)` +- Stops spinner temporarily +- Clears current status line +- Writes `text` with newline +- In verbose mode: just `Console.WriteLine(text)` + +##### `ClearStatus()` +- In compact mode: `Console.Write("\r\x1b[K")` (erase line) +- In verbose: no-op +- Sets `_currentMessage = null` + +##### `StartSpinner()` / `StopSpinner()` +- Manual control (usually `StartSpinner` constructor call, `StopSpinner` by `Dispose`) + +##### `Dispose()` +- 
Completes channel writer +- Awaits `_statusProcessor` completion +- Calls `StopSpinner()` + +#### Background Processing + +**Status Processor**: +```csharp +private async Task ProcessStatusUpdatesAsync() +{ + await foreach (var message in _statusChannel.Reader.ReadAllAsync()) + { + if (_verbose) + { + Console.WriteLine(message); + continue; + } + Console.Write("\r\x1b[K"); // Clear line + Console.Write($"{_spinnerChars[0]} {message}"); // Static spinner + _currentMessage = message; + } +} +``` + +**Spinner Task**: +```csharp +_spinnerTask = Task.Run(async () => +{ + while (_spinnerCts is { Token.IsCancellationRequested: false }) + { + if (_currentMessage != null) + { + Console.Write("\r\x1b[K"); + var charIndex = index++ % spinner.Length; + Console.Write($"{spinner[charIndex]} {_currentMessage}"); + } + await Task.Delay(100, _spinnerCts.Token); + } +}); +``` + +#### Thread Safety +- `UpdateStatus` (producer) writes to channel +- `ProcessStatusUpdatesAsync` (consumer) reads from channel +- `_spinnerTask` runs concurrently +- All UI writes happen in consumer/spinner task context (single-threaded UI) + +#### Design Notes +- Could be simplified: just use `Console.CursorLeft` for spinner, no channel +- Channel allows random `UpdateStatus` calls from any thread without blocking +- Braille spinner requires terminal that supports Unicode (most modern terminals do) + +--- + +## Service Interactions + +### Dependency Graph + +``` +OpenQueryApp +β”œβ”€β”€ OpenRouterClient ← (used for query gen + final answer) +└── SearchTool + β”œβ”€β”€ SearxngClient + β”œβ”€β”€ ArticleService (uses SmartReader) + β”œβ”€β”€ ChunkingService (static) + β”œβ”€β”€ EmbeddingService + β”‚ └── OpenRouterClient (different instance) + β”‚ └── RateLimiter + └── ParallelProcessingOptions (config) +``` + +### Service Lifetimes + +All services are **transient** (new instance per query execution): +- `OpenRouterClient` β†’ 1 instance for query gen + answer +- `SearxngClient` β†’ 1 instance for all searches 
+- `EmbeddingService` β†’ 1 instance with its own `OpenRouterClient` and `RateLimiter` +- `SearchTool` β†’ 1 instance per query (constructed in `Program.cs`) + +No singleton or static state (except static utility classes like `ChunkingService`). + +### Data Flow Through Services + +``` +OpenQueryApp + β”‚ + β”œβ”€ OpenRouterClient.CompleteAsync() β†’ query generation + β”‚ Messages β†’ JSON β†’ HTTP request β†’ response β†’ JSON β†’ Messages + β”‚ + └─ SearchTool.ExecuteAsync() + β”‚ + β”œβ”€ SearxngClient.SearchAsync() Γ— N + β”‚ query β†’ URL encode β†’ GET β†’ JSON β†’ SearxngResult[] + β”‚ + β”œβ”€ ArticleService.FetchArticleAsync() Γ— M + β”‚ URL β†’ HTTP GET β†’ SmartReader β†’ Article + β”‚ + β”œβ”€ ChunkingService.ChunkText() Γ— M + β”‚ Article.TextContent β†’ List<string> chunks + β”‚ + β”œβ”€ EmbeddingService.GetEmbeddingAsync(query) + GetEmbeddingsAsync(chunks[]) + β”‚ texts β†’ batches β†’ rate-limited HTTP POST β†’ JSON β†’ float[][] + β”‚ + β”œβ”€ CosineSimilarity(queryEmbedding, chunkEmbedding) Γ— M + β”‚ Vectors β†’ dot product β†’ magnitude β†’ score + β”‚ + └─ return context string (formatted chunks) +``` + +--- + +## Next Steps + +- **[OpenQueryApp](../components/openquery-app.md)** - Orchestrates services +- **[SearchTool](../components/search-tool.md)** - Coordinates pipeline +- **[Models](../components/models.md)** - Data structures passed between services +- **[API Reference](../../api/cli.md)** - CLI that uses these services + +--- + +**Service Design Principles**: +- Single Responsibility: Each service does one thing well +- Stateless: No instance state beyond constructor args +- Composable: Services depend on abstractions (other services) not implementations +- Testable: Can mock dependencies for unit testing diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000..dae589e --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,356 @@ +# Configuration + +Complete guide to configuring OpenQuery for your 
environment. + +## πŸ“‹ Table of Contents + +1. [Configuration Methods](#configuration-methods) +2. [Configuration File](#configuration-file) +3. [Environment Variables](#environment-variables) +4. [Command-Line Options](#command-line-options) +5. [Configuration Priority](#configuration-priority) +6. [Recommended Settings](#recommended-settings) +7. [Advanced Configuration](#advanced-configuration) + +## Configuration Methods + +OpenQuery can be configured through three methods, which merge together with clear priority: + +| Method | Persistence | Use Case | +|--------|-------------|----------| +| Configuration File | Permanent | Default values you use daily | +| Environment Variables | Session/Shell | CI/CD, scripting, temporary overrides | +| Command-Line Options | Per-execution | One-off customizations | + +## Configuration File + +### Location +OpenQuery follows the [XDG Base Directory](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html) specification: + +- **Linux/macOS**: `~/.config/openquery/config` +- **Windows**: `%APPDATA%\openquery\config` (e.g., `C:\Users\<user>\AppData\Roaming\openquery\config`) + +### Format +Simple `key=value` pairs, one per line: + +```ini +ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +Model=qwen/qwen3.5-flash-02-23 +DefaultQueries=3 +DefaultChunks=3 +DefaultResults=5 +``` + +### Schema + +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `ApiKey` | string | "" | OpenRouter API authentication key | +| `Model` | string | `qwen/qwen3.5-flash-02-23` | Default LLM model to use | +| `DefaultQueries` | int | 3 | Number of search queries to generate | +| `DefaultChunks` | int | 3 | Number of top context chunks to include | +| `DefaultResults` | int | 5 | Number of search results per query | + +### Example Configurations + +**Minimal** (just API key): +```ini +ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +``` + +**Optimized for Research**: +```ini 
+ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +Model=google/gemini-3-flash-preview +DefaultQueries=5 +DefaultChunks=4 +DefaultResults=10 +``` + +**Cost-Conscious**: +```ini +ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +Model=qwen/qwen3.5-flash-02-23 +DefaultQueries=2 +DefaultChunks=2 +DefaultResults=3 +``` + +## Environment Variables + +Environment variables override the configuration file and can be set temporarily or permanently in your shell profile. + +### Available Variables + +| Variable | Purpose | Required | Example | +|----------|---------|----------|---------| +| `OPENROUTER_API_KEY` | OpenRouter API key | **Yes** (unless in config file) | `export OPENROUTER_API_KEY="sk-or-..."` | +| `OPENROUTER_MODEL` | Override default LLM model | No | `export OPENROUTER_MODEL="deepseek/deepseek-v3.2"` | +| `SEARXNG_URL` | URL of SearxNG instance | No (default: `http://localhost:8002`) | `export SEARXNG_URL="https://searx.example.com"` | + +### Setting Environment Variables + +#### Temporary (Current Session) +```bash +# Linux/macOS +export OPENROUTER_API_KEY="sk-or-..." +export SEARXNG_URL="http://localhost:8002" + +# Windows PowerShell +$env:OPENROUTER_API_KEY="sk-or-..." +$env:SEARXNG_URL="http://localhost:8002" +``` + +#### Permanent (Shell Profile) + +**bash** (`~/.bashrc` or `~/.bash_profile`): +```bash +export OPENROUTER_API_KEY="sk-or-..." +export SEARXNG_URL="http://localhost:8002" +``` + +**zsh** (`~/.zshrc`): +```zsh +export OPENROUTER_API_KEY="sk-or-..." +export SEARXNG_URL="http://localhost:8002" +``` + +**fish** (`~/.config/fish/config.fish`): +```fish +set -x OPENROUTER_API_KEY "sk-or-..." 
+set -x SEARXNG_URL "http://localhost:8002" +``` + +**Windows** (PowerShell profile): +```powershell +[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User") +[Environment]::SetEnvironmentVariable("SEARXNG_URL", "http://localhost:8002", "User") +``` + +After editing profile files, restart your terminal or run `source ~/.bashrc` (or equivalent). + +### Security Note +Never commit your API key to version control. Use environment variables or config file that's in `.gitignore`. The default `.gitignore` already excludes common build directories but doesn't include the config file since it's outside the project directory (`~/.config/`). + +## Command-Line Options + +Options passed directly to the `openquery` command override both config file and environment variables for that specific execution. + +### Main Command Options + +```bash +openquery [OPTIONS] <question> +``` + +| Option | Aliases | Type | Default Source | Description | +|--------|---------|------|----------------|-------------| +| `--chunks` | `-c` | int | Config `DefaultChunks` | Number of top context chunks | +| `--results` | `-r` | int | Config `DefaultResults` | Search results per query | +| ``--queries` | `-q` | int | Config `DefaultQueries` | Number of search queries | +| `--short` | `-s` | bool | false | Request concise answer | +| `--long` | `-l` | bool | false | Request detailed answer | +| `--verbose` | `-v` | bool | false | Show detailed progress | + +### Configure Command Options + +```bash +openquery configure [OPTIONS] +``` + +| Option | Type | Description | +|--------|------|-------------| +| `--interactive` / `-i` | bool | Launch interactive configuration wizard | +| `--key` | string | Set API key | +| `--model` | string | Set default model | +| `--queries` | int? | Set default queries | +| `--chunks` | int? | Set default chunks | +| `--results` | int? 
| Set default results | + +## Configuration Priority + +When OpenQuery needs a value, it checks sources in this order (highest to lowest priority): + +1. **Command-line option** (if provided) +2. **Environment variable** (if set) +3. **Configuration file** (if key exists) +4. **Hard-coded default** (if all above missing) + +### Examples + +**Example 1**: Environment overrides config +```bash +# config file: DefaultQueries=5 +export OPENROUTER_MODEL="deepseek/deepseek-v3.2" +openquery --queries 2 "question" # Uses: queries=2 (CLI), model=deepseek (env), chunks=3 (config) +``` + +**Example 2**: CLI overrides everything +```bash +export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23" +openquery --model "google/gemini-3-flash-preview" --chunks 5 "question" +# Uses: model=google (CLI), chunks=5 (CLI), queries=3 (default) +``` + +**Example 3**: All sources combined +```bash +# config: DefaultChunks=4 +# env: OPENROUTER_MODEL="moonshotai/kimi-k2.5", SEARXNG_URL="http://custom:8002" +# CLI: --queries 6 --short +openquery "question" +# Uses: queries=6 (CLI), chunks=4 (config), results=5 (config), +# model=kimi-k2.5 (env), searxng=custom (env), short=true (CLI) +``` + +## Recommended Settings + +### For Quick Questions (Facts, Definitions) +```bash +openquery -q 2 -r 3 -c 2 "What is the capital of France?" +``` +- Few queries (2) for straightforward facts +- Few results (3) to minimize processing +- Few chunks (2) for focused answer + +### For Research (Complex Topics) +```bash +openquery -q 5 -r 10 -c 4 -l "Explain the causes of the French Revolution" +``` +- More queries (5) for diverse perspectives +- More results (10) for comprehensive coverage +- More chunks (4) for rich context +- Long format for depth + +### For Exploration (Broad Topics) +```bash +openquery -q 8 -r 15 -c 5 "What are the latest developments in AI?" 
+``` +- Many queries (8) to explore different angles +- Many results (15) for breadth +- More chunks (5) for extensive context + +### Cost Optimization +```bash +openquery configure --model "qwen/qwen3.5-flash-02-23" +# Keep defaults: -q 3 -r 5 -c 3 +``` +- Qwen Flash is very cost-effective +- Default parameters provide good balance + +### Performance Optimization +```bash +# Adjust ParallelProcessingOptions in SearchTool.cs if needed +# Default: MaxConcurrentArticleFetches=10, MaxConcurrentEmbeddingRequests=4 +``` +- Reduce these values if you see rate limits or memory pressure +- Increase them if you have fast network/API and want more speed + +## Advanced Configuration + +### Changing Concurrency Limits + +Concurrency limits are currently hardcoded in `SearchTool.cs` but can be adjusted: + +```csharp +public class ParallelProcessingOptions +{ + public int MaxConcurrentArticleFetches { get; set; } = 10; // ← Change this + public int MaxConcurrentEmbeddingRequests { get; set; } = 4; // ← Change this + public int EmbeddingBatchSize { get; set; } = 300; // ← Change this +} +``` + +To make these configurable, you could: +1. Add fields to `AppConfig` +2. Read from config file +3. Pass through to `SearchTool` constructor + +### Custom Embedding Model + +The embedding model is hardcoded to `openai/text-embedding-3-small`. To change: + +Edit the `EmbeddingService` constructor: +```csharp +public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model") +``` + +Or make it configurable via CLI/config (future enhancement). + +### Changing Chunk Size + +Chunk size (500 chars) is defined in `ChunkingService.cs`: +```csharp +private const int MAX_CHUNK_SIZE = 500; +``` + +Modify this constant to change how articles are split. 
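The backtracking behaviour is easier to see in a runnable sketch. This Python port of the algorithm described above is illustrative only (the shipped implementation is the C# `ChunkingService`):

```python
def chunk_text(text: str, max_chunk_size: int = 500) -> list[str]:
    """Boundary-aware splitter (illustrative port of ChunkingService)."""
    breaks = {" ", "\n", "\r", ".", "!"}
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chunk_size, len(text))
        if end < len(text):
            # Backtrack to the last natural boundary so words stay intact
            boundary = end
            while boundary > start and text[boundary - 1] not in breaks:
                boundary -= 1
            if boundary > start:  # hard-split only if no boundary was found
                end = boundary
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        start = end
    return chunks

article = "First sentence. " * 80  # 1280 characters
pieces = chunk_text(article)
print(len(pieces), max(len(p) for p in pieces))  # 3 chunks, all <= 500 chars
```

Every chunk ends at a space or sentence boundary rather than mid-word, which is exactly what keeps embeddings of the chunks semantically coherent.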
Larger chunks: +- βœ… More context per chunk +- ❌ Fewer chunks for same article +- ❌ Higher token usage in final answer + +Smaller chunks: +- βœ… More granular matching +- ❌ May lose context across chunk boundaries + +### Using a Custom SearxNG Instance + +Some SearxNG deployments may require HTTPS, authentication, or custom paths: + +```bash +# With authentication (if supported) +export SEARXNG_URL="https://user:pass@searx.example.com:8080" + +# With custom path +export SEARXNG_URL="https://searx.example.com/custom-path" +``` + +Note: Most SearxNG instances don't require auth as they're designed for privacy. + +### OpenRouter Settings + +OpenRouter supports additional parameters (not yet exposed in OpenQuery): + +- `temperature` - Randomness (0-2, default ~1) +- `max_tokens` - Response length limit +- `top_p` - Nucleus sampling +- `frequency_penalty` / `presence_penalty` + +These could be added to `ChatCompletionRequest` in future versions. + +## Managing Multiple Configurations + +You can maintain multiple config files and symlink or set per-project: + +```bash +# Create project-specific config +cp ~/.config/openquery/config ~/myproject/openquery.config + +# Use it temporarily +OPENQUERY_CONFIG=~/myproject/openquery.config openquery "question" +``` + +**Note**: Currently OpenQuery only looks at `~/.config/openquery/config`. Multi-config support would require code changes (reading from `OPENQUERY_CONFIG` env var). + +## Configuration Validation + +OpenQuery doesn't strictly validate config values. 
Invalid settings may cause runtime errors: + +- `DefaultQueries <= 0` β†’ May cause exceptions or zero queries +- `DefaultChunks <= 0` β†’ May return no context +- `DefaultResults <= 0` β†’ No search results + +Validate manually: +```bash +# Test your config loads +cat ~/.config/openquery/config + +# Test with verbose mode +openquery -v "test" +``` + +--- + +## Next Steps + +- [Usage Guide](usage.md) - Learn how to use the CLI +- [Architecture](architecture.md) - Understand the system design +- [Troubleshooting](troubleshooting.md) - Fix common issues diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..eaf2cbc --- /dev/null +++ b/docs/index.md @@ -0,0 +1,173 @@ +# OpenQuery Documentation + +Welcome to the comprehensive documentation for OpenQuery - the AI-powered search and answer system. + +## πŸ“š Documentation Overview + +### Getting Started +- **[Installation Guide](installation.md)** - Build, install, and setup instructions +- **[Configuration](configuration.md)** - Configure API keys, models, and settings +- **[Usage Guide](usage.md)** - Complete CLI reference with examples + +### Deep Dive +- **[Architecture](architecture.md)** - System design, patterns, and data flow +- **[Components](components/overview.md)** - Detailed component documentation + - [OpenQueryApp](components/openquery-app.md) + - [SearchTool](components/search-tool.md) + - [Services](components/services.md) + - [Models](components/models.md) +- **[API Reference](api/cli.md)** - Complete command-line interface reference + - [Environment Variables](api/environment-variables.md) + - [Programmatic APIs](api/programmatic.md) + +### Support +- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions +- **[Performance](performance.md)** - Performance characteristics and optimization + +## 🎯 Quick Links + +### For Users +- [Install OpenQuery](installation.md) in 5 minutes +- [Configure your API key](configuration.md) +- [Learn the basics](usage.md) +- [Solve 
common problems](troubleshooting.md) + +### For Developers +- [Understand the architecture](architecture.md) +- [Explore components](components/overview.md) +- [Use the APIs programmatically](api/programmatic.md) +- [Performance tuning](performance.md) + +## πŸ“‹ Table of Contents + +1. [Project Overview](#project-overview) +2. [Key Concepts](#key-concepts) +3. [Technology Stack](#technology-stack) +4. [System Workflow](#system-workflow) + +## Project Overview + +**OpenQuery** is a sophisticated CLI tool that combines the power of large language models with web search to provide accurate, well-sourced answers to complex questions. + +### What It Does +- Takes a natural language question as input +- Generates multiple diverse search queries +- Searches the web via SearxNG +- Extracts and processes article content +- Uses semantic similarity to rank relevance +- Synthesizes a comprehensive AI-generated answer with citations + +### Why Use OpenQuery? +- **Accuracy**: Multiple search queries reduce bias and increase coverage +- **Transparency**: Sources are cited in the final answer +- **Speed**: Parallel processing minimizes latency +- **Control**: Fine-tune every aspect from query count to chunk selection +- **Privacy**: SearxNG provides anonymous, aggregating search + +## Key Concepts + +### Search Queries +Instead of using your exact question, OpenQuery generates multiple optimized search queries (default: 3). For example, "What is quantum entanglement?" might become: +- "quantum entanglement definition" +- "how quantum entanglement works" +- "quantum entanglement experiments" + +### Content Chunks +Long articles are split into ~500-character chunks. Each chunk is: +- Stored with its source URL and title +- Converted to a vector embedding (1536 dimensions) +- Scored against your query embedding + +### Semantic Ranking +Using cosine similarity between embeddings, OpenQuery ranks chunks by relevance and selects the top N (default: 3) for the final context. 
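The ranking step is plain cosine similarity between the query vector and each chunk vector. A minimal, self-contained sketch — toy 3-dimensional vectors stand in for the real 1536-dimensional embeddings, and OpenQuery itself does this in C# with `System.Numerics.Tensors`:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|); 0.0 for a zero-magnitude vector
    dot = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / mag if mag else 0.0

def top_chunks(query_vec, chunk_vecs, chunks, n=3):
    # Score every chunk against the query, sort descending, keep the top n
    scored = sorted(
        zip(chunks, (cosine_similarity(query_vec, v) for v in chunk_vecs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [text for text, _ in scored[:n]]

query = [1.0, 0.0, 0.0]
vectors = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
texts = ["relevant chunk", "unrelated chunk", "partially relevant chunk"]
print(top_chunks(query, vectors, texts, n=2))
# ['relevant chunk', 'partially relevant chunk']
```

Because only relative order matters for selecting the top N, the raw similarity scores never need to be normalized or thresholded.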
+ +### Streaming Answer +The LLM receives your question plus the top chunks as context and streams the answer in real-time, citing sources like `[Source 1]`. + +## Technology Stack + +| Layer | Technology | Purpose | +|-------|------------|---------| +| Runtime | .NET 10.0 AOT | Native performance, minimal footprint | +| LLM | OpenRouter API | Chat completions and embeddings | +| Search | SearxNG | Metasearch engine | +| Content Extraction | SmartReader | Article text extraction | +| Vector Math | System.Numerics.Tensors | High-performance cosine similarity | +| Resilience | Polly | Retry and circuit breaker policies | +| CLI | System.CommandLine | Command parsing and help | +| JSON | System.Text.Json (source-gen) | Fast serialization | + +## System Workflow + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ OpenQuery Workflow β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ β”‚ +β”‚ 1. User Query: "What is quantum entanglement?" β”‚ +β”‚ β”‚ +β”‚ 2. Query Generation (Optional) β”‚ +β”‚ LLM generates: ["quantum entanglement physics", β”‚ +β”‚ "quantum entanglement definition", β”‚ +β”‚ "how does quantum entanglement work"] β”‚ +β”‚ β”‚ +β”‚ 3. 
Parallel Searches β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Query 1 β†’ β”‚β†’ β”‚ SearxNG β”‚β†’ β”‚ Results β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Query 2 β†’ β”‚β†’ β”‚ SearxNG β”‚β†’ β”‚ Results β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ Query 3 β†’ β”‚β†’ β”‚ SearxNG β”‚β†’ β”‚ Results (combined) β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β”‚ 4. Parallel Article Fetching β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ URL 1 β†’ β”‚β†’ β”‚ Article β”‚β†’ β”‚ Chunks β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ URL 2 β†’ β”‚β†’ β”‚ Article β”‚β†’ β”‚ Chunks β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ ... (concurrent, max 10 at a time) β”‚ +β”‚ β”‚ +β”‚ 5. 
Parallel Embeddings β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Chunks β”‚β†’ β”‚ Embed- β”‚β†’ β”‚ Vectors β”‚ β”‚ +β”‚ β”‚ Batch 1 β”‚ β”‚ ding API β”‚ β”‚ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Chunks β”‚β†’ β”‚ Embed- β”‚β†’ β”‚ Vectors β”‚ β”‚ +β”‚ β”‚ Batch 2 β”‚ β”‚ ding API β”‚ β”‚ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ (batches of 300, up to 4 concurrent) β”‚ +β”‚ β”‚ +β”‚ 6. Semantic Ranking β”‚ +β”‚ Query Embedding + Chunk Embeddings β†’ Cosine Similarity β†’ β”‚ +β”‚ Score β†’ Sort Descending β†’ Top 3 Chunks β”‚ +β”‚ β”‚ +β”‚ 7. Final Answer Generation β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ System: "Answer based on this context:" β”‚ β”‚ +β”‚ β”‚ Context: [Top 3 chunks with sources] β”‚ β”‚ +β”‚ β”‚ Question: "What is quantum entanglement?" β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ ↓ β”‚ +β”‚ LLM Streams Answer β”‚ +β”‚ "Quantum entanglement is..." β”‚ +β”‚ with citations like [Source 1] β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Next Steps + +1. **[Install OpenQuery](installation.md)** +2. **[Configure it](configuration.md)** +3. 
**[Start asking questions](usage.md)** + +For detailed technical information, continue to [the architecture guide](architecture.md). + +--- + +**Need help?** Check the [Troubleshooting](troubleshooting.md) guide. diff --git a/docs/installation.md b/docs/installation.md new file mode 100644 index 0000000..15e66f8 --- /dev/null +++ b/docs/installation.md @@ -0,0 +1,358 @@ +# Installation Guide + +This guide covers how to build, install, and configure OpenQuery on your system. + +## πŸ“‹ Table of Contents + +1. [Prerequisites](#prerequisites) +2. [Quick Install](#quick-install) +3. [Manual Build](#manual-build) +4. [Platform-Specific Instructions](#platform-specific-instructions) +5. [Post-Installation](#post-installation) +6. [Verification](#verification) +7. [Uninstallation](#uninstallation) + +## Prerequisites + +### Required Software +- **.NET SDK 10.0** or later + - Download from [dotnet.microsoft.com](https://dotnet.microsoft.com/download) + - Verify: `dotnet --version` should show 10.x or higher + +### External Services (Setup Required) +1. **SearxNG Instance** - Metasearch engine + - **Docker (Recommended)**: + ```bash + docker run -d \ + --name searxng \ + -p 8002:8080 \ + -v searxng-data:/etc/searxng \ + searxng/searxng:latest + ``` + - Access at `http://localhost:8002` + + - **Alternative**: Use a public SearxNG instance from [searx.space](https://searx.space) + +2. **OpenRouter API Key** - AI model provider + - Sign up at [openrouter.ai](https://openrouter.ai) + - Get your API key from dashboard + - Free tier available with rate limits + +## Quick Install + +The easiest way to get OpenQuery up and running: + +```bash +# 1. Clone the repository +git clone <your-repo-url> +cd OpenQuery + +# 2. Make install script executable and run +chmod +x install.sh +./install.sh + +# 3. Configure your API key +openquery configure -i + +# 4. 
Test it +openquery "Hello world" +``` + +**What the install script does**: +- Builds the project in Release mode +- Publishes as self-contained AOT binary +- Copies to `~/.local/bin/OpenQuery` (Linux/macOS) +- Creates config directory `~/.config/openquery/` + +## Manual Build + +If you prefer to build manually or need a specific platform: + +### Step 1: Restore Dependencies +```bash +dotnet restore +``` + +### Step 2: Build +```bash +dotnet build -c Release +``` + +### Step 3: Publish + +#### For Current Platform (Self-Contained AOT) +```bash +dotnet publish -c Release \ + --self-contained true \ + /p:PublishAot=true +``` + +The binary will be at: +``` +bin/Release/net10.0/<rid>/publish/OpenQuery +``` + +#### For Specific Platform (Cross-Compilation) + +**Runtime Identifiers (RIDs)**: +| Platform | RID | +|----------|-----| +| Linux x64 | `linux-x64` | +| Linux ARM64 | `linux-arm64` | +| macOS x64 | `osx-x64` | +| macOS ARM64 | `osx-arm64` | +| Windows x64 | `win-x64` | +| Windows ARM64 | `win-arm64` | + +Example for Linux x64: +```bash +dotnet publish -c Release \ + -r linux-x64 \ + --self-contained true \ + /p:PublishAot=true +``` + +### Step 4: Deploy + +Copy the binary to a directory in your PATH: + +```bash +# Linux/macOS +sudo cp bin/Release/net10.0/linux-x64/publish/OpenQuery /usr/local/bin/ +chmod +x /usr/local/bin/OpenQuery + +# Windows (PowerShell as Admin) +Copy-Item bin\Release\net10.0\win-x64\publish\OpenQuery.exe C:\Program Files\OpenQuery\ +``` + +Or use a local bin directory: +```bash +mkdir -p ~/.local/bin +cp bin/Release/net10.0/linux-x64/publish/OpenQuery ~/.local/bin/ +# Add to PATH if not already: export PATH="$HOME/.local/bin:$PATH" +``` + +## Platform-Specific Instructions + +### Linux + +#### Ubuntu/Debian +```bash +# Install .NET SDK 10.0 +wget https://dot.net/v10/dotnet-install.sh -O dotnet-install.sh +chmod +x dotnet-install.sh +./dotnet-install.sh --channel 10.0 + +# Add to PATH +export PATH="$HOME/.dotnet:$PATH" + +# Build and install 
(as shown above) +``` + +#### With Systemd Service (Optional) +If you run SearxNG locally, you might want it as a service: +```bash +# Create systemd service for SearxNG (if using Docker) +sudo nano /etc/systemd/system/searxng.service +``` + +```ini +[Unit] +Description=SearxNG Search Engine +Requires=docker.service +After=docker.service + +[Service] +Restart=always +ExecStart=/usr/bin/docker start -a searxng +ExecStop=/usr/bin/docker stop -t 2 searxng + +[Install] +WantedBy=multi-user.target +``` + +```bash +sudo systemctl enable searxng +sudo systemctl start searxng +``` + +### macOS + +#### Homebrew Install (if .NET available) +```bash +brew install dotnet-sdk +``` + +#### M1/M2 (ARM64) Notes +- Use RID: `osx-arm64` +- Ensure you have the ARM64 version of .NET SDK + +### Windows + +#### Using Winget (Windows 10/11) +```powershell +winget install Microsoft.DotNet.SDK.10 +``` + +#### Manual Install +1. Download installer from [dotnet.microsoft.com](https://dotnet.microsoft.com/download) +2. Run installer +3. Verify in PowerShell: + ```powershell + dotnet --version + ``` + +#### Building +```powershell +dotnet publish -c Release -r win-x64 --self-contained true /p:PublishAot=true +``` + +## Post-Installation + +### 1. Verify SearxNG is Running +```bash +curl "http://localhost:8002/search?q=test&format=json" +``` + +Expected: JSON response with results array. + +### 2. Configure OpenQuery +```bash +# Interactive setup +openquery configure -i + +# Or via environment variables +setx OPENROUTER_API_KEY "sk-or-..." # Windows +export OPENROUTER_API_KEY="sk-or-..." # Linux/macOS +``` + +### 3. 
Optional: Set Defaults +```bash +openquery configure --queries 5 --chunks 4 --results 10 +``` + +## Verification + +### Test Installation +```bash +# Check binary exists and is executable +which openquery # Linux/macOS +where openquery # Windows + +# If installed as OpenQuery (capital O) +which OpenQuery +``` + +### Test Configuration +```bash +# Should show your config or defaults +cat ~/.config/openquery/config +``` + +### Test the System +```bash +# Simple query (should work with any API key) +openquery "What is 2+2?" + +# More complex query +openquery -v "What are the benefits of exercise?" +``` + +Expected output: +- Spinner animation with status updates +- Streaming answer from the AI +- Citations like `[Source 1](url)` in the answer + +## Uninstallation + +### Using Uninstall Script +```bash +chmod +x uninstall.sh +./uninstall.sh +``` + +The script will: +- Remove binary from `~/.local/bin/` +- Ask if you want to delete config directory + +### Manual Removal +```bash +# Remove binary +rm ~/.local/bin/OpenQuery + +# Remove config (optional) +rm -r ~/.config/openquery +``` + +### Remove SearxNG (if no longer needed) +```bash +docker rm -f searxng +docker volume rm searxng-data +``` + +## Advanced Build Options + +### Reduce Binary Size +Edit `OpenQuery.csproj`: +```xml +<PropertyGroup> + <PublishAot>true</PublishAot> + <InvariantGlobalization>true</InvariantGlobalization> <!-- Already set --> + <StripSymbols>true</StripSymbols> +</PropertyGroup> +``` + +### Debug Build +```bash +dotnet build -c Debug +dotnet run -- "your question" +``` + +### With Symbols (for debugging) +```bash +dotnet publish -c Release -r linux-x64 \ + --self-contained true \ + /p:PublishAot=true \ + /p:DebugType=portable +``` + +## Troubleshooting Installation + +### "dotnet: command not found" +- Add `.dotnet` to PATH: `export PATH="$HOME/.dotnet:$PATH"` +- Restart terminal or source shell config + +### "The SDK 'Microsoft.NET.Sdk' was not found" +- .NET SDK not installed correctly +- 
Re-run installer or use `dotnet-install.sh` + +### AOT Build Fails +- Some platforms may not support AOT yet +- Remove `/p:PublishAot=true` to use JIT +- Check [.NET AOT support](https://docs.microsoft.com/dotnet/core/deploying/native-aot/) + +### Docker Pull Fails (SearxNG) +```bash +# Pull image separately first +docker pull searxng/searxng:latest +# Then run container +docker run -d --name searxng -p 8002:8080 searxng/searxng +``` + +### Port 8002 Already in Use +Change port in docker command: +```bash +docker run -d --name searxng -p 8080:8080 searxng/searxng +# Then set SEARXNG_URL=http://localhost:8080 +``` + +## Next Steps + +After successful installation: +1. [Configure OpenQuery](configuration.md) +2. [Learn how to use it](usage.md) +3. Read the [Architecture](architecture.md) to understand how it works + +--- + +**Need help?** See [Troubleshooting](troubleshooting.md) or open an issue. diff --git a/docs/performance.md b/docs/performance.md new file mode 100644 index 0000000..beb23be --- /dev/null +++ b/docs/performance.md @@ -0,0 +1,522 @@ +# Performance + +Performance characteristics, optimization strategies, and scalability considerations for OpenQuery. + +## πŸ“‹ Table of Contents + +1. [Performance Overview](#performance-overview) +2. [Latency Breakdown](#latency-breakdown) +3. [Throughput](#throughput) +4. [Memory Usage](#memory-usage) +5. [Benchmarking](#benchmarking) +6. [Optimization Strategies](#optimization-strategies) +7. [Scalability Limits](#scalability-limits) + +## Performance Overview + +OpenQuery is designed for **low-latency interactive use** (15-50 seconds end-to-end) while maximizing parallelization to minimize wait time. 
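The concurrency model described above can be sketched in a few lines of Python (illustrative only: the real implementation is C# with `Task`s and a `SemaphoreSlim`, and every function name below is hypothetical):

```python
import asyncio

async def search(query: str) -> list[str]:
    """Stand-in for a SearxNG call; returns result URLs."""
    await asyncio.sleep(0)
    return [f"https://example.com/{query}/{i}" for i in range(5)]

async def pipeline(queries: list[str]) -> list[str]:
    # Stage 2a: all searches run fully concurrently.
    results = await asyncio.gather(*(search(q) for q in queries))
    urls = sorted({u for batch in results for u in batch})  # dedupe across queries

    # Stage 2b: article fetches are bounded by a semaphore,
    # mirroring MaxConcurrentArticleFetches = 10.
    limit = asyncio.Semaphore(10)

    async def fetch(url: str) -> str:
        async with limit:
            await asyncio.sleep(0)  # stand-in for HTTP GET + SmartReader parse
            return f"text of {url}"

    return await asyncio.gather(*(fetch(u) for u in urls))

articles = asyncio.run(pipeline(["a", "b", "c"]))
print(len(articles))  # 15
```

The key property mirrored here: searches fan out fully, while article fetches are capped at 10 in flight.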
+ +### Key Metrics + +| Metric | Typical | Best Case | Worst Case | +|--------|---------|-----------|------------| +| **End-to-End Latency** | 15-50s | 10s | 120s+ | +| **API Cost** | $0.01-0.05 | $0.005 | $0.20+ | +| **Memory Footprint** | 100-300MB | 50MB | 1GB+ | +| **Network I/O** | 5-20MB | 1MB | 100MB+ | + +**Note**: Wide variance due to network latency, content size, and LLM speed. + +--- + +## Latency Breakdown + +### Default Configuration + +`-q 3 -r 5 -c 3` (3 queries, 5 results each, 3 final chunks) + +| Stage | Operation | Parallelism | Time (p50) | Time (p95) | Dominant Factor | +|-------|-----------|-------------|------------|------------|-----------------| +| 1 | Query Generation | 1 | 2-5s | 10s | LLM inference speed | +| 2a | Searches (3 queries Γ— 5 results) | 3 concurrent | 3-8s | 15s | SearxNG latency | +| 2b | Article Fetching (β‰ˆ15 URLs) | 10 concurrent | 5-15s | 30s | Each site's response time | +| 2c | Chunking | 10 concurrent | <1s | 2s | CPU (HTML parsing) | +| 3a | Query Embedding | 1 | 0.5-1s | 3s | Embedding API latency | +| 3b | Chunk Embeddings (β‰ˆ50 chunks) | 4 concurrent | 1-3s | 10s | Batch API latency | +| 4 | Ranking | 1 | <0.1s | 0.5s | CPU (vector math) | +| 5 | Final Answer Streaming | 1 | 5-20s | 40s | LLM generation speed | +| **Total** | | | **16-50s** | **~60s** | | + +### Phase Details + +#### Phase 1: Query Generation (2-5s) +- Single non-streaming LLM call +- Input: system prompt + user question (~200 tokens) +- Output: JSON array of 3-5 short strings (~50 tokens) +- Fast because small context and output + +#### Phase 2a: Searches (3-8s) +- 3 parallel `SearxngClient.SearchAsync` calls +- Each: query β†’ SearxNG β†’ aggregator engines β†’ scraped results +- Latency highly variable based on: + - SearxNG instance performance + - Network distance to SearxNG + - SearxNG's upstream search engines + +#### Phase 2b: Article Fetching (5-15s) +- β‰ˆ15 URLs to fetch (3 queries Γ— 5 results minus duplicates) +- Up to 10 
concurrent fetches (semaphore)
+- Each: TCP connect + TLS handshake + HTTP GET + SmartReader parse
+- Latency:
+  - Fast sites (CDN, cached): 200-500ms
+  - Normal sites: 1-3s
+  - Slow/unresponsive sites: timeout after ~30s
+
+Why 5-15s for 15 URLs with 10 concurrent fetches?
+- First wave (10 URLs): bounded by the slowest among them, ≈ 3s
+- Second wave (5 URLs): another ≈ 3s → total ≈ 6s
+- Many URLs respond faster (≈ 500ms), pulling the total toward 2-3s
+- But a few sites taking 5-10s can dominate
+
+**Tail latency**: The slowest few URLs can dominate total time. The pipeline cannot proceed until all fetch attempts complete (or fail).
+
+#### Phase 2c: Chunking (<1s)
+- CPU-bound HTML cleaning and splitting
+- SmartReader's C# HTML parser is fast
+- Typically 100-300 chunks total
+- <1s on a modern CPU
+
+#### Phase 3: Embeddings (1.5-4s)
+- **Query embedding**: 1 call, ~200 tokens, ≈ 0.5-1s
+- **Chunk embeddings**: ≈50 chunks → 1 batch of 50 (batch size 300 unused here)
+  - Batch of 50: still a single API call, ≈ 25K chars (50 × ~500 chars) ≈ 6K tokens
+  - If using `text-embedding-3-small`: $0.00002 per 1K tokens → ~$0.0001 per batch
+  - Latency: 1-3s for the embedding API
+
+With more chunks (say 500), that becomes 2 batches → maybe 2-4s.
+
+Parallel batches (4 concurrent) help only when there are many batches (1500+ chunks).
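A tiny helper makes the batch arithmetic above explicit (hypothetical code; the defaults B=300 and P=4 mirror `EmbeddingBatchSize` and `MaxConcurrentEmbeddingRequests`):

```python
import math

def embedding_steps(chunks: int, batch_size: int = 300, parallel: int = 4) -> tuple[int, int]:
    """Return (api_calls, sequential_waves) for a given chunk count.

    Each API call embeds up to batch_size chunks; up to `parallel`
    calls are in flight at once, so latency scales with the waves.
    """
    batches = math.ceil(chunks / batch_size)
    waves = math.ceil(batches / parallel)
    return batches, waves

print(embedding_steps(50))    # (1, 1) - a typical default run
print(embedding_steps(500))   # (2, 1) - two calls, still one wave
print(embedding_steps(1500))  # (5, 2) - parallelism starts to matter
```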
+
+#### Phase 4: Ranking (<0.1s)
+- Cosine similarity for 50-100 chunks
+- Each: dot product + normalization (O(dim), dim = 1536)
+- 100 × 1536 ≈ 150K FLOPs → negligible on a modern CPU
+- SIMD acceleration from `TensorPrimitives`
+
+#### Phase 5: Final Answer (5-20s)
+- Streaming chat completion
+- Input: system prompt + context (3 × 500-char chunks ≈ 400 tokens) + question
+- Output: varies widely (typically 200-2000 tokens)
+- Longer context slightly increases latency
+- Model choice is the major factor:
+  - Qwen Flash: fast (5-10s for 1000 output tokens)
+  - Gemini Flash: moderate (10-15s)
+  - Llama-class: slower (20-40s)
+
+---
+
+## Throughput
+
+### Sequential Execution
+
+Running queries one after another (default CLI behavior):
+- Latency per query: 16-50s
+- Throughput: 1 query / 20s ≈ 180 queries/hour (theoretically)
+
+But API rate limits will kick in before that:
+- OpenRouter free tier: limited RPM/TPM
+- Even paid plans: soft limits
+
+### Concurrent Execution (Multiple OpenQuery Instances)
+
+You can run multiple OpenQuery processes in parallel (different terminals), but they share:
+- The same API key (OpenRouter rate limits are per API key, not per process)
+- The same SearxNG instance (which they could saturate)
+
+**Practical**: 3-5 concurrent processes before hitting diminishing returns or rate limits.
+
+### Throughput Optimization
+
+To maximize queries per hour:
+1. Use the fastest model (Qwen Flash)
+2. Reduce `--chunks` to 1-2
+3. Reduce `--queries` to 1
+4. Use a local/fast SearxNG
+5. Cache embedding results (not implemented)
+6. Batch multiple questions in one process (not implemented; would require a redesign)
+
+**Achievable**: Maybe 500-1000 queries/hour on a paid OpenRouter plan with aggressive settings.
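The Phase 4 ranking step above is ordinary cosine similarity over the chunk embeddings; here is a pure-Python sketch of the math (the real code is C# using `TensorPrimitives` for SIMD):

```python
import math

def cosine(a, b):
    # dot product divided by the product of the two vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=3):
    # Score every chunk against the query, keep the k best indices.
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: cosine(query_vec, chunk_vecs[i]),
                   reverse=True)
    return order[:k]

query = [1.0, 0.0]  # toy 2-dim vectors; the real ones are 1536-dim
chunks = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(top_k(query, chunks, k=2))  # [0, 2]
```

Ranking is O(C × d) overall, which is why it stays under 0.1s even for hundreds of chunks.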
+ +--- + +## Memory Usage + +### Baseline + +.NET 10 AOT app with dependencies: +- **Code**: ~30MB (AOT compiled native code) +- **Runtime**: ~20MB (.NET runtime overhead) +- **Base Memory**: ~50MB + +### Per-Query Memory + +| Component | Memory | Lifetime | +|-----------|--------|----------| +| Search results (15 items) | ~30KB | Pipeline | +| Articles (raw HTML) | ~5MB (transient) | Freed after parse | +| Articles (extracted text) | ~500KB | Until pipeline complete | +| Chunks (β‰ˆ100 items) | ~50KB text + embeddings 600KB | Until pipeline complete | +| Embeddings (100 Γ— 1536 floats) | ~600KB | Until pipeline complete | +| HTTP buffers | ~1MB per concurrent request | Short-lived | +| **Total per query** | **~2-5MB** (excluding base) | Released after complete | + +**Peak**: When all articles fetched but not yet embedded, we have text ~500KB + chunks ~650KB = ~1.2MB + overhead β‰ˆ 2-3MB. + +**If processing many queries in parallel** (unlikely for CLI), memory would scale linearly. + +### Memory Leak Risks + +- `HttpClient` instances: Created per `OpenRouterClient` and `SearxngClient`. Should be disposed (not happening). But short-lived process exits anyway. +- `StatusReporter` background task: Disposed via `using` +- `RateLimiter` semaphore: Disposed via `IAsyncDisposable` if wrapped in `using` (not currently, but short-lived) + +No major leaks observed. + +### Memory Optimization Opportunities + +1. **Reuse HttpClient** with `IHttpClientFactory` (but not needed for CLI) +2. **Stream article fetching** instead of buffering all articles before embedding (possible: embed as URLs complete) +3. **Early chunk filtering**: Discard low-quality chunks before embedding to reduce embedding count +4. **Cache embeddings**: By content hash, avoid re-embedding seen text (would need persistent storage) + +--- + +## Benchmarking + +### Methodology + +Measure with `time` command and verbose logging: + +```bash +time openquery -v "What is quantum entanglement?" 
2>&1 | tee log.txt +``` + +Parse log for timestamps (or add them manually by modifying code). + +### Sample Benchmark + +**Environment**: +- Linux x64, .NET 10 AOT +- SearxNG local Docker (localhost:8002) +- OpenRouter API (US East) +- Model: qwen/qwen3.5-flash-02-23 + +**Run 1**: +``` +real 0m23.4s +user 0m1.2s +sys 0m0.3s +``` +Log breakdown: +- Query generation: 3.2s +- Searches: 4.1s +- Article fetching: 8.7s (12 URLs) +- Embeddings: 2.8s (45 chunks) +- Final answer: 4.6s (325 tokens) + +**Run 2** (cached SearxNG results, same URLs): +``` +real 0m15.8s +``` +Faster article fetching (2.3s) because sites cached or faster second request. + +**Run 3** (verbose `-s` short answer): +``` +real 0m18.2s +``` +Final answer faster (2.1s instead of 4.6s) due to shorter output. + +### Benchmarking Tips + +1. **Warm up**: First run slower (JIT or AOT cold start). Discard first measurement. +2. **Network variance**: Run multiple times and average. +3. **Control variables**: Same question, same SearxNG instance, same network conditions. +4. **Measure API costs**: Check OpenRouter dashboard for token counts. +5. **Profile with dotTrace** or `perf` if investigating CPU bottlenecks. + +--- + +## Optimization Strategies + +### 1. 
Tune Concurrent Limits + +Edit `SearchTool.cs` where `_options` is created: + +```csharp +var _options = new ParallelProcessingOptions +{ + MaxConcurrentArticleFetches = 5, // ↓ from 10 + MaxConcurrentEmbeddingRequests = 2, // ↓ from 4 + EmbeddingBatchSize = 300 // ↑ or ↓ (rarely matters) +}; +``` + +**Why tune down?** +- Hit OpenRouter rate limits +- Network bandwidth saturated +- Too many concurrent fetches overwhelm target sites (ethical/scraping etiquette) + +**Why tune up?** +- Fast network, powerful CPU, no rate limits +- Many chunks (>500) needing parallel embedding batches + +**Monitor**: +- `openquery -v` shows embedding progress: `[Generating embeddings: batch X/Y]` +- If Y=1 (all fitted in one batch), batch size is fine +- If Y>1 and max concurrent = Y, you're using full parallelism + +### 2. Reduce Data Volume + +**Fewer search results**: +```bash +openquery -r 3 "question" # instead of 5 or 10 +``` +Effect: Fetches fewer URLs, extracts fewer chunks. Linear reduction in work. + +**Fewer queries**: +```bash +openquery -q 1 "question" +``` +Effect: One search instead of N. Quality may suffer (less diverse sources). + +**Fewer chunks**: +```bash +openquery -c 1 "question" +``` +Effect: Only top 1 chunk in context β†’ fewer tokens β†’ faster final answer, but may miss relevant info. + +**Chunk size** (compile-time constant): +Edit `ChunkingService.cs`: +```csharp +private const int MAX_CHUNK_SIZE = 300; // instead of 500 +``` +Effect: More chunks (more granular ranking) but each chunk shorter β†’ more chunks to rank, more embeddings to generate. Could increase or decrease total time. Likely more tokens overall (more chunks in context if `-c` is fixed number). + +### 3. Change Embedding Model + +Currently hardcoded to `openai/text-embedding-3-small`. 
Could use: +- `openai/text-embedding-3-large` (higher quality, slower, more expensive) +- `intfloat/multilingual-e5-large` (multilingual, smaller) + +Modify `EmbeddingService` constructor: +```csharp +public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model") +``` + +Then pass: +```csharp +var embeddingService = new EmbeddingService(client, "intfloat/multilingual-e5-large"); +``` + +**Impact**: Different dimensionality (1536 vs 1024 vs 4096). Memory scales with dim. Quality may vary for non-English queries. + +### 4. Caching + +**Current**: No caching. Every query hits all APIs. + +**Embedding cache** (by text hash): +- Could store in memory: `Dictionary<string, float[]>` +- Or disk: `~/.cache/openquery/embeddings/` +- Invalidation: embeddings are deterministic per model, so long-term cache viable + +**Search cache** (by query hash): +- Cache `List<SearxngResult>` for identical queries +- TTL: maybe 1 hour (search results change over time) + +**Article cache** (by URL hash): +- Cache `Article` (text content) per URL +- Invalidation: could check `Last-Modified` header or use TTL (1 day) + +**Implementation effort**: Medium. Would need cache abstraction (interface, in-memory + disk options). + +**Benefit**: Repeat queries (common in testing or similar questions) become instant. + +### 5. Parallelize More (Aggressive) + +**Currently**: +- Searches: unbounded (as many as `--queries`) +- Fetches: max 10 +- Embeddings: max 4 + +Could increase: +- Fetches to 20 or 50 (if network/CPU can handle) +- Embeddings to 8-16 (if OpenRouter rate limit allows) + +**Risk**: +- Overwhelming target sites (unethical scraping) +- API rate limits β†’ 429 errors +- Local bandwidth saturation + +### 6. 
Local Models (Self-Hosted) + +Replace OpenRouter with local LLM: +- **Query generation**: Could run tiny model locally (no API latency) +- **Embeddings**: Could run `all-MiniLM-L6-v2` locally (fast, free after setup) +- **Answer**: Could run Llama 3 8B locally (no cost, but slower than GPT-4/Gemini) + +**Benefits**: +- Zero API costs (after hardware) +- No network latency +- Unlimited queries + +**Drawbacks**: +- GPU required for decent speed (or CPU very slow) +- Setup complexity (Ollama, llama.cpp, vLLM, etc.) +- Model quality may lag behind commercial APIs + +**Integration**: Would need to implement local inference backends (separate project scope). + +--- + +## Scalability Limits + +### API Rate Limits + +**OpenRouter**: +- Free tier: Very limited (few RPM) +- Paid: Varies by model, but typical ~10-30 requests/second +- Embedding API has separate limits + +**Mitigation**: +- Reduce concurrency (see tuning) +- Add exponential backoff (already have for embeddings) +- Batch embedding requests (already done) + +### SearxNG Limits + +**Single instance**: +- Can handle ~10-50 QPS depending on hardware +- Upstream search engines may rate limit per instance +- Memory ~100-500MB + +**Mitigation**: +- Run multiple SearxNG instances behind load balancer +- Use different public instances +- Implement client-side rate limiting (currently only per-URL fetches limited, not searches) + +### Network Bandwidth + +**Typical data transfer**: +- Searches: 1KB per query Γ— 3 = 3KB +- Articles: 100-500KB per fetch Γ— 15 = 1.5-7.5MB (raw HTML) +- Extracted text: ~10% of HTML size = 150-750KB +- Embeddings: 100 chunks Γ— 1536 Γ— 4 bytes = 600KB (request + response) +- Final answer: 2-10KB + +**Total**: ~3-10MB per query + +**100 queries/hour**: ~300MB-1GB data transfer + +**Not an issue** for broadband, but could matter on metered connections. 
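The transfer numbers above are easy to recompute for other settings; a rough model using this section's estimates (all constants are the approximations quoted here, not measurements):

```python
def estimated_transfer_mb(queries=3, results=5, embedded_chunks=100,
                          avg_article_kb=300, dim=1536):
    search_kb = queries * 1                          # ~1KB of JSON per search
    article_kb = queries * results * avg_article_kb  # raw HTML dominates
    embedding_kb = embedded_chunks * dim * 4 / 1024  # float32 vectors, both directions
    answer_kb = 10
    return (search_kb + article_kb + embedding_kb + answer_kb) / 1024

print(round(estimated_transfer_mb(), 1))  # 5.0
```

With defaults this lands around 5MB, squarely inside the 3-10MB range quoted above; raw HTML is the dominant term, so `--results` is the main lever.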
+
+---
+
+## Scaling with Chunk Count
+
+Let:
+- C = number of chunks with valid embeddings
+- d = embedding dimension (1536)
+- B = embedding batch size (300)
+- P = max parallel embedding batches (4)
+
+**Embedding Time** ≈ `O(C/B * 1/P)` (batches divided by parallelism)
+
+**Ranking Time** ≈ `O(C * d)` (dot product per chunk)
+
+**Context Tokens** (for final answer) ≈ `k * avg_chunk_tokens`, where k = `--chunks` (≈ 500 chars = 125 tokens per chunk). This grows with C only if you raise `--chunks` to include more of the ranked chunks.
+
+**As C increases**:
+- Embedding time: linear in C/B (flat while everything fits in one batch)
+- Ranking time: linear in C
+- Final answer latency: unchanged unless `--chunks` also increases; more context tokens mean longer context processing and potentially a longer answer (more relevant chunks to synthesize)
+
+**Practical limit**:
+- With defaults, C ~ 50-100 (from 15 articles)
+- Could reach C ~ 500-1000 if:
+  - `--queries` = 10
+  - `--results` = 20 (200 URLs)
+  - Many long articles → many chunks each
+- At C = 1000:
+  - Embeddings: 1000/300 ≈ 4 batches; with 4 parallel, all run at once → time ≈ one batch duration
+  - But OpenRouter may have per-minute limits on embedding requests
+  - Ranking: 1000 × 1536 ≈ 1.5M FLOPs → still <0.01s
+  - Context tokens (only if you also set `-c 1000`): 1000 × 125 = 125K tokens! Many LLMs have 200K context, so it fits, but it is expensive and slow.
+
+**Conclusion**: Current defaults scale to C ~ 100-200 comfortably.
Beyond that:
+- Need to increase batch size or parallelism for embeddings
+- May hit embedding API rate limits
+- Context token count becomes expensive and may degrade answer quality (LLMs lose focus in very long contexts)
+
+---
+
+## Profiling
+
+### CPU Profiling
+
+Use `dotnet-trace` or `perf`:
+
+```bash
+# Collect a trace for 30 seconds while a query is running
+dotnet-trace collect --process-id $(pgrep OpenQuery) --duration 00:00:30 -o trace.nettrace
+
+# Analyze with Visual Studio or PerfView
+```
+
+Look for:
+- Hot methods: `ChunkingService.ChunkText`, `EmbeddingService.GetEmbeddingsAsync`, cosine similarity
+- Allocation hotspots
+
+### Memory Profiling
+
+```bash
+dotnet-gcdump collect -p <pid>
+# Open the .gcdump in Visual Studio, or: dotnet-gcdump report <file>.gcdump
+```
+
+Check heap size and object counts (look for large `string` objects from article content).
+
+### Network Profiling
+
+Use `tcpdump` or Wireshark:
+```bash
+tcpdump -i any port 8002 or port 443 -w capture.pcap
+```
+
+Or simpler: run `time` on individual curl commands to measure latency components.
+
+---
+
+## Next Steps
+
+- [Configuration](configuration.md) - Tune for your environment
+- [Troubleshooting](troubleshooting.md) - Diagnose slow performance
+- [Architecture](architecture.md) - Understand pipeline bottlenecks
+
+---
+
+**Quick Tuning Cheatsheet**
+
+```bash
+# Fast & cheap (factual Q&A)
+openquery -q 1 -r 3 -c 2 -s "What is X?"
+
+# Thorough (research)
+openquery -q 5 -r 10 -c 5 -l "Deep dive on X"
+```
+
+```csharp
+// Custom code edit for concurrency, in SearchTool.cs:
+_options = new ParallelProcessingOptions {
+    MaxConcurrentArticleFetches = 20,  // if the network can handle it
+    MaxConcurrentEmbeddingRequests = 8 // if the API allows
+};
+```
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
new file mode 100644
index 0000000..5c976b1
--- /dev/null
+++ b/docs/troubleshooting.md
@@ -0,0 +1,699 @@
+# Troubleshooting
+
+Solve common issues, errors, and performance problems with OpenQuery.
+
+## 📋 Table of Contents
+
+1.
[Common Errors](#common-errors) +2. [Performance Issues](#performance-issues) +3. [Debugging Strategies](#debugging-strategies) +4. [Getting Help](#getting-help) + +## Common Errors + +### ❌ "API Key is missing" + +**Error Message**: +``` +[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up. +``` + +**Cause**: No API key available from environment or config file. + +**Solutions**: + +1. **Set environment variable** (temporary): +```bash +export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +``` + +2. **Configure interactively** (persistent): +```bash +openquery configure -i +# Follow prompts to enter API key +``` + +3. **Check config file**: +```bash +cat ~/.config/openquery/config +# Should contain: ApiKey=sk-or-... +``` + +4. **Verify environment**: +```bash +echo $OPENROUTER_API_KEY +# If empty, you didn't export or exported in wrong shell +``` + +--- + +### ❌ "Network request failed" + +**Error Message**: +``` +[Error] Network request failed. Details: Name or service not known +``` + +**Cause**: Cannot reach OpenRouter or SearxNG API endpoints. + +**Solutions**: + +1. **Check internet connectivity**: +```bash +ping 8.8.8.8 +curl https://openrouter.ai +``` + +2. **Verify SearxNG is running**: +```bash +curl "http://localhost:8002/search?q=test&format=json" +# Should return JSON +``` + +If connection refused: +```bash +# Start SearxNG if using Docker +docker start searxng +# Or run fresh +docker run -d --name searxng -p 8002:8080 searxng/searxng:latest +``` + +3. **Check firewall/proxy**: +```bash +# Test OpenRouter API +curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \ + https://openrouter.ai/api/v1/models +``` + +4. **Test from different network** (if behind restrictive firewall) + +--- + +### ❌ "No search results found" + +**Error Message**: +``` +No search results found. +``` + +**Cause**: Search queries returned zero results from SearxNG. + +**Solutions**: + +1. 
**Test SearxNG manually**: +```bash +curl "http://localhost:8002/search?q=test&format=json" | jq '.results | length' +# Should be > 0 +``` + +2. **Check SearxNG configuration**: +- If self-hosted: ensure internet access is enabled in `/etc/searxng/settings.yml` +- Some public instances disable certain engines or have rate limits + +3. **Try a different SearxNG instance**: +```bash +export SEARXNG_URL="https://searx.example.com" +openquery "question" +``` + +4. **Use simpler queries**: Some queries may be too obscure or malformed + +5. **Verbose mode to see queries**: +```bash +openquery -v "complex question" +# See what queries were generated +``` + +--- + +### ❌ "Found search results but could not extract readable content." + +**Cause**: SearxNG returned results but `ArticleService` failed to extract content from all URLs. + +**Common Reasons**: +- JavaScript-heavy sites (React, Vue apps) where content loaded dynamically +- Paywalled sites (NYT, academic journals) +- PDFs or non-HTML content +- Malformed HTML +- Server returned error (404, 403, 500) +- `robots.txt` blocked crawler + +**Solutions**: +1. **Accept that some sites can't be scraped** - try different query to get different results +2. **Use site:reddit.com or site:wikipedia.org** - these are usually scrape-friendly +3. **Increase `--results`** to get more URLs (some will work) +4. **Check verbose output**: +```bash +openquery -v "question" +# Look for "Warning: Failed to fetch article" +``` +5. **Try a local SearxNG instance with more engines** - some engines fetch different sources + +--- + +### ❌ Rate Limiting (429 Too Many Requests) + +**Symptoms**: +```bash +[Error] Response status code does not indicate success: 429 (Too Many Requests). +``` + +Or retries exhausting after Polly attempts. + +**Cause**: Too many concurrent requests to OpenRouter API. + +**Solutions**: + +1. 
**Reduce concurrency** (edit `SearchTool.cs`): +```csharp +var _options = new ParallelProcessingOptions +{ + MaxConcurrentArticleFetches = 5, // reduce from 10 + MaxConcurrentEmbeddingRequests = 2, // reduce from 4 + EmbeddingBatchSize = 150 // reduce from 300 +}; +``` + +2. **Add delay** between embedding batches (custom implementation) + +3. **Upgrade OpenRouter plan** to higher rate limits + +4. **Wait and retry** - rate limits reset after time window + +--- + +### ❌ Slow Performance + +**Symptom**: Queries take 60+ seconds when they usually take 20s. + +**Diagnosis Steps**: + +1. **Run with verbose mode**: +```bash +openquery -v "question" +``` +Watch which phase takes longest: +- Query generation? +- Searching? +- Fetching articles? +- Embeddings? + +2. **Check network latency**: +```bash +time curl "https://openrouter.ai/api/v1/models" +time curl "http://localhost:8002/search?q=test&format=json" +``` + +**Common Causes & Fixes**: + +| Phase | Cause | Fix | +|-------|-------|-----| +| Searches | SearxNG overloaded/slow | Check CPU/memory, restart container | +| Fetching | Target sites slow | Reduce `--results` to fewer URLs | +| Embeddings | API rate limited | Reduce concurrency (see above) | +| Answer | Heavy model/load | Switch to faster model (e.g., Qwen Flash) | + +3. **Resource monitoring**: +```bash +htop # CPU/memory usage +iftop # network throughput +``` + +4. **Reduce parameters**: +```bash +openquery -q 2 -r 3 -c 2 "question" # lighter load +``` + +--- + +### ❌ Out of Memory + +**Symptoms**: +- Process killed by OOM killer (Linux) +- `System.OutOfMemoryException` +- System becomes unresponsive + +**Cause**: Processing too many large articles simultaneously. + +**Why**: Each article can be 100KB+ of text, split into many chunks, embeddings are 6KB per chunk (1536 floats Γ— 4 bytes). 200 chunks = 1.2MB embeddings, plus text ~100KB = 1.3MB. Not huge, but many large articles could create thousands of chunks. + +**Solutions**: + +1. 
**Reduce `--results`** (fewer URLs per query): +```bash +openquery -r 3 "question" # instead of 10 +``` + +2. **Reduce `--queries`** (fewer search queries): +```bash +openquery -q 2 "question" +``` + +3. **Fetches already limited** to 10 concurrent by default, which is reasonable + +4. **Check article size**: Some sites (PDFs, long documents) may yield megabytes of text; SmartReader should truncate but may not + +--- + +### ❌ Invalid JSON from Query Generation + +**Symptom**: Query generation fails silently, falls back to original question. + +**Cause**: LLM returned non-JSON (even though instructed). Could be: +- Model not instruction-following +- Output exceeded context window +- API error in response + +**Detection**: Run with `-v` to see: +``` +[Failed to generate queries, falling back to original question. Error: ...] +``` + +**Solutions**: +- Try a different model (configure to use Gemini or DeepSeek) +- Reduce `--queries` count (simpler task) +- Tune system prompt (would require code change) +- Accept fallback - the original question often works as sole query + +--- + +### ❌ Spinner Artifacts in Output + +**Symptom**: When redirecting output to file, you see weird characters like `β ‹`, `οΏ½`, etc. + +**Cause**: Spinner uses Unicode Braille characters and ANSI escape codes. + +**Fix**: Use `2>/dev/null | sed 's/.\x08//g'` to clean: +```bash +openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md +``` + +Or run with `--verbose` (no spinner, only newline-separated messages): +```bash +openquery -v "question" > answer.txt +``` + +--- + +### ❌ "The type or namespace name '...' does not exist" (Build Error) + +**Cause**: Missing NuGet package or wrong .NET SDK version. + +**Solution**: + +1. **Verify .NET SDK 10.0**: +```bash +dotnet --version +# Should be 10.x +``` + +If lower: https://dotnet.microsoft.com/download/dotnet/10.0 + +2. **Restore packages**: +```bash +dotnet restore +``` + +3. 
**Clean and rebuild**: +```bash +dotnet clean +dotnet build +``` + +4. **Check OpenQuery.csproj** for package references: +```xml +<PackageReference Include="Polly.Core" Version="8.6.6" /> +<PackageReference Include="Polly.RateLimiting" Version="8.6.6" /> +<PackageReference Include="SmartReader" Version="0.11.0" /> +<PackageReference Include="System.CommandLine" Version="2.0.0-beta4.22272.1" /> +<PackageReference Include="System.Numerics.Tensors" Version="9.0.0" /> +``` + +If restore fails, these packages may not be available for .NET 10 preview. Consider: +- Downgrade to .NET 8.0 (if packages incompatible) +- Or find package versions compatible with .NET 10 + +--- + +### ❌ AOT Compilation Fails + +**Error**: `error NETSDK1085: The current .NET SDK does not support targeting .NET 10.0.` + +**Cause**: Using .NET SDK older than 10.0. + +**Fix**: Install .NET SDK 10.0 preview. + +**Or**: Disable AOT for development (edit `.csproj`): +```xml +<!-- Remove or set to false --> +<PublishAot>false</PublishAot> +``` + +--- + +## Performance Issues + +### Slow First Request + +**Expected**: First query slower (JIT compilation for .NET runtime if not AOT, or initial API connections). + +If not using AOT: +- Consider publishing with `/p:PublishAot=true` for production distribution +- Development builds use JIT, which adds 500ms-2s warmup + +**Mitigation**: Accept as warmup cost, or pre-warm with dummy query. 
+
+---
+
+### High Memory Usage
+
+**Check**:
+```bash
+ps aux | grep OpenQuery
+# Look at RSS (resident set size)
+```
+
+**Typical**: 50-200MB (including the .NET runtime, AOT code, and data structures)
+
+**If >500MB**:
+- Likely processing very many articles
+- Check `--results` and `--queries` values
+- Use `--verbose` to see counts: `[Fetched X search results]`, `[Extracted Y chunks]`
+
+**Reduce**:
+- `--queries 2` instead of 10
+- `--results 3` instead of 15
+- These directly limit the number of URLs to fetch
+
+---
+
+### High CPU Usage
+
+**Cause**:
+- SmartReader HTML parsing (CPU-bound)
+- Cosine similarity calculations (many chunks, but usually fast)
+- Spinner animation (negligible)
+
+**Check**: `htop` → which core is at 100%? A single core suggests parsing; all cores suggest parallel fetching.
+
+**Mitigation**:
+- Ensure `MaxConcurrentArticleFetches` is not excessively high (the default of 10 is fine)
+- Accept it: CPU spikes are normal during the fetch phase
+
+---
+
+### API Costs Higher Than Expected
+
+**Symptom**: OpenRouter dashboard shows high token usage.
+
+**Causes**:
+1. Using an expensive model (check `OPENROUTER_MODEL`)
+2. High `--chunks` → more tokens in context
+3. High `--queries` + `--results` → many articles → many embedding tokens (usually cheap)
+4. Long answers (many completion tokens), especially with `--long`
+
+**Mitigation**:
+- Use `qwen/qwen3.5-flash-02-23` (cheapest good option)
+- Reduce `--chunks` to 2-3
+- Use `--short` when a detailed answer is not needed
+- Set `MaxTokens` in the request (would need a code change)
+
+---
+
+## Debugging Strategies
+
+### 1. Enable Verbose Mode
+
+Always start with:
+```bash
+openquery -v "question" 2>&1 | tee debug.log
+```
+
+Logs everything:
+- Generated queries
+- URLs fetched
+- Progress counts
+- Errors/warnings
+
+**Analyze the log**:
+- How many queries were generated? (Should match `--queries`)
+- How many search results per query? (Should be ≤ `--results`)
+- How many articles fetched successfully?
+- How many chunks extracted? +- Any warnings? + +--- + +### 2. Isolate Components + +**Test SearxNG**: +```bash +curl "http://localhost:8002/search?q=test&format=json" | jq '.results[0]' +``` + +**Test OpenRouter API**: +```bash +curl -X POST https://openrouter.ai/api/v1/chat/completions \ + -H "Authorization: Bearer $OPENROUTER_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"model":"qwen/qwen3.5-flash-02-23","messages":[{"role":"user","content":"Hello"}]}' +``` + +**Test Article Fetching** (with known good URL): +```bash +curl -L "https://example.com/article" | head -50 +``` +Then check if SmartReader can parse. + +--- + +### 3. Reduce Scope + +Test with minimal parameters to isolate failing phase: + +```bash +# 1 query, 2 results, 1 chunk - should be fast and simple +openquery -q 1 -r 2 -c 1 "simple test question" -v + +# If that works, gradually increase: +openquery -q 1 -r 5 -c 1 "simple question" +openquery -q 3 -r 5 -c 1 "simple question" +openquery -q 3 -r 5 -c 3 "simple question" + +# Then try complex question +``` + +--- + +### 4. Check Resource Limits + +**File descriptors**: If fetching many articles, may hit limit. +```bash +ulimit -n # usually 1024, should be fine +``` + +**Memory**: Monitor with `free -h` while running. + +**Disk space**: Not much disk use, but logs could fill if verbose mode used repeatedly. + +--- + +### 5. Examine Config File + +```bash +cat ~/.config/openquery/config +# Ensure no spaces around '=' +# Correct: ApiKey=sk-or-... +# Wrong: ApiKey = sk-or-... (spaces become part of value) +``` + +Reconfigure if needed: +```bash +openquery configure --key "sk-or-..." +``` + +--- + +### 6. Clear Cache / Reset + +No persistent cache exists, but: +- Re-start SearxNG container: `docker restart searxng` +- Clear DNS cache if network issues: `sudo systemd-resolve --flush-caches` + +--- + +## Getting Help + +### Before Asking + +Gather information: + +1. **OpenQuery version** (commit or build date if available) +2. 
**OS and architecture**: `uname -a` (Linux/macOS) or `systeminfo` (Windows) +3. **Full command** you ran +4. **Verbose output**: `openquery -v "question" 2>&1 | tee log.txt` +5. **Config** (redact API key): +```bash +sed 's/ApiKey=.*/ApiKey=REDACTED/' ~/.config/openquery/config +``` +6. **SearxNG test**: +```bash +curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length' +``` +7. **OpenRouter test**: +```bash +curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \ + https://openrouter.ai/api/v1/models | jq '.data[0].id' +``` + +--- + +### Where to Ask + +1. **GitHub Issues** (if repository hosted there): + - Search existing issues first + - Provide all info from above + - Include log file (or link to gist) + +2. **Community Forum** (if exists) + +3. **Self-Diagnose**: + - Check `docs/troubleshooting.md` (this file) + - Check `docs/configuration.md` + - Check `docs/usage.md` + +--- + +### Example Bug Report + +``` +Title: OpenQuery hangs on "Fetching article X/Y" + +Platform: Ubuntu 22.04, .NET 10.0, OpenQuery built from commit abc123 +Command: openquery -v "What is Docker?" 2>&1 | tee log.txt + +Verbose output shows: +[...] 
+[Fetching article 1/15: docker.com]
+[Fetching article 2/15: hub.docker.com]
+[Fetching article 3/15: docs.docker.com]
+# Hangs here indefinitely, no more progress
+
+SearxNG test:
+$ curl "http://localhost:8002/search?q=docker&format=json" | jq '.results | length'
+15  # SearxNG works
+
+Config:
+ApiKey=sk-or-xxxx (redacted)
+Model=qwen/qwen3.5-flash-02-23
+DefaultQueries=3
+DefaultChunks=3
+DefaultResults=5
+
+Observation:
+- Fetches 3 articles fine, then stalls
+- Nothing in log after "Fetching article 3/15"
+- Process uses ~150MB memory, CPU 0% (idle)
+- Ctrl+C exits immediately
+
+Expected: Should fetch remaining 12 articles (up to 10 concurrently)
+Actual: Only 3 fetched, then silent hang
+```
+
+---
+
+## Known Issues
+
+### Issue: Spinner Characters Not Displaying
+
+Some terminals don't support Braille Unicode patterns.
+
+**Symptoms**: The spinner shows as `?` or boxes.
+
+**Fix**: Use a font with Unicode support, or disable the spinner by setting `TERM=dumb` or using `--verbose`.
+
+---
+
+### Issue: Progress Messages Overwritten
+
+In very fast operations, progress updates may overlap.
+
+**Cause**: `StatusReporter` uses `Console.Write` without a lock in compact mode; concurrent writes from the channel processor and the spinner task could interleave.
+
+**Mitigation**: Unlikely in practice (the channel serializes updates, and the spinner only redraws when `_currentMessage` is set). If it becomes a problem, add a lock around console operations.
+
+---
+
+### Issue: Articles with No Text Content
+
+Some URLs return articles with an empty `TextContent`.
+
+**Cause**: SmartReader's quality heuristic (`IsReadable`) failed, or the article truly has no text (an image, a script, an error page).
+
+**Effect**: Those URLs contribute zero chunks.
+
+**Acceptable**: This is part of normal operation; not all URLs yield readable content.
+
+---
+
+### Issue: Duplicate Sources in Answer
+
+The same website may appear multiple times (as different articles).
+
+**Cause**: URLs from different search results may come from the same domain but point to different pages.
+
+**Effect**: `[Source 1]` and `[Source 3]` could both be `example.com`. Not necessarily bad - they're different articles.
+
+---
+
+## Performance Tuning Reference
+
+| Setting | Default | Fastest | Most Thorough | Notes |
+|---------|---------|---------|---------------|-------|
+| `--queries` | 3 | 1 | 8+ | More queries = more searches |
+| `--results` | 5 | 2 | 15+ | More results = more articles to fetch |
+| `--chunks` | 3 | 1 | 5+ | More chunks = more context tokens |
+| `MaxConcurrentArticleFetches` | 10 | 5 | 20 | Higher = more parallel fetches |
+| `MaxConcurrentEmbeddingRequests` | 4 | 2 | 8 | Higher = faster embeddings (may hit rate limits) |
+| `EmbeddingBatchSize` | 300 | 100 | 1000 | Larger = fewer API calls, more data per call |
+
+**Start**: The defaults are balanced.
+
+**Adjust if**:
+- Slow: Reduce `--results`, `--queries`, or the concurrency limits
+- Poor quality: Increase `--chunks`, `--results`, `--queries`
+- Rate limited: Reduce the concurrency limits
+- High cost: Use `--short`, reduce `--chunks`, choose a cheaper model
+
+---
+
+## Next Steps
+
+- [Performance](performance.md) - Detailed performance analysis
+- [Configuration](configuration.md) - Adjust settings
+- [Usage](usage.md) - Optimize your workflow
+
+---
+
+**Quick Diagnostic Checklist**
+
+```bash
+# 1. Check API key
+echo $OPENROUTER_API_KEY | head -c 10
+
+# 2. Test SearxNG
+curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
+
+# 3. Test OpenRouter
+curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
+  https://openrouter.ai/api/v1/models | jq '.data[0].id'
+
+# 4. Run verbose
+openquery -v "test" 2>&1 | grep -E "Fetching|Generated|Found"
+
+# 5. Check resource usage while running
+htop
+
+# 6. 
Reduce scope and retry
+openquery -q 1 -r 2 -c 1 "simple test"
+```
diff --git a/docs/usage.md b/docs/usage.md
new file mode 100644
index 0000000..e89cdb5
--- /dev/null
+++ b/docs/usage.md
@@ -0,0 +1,483 @@
+# Usage Guide
+
+Complete reference for using the OpenQuery command-line interface.
+
+## 📋 Table of Contents
+
+1. [Basic Usage](#basic-usage)
+2. [Command Reference](#command-reference)
+3. [Examples](#examples)
+4. [Output Format](#output-format)
+5. [Tips and Tricks](#tips-and-tricks)
+
+## Basic Usage
+
+### Simplest Form
+```bash
+openquery "your question here"
+```
+
+That's it! OpenQuery will:
+1. Generate search queries
+2. Search the web
+3. Extract relevant content
+4. Stream an answer with sources
+
+### Common Pattern
+```bash
+openquery [OPTIONS] "your question"
+```
+
+Quotes around the question are recommended to preserve spaces.
+
+## Command Reference
+
+### Main Command
+
+#### `openquery [options] <question>`
+
+Ask a question and get an AI-powered answer with citations.
+
+**Arguments**:
+- `question` (positional, one or more words) - The question to ask
+
+**Options**:
+
+| Option | Aliases | Type | Default | Description |
+|--------|---------|------|---------|-------------|
+| `--chunks` | `-c` | int | 3 (from config) | Number of top relevant content chunks to include in context |
+| `--results` | `-r` | int | 5 (from config) | Number of search results to fetch per generated query |
+| `--queries` | `-q` | int | 3 (from config) | Number of search queries to generate from your question |
+| `--short` | `-s` | bool | false | Request a concise, to-the-point answer |
+| `--long` | `-l` | bool | false | Request a detailed, comprehensive answer |
+| `--verbose` | `-v` | bool | false | Show detailed progress information and debug output |
+
+**Behavior**:
+- If neither `--short` nor `--long` is given, the answer length is balanced
+- If both `--short` and `--long` are specified, `--long` takes precedence
+- Command-line options override configuration file defaults
+
+#### `openquery configure [options]`
+
+Configure OpenQuery settings (API key, model, defaults).
+
+**Options**:
+
+| Option | Type | Description |
+|--------|------|-------------|
+| `--interactive` / `-i` | bool | Launch interactive configuration wizard |
+| `--key` | string | Set the OpenRouter API key |
+| `--model` | string | Set the default model |
+| `--queries` | int? | Set default number of queries |
+| `--chunks` | int? | Set default number of chunks |
+| `--results` | int? | Set default number of results |
+
+**Examples**:
+```bash
+# Interactive wizard
+openquery configure -i
+
+# Set just the API key
+openquery configure --key "sk-or-..."
+
+# Set multiple defaults non-interactively
+openquery configure --model "deepseek/deepseek-v3.2" --queries 5 --chunks 4
+```
+
+**Note**: Options with `?` are nullable; only the provided values are updated.
+
+## Examples
+
+### Everyday Queries
+
+**Simple factual question**:
+```bash
+openquery "What is the speed of light?"
+```
+
+**Multi-word question**:
+```bash
+openquery "How do solar panels work?"
+```
+
+**Question with special characters**:
+```bash
+openquery "What's the weather in New York?"
+```
+
+### Customizing Output
+
+**Get a quick answer**:
+```bash
+openquery -s "Who is the CEO of Tesla?"
+```
+Output: "Elon Musk is the CEO of Tesla." (minimal explanation)
+
+**Get detailed analysis**:
+```bash
+openquery -l "Explain how nuclear fusion works"
+```
+Output: A multi-paragraph explanation with scientific detail
+
+**See everything**:
+```bash
+openquery -v "What is machine learning?"
+```
+Output: Shows all progress messages alongside the answer
+
+### Adjusting Search Depth
+
+**Minimal search** (fast, cheap):
+```bash
+openquery -q 1 -r 2 -c 1 "What time is it in London?"
+```
+- 1 generated query
+- 2 results per query
+- 1 context chunk
+
+**Thorough research** (slow, comprehensive):
+```bash
+openquery -q 8 -r 15 -c 5 "History and applications of cryptography"
+```
+- 8 diverse queries
+- 15 results each
+- 5 top chunks
+
+**Balanced (recommended defaults)**:
+```bash
+openquery "Latest advancements in CRISPR technology"
+```
+- 3 queries
+- 5 results each
+- 3 top chunks
+
+### Combining Options
+
+**Verbose custom search**:
+```bash
+openquery -v -q 5 -r 10 -c 4 "What are the ethical implications of AI?"
+```
+
+**Short answer with more context**:
+```bash
+openquery -s -c 5 "Python vs JavaScript for web development"
+```
+
+**Long answer, lots of research**:
+```bash
+openquery -l -q 10 -r 20 -c 6 "Complete guide to quantum computing"
+```
+
+### Practical Use Cases
+
+**News and Current Events**:
+```bash
+openquery "Latest developments in the Ukraine conflict"
+```
+
+**Technical Questions**:
+```bash
+openquery "How to set up a PostgreSQL replication cluster"
+```
+
+**Health Information** (verify with a doctor!):
+```bash
+openquery "What are the symptoms of vitamin D deficiency?"
+```
+
+**Cooking**:
+```bash
+openquery "How to make authentic Italian pizza dough"
+```
+
+**Travel**:
+```bash
+openquery "Best things to do in Tokyo in spring"
+```
+
+**Programming**:
+```bash
+openquery "Rust vs Go for backend development in 2025"
+```
+
+### Configuration Examples
+
+**Set up for the first time**:
+```bash
+openquery configure -i
+# Follow prompts to enter API key, choose model, set defaults
+```
+
+**Switch to a different model**:
+```bash
+openquery configure --model "google/gemini-3-flash-preview"
+```
+
+**Update default number of queries**:
+```bash
+openquery configure --queries 5
+```
+
+**Set cost-effective defaults**:
+```bash
+openquery configure --model "qwen/qwen3.5-flash-02-23" --queries 2 --chunks 2 --results 3
+```
+
+**Check your configuration**:
+```bash
+cat ~/.config/openquery/config
+```
+
+## Output Format
+
+### Standard Output (Streaming)
+
+The answer streams in real-time, character by character, like this:
+
+```
+⠋ Generating search queries... (spinner with status)
+⠹ Searching web...
+⠸ Fetching articles...
+⠼ Processing embeddings...
+⠴ Generating answer...
+Assistant: Quantum entanglement is a phenomenon where pairs or groups of
+particles interact in ways such that the quantum state of each particle
+cannot be described independently of the others, even when separated by
+large distances.
+
+[Source 1: Understanding Quantum Mechanics](https://example.com/quantum)
+[Source 2: Quantum Physics Overview](https://example.com/physics)
+```
+
+### Verbose Mode Output (`-v`)
+
+When `--verbose` is enabled, you see detailed progress:
+
+```
+[Generating 3 search queries based on your question...]
+[Generated queries:
+  1. quantum entanglement definition
+  2. how quantum entanglement works
+  3. quantum entanglement Bell's theorem
+]
+[Searching web for 'quantum entanglement definition'...]
+[Searching web for 'how quantum entanglement works'...]
+[Searching web for 'quantum entanglement Bell's theorem'...]
+[Fetched 15 search results total]
+[Fetching article 1/12: physicsworld.com]
+[Fetching article 2/12: nature.com]
+...
+[Fetching article 12/12: scientificamerican.com]
+[Extracted 48 content chunks]
+[Generating embeddings: batch 1/4]
+[Generating embeddings: batch 2/4]
+[Generating embeddings: batch 3/4]
+[Generating embeddings: batch 4/4]
+[Ranked chunks by relevance]
+[Found top 3 most relevant chunks overall. Generating answer...]
+
+Assistant: Quantum entanglement is a fundamental phenomenon in quantum
+mechanics where...
+```
+
+### Source Citations
+
+Sources are formatted as markdown links in the answer:
+```
+[Source 1: Article Title](https://example.com/article)
+```
+
+These appear inline where the AI references that source. Multiple sources can be cited in a single paragraph.
+
+### Error Output
+
+Errors are written to stderr, and the program exits with a non-zero status:
+
+```
+[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i'.
+```
+
+## Tips and Tricks
+
+### Speed Tips
+
+1. **Reduce concurrency limits** (edit `SearchTool.cs` if constantly rate-limited)
+2. **Reduce `--results`** - fewer articles to fetch and process
+3. **Reduce `--queries`** - fewer parallel searches
+4. **Use local SearxNG** - minimize network latency to the search backend
+5. **Cache results** - a future enhancement could add caching
+
+### Quality Tips
+
+1. **Increase `--chunks`** to 4-5 for complex topics
+2. **Increase `--queries`** to 5-8 for broad exploration
+3. **Use `--long`** for deep topics that need elaboration
+4. **Check `-v` output** to see which sources were selected
+5. **Try different models** - some are better at synthesis, others at facts
+
+### Cost Tips
+
+1. **Use `qwen/qwen3.5-flash-02-23`** - cheapest good model
+2. **Reduce `--chunks` and `--results`** - fewer tokens in context
+3. **Use `--short`** - shorter answers use fewer completion tokens
+4. 
**Monitor usage** at [openrouter.ai](https://openrouter.ai) dashboard + +### Workflow Tips + +**Iterative deepening**: +```bash +# Start broad +openquery -v "machine learning" + +# Identify subtopics from answer, then dive deeper +openquery "What is transformer architecture in LLMs?" +``` + +**Compare answers**: +```bash +# Same question with different models +OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23" openquery "question" +OPENROUTER_MODEL="google/gemini-3-flash-preview" openquery "question" +``` + +**Save answers**: +```bash +openquery "What is Docker?" > answer.md +# answer.md will contain the streamed output (including spinner chars, so filter): +openquery "What is Docker?" 2>/dev/null | sed 's/.\x08//g' > clean-answer.md +``` + +### Shell Aliases and Functions + +Add to `~/.bashrc` or `~/.zshrc`: + +```bash +# Short alias +alias oq='openquery' + +# With common options +alias oql='openquery -l -q 5 -r 10' # long, thorough +alias oqs='openquery -s' # short +alias oqv='openquery -v' # verbose + +# Function to save output cleanly +oqsave() { + openquery "$@" 2>/dev/null | sed 's/.\x08//g' > "answer-$(date +%Y%m%d-%H%M%S).md" +} +``` + +### Scripting + +```bash +#!/bin/bash +# batch-questions.sh + +while IFS= read -r question; do + echo "## $question" >> research.md + echo "" >> research.md + openquery -l "$question" 2>/dev/null | sed 's/.\x08//g' >> research.md + echo "" >> research.md +done < questions.txt +``` + +### Chaining with Other Tools + +Pipe to `jq` (if you modify to output JSON): +```bash +# Future: openquery --json "question" | jq '.answer' +``` + +Pipe to `pbcopy` (macOS) or `xclip` (Linux): +```bash +openquery "quick fact" 2>/dev/null | sed 's/.\x08//g' | pbcopy +``` + +Filter sources: +```bash +openquery "topic" 2>/dev/null | sed 's/.\x08//g' | grep -E '^\[Source' +``` + +## Keyboard Interrupts + +- **Ctrl+C** during processing: Cancels current operation, exits gracefully +- **Ctrl+C** during streaming answer: Stops streaming, shows partial answer 
+- **Ctrl+Z** (suspend): Not recommended; may leave background tasks running
+
+OpenQuery uses proper cancellation tokens to clean up resources on interrupt.
+
+## Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| 0 | Success - answer was generated |
+| 1 | Error - see stderr message |
+| 2 | Configuration error (missing API key) |
+
+You can check the exit code in shell scripts:
+```bash
+openquery "question"
+if [ $? -eq 0 ]; then
+  echo "Success!"
+else
+  echo "Failed"
+fi
+```
+
+## Limitations and Workarounds
+
+### Question Length
+Very long questions (>2000 chars) may exceed the LLM's context window or hit token limits.
+
+**Workaround**: Keep questions concise; ask complex multi-part questions as separate queries.
+
+### Answer Length Limits
+The LLM may hit `max_tokens` limits for very complex questions.
+
+**Workaround**: Use the `--long` flag (it already maximizes the allowed tokens) or break the question into sub-questions.
+
+### Rate Limiting
+OpenRouter may rate-limit you if you send too many requests too quickly.
+
+**Symptoms**: 429 errors, occasional timeouts.
+
+**Workaround**: The built-in retry (Polly) handles this automatically. For persistent issues:
+- Reduce concurrency (edit code)
+- Add delays between queries
+- Upgrade your OpenRouter plan
+
+### SearxNG Timeouts
+Large SearxNG responses or slow targets may time out.
+
+**Workaround**: Reduce `--results` or check the SearxNG logs. There is nothing to change on the OpenQuery side (the HTTP client timeout defaults to ~30s).
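+
+To check whether SearxNG itself is the slow link, curl's built-in timing variables can measure a single search round-trip. A diagnostic sketch, assuming the default local instance URL:
+
+```bash
+# Time one SearxNG query; the response body is discarded
+curl -o /dev/null -s \
+  -w 'total: %{time_total}s connect: %{time_connect}s\n' \
+  "http://localhost:8002/search?q=test&format=json"
+```
+
+If `total` regularly approaches the ~30s client timeout while `connect` stays low, the bottleneck is the search backend rather than OpenQuery.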
+ +--- + +## Next Steps + +- [Architecture](architecture.md) - Understand how OpenQuery works under the hood +- [Configuration](configuration.md) - Fine-tune your setup +- [Troubleshooting](troubleshooting.md) - Solve common problems + +--- + +**Quick Reference Card** + +```bash +# Basic +openquery "question" + +# Quick fact +openquery -s "question" + +# Deep research +openquery -l -q 5 -r 10 -c 4 "question" + +# See progress +openquery -v "question" + +# Configure +openquery configure -i + +# Check config +cat ~/.config/openquery/config +```
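+
+# Save a clean answer (strips backspace-erased spinner characters)
+openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md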