docs: add comprehensive documentation with README and detailed guides

- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
This commit is contained in:
OpenQuery Documentation
2026-03-19 10:01:58 +01:00
parent b28d8998f7
commit 65ca2401ae
16 changed files with 7073 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,196 @@
# OpenQuery
**AI-powered search and answer system** that finds accurate, well-sourced answers to your questions by searching the web, extracting relevant content, and synthesizing intelligent responses.
![.NET](https://img.shields.io/badge/.NET-10.0-blue)
![AOT](https://img.shields.io/badge/AOT-Compiled-green)
![License](https://img.shields.io/badge/license-MIT-green)
## ✨ Features
- 🤖 **Smart Query Generation** - Automatically creates multiple diverse search queries from your question
- ⚡ **Parallel Processing** - Fast concurrent searches, article fetching, and embedding generation
- 🎯 **Semantic Search** - Uses vector embeddings to find the most relevant information
- 📚 **Clean Article Extraction** - Intelligently extracts article content using SmartReader
- 🔄 **Streaming Responses** - Watch the AI answer generate in real-time
- ⚙️ **Fully Configurable** - Control queries, results, and context chunks
- 🛡️ **Production Ready** - Built with rate limiting, retries, and error handling
## 🚀 Quick Start
### 1. Prerequisites
- A **SearxNG** instance (Docker recommended):
```bash
docker run -d --name searxng -p 8002:8080 searxng/searxng:latest
```
- An **OpenRouter API key** from [openrouter.ai](https://openrouter.ai)
### 2. Installation
```bash
# Clone and build
git clone <your-repo-url>
cd OpenQuery
chmod +x install.sh
./install.sh
# Or build manually
dotnet publish -c Release -r linux-x64 --self-contained true /p:PublishAot=true
```
### 3. Configuration
```bash
# Interactive setup
openquery configure -i
# Or set environment variables
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002" # default
```
### 4. Ask a Question
```bash
openquery "What is quantum entanglement and how does it work?"
```
That's it! The system will:
1. Generate 3 search queries (configurable)
2. Search the web via SearxNG
3. Extract and chunk relevant articles
4. Rank content by semantic relevance
5. Stream a comprehensive answer with citations
## 📖 Usage Examples
```bash
# Concise answer
openquery -s "Who won the 2024 US presidential election?"
# Detailed research
openquery -l -q 5 -r 10 "Explain quantum computing and its applications"
# See everything
openquery -v "What are the health benefits of meditation?"
# Customize
openquery -c 5 -r 8 "Current state of SpaceX Starship development"
```
## 🔧 Options
```
-c, --chunks N Number of top context chunks (default: 3)
-r, --results N Search results per query (default: 5)
-q, --queries N Number of search queries to generate (default: 3)
-s, --short Give a concise answer
-l, --long Give a detailed answer
-v, --verbose Show detailed progress
```
## 🌐 Supported Models
OpenQuery works with any OpenRouter model. Popular choices:
- `qwen/qwen3.5-flash-02-23` (default, fast & affordable)
- `google/gemini-3-flash-preview`
- `deepseek/deepseek-v3.2`
- `moonshotai/kimi-k2.5`
Configure your preferred model:
```bash
openquery configure --model "google/gemini-3-flash-preview"
```
## 📁 Project Structure
```
OpenQuery/
├── README.md # This file
├── docs/ # Detailed documentation
│ ├── installation.md
│ ├── configuration.md
│ ├── usage.md
│ ├── architecture.md
│ ├── components/
│ └── troubleshooting.md
├── Program.cs # CLI entry point
├── OpenQuery.cs # Main application logic
├── Services/ # Business logic services
├── Models/ # Data models
├── Tools/ # Search orchestration
└── ConfigManager.cs # Configuration management
```
## 🏗️ Architecture
OpenQuery uses a multi-stage pipeline:
```
Query → Multiple Searches → Article Fetching → Embeddings → Ranking → AI Answer
```
1. **Query Expansion**: LLM generates diverse search queries
2. **Parallel Search**: SearxNG executes all queries simultaneously
3. **Content Extraction**: SmartReader pulls clean article text
4. **Embedding Generation**: Vectorize query and chunks
5. **Semantic Ranking**: Cosine similarity scoring
6. **Answer Synthesis**: Final LLM response with sources
## 🔍 How It Works
1. **You ask a question** → OpenQuery generates 3 optimized search queries
2. **Searches the web** → All queries run in parallel via SearxNG
3. **Fetches articles** → Extracts clean content from top results
4. **Splits into chunks** → ~500 character pieces for embedding
5. **Ranks by relevance** → Semantic similarity to your question
6. **Synthesizes answer** → LLM reviews top 3 chunks and responds with citations
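To get a feel for the chunking in step 4, a fixed-width split can be mimicked at the shell. This is only an analogy: `fold` breaks at a character budget (word-aware with `-s`), while the real `ChunkingService` may use different boundaries.

```shell
# Generate ~1500 chars of sample text, then split into ~500-char,
# word-preserving pieces and number them. Illustration only; the
# actual ChunkingService logic may differ.
text=$(printf 'word %.0s' $(seq 1 300))
printf '%s\n' "$text" | fold -s -w 500 | nl
```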
## 🐛 Troubleshooting
**Common issues** and solutions:
| Problem | Solution |
|---------|----------|
| "API Key is missing" | Run `openquery configure -i` or set `OPENROUTER_API_KEY` |
| No search results | Check your SearxNG instance is running (`curl http://localhost:8002`) |
| Slow performance | Reduce `--results` or `--queries` count |
| Articles failing to fetch | Some sites block scrapers; try different queries |
See [docs/troubleshooting.md](docs/troubleshooting.md) for detailed help.
## 📚 Documentation
- **[Installation Guide](docs/installation.md)** - Build and setup instructions
- **[Configuration](docs/configuration.md)** - All config options and environment variables
- **[Usage Guide](docs/usage.md)** - Complete CLI reference and examples
- **[Architecture](docs/architecture.md)** - System design and patterns
- **[Components](docs/components/)** - Deep dive into each module
- **[Troubleshooting](docs/troubleshooting.md)** - Solve common problems
- **[API Reference](docs/api-reference.md)** - Programmatic interfaces
## 🤝 Contributing
Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## 📄 License
MIT License - see LICENSE file for details.
## 🙏 Acknowledgments
- [OpenRouter](https://openrouter.ai) - LLM API aggregation
- [SearxNG](https://searx.space) - Privacy-respecting metasearch
- [SmartReader](https://github.com/kfasten/SmartReader) - Article extraction
- Built with [.NET](https://dotnet.microsoft.com)
---
**Need more details?** Check the comprehensive documentation in the [docs/](docs/) folder.

docs/api/cli.md Normal file

@@ -0,0 +1,309 @@
# CLI Reference
Complete command-line interface reference for OpenQuery.
## 📋 Table of Contents
1. [Command Structure](#command-structure)
2. [Main Command: `openquery`](#main-command-openquery)
3. [Configure Command: `openquery configure`](#configure-command-openquery-configure)
4. [Exit Codes](#exit-codes)
5. [Examples by Use Case](#examples-by-use-case)
6. [Shell Integration](#shell-integration)
## Command Structure
OpenQuery uses [System.CommandLine](https://learn.microsoft.com/dotnet/standard/commandline/) for CLI parsing.
### Syntax
```bash
openquery [GLOBAL-OPTIONS] <COMMAND> [COMMAND-OPTIONS] [ARGUMENTS]
```
If no command is specified, the main `openquery` command is assumed.
### Help
```bash
openquery --help
openquery configure --help
```
Shows usage, options, examples.
### Version
```bash
openquery --version # if implemented
```
---
## Main Command: `openquery`
Ask a question and get an AI-powered answer.
### Synopsis
```bash
openquery [OPTIONS] <question>
```
### Arguments
| Name | Arity | Type | Description |
|------|-------|------|-------------|
| `question` | ZeroOrMore | `string[]` | The question to ask (positional, concatenated with spaces) |
**Notes**:
- `ZeroOrMore` means you can omit the question (shows help)
- Multiple words are combined: `openquery what is quantum` → `"what is quantum"`
- Use quotes for questions with special characters: `openquery "what's the weather?"`
### Options
| Option | Aliases | Type | Default | Description |
|--------|---------|------|---------|-------------|
| `--chunks` | `-c` | `int` | `DefaultChunks` (config) | Number of top context chunks to pass to LLM |
| `--results` | `-r` | `int` | `DefaultResults` (config) | Number of search results per query |
| `--queries` | `-q` | `int` | `DefaultQueries` (config) | Number of search queries to generate |
| `--short` | `-s` | `bool` | `false` | Request a concise answer |
| `--long` | `-l` | `bool` | `false` | Request a detailed answer |
| `--verbose` | `-v` | `bool` | `false` | Show detailed progress information |
**Option Notes**:
- `--short` and `--long` are flags; if both specified, `--long` takes precedence
- Integer options validate as positive numbers (parsed by System.CommandLine)
- Defaults come from config file or hardcoded (3, 5, 3 respectively)
### Behavior
1. Loads API key (env `OPENROUTER_API_KEY` or config file)
2. Loads model (env `OPENROUTER_MODEL` or config)
3. Executes workflow:
- Generate queries (if `--queries > 1`)
- Run search pipeline
- Stream final answer
4. Exits with code 0 on success, 1 on error
### Examples
```bash
# Basic
openquery "What is the capital of France?"
# With options
openquery -q 5 -r 10 -c 4 "Explain quantum computing"
# Short answer
openquery -s "Who won the 2024 election?"
# Verbose mode
openquery -v "How does photosynthesis work?"
# Combined
openquery -l -v -q 8 "History of the internet"
```
---
## Configure Command: `openquery configure`
Configure OpenQuery settings (API key, model, defaults).
### Synopsis
```bash
openquery configure [OPTIONS]
```
### Options
| Option | Type | Description |
|--------|------|-------------|
| `--interactive` / `-i` | `bool` | Launch interactive configuration wizard |
| `--key` | `string` | Set OpenRouter API key |
| `--model` | `string` | Set default LLM model |
| `--queries` | `int?` | Set default number of queries |
| `--chunks` | `int?` | Set default number of chunks |
| `--results` | `int?` | Set default number of results |
**Note**: Nullable options (`int?`) only update if provided.
### Behavior
- **Interactive mode** (`-i`): Prompts for each setting with current defaults shown in brackets
- **Non-interactive**: Only updates provided options, leaves others untouched
- Writes to `~/.config/openquery/config` (creates directory if missing)
- Overwrites entire file (not incremental)
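For reference, the resulting file is a flat key/value list along these lines (key names inferred from the settings described above; the actual format may differ):

```
ApiKey=sk-or-...
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=3
DefaultChunks=3
DefaultResults=5
```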
### Interactive Mode Details
Models presented with numbered menu:
```
Available models:
1. qwen/qwen3.5-flash-02-23
2. qwen/qwen3.5-122b-a10b
3. minimax/minimax-m2.5
4. google/gemini-3-flash-preview
5. deepseek/deepseek-v3.2
6. moonshotai/kimi-k2.5
Model [qwen/qwen3.5-flash-02-23]:
```
- Enter number (1-6) to select preset
- Or enter custom model string (any OpenRouter model)
### Examples
```bash
# Interactive wizard
openquery configure -i
# Set just API key
openquery configure --key "sk-or-xxxxxxxxxxxx"
# Set multiple defaults
openquery configure --model "google/gemini-3-flash-preview" --queries 5 --chunks 4
# Update model only
openquery configure --model "deepseek/deepseek-v3.2"
```
---
## Exit Codes
| Code | Meaning |
|------|---------|
| `0` | Success - answer generated and streamed |
| `1` | Error - API key missing, network failure, or exception |
**Usage in scripts**:
```bash
openquery "question"
if [ $? -eq 0 ]; then
  echo "Success"
else
  echo "Failed" >&2
fi
```
---
## Examples by Use Case
### Quick Facts
```bash
openquery -s "capital of France"
```
Fast, concise, minimal tokens.
### Research Paper
```bash
openquery -l -q 5 -r 10 -c 4 "quantum entanglement experiments"
```
Multiple angles, deep sources, detailed synthesis.
### News & Current Events
```bash
openquery -v "latest news about OpenAI"
```
See everything: queries, results, which sources fetched.
### Troubleshooting
```bash
# Reduce scope if errors
openquery -q 1 -r 2 "test question"
```
### Save Answer to File
```bash
openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md
```
(Removes spinner characters)
### Batch Processing
```bash
while IFS= read -r q; do
  echo "## $q" >> all-answers.md
  openquery -s "$q" 2>/dev/null | sed 's/.\x08//g' >> all-answers.md
  echo "" >> all-answers.md
done < questions.txt
```
---
## Shell Integration
### Aliases (add to ~/.bashrc or ~/.zshrc)
```bash
# Short alias
alias oq='openquery'
# Presets
alias oqs='openquery -s' # short
alias oql='openquery -l' # long
alias oqv='openquery -v' # verbose
alias oqr='openquery -q 5 -r 10 -c 4' # research mode
# Config shortcuts
alias oqcfg='openquery configure -i'
```
### Functions
```bash
# Save answer cleanly (removes spinner chars)
oqsave() {
  local query="$*"
  local filename="answer-$(date +%Y%m%d-%H%M%S).md"
  openquery "$query" 2>/dev/null | sed 's/.\x08//g' > "$filename"
  echo "Saved to $filename"
}
# Search and grep results
oqgrep() {
  openquery "$1" 2>/dev/null | sed 's/.\x08//g' | grep -i "$2"
}
```
### Environment Setup Script
```bash
# ~/.local/bin/openquery-env.sh
export OPENROUTER_API_KEY="sk-or-..."
export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23"
export SEARXNG_URL="http://localhost:8002"
```
Source it: `source ~/.local/bin/openquery-env.sh`
---
## Next Steps
- **[Configuration](configuration.md)** - Set up your environment
- **[Usage](usage.md)** - Learn usage patterns and tips
- **[Troubleshooting](troubleshooting.md)** - Fix common problems
---
**Quick Reference Card**
```
# Ask
openquery "question"
openquery -s "quick fact"
openquery -l -q 5 "deep research"
# Configure
openquery configure -i
openquery configure --key "..."
openquery configure --model "..."
# Debug
openquery -v "question"
# Help
openquery --help
```


@@ -0,0 +1,235 @@
# Environment Variables
Reference for all environment variables used by OpenQuery.
## 📋 Summary
| Variable | Purpose | Required | Default | Example |
|----------|---------|----------|---------|---------|
| `OPENROUTER_API_KEY` | OpenRouter authentication | **Yes** | (none) | `sk-or-...` |
| `OPENROUTER_MODEL` | Override default LLM model | No | `qwen/qwen3.5-flash-02-23` | `google/gemini-3-flash-preview` |
| `SEARXNG_URL` | SearxNG instance URL | No | `http://localhost:8002` | `https://searx.example.com` |
## Detailed Reference
### `OPENROUTER_API_KEY`
**Purpose**: Your OpenRouter API authentication token.
**Required**: Yes, unless you have `ApiKey` set in config file.
**How to Obtain**:
1. Sign up at https://openrouter.ai
2. Go to Dashboard → API Keys
3. Copy your key (starts with `sk-or-`)
**Priority**: Overrides config file `ApiKey`.
**Setting**:
```bash
# Bash/Zsh
export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Fish
set -x OPENROUTER_API_KEY "sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# PowerShell
$env:OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Windows CMD
set OPENROUTER_API_KEY=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Security**:
- Never commit API key to version control
- Don't share key publicly
- Use environment variables or config file with restrictive permissions (600)
- Rotate key if accidentally exposed
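Restricting the config file to mode 600, as recommended above, is a single command (assuming the file already exists):

```shell
# Make the config readable/writable by the owner only (mode 600)
chmod 600 ~/.config/openquery/config
# Verify: permissions should show as -rw-------
ls -l ~/.config/openquery/config
```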
**Validation**: OpenQuery checks if key is empty string and exits with error if missing:
```
[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up.
```
### `OPENROUTER_MODEL`
**Purpose**: Override the default LLM model used for both query generation and final answer.
**Required**: No.
**Default**: `qwen/qwen3.5-flash-02-23`
**Available Models** (from OpenRouter):
| Model | Provider | Context | Cost (Input/Output per 1M tokens) |
|-------|----------|---------|-----------------------------------|
| `qwen/qwen3.5-flash-02-23` | Alibaba | 200K | \$0.10 / \$0.20 |
| `qwen/qwen3.5-122b-a10b` | Alibaba | 200K | ~\$0.20 / ~\$0.40 |
| `minimax/minimax-m2.5` | MiniMax | 200K | ~\$0.20 / ~\$0.40 |
| `google/gemini-3-flash-preview` | Google | 1M | ~\$0.10 / ~\$0.40 |
| `deepseek/deepseek-v3.2` | DeepSeek | 200K | ~\$0.10 / ~\$0.30 |
| `moonshotai/kimi-k2.5` | Moonshot AI | 200K | ~\$0.10 / ~\$0.30 |
(See OpenRouter for current pricing.)
**Setting**:
```bash
export OPENROUTER_MODEL="google/gemini-3-flash-preview"
```
**Interactive Config Models**: The `configure -i` wizard shows only these 6 models for convenience, but you can set any OpenRouter model via environment variable or non-interactive configure.
**Note**: Different models have different:
- Speed (Flash models faster)
- Cost (check pricing)
- Quality (may vary by task)
- Context window size (Gemini 3 Flash has 1M tokens, others ~200K)
### `SEARXNG_URL`
**Purpose**: URL of the SearxNG metasearch instance.
**Required**: No.
**Default**: `http://localhost:8002`
**Format**: Must include protocol (`http://` or `https://`) and host:port.
**Setting**:
```bash
# Local Docker instance
export SEARXNG_URL="http://localhost:8002"
# Remote instance with HTTPS
export SEARXNG_URL="https://searx.example.com"
# Custom port
export SEARXNG_URL="http://localhost:8080"
```
**Finding a Public Instance**:
- Visit https://searx.space for list of public instances
- Choose one with HTTPS and low latency
- Note: Public instances may have rate limits or require attribution
**Priority**: The environment variable overrides the built-in default. There is currently no config-file setting for the SearxNG URL (environment variable only); one could be added in the future.
**Test Your Instance**:
```bash
curl "$SEARXNG_URL/search?q=test&format=json" | head
```
Expected: JSON with `"results": [...]`.
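To inspect the response more readably, `jq` can pull out each result's title and URL (this assumes the `format=json` response shape described above, with a top-level `results` array):

```shell
# Print "title<TAB>url" for each search result (requires jq)
curl -s "$SEARXNG_URL/search?q=test&format=json" \
  | jq -r '.results[] | "\(.title)\t\(.url)"'
```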
---
## Configuration Priority Recap
When OpenQuery needs a value:
1. **Command-line option** (`--model`, `--key` from configure) - highest
2. **Environment variable** (`OPENROUTER_MODEL`, `OPENROUTER_API_KEY`, `SEARXNG_URL`)
3. **Configuration file** (`~/.config/openquery/config`: `Model`, `ApiKey`)
4. **Hard-coded default** (only for model)
**Example**:
```bash
# Config file: Model=qwen/qwen3.5-flash-02-23
export OPENROUTER_MODEL="deepseek/deepseek-v3.2"
openquery --model "google/gemini-3-flash-preview" "question"
# Uses: model=google (CLI override), overrides env and config
```
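The same precedence chain can be expressed with shell parameter defaults; here `CLI_MODEL` and `CONFIG_MODEL` are hypothetical placeholders for the parsed CLI option and the config-file value:

```shell
# Resolution order, highest to lowest: CLI option, env var,
# config file, hard-coded default. CLI_MODEL and CONFIG_MODEL
# are illustrative names, not real OpenQuery variables.
MODEL="${CLI_MODEL:-${OPENROUTER_MODEL:-${CONFIG_MODEL:-qwen/qwen3.5-flash-02-23}}}"
echo "Using model: $MODEL"
```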
---
## Troubleshooting Environment Variables
### Variable Not Taking Effect
**Symptom**: `openquery` still uses old value after export.
**Causes**:
- Exported in different shell session
- Exported after running `openquery`
- Shell profile not reloaded
**Check**:
```bash
echo $OPENROUTER_API_KEY
# Should print the key (or blank if unset)
```
**Fix**:
```bash
# Export in current session
export OPENROUTER_API_KEY="sk-or-..."
# Or add to ~/.bashrc / ~/.zshrc and restart terminal
```
### Special Characters in Values
If your API key contains special characters (`$`, `!`, etc.), quote properly:
```bash
export OPENROUTER_API_KEY='sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
# Single quotes prevent shell expansion
```
### Variable Name Typos
`OPENROUTER_API_KEY` is all caps with underscores. `openrouter_api_key` (lowercase) won't work.
**Check spelling**:
```bash
env | grep -i openrouter
```
### Windows Environment Variables
On Windows, environment variables are set per-session or user-level:
**PowerShell** (current session):
```powershell
$env:OPENROUTER_API_KEY="sk-or-..."
```
**Persistent** (PowerShell):
```powershell
[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User")
```
**CMD**:
```cmd
set OPENROUTER_API_KEY=sk-or-...
```
**System Properties** → Advanced → Environment Variables (GUI)
---
## Next Steps
- **[Configuration File](../configuration.md)** - Persistent configuration
- **[Usage Guide](../usage.md)** - How to use these variables
- **[Troubleshooting](../troubleshooting.md)** - Diagnose environment issues
---
**Quick Reference**
```bash
# Required
export OPENROUTER_API_KEY="sk-or-..."
# Optional (override defaults)
export OPENROUTER_MODEL="google/gemini-3-flash-preview"
export SEARXNG_URL="https://searx.example.com"
# Run
openquery "your question"
```

docs/api/programmatic.md Normal file

@@ -0,0 +1,508 @@
# Programmatic API Reference
How to use OpenQuery components programmatically in your own C# code.
## 📋 Table of Contents
1. [Overview](#overview)
2. [Using OpenQueryApp Programmatically](#using-openqueryapp-programmatically)
3. [Using Individual Services](#using-individual-services)
4. [Custom Implementations](#custom-implementations)
5. [Thread Safety](#thread-safety)
6. [Error Handling](#error-handling)
## Overview
OpenQuery is designed as a library of composable services, not just a CLI tool. You can reference the project (or extract the core classes) and use them in your own applications.
### Core Interfaces
Currently, OpenQuery uses concrete classes rather than interfaces. To use programmatically:
1. Reference the `OpenQuery` project/dll
2. Add `using OpenQuery.Services;` and `using OpenQuery.Tools;`
3. Instantiate dependencies
4. Call methods
### Dependency Chain
```
Your Code
├── OpenRouterClient (LLM API)
├── SearxngClient (Search API)
├── EmbeddingService (requires OpenRouterClient)
└── SearchTool (requires SearxngClient + EmbeddingService)
└── (internally uses ArticleService, ChunkingService, RateLimiter)
```
---
## Using OpenQueryApp Programmatically
### Minimal Example
```csharp
using OpenQuery;
using OpenQuery.Services;
using OpenQuery.Tools;
using OpenQuery.Models;
// 1. Configure
string apiKey = Environment.GetEnvironmentVariable("OPENROUTER_API_KEY")
?? throw new InvalidOperationException("API key required");
string searxngUrl = Environment.GetEnvironmentVariable("SEARXNG_URL")
?? "http://localhost:8002";
string model = Environment.GetEnvironmentVariable("OPENROUTER_MODEL")
?? "qwen/qwen3.5-flash-02-23";
// 2. Instantiate services
var openRouterClient = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(openRouterClient);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(openRouterClient, searchTool, model);
// 3. Execute
var options = new OpenQueryOptions(
Chunks: 3,
Results: 5,
Queries: 3,
Short: false,
Long: false,
Verbose: false,
Question: "What is quantum entanglement?"
);
await openQuery.RunAsync(options);
```
**Output**: Streams answer to `Console.Out` (hardcoded in `OpenQueryApp`). To capture output, modify `OpenQueryApp` or redirect console.
### Capturing Output
`OpenQueryApp.RunAsync` writes directly to `Console`. To capture:
**Option 1**: Redirect Console (hacky)
```csharp
var sw = new StringWriter();
Console.SetOut(sw);
await openQuery.RunAsync(options);
string answer = sw.ToString();
```
**Option 2**: Modify OpenQueryApp to accept TextWriter (not currently supported)
**Option 3**: Reimplement using OpenQuery components without `OpenQueryApp`
```csharp
public async Task<string> GetAnswerAsync(string question, OpenQueryOptions options)
{
var sb = new StringBuilder();
var reporter = new StatusReporter(options.Verbose);
// Replicate OpenQueryApp.RunAsync but collect output
// ... (copy logic from OpenQuery.cs)
return sb.ToString();
}
```
---
## Using Individual Services
### OpenRouterClient
```csharp
var client = new OpenRouterClient("your-api-key");
// Non-streaming chat completion
var request = new ChatCompletionRequest(
model: "qwen/qwen3.5-flash-02-23",
messages: new List<Message>
{
new Message("system", "You are a helpful assistant."),
new Message("user", "What is 2+2?")
}
);
var response = await client.CompleteAsync(request);
Console.WriteLine(response.Choices[0].Message.Content);
// Streaming chat completion
var streamRequest = request with { Stream = true };
await foreach (var chunk in client.StreamAsync(streamRequest))
{
if (chunk.TextDelta != null)
Console.Write(chunk.TextDelta);
}
// Embeddings
var embeddingRequest = new EmbeddingRequest(
model: "openai/text-embedding-3-small",
input: new List<string> { "text 1", "text 2" }
);
float[][] embeddings = await client.EmbedAsync(embeddingRequest.Model, embeddingRequest.Input);
// embeddings[0] is vector for "text 1"
```
### SearxngClient
```csharp
var searxng = new SearxngClient("http://localhost:8002");
List<SearxngResult> results = await searxng.SearchAsync("quantum physics", limit: 5);
foreach (var result in results)
{
Console.WriteLine($"{result.Title}");
Console.WriteLine($"{result.Url}");
Console.WriteLine($"{result.Content}");
Console.WriteLine();
}
```
### EmbeddingService
```csharp
var client = new OpenRouterClient("your-api-key");
var embeddingService = new EmbeddingService(client); // default model: openai/text-embedding-3-small
// Single embedding
float[] embedding = await embeddingService.GetEmbeddingAsync("Hello world");
// Batch embeddings (with progress)
List<string> texts = new() { "text 1", "text 2", "text 3" };
float[][] embeddings = await embeddingService.GetEmbeddingsAsync(
texts,
onProgress: msg => Console.WriteLine(msg)
);
// Cosine similarity
float similarity = EmbeddingService.CosineSimilarity(embedding1, embedding2);
```
### ArticleService
```csharp
var article = await ArticleService.FetchArticleAsync("https://example.com/article");
Console.WriteLine(article.Title);
Console.WriteLine(article.TextContent);
Console.WriteLine($"Readable: {article.IsReadable}");
```
Note: `Article` type comes from SmartReader library (not OpenQuery-specific).
### ChunkingService
```csharp
List<string> chunks = ChunkingService.ChunkText("Long article text...");
foreach (var chunk in chunks)
{
Console.WriteLine($"Chunk ({chunk.Length} chars): {chunk.Substring(0, 50)}...");
}
```
### SearchTool (Orchestration)
```csharp
var searxngClient = new SearxngClient("http://localhost:8002");
var embeddingService = new EmbeddingService(openRouterClient);
var searchTool = new SearchTool(searxngClient, embeddingService);
string context = await searchTool.ExecuteAsync(
originalQuery: "What is quantum entanglement?",
generatedQueries: new List<string>
{
"quantum entanglement definition",
"how quantum entanglement works"
},
maxResults: 5,
topChunksLimit: 3,
onProgress: msg => Console.WriteLine(msg),
verbose: true
);
Console.WriteLine("Context:");
Console.WriteLine(context);
```
Output is a formatted string:
```
[Source 1: Title](https://example.com/1)
Content chunk...
[Source 2: Title](https://example.com/2)
Content chunk...
```
---
## Custom Implementations
### Custom Progress Reporter
`SearchTool.ExecuteAsync` accepts `Action<string>? onProgress`. Provide your own:
```csharp
public class MyProgressReporter
{
public void Report(string message)
{
// Log to file
File.AppendAllText("log.txt", $"{DateTime.UtcNow}: {message}\n");
// Update UI
myLabel.Text = message;
// Send to telemetry
Telemetry.TrackEvent("OpenQueryProgress", new { message });
}
}
// Usage
var reporter = new MyProgressReporter();
await searchTool.ExecuteAsync(..., reporter.Report, verbose: false);
```
### Custom Chunking Strategy
Extend `ChunkingService` or implement your own:
```csharp
public static class MyChunkingService
{
public static List<string> ChunkText(string text, int maxSize = 500, int overlap = 50)
{
// Overlapping chunks for better context retrieval
var chunks = new List<string>();
int start = 0;
while (start < text.Length)
{
int end = Math.Min(start + maxSize, text.Length);
var chunk = text.Substring(start, end - start);
chunks.Add(chunk);
start += maxSize - overlap; // Slide window
}
return chunks;
}
}
```
### Custom Rate Limiter
Implement `IAsyncDisposable` with your own strategy (token bucket, leaky bucket):
```csharp
public class TokenBucketRateLimiter : IAsyncDisposable
{
private readonly SemaphoreSlim _semaphore;
private readonly TimeSpan _refillPeriod;
private int _tokens;
private readonly int _maxTokens;
// Implementation details...
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct)
{
await WaitForTokenAsync(ct);
try
{
return await action();
}
finally
{
// Return tokens or replenish bucket
}
}
}
```
---
## Thread Safety
**Thread-Safe Components**:
- `RateLimiter` - `SemaphoreSlim` is thread-safe
- `StatusReporter` - Channel is thread-safe
- Static utility classes (`ChunkingService`) - no state
**Not Thread-Safe** (instances should not be shared across threads):
- `OpenRouterClient` - wraps `HttpClient` (which is thread-safe but instance may have state)
- `SearxngClient` - `HttpClient` (thread-safe but reuse recommendations apply)
- `EmbeddingService` - has mutable fields (`_rateLimiter`, `_retryPipeline`)
- `SearchTool` - has mutable `_options`
**Recommendation**: Create new instances per operation or use locks if sharing.
### Example: Parallel Queries
```csharp
var tasks = questions.Select(async question =>
{
var options = new OpenQueryOptions(..., question: question);
var query = new OpenQueryApp(client, searchTool, model);
await query.RunAsync(options);
// Separate instances per task
});
await Task.WhenAll(tasks);
```
**Better**: Create a factory that spawns fresh instances.
---
## Error Handling
All public async methods may throw:
- `HttpRequestException` - network errors, non-2xx responses
- `TaskCanceledException` - timeout or cancellation
- `JsonException` - malformed JSON
- `Argument*Exception` - invalid arguments
- `Exception` - any other error
### Pattern: Try-Catch
```csharp
try
{
var response = await client.CompleteAsync(request);
Console.WriteLine(response.Choices[0].Message.Content);
}
catch (HttpRequestException ex)
{
Console.Error.WriteLine($"Network error: {ex.Message}");
}
catch (Exception ex)
{
Console.Error.WriteLine($"Unexpected error: {ex.Message}");
}
```
### Pattern: Resilience with Polly
`EmbeddingService` already wraps `client.EmbedAsync` with Polly retry. For other calls, you can add your own:
```csharp
var retryPolicy = Policy
.Handle<HttpRequestException>()
.WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));
await retryPolicy.ExecuteAsync(async () =>
{
var response = await client.CompleteAsync(request);
// ...
});
```
---
## Advanced Usage
### Streaming Responses to Network
```csharp
var request = new ChatCompletionRequest(model, messages) { Stream = true };
await foreach (var chunk in client.StreamAsync(request))
{
if (chunk.TextDelta != null)
{
await networkStream.WriteAsync(Encoding.UTF8.GetBytes(chunk.TextDelta));
}
}
```
### Parallel Embedding Batches with Progress
```csharp
var texts = Enumerable.Range(0, 1000).Select(i => $"Text {i}").ToList();
await embeddingService.GetEmbeddingsAsync(texts,
onProgress: progress =>
{
Console.WriteLine(progress); // "[Generating embeddings: batch 5/4]"
});
```
### Custom Embedding Service with Different Model
```csharp
var client = new OpenRouterClient(apiKey);
var customService = new EmbeddingService(client, "your-embedding-model");
float[] embedding = await customService.GetEmbeddingAsync("text");
```
---
## Limitations
### No Interface-based Design
OpenQuery uses concrete classes. For mocking in tests, you'd need to create wrappers or use tools like JustMock/Moq that can mock non-virtual methods (not recommended). A better approach is to define interfaces such as `IOpenRouterClient` and have the concrete classes implement them.
### Hardcoded Concurrency Settings
`ParallelProcessingOptions` is instantiated in `SearchTool` with hardcoded defaults. To customize, you'd need to:
1. Subclass `SearchTool` and override access to `_options`
2. Or modify source to accept `ParallelProcessingOptions` in constructor
3. Or use reflection (hacky)
Suggested improvement: Add constructor parameter.
### Single Responsibility Blur
`OpenQueryApp` does query generation + pipeline + streaming. Could split:
- `IQueryGenerator` (for expanding queries)
- `IPipelineExecutor` (for search tool)
- `IAnswerStreamer` (for final LLM streaming)
Currently, `OpenQueryApp` is the facade.
---
## Next Steps
- **[Components](../components/overview.md)** - Understand architecture
- **[CLI Reference](../api/cli.md)** - CLI that uses these APIs
- **[Source Code](../)** - Read implementation details
---
**Code Snippet: Full Programmatic Flow**
```csharp
using OpenQuery.Services;
using OpenQuery.Tools;
using OpenQuery.Models;
async Task<string> Research(string question)
{
var apiKey = GetApiKey(); // your method
var client = new OpenRouterClient(apiKey);
var searxng = new SearxngClient("http://localhost:8002");
var embeddings = new EmbeddingService(client);
var search = new SearchTool(searxng, embeddings);
var app = new OpenQueryApp(client, search, "qwen/qwen3.5-flash-02-23");
var options = new OpenQueryOptions(
Chunks: 3,
Results: 5,
Queries: 3,
Short: false,
Long: false,
Verbose: false,
Question: question
);
// Capture output by redirecting Console or modifying OpenQueryApp
await app.RunAsync(options);
return "streamed to console"; // would need custom capture
}
```

---

**File: docs/architecture.md**
# Architecture
Deep dive into OpenQuery's system design, architectural patterns, and data flow.
## 📋 Table of Contents
1. [System Overview](#system-overview)
2. [Architectural Patterns](#architectural-patterns)
3. [Component Architecture](#component-architecture)
4. [Data Flow](#data-flow)
5. [Concurrency Model](#concurrency-model)
6. [Error Handling & Resilience](#error-handling--resilience)
7. [Performance Considerations](#performance-considerations)
8. [Design Decisions](#design-decisions)
## System Overview
OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.
### Core Design Principles
1. **Separation of Concerns** - Each component has a single, well-defined responsibility
2. **Parallel First** - Wherever possible, operations are parallelized for speed
3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
4. **Configurable** - Most parameters can be adjusted without code changes
5. **Observable** - Progress reporting and verbose mode for debugging
### High-Level Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│           CLI (System.CommandLine) → OpenQueryApp           │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Orchestration Layer                     │
│        OpenQueryApp → SearchTool (4-phase pipeline)         │
└─────────────────────────────┬───────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐
│ Search Layer │    │ Processing Layer │    │     AI Layer     │
│              │    │                  │    │                  │
│ SearxngClient│    │ ArticleService   │    │ OpenRouterClient │
│              │    │ ChunkingService  │    │                  │
│              │    │ EmbeddingService │    │                  │
└──────────────┘    └──────────────────┘    └──────────────────┘
```
## Architectural Patterns
### 1. Pipeline Pattern
The main workflow (SearchTool.ExecuteAsync) implements a multi-stage pipeline:
```
Phase 1: ExecuteParallelSearchesAsync
            ↓ (List<SearxngResult>)
Phase 2: ExecuteParallelArticleFetchingAsync
            ↓ (List<Chunk>)
Phase 3: ExecuteParallelEmbeddingsAsync
            ↓ ((queryEmbedding, chunkEmbeddings))
Phase 4: RankAndSelectTopChunks
            ↓ (List<Chunk> topChunks)
Formatted context string returned
```
Each phase:
- Accepts input from previous phase
- Processes in parallel where applicable
- Returns output to next phase
- Reports progress via callbacks
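The chaining of phases can be sketched as follows. Stub bodies stand in for the real SearxNG, SmartReader, and embedding calls, and the type and method names here are illustrative, not the exact source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Minimal sketch of the 4-phase pipeline shape. Phase bodies are stubs;
// the real SearchTool phases call SearxNG, SmartReader, and the embedding API.
public record SearchHit(string Url, string Title);
public record ChunkSketch(string Content, string SourceUrl)
{
    public float Score { get; set; }
}

public static class PipelineSketch
{
    public static async Task<string> ExecuteAsync(string query, int topK)
    {
        var results = await SearchAsync(query);            // Phase 1
        var chunks  = await FetchAndChunkAsync(results);   // Phase 2
        await EmbedAndScoreAsync(query, chunks);           // Phase 3 (scoring folded in)
        var top = chunks.OrderByDescending(c => c.Score)   // Phase 4
                        .Take(topK);
        // Final output: formatted context string for the answer prompt.
        return string.Join("\n\n",
            top.Select((c, i) => $"[Source {i + 1}]({c.SourceUrl})\n{c.Content}"));
    }

    static Task<List<SearchHit>> SearchAsync(string q) =>
        Task.FromResult(new List<SearchHit> { new("https://example.com", "Example") });

    static Task<List<ChunkSketch>> FetchAndChunkAsync(List<SearchHit> hits) =>
        Task.FromResult(hits.Select(h => new ChunkSketch($"Text about {h.Title}", h.Url)).ToList());

    static Task EmbedAndScoreAsync(string q, List<ChunkSketch> chunks)
    {
        foreach (var c in chunks) c.Score = 0.5f; // placeholder for cosine similarity
        return Task.CompletedTask;
    }
}
```

Each phase's output type is the next phase's input, so a failure in one phase surfaces before downstream work is attempted.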
### 2. Service Layer Pattern
Services (`Services/` directory) are stateless classes that encapsulate specific operations:
- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
- **Extractors**: `ArticleService` (content extraction)
- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)
All dependencies are explicit (constructor injection), making services easily testable.
### 3. Dependency Injection (Manual)
While not using a DI container, OpenQuery follows DI principles:
```csharp
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
```
Benefits:
- Clear dependency graph
- Easy to substitute mocks for testing
- No magic; visible construction
### 4. Observer Pattern (Progress Reporting)
`StatusReporter` and progress callbacks implement observer pattern:
```csharp
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)
// Components invoke callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");
// Caller (OpenQueryApp) provides reporter.StatusUpdate() as callback
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
```
### 5. Resilience Patterns (Polly)
`EmbeddingService` uses Polly's retry policy:
```csharp
_retryPipeline = new ResiliencePipelineBuilder()
.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
ShouldHandle = new PredicateBuilder()
.Handle<HttpRequestException>()
})
.Build();
```
This automatically retries failed embedding requests with exponential backoff.
### 6. Producer-Consumer Pattern (Channel-based)
`StatusReporter` uses `System.Threading.Channels.Channel<string>` for asynchronous progress updates:
- Producer: `UpdateStatus()` writes messages to channel
- Consumer: Background task `ProcessStatusUpdatesAsync()` reads and displays
- Benefit: No blocking between progress generation and display
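A minimal sketch of this channel wiring, assuming the member names described above (the real `StatusReporter` animates a spinner rather than writing plain lines):

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Producer-consumer sketch: producers TryWrite status lines,
// a single background consumer drains and displays them.
public sealed class StatusReporterSketch : IAsyncDisposable
{
    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private readonly Task _consumer;

    public StatusReporterSketch()
    {
        _consumer = Task.Run(async () =>
        {
            // Consumer: drain the channel and display each update.
            await foreach (var message in _channel.Reader.ReadAllAsync())
                Console.WriteLine(message); // real reporter animates a spinner instead
        });
    }

    // Producer side: fire-and-forget, never blocks the pipeline thread.
    public void UpdateStatus(string message) => _channel.Writer.TryWrite(message);

    public async ValueTask DisposeAsync()
    {
        _channel.Writer.Complete(); // no more updates; consumer drains and exits
        await _consumer;
    }
}
```

Because the channel is thread-safe, any phase can call `UpdateStatus` concurrently without coordination.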
### 7. Disposable Pattern
Components that hold unmanaged resources implement `IDisposable` or `IAsyncDisposable`:
- `StatusReporter` - stops background spinner task
- `RateLimiter` - disposes semaphore
Used via `using` statements for deterministic cleanup.
## Component Architecture
### OpenQueryApp (OpenQuery.cs)
**Role**: Main orchestrator; coordinates the entire workflow
**Responsibilities**:
- Parse CLI options into `OpenQueryOptions`
- Load configuration and resolve API keys/models
- Optionally generate expanded search queries via LLM
- Invoke `SearchTool` with progress callbacks
- Stream final answer from LLM
**Key Methods**:
- `RunAsync(OpenQueryOptions)` - Main entry point
**Interactions**:
- Instantiates `OpenRouterClient` (for both query gen and final answer)
- Instantiates `SearxngClient` (passed to `SearchTool`)
- Instantiates `EmbeddingService` (passed to `SearchTool`)
- Instantiates `SearchTool` (orchestration)
- `StatusReporter` for UI updates
### SearchTool (Tools/SearchTool.cs)
**Role**: Core search-retrieve-rank pipeline orchestrator
**Responsibilities**:
- Execute 4-phase pipeline (search → fetch → embed → rank)
- Manage concurrency limits (via semaphores)
- Coordinate parallel operations
- Generate context string for final answer
**Interactions**:
- Uses `SearxngClient` for Phase 1
- Uses `ArticleService` + `ChunkingService` for Phase 2
- Uses `EmbeddingService` for Phase 3
- Has no external UI dependency (pure logic)
**Parallelization Strategy**:
- **Phase 1**: `Task.WhenAll` on search tasks (unbounded but limited by SearxNG instance)
- **Phase 2**: Semaphore (max 10 concurrent fetches)
- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)
### EmbeddingService (Services/EmbeddingService.cs)
**Role**: Generate vector embeddings with batching, rate limiting, and retries
**Responsibilities**:
- Batch embedding requests (default: 300 per batch)
- Parallelize batches (default: 4 concurrent)
- Apply rate limiting (via `RateLimiter`)
- Retry failed requests (Polly)
- Calculate cosine similarity
**Key Methods**:
- `GetEmbeddingsAsync(List<string> texts, ...)` - batch with progress
- `GetEmbeddingAsync(string text)` - single embedding
- `CosineSimilarity(float[], float[])` - static vector math
**Design Notes**:
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
- Batches of 300 reduce API overhead
- Polly retry handles transient failures (429, 500, network blips)
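Cosine similarity itself is a few lines of vector math; a standard implementation (not necessarily byte-for-byte the `EmbeddingService` source) looks like:

```csharp
using System;

// Cosine similarity of two equal-length float vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1]; higher = more similar.
public static class VectorMath
{
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot  += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        if (magA == 0 || magB == 0) return 0f; // guard against zero vectors
        return (float)(dot / (Math.Sqrt(magA) * Math.Sqrt(magB)));
    }
}
```

Ranking then reduces to computing this score for each chunk against the query embedding and sorting descending.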
### OpenRouterClient (Services/OpenRouterClient.cs)
**Role**: HTTP client for OpenRouter API (completions + embeddings)
**Responsibilities**:
- Serialize requests to JSON (source-generated)
- Send HTTP with authorization header
- Stream responses for chat completions (IAsyncEnumerable)
- Return full responses for non-streaming
- Throw on non-2xx status codes
**Endpoints**:
- POST `/chat/completions` (stream and non-stream)
- POST `/embeddings`
**Configuration**:
- Base URL: `https://openrouter.ai/api/v1`
- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json`
**Design**:
- Low-level client; no retry logic (retry is in `EmbeddingService`)
- Thin wrapper around `HttpClient`
- Could be replaced with `HttpClientFactory` in larger apps
### SearxngClient (Services/SearxngClient.cs)
**Role**: HTTP client for SearxNG metasearch
**Responsibilities**:
- Construct search URL with query param
- GET request and deserialize JSON
- Limit results (`.Take(limit)`)
- Return empty list on failure (no exceptions)
**Endpoint**: `GET /search?q={query}&format=json`
**Design**:
- Very simple; no retry (failures are acceptable, OpenQuery continues with other queries)
- `DistinctBy(r => r.Url)` deduplication happens in the caller (`SearchTool`), not in this client
### ArticleService (Services/ArticleService.cs)
**Role**: Extract clean article content from URLs
**Responsibilities**:
- Call `SmartReader.ParseArticleAsync(url)`
- Return `Article` object with `Title`, `TextContent`, `IsReadable`
**Design**:
- Single responsibility: extraction only (no fetching, no chunking)
- SmartReader handles all complexity (HTML parsing, boilerplate removal)
- Exceptions propagate to `SearchTool` (handled there)
### ChunkingService (Services/ChunkingService.cs)
**Role**: Split long text into 500-char chunks at natural boundaries
**Algorithm**:
1. Start at index 0
2. Take up to 500 chars
3. If not at end, backtrack to last space/newline/period
4. Add chunk, advance start
5. Repeat until done
**Design**:
- Static class (stateless utility)
- No dependencies
- Pure function (input text → output chunks)
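A sketch of this algorithm, assuming the 500-char limit and the boundary characters described above (the exact boundary set in `ChunkingService` may differ):

```csharp
using System;
using System.Collections.Generic;

// Split text into chunks of up to maxSize chars, preferring to break
// at a space, newline, or period rather than mid-word.
public static class ChunkingSketch
{
    public static List<string> ChunkText(string text, int maxSize = 500)
    {
        var chunks = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int length = Math.Min(maxSize, text.Length - start);
            // If not at the end, backtrack to the last natural boundary.
            if (start + length < text.Length)
            {
                int boundary = text.LastIndexOfAny(
                    new[] { ' ', '\n', '.' }, start + length - 1, length);
                if (boundary > start) length = boundary - start + 1;
            }
            var chunk = text.Substring(start, length).Trim();
            if (chunk.Length > 0) chunks.Add(chunk); // drop empty chunks
            start += length;
        }
        return chunks;
    }
}
```

If no boundary character exists in the window, the sketch falls back to a hard cut at `maxSize` so progress is always made.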
### RateLimiter (Services/RateLimiter.cs)
**Role**: Limit concurrent operations via semaphore
**Responsibilities**:
- Wrap actions with semaphore acquisition
- Support both sync and async actions
- Implement `IAsyncDisposable` for cleanup
**Usage Pattern**:
```csharp
await _rateLimiter.ExecuteAsync(async () =>
{
// operation limited by semaphore
return await SomeApiCall();
});
```
**Design**:
- Generic wrapper (can return TResult)
- `SemaphoreSlim` initialized at construction
- Used in `EmbeddingService` for parallel embedding batches
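The wrapper's shape, inferred from the usage pattern above (a sketch, not the exact source):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Generic semaphore wrapper: at most maxConcurrency actions run at once.
public sealed class RateLimiterSketch : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public RateLimiterSketch(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    // Wrap an async action so it only runs while a slot is available.
    public async Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> action)
    {
        await _semaphore.WaitAsync(); // blocks when at capacity
        try { return await action(); }
        finally { _semaphore.Release(); }
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```

The `try`/`finally` guarantees the slot is released even when the wrapped action throws.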
### StatusReporter (Services/StatusReporter.cs)
**Role**: Show real-time progress with spinner or verbose mode
**Responsibilities**:
- Maintain spinner animation (background task)
- Receive status updates via channel
- Display updates with appropriate formatting
- Stop spinner on completion
**Features**:
- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille chars
- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
- **Thread-safe**: Channel is safe for concurrent writes
- **Non-blocking**: Background spinner doesn't block updates
**Design**:
- `Channel<string>` for asynchronous producer-consumer
- Background task (`_statusProcessor`) reads from channel
- Spinner runs on its own task with 100ms delay per frame
- `IDisposable` ensures proper cleanup
### ConfigManager (ConfigManager.cs)
**Role**: Load and save configuration from/to file
**Responsibilities**:
- Get config path (XDG: `~/.config/openquery/config`)
- Parse key-value pairs (no INI library, manual parsing)
- Provide `AppConfig` object with defaults
- Save settings back to file
**Design**:
- Static class (no instances)
- Creates config directory if missing
- Line-by-line parsing (simple, no dependencies)
- Could be improved with proper INI parser or JSON
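The line-by-line parsing can be sketched as follows; the `key = value` format, comment prefix, and case handling here are assumptions based on the description, not the exact source:

```csharp
using System;
using System.Collections.Generic;

// Manual key-value parsing: one "key = value" pair per line,
// blank lines and #-comments skipped, malformed lines ignored.
public static class ConfigParser
{
    public static Dictionary<string, string> Parse(string[] lines)
    {
        var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#")) continue; // skip blanks/comments
            int eq = line.IndexOf('=');
            if (eq <= 0) continue; // malformed line: ignore rather than throw
            config[line[..eq].Trim()] = line[(eq + 1)..].Trim();
        }
        return config;
    }
}
```

This avoids any INI/JSON dependency at the cost of forgiving, lossy error handling (bad lines vanish silently).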
## Data Flow
### End-to-End Data Flow
```
User: "What is quantum entanglement?"
1. OpenQueryOptions created
{ Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }
2. Query Generation (if Queries > 1)
→ ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
→ Deserialize to List<string> (generatedQueries)
3. Search Phase
generatedQueries → Parallel.SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
   → DistinctBy(Url) → List<SearxngResult> (up to 15 results = 3 queries × 5 results, fewer after URL dedup)
4. Fetch Phase
searchResults → Parallel.ArticleService.FetchArticleAsync → Article
→ ChunkingService.ChunkText (split into ~500-char pieces)
→ ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)
5. Embedding Phase
originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)
6. Ranking Phase
For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)
7. Answer Phase
context = string.Join("\n\n", topChunks.Select(...))
→ ChatCompletionRequest to OpenRouter with context + question
→ StreamAsync → Console.Write(delta) (real-time display)
Result: User sees answer with [Source N] citations
```
### Data Structures
**Chunk** - The core data structure flowing through the pipeline:
```csharp
public record Chunk(
string Content, // Text content (~500 chars)
string SourceUrl, // Where it came from
string? Title = null // Article title
)
{
public float[]? Embedding { get; set; } // Added in Phase 3
public float Score { get; set; } // Added in Phase 4
}
```
**Data Flow State**:
- Phase 1-2: `Chunk` without embedding
- Phase 3: `Chunk.Embedding` populated
- Phase 4: `Chunk.Score` populated
- Phase 5: Serialized into context string
### Memory Footprint
**Per 15-article run (approximate)**:
- Raw HTML (fetched): ~5MB (transient, discarded after extract)
- Articles: ~500KB (15 articles × ~30KB extracted text)
- Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
- Total peak: ~1-2MB (excluding OpenRouter's memory usage)
**Note**: AOT compilation reduces runtime memory compared to JIT.
## Concurrency Model
OpenQuery uses multiple parallelization strategies:
### Unbounded Parallelism (Task.WhenAll)
**Where**: Search queries (Phase 1)
```csharp
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
```
**Rationale**: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.
**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). Default 3 is safe.
### Semaphore-Controlled Parallelism
**Where**: Article fetching (Phase 2)
```csharp
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
await Task.WhenAll(fetchTasks); // Each task waits on semaphore
```
**Rationale**: Prevent flooding target websites with requests (DOS-like behavior). 10 concurrent is polite but fast.
**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (compile-time constant currently).
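Under these assumptions, the fetch pattern looks roughly like the following (illustrative names; the real code also runs each article through `ChunkingService`):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Phase 2 pattern: every fetch task waits on a shared semaphore,
// so at most maxConcurrent fetches are in flight at once.
public static class FetchSketch
{
    public static async Task<List<string>> FetchAllAsync(
        IEnumerable<string> urls, int maxConcurrent = 10)
    {
        using var semaphore = new SemaphoreSlim(maxConcurrent);
        var results = new ConcurrentBag<string>(); // thread-safe collection

        var tasks = urls.Select(async url =>
        {
            await semaphore.WaitAsync(); // blocks once the cap is reached
            try
            {
                results.Add(await FetchAsync(url));
            }
            finally
            {
                semaphore.Release();
            }
        });
        await Task.WhenAll(tasks);
        return results.ToList();
    }

    // Stand-in for ArticleService.FetchArticleAsync + chunking.
    static async Task<string> FetchAsync(string url)
    {
        await Task.Delay(10); // simulate network latency
        return $"content of {url}";
    }
}
```

All tasks start eagerly, but only `maxConcurrent` of them proceed past `WaitAsync` at any moment.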
### Parallel.ForEachAsync with MaxDegreeOfParallelism
**Where**: Embedding batch processing (Phase 3)
```csharp
await Parallel.ForEachAsync(
batchIndices,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
async (batchIndex, ct) => { ... }
);
```
**Rationale**: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.
**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (compile-time).
### Progress Reporting (Channel)
**Where**: All phases pass `onProgress` callback
**Implementation**:
- `StatusReporter.UpdateStatus()` → writes to channel
- Background task reads channel and displays
- Non-blocking; callbacks are fire-and-forget (TryWrite)
**Thread Safety**: Channel is thread-safe; multiple phases may write concurrently.
## Error Handling & Resilience
### HTTP Errors
**OpenRouterClient**:
- Calls `response.EnsureSuccessStatusCode()` → throws `HttpRequestException` on 4xx/5xx
- No retry (handled at higher level in `EmbeddingService`)
**SearxngClient**:
- Returns empty `List<SearxngResult>` on non-success
- No exception thrown (searches are non-critical; if some queries fail, others proceed)
### Retry Policy (Polly)
**Location**: `EmbeddingService` constructor
**Scope**: Only embedding requests (`_client.EmbedAsync`)
**Policy**:
- Max 3 attempts
- Exponential backoff: 1s, 2s, 4s
- Only retries `HttpRequestException` (network errors, 429, 5xx)
**Why not on chat completions?**
- Query generation and final answer are critical; failures should surface immediately
- Could be added in future if transient failures are common
### Graceful Degradation
**Query Generation Failure**:
```csharp
try { ... generate queries ... }
catch (Exception ex)
{
// Fall back to original question as sole query
if (options.Verbose) reporter.WriteLine($"[Failed to generate queries, falling back to original question]");
}
```
**Embedding Batch Failure**:
```csharp
catch
{
// Return empty embeddings for this batch (chunk will be filtered out)
var emptyBatch = new float[batch.Count][];
// fill with empty arrays
results.Add((batchIndex, emptyBatch));
}
```
**Article Fetch Failure**:
```csharp
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
// Chunk not added; continue with others
}
```
### User-Facing Errors
Top-level exception handler in `Program.cs`:
```csharp
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
Environment.Exit(1);
}
catch (Exception ex)
{
Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
Environment.Exit(1);
}
```
### Cancellation Support
`OpenRouterClient.StreamAsync` and `EmbeddingService` methods accept `CancellationToken`.
Used in:
- Streaming answer (Ctrl+C stops immediately)
- Parallel embeddings (can be cancelled)
## Performance Considerations
### Latency Breakdown (Typical)
| Stage | Time | Description |
|-------|------|-------------|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4-parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| **Total** | **15-50s** | Varies widely based on network & content |
### Bottlenecks
1. **Network I/O** (article fetching, API calls) - can't be CPU-bound
2. **OpenRouter API latency** - varies by model and load
3. **SmartReader parsing** - CPU-bound for large HTML
4. **Embedding API rate** - OpenRouter may rate limit if too many concurrent
### Optimization Strategies
- **Parallelism**: Already maximized within API constraints
- **Caching**: Not implemented; future enhancement could cache embeddings per URL
- **Batching**: 300-chunk batches reduce API overhead
- **AOT**: Native compilation reduces startup overhead vs JIT
### Scalability Limits
- **Memory**: Scales with number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Can handle thousands but CPU for similarity calculation becomes O(n).
- **API Limits**: OpenRouter rate limits per API key; may need to reduce concurrency if hitting 429s.
- **SearxNG Limits**: Single SearxNG instance can handle ~10-50 QPS; above that may need load balancing (not in scope).
## Design Decisions
### Why Not Use a DI Container?
OpenQuery manually wires dependencies in `Program.cs`. For a small CLI app, this is:
- Simpler (no container configuration)
- More explicit (easy to trace dependency graph)
- No runtime overhead
- Easier to understand for contributors
Container-based DI would be overengineering.
### Why AOT?
.NET 10 AOT provides:
- **Fast startup** (<100ms vs ~500ms JIT)
- **Smaller footprint** (trimmed, no JIT)
- **No runtime dependencies** (self-contained)
- Better for CLI tools distributed to users
Trade-offs:
- Longer build time
- Some reflection-based APIs not supported (not needed here)
- Less flexible (can't load dynamic assemblies, but not needed)
### Why SmartReader for Article Extraction?
SmartReader uses Readability-based algorithm similar to Firefox Reader View:
- Removes ads, navigation, comments, boilerplate
- Extracts main article content
- Handles malformed HTML gracefully
- Pure .NET (no native dependencies)
Alternatives considered:
- `HtmlAgilityPack` (too low-level, need to implement extraction logic)
- `AngleSharp` (similar, still need extraction)
- External services (like diffbot) - require API keys, costs money
SmartReader is the sweet spot: free, good quality, easy integration.
### Why Embeddings + Cosine Similarity vs Full-Text Search?
Full-text search (like Lucene) would:
- Require inverted index, more complex
- Be faster for exact keyword matching
- Not understand semantic similarity
Embeddings provide:
- Semantic similarity (understand meaning, not just keywords)
- Simple math (cosine similarity of float arrays)
- No index to maintain (just compute on-the-fly)
Trade-off: every run pays embedding API cost and latency; caching embeddings (e.g. an LRU cache keyed by chunk text) could amortize repeated content.
### Why Not RAG (Retrieval Augmented Generation) Framework?
OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
- Add dependency bloat
- Reduce control
- Increase abstraction complexity
Custom implementation is ~1000 LOC and perfectly matches needs.
### Why System.CommandLine?
Provides:
- Native-like CLI help (`openquery --help`)
- Strongly-typed options
- Command hierarchy (main + subcommands)
- Good error messages
Alternative: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.
---
## Next Steps
- [Components](components/overview.md) - Deep dive into each module
- [API Reference](api/cli.md) - Complete command documentation
- [Troubleshooting](troubleshooting.md) - Debug issues
---
**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece.

---

**File: docs/components/models.md**
# Models Reference
Complete reference for all data models, DTOs, and records in OpenQuery.
## 📋 Table of Contents
1. [Core Data Models](#core-data-models)
2. [OpenRouter API Models](#openrouter-api-models)
3. [SearxNG API Models](#searxng-api-models)
4. [JSON Serialization](#json-serialization)
5. [Model Relationships](#model-relationships)
## Core Data Models
### OpenQueryOptions
**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object for a single query execution
```csharp
public record OpenQueryOptions(
int Chunks, // Number of top chunks to include in context
int Results, // Search results per generated query
int Queries, // Number of search queries to generate (if >1)
bool Short, // Request concise answer
bool Long, // Request detailed answer
bool Verbose, // Enable verbose logging
string Question // Original user question (required)
);
```
**Lifecycle**:
- Created in `Program.cs` by combining CLI options, config defaults, and environment variables
- Passed to `OpenQueryApp.RunAsync(options)`
**Validation**: None (assumes valid values from CLI parser/config)
**Example**:
```csharp
var options = new OpenQueryOptions(
Chunks: 3,
Results: 5,
Queries: 3,
Short: false,
Long: false,
Verbose: true,
Question: "What is quantum entanglement?"
);
```
---
### Chunk
**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata, embedding, and relevance score
```csharp
public record Chunk(
string Content, // Text content (typically ~500 chars)
string SourceUrl, // Original article URL
string? Title = null // Article title (optional, may be null)
)
{
public float[]? Embedding { get; set; } // Vector embedding (1536-dim for text-embedding-3-small)
public float Score { get; set; } // Relevance score (0-1, higher = more relevant)
}
```
**Lifecycle**:
1. **Created** in `SearchTool.ExecuteParallelArticleFetchingAsync`:
```csharp
chunks.Add(new Chunk(chunkText, result.Url, article.Title));
```
At this point: `Embedding = null`, `Score = 0`
2. **Embedded** in `SearchTool.ExecuteParallelEmbeddingsAsync`:
```csharp
validChunks[i].Embedding = validEmbeddings[i];
```
3. **Scored** in `SearchTool.RankAndSelectTopChunks`:
```csharp
chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
```
4. **Formatted** into context string:
```csharp
$"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"
```
**Properties**:
- `Content`: Never null/empty (filters empty chunks in `ChunkingService`)
- `SourceUrl`: Always provided (from `SearxngResult.Url`)
- `Title`: May be null if article extraction failed to get title
- `Embedding`: Null until phase 3; may remain null if embedding failed
- `Score`: 0 until phase 4; irrelevant for non-embedded chunks
**Equality**: Records use value equality, and the compiler-synthesized `Equals` compares *all* instance fields, including the body-declared `Embedding` and `Score` properties (`Embedding`, being a `float[]`, is compared by reference). Two freshly created chunks with the same content/url/title are therefore equal only while both have `Embedding == null` and `Score == 0`; once mutated, equality diverges.
---
### ParallelProcessingOptions
**Location**: `Models/ParallelOptions.cs`
**Type**: `class`
**Purpose**: Configuration for parallel/concurrent operations
```csharp
public class ParallelProcessingOptions
{
public int MaxConcurrentArticleFetches { get; set; } = 10;
public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
public int EmbeddingBatchSize { get; set; } = 300;
}
```
**Usage**:
- Instantiated in `SearchTool` constructor (hardcoded new)
- Passed to `EmbeddingService` constructor
- Read by `SearchTool` for article fetching semaphore
**Default Values**:
| Property | Default | Effect |
|----------|---------|--------|
| `MaxConcurrentArticleFetches` | 10 | Up to 10 articles fetched simultaneously |
| `MaxConcurrentEmbeddingRequests` | 4 | Up to 4 embedding batches in parallel |
| `EmbeddingBatchSize` | 300 | Each embedding API call handles up to 300 texts |
**Current Limitation**: These are **compile-time defaults** (hardcoded in `SearchTool.cs`). To make them configurable:
1. Add to `AppConfig`
2. Read in `ConfigManager`
3. Pass through `SearchTool` constructor
---
## OpenRouter API Models
**Location**: `Models/OpenRouter.cs`
**Purpose**: DTOs for OpenRouter's REST API (JSON serialization)
### Chat Completion
#### `ChatCompletionRequest`
```csharp
public record ChatCompletionRequest(
[property: JsonPropertyName("model")] string Model,
[property: JsonPropertyName("messages")] List<Message> Messages,
[property: JsonPropertyName("tools")] List<ToolDefinition>? Tools = null,
[property: JsonPropertyName("stream")] bool Stream = false
);
```
**Example**:
```json
{
"model": "qwen/qwen3.5-flash-02-23",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "What is 2+2?" }
],
"stream": true
}
```
#### `Message`
```csharp
public record Message(
[property: JsonPropertyName("role")] string Role,
[property: JsonPropertyName("content")] string? Content = null,
[property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
[property: JsonPropertyName("tool_call_id")] string? ToolCallId = null
)
{
// Factory method for tool responses
public static Message FromTool(string content, string toolCallId) =>
new Message("tool", content, null, toolCallId);
}
```
**Roles**: `"system"`, `"user"`, `"assistant"`, `"tool"`
**Usage**:
- `Content` for text messages
- `ToolCalls` when assistant requests tool use
- `ToolCallId` when responding to tool call
#### `ChatCompletionResponse`
```csharp
public record ChatCompletionResponse(
[property: JsonPropertyName("choices")] List<Choice> Choices,
[property: JsonPropertyName("usage")] Usage? Usage = null
);
public record Choice(
[property: JsonPropertyName("message")] Message Message,
[property: JsonPropertyName("finish_reason")] string? FinishReason = null
);
```
**Response Example**:
```json
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Answer text..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150
}
}
```
#### `Usage`
```csharp
public record Usage(
[property: JsonPropertyName("prompt_tokens")] int PromptTokens,
[property: JsonPropertyName("completion_tokens")] int CompletionTokens,
[property: JsonPropertyName("total_tokens")] int TotalTokens
);
```
### Tool Calling (Not Currently Used)
#### `ToolDefinition` / `ToolFunction`
```csharp
public record ToolDefinition(
[property: JsonPropertyName("type")] string Type, // e.g., "function"
[property: JsonPropertyName("function")] ToolFunction Function
);
public record ToolFunction(
[property: JsonPropertyName("name")] string Name,
[property: JsonPropertyName("description")] string Description,
[property: JsonPropertyName("parameters")] JsonElement Parameters // JSON Schema
);
```
#### `ToolCall` / `FunctionCall`
```csharp
public record ToolCall(
[property: JsonPropertyName("id")] string Id,
[property: JsonPropertyName("type")] string Type,
[property: JsonPropertyName("function")] FunctionCall Function
);
public record FunctionCall(
[property: JsonPropertyName("name")] string Name,
[property: JsonPropertyName("arguments")] string Arguments // JSON string
);
```
**Note**: OpenQuery doesn't use tools currently, but models are defined for future tool-calling capability.
### Streaming
#### `StreamChunk`
```csharp
public record StreamChunk(
string? TextDelta = null,
ClientToolCall? Tool = null
);
```
Yielded by `OpenRouterClient.StreamAsync()` for each SSE event.
#### `ChatCompletionChunk` (Server Response)
```csharp
public record ChatCompletionChunk(
[property: JsonPropertyName("choices")] List<ChunkChoice> Choices
);
public record ChunkChoice(
[property: JsonPropertyName("delta")] ChunkDelta Delta
);
public record ChunkDelta(
[property: JsonPropertyName("content")] string? Content = null,
[property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null
);
```
**Streaming Response Example** (SSE):
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
```
`OpenRouterClient.StreamAsync` parses and yields `StreamChunk` with non-null `TextDelta` for content.
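Parsing that SSE format reduces to reading `data:`-prefixed lines until the `[DONE]` sentinel. A simplified stand-in for the real `StreamAsync` parser (names and shape assumed), yielding one content delta per chunk:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Read SSE lines, skip non-data lines, stop at [DONE],
// and yield each choices[0].delta.content string.
public static class SseSketch
{
    public static async IAsyncEnumerable<string> ReadDeltasAsync(TextReader reader)
    {
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            if (!line.StartsWith("data: ")) continue;   // ignore blanks/comments
            var payload = line["data: ".Length..];
            if (payload == "[DONE]") yield break;       // end-of-stream sentinel
            using var doc = JsonDocument.Parse(payload);
            var delta = doc.RootElement
                .GetProperty("choices")[0]
                .GetProperty("delta");
            if (delta.TryGetProperty("content", out var content) &&
                content.ValueKind == JsonValueKind.String)
                yield return content.GetString()!;
        }
    }
}
```

Chunks whose delta carries only `tool_calls` (no `content`) are skipped here, matching the `TextDelta`-only consumption shown earlier.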
### Embeddings
#### `EmbeddingRequest`
```csharp
public record EmbeddingRequest(
[property: JsonPropertyName("model")] string Model,
[property: JsonPropertyName("input")] List<string> Input
);
```
**Example**:
```json
{
"model": "openai/text-embedding-3-small",
"input": ["text 1", "text 2", ...]
}
```
#### `EmbeddingResponse`
```csharp
public record EmbeddingResponse(
[property: JsonPropertyName("data")] List<EmbeddingData> Data,
[property: JsonPropertyName("usage")] Usage Usage
);
public record EmbeddingData(
[property: JsonPropertyName("embedding")] float[] Embedding,
[property: JsonPropertyName("index")] int Index
);
```
**Response Example**:
```json
{
"data": [
{ "embedding": [0.1, 0.2, ...], "index": 0 },
{ "embedding": [0.3, 0.4, ...], "index": 1 }
],
"usage": {
"prompt_tokens": 100,
"total_tokens": 100
}
}
```
**Note**: `_client.EmbedAsync` orders by `index` to match input order.
---
## SearxNG API Models
**Location**: `Models/Searxng.cs`
**Purpose**: DTOs for SearxNG's JSON response format
### `SearxngRoot`
```csharp
public record SearxngRoot(
[property: JsonPropertyName("results")] List<SearxngResult> Results
);
```
Top-level response object.
### `SearxngResult`
```csharp
public record SearxngResult(
[property: JsonPropertyName("title")] string Title,
[property: JsonPropertyName("url")] string Url,
[property: JsonPropertyName("content")] string Content // Snippet/description
);
```
**Fields**:
- `Title`: Result title (from page `<title>` or OpenGraph)
- `Url`: Absolute URL to article
- `Content`: Short snippet (~200 chars) from search engine
**Usage**:
- `Url` passed to `ArticleService.FetchArticleAsync`
- `Title` used as fallback if article extraction fails
- `Content` currently unused (could be used for a quick answer without fetching the full article)
**Example Response**:
```json
{
"results": [
{
"title": "Quantum Entanglement - Wikipedia",
"url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
"content": "Quantum entanglement is a physical phenomenon..."
}
]
}
```
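Deserializing this response with the source-generated context is a one-liner. A sketch (assumes `AppJsonContext` registers `SearxngRoot`, as it does in this project):

```csharp
// Parse a SearxNG JSON response into typed results.
var root = JsonSerializer.Deserialize(json, AppJsonContext.Default.SearxngRoot);
foreach (var r in root?.Results ?? new List<SearxngResult>())
    Console.WriteLine($"{r.Title} -> {r.Url}");
```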
---
## JSON Serialization
### JsonContext (Source Generation)
**Location**: `Models/JsonContexts.cs`
**Purpose**: Provide source-generated JSON serializer context for AOT compatibility
#### Declaration
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
[JsonSerializable(typeof(ChatCompletionChunk))]
[JsonSerializable(typeof(EmbeddingRequest))]
[JsonSerializable(typeof(EmbeddingResponse))]
[JsonSerializable(typeof(SearxngRoot))]
[JsonSerializable(typeof(List<string>))]
internal partial class AppJsonContext : JsonSerializerContext
{
}
```
**Usage**:
```csharp
var json = JsonSerializer.Serialize(request, AppJsonContext.Default.ChatCompletionRequest);
var response = JsonSerializer.Deserialize(json, AppJsonContext.Default.ChatCompletionResponse);
```
**Benefits**:
- **AOT-compatible**: No reflection, works with PublishAot=true
- **Performance**: Pre-compiled serializers are faster
- **Trimming safe**: Unused serializers trimmed automatically
**Generated**: Partial class compiled by source generator (no manual implementation)
**Important**: Every type that will be serialized or deserialized must be listed in a `[JsonSerializable]` attribute; otherwise a runtime exception occurs under AOT.
---
## Model Relationships
### Object Graph (Typical Execution)
```
OpenQueryOptions
OpenQueryApp.RunAsync()
├─ queryGenerationMessages (List<Message>)
│ ├─ system: "You are an expert researcher..."
│ └─ user: "Generate N queries for: {question}"
│ ↓
│ ChatCompletionRequest → OpenRouter → ChatCompletionResponse
│ ↓
│ List<string> generatedQueries
├─ SearchTool.ExecuteAsync()
│ ↓
│ ┌─────────────────────────────────────┐
│ │ Phase 1: Parallel Searches │
│ │ SearxngClient.SearchAsync(query) × N
│ │ → List<SearxngResult> │
│ │ (Title, Url, Content) │
│ └─────────────────────────────────────┘
│ ↓
│ ┌─────────────────────────────────────┐
│ │ Phase 2: Article Fetch & Chunking │
│ │ ArticleService.FetchAsync(Url) × M
│ │ → Article (TextContent, Title)
│ │ → ChunkingService.ChunkText → List<string> chunks
│ │ → Chunk(content, url, title) × K │
│ └─────────────────────────────────────┘
│ ↓
│ ┌─────────────────────────────────────┐
│ │ Phase 3: Embeddings │
│ │ EmbeddingService.GetEmbeddingsAsync(chunkContents)
│ │ → float[][] chunkEmbeddings │
│ │ → Set chunk.Embedding for each │
│ │ Also: GetEmbeddingAsync(question) → float[] queryEmbedding
│ └─────────────────────────────────────┘
│ ↓
│ ┌─────────────────────────────────────┐
│ │ Phase 4: Ranking │
│ │ For each chunk: Score = CosineSimilarity(queryEmbedding, chunk.Embedding)
│ │ → Set chunk.Score │
│ │ → OrderByDescending(Score) │
│ │ → Take(topChunksLimit) → topChunks (List<Chunk>)
│ └─────────────────────────────────────┘
│ ↓
│ Context string: formatted topChunks
│ ↓
└─ OpenQueryApp → final ChatCompletionRequest
System: "Answer based on context..."
User: "Context:\n{context}\n\nQuestion: {question}"
StreamAsync() → StreamChunk.TextDelta → Console
```
### Record Immutability
Most DTOs are `record` types:
- **Immutable**: Properties are init-only (`{ get; init; }`)
- **Value semantics**: Equality based on content
- **Thread-safe**: Can be shared across threads
**Exception**:
- `Chunk`: Has mutable properties `Embedding` and `Score` (set during pipeline)
- `ParallelProcessingOptions`: Class with mutable setters
- `AppConfig`: Class with mutable setters
---
## Next Steps
- **[API Reference](../../api/cli.md)** - How these models are used in CLI commands
- **[OpenRouterClient](../../services/OpenRouterClient.md)** - Uses OpenRouter models
- **[SearxngClient](../../services/SearxngClient.md)** - Uses Searxng models
- **[SearchTool](../../components/search-tool.md)** - Orchestrates all models
---
**Quick Reference Table**
| Model | Category | Purpose | Mutable? |
|-------|----------|---------|----------|
| `OpenQueryOptions` | Core | CLI options | No (record) |
| `Chunk` | Core | Content + metadata + ranking | Partially (Embedding, Score) |
| `ParallelProcessingOptions` | Config | Concurrency settings | Yes (class) |
| `ChatCompletionRequest/Response` | OpenRouter | LLM API | No |
| `EmbeddingRequest/Response` | OpenRouter | Embeddings API | No |
| `SearxngRoot/Result` | SearxNG | Search results | No |
| `AppJsonContext` | Internal | JSON serialization | No (generated partial) |

---
# OpenQueryApp Component
Deep dive into the `OpenQueryApp` class - the main application orchestrator.
## Overview
`OpenQueryApp` is the heart of OpenQuery. It coordinates all components, manages the workflow from question to answer, and handles progress reporting.
## Location
`OpenQuery.cs` in project root
## Class Definition
```csharp
public class OpenQueryApp
{
private readonly OpenRouterClient _client;
private readonly SearchTool _searchTool;
private readonly string _model;
public OpenQueryApp(
OpenRouterClient client,
SearchTool searchTool,
string model);
public async Task RunAsync(OpenQueryOptions options);
}
```
**Dependencies**:
- `OpenRouterClient` - for query generation and final answer streaming
- `SearchTool` - for search-retrieve-rank pipeline
- `string _model` - model identifier to use for LLM calls
**Lifecycle**: Instantiated once per query execution in `Program.cs`, then `RunAsync()` called once.
## RunAsync Workflow
```csharp
public async Task RunAsync(OpenQueryOptions options)
{
// 1. Setup
using var reporter = new StatusReporter(options.Verbose);
reporter.StartSpinner();
// 2. Query Generation (if needed)
List<string> queries = await GenerateQueriesIfNeededAsync(options, reporter);
// 3. Search Pipeline
string searchResult = await ExecuteSearchPipelineAsync(options, queries, reporter);
// 4. Final Answer Streaming
await StreamFinalAnswerAsync(options, searchResult, reporter);
}
```
### Step 1: Status Reporter Setup
```csharp
using var reporter = new StatusReporter(options.Verbose);
reporter.StartSpinner();
```
- Creates `StatusReporter` (implements `IDisposable`)
- Starts spinner animation (unless verbose)
- `using` ensures disposal on exit
### Step 2: Query Generation
**When**: `options.Queries > 1` (user wants multiple search queries)
**Purpose**: Use LLM to generate diverse, optimized search queries from the original question
**System Prompt** (hardcoded in `OpenQuery.cs`):
```
You are an expert researcher. The user will ask a question. Your task is to
generate optimal search queries to gather comprehensive information.
Instructions:
1. Break down complex questions.
2. Use synonyms and alternative phrasing.
3. Target different aspects (entities, mechanisms, pros/cons, history).
CRITICAL: Output must be a valid JSON array of strings ONLY. No markdown,
explanations, or other text.
```
**Request**:
```csharp
var queryGenMessages = new List<Message>
{
new Message("system", systemPrompt),
new Message("user", $"Generate {options.Queries} distinct search queries for:\n{options.Question}")
};
var request = new ChatCompletionRequest(_model, queryGenMessages);
var response = await _client.CompleteAsync(request);
```
**Response Parsing**:
```csharp
var content = response.Choices.FirstOrDefault()?.Message.Content;
if (!string.IsNullOrEmpty(content))
{
// Remove markdown code fences if present
content = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();
// Deserialize to List<string>
var generatedQueries = JsonSerializer.Deserialize(content, AppJsonContext.Default.ListString);
if (generatedQueries != null && generatedQueries.Count > 0)
{
queries = generatedQueries;
}
}
```
**Fallback**: If any step fails (exception, null, empty, invalid JSON), use `new List<string> { options.Question }` (single query = original)
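Put together, the parse-with-fallback logic can be expressed as one helper (a sketch; the method name is hypothetical):

```csharp
// Parse the LLM's JSON-array reply; fall back to the original question
// if the output is missing, empty, or not valid JSON.
static List<string> ParseQueriesOrFallback(string? content, string question)
{
    try
    {
        if (!string.IsNullOrEmpty(content))
        {
            var cleaned = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();
            var parsed = JsonSerializer.Deserialize(cleaned, AppJsonContext.Default.ListString);
            if (parsed is { Count: > 0 }) return parsed;
        }
    }
    catch (JsonException) { /* fall through to the fallback below */ }
    return new List<string> { question };
}
```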
**Note**: Query generation reuses the same model as final answer. This could be optimized:
- Use cheaper/faster model for query gen
- Separate model configuration
- Cache query generation results
### Step 3: Search Pipeline Execution
```csharp
var searchResult = await _searchTool.ExecuteAsync(
options.Question,
queries,
options.Results,
options.Chunks,
(progress) => {
if (options.Verbose)
reporter.WriteLine(progress);
else
reporter.UpdateStatus(parsedMessage);
},
options.Verbose);
```
**Parameters**:
- `originalQuery`: User's original question (used for final embedding)
- `generatedQueries`: From step 2 (or fallback)
- `maxResults`: `options.Results` (search results per query)
- `topChunksLimit`: `options.Chunks` (top N chunks to return)
- `onProgress`: Callback to update UI
- `verbose`: Passed through to `SearchTool`
**Returns**: `string context` - formatted context with source citations
**Progress Handling**:
- In verbose mode: all progress printed as lines (via `reporter.WriteLine()`)
- In compact mode: parse progress messages to show concise status (e.g., "Fetching articles 3/10...")
### Step 4: Final Answer Streaming
**Status Update**:
```csharp
if (!options.Verbose)
reporter.UpdateStatus("Asking AI...");
else
{
reporter.ClearStatus();
Console.WriteLine();
}
```
**Build System Prompt**:
```csharp
var systemPrompt = "You are a helpful AI assistant. Answer the user's question in depth, based on the provided context. Be precise and accurate. You can mention sources or citations.";
if (options.Short) systemPrompt += " Give a very short concise answer.";
if (options.Long) systemPrompt += " Give a long elaborate detailed answer.";
```
**Prompt Structure**:
```
System: {systemPrompt}
User: Context:
{searchResult}
Question: {options.Question}
```
Where `searchResult` is:
```
[Source 1: Title](URL)
Content chunk 1
[Source 2: Title](URL)
Content chunk 2
...
```
**Streaming**:
```csharp
var requestStream = new ChatCompletionRequest(_model, messages);
var assistantResponse = new StringBuilder();
var isFirstChunk = true;
using var streamCts = new CancellationTokenSource();
await foreach (var chunk in _client.StreamAsync(requestStream, streamCts.Token))
{
if (chunk.TextDelta == null) continue;
if (isFirstChunk)
{
reporter.StopSpinner();
if (!options.Verbose) reporter.ClearStatus();
else Console.Write("Assistant: ");
isFirstChunk = false;
}
Console.Write(chunk.TextDelta);
assistantResponse.Append(chunk.TextDelta);
}
```
**Key Points**:
- `StreamAsync` yields `StreamChunk` objects (text deltas)
- First chunk stops spinner and clears status line
- Each delta written to Console immediately (real-time feel)
- Entire response accumulated in `assistantResponse` (though not used elsewhere)
- `CancellationTokenSource` created and passed but never canceled locally (an external Ctrl+C handler could trigger it)
**Finally Block**:
```csharp
finally
{
reporter.StopSpinner();
}
```
Ensures spinner stops even if streaming fails.
**End**:
```csharp
Console.WriteLine(); // Newline after complete answer
```
## Error Handling
`RunAsync` itself does not catch exceptions. All exceptions propagate to `Program.cs`:
```csharp
try
{
var openQuery = new OpenQueryApp(client, searchTool, model);
await openQuery.RunAsync(options);
}
catch (HttpRequestException ex)
{
Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
Environment.Exit(1);
}
catch (Exception ex)
{
Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
Environment.Exit(1);
}
```
**Common Exceptions**:
- `HttpRequestException` - network failures, API errors
- `JsonException` - malformed JSON from API
- `TaskCanceledException` - timeout or user interrupt
- `Exception` - anything else
**No Retries at This Level**: Fail fast; user sees error immediately. Lower-level retries exist (embedding service).
## Performance Characteristics
**Query Generation**:
- One non-streaming LLM call
- Takes 2-5 seconds depending on model
- Typically <1000 tokens
**Search Pipeline** (`SearchTool.ExecuteAsync`):
- See `SearchTool.md` for detailed timing breakdown
- Total 10-30 seconds typically
**Final Answer Streaming**:
- Streaming LLM call
- Time depends on answer length (typically 5-20 seconds)
- User sees words appear progressively
**Total End-to-End**: 15-50 seconds for typical query
## Design Decisions
### Why Not Stream Query Generation?
Query generation currently uses `CompleteAsync` (non-streaming). Could be streamed but:
- Queries are short (JSON array)
- Streaming offers no UX benefit (user doesn't see intermediate queries)
- Simpler to wait for all queries before proceeding
### Why Build Prompt Manually Instead of Templates?
Simple string concatenation is fine for few prompts. Pros:
- No template dependencies
- Easy to read and modify
- No runtime compilation overhead
Cons:
- No validation
- Could benefit from prompt engineering framework
### Why Accumulate `assistantResponse` StringBuilder?
Currently built but not used. Could be:
- Saved to file (future feature: `--output file.md`)
- Analyzed for token counting
- Removed if not needed
### Could Query Generation Be Cached?
Yes! For repeated questions (common in scripts), cache query results:
- `Dictionary<string, List<string>>` cache in memory
- Or persistent cache (Redis, file)
- Not implemented (low priority)
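A minimal in-memory cache would look like this (hypothetical; not in the codebase, and `GenerateQueriesAsync` is an assumed extraction of the query-generation step):

```csharp
// Hypothetical per-process cache keyed on the question text.
private readonly ConcurrentDictionary<string, List<string>> _queryCache = new();

async Task<List<string>> GetQueriesCachedAsync(string question, int count)
{
    return _queryCache.TryGetValue(question, out var cached)
        ? cached
        : _queryCache[question] = await GenerateQueriesAsync(question, count);
}
```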
### Single Responsibility Violation?
`OpenQueryApp` does:
- Query generation
- Pipeline orchestration
- Answer streaming
That's 3 responsibilities, but they're tightly coupled to the "query → answer" workflow. Separating them would add complexity without clear benefit. Acceptable as "application coordinator".
## Extension Points
### Adding New Model for Query Generation
Currently uses same `_model` for queries and answer. To use different models:
1. Add `queryGenerationModel` parameter to constructor
2. Use it for query gen: `new ChatCompletionRequest(queryGenerationModel, queryGenMessages)`
3. Keep `_model` for final answer
Or make it configurable via environment variable: `OPENROUTER_QUERY_MODEL`
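Resolving that variable would be a one-liner (the variable name is the suggestion above, not an existing setting):

```csharp
// Fall back to the main model when OPENROUTER_QUERY_MODEL is unset.
var queryModel = Environment.GetEnvironmentVariable("OPENROUTER_QUERY_MODEL") ?? _model;
```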
### Post-Processing Answer
Opportunity to add:
- Source citation formatting (footnotes, clickable links)
- Answer summarization
- Export to Markdown/JSON
- Text-to-speech
Add after streaming loop, before final newline.
### Progress UI Enhancement
Current `StatusReporter` is basic. Could add:
- Progress bar with percentage
- ETA calculation
- Colors (ANSI) for different message types
- Logging to file
- Web dashboard
Would require extending `StatusReporter` or replacing it.
## Testing Considerations
**Challenges**:
- `RunAsync` is one long, cohesive method (hard to unit test in isolation)
- Depends on many services (need mocks)
- Asynchronous and streaming
**Recommended Approach**:
1. Extract interfaces:
- `ISearchTool` (wrapper around `SearchTool`)
- `IOpenRouterClient` (wrapper around `OpenRouterClient`)
2. Mock interfaces in tests
3. Test query generation parsing separately
4. Test progress callback counting
5. Test final answer prompt construction
**Integration Tests**:
- End-to-end with real/mocked APIs
- Automated tests with test SearxNG/OpenRouter instances
## Related Components
- **[SearchTool](search-tool.md)** - pipeline executed by `OpenQueryApp`
- **[Program.cs](../Program.md)** - creates `OpenQueryApp`
- **[StatusReporter](../services/StatusReporter.md)** - progress UI used by `OpenQueryApp`
---
## Next Steps
- [SearchTool](search-tool.md) - See the pipeline in detail
- [Services](../services/overview.md) - Understand each service
- [CLI Reference](../../api/cli.md) - How users invoke this

---
# Components Overview
Detailed documentation for each major component in the OpenQuery system.
## 📋 Table of Contents
1. [Component Hierarchy](#component-hierarchy)
2. [Core Components](#core-components)
3. [Services](#services)
4. [Data Models](#data-models)
5. [Component Interactions](#component-interactions)
## Component Hierarchy
```
OpenQuery/
├── Program.cs [Entry Point, CLI]
├── OpenQuery.cs [OpenQueryApp - Orchestrator]
├── Tools/
│ └── SearchTool.cs [Pipeline Orchestration]
├── Services/
│ ├── OpenRouterClient.cs [LLM & Embedding API]
│ ├── SearxngClient.cs [Search API]
│ ├── EmbeddingService.cs [Embedding Generation + Math]
│ ├── ChunkingService.cs [Text Splitting]
│ ├── ArticleService.cs [Content Extraction]
│ ├── RateLimiter.cs [Concurrency Control]
│ └── StatusReporter.cs [Progress Display]
├── Models/
│ ├── OpenQueryOptions.cs [CLI Options Record]
│ ├── Chunk.cs [Content + Metadata]
│ ├── ParallelOptions.cs [Concurrency Settings]
│ ├── OpenRouter.cs [API DTOs]
│ ├── Searxng.cs [Search Result DTOs]
│ └── JsonContexts.cs [JSON Context]
└── ConfigManager.cs [Configuration Persistence]
```
## Core Components
### 1. Program.cs
**Type**: Console Application Entry Point
**Responsibilities**: CLI parsing, dependency wiring, error handling
**Key Elements**:
- `RootCommand` from System.CommandLine
- Options: `--chunks`, `--results`, `--queries`, `--short`, `--long`, `--verbose`
- Subcommand: `configure` (with interactive mode)
- Configuration loading via `ConfigManager.Load()`
- Environment variable resolution
- Service instantiation and coordination
- Top-level try-catch for error reporting
**Code Flow**:
1. Load config file
2. Define CLI options and commands
3. Set handler for root command
4. Handler: resolve API key/model → instantiate services → call `OpenQueryApp.RunAsync()`
5. Set handler for configure command (writes config file)
6. Invoke command parser: `await rootCommand.InvokeAsync(args)`
**Exit Codes**:
- 0 = success
- 1 = error
### 2. OpenQueryApp (OpenQuery.cs)
**Type**: Main Application Class
**Responsibilities**: Workflow orchestration, query generation, answer streaming
**Constructor Parameters**:
- `OpenRouterClient client` - for query gen and final answer
- `SearchTool searchTool` - for search-retrieve-rank pipeline
- `string model` - LLM model identifier
**Main Method**: `RunAsync(OpenQueryOptions options)`
**Workflow Steps**:
1. Create `StatusReporter` (for progress UI)
2. **Optional Query Generation** (if `options.Queries > 1`):
- Create system message instructing JSON array output
- Create user message with `options.Question`
- Call `client.CompleteAsync()` with query gen model
- Parse JSON response; fall back to original question on failure
- Result: `List<string> queries` (1 or many)
3. **Execute Search Pipeline**:
- Call `_searchTool.ExecuteAsync()` with queries, options
- Receive `string context` (formatted context with source citations)
- Progress reported via callback to `StatusReporter`
4. **Generate Final Answer**:
- Build system prompt (append "short" or "long" modifier)
- Create user message with `Context:\n{context}\n\nQuestion: {options.Question}`
- Stream answer via `client.StreamAsync()`
- Write each `chunk.TextDelta` to Console as it arrives
- Stop spinner on first chunk, continue streaming
5. Dispose reporter
**Error Handling**:
- Exceptions propagate to `Program.cs` top-level handler
- `HttpRequestException` vs generic `Exception`
**Note**: Query generation uses the same model as final answer; could be separated for cost/performance.
### 3. SearchTool (Tools/SearchTool.cs)
**Type**: Pipeline Orchestrator
**Responsibilities**: Execute 4-phase search-retrieve-rank-return workflow
**Constructor Parameters**:
- `SearxngClient searxngClient`
- `EmbeddingService embeddingService`
**Main Method**: `ExecuteAsync(originalQuery, generatedQueries, maxResults, topChunksLimit, onProgress, verbose)`
**Returns**: `Task<string>` - formatted context string with source citations
**Pipeline Phases**:
#### Phase 1: ExecuteParallelSearchesAsync
- Parallelize `searxngClient.SearchAsync(query, maxResults)` for each query
- Collect all results in `ConcurrentBag<SearxngResult>`
- Deduplicate by `DistinctBy(r => r.Url)`
**Output**: `List<SearxngResult>` (aggregated, unique)
#### Phase 2: ExecuteParallelArticleFetchingAsync
- Semaphore: `MaxConcurrentArticleFetches` (default 10)
- For each `SearxngResult`: fetch URL via `ArticleService.FetchArticleAsync()`
- Extract article text, title
- Chunk via `ChunkingService.ChunkText(article.TextContent)`
- Add each chunk as new `Chunk(content, url, title)`
**Output**: `List<Chunk>` (potentially 50-100 chunks)
#### Phase 3: ExecuteParallelEmbeddingsAsync
- Start two parallel tasks:
1. Query embedding: `embeddingService.GetEmbeddingAsync(originalQuery)`
2. Chunk embeddings: `embeddingService.GetEmbeddingsWithRateLimitAsync(chunkTexts, onProgress)`
- `Parallel.ForEachAsync` with `MaxConcurrentEmbeddingRequests` (default 4)
- Batch size: 300 chunks per embedding API call
- Filter chunks with empty embeddings (failed batches)
**Output**: `(float[] queryEmbedding, float[][] chunkEmbeddings)`
#### Phase 4: RankAndSelectTopChunks
- Calculate cosine similarity for each chunk vs query
- Assign `chunk.Score`
- Order by descending score
- Take `topChunksLimit` (from `--chunks` option)
- Return `List<Chunk>` (top N)
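The ranking steps above reduce to a few lines of LINQ (a sketch; assumes every surviving chunk has a non-null `Embedding`):

```csharp
// Score each chunk against the query embedding, then keep the top N.
foreach (var chunk in chunks)
    chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);

var topChunks = chunks
    .OrderByDescending(c => c.Score)
    .Take(topChunksLimit)
    .ToList();
```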
**Formatting**:
```csharp
string context = string.Join("\n\n", topChunks.Select((c, i) =>
$"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
```
**Progress Callbacks**: Invoked at each major step for UI feedback
## Services
### OpenRouterClient
**Purpose**: HTTP client for OpenRouter API (chat completions + embeddings)
**Base URL**: `https://openrouter.ai/api/v1`
**Authentication**: `Authorization: Bearer {apiKey}`
**Methods**:
#### `StreamAsync(ChatCompletionRequest request, CancellationToken)`
- Sets `request.Stream = true`
- POST to `/chat/completions`
- Reads SSE stream line-by-line
- Parses `data: {json}` chunks
- Yields `StreamChunk` (text delta or tool call)
- Supports cancellation
#### `CompleteAsync(ChatCompletionRequest request)`
- Sets `request.Stream = false`
- POST to `/chat/completions`
- Deserializes full response
- Returns `ChatCompletionResponse`
#### `EmbedAsync(string model, List<string> inputs)`
- POST to `/embeddings`
- Returns `float[][]` (ordered by input index)
**Error Handling**: `EnsureSuccessStatusCode()` throws `HttpRequestException` on failure
**Design**: Thin wrapper; no retry logic (delegated to EmbeddingService)
### SearxngClient
**Purpose**: HTTP client for SearxNG metasearch
**Base URL**: Configurable (default `http://localhost:8002`)
**Methods**:
#### `SearchAsync(string query, int limit = 10)`
- GET `{baseUrl}/search?q={query}&format=json`
- Deserializes to `SearxngRoot`
- Returns `Results.Take(limit).ToList()`
- On failure: returns empty `List<SearxngResult>` (no exception)
**Design**: Very simple; failures are tolerated (OpenQuery continues with other queries)
### EmbeddingService
**Purpose**: Batch embedding generation with rate limiting, parallelization, and retries
**Configuration** (from `ParallelProcessingOptions`):
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300
**Default Embedding Model**: `openai/text-embedding-3-small`
**Methods**:
#### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallelizes batches with `Parallel.ForEachAsync` + `MaxConcurrentEmbeddingRequests`
- Each batch: rate-limited + retry-wrapped `client.EmbedAsync(model, batch)`
- Collects results in order (by batch index)
- Returns `float[][]` (same order as input texts)
- Failed batches return empty `float[]` for each text
#### `GetEmbeddingAsync(string text, CancellationToken)`
- Wraps single-text call in rate limiter + retry
- Returns `float[]`
#### `CosineSimilarity(float[] v1, float[] v2)`
- Static method using `TensorPrimitives.CosineSimilarity`
- Returns float between -1 and 1 (typically 0-1 for normalized embeddings)
**Retry Policy** (Polly):
- Max 3 attempts
- 1s base delay, exponential backoff
- Only `HttpRequestException`
**Rate Limiting**: `RateLimiter` semaphore with `MaxConcurrentEmbeddingRequests`
**Design Notes**:
- Two similar methods (`GetEmbeddingsAsync` and `GetEmbeddingsWithRateLimitAsync`) - could be consolidated
- Uses Polly for resilience (good pattern)
- Concurrency control prevents overwhelming OpenRouter
### ChunkingService
**Purpose**: Split long text into manageable pieces
**Static Class** (no dependencies, pure function)
**Algorithm** (in `ChunkText(string text)`):
- Constant `MAX_CHUNK_SIZE = 500`
- While remaining text:
- Take up to 500 chars
- If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']`
- Trim and add non-empty chunk
- Advance start position
**Rationale**: 500 chars is a sweet spot for embeddings - long enough for context, short enough for semantic coherence.
**Edge Cases**: Handles text shorter than 500 chars, empty text, text with no natural breaks.
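A faithful sketch of the algorithm as described above (assuming the stated constant and break characters; the real implementation may differ in detail):

```csharp
public static class ChunkingService
{
    private const int MAX_CHUNK_SIZE = 500;
    private static readonly char[] BreakChars = { ' ', '\n', '\r', '.', '!' };

    public static List<string> ChunkText(string text)
    {
        var chunks = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int length = Math.Min(MAX_CHUNK_SIZE, text.Length - start);
            // Backtrack to the last natural break, unless we're at the end.
            if (start + length < text.Length)
            {
                int lastBreak = text.LastIndexOfAny(BreakChars, start + length - 1, length);
                if (lastBreak > start) length = lastBreak - start + 1;
            }
            var chunk = text.Substring(start, length).Trim();
            if (chunk.Length > 0) chunks.Add(chunk);
            start += length;
        }
        return chunks;
    }
}
```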
### ArticleService
**Purpose**: Extract clean article content from URLs
**Method**: `FetchArticleAsync(string url)`
**Implementation**: Delegates to `SmartReader.ParseArticleAsync(url)`
**Returns**: `Article` object (from SmartReader)
- `Title` (string)
- `TextContent` (string) - cleaned article body
- `IsReadable` (bool) - quality indicator
- Other metadata (author, date, etc.)
**Error Handling**: Exceptions propagate (handled by `SearchTool`)
**Design**: Thin wrapper around third-party library. Could be extended to add caching, custom extraction rules, etc.
### RateLimiter
**Purpose**: Limit concurrent operations via semaphore
**Interface**:
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken);
public async Task ExecuteAsync(Func<Task> action, CancellationToken);
```
**Implementation**: `SemaphoreSlim` with `WaitAsync` and `Release`
**Disposal**: `IAsyncDisposable` (awaits semaphore disposal)
**Usage**: Wrap API calls that need concurrency control
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
await _client.EmbedAsync(model, batch), cancellationToken);
```
**Design**: Simple, reusable. Could be replaced with `Polly.RateLimiting` policy but this is lightweight.
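A minimal implementation matching this interface (a sketch consistent with the description, not quoted from the source):

```csharp
public sealed class RateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public RateLimiter(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct = default)
    {
        await _semaphore.WaitAsync(ct);      // block when at capacity
        try { return await action(); }
        finally { _semaphore.Release(); }    // always free the slot
    }

    public ValueTask DisposeAsync()
    {
        _semaphore.Dispose();
        return ValueTask.CompletedTask;
    }
}
```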
### StatusReporter
**Purpose**: Real-time progress UI with spinner and verbose modes
**Architecture**:
- Producer: UpdateStatus(text) → writes to `Channel<string>`
- Consumer: Background task `ProcessStatusUpdatesAsync()` reads from channel
- Spinner: Separate task animates Braille characters every 100ms
**Modes**:
**Verbose Mode** (`_verbose = true`):
- All progress messages written as `Console.WriteLine()`
- No spinner
- Full audit trail
**Compact Mode** (default):
- Status line with spinner (overwrites same line)
- Only latest status visible
- Example: `⠋ Fetching articles 3/10...`
**Key Methods**:
- `UpdateStatus(message)` - fire-and-forget, non-blocking
- `WriteLine(text)` - stops spinner temporarily, writes full line
- `StartSpinner()` / `StopSpinner()` - manual control
- `ClearStatus()` - ANSI escape `\r\x1b[K` to clear line
- `Dispose()` - completes channel, waits for background tasks
**Spinner Chars**: `['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']` (Braille patterns, smooth animation)
**ANSI Codes**: `\r` (carriage return), `\x1b[K` (erase to end of line)
**Thread Safety**: Channel is thread-safe; multiple components can write concurrently without locks
**Design**: Well-encapsulated; could be reused in other CLI projects.
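The producer/consumer core can be sketched with `System.Threading.Channels` (simplified; spinner animation and verbose mode omitted):

```csharp
private readonly Channel<string> _channel =
    Channel.CreateUnbounded<string>(new UnboundedChannelOptions { SingleReader = true });

// Producer: fire-and-forget, safe from any thread.
public void UpdateStatus(string message) => _channel.Writer.TryWrite(message);

// Consumer: background task draining the channel.
private async Task ProcessStatusUpdatesAsync()
{
    await foreach (var message in _channel.Reader.ReadAllAsync())
        Console.Write($"\r\x1b[K{message}");   // overwrite the current status line
}
```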
### ConfigManager
**Purpose**: Load/save configuration from XDG-compliant location
**Config Path**:
- Derived from `Environment.SpecialFolder.UserProfile` → `~/.config/openquery/config`
**Schema** (`AppConfig`):
```csharp
public class AppConfig
{
public string ApiKey { get; set; } = "";
public string Model { get; set; } = "qwen/qwen3.5-flash-02-23";
public int DefaultQueries { get; set; } = 3;
public int DefaultChunks { get; set; } = 3;
public int DefaultResults { get; set; } = 5;
}
```
**Format**: Simple `key=value` (no INI parser, manual line split)
**Methods**:
- `Load()` → reads file if exists, returns `AppConfig` (with defaults)
- `Save(AppConfig)` → writes all 5 keys, overwrites existing
**Design**:
- Static class (no instances)
- Creates directory if missing
- No validation (writes whatever values given)
- Could be improved with JSON format (but keep simple)
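The `key=value` parsing can be sketched in a few lines (assumptions: malformed lines are skipped, the last occurrence of a key wins):

```csharp
// Parse "key=value" lines into a dictionary; skip blanks and malformed lines.
static Dictionary<string, string> ParseConfig(string[] lines)
{
    var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var line in lines)
    {
        var idx = line.IndexOf('=');
        if (idx <= 0) continue;                       // no key part
        values[line[..idx].Trim()] = line[(idx + 1)..].Trim();
    }
    return values;
}
```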
## Data Models
### OpenQueryOptions
**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object passed through workflow
**Properties**:
- `int Chunks` - top N chunks for context
- `int Results` - search results per query
- `int Queries` - number of expanded queries to generate
- `bool Short` - concise answer flag
- `bool Long` - detailed answer flag
- `bool Verbose` - verbose logging flag
- `string Question` - original user question
**Created**: In `Program.cs` from CLI options + config defaults
**Used By**: `OpenQueryApp.RunAsync()`
### Chunk
**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata and embedding
**Properties**:
- `string Content` - extracted text (~500 chars)
- `string SourceUrl` - article URL
- `string? Title` - article title (nullable)
- `float[]? Embedding` - vector embedding (populated by EmbeddingService)
- `float Score` - relevance score (populated during ranking)
**Lifecycle**:
1. Instantiated in `SearchTool.ExecuteParallelArticleFetchingAsync` with content, url, title
2. `Embedding` set in `ExecuteParallelEmbeddingsAsync` after batch processing
3. `Score` set in `RankAndSelectTopChunks` after cosine similarity
4. Serialized into context string for final answer
**Equality**: Records provide value equality (based on all properties)
### ParallelProcessingOptions
**Location**: `Models/ParallelOptions.cs`
**Type**: `class` (mutable)
**Purpose**: Concurrency settings for parallel operations
**Properties** (with defaults):
- `MaxConcurrentArticleFetches` = 10
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300
**Used By**: `EmbeddingService` (for embeddings), `SearchTool` (for article fetching)
**Currently**: Hardcoded in `SearchTool` constructor; could be made configurable
### OpenRouter Models (Models/OpenRouter.cs)
**Purpose**: DTOs for OpenRouter API (JSON serializable)
**Chat Completion**:
- `ChatCompletionRequest` (model, messages, tools, stream)
- `ChatCompletionResponse` (choices[], usage)
- `Message` (role, content, tool_calls, tool_call_id)
- `ToolDefinition`, `ToolFunction`, `ToolCall`, `FunctionCall`
- `Choice`, `Usage`
**Embedding**:
- `EmbeddingRequest` (model, input[])
- `EmbeddingResponse` (data[], usage)
- `EmbeddingData` (embedding[], index)
**Streaming**:
- `StreamChunk` (TextDelta, Tool)
- `ChatCompletionChunk`, `ChunkChoice`, `ChunkDelta`
**JSON Properties**: Uses `[JsonPropertyName]` to match API
**Serialization**: System.Text.Json with source generation (AppJsonContext)
### Searxng Models (Models/Searxng.cs)
**Purpose**: DTOs for SearxNG search results
**Records**:
- `SearxngRoot` with `List<SearxngResult> Results`
- `SearxngResult` with `Title`, `Url`, `Content` (snippet)
**Usage**: Deserialized from SearxNG's JSON response
### JsonContexts
**Location**: `Models/JsonContexts.cs`
**Purpose**: Source-generated JSON serializer context for AOT compatibility
**Pattern**:
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
... etc ...
internal partial class AppJsonContext : JsonSerializerContext
{
}
```
**Generated**: Partial class compiled by source generator
**Used By**: All `JsonSerializer.Serialize/Deserialize` calls with `AppJsonContext.Default.{Type}`
**Benefits**:
- AOT-compatible (no reflection)
- Faster serialization (compiled delegates)
- Smaller binary (trimming-safe)
## Component Interactions
### Dependencies Graph
```
Program.cs
├── ConfigManager (load/save)
├── OpenRouterClient ──┐
├── SearxngClient ─────┤
├── EmbeddingService ──┤
└── SearchTool ────────┤
OpenQueryApp ◄──────────┘
├── OpenRouterClient (query gen + answer streaming)
├── SearchTool (pipeline)
│ ├── SearxngClient (searches)
│ ├── ArticleService (fetch)
│ ├── ChunkingService (split)
│ ├── EmbeddingService (embeddings)
│ ├── RateLimiter (concurrency)
│ └── StatusReporter (progress via callback)
└── StatusReporter (UI)
```
### Data Flow Between Components
```
OpenQueryOptions
OpenQueryApp
├─ Query Generation
│ └─ OpenRouterClient.CompleteAsync()
│ → List<string> generatedQueries
├─ Search Pipeline
│ └─ SearchTool.ExecuteAsync(originalQuery, generatedQueries, ...)
│ ↓
│ Phase 1: SearxngClient.SearchAsync(query) × N
│ → ConcurrentBag<SearxngResult>
│ → List<SearxngResult> (unique)
│ ↓
│ Phase 2: ArticleService.FetchArticleAsync(url) × M
│ → ChunkingService.ChunkText(article.TextContent)
│ → ConcurrentBag<Chunk> (content, url, title)
│ ↓
│ Phase 3: EmbeddingService.GetEmbeddingsAsync(chunkContents)
│ → (queryEmbedding, chunkEmbeddings)
│ ↓
│ Phase 4: CosineSimilarity + Rank
│ → List<Chunk> topChunks (with Score, Embedding set)
│ ↓
│ Format: context string with [Source N: Title](Url)
│ → return context string
└─ Final Answer
└─ OpenRouterClient.StreamAsync(prompt with context)
→ stream deltas to Console
```
### Interface Contracts
**SearchTool → Progress**:
```csharp
// Invoked as: onProgress?.Invoke("[Fetching article 1/10: example.com]")
Action<string>? onProgress
```
**StatusReporter ← Progress**:
```csharp
// Handler in OpenQueryApp:
(progress) => {
if (options.Verbose) reporter.WriteLine(progress);
else reporter.UpdateStatus(parsedShorterMessage);
}
```
**SearchTool → ArticleService**:
```csharp
Article article = await ArticleService.FetchArticleAsync(url);
```
**SearchTool → EmbeddingService**:
```csharp
(float[] queryEmbedding, float[][] chunkEmbeddings) = await ExecuteParallelEmbeddingsAsync(...);
// Also: embeddingService.GetEmbeddingAsync(text), GetEmbeddingsWithRateLimitAsync(...)
```
**SearchTool → ChunkingService**:
```csharp
List<string> chunks = ChunkingService.ChunkText(article.TextContent);
```
**SearchTool → RateLimiter**:
```csharp
await _rateLimiter.ExecuteAsync(async () => await _client.EmbedAsync(...), ct);
```
---
## Next Steps
- [OpenQueryApp](openquery-app.md) - Main orchestrator details
- [SearchTool](search-tool.md) - Pipeline implementation
- [Services](services.md) - All service classes documented
- [Models](models.md) - Complete data model reference

# SearchTool Component
Deep dive into `SearchTool` - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.
## Overview
`SearchTool` is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks chunks by relevance, and returns formatted context for the final AI answer.
## Location
`Tools/SearchTool.cs`
## Class Definition
```csharp
public class SearchTool
{
private readonly SearxngClient _searxngClient;
private readonly EmbeddingService _embeddingService;
private readonly ParallelProcessingOptions _options;
public static string Name => "search";
public static string Description => "Search the web for information on a topic";
public SearchTool(
SearxngClient searxngClient,
EmbeddingService embeddingService);
public Task<string> ExecuteAsync(
string originalQuery,
List<string> generatedQueries,
int maxResults,
int topChunksLimit,
Action<string>? onProgress = null,
bool verbose = true);
}
```
**Dependencies**:
- `SearxngClient` - for web searches
- `EmbeddingService` - for vector generation
- `ParallelProcessingOptions` - concurrency settings (hardcoded new instance)
**Static Properties**:
- `Name` - tool identifier (currently "search")
- `Description` - tool description
## ExecuteAsync Method
**Signature**:
```csharp
public async Task<string> ExecuteAsync(
string originalQuery, // User's original question
List<string> generatedQueries, // Expanded search queries
int maxResults, // Results per query
int topChunksLimit, // Top N chunks to return
Action<string>? onProgress, // Progress callback
bool verbose) // Verbose mode flag
```
**Returns**: `Task<string>` - formatted context with source citations
**Contract**:
- Never returns `null` (returns "No search results found." on zero results)
- Progress callback may be invoked frequently (many phases)
- `verbose` passed to sub-components for their own logging
## The 4-Phase Pipeline
```
ExecuteAsync()
├─ Phase 1: ExecuteParallelSearchesAsync
│ Input: generatedQueries × maxResults
│ Output: List<SearxngResult> (deduplicated)
├─ Phase 2: ExecuteParallelArticleFetchingAsync
│ Input: List<SearxngResult>
│ Output: List<Chunk> (with content, url, title)
├─ Phase 3: ExecuteParallelEmbeddingsAsync
│ Input: originalQuery + List<Chunk>
│ Output: (queryEmbedding, chunkEmbeddings)
│ (also sets Chunk.Embedding for valid chunks)
├─ Phase 4: RankAndSelectTopChunks
│ Input: List<Chunk> + queryEmbedding + chunkEmbeddings
│ Output: List<Chunk> topChunks (with Score set)
└─ Format Context → return string
```
### Phase 1: ExecuteParallelSearchesAsync
**Purpose**: Execute all search queries in parallel, collect and deduplicate results.
**Implementation**:
```csharp
var allResults = new ConcurrentBag<SearxngResult>();
var searchTasks = generatedQueries.Select(async query =>
{
onProgress?.Invoke($"[Searching web for '{query}'...]");
try
{
var results = await _searxngClient.SearchAsync(query, maxResults);
foreach (var result in results)
{
allResults.Add(result);
}
}
catch (Exception ex)
{
if (verbose)
Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
}
});
await Task.WhenAll(searchTasks);
var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
return uniqueResults;
```
**Details**:
- `ConcurrentBag<SearxngResult>` collects results thread-safely
- `Task.WhenAll` - unbounded parallelism (parallel to `generatedQueries.Count`)
- Each task: calls `_searxngClient.SearchAsync(query, maxResults)`
- Errors caught and logged (verbose only); other queries continue
- `DistinctBy(r => r.Url)` removes duplicates
**Return**: `List<SearxngResult>` (unique URLs only)
**Progress**: `[Searching web for '{query}'...]`
**Potential Issues**:
- Could overwhelm local SearxNG if `generatedQueries` is large (100+)
- SearxNG itself may have its own rate limiting
**Future Enhancement**:
- Add semaphore to limit search concurrency
- Add timeout per search task
- Cache search results (same query across runs)
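A sketch of the semaphore enhancement suggested above (the limit of 5 is an assumption, not an existing setting):
```csharp
var searchSemaphore = new SemaphoreSlim(5); // hypothetical MaxConcurrentSearches
var searchTasks = generatedQueries.Select(async query =>
{
    await searchSemaphore.WaitAsync();
    try
    {
        var results = await _searxngClient.SearchAsync(query, maxResults);
        foreach (var result in results)
            allResults.Add(result);
    }
    finally
    {
        searchSemaphore.Release(); // always release, even on failure
    }
});
await Task.WhenAll(searchTasks);
```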
### Phase 2: ExecuteParallelArticleFetchingAsync
**Purpose**: Fetch each search result URL, extract article content, split into chunks.
**Implementation**:
```csharp
var chunks = new ConcurrentBag<Chunk>();
var completedFetches = 0;
var totalFetches = searchResults.Count;
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
var fetchTasks = searchResults.Select(async result =>
{
await semaphore.WaitAsync();
try
{
var current = Interlocked.Increment(ref completedFetches);
var uri = new Uri(result.Url);
var domain = uri.Host;
onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]");
try
{
var article = await ArticleService.FetchArticleAsync(result.Url);
if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent))
return;
var textChunks = ChunkingService.ChunkText(article.TextContent);
foreach (var chunkText in textChunks)
{
chunks.Add(new Chunk(chunkText, result.Url, article.Title));
}
}
catch (Exception ex)
{
if (verbose)
Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}");
}
}
finally
{
semaphore.Release();
}
});
await Task.WhenAll(fetchTasks);
return chunks.ToList();
```
**Details**:
- `SemaphoreSlim` limits concurrency to `MaxConcurrentArticleFetches` (10)
- `Interlocked.Increment` for thread-safe progress counting
- Progress: `[Fetching article X/Y: domain]` (extracts host from URL)
- `ArticleService.FetchArticleAsync` uses SmartReader
- Article must be `IsReadable` and have `TextContent`
- `ChunkingService.ChunkText` splits into ~500-char pieces
- Each chunk becomes a `Chunk(content, url, article.Title)`
- Errors logged (verbose only); failed URLs yield no chunks
**Return**: `List<Chunk>` (potentially many per article)
**Chunk Count Estimate**:
- 15 articles × average 3000 chars/article = 45,000 chars
- With 500-char chunks ≈ 90 chunks
- With natural breaks → maybe 70-80 chunks
**Potential Issues**:
- Some sites block SmartReader (JS-heavy, paywalls)
- Slow article fetches may cause long tail latency
- Large articles create many chunks → memory + embedding cost
**Future Enhancements**:
- Add per-URL timeout
- Filter chunks by length threshold (skip tiny chunks)
- Deduplicate chunks across articles (same content on different sites)
- Cache article fetches by URL
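A sketch of the per-URL timeout suggested above, using `Task.WaitAsync` (.NET 6+); the 15-second value is an assumption:
```csharp
try
{
    // ArticleService.FetchArticleAsync takes only a URL, so the timeout wraps the task
    var article = await ArticleService.FetchArticleAsync(result.Url)
        .WaitAsync(TimeSpan.FromSeconds(15));
    // ... chunk as before ...
}
catch (TimeoutException)
{
    // Treat a timeout like any other failed fetch: this URL yields no chunks
}
```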
### Phase 3: ExecuteParallelEmbeddingsAsync
**Purpose**: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.
**Implementation**:
```csharp
onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");
// Start query embedding (single) and chunk embeddings (batch) concurrently
var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
chunkTexts, onProgress);
await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);
var queryEmbedding = await queryEmbeddingTask;
var chunkEmbeddings = await chunkEmbeddingsTask;
// Filter out chunks with empty embeddings
var validChunks = new List<Chunk>();
var validEmbeddings = new List<float[]>();
for (var i = 0; i < chunks.Count; i++)
{
if (chunkEmbeddings[i].Length > 0)
{
validChunks.Add(chunks[i]);
validEmbeddings.Add(chunkEmbeddings[i]);
}
}
// Update chunks with embeddings
for (var i = 0; i < validChunks.Count; i++)
{
validChunks[i].Embedding = validEmbeddings[i];
}
return (queryEmbedding, validEmbeddings.ToArray());
```
**Reference** (the relevant lines in the actual source):
```csharp
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
chunkTexts, onProgress);
```
**Details**:
- **Query embedding**: Single request for original question (one embedding)
- **Chunk embeddings**: Batch processing of all chunk texts
- Both run concurrently via `Task.WhenAll`
- `_embeddingService.GetEmbeddingsWithRateLimitAsync` uses:
- Batch size: 300 (default)
- Max concurrent batches: 4 (default)
- Polly retry (3 attempts, exponential backoff)
- `RateLimiter` (semaphore) for API concurrency
- Failed batches return empty `float[]` (length 0)
- Filters out failed chunks (won't be ranked)
- `validChunks[i].Embedding = validEmbeddings[i]` attaches embedding to chunk
**Return**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` where:
- `chunkEmbeddings` length = `validChunks.Count` (filtered)
- Order matches `validChunks` order (since we filtered parallel arrays)
**Progress**: Interleaved from embedding service's own progress callbacks (batch X/Y)
**Potential Issues**:
- `GetEmbeddingsWithRateLimitAsync` assigns `results[batchIndex] = ...` from parallel tasks; each task writes to a distinct array index, and writes to distinct elements never overlap, so this is safe without synchronization
- The filtering loop assumes `chunkEmbeddings` has the same count as `chunks`; `GetEmbeddingsWithRateLimitAsync` returns `results.SelectMany(r => r).ToArray()`, which preserves the input count (failed batches contribute empty arrays per text), so the assumption holds
**Memory Consideration**:
- `chunkTexts` list holds all chunk strings (may be large, but still in memory)
- `chunkEmbeddings` holds all float arrays (600KB for 100 chunks)
- Total: modest (~few MB)
**Future Enhancements**:
- Stream embeddings? (No benefit, need all for ranking)
- Cache embeddings by content hash (cross-run)
- Support different embedding model per query
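The cross-run cache idea could key embeddings on a content hash; a sketch (the dictionary store is a stand-in for whatever persistence is chosen):
```csharp
using System.Security.Cryptography;
using System.Text;

static string ContentKey(string text) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));

// Check the cache before calling the embedding API; store on miss
var key = ContentKey(chunkText);
if (!embeddingCache.TryGetValue(key, out float[]? embedding))
{
    embedding = await embeddingService.GetEmbeddingAsync(chunkText);
    embeddingCache[key] = embedding;
}
```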
### Phase 4: RankAndSelectTopChunks
**Purpose**: Score chunks by semantic relevance to query, sort, and select top N.
**Implementation**:
```csharp
var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList();
foreach (var chunk in chunksWithEmbeddings)
{
chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
}
var topChunks = chunksWithEmbeddings
.OrderByDescending(c => c.Score)
.Take(topChunksLimit)
.ToList();
return topChunks;
```
**Details**:
- Filters to chunks that have embeddings (successful phase 3)
- For each: `Score = CosineSimilarity(queryEmbedding, chunkEmbedding)`
- Uses `TensorPrimitives.CosineSimilarity` (SIMD-accelerated)
- Returns float typically 0-1 (higher = more relevant)
- `OrderByDescending` - highest scores first
- `Take(topChunksLimit)` - select top N (from `--chunks` option)
- Returns `List<Chunk>` (now with `Score` set)
**Return**: Top N chunks ready for context formatting
**Complexity**:
- O(n) for scoring (where n = valid chunks, typically 50-100)
- O(n log n) for sorting (fast for n=100)
- Negligible CPU time
**Edge Cases**:
- If `topChunksLimit` > `chunksWithEmbeddings.Count`, returns all (no padding)
- If all embeddings failed, returns empty list
- Should handle `topChunksLimit == 0` (returns empty)
### Context Formatting (After Phase 4)
**Location**: In `ExecuteAsync`, after ranking:
```csharp
var context = string.Join("\n\n", topChunks.Select((c, i) =>
$"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
return context;
```
**Format**:
```
[Source 1: Article Title](https://example.com/article)
Chunk content text...
[Source 2: Another Title](https://example.com/another)
Chunk content text...
[Source 3: Third Title](https://example.com/third)
Chunk content text...
```
**Features**:
- Each source numbered 1, 2, 3... (matches order of topChunks = descending relevance)
- Title or "Unknown" if null
- Title is markdown link to original URL
- Chunk content as plain text (may contain its own formatting)
- Double newline between sources
**Rationale**:
- Markdown links allow copy-pasting to browsers
- Numbers allow LLM to cite `[Source 1]` in answer
- Original title helps user recognize source
**Potential Issues**:
- LLM might misinterpret "Source 1" as literal citation required
- If chunks contain markdown, may conflict (no escaping)
- Some titles may have markdown special chars (unlikely but possible)
**Alternative**: Could use XML-style tags or more robust citation format.
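A sketch of that XML-style alternative, which sidesteps markdown collisions inside chunk content:
```csharp
var context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"<source id=\"{i + 1}\" title=\"{c.Title ?? "Unknown"}\" url=\"{c.SourceUrl}\">\n{c.Content}\n</source>"));
```
Quotes in titles would still need attribute escaping (e.g. `SecurityElement.Escape`), but the chunk body no longer conflicts with the citation markup.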
## Error Handling & Edge Cases
### Empty Results Handling
At end of `ExecuteAsync`:
```csharp
if (searchResults.Count == 0)
return "No search results found.";
if (chunks.Count == 0)
return "Found search results but could not extract readable content.";
```
These messages appear in final answer (LLM will respond to these contexts).
### Partial Failures
- Some search queries fail → proceed with others
- Some articles fail to fetch → continue
- Some embedding batches fail → those chunks filtered out
- Ranking proceeds with whatever valid embeddings exist
### Verbose vs Compact Progress
`verbose` parameter affects what's passed to phases:
- **Article fetching**: errors only shown if `verbose`
- **Embeddings**: always shows batch progress via `onProgress` (from EmbeddingService)
- **Searches**: errors are logged to Console only when `verbose` (never through the callback)
### Progress Callback Pattern
`onProgress` is invoked at major milestones:
- Searching: `[Searching web for '{query}'...]`
- Article fetch: `[Fetching article X/Y: domain]`
- Embeddings: `[Generating embeddings: batch X/Y]`
- Final: `[Found top X most relevant chunks overall. Generating answer...]`
Each phase may invoke many times (e.g., embedding batches). `StatusReporter` handles these appropriately.
## Performance Characteristics
### Time Estimate per Phase (for typical 3 queries, 5 results each, ~15 articles):
| Phase | Time | Dominated By |
|-------|------|--------------|
| Searches | 3-8s | Network latency to SearxNG |
| Article Fetching | 5-15s | Network + SmartReader CPU |
| Embeddings | 2-4s | OpenRouter API latency (4 concurrent batches) |
| Ranking | <0.1s | CPU (O(n log n) sort, n~100) |
| **Total Pipeline** | **10-30s** | Articles + Searches |
### Concurrency Limits Effect
**Article Fetching** (`MaxConcurrentArticleFetches` = 10):
- 15 articles → 2 waves (10 then 5)
- If each takes 2s → ~4s total (vs 30s sequential)
**Embedding Batching** (`MaxConcurrentEmbeddingRequests` = 4, `EmbeddingBatchSize` = 300):
- 80 chunks → a single batch (well under the 300-item limit), so no batch-level parallelism is needed
- Only runs with more than 300 chunks produce multiple batches, and those execute concurrently (up to 4 at a time)
### Memory Usage
- `searchResults` (15 items) → ~30KB
- `chunks` (80 items × 500 chars) → ~40KB text + embeddings ~400KB (80 × 1536 × 4)
- Total ≈ 500KB excluding temporary HTTP buffers
## Design Decisions
### Why Use ConcurrentBag for Results/Chunks?
Thread-safe collection allows parallel tasks to add without locks. `ConcurrentBag` does not guarantee enumeration order, so the order seen by `ToList()` and `DistinctBy` (which keeps the first occurrence it encounters) is nondeterministic. This is acceptable because order doesn't matter here: relevance is determined semantically in the ranking phase. If order mattered, a `ConcurrentQueue` or an explicit sort by source would be needed.
### Why Not Use Parallel.ForEach for Article Fetching?
We use `Task.WhenAll` with `Select` plus a semaphore. `Parallel.ForEachAsync` could also work, but it requires .NET 6+ and we prefer the same pattern as the other phases. The semaphore gives explicit concurrency control.
### Why Separate Query Embedding from Chunk Embeddings?
`GetEmbeddingAsync` is called directly (not batched) because there's only one query. Could be batched with chunks but:
- Query is small (single string)
- Batch API has overhead (request structure)
- Separate call allows independent completion (no need to wait for chunks to start query embedding)
### Why Two Different Embedding Methods?
`EmbeddingService` has:
- `GetEmbeddingsWithRateLimitAsync` (used in SearchTool)
- `GetEmbeddingsAsync` (similar but different implementation)
Probably legacy/refactor artifact. Could consolidate.
### Why Not Deduplicate URLs Earlier?
Deduplication happens after search aggregation. Could also deduplicate within each search result (SearxNG might already dedupe across engines). But global dedupe is necessary.
### Why Not Early Filtering (e.g., by domain, length)?
Possibly could improve quality:
- Filter by domain reputation
- Filter articles too short (<200 chars) or too long (>50KB)
- Not implemented (keep simple)
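If these filters were added, they could slot into Phase 2; the thresholds below are assumptions, not existing settings:
```csharp
// Skip articles outside a plausible size range before chunking
if (article.TextContent.Length < 200 || article.TextContent.Length > 50_000)
    return;

// Skip tiny fragments after chunking
var textChunks = ChunkingService.ChunkText(article.TextContent)
    .Where(t => t.Length >= 100) // hypothetical minimum chunk length
    .ToList();
```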
## Testing Considerations
**Unit Testability**: `SearchTool` is fairly testable with mocks:
- Mock `SearxngClient` to return predetermined results
- Mock `EmbeddingService` to return fixed vectors (`ArticleService` and `ChunkingService` are static, so mocking them would require interface wrappers)
- Verify progress callback invocations
- Verify final context format
**Integration Testing**:
- End-to-end with real/mocked external services
- Need test SearxNG instance and test OpenRouter key (or mock responses)
**Performance Testing**:
- Benchmark with different concurrency settings
- Profile memory for large result sets (1000+ articles)
- Measure embedding API latency impact
## Known Issues
### Bug in ExecuteParallelEmbeddingsAsync?
The actual source of `ExecuteParallelEmbeddingsAsync` in `SearchTool` reads:
```csharp
var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
chunkTexts, onProgress);
```
This is correct: embeddings are generated from `Chunk.Content`, not `Chunk.Embedding`, so there is no bug here.
### Potential Race Condition in GetEmbeddingsWithRateLimitAsync
```csharp
results[batchIndex] = batchResults;
```
This writes to an array from multiple parallel tasks, but each task writes to a distinct index, and writes to distinct array elements never overlap, so this is safe without synchronization.
### Progress Callback May Overwhelm
If invoked synchronously from many parallel tasks, could saturate the channel. `Channel.TryWrite` will return false if buffer full; we ignore return value. Could drop messages under heavy load. Acceptable for CLI UI (some messages may be lost but overall progress visible).
## Related Components
- **[OpenQueryApp](openquery-app.md)** - calls this
- **[SearxngClient](../../services/SearxngClient.md)** - phase 1
- **[ArticleService](../../services/ArticleService.md)** - phase 2a
- **[ChunkingService](../../services/ChunkingService.md)** - phase 2b
- **[EmbeddingService](../../services/EmbeddingService.md)** - phase 3
- **[Ranking](../../services/EmbeddingService.md#cosinesimilarity)** - cosine similarity
---
## Next Steps
- [Services Overview](../services/overview.md) - See supporting services
- [CLI Reference](../../api/cli.md) - How users trigger this pipeline
- [Performance](../performance.md) - Optimize pipeline settings

# Services Overview
Comprehensive reference for all service classes in OpenQuery.
## 📋 Table of Contents
1. [Service Catalog](#service-catalog)
2. [Client Services](#client-services)
3. [Processing Services](#processing-services)
4. [Infrastructure Services](#infrastructure-services)
5. [Service Interactions](#service-interactions)
## Service Catalog
OpenQuery's services are organized into three categories:
| Category | Services | Purpose |
|-----------|----------|---------|
| **Clients** | `OpenRouterClient`, `SearxngClient` | External API communication |
| **Processors** | `EmbeddingService`, `ChunkingService`, `ArticleService` | Data transformation & extraction |
| **Infrastructure** | `RateLimiter`, `StatusReporter` | Cross-cutting concerns |
All services are **stateless** (except for internal configuration) and can be safely reused across multiple operations.
---
## Client Services
### OpenRouterClient
**Location**: `Services/OpenRouterClient.cs`
**Purpose**: HTTP client for OpenRouter AI APIs (chat completions & embeddings)
#### API Endpoints
| Method | Endpoint | Purpose |
|--------|----------|---------|
| POST | `/chat/completions` | Chat completion (streaming or non-streaming) |
| POST | `/embeddings` | Embedding generation for text inputs |
#### Authentication
```
Authorization: Bearer {apiKey}
Accept: application/json
```
#### Public Methods
##### `StreamAsync(ChatCompletionRequest request, CancellationToken cancellationToken)`
- **Returns**: `IAsyncEnumerable<StreamChunk>`
- **Behavior**: Sets `request.Stream = true`, posts, reads Server-Sent Events stream
- **Use Case**: Final answer streaming, real-time responses
- **Stream Format**: SSE lines `data: {json}`; yields `TextDelta` or `ToolCall`
##### `CompleteAsync(ChatCompletionRequest request)`
- **Returns**: `Task<ChatCompletionResponse>`
- **Behavior**: Sets `request.Stream = false`, posts, returns full response
- **Use Case**: Query generation (non-streaming)
##### `EmbedAsync(string model, List<string> inputs)`
- **Returns**: `Task<float[][]>`
- **Behavior**: POST `/embeddings`, returns array of vectors (ordered by input index)
- **Use Case**: Batch embedding generation
##### `HttpClient`
- **Property**: Internal `_httpClient` (created per instance)
- **Note**: Could use `IHttpClientFactory` for pooling (not needed for CLI)
#### Error Handling
- `EnsureSuccessStatusCode()` throws `HttpRequestException` on 4xx/5xx
- No retry logic (handled by `EmbeddingService`)
#### Configuration
```csharp
public OpenRouterClient(string apiKey)
{
_apiKey = apiKey;
_httpClient = new HttpClient();
_httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
_httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
}
```
#### Example Usage
```csharp
var client = new OpenRouterClient("sk-or-...");
var request = new ChatCompletionRequest("model", new List<Message> { ... });
await foreach (var chunk in client.StreamAsync(request))
{
Console.Write(chunk.TextDelta);
}
```
---
### SearxngClient
**Location**: `Services/SearxngClient.cs`
**Purpose**: HTTP client for SearxNG metasearch engine
#### API Endpoint
```
GET /search?q={query}&format=json
```
#### Constructor
```csharp
public SearxngClient(string baseUrl) // e.g., "http://localhost:8002"
```
- `baseUrl` trimmed of trailing `/`
#### Public Methods
##### `SearchAsync(string query, int limit = 10)`
- **Returns**: `Task<List<SearxngResult>>`
- **Behavior**: GET request, deserialize JSON, take up to `limit` results
- **On Failure**: Returns empty `List<SearxngResult>` (no exception)
#### Error Handling
- `response.EnsureSuccessStatusCode()` would throw, but code doesn't call it
- If invalid JSON or missing `Results`, returns empty list
- Failures are **tolerated** - individual search queries may fail without aborting whole operation
#### Example Searxng Response
```json
{
"results": [
{
"title": "Quantum Entanglement - Wikipedia",
"url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
"content": "Quantum entanglement is a physical phenomenon..."
},
...
]
}
```
---
## Processing Services
### EmbeddingService
**Location**: `Services/EmbeddingService.cs`
**Purpose**: Generate embeddings with batching, rate limiting, and retry logic
#### Configuration
**Embedding Model**: `openai/text-embedding-3-small` (default, configurable via constructor)
**ParallelProcessingOptions** (hardcoded defaults):
```csharp
public class ParallelProcessingOptions
{
public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
public int EmbeddingBatchSize { get; set; } = 300;
}
```
#### Public Methods
##### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- **Returns**: `Task<float[][]>`
- **Behavior**:
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallel executes batches (max `MaxConcurrentEmbeddingRequests` concurrent)
- Each batch: rate-limited, retry-wrapped `client.EmbedAsync(model, batch)`
- Reassembles in original order
- Failed batches → empty `float[]` for each text
- **Progress**: Invokes `onProgress` for each batch: `"[Generating embeddings: batch X/Y]"`
- **Thread-Safe**: Uses lock for collecting results
##### `GetEmbeddingAsync(string text, CancellationToken)`
- **Returns**: `Task<float[]>`
- **Behavior**: Single embedding with rate limiting and retry
- **Use Case**: Query embedding
##### `CosineSimilarity(float[] vector1, float[] vector2)`
- **Returns**: `float` between -1 and 1 (typically 0-1 for normalized embeddings; higher = more similar)
- **Behavior**: Static; delegates to `System.Numerics.Tensors.TensorPrimitives.CosineSimilarity` (SIMD-accelerated)
**Implementation**: Single line calling SIMD-accelerated tensor primitive
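Given that description, the body is presumably just:
```csharp
using System.Numerics.Tensors;

public static float CosineSimilarity(float[] vector1, float[] vector2) =>
    TensorPrimitives.CosineSimilarity(vector1, vector2);
```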
---
### ArticleService
**Location**: `Services/ArticleService.cs`
**Purpose**: Extract clean article content from web URLs
#### Public Methods
##### `FetchArticleAsync(string url)`
- **Returns**: `Task<Article>`
- **Behavior**: Delegates to `SmartReader.ParseArticleAsync(url)`
- **Result**: `Article` with `Title`, `TextContent`, `IsReadable`, and metadata
#### Errors
- Propagates exceptions (SmartReader may throw on network failures, malformed HTML)
- `SearchTool` catches and logs
#### SmartReader Notes
- Open-source article extraction library (bundled via NuGet)
- Uses Readability algorithm (similar to Firefox Reader View)
- Removes ads, navigation, boilerplate
- `IsReadable` indicates quality (e.g., not a 404 page, not too short)
---
### ChunkingService
**Location**: `Services/ChunkingService.cs`
**Purpose**: Split text into 500-character chunks at natural boundaries
#### Public Methods
##### `ChunkText(string text)`
- **Returns**: `List<string>`
- **Algorithm**:
- Constant `MAX_CHUNK_SIZE = 500`
- While remaining text:
- Take up to 500 chars
- If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']`
- Trim, add if non-empty
- Advance start
- Returns all chunks
#### Characteristics
- Static class (no instances)
- Pure function (no side effects)
- Zero dependencies
- Handles edge cases (empty text, short text, text without breaks)
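The algorithm described above can be sketched as follows (a faithful reimplementation of the bullet steps, not necessarily line-for-line identical to the source):
```csharp
public static class ChunkingService
{
    private const int MaxChunkSize = 500;
    private static readonly char[] BreakChars = { ' ', '\n', '\r', '.', '!' };

    public static List<string> ChunkText(string text)
    {
        var chunks = new List<string>();
        var start = 0;
        while (start < text.Length)
        {
            var length = Math.Min(MaxChunkSize, text.Length - start);
            // Backtrack to the last natural boundary unless this is the final chunk
            if (start + length < text.Length)
            {
                var lastBreak = text.LastIndexOfAny(BreakChars, start + length - 1, length);
                if (lastBreak > start)
                    length = lastBreak - start + 1;
            }
            var chunk = text.Substring(start, length).Trim();
            if (chunk.Length > 0)
                chunks.Add(chunk);
            start += length;
        }
        return chunks;
    }
}
```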
---
## Infrastructure Services
### RateLimiter
**Location**: `Services/RateLimiter.cs`
**Purpose**: Limit concurrent operations using semaphore
#### Constructor
```csharp
public RateLimiter(int maxConcurrentRequests)
```
Creates `SemaphoreSlim` with `maxConcurrentRequests`
#### Public Methods
##### `ExecuteAsync<T>(Func<Task<T>> action, CancellationToken)`
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken cancellationToken = default)
{
await _semaphore.WaitAsync(cancellationToken);
try
{
return await action();
}
finally
{
_semaphore.Release();
}
}
```
- Waits for semaphore slot
- Executes `action` (typically an API call)
- Releases semaphore (even if exception)
- Returns result from `action`
##### `ExecuteAsync(Func<Task> action, CancellationToken)`
- Non-generic version (for void-returning actions)
#### Disposal
```csharp
public async ValueTask DisposeAsync()
{
_semaphore.Dispose();
}
```
Implements `IAsyncDisposable` for async cleanup
#### Usage Pattern
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
{
return await SomeApiCall();
}, cancellationToken);
```
#### Where Used
- `EmbeddingService`: Limits concurrent embedding batch requests (default 4)
---
### StatusReporter
**Location**: `Services/StatusReporter.cs`
**Purpose**: Real-time progress display with spinner (compact) or verbose lines
#### Constructor
```csharp
public StatusReporter(bool verbose)
```
- `verbose = true`: all progress via `WriteLine()` (no spinner)
- `verbose = false`: spinner with latest status
#### Architecture
**Components**:
- `Channel<string> _statusChannel` - producer-consumer queue
- `Task _statusProcessor` - background task reading from channel
- `CancellationTokenSource _spinnerCts` - spinner task cancellation
- `Task _spinnerTask` - spinner animation task
- `char[] _spinnerChars` - Braille spinner pattern
**Spinner Animation**:
- Runs at 10 FPS (100ms interval)
- Cycles through `['⠋','⠙','⠹','⠸','⠼','⠴','⠦','⠧','⠇','⠏']`
- Displays: `⠋ Fetching articles...`
- Updates in place using ANSI: `\r\x1b[K` (carriage return + erase line)
#### Public Methods
##### `UpdateStatus(string message)`
- Fire-and-forget: writes to channel via `TryWrite` (non-blocking)
- If channel full, message dropped (acceptable loss for UI)
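That drop-on-full behavior implies a bounded channel; a sketch of construction consistent with it (the capacity of 64 is an assumption):
```csharp
using System.Threading.Channels;

// With the default FullMode (Wait), TryWrite returns false when the buffer is full,
// so UpdateStatus silently drops the message instead of blocking the caller
private readonly Channel<string> _statusChannel =
    Channel.CreateBounded<string>(new BoundedChannelOptions(capacity: 64)
    {
        SingleReader = true
    });

public void UpdateStatus(string message) => _statusChannel.Writer.TryWrite(message);
```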
##### `WriteLine(string text)`
- Stops spinner temporarily
- Clears current status line
- Writes `text` with newline
- In verbose mode: just `Console.WriteLine(text)`
##### `ClearStatus()`
- In compact mode: `Console.Write("\r\x1b[K")` (erase line)
- In verbose: no-op
- Sets `_currentMessage = null`
##### `StartSpinner()` / `StopSpinner()`
- Manual control (`StartSpinner` is typically invoked from the constructor, `StopSpinner` from `Dispose`)
##### `Dispose()`
- Completes channel writer
- Awaits `_statusProcessor` completion
- Calls `StopSpinner()`
#### Background Processing
**Status Processor**:
```csharp
private async Task ProcessStatusUpdatesAsync()
{
await foreach (var message in _statusChannel.Reader.ReadAllAsync())
{
if (_verbose)
{
Console.WriteLine(message);
continue;
}
Console.Write("\r\x1b[K"); // Clear line
Console.Write($"{_spinnerChars[0]} {message}"); // Static spinner
_currentMessage = message;
}
}
```
**Spinner Task**:
```csharp
_spinnerTask = Task.Run(async () =>
{
while (_spinnerCts is { Token.IsCancellationRequested: false })
{
if (_currentMessage != null)
{
Console.Write("\r\x1b[K");
var charIndex = index++ % spinner.Length;
Console.Write($"{spinner[charIndex]} {_currentMessage}");
}
await Task.Delay(100, _spinnerCts.Token);
}
});
```
#### Thread Safety
- `UpdateStatus` (producer) writes to channel
- `ProcessStatusUpdatesAsync` (consumer) reads from channel
- `_spinnerTask` runs concurrently
- All UI writes happen in consumer/spinner task context (single-threaded UI)
#### Design Notes
- Could be simplified: just use `Console.CursorLeft` for spinner, no channel
- Channel allows arbitrary `UpdateStatus` calls from any thread without blocking
- Braille spinner requires terminal that supports Unicode (most modern terminals do)
---
## Service Interactions
### Dependency Graph
```
OpenQueryApp
├── OpenRouterClient ← (used for query gen + final answer)
└── SearchTool
├── SearxngClient
├── ArticleService (uses SmartReader)
├── ChunkingService (static)
├── EmbeddingService
│ └── OpenRouterClient (different instance)
│ └── RateLimiter
└── ParallelProcessingOptions (config)
```
### Service Lifetimes
All services are **transient** (new instance per query execution):
- `OpenRouterClient` → 1 instance for query gen + answer
- `SearxngClient` → 1 instance for all searches
- `EmbeddingService` → 1 instance with its own `OpenRouterClient` and `RateLimiter`
- `SearchTool` → 1 instance per query (constructed in `Program.cs`)
No singleton or static state (except static utility classes like `ChunkingService`).
### Data Flow Through Services
```
OpenQueryApp
├─ OpenRouterClient.CompleteAsync() → query generation
│ Messages → JSON → HTTP request → response → JSON → Messages
└─ SearchTool.ExecuteAsync()
├─ SearxngClient.SearchAsync() × N
│ query → URL encode → GET → JSON → SearxngResult[]
├─ ArticleService.FetchArticleAsync() × M
│ URL → HTTP GET → SmartReader → Article
├─ ChunkingService.ChunkText() × M
│ Article.TextContent → List<string> chunks
├─ EmbeddingService.GetEmbeddingAsync(query) + GetEmbeddingsAsync(chunks[])
│ texts → batches → rate-limited HTTP POST → JSON → float[][]
├─ CosineSimilarity(queryEmbedding, chunkEmbedding) × M
│ Vectors → dot product → magnitude → score
└─ return context string (formatted chunks)
```
---
## Next Steps
- **[OpenQueryApp](../components/openquery-app.md)** - Orchestrates services
- **[SearchTool](../components/search-tool.md)** - Coordinates pipeline
- **[Models](../components/models.md)** - Data structures passed between services
- **[API Reference](../../api/cli.md)** - CLI that uses these services
---
**Service Design Principles**:
- Single Responsibility: Each service does one thing well
- Stateless: No instance state beyond constructor args
- Composable: Services depend on abstractions (other services) not implementations
- Testable: Can mock dependencies for unit testing
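These principles show up directly in construction: each service receives its collaborators as constructor arguments and holds no other state. An illustrative shape (not the actual signatures):

```csharp
// Sketch: constructor-injected, stateless composition mirroring the dependency graph above.
public sealed class SearchToolSketch
{
    private readonly SearxngClient _search;
    private readonly ArticleService _articles;
    private readonly EmbeddingService _embeddings;

    public SearchToolSketch(SearxngClient search, ArticleService articles,
                            EmbeddingService embeddings)
    {
        _search = search;
        _articles = articles;
        _embeddings = embeddings;
    }
    // In tests, substitute fakes for each dependency; there is no static state to reset.
}
```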

---

**File:** `docs/configuration.md`
# Configuration
Complete guide to configuring OpenQuery for your environment.
## 📋 Table of Contents
1. [Configuration Methods](#configuration-methods)
2. [Configuration File](#configuration-file)
3. [Environment Variables](#environment-variables)
4. [Command-Line Options](#command-line-options)
5. [Configuration Priority](#configuration-priority)
6. [Recommended Settings](#recommended-settings)
7. [Advanced Configuration](#advanced-configuration)
## Configuration Methods
OpenQuery can be configured through three methods, which merge together with clear priority:
| Method | Persistence | Use Case |
|--------|-------------|----------|
| Configuration File | Permanent | Default values you use daily |
| Environment Variables | Session/Shell | CI/CD, scripting, temporary overrides |
| Command-Line Options | Per-execution | One-off customizations |
## Configuration File
### Location
OpenQuery follows the [XDG Base Directory](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html) specification:
- **Linux/macOS**: `~/.config/openquery/config`
- **Windows**: `%APPDATA%\openquery\config` (e.g., `C:\Users\<user>\AppData\Roaming\openquery\config`)
### Format
Simple `key=value` pairs, one per line:
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=3
DefaultChunks=3
DefaultResults=5
```
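Parsing this format only requires splitting each line on the first `=`. A sketch of such a loader (the real `AppConfig` loader may differ; comment handling here is an assumption):

```csharp
// Sketch: parse simple key=value config lines into a dictionary.
static Dictionary<string, string> ParseConfig(IEnumerable<string> lines)
{
    var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var line in lines)
    {
        var trimmed = line.Trim();
        if (trimmed.Length == 0 || trimmed.StartsWith('#')) continue; // skip blanks/comments
        var idx = trimmed.IndexOf('=');
        if (idx <= 0) continue;                                       // ignore malformed lines
        values[trimmed[..idx].Trim()] = trimmed[(idx + 1)..].Trim();
    }
    return values;
}
```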
### Schema
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ApiKey` | string | "" | OpenRouter API authentication key |
| `Model` | string | `qwen/qwen3.5-flash-02-23` | Default LLM model to use |
| `DefaultQueries` | int | 3 | Number of search queries to generate |
| `DefaultChunks` | int | 3 | Number of top context chunks to include |
| `DefaultResults` | int | 5 | Number of search results per query |
### Example Configurations
**Minimal** (just API key):
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Optimized for Research**:
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=google/gemini-3-flash-preview
DefaultQueries=5
DefaultChunks=4
DefaultResults=10
```
**Cost-Conscious**:
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=2
DefaultChunks=2
DefaultResults=3
```
## Environment Variables
Environment variables override the configuration file and can be set temporarily or permanently in your shell profile.
### Available Variables
| Variable | Purpose | Required | Example |
|----------|---------|----------|---------|
| `OPENROUTER_API_KEY` | OpenRouter API key | **Yes** (unless in config file) | `export OPENROUTER_API_KEY="sk-or-..."` |
| `OPENROUTER_MODEL` | Override default LLM model | No | `export OPENROUTER_MODEL="deepseek/deepseek-v3.2"` |
| `SEARXNG_URL` | URL of SearxNG instance | No (default: `http://localhost:8002`) | `export SEARXNG_URL="https://searx.example.com"` |
### Setting Environment Variables
#### Temporary (Current Session)
```bash
# Linux/macOS
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"
# Windows PowerShell
$env:OPENROUTER_API_KEY="sk-or-..."
$env:SEARXNG_URL="http://localhost:8002"
```
#### Permanent (Shell Profile)
**bash** (`~/.bashrc` or `~/.bash_profile`):
```bash
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"
```
**zsh** (`~/.zshrc`):
```zsh
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"
```
**fish** (`~/.config/fish/config.fish`):
```fish
set -x OPENROUTER_API_KEY "sk-or-..."
set -x SEARXNG_URL "http://localhost:8002"
```
**Windows** (PowerShell profile):
```powershell
[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User")
[Environment]::SetEnvironmentVariable("SEARXNG_URL", "http://localhost:8002", "User")
```
After editing profile files, restart your terminal or run `source ~/.bashrc` (or equivalent).
### Security Note
Never commit your API key to version control. Prefer environment variables or the config file: the config file lives outside the project directory (`~/.config/openquery/`), so it can never be committed accidentally. The default `.gitignore` only needs to exclude common build directories.
## Command-Line Options
Options passed directly to the `openquery` command override both config file and environment variables for that specific execution.
### Main Command Options
```bash
openquery [OPTIONS] <question>
```
| Option | Aliases | Type | Default Source | Description |
|--------|---------|------|----------------|-------------|
| `--chunks` | `-c` | int | Config `DefaultChunks` | Number of top context chunks |
| `--results` | `-r` | int | Config `DefaultResults` | Search results per query |
| `--queries` | `-q` | int | Config `DefaultQueries` | Number of search queries |
| `--short` | `-s` | bool | false | Request concise answer |
| `--long` | `-l` | bool | false | Request detailed answer |
| `--verbose` | `-v` | bool | false | Show detailed progress |
### Configure Command Options
```bash
openquery configure [OPTIONS]
```
| Option | Type | Description |
|--------|------|-------------|
| `--interactive` / `-i` | bool | Launch interactive configuration wizard |
| `--key` | string | Set API key |
| `--model` | string | Set default model |
| `--queries` | int? | Set default queries |
| `--chunks` | int? | Set default chunks |
| `--results` | int? | Set default results |
## Configuration Priority
When OpenQuery needs a value, it checks sources in this order (highest to lowest priority):
1. **Command-line option** (if provided)
2. **Environment variable** (if set)
3. **Configuration file** (if key exists)
4. **Hard-coded default** (if all above missing)
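The lookup can be expressed as a simple coalescing chain (illustrative; not the actual resolution code):

```csharp
// Sketch: resolve a setting with CLI > environment > config file > default.
static string Resolve(string? cliValue, string envVarName,
                      IReadOnlyDictionary<string, string> config,
                      string configKey, string fallback)
{
    return cliValue
        ?? Environment.GetEnvironmentVariable(envVarName)
        ?? (config.TryGetValue(configKey, out var fromFile) ? fromFile : null)
        ?? fallback;
}

// Usage: Resolve(cliModel, "OPENROUTER_MODEL", config, "Model", "qwen/qwen3.5-flash-02-23");
```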
### Examples
**Example 1**: Environment overrides config
```bash
# config file: DefaultQueries=5
export OPENROUTER_MODEL="deepseek/deepseek-v3.2"
openquery --queries 2 "question" # Uses: queries=2 (CLI), model=deepseek (env), chunks=3 (config)
```
**Example 2**: CLI overrides everything
```bash
export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23"
openquery --model "google/gemini-3-flash-preview" --chunks 5 "question"
# Uses: model=google (CLI), chunks=5 (CLI), queries=3 (default)
```
**Example 3**: All sources combined
```bash
# config: DefaultChunks=4
# env: OPENROUTER_MODEL="moonshotai/kimi-k2.5", SEARXNG_URL="http://custom:8002"
# CLI: --queries 6 --short
openquery "question"
# Uses: queries=6 (CLI), chunks=4 (config), results=5 (config),
# model=kimi-k2.5 (env), searxng=custom (env), short=true (CLI)
```
## Recommended Settings
### For Quick Questions (Facts, Definitions)
```bash
openquery -q 2 -r 3 -c 2 "What is the capital of France?"
```
- Few queries (2) for straightforward facts
- Few results (3) to minimize processing
- Few chunks (2) for focused answer
### For Research (Complex Topics)
```bash
openquery -q 5 -r 10 -c 4 -l "Explain the causes of the French Revolution"
```
- More queries (5) for diverse perspectives
- More results (10) for comprehensive coverage
- More chunks (4) for rich context
- Long format for depth
### For Exploration (Broad Topics)
```bash
openquery -q 8 -r 15 -c 5 "What are the latest developments in AI?"
```
- Many queries (8) to explore different angles
- Many results (15) for breadth
- More chunks (5) for extensive context
### Cost Optimization
```bash
openquery configure --model "qwen/qwen3.5-flash-02-23"
# Keep defaults: -q 3 -r 5 -c 3
```
- Qwen Flash is very cost-effective
- Default parameters provide good balance
### Performance Optimization
```bash
# Adjust ParallelProcessingOptions in SearchTool.cs if needed
# Default: MaxConcurrentArticleFetches=10, MaxConcurrentEmbeddingRequests=4
```
- Reduce these values if you see rate limits or memory pressure
- Increase them if you have fast network/API and want more speed
## Advanced Configuration
### Changing Concurrency Limits
Concurrency limits are currently hardcoded in `SearchTool.cs` but can be adjusted:
```csharp
public class ParallelProcessingOptions
{
public int MaxConcurrentArticleFetches { get; set; } = 10; // ← Change this
public int MaxConcurrentEmbeddingRequests { get; set; } = 4; // ← Change this
public int EmbeddingBatchSize { get; set; } = 300; // ← Change this
}
```
To make these configurable, you could:
1. Add fields to `AppConfig`
2. Read from config file
3. Pass through to `SearchTool` constructor
### Custom Embedding Model
The embedding model is hardcoded to `openai/text-embedding-3-small`. To change:
Edit the `EmbeddingService` constructor:
```csharp
public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model")
```
Or make it configurable via CLI/config (future enhancement).
### Changing Chunk Size
Chunk size (500 chars) is defined in `ChunkingService.cs`:
```csharp
private const int MAX_CHUNK_SIZE = 500;
```
Modify this constant to change how articles are split. Larger chunks:
- ✅ More context per chunk
- ❌ Fewer chunks for same article
- ❌ Higher token usage in final answer
Smaller chunks:
- ✅ More granular matching
- ❌ May lose context across chunk boundaries
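A minimal splitter in the spirit of `ChunkingService` greedily packs sentences up to the size cap (a sketch; the real service's boundary rules may differ):

```csharp
using System.Text;

// Sketch: greedily pack sentences into chunks of at most MaxChunkSize chars.
const int MaxChunkSize = 500;

static List<string> ChunkText(string text)
{
    var chunks = new List<string>();
    var current = new StringBuilder();
    foreach (var sentence in text.Split('.', StringSplitOptions.RemoveEmptyEntries))
    {
        if (current.Length + sentence.Length + 1 > MaxChunkSize && current.Length > 0)
        {
            chunks.Add(current.ToString()); // cap reached → emit chunk
            current.Clear();
        }
        current.Append(sentence.Trim()).Append('.');
    }
    if (current.Length > 0) chunks.Add(current.ToString());
    return chunks;
}
```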
### Using a Custom SearxNG Instance
Some SearxNG deployments may require HTTPS, authentication, or custom paths:
```bash
# With authentication (if supported)
export SEARXNG_URL="https://user:pass@searx.example.com:8080"
# With custom path
export SEARXNG_URL="https://searx.example.com/custom-path"
```
Note: Most SearxNG instances don't require auth as they're designed for privacy.
### OpenRouter Settings
OpenRouter supports additional parameters (not yet exposed in OpenQuery):
- `temperature` - Randomness (0-2, default ~1)
- `max_tokens` - Response length limit
- `top_p` - Nucleus sampling
- `frequency_penalty` / `presence_penalty`
These could be added to `ChatCompletionRequest` in future versions.
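If they were added, the request model would only need optional properties that serialize when set. A hypothetical extension (property names follow OpenRouter's OpenAI-compatible JSON fields; this is not the project's actual `ChatCompletionRequest`):

```csharp
using System.Text.Json.Serialization;

// Sketch: optional sampling parameters on the request model (hypothetical extension).
public sealed class ChatCompletionRequestSketch
{
    [JsonPropertyName("model")] public string Model { get; set; } = "";

    [JsonPropertyName("temperature")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public double? Temperature { get; set; }

    [JsonPropertyName("max_tokens")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public int? MaxTokens { get; set; }

    [JsonPropertyName("top_p")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public double? TopP { get; set; }
}
```

Omitting null properties keeps the request wire-compatible with the current behavior when no overrides are set.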
## Managing Multiple Configurations
You can maintain multiple config files and symlink or set per-project:
```bash
# Create project-specific config
cp ~/.config/openquery/config ~/myproject/openquery.config
# Use it temporarily
OPENQUERY_CONFIG=~/myproject/openquery.config openquery "question"
```
**Note**: Currently OpenQuery only looks at `~/.config/openquery/config`. Multi-config support would require code changes (reading from `OPENQUERY_CONFIG` env var).
## Configuration Validation
OpenQuery doesn't strictly validate config values. Invalid settings may cause runtime errors:
- `DefaultQueries <= 0` → May cause exceptions or zero queries
- `DefaultChunks <= 0` → May return no context
- `DefaultResults <= 0` → No search results
Validate manually:
```bash
# Test your config loads
cat ~/.config/openquery/config
# Test with verbose mode
openquery -v "test"
```
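A defensive loader could clamp these values instead of failing at runtime (illustrative; OpenQuery does not currently do this):

```csharp
// Sketch: fall back to a safe default for missing, non-numeric, or non-positive values.
static int ReadPositiveInt(IReadOnlyDictionary<string, string> config, string key, int fallback)
{
    return config.TryGetValue(key, out var raw)
        && int.TryParse(raw, out var value)
        && value > 0
            ? value
            : fallback; // invalid or missing → safe default
}

// Usage: var queries = ReadPositiveInt(config, "DefaultQueries", 3);
```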
---
## Next Steps
- [Usage Guide](usage.md) - Learn how to use the CLI
- [Architecture](architecture.md) - Understand the system design
- [Troubleshooting](troubleshooting.md) - Fix common issues

---

**File:** `docs/index.md`
# OpenQuery Documentation
Welcome to the comprehensive documentation for OpenQuery - the AI-powered search and answer system.
## 📚 Documentation Overview
### Getting Started
- **[Installation Guide](installation.md)** - Build, install, and setup instructions
- **[Configuration](configuration.md)** - Configure API keys, models, and settings
- **[Usage Guide](usage.md)** - Complete CLI reference with examples
### Deep Dive
- **[Architecture](architecture.md)** - System design, patterns, and data flow
- **[Components](components/overview.md)** - Detailed component documentation
- [OpenQueryApp](components/openquery-app.md)
- [SearchTool](components/search-tool.md)
- [Services](components/services.md)
- [Models](components/models.md)
- **[API Reference](api/cli.md)** - Complete command-line interface reference
- [Environment Variables](api/environment-variables.md)
- [Programmatic APIs](api/programmatic.md)
### Support
- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- **[Performance](performance.md)** - Performance characteristics and optimization
## 🎯 Quick Links
### For Users
- [Install OpenQuery](installation.md) in 5 minutes
- [Configure your API key](configuration.md)
- [Learn the basics](usage.md)
- [Solve common problems](troubleshooting.md)
### For Developers
- [Understand the architecture](architecture.md)
- [Explore components](components/overview.md)
- [Use the APIs programmatically](api/programmatic.md)
- [Performance tuning](performance.md)
## 📋 Table of Contents
1. [Project Overview](#project-overview)
2. [Key Concepts](#key-concepts)
3. [Technology Stack](#technology-stack)
4. [System Workflow](#system-workflow)
## Project Overview
**OpenQuery** is a sophisticated CLI tool that combines the power of large language models with web search to provide accurate, well-sourced answers to complex questions.
### What It Does
- Takes a natural language question as input
- Generates multiple diverse search queries
- Searches the web via SearxNG
- Extracts and processes article content
- Uses semantic similarity to rank relevance
- Synthesizes a comprehensive AI-generated answer with citations
### Why Use OpenQuery?
- **Accuracy**: Multiple search queries reduce bias and increase coverage
- **Transparency**: Sources are cited in the final answer
- **Speed**: Parallel processing minimizes latency
- **Control**: Fine-tune every aspect from query count to chunk selection
- **Privacy**: SearxNG provides anonymous, aggregating search
## Key Concepts
### Search Queries
Instead of using your exact question, OpenQuery generates multiple optimized search queries (default: 3). For example, "What is quantum entanglement?" might become:
- "quantum entanglement definition"
- "how quantum entanglement works"
- "quantum entanglement experiments"
### Content Chunks
Long articles are split into ~500-character chunks. Each chunk is:
- Stored with its source URL and title
- Converted to a vector embedding (1536 dimensions)
- Scored against your query embedding
### Semantic Ranking
Using cosine similarity between embeddings, OpenQuery ranks chunks by relevance and selects the top N (default: 3) for the final context.
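Concretely, the score is the standard cosine similarity over the two embedding vectors. With `System.Numerics.Tensors` (which the project already uses) this is a one-liner, but written out it looks like this (a sketch):

```csharp
// Sketch: cosine similarity between a query embedding and a chunk embedding.
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    float dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}
// Equivalent with the library: TensorPrimitives.CosineSimilarity(a, b);
```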
### Streaming Answer
The LLM receives your question plus the top chunks as context and streams the answer in real-time, citing sources like `[Source 1]`.
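Consuming the stream is an `await foreach` over response deltas, printed as they arrive (a sketch of the pattern; the actual `OpenRouterClient` streaming API may differ):

```csharp
// Sketch: render streamed answer tokens in real time.
static async Task StreamAnswerAsync(IAsyncEnumerable<string> deltas)
{
    await foreach (var delta in deltas)
        Console.Write(delta); // each delta is a partial token/word from the LLM
    Console.WriteLine();
}
```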
## Technology Stack
| Layer | Technology | Purpose |
|-------|------------|---------|
| Runtime | .NET 10.0 AOT | Native performance, minimal footprint |
| LLM | OpenRouter API | Chat completions and embeddings |
| Search | SearxNG | Metasearch engine |
| Content Extraction | SmartReader | Article text extraction |
| Vector Math | System.Numerics.Tensors | High-performance cosine similarity |
| Resilience | Polly | Retry and circuit breaker policies |
| CLI | System.CommandLine | Command parsing and help |
| JSON | System.Text.Json (source-gen) | Fast serialization |
## System Workflow
```
┌─────────────────────────────────────────────────────────────────┐
│ OpenQuery Workflow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. User Query: "What is quantum entanglement?" │
│ │
│ 2. Query Generation (Optional) │
│ LLM generates: ["quantum entanglement physics", │
│ "quantum entanglement definition", │
│ "how does quantum entanglement work"] │
│ │
│ 3. Parallel Searches │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Query 1 → │→ │ SearxNG │→ │ Results │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Query 2 → │→ │ SearxNG │→ │ Results │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Query 3 → │→ │ SearxNG │→ │ Results │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ (results from all queries are combined) │
│ │
│ 4. Parallel Article Fetching │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ URL 1 → │→ │ Article │→ │ Chunks │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ URL 2 → │→ │ Article │→ │ Chunks │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ ... (concurrent, max 10 at a time) │
│ │
│ 5. Parallel Embeddings │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Chunks │→ │ Embed- │→ │ Vectors │ │
│ │ Batch 1 │ │ ding API │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Chunks │→ │ Embed- │→ │ Vectors │ │
│ │ Batch 2 │ │ ding API │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ (batches of 300, up to 4 concurrent) │
│ │
│ 6. Semantic Ranking │
│ Query Embedding + Chunk Embeddings → Cosine Similarity → │
│ Score → Sort Descending → Top 3 Chunks │
│ │
│ 7. Final Answer Generation │
│ ┌────────────────────────────────────────────┐ │
│ │ System: "Answer based on this context:" │ │
│ │ Context: [Top 3 chunks with sources] │ │
│ │ Question: "What is quantum entanglement?" │ │
│ └────────────────────────────────────────────┘ │
│ ↓ │
│ LLM Streams Answer │
│ "Quantum entanglement is..." │
│ with citations like [Source 1] │
│ │
└─────────────────────────────────────────────────────────────────┘
```
## Next Steps
1. **[Install OpenQuery](installation.md)**
2. **[Configure it](configuration.md)**
3. **[Start asking questions](usage.md)**
For detailed technical information, continue to [the architecture guide](architecture.md).
---
**Need help?** Check the [Troubleshooting](troubleshooting.md) guide.

---

**File:** `docs/installation.md`
# Installation Guide
This guide covers how to build, install, and configure OpenQuery on your system.
## 📋 Table of Contents
1. [Prerequisites](#prerequisites)
2. [Quick Install](#quick-install)
3. [Manual Build](#manual-build)
4. [Platform-Specific Instructions](#platform-specific-instructions)
5. [Post-Installation](#post-installation)
6. [Verification](#verification)
7. [Uninstallation](#uninstallation)
## Prerequisites
### Required Software
- **.NET SDK 10.0** or later
- Download from [dotnet.microsoft.com](https://dotnet.microsoft.com/download)
- Verify: `dotnet --version` should show 10.x or higher
### External Services (Setup Required)
1. **SearxNG Instance** - Metasearch engine
- **Docker (Recommended)**:
```bash
docker run -d \
--name searxng \
-p 8002:8080 \
-v searxng-data:/etc/searxng \
searxng/searxng:latest
```
- Access at `http://localhost:8002`
- **Alternative**: Use a public SearxNG instance from [searx.space](https://searx.space)
2. **OpenRouter API Key** - AI model provider
- Sign up at [openrouter.ai](https://openrouter.ai)
- Get your API key from dashboard
- Free tier available with rate limits
## Quick Install
The easiest way to get OpenQuery up and running:
```bash
# 1. Clone the repository
git clone <your-repo-url>
cd OpenQuery
# 2. Make install script executable and run
chmod +x install.sh
./install.sh
# 3. Configure your API key
openquery configure -i
# 4. Test it
openquery "Hello world"
```
**What the install script does**:
- Builds the project in Release mode
- Publishes as self-contained AOT binary
- Copies to `~/.local/bin/OpenQuery` (Linux/macOS)
- Creates config directory `~/.config/openquery/`
## Manual Build
If you prefer to build manually or need a specific platform:
### Step 1: Restore Dependencies
```bash
dotnet restore
```
### Step 2: Build
```bash
dotnet build -c Release
```
### Step 3: Publish
#### For Current Platform (Self-Contained AOT)
```bash
dotnet publish -c Release \
--self-contained true \
/p:PublishAot=true
```
The binary will be at:
```
bin/Release/net10.0/<rid>/publish/OpenQuery
```
#### For Specific Platform (Cross-Compilation)
**Runtime Identifiers (RIDs)**:
| Platform | RID |
|----------|-----|
| Linux x64 | `linux-x64` |
| Linux ARM64 | `linux-arm64` |
| macOS x64 | `osx-x64` |
| macOS ARM64 | `osx-arm64` |
| Windows x64 | `win-x64` |
| Windows ARM64 | `win-arm64` |
Example for Linux x64:
```bash
dotnet publish -c Release \
-r linux-x64 \
--self-contained true \
/p:PublishAot=true
```
### Step 4: Deploy
Copy the binary to a directory in your PATH:
```bash
# Linux/macOS
sudo cp bin/Release/net10.0/linux-x64/publish/OpenQuery /usr/local/bin/
chmod +x /usr/local/bin/OpenQuery
# Windows (PowerShell as Admin)
New-Item -ItemType Directory -Force "C:\Program Files\OpenQuery" | Out-Null
Copy-Item bin\Release\net10.0\win-x64\publish\OpenQuery.exe "C:\Program Files\OpenQuery\"
```
Or use a local bin directory:
```bash
mkdir -p ~/.local/bin
cp bin/Release/net10.0/linux-x64/publish/OpenQuery ~/.local/bin/
# Add to PATH if not already: export PATH="$HOME/.local/bin:$PATH"
```
## Platform-Specific Instructions
### Linux
#### Ubuntu/Debian
```bash
# Install .NET SDK 10.0
wget https://dot.net/v10/dotnet-install.sh -O dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 10.0
# Add to PATH
export PATH="$HOME/.dotnet:$PATH"
# Build and install (as shown above)
```
#### With Systemd Service (Optional)
If you run SearxNG locally, you might want it as a service:
```bash
# Create systemd service for SearxNG (if using Docker)
sudo nano /etc/systemd/system/searxng.service
```
```ini
[Unit]
Description=SearxNG Search Engine
Requires=docker.service
After=docker.service
[Service]
Restart=always
ExecStart=/usr/bin/docker start -a searxng
ExecStop=/usr/bin/docker stop -t 2 searxng
[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl enable searxng
sudo systemctl start searxng
```
### macOS
#### Homebrew Install (if .NET available)
```bash
brew install dotnet-sdk
```
#### M1/M2 (ARM64) Notes
- Use RID: `osx-arm64`
- Ensure you have the ARM64 version of .NET SDK
### Windows
#### Using Winget (Windows 10/11)
```powershell
winget install Microsoft.DotNet.SDK.10
```
#### Manual Install
1. Download installer from [dotnet.microsoft.com](https://dotnet.microsoft.com/download)
2. Run installer
3. Verify in PowerShell:
```powershell
dotnet --version
```
#### Building
```powershell
dotnet publish -c Release -r win-x64 --self-contained true /p:PublishAot=true
```
## Post-Installation
### 1. Verify SearxNG is Running
```bash
curl "http://localhost:8002/search?q=test&format=json"
```
Expected: JSON response with results array.
### 2. Configure OpenQuery
```bash
# Interactive setup
openquery configure -i
# Or via environment variables
setx OPENROUTER_API_KEY "sk-or-..." # Windows
export OPENROUTER_API_KEY="sk-or-..." # Linux/macOS
```
### 3. Optional: Set Defaults
```bash
openquery configure --queries 5 --chunks 4 --results 10
```
## Verification
### Test Installation
```bash
# Check binary exists and is executable
which openquery # Linux/macOS
where openquery # Windows
# If installed as OpenQuery (capital O)
which OpenQuery
```
### Test Configuration
```bash
# Should show your config or defaults
cat ~/.config/openquery/config
```
### Test the System
```bash
# Simple query (should work with any API key)
openquery "What is 2+2?"
# More complex query
openquery -v "What are the benefits of exercise?"
```
Expected output:
- Spinner animation with status updates
- Streaming answer from the AI
- Citations like `[Source 1](url)` in the answer
## Uninstallation
### Using Uninstall Script
```bash
chmod +x uninstall.sh
./uninstall.sh
```
The script will:
- Remove binary from `~/.local/bin/`
- Ask if you want to delete config directory
### Manual Removal
```bash
# Remove binary
rm ~/.local/bin/OpenQuery
# Remove config (optional)
rm -r ~/.config/openquery
```
### Remove SearxNG (if no longer needed)
```bash
docker rm -f searxng
docker volume rm searxng-data
```
## Advanced Build Options
### Reduce Binary Size
Edit `OpenQuery.csproj`:
```xml
<PropertyGroup>
<PublishAot>true</PublishAot>
<InvariantGlobalization>true</InvariantGlobalization> <!-- Already set -->
<StripSymbols>true</StripSymbols>
</PropertyGroup>
```
### Debug Build
```bash
dotnet build -c Debug
dotnet run -- "your question"
```
### With Symbols (for debugging)
```bash
dotnet publish -c Release -r linux-x64 \
--self-contained true \
/p:PublishAot=true \
/p:DebugType=portable
```
## Troubleshooting Installation
### "dotnet: command not found"
- Add `.dotnet` to PATH: `export PATH="$HOME/.dotnet:$PATH"`
- Restart terminal or source shell config
### "The SDK 'Microsoft.NET.Sdk' was not found"
- .NET SDK not installed correctly
- Re-run installer or use `dotnet-install.sh`
### AOT Build Fails
- Some platforms may not support AOT yet
- Remove `/p:PublishAot=true` to use JIT
- Check [.NET AOT support](https://docs.microsoft.com/dotnet/core/deploying/native-aot/)
### Docker Pull Fails (SearxNG)
```bash
# Pull image separately first
docker pull searxng/searxng:latest
# Then run container
docker run -d --name searxng -p 8002:8080 searxng/searxng
```
### Port 8002 Already in Use
Change port in docker command:
```bash
docker run -d --name searxng -p 8080:8080 searxng/searxng
# Then set SEARXNG_URL=http://localhost:8080
```
## Next Steps
After successful installation:
1. [Configure OpenQuery](configuration.md)
2. [Learn how to use it](usage.md)
3. Read the [Architecture](architecture.md) to understand how it works
---
**Need help?** See [Troubleshooting](troubleshooting.md) or open an issue.

---

**File:** `docs/performance.md`
# Performance
Performance characteristics, optimization strategies, and scalability considerations for OpenQuery.
## 📋 Table of Contents
1. [Performance Overview](#performance-overview)
2. [Latency Breakdown](#latency-breakdown)
3. [Throughput](#throughput)
4. [Memory Usage](#memory-usage)
5. [Benchmarking](#benchmarking)
6. [Optimization Strategies](#optimization-strategies)
7. [Scalability Limits](#scalability-limits)
## Performance Overview
OpenQuery is designed for **low-latency interactive use** (15-50 seconds end-to-end) while maximizing parallelization to minimize wait time.
### Key Metrics
| Metric | Typical | Best Case | Worst Case |
|--------|---------|-----------|------------|
| **End-to-End Latency** | 15-50s | 10s | 120s+ |
| **API Cost** | $0.01-0.05 | $0.005 | $0.20+ |
| **Memory Footprint** | 100-300MB | 50MB | 1GB+ |
| **Network I/O** | 5-20MB | 1MB | 100MB+ |
**Note**: Wide variance due to network latency, content size, and LLM speed.
---
## Latency Breakdown
### Default Configuration
`-q 3 -r 5 -c 3` (3 queries, 5 results each, 3 final chunks)
| Stage | Operation | Parallelism | Time (p50) | Time (p95) | Dominant Factor |
|-------|-----------|-------------|------------|------------|-----------------|
| 1 | Query Generation | 1 | 2-5s | 10s | LLM inference speed |
| 2a | Searches (3 queries × 5 results) | 3 concurrent | 3-8s | 15s | SearxNG latency |
| 2b | Article Fetching (≈15 URLs) | 10 concurrent | 5-15s | 30s | Each site's response time |
| 2c | Chunking | 10 concurrent | <1s | 2s | CPU (HTML parsing) |
| 3a | Query Embedding | 1 | 0.5-1s | 3s | Embedding API latency |
| 3b | Chunk Embeddings (≈50 chunks) | 4 concurrent | 1-3s | 10s | Batch API latency |
| 4 | Ranking | 1 | <0.1s | 0.5s | CPU (vector math) |
| 5 | Final Answer Streaming | 1 | 5-20s | 40s | LLM generation speed |
| **Total** | | | **16-50s** | **~60s** | |
### Phase Details
#### Phase 1: Query Generation (2-5s)
- Single non-streaming LLM call
- Input: system prompt + user question (~200 tokens)
- Output: JSON array of 3-5 short strings (~50 tokens)
- Fast because small context and output
#### Phase 2a: Searches (3-8s)
- 3 parallel `SearxngClient.SearchAsync` calls
- Each: query → SearxNG → aggregator engines → scraped results
- Latency highly variable based on:
- SearxNG instance performance
- Network distance to SearxNG
- SearxNG's upstream search engines
#### Phase 2b: Article Fetching (5-15s)
- ≈15 URLs to fetch (3 queries × 5 results minus duplicates)
- Up to 10 concurrent fetches (semaphore)
- Each: TCP connect + TLS handshake + HTTP GET + SmartReader parse
- Latency:
- Fast sites (CDN, cached): 200-500ms
- Normal sites: 1-3s
- Slow/unresponsive sites: timeout after ~30s
Why 5-15s for 15 URLs with 10 concurrent?
- First wave (10 URLs): bounded by the slowest of the 10, typically ≈ 3s
- Second wave (5 URLs): another ≈ 3s → ≈ 6s total in the typical case
- If most URLs are fast (≈ 500ms), the total can drop to 2-3s
- But a single slow site (5-10s) in either wave dominates the total
**Tail latency**: Slowest few URLs can dominate total time. Cannot proceed until all fetch attempts complete (or fail).
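The wave behavior comes from a semaphore-gated fan-out, roughly like this (a sketch of the pattern, not the actual `ArticleService` code):

```csharp
using System.Linq;
using System.Net.Http;

// Sketch: fetch up to 10 URLs concurrently; total time tracks the slowest fetch.
static async Task<string?[]> FetchAllAsync(IReadOnlyList<string> urls, HttpClient http)
{
    using var gate = new SemaphoreSlim(10); // MaxConcurrentArticleFetches
    var tasks = urls.Select(async url =>
    {
        await gate.WaitAsync();
        try { return await http.GetStringAsync(url); }
        catch { return null; }              // one failed fetch must not sink the pipeline
        finally { gate.Release(); }
    });
    return await Task.WhenAll(tasks);       // completes only when the slowest fetch finishes
}
```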
#### Phase 2c: Chunking (<1s)
- CPU-bound HTML cleaning and splitting
- SmartReader (a managed C# HTML parser) is fast
- Typically 100-300 chunks total
- <1s on modern CPU
#### Phase 3: Embeddings (1.5-4s)
- **Query embedding**: 1 call, ~200 tokens, ≈ 0.5-1s
- **Chunk embeddings**: ≈50 chunks → 1 batch of 50 (the 300-item batch size is not reached)
- Batch of 50: still a single API call, ~6K tokens (50 chunks × ~500 chars ≈ 25K characters ≈ 6K tokens)
- At `text-embedding-3-small` pricing (~$0.00002 per 1K tokens) that is roughly $0.0001 per batch
- Latency: 1-3s for embedding API
If more chunks (say 500), would be 2 batches → maybe 2-4s.
Parallel batches (4 concurrent) help if many batches (1500+ chunks).
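The batching described above is a chunked fan-out with its own concurrency gate (illustrative; not the actual `EmbeddingService` code):

```csharp
using System.Linq;

// Sketch: embed chunks in batches of 300, with at most 4 batches in flight.
static async Task<float[][]> EmbedAllAsync(
    IReadOnlyList<string> chunks,
    Func<IReadOnlyList<string>, Task<float[][]>> embedBatch)
{
    using var gate = new SemaphoreSlim(4);          // MaxConcurrentEmbeddingRequests
    var batches = chunks.Chunk(300).ToArray();      // EmbeddingBatchSize
    var tasks = batches.Select(async batch =>
    {
        await gate.WaitAsync();
        try { return await embedBatch(batch); }
        finally { gate.Release(); }
    });
    var results = await Task.WhenAll(tasks);
    return results.SelectMany(r => r).ToArray();    // flatten, preserving batch order
}
```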
#### Phase 4: Ranking (<0.1s)
- Cosine similarity for 50-100 chunks
- Each: dot product + normalization (O(dim)=1536)
- 100 × 1536 ≈ 150K FLOPs → negligible on modern CPU
- SIMD acceleration from `TensorPrimitives`
#### Phase 5: Final Answer (5-20s)
- Streaming chat completion
- Input: system prompt + context (3 × 500-char chunks ≈ 400 tokens) + question
- Output: varies wildly (200-2000 tokens typically)
- Longer context slightly increases latency
- Model choice major factor:
- Qwen Flash: fast (5-10s for 1000 output tokens)
- Gemini Flash: moderate (10-15s)
- Llama-class: slower (20-40s)
---
## Throughput
### Sequential Execution
Running queries one after another (default CLI behavior):
- Latency per query: 16-50s
- Throughput: 1 query / 20s ≈ 180 queries/hour (theoretically)
But API rate limits will kick in before that:
- OpenRouter free tier: limited RPM/TPM
- Even paid: soft limits
### Concurrent Execution (Multiple OpenQuery Instances)
You could run multiple OpenQuery processes in parallel (different terminals), but they share:
- Same API key (OpenRouter rate limit is per API key, not per process)
- Same SearxNG instance (could saturate it)
**Practical**: 3-5 concurrent processes before hitting diminishing returns or rate limits.
### Throughput Optimization
To maximize queries per hour:
1. Use fastest model (Qwen Flash)
2. Reduce `--chunks` to 1-2
3. Reduce `--queries` to 1
4. Use local/fast SearxNG
5. Cache embedding results (not implemented)
6. Batch multiple questions in one process (not implemented; would require redesign)
**Achievable**: Maybe 500-1000 queries/hour on paid OpenRouter plan with aggressive settings.
---
## Memory Usage
### Baseline
.NET 10 AOT app with dependencies:
- **Code**: ~30MB (AOT compiled native code)
- **Runtime**: ~20MB (.NET runtime overhead)
- **Base Memory**: ~50MB
### Per-Query Memory
| Component | Memory | Lifetime |
|-----------|--------|----------|
| Search results (15 items) | ~30KB | Pipeline |
| Articles (raw HTML) | ~5MB (transient) | Freed after parse |
| Articles (extracted text) | ~500KB | Until pipeline complete |
| Chunks (≈100 items, text only) | ~50KB | Until pipeline complete |
| Embeddings (100 × 1536 floats) | ~600KB | Until pipeline complete |
| HTTP buffers | ~1MB per concurrent request | Short-lived |
| **Total per query** | **~2-5MB** (excluding base) | Released after complete |
**Peak**: When all articles fetched but not yet embedded, we have text ~500KB + chunks ~650KB = ~1.2MB + overhead ≈ 2-3MB.
**If processing many queries in parallel** (unlikely for CLI), memory would scale linearly.
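The embedding figures in the table follow directly from the vector dimensions; a one-line estimator (same defaults as above) makes that explicit:

```python
def embedding_memory_kb(chunks, dim=1536, bytes_per_float=4):
    """Memory held by chunk embeddings alone (text and object overhead excluded)."""
    return chunks * dim * bytes_per_float / 1024

print(embedding_memory_kb(100))   # 600.0 KB, matching the table above
print(embedding_memory_kb(1000))  # 6000.0 KB (~6MB) if chunk count grows 10x
```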
### Memory Leak Risks
- `HttpClient` instances: created per `OpenRouterClient` and `SearxngClient`; they should be disposed but currently are not. Harmless in practice, since the short-lived process exits anyway.
- `StatusReporter` background task: Disposed via `using`
- `RateLimiter` semaphore: Disposed via `IAsyncDisposable` if wrapped in `using` (not currently, but short-lived)
No major leaks observed.
### Memory Optimization Opportunities
1. **Reuse HttpClient** with `IHttpClientFactory` (but not needed for CLI)
2. **Stream article fetching** instead of buffering all articles before embedding (possible: embed as URLs complete)
3. **Early chunk filtering**: Discard low-quality chunks before embedding to reduce embedding count
4. **Cache embeddings**: By content hash, avoid re-embedding seen text (would need persistent storage)
---
## Benchmarking
### Methodology
Measure with `time` command and verbose logging:
```bash
time openquery -v "What is quantum entanglement?" 2>&1 | tee log.txt
```
Parse log for timestamps (or add them manually by modifying code).
### Sample Benchmark
**Environment**:
- Linux x64, .NET 10 AOT
- SearxNG local Docker (localhost:8002)
- OpenRouter API (US East)
- Model: qwen/qwen3.5-flash-02-23
**Run 1**:
```
real 0m23.4s
user 0m1.2s
sys 0m0.3s
```
Log breakdown:
- Query generation: 3.2s
- Searches: 4.1s
- Article fetching: 8.7s (12 URLs)
- Embeddings: 2.8s (45 chunks)
- Final answer: 4.6s (325 tokens)
**Run 2** (cached SearxNG results, same URLs):
```
real 0m15.8s
```
Faster article fetching (2.3s), likely because sites were cached or responded faster on the second request.
**Run 3** (verbose `-s` short answer):
```
real 0m18.2s
```
Final answer faster (2.1s instead of 4.6s) due to shorter output.
### Benchmarking Tips
1. **Warm up**: First run slower (JIT or AOT cold start). Discard first measurement.
2. **Network variance**: Run multiple times and average.
3. **Control variables**: Same question, same SearxNG instance, same network conditions.
4. **Measure API costs**: Check OpenRouter dashboard for token counts.
5. **Profile with dotTrace** or `perf` if investigating CPU bottlenecks.
---
## Optimization Strategies
### 1. Tune Concurrent Limits
Edit `SearchTool.cs` where `_options` is created:
```csharp
var _options = new ParallelProcessingOptions
{
MaxConcurrentArticleFetches = 5, // ↓ from 10
MaxConcurrentEmbeddingRequests = 2, // ↓ from 4
EmbeddingBatchSize = 300 // ↑ or ↓ (rarely matters)
};
```
**Why tune down?**
- Hit OpenRouter rate limits
- Network bandwidth saturated
- Too many concurrent fetches overwhelm target sites (ethical/scraping etiquette)
**Why tune up?**
- Fast network, powerful CPU, no rate limits
- Many chunks (>500) needing parallel embedding batches
**Monitor**:
- `openquery -v` shows embedding progress: `[Generating embeddings: batch X/Y]`
- If Y=1 (all fitted in one batch), batch size is fine
- If Y>1 and max concurrent = Y, you're using full parallelism
### 2. Reduce Data Volume
**Fewer search results**:
```bash
openquery -r 3 "question" # instead of 5 or 10
```
Effect: Fetches fewer URLs, extracts fewer chunks. Linear reduction in work.
**Fewer queries**:
```bash
openquery -q 1 "question"
```
Effect: One search instead of N. Quality may suffer (less diverse sources).
**Fewer chunks**:
```bash
openquery -c 1 "question"
```
Effect: Only top 1 chunk in context → fewer tokens → faster final answer, but may miss relevant info.
**Chunk size** (compile-time constant):
Edit `ChunkingService.cs`:
```csharp
private const int MAX_CHUNK_SIZE = 300; // instead of 500
```
Effect: More chunks (more granular ranking) but each chunk shorter → more chunks to rank, more embeddings to generate. Could increase or decrease total time. Likely more tokens overall (more chunks in context if `-c` is fixed number).
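The chunk-count side of this tradeoff is simple ceiling division (a toy model; the real `ChunkingService` is more sophisticated than a fixed-size split, so actual counts will differ):

```python
def chunk_count(text_chars, max_chunk=500):
    """Minimum number of fixed-size chunks needed to cover the text."""
    return -(-text_chars // max_chunk)  # ceiling division

print(chunk_count(10_000, max_chunk=500))  # 20 chunks at the default size
print(chunk_count(10_000, max_chunk=300))  # 34 chunks -- smaller chunks mean more of them
```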
### 3. Change Embedding Model
Currently hardcoded to `openai/text-embedding-3-small`. Could use:
- `openai/text-embedding-3-large` (higher quality, slower, more expensive)
- `intfloat/multilingual-e5-large` (multilingual, smaller)
Modify `EmbeddingService` constructor:
```csharp
public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model")
```
Then pass:
```csharp
var embeddingService = new EmbeddingService(client, "intfloat/multilingual-e5-large");
```
**Impact**: Different dimensionality (1536 vs 1024 vs 4096). Memory scales with dim. Quality may vary for non-English queries.
### 4. Caching
**Current**: No caching. Every query hits all APIs.
**Embedding cache** (by text hash):
- Could store in memory: `Dictionary<string, float[]>`
- Or disk: `~/.cache/openquery/embeddings/`
- Invalidation: embeddings are deterministic per model, so long-term cache viable
**Search cache** (by query hash):
- Cache `List<SearxngResult>` for identical queries
- TTL: maybe 1 hour (search results change over time)
**Article cache** (by URL hash):
- Cache `Article` (text content) per URL
- Invalidation: could check `Last-Modified` header or use TTL (1 day)
**Implementation effort**: Medium. Would need cache abstraction (interface, in-memory + disk options).
**Benefit**: Repeat queries (common in testing or similar questions) become instant.
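A minimal sketch of the embedding-cache idea (nothing like this exists in the codebase yet; `embed_fn` is a hypothetical stand-in for the function that calls the embedding API):

```python
import hashlib

class EmbeddingCache:
    """In-memory cache keyed by SHA-256 of the text.

    Safe long-term because embeddings are deterministic for a fixed model;
    include the model name in the key if the model can change.
    """
    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # hypothetical: str -> list[float]
        self._cache = {}

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]

calls = []
def fake_embed(text):  # stand-in for a real API call
    calls.append(text)
    return [0.0, 1.0]

cache = EmbeddingCache(fake_embed)
cache.get("same text")
cache.get("same text")
print(len(calls))  # 1 -- the second lookup never hits the API
```

A disk-backed variant would persist the same hash-to-vector mapping under `~/.cache/openquery/`.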
### 5. Parallelize More (Aggressive)
**Currently**:
- Searches: unbounded (as many as `--queries`)
- Fetches: max 10
- Embeddings: max 4
Could increase:
- Fetches to 20 or 50 (if network/CPU can handle)
- Embeddings to 8-16 (if OpenRouter rate limit allows)
**Risk**:
- Overwhelming target sites (unethical scraping)
- API rate limits → 429 errors
- Local bandwidth saturation
### 6. Local Models (Self-Hosted)
Replace OpenRouter with local LLM:
- **Query generation**: Could run tiny model locally (no API latency)
- **Embeddings**: Could run `all-MiniLM-L6-v2` locally (fast, free after setup)
- **Answer**: Could run Llama 3 8B locally (no cost, but slower than GPT-4/Gemini)
**Benefits**:
- Zero API costs (after hardware)
- No network latency
- Unlimited queries
**Drawbacks**:
- GPU required for decent speed (or CPU very slow)
- Setup complexity (Ollama, llama.cpp, vLLM, etc.)
- Model quality may lag behind commercial APIs
**Integration**: Would need to implement local inference backends (separate project scope).
---
## Scalability Limits
### API Rate Limits
**OpenRouter**:
- Free tier: Very limited (few RPM)
- Paid: Varies by model, but typical ~10-30 requests/second
- Embedding API has separate limits
**Mitigation**:
- Reduce concurrency (see tuning)
- Add exponential backoff (already have for embeddings)
- Batch embedding requests (already done)
### SearxNG Limits
**Single instance**:
- Can handle ~10-50 QPS depending on hardware
- Upstream search engines may rate limit per instance
- Memory ~100-500MB
**Mitigation**:
- Run multiple SearxNG instances behind load balancer
- Use different public instances
- Implement client-side rate limiting (currently only per-URL fetches limited, not searches)
### Network Bandwidth
**Typical data transfer**:
- Searches: 1KB per query × 3 = 3KB
- Articles: 100-500KB per fetch × 15 = 1.5-7.5MB (raw HTML)
- Extracted text: ~10% of HTML size = 150-750KB
- Embeddings: 100 chunks × 1536 × 4 bytes = 600KB (request + response)
- Final answer: 2-10KB
**Total**: ~3-10MB per query
**100 queries/hour**: ~300MB-1GB data transfer
**Not an issue** for broadband, but could matter on metered connections.
---
## Scaling with Chunk Count
Let:
- C = number of chunks with valid embeddings
- d = embedding dimension (1536)
- B = embedding batch size (300)
- P = max parallel embedding batches (4)
**Embedding Time** ≈ `O(C / (B × P))` (number of batches divided by parallelism)
**Ranking Time** ≈ `O(C × d)` (one dot product per chunk)
**Context Tokens** (for final answer) ≈ `C × avg_chunk_tokens` (≈ 500 chars = 125 tokens)
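These three relationships roll into one rough model; the constants mirror the defaults above, and the C = 1000 case reproduces the numbers worked out later in this section:

```python
import math

def scaling_estimate(C, d=1536, B=300, P=4, chunk_tokens=125):
    """Rough cost model in C (chunk count); all constants are the defaults above."""
    batches = math.ceil(C / B)
    sequential_batch_steps = math.ceil(batches / P)  # P batches run concurrently
    ranking_flops = C * d                            # one dot product per chunk
    context_tokens = C * chunk_tokens                # if every chunk entered the context
    return sequential_batch_steps, ranking_flops, context_tokens

print(scaling_estimate(100))   # (1, 153600, 12500)
print(scaling_estimate(1000))  # (1, 1536000, 125000) -- 4 batches, all in parallel
```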
**As C increases**:
- Embedding time: linear in C/B (sublinear if batch fits in one)
- Ranking time: linear in C
- Final answer latency: more tokens in context → longer context processing + potentially longer answer (more relevant chunks to synthesize)
**Practical limit**:
- With defaults, C ~ 50-100 (from 15 articles)
- Could reach C ~ 500-1000 if:
- `--queries` = 10
- `--results` = 20 (200 URLs)
- Many articles long → many chunks each
- At C = 1000:
- Embeddings: 1000/300 → 4 batches; with 4 running in parallel, all complete in one sequential step (≈ one batch duration)
- But OpenRouter may have per-minute limits on embedding requests
- Ranking: 1000 × 1536 = 1.5M FLOPs → still <0.01s
- Context tokens: 1000 × 125 = 125K tokens! Many LLMs have 200K context, so fits, but expensive and slow.
**Conclusion**: Current defaults scale to C ~ 100-200 comfortably. Beyond that:
- Need to increase batch size or parallelism for embeddings
- May hit embedding API rate limits
- Context token count becomes expensive and may degrade answer quality (LLMs lose focus in very long context)
---
## Profiling
### CPU Profiling
Use `dotnet-trace` or `perf`:
```bash
# Collect trace for 30 seconds while running query
dotnet-trace collect --process-id $(pgrep OpenQuery) --duration 30s -o trace.nettrace
# Analyze with Visual Studio or PerfView
```
Look for:
- Hot methods: `ChunkingService.ChunkText`, `EmbeddingService.GetEmbeddingsAsync`, cosine similarity
- Allocation hotspots
### Memory Profiling
```bash
dotnet-gcdump collect -p <pid>
# Open in VS or dotnet-gcdump analyze
```
Check heap size, object counts (look for large `string` objects from article content).
### Network Profiling
Use `tcpdump` or `wireshark`:
```bash
tcpdump -i any port 8002 or port 443 -w capture.pcap
```
Or simpler: `time` on individual curl commands to measure latency components.
---
## Next Steps
- [Configuration](../configuration.md) - Tune for your environment
- [Troubleshooting](../troubleshooting.md) - Diagnose slow performance
- [Architecture](../architecture.md) - Understand pipeline bottlenecks
---
**Quick Tuning Cheatsheet**
```bash
# Fast & cheap (factual Q&A)
openquery -q 1 -r 3 -c 2 -s "What is X?"
# Thorough (research)
openquery -q 5 -r 10 -c 5 -l "Deep dive on X"
# Custom code edit for concurrency
# In SearchTool.cs:
_options = new ParallelProcessingOptions {
MaxConcurrentArticleFetches = 20, // if network can handle
MaxConcurrentEmbeddingRequests = 8 // if API allows
};
```
---
`docs/troubleshooting.md` (new file, 699 lines)
# Troubleshooting
Solve common issues, errors, and performance problems with OpenQuery.
## 📋 Table of Contents
1. [Common Errors](#common-errors)
2. [Performance Issues](#performance-issues)
3. [Debugging Strategies](#debugging-strategies)
4. [Getting Help](#getting-help)
## Common Errors
### ❌ "API Key is missing"
**Error Message**:
```
[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up.
```
**Cause**: No API key available from environment or config file.
**Solutions**:
1. **Set environment variable** (temporary):
```bash
export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```
2. **Configure interactively** (persistent):
```bash
openquery configure -i
# Follow prompts to enter API key
```
3. **Check config file**:
```bash
cat ~/.config/openquery/config
# Should contain: ApiKey=sk-or-...
```
4. **Verify environment**:
```bash
echo $OPENROUTER_API_KEY
# If empty, you didn't export or exported in wrong shell
```
---
### ❌ "Network request failed"
**Error Message**:
```
[Error] Network request failed. Details: Name or service not known
```
**Cause**: Cannot reach OpenRouter or SearxNG API endpoints.
**Solutions**:
1. **Check internet connectivity**:
```bash
ping 8.8.8.8
curl https://openrouter.ai
```
2. **Verify SearxNG is running**:
```bash
curl "http://localhost:8002/search?q=test&format=json"
# Should return JSON
```
If connection refused:
```bash
# Start SearxNG if using Docker
docker start searxng
# Or run fresh
docker run -d --name searxng -p 8002:8080 searxng/searxng:latest
```
3. **Check firewall/proxy**:
```bash
# Test OpenRouter API
curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \
https://openrouter.ai/api/v1/models
```
4. **Test from different network** (if behind restrictive firewall)
---
### ❌ "No search results found"
**Error Message**:
```
No search results found.
```
**Cause**: Search queries returned zero results from SearxNG.
**Solutions**:
1. **Test SearxNG manually**:
```bash
curl "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
# Should be > 0
```
2. **Check SearxNG configuration**:
- If self-hosted: ensure internet access is enabled in `/etc/searxng/settings.yml`
- Some public instances disable certain engines or have rate limits
3. **Try a different SearxNG instance**:
```bash
export SEARXNG_URL="https://searx.example.com"
openquery "question"
```
4. **Use simpler queries**: Some queries may be too obscure or malformed
5. **Verbose mode to see queries**:
```bash
openquery -v "complex question"
# See what queries were generated
```
---
### ❌ "Found search results but could not extract readable content."
**Cause**: SearxNG returned results but `ArticleService` failed to extract content from all URLs.
**Common Reasons**:
- JavaScript-heavy sites (React, Vue apps) where content loaded dynamically
- Paywalled sites (NYT, academic journals)
- PDFs or non-HTML content
- Malformed HTML
- Server returned error (404, 403, 500)
- `robots.txt` blocked crawler
**Solutions**:
1. **Accept that some sites can't be scraped** - try different query to get different results
2. **Use site:reddit.com or site:wikipedia.org** - these are usually scrape-friendly
3. **Increase `--results`** to get more URLs (some will work)
4. **Check verbose output**:
```bash
openquery -v "question"
# Look for "Warning: Failed to fetch article"
```
5. **Try a local SearxNG instance with more engines** - some engines fetch different sources
---
### ❌ Rate Limiting (429 Too Many Requests)
**Symptoms**:
```bash
[Error] Response status code does not indicate success: 429 (Too Many Requests).
```
Or retries exhausting after Polly attempts.
**Cause**: Too many concurrent requests to OpenRouter API.
**Solutions**:
1. **Reduce concurrency** (edit `SearchTool.cs`):
```csharp
var _options = new ParallelProcessingOptions
{
MaxConcurrentArticleFetches = 5, // reduce from 10
MaxConcurrentEmbeddingRequests = 2, // reduce from 4
EmbeddingBatchSize = 150 // reduce from 300
};
```
2. **Add delay** between embedding batches (custom implementation)
3. **Upgrade OpenRouter plan** to higher rate limits
4. **Wait and retry** - rate limits reset after time window
---
### ❌ Slow Performance
**Symptom**: Queries take 60+ seconds when they usually take 20s.
**Diagnosis Steps**:
1. **Run with verbose mode**:
```bash
openquery -v "question"
```
Watch which phase takes longest:
- Query generation?
- Searching?
- Fetching articles?
- Embeddings?
2. **Check network latency**:
```bash
time curl "https://openrouter.ai/api/v1/models"
time curl "http://localhost:8002/search?q=test&format=json"
```
**Common Causes & Fixes**:
| Phase | Cause | Fix |
|-------|-------|-----|
| Searches | SearxNG overloaded/slow | Check CPU/memory, restart container |
| Fetching | Target sites slow | Reduce `--results` to fewer URLs |
| Embeddings | API rate limited | Reduce concurrency (see above) |
| Answer | Heavy model/load | Switch to faster model (e.g., Qwen Flash) |
3. **Resource monitoring**:
```bash
htop # CPU/memory usage
iftop # network throughput
```
4. **Reduce parameters**:
```bash
openquery -q 2 -r 3 -c 2 "question" # lighter load
```
---
### ❌ Out of Memory
**Symptoms**:
- Process killed by OOM killer (Linux)
- `System.OutOfMemoryException`
- System becomes unresponsive
**Cause**: Processing too many large articles simultaneously.
**Why**: Each article can be 100KB+ of text, split into many chunks, embeddings are 6KB per chunk (1536 floats × 4 bytes). 200 chunks = 1.2MB embeddings, plus text ~100KB = 1.3MB. Not huge, but many large articles could create thousands of chunks.
**Solutions**:
1. **Reduce `--results`** (fewer URLs per query):
```bash
openquery -r 3 "question" # instead of 10
```
2. **Reduce `--queries`** (fewer search queries):
```bash
openquery -q 2 "question"
```
3. **Fetches already limited** to 10 concurrent by default, which is reasonable
4. **Check article size**: Some sites (PDFs, long documents) may yield megabytes of text; SmartReader should truncate but may not
---
### ❌ Invalid JSON from Query Generation
**Symptom**: Query generation fails silently, falls back to original question.
**Cause**: LLM returned non-JSON (even though instructed). Could be:
- Model not instruction-following
- Output exceeded context window
- API error in response
**Detection**: Run with `-v` to see:
```
[Failed to generate queries, falling back to original question. Error: ...]
```
**Solutions**:
- Try a different model (configure to use Gemini or DeepSeek)
- Reduce `--queries` count (simpler task)
- Tune system prompt (would require code change)
- Accept fallback - the original question often works as sole query
---
### ❌ Spinner Artifacts in Output
**Symptom**: When redirecting output to file, you see weird characters like `⠋`, `<60>`, etc.
**Cause**: Spinner uses Unicode Braille characters and ANSI escape codes.
**Fix**: Use `2>/dev/null | sed 's/.\x08//g'` to clean:
```bash
openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md
```
Or run with `--verbose` (no spinner, only newline-separated messages):
```bash
openquery -v "question" > answer.txt
```
---
### ❌ "The type or namespace name '...' does not exist" (Build Error)
**Cause**: Missing NuGet package or wrong .NET SDK version.
**Solution**:
1. **Verify .NET SDK 10.0**:
```bash
dotnet --version
# Should be 10.x
```
If lower: https://dotnet.microsoft.com/download/dotnet/10.0
2. **Restore packages**:
```bash
dotnet restore
```
3. **Clean and rebuild**:
```bash
dotnet clean
dotnet build
```
4. **Check OpenQuery.csproj** for package references:
```xml
<PackageReference Include="Polly.Core" Version="8.6.6" />
<PackageReference Include="Polly.RateLimiting" Version="8.6.6" />
<PackageReference Include="SmartReader" Version="0.11.0" />
<PackageReference Include="System.CommandLine" Version="2.0.0-beta4.22272.1" />
<PackageReference Include="System.Numerics.Tensors" Version="9.0.0" />
```
If restore fails, these packages may not be available for .NET 10 preview. Consider:
- Downgrade to .NET 8.0 (if packages incompatible)
- Or find package versions compatible with .NET 10
---
### ❌ AOT Compilation Fails
**Error**: `error NETSDK1085: The current .NET SDK does not support targeting .NET 10.0.`
**Cause**: Using .NET SDK older than 10.0.
**Fix**: Install .NET SDK 10.0 preview.
**Or**: Disable AOT for development (edit `.csproj`):
```xml
<!-- Remove or set to false -->
<PublishAot>false</PublishAot>
```
---
## Performance Issues
### Slow First Request
**Expected**: First query slower (JIT compilation for .NET runtime if not AOT, or initial API connections).
If not using AOT:
- Consider publishing with `/p:PublishAot=true` for production distribution
- Development builds use JIT, which adds 500ms-2s warmup
**Mitigation**: Accept as warmup cost, or pre-warm with dummy query.
---
### High Memory Usage
**Check**:
```bash
ps aux | grep OpenQuery
# Look at RSS (resident set size)
```
**Typical**: 50-200MB (including .NET runtime, AOT code, data structures)
**If >500MB**:
- Likely processing very many articles
- Check `--results` and `--queries` values
- Use `--verbose` to see counts: `[Fetched X search results]`, `[Extracted Y chunks]`
**Reduce**:
- `--queries 2` instead of 10
- `--results 3` instead of 15
- These directly limit number of URLs to fetch
---
### High CPU Usage
**Cause**:
- SmartReader HTML parsing (CPU-bound)
- Cosine similarity calculations (many chunks, but usually fast)
- Spinner animation (negligible)
**Check**: `htop` → which core at 100%? If single core, likely parsing. If all cores, parallel fetch.
**Mitigation**:
- Ensure `MaxConcurrentArticleFetches` not excessively high (default 10 is okay)
- Accept - CPU spikes normal during fetch phase
---
### API Costs Higher Than Expected
**Symptom**: OpenRouter dashboard shows high token usage.
**Causes**:
1. Using expensive model (check `OPENROUTER_MODEL`)
2. High `--chunks` → more tokens in context
3. High `--queries` + `--results` → many articles → many embedding tokens (usually cheap)
4. Long answers (many completion tokens) - especially with `--long`
**Mitigation**:
- Use `qwen/qwen3.5-flash-02-23` (cheapest good option)
- Reduce `--chunks` to 2-3
- Use `--short` when detailed answer not needed
- Set `MaxTokens` in the request (would need a code change)
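A rough per-query cost check (a sketch; the per-million-token prices here are placeholders, so substitute the real rates for your model from the OpenRouter dashboard):

```python
def query_cost_usd(context_tokens, output_tokens,
                   in_price_per_m=0.5, out_price_per_m=1.5):
    """Cost of one final-answer call; prices per million tokens are assumptions."""
    return (context_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# ~2K context tokens (3 chunks + prompts), ~1K output tokens
print(query_cost_usd(2000, 1000))  # 0.0025 USD with the assumed prices
```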
---
## Debugging Strategies
### 1. Enable Verbose Mode
Always start with:
```bash
openquery -v "question" 2>&1 | tee debug.log
```
Logs everything:
- Generated queries
- URLs fetched
- Progress counts
- Errors/warnings
**Analyze log**:
- How many queries generated? (Should match `--queries`)
- How many search results per query? (Should be ≤ `--results`)
- How many articles fetched successfully?
- How many chunks extracted?
- Any warnings?
---
### 2. Isolate Components
**Test SearxNG**:
```bash
curl "http://localhost:8002/search?q=test&format=json" | jq '.results[0]'
```
**Test OpenRouter API**:
```bash
curl -X POST https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"qwen/qwen3.5-flash-02-23","messages":[{"role":"user","content":"Hello"}]}'
```
**Test Article Fetching** (with known good URL):
```bash
curl -L "https://example.com/article" | head -50
```
Then check if SmartReader can parse.
---
### 3. Reduce Scope
Test with minimal parameters to isolate failing phase:
```bash
# 1 query, 2 results, 1 chunk - should be fast and simple
openquery -q 1 -r 2 -c 1 "simple test question" -v
# If that works, gradually increase:
openquery -q 1 -r 5 -c 1 "simple question"
openquery -q 3 -r 5 -c 1 "simple question"
openquery -q 3 -r 5 -c 3 "simple question"
# Then try complex question
```
---
### 4. Check Resource Limits
**File descriptors**: If fetching many articles, may hit limit.
```bash
ulimit -n # usually 1024, should be fine
```
**Memory**: Monitor with `free -h` while running.
**Disk space**: Not much disk use, but logs could fill if verbose mode used repeatedly.
---
### 5. Examine Config File
```bash
cat ~/.config/openquery/config
# Ensure no spaces around '='
# Correct: ApiKey=sk-or-...
# Wrong: ApiKey = sk-or-... (spaces become part of value)
```
Reconfigure if needed:
```bash
openquery configure --key "sk-or-..."
```
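The spaces problem comes from naive `key=value` splitting, sketched here in Python for illustration (the actual C# parser may differ in detail, but the symptom is the same):

```python
def parse_config_line(line):
    """Naive key=value parse: split on the first '=' and keep everything else verbatim."""
    key, _, value = line.partition("=")
    return key, value

print(parse_config_line("ApiKey=sk-or-abc"))    # ('ApiKey', 'sk-or-abc')
print(parse_config_line("ApiKey = sk-or-abc"))  # ('ApiKey ', ' sk-or-abc') -- stray spaces survive
```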
---
### 6. Clear Cache / Reset
No persistent cache exists, but:
- Re-start SearxNG container: `docker restart searxng`
- Clear DNS cache if network issues: `sudo systemd-resolve --flush-caches`
---
## Getting Help
### Before Asking
Gather information:
1. **OpenQuery version** (commit or build date if available)
2. **OS and architecture**: `uname -a` (Linux/macOS) or `systeminfo` (Windows)
3. **Full command** you ran
4. **Verbose output**: `openquery -v "question" 2>&1 | tee log.txt`
5. **Config** (redact API key):
```bash
sed 's/ApiKey=.*/ApiKey=REDACTED/' ~/.config/openquery/config
```
6. **SearxNG test**:
```bash
curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
```
7. **OpenRouter test**:
```bash
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
https://openrouter.ai/api/v1/models | jq '.data[0].id'
```
---
### Where to Ask
1. **GitHub Issues** (if repository hosted there):
- Search existing issues first
- Provide all info from above
- Include log file (or link to gist)
2. **Community Forum** (if exists)
3. **Self-Diagnose**:
- Check `docs/troubleshooting.md` (this file)
- Check `docs/configuration.md`
- Check `docs/usage.md`
---
### Example Bug Report
```
Title: OpenQuery hangs on "Fetching article X/Y"
Platform: Ubuntu 22.04, .NET 10.0, OpenQuery built from commit abc123
Command: openquery -v "What is Docker?" 2>&1 | tee log.txt
Verbose output shows:
[...]
[Fetching article 1/15: docker.com]
[Fetching article 2/15: hub.docker.com]
[Fetching article 3/15: docs.docker.com]
# Hangs here indefinitely, no more progress
SearxNG test:
$ curl "http://localhost:8002/search?q=docker&format=json" | jq '.results | length'
15 # SearxNG works
Config:
ApiKey=sk-or-xxxx (redacted)
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=3
DefaultChunks=3
DefaultResults=5
Observation:
- Fetches 3 articles fine, then stalls
- Nothing in log after "Fetching article 3/15"
- Process uses ~150MB memory, CPU 0% (idle)
- Ctrl+C exits immediately
Expected: Should fetch remaining 12 articles (concurrent up to 10)
Actual: Only 3 fetched, then silent hang
```
---
## Known Issues
### Issue: Spinner Characters Not Displaying
Some terminals don't support Braille Unicode patterns.
**Symptoms**: Spinner shows as `?` or boxes.
**Fix**: Use font with Unicode support, or disable spinner by setting `TERM=dumb` or use `--verbose`.
---
### Issue: Progress Messages Overwritten
In very fast operations, progress updates may overlap.
**Cause**: `StatusReporter` uses `Console.Write` without lock in compact mode; concurrent writes from channel processor and spinner task could interleave.
**Mitigation**: Unlikely in practice (channel serializes, spinner only updates when `_currentMessage` set). If problematic, add lock around Console operations.
---
### Issue: Articles with No Text Content
Some URLs return articles with empty `TextContent`.
**Cause**: SmartReader's quality heuristic (`IsReadable`) failed, or article truly has no text (image, script, error page).
**Effect**: Those URLs contribute zero chunks.
**Acceptable**: Part of normal operation; not all URLs yield readable content.
---
### Issue: Duplicate Sources in Answer
Same website may appear multiple times (different articles).
**Cause**: Different URLs from different search results may be from same domain but different pages.
**Effect**: `[Source 1]` and `[Source 3]` could both be `example.com`. Not necessarily bad - they're different articles.
---
## Performance Tuning Reference
| Setting | Default | Fastest | Most Thorough | Notes |
|---------|---------|---------|---------------|-------|
| `--queries` | 3 | 1 | 8+ | More queries = more searches |
| `--results` | 5 | 2 | 15+ | Fewer = fewer articles to fetch |
| `--chunks` | 3 | 1 | 5+ | More chunks = more context tokens |
| `MaxConcurrentArticleFetches` | 10 | 5 | 20 | Higher = more parallel fetches |
| `MaxConcurrentEmbeddingRequests` | 4 | 2 | 8 | Higher = faster embeddings (may hit rate limits) |
| `EmbeddingBatchSize` | 300 | 100 | 1000 | Larger = fewer API calls, more data per call |
**Start**: Defaults are balanced.
**Adjust if**:
- Slow: Reduce `--results`, `--queries`, or concurrency limits
- Poor quality: Increase `--chunks`, `--results`, `--queries`
- Rate limited: Reduce concurrency limits
- High cost: Use `--short`, reduce `--chunks`, choose cheaper model
---
## Next Steps
- [Performance](../performance.md) - Detailed performance analysis
- [Configuration](../configuration.md) - Adjust settings
- [Usage](../usage.md) - Optimize workflow
---
**Quick Diagnostic Checklist**
```bash
# 1. Check API key
echo $OPENROUTER_API_KEY | head -c 10
# 2. Test SearxNG
curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
# 3. Test OpenRouter
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
https://openrouter.ai/api/v1/models | jq '.data[0].id'
# 4. Run verbose
openquery -v "test" 2>&1 | grep -E "Fetching|Generated|Found"
# 5. Check resource usage while running
htop
# 6. Reduce scope and retry
openquery -q 1 -r 2 -c 1 "simple test"
```
---
`docs/usage.md` (new file, 483 lines)
# Usage Guide
Complete reference for using the OpenQuery command-line interface.
## 📋 Table of Contents
1. [Basic Usage](#basic-usage)
2. [Command Reference](#command-reference)
3. [Examples](#examples)
4. [Output Format](#output-format)
5. [Tips and Tricks](#tips-and-tricks)
## Basic Usage
### Simplest Form
```bash
openquery "your question here"
```
That's it! OpenQuery will:
1. Generate search queries
2. Search the web
3. Extract relevant content
4. Stream an answer with sources
### Common Pattern
```bash
openquery [OPTIONS] "your question"
```
Quotes around the question are recommended to preserve spaces.
## Command Reference
### Main Command
#### `openquery [options] <question>`
Ask a question and get an AI-powered answer with citations.
**Arguments**:
- `question` (positional, one or more words) - The question to ask
**Options**:
| Option | Aliases | Type | Default | Description |
|--------|---------|------|---------|-------------|
| `--chunks` | `-c` | int | 3 (from config) | Number of top relevant content chunks to include in context |
| `--results` | `-r` | int | 5 (from config) | Number of search results to fetch per generated query |
| `--queries` | `-q` | int | 3 (from config) | Number of search queries to generate from your question |
| `--short` | `-s` | bool | false | Request a concise, to-the-point answer |
| `--long` | `-l` | bool | false | Request a detailed, comprehensive answer |
| `--verbose` | `-v` | bool | false | Show detailed progress information and debug output |
**Behavior**:
- Omitting both `--short` and `--long` yields a balanced, medium-length answer
- If both `--short` and `--long` are specified, `--long` takes precedence
- Options override configuration file defaults
#### `openquery configure [options]`
Configure OpenQuery settings (API key, model, defaults).
**Options**:
| Option | Type | Description |
|--------|------|-------------|
| `--interactive` / `-i` | bool | Launch interactive configuration wizard |
| `--key` | string | Set the OpenRouter API key |
| `--model` | string | Set the default model |
| `--queries` | int? | Set default number of queries |
| `--chunks` | int? | Set default number of chunks |
| `--results` | int? | Set default number of results |
**Examples**:
```bash
# Interactive wizard
openquery configure -i
# Set just the API key
openquery configure --key "sk-or-..."
# Set multiple defaults non-interactively
openquery configure --model "deepseek/deepseek-v3.2" --queries 5 --chunks 4
```
**Note**: Options with `?` are nullable; only provided values are updated.
## Examples
### Everyday Queries
**Simple factual question**:
```bash
openquery "What is the speed of light?"
```
**Multi-word question**:
```bash
openquery "How do solar panels work?"
```
**Question with special characters**:
```bash
openquery "What's the weather in New York?"
```
### Customizing Output
**Get a quick answer**:
```bash
openquery -s "Who is the CEO of Tesla?"
```
Output: "Elon Musk is the CEO of Tesla." (minimal explanation)
**Get detailed analysis**:
```bash
openquery -l "Explain how nuclear fusion works"
```
Output: Multi-paragraph detailed explanation with scientific details
**See everything**:
```bash
openquery -v "What is machine learning?"
```
Output: Shows all progress messages alongside the answer
### Adjusting Search Depth
**Minimal search** (fast, cheap):
```bash
openquery -q 1 -r 2 -c 1 "What time is it in London?"
```
- 1 generated query
- 2 results per query
- 1 context chunk
**Thorough research** (slow, comprehensive):
```bash
openquery -q 8 -r 15 -c 5 "History and applications of cryptography"
```
- 8 diverse queries
- 15 results each
- 5 top chunks
**Balanced (recommended defaults)**:
```bash
openquery "Latest advancements in CRISPR technology"
```
- 3 queries
- 5 results each
- 3 top chunks
### Combining Options
**Verbose custom search**:
```bash
openquery -v -q 5 -r 10 -c 4 "What are the ethical implications of AI?"
```
**Short answer with more context**:
```bash
openquery -s -c 5 "Python vs JavaScript for web development"
```
**Long answer, lots of research**:
```bash
openquery -l -q 10 -r 20 -c 6 "Complete guide to quantum computing"
```
### Practical Use Cases
**News and Current Events**:
```bash
openquery "Latest developments in the Ukraine conflict"
```
**Technical Questions**:
```bash
openquery "How to set up a PostgreSQL replication cluster"
```
**Health Information** (verify with a doctor!):
```bash
openquery "What are the symptoms of vitamin D deficiency?"
```
**Cooking**:
```bash
openquery "How to make authentic Italian pizza dough"
```
**Travel**:
```bash
openquery "Best things to do in Tokyo in spring"
```
**Programming**:
```bash
openquery "Rust vs Go for backend development in 2025"
```
### Configuration Examples
**Set up for the first time**:
```bash
openquery configure -i
# Follow prompts to enter API key, choose model, set defaults
```
**Switch to a different model**:
```bash
openquery configure --model "google/gemini-3-flash-preview"
```
**Update default number of queries**:
```bash
openquery configure --queries 5
```
**Set cost-effective defaults**:
```bash
openquery configure --model "qwen/qwen3.5-flash-02-23" --queries 2 --chunks 2 --results 3
```
**Check your configuration**:
```bash
cat ~/.config/openquery/config
```
## Output Format
### Standard Output (Streaming)
The answer streams in real-time, character by character, like this:
```
⠋ Generating search queries... (spinner with status)
⠹ Searching web...
⠸ Fetching articles...
⠼ Processing embeddings...
⠴ Generating answer...
Assistant: Quantum entanglement is a phenomenon where pairs or groups of
particles interact in ways such that the quantum state of each particle
cannot be described independently of the others, even when separated by
large distances.
[Source 1: Understanding Quantum Mechanics](https://example.com/quantum)
[Source 2: Quantum Physics Overview](https://example.com/physics)
```
### Verbose Mode Output (`-v`)
When `--verbose` is enabled, you see detailed progress:
```
[Generating 3 search queries based on your question...]
[Generated queries:
1. quantum entanglement definition
2. how quantum entanglement works
3. quantum entanglement Bell's theorem
]
[Searching web for 'quantum entanglement definition'...]
[Searching web for 'how quantum entanglement works'...]
[Searching web for 'quantum entanglement Bell's theorem'...]
[Fetched 15 search results total]
[Fetching article 1/12: physicsworld.com]
[Fetching article 2/12: nature.com]
...
[Fetching article 12/12: scientificamerican.com]
[Extracted 48 content chunks]
[Generating embeddings: batch 1/4]
[Generating embeddings: batch 2/4]
[Generating embeddings: batch 3/4]
[Generating embeddings: batch 4/4]
[Ranked chunks by relevance]
[Found top 3 most relevant chunks overall. Generating answer...]
Assistant: Quantum entanglement is a fundamental phenomenon in quantum
mechanics where...
```
### Source Citations
Sources are formatted as markdown links in the answer:
```
[Source 1: Article Title](https://example.com/article)
```
These appear inline where the AI references that source. Multiple sources can be cited in a single paragraph.
### Error Output
Errors are written to stderr and exit with non-zero status:
```
[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i'.
```
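Because the answer streams to stdout while diagnostics go to stderr, a small wrapper can keep the two apart and only surface errors on failure. A minimal sketch (the helper name and temp-file handling are illustrative, not part of OpenQuery):

```bash
# Hypothetical wrapper: run a command, pass stdout through as the answer,
# and show the captured stderr only if the command fails.
capture_answer() {
  local errlog status
  errlog=$(mktemp)
  "$@" 2>"$errlog"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed (exit $status):" >&2
    cat "$errlog" >&2
  fi
  rm -f "$errlog"
  return "$status"
}

# Usage: capture_answer openquery "What is Docker?"
```

This works with any CLI, so you can test it with ordinary shell commands before pointing it at `openquery`.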
## Tips and Tricks
### Speed Tips
1. **Reduce concurrency limits** (edit `SearchTool.cs` if constantly rate-limited)
2. **Reduce `--results`** - fewer articles to fetch and process
3. **Reduce `--queries`** - fewer parallel searches
4. **Use local SearxNG** - minimize network latency to search backend
5. **Cache results** - future enhancement could add caching
### Quality Tips
1. **Increase `--chunks`** to 4-5 for complex topics
2. **Increase `--queries`** to 5-8 for broad exploration
3. **Use `--long`** for deep topics that need elaboration
4. **Check `-v` output** to see which sources were selected
5. **Try different models** - some are better at synthesis, others at facts
### Cost Tips
1. **Use `qwen/qwen3.5-flash-02-23`** - cheapest good model
2. **Reduce `--chunks` and `--results`** - fewer tokens in context
3. **Use `--short`** - shorter answers use fewer completion tokens
4. **Monitor usage** at [openrouter.ai](https://openrouter.ai) dashboard
### Workflow Tips
**Iterative deepening**:
```bash
# Start broad
openquery -v "machine learning"
# Identify subtopics from answer, then dive deeper
openquery "What is transformer architecture in LLMs?"
```
**Compare answers**:
```bash
# Same question with different models
OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23" openquery "question"
OPENROUTER_MODEL="google/gemini-3-flash-preview" openquery "question"
```
**Save answers**:
```bash
openquery "What is Docker?" > answer.md
# answer.md will contain the streamed output (including spinner chars, so filter):
openquery "What is Docker?" 2>/dev/null | sed 's/.\x08//g' > clean-answer.md
```
### Shell Aliases and Functions
Add to `~/.bashrc` or `~/.zshrc`:
```bash
# Short alias
alias oq='openquery'
# With common options
alias oql='openquery -l -q 5 -r 10' # long, thorough
alias oqs='openquery -s' # short
alias oqv='openquery -v' # verbose
# Function to save output cleanly
oqsave() {
openquery "$@" 2>/dev/null | sed 's/.\x08//g' > "answer-$(date +%Y%m%d-%H%M%S).md"
}
```
### Scripting
```bash
#!/bin/bash
# batch-questions.sh
while IFS= read -r question; do
echo "## $question" >> research.md
echo "" >> research.md
openquery -l "$question" 2>/dev/null | sed 's/.\x08//g' >> research.md
echo "" >> research.md
done < questions.txt
```
### Chaining with Other Tools
Pipe to `jq` (if you modify to output JSON):
```bash
# Future: openquery --json "question" | jq '.answer'
```
Pipe to `pbcopy` (macOS) or `xclip` (Linux):
```bash
openquery "quick fact" 2>/dev/null | sed 's/.\x08//g' | pbcopy
```
Filter sources:
```bash
openquery "topic" 2>/dev/null | sed 's/.\x08//g' | grep -E '^\[Source'
```
## Keyboard Interrupts
- **Ctrl+C** during processing: Cancels current operation, exits gracefully
- **Ctrl+C** during streaming answer: Stops streaming, shows partial answer
- **Ctrl+Z** (suspend): Not recommended; may leave background tasks running
OpenQuery uses proper cancellation tokens to clean up resources on interrupt.
## Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success - answer was generated |
| 1 | Error - see stderr message |
| 2 | Configuration error (missing API key) |
You can check the exit code in shell scripts:
```bash
openquery "question"
if [ $? -eq 0 ]; then
echo "Success!"
else
echo "Failed"
fi
```
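The table above distinguishes configuration errors (2) from general errors (1), so a script can branch on the specific code. A sketch (the helper name and messages are illustrative):

```bash
# Hypothetical wrapper that branches on the exit codes documented above.
run_and_report() {
  "$@"
  case $? in
    0) echo "answer generated" ;;
    2) echo "configuration error: check your API key" >&2; return 2 ;;
    *) echo "runtime error: see stderr for details" >&2; return 1 ;;
  esac
}

# Usage: run_and_report openquery "question"
```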
## Limitations and Workarounds
### Question Length
Very long questions (over ~2000 characters) may exceed the LLM's context window or run into token limits.
**Workaround**: Keep questions concise; discuss complex multi-part questions separately.
### Answer Length Limits
The LLM may hit `max_tokens` limits for very complex questions.
**Workaround**: Use the `--long` flag (which already maximizes the allowed tokens) or break the question into sub-questions.
### Rate Limiting
OpenRouter may rate limit if you send too many requests too quickly.
**Symptoms**: 429 errors, occasional timeouts.
**Workaround**: The built-in retry (Polly) handles this automatically. For persistent issues:
- Reduce concurrency (edit code)
- Add delays between queries
- Upgrade OpenRouter plan
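For scripted batch runs where the built-in Polly retry is not enough, a shell-level backoff wrapper can space requests out between invocations. A minimal sketch (the attempt count and delays are illustrative, not OpenQuery defaults):

```bash
# Hypothetical exponential-backoff wrapper: retry a command a few times,
# doubling the sleep between attempts.
with_backoff() {
  local max_attempts=3 delay=2 attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage: with_backoff openquery "question"
```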
### SearxNG Timeouts
Large SearxNG responses or slow targets may timeout.
**Workaround**: Reduce `--results` or check the SearxNG logs. There is nothing to change on the OpenQuery side (the HTTP client timeout defaults to ~30s).
---
## Next Steps
- [Architecture](architecture.md) - Understand how OpenQuery works under the hood
- [Configuration](configuration.md) - Fine-tune your setup
- [Troubleshooting](troubleshooting.md) - Solve common problems
---
**Quick Reference Card**
```bash
# Basic
openquery "question"
# Quick fact
openquery -s "question"
# Deep research
openquery -l -q 5 -r 10 -c 4 "question"
# See progress
openquery -v "question"
# Configure
openquery configure -i
# Check config
cat ~/.config/openquery/config
```