diff --git a/README.md b/README.md
new file mode 100644
index 0000000..829c68e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,196 @@
+# OpenQuery
+
+**AI-powered search and answer system** that finds accurate, well-sourced answers to your questions by searching the web, extracting relevant content, and synthesizing intelligent responses.
+
+![.NET](https://img.shields.io/badge/.NET-10.0-blue)
+![AOT](https://img.shields.io/badge/AOT-Compiled-green)
+![License](https://img.shields.io/badge/license-MIT-green)
+
+## ✨ Features
+
+- πŸ€– **Smart Query Generation** - Automatically creates multiple diverse search queries from your question
+- ⚑ **Parallel Processing** - Fast concurrent searches, article fetching, and embedding generation
+- 🎯 **Semantic Search** - Uses vector embeddings to find the most relevant information
+- πŸ“š **Clean Article Extraction** - Intelligently extracts article content using SmartReader
+- πŸ”„ **Streaming Responses** - Watch the AI answer generate in real time
+- βš™οΈ **Fully Configurable** - Control queries, results, and context chunks
+- πŸ›‘οΈ **Production Ready** - Built with rate limiting, retries, and error handling
+
+## πŸš€ Quick Start
+
+### 1. Prerequisites
+
+- A **SearxNG** instance (Docker recommended):
+  ```bash
+  docker run -d --name searxng -p 8002:8080 searxng/searxng:latest
+  ```
+
+- An **OpenRouter API key** from [openrouter.ai](https://openrouter.ai)
+
+### 2. Installation
+
+```bash
+# Clone and build
+git clone
+cd OpenQuery
+chmod +x install.sh
+./install.sh
+
+# Or build manually
+dotnet publish -c Release -r linux-x64 --self-contained true /p:PublishAot=true
+```
+
+### 3. Configuration
+
+```bash
+# Interactive setup
+openquery configure -i
+
+# Or set environment variables
+export OPENROUTER_API_KEY="sk-or-..."
+export SEARXNG_URL="http://localhost:8002" # default
+```
+
+### 4. Ask a Question
+
+```bash
+openquery "What is quantum entanglement and how does it work?"
+```
+
+That's it! The system will:
+1. 
Generate 3 search queries (configurable) +2. Search the web via SearxNG +3. Extract and chunk relevant articles +4. Rank content by semantic relevance +5. Stream a comprehensive answer with citations + +## πŸ“– Usage Examples + +```bash +# Concise answer +openquery -s "Who won the 2024 US presidential election?" + +# Detailed research +openquery -l -q 5 -r 10 "Explain quantum computing and its applications" + +# See everything +openquery -v "What are the health benefits of meditation?" + +# Customize +openquery -c 5 -r 8 "Current state of SpaceX Starship development" +``` + +## πŸ”§ Options + +``` +-c, --chunks N Number of top context chunks (default: 3) +-r, --results N Search results per query (default: 5) +-q, --queries N Number of search queries to generate (default: 3) +-s, --short Give a concise answer +-l, --long Give a detailed answer +-v, --verbose Show detailed progress +``` + +## 🌐 Supported Models + +OpenQuery works with any OpenRouter model. Popular choices: + +- `qwen/qwen3.5-flash-02-23` (default, fast & affordable) +- `google/gemini-3-flash-preview` +- `deepseek/deepseek-v3.2` +- `moonshotai/kimi-k2.5` + +Configure your preferred model: +```bash +openquery configure --model "google/gemini-3-flash-preview" +``` + +## πŸ“ Project Structure + +``` +OpenQuery/ +β”œβ”€β”€ README.md # This file +β”œβ”€β”€ docs/ # Detailed documentation +β”‚ β”œβ”€β”€ installation.md +β”‚ β”œβ”€β”€ configuration.md +β”‚ β”œβ”€β”€ usage.md +β”‚ β”œβ”€β”€ architecture.md +β”‚ β”œβ”€β”€ components/ +β”‚ └── troubleshooting.md +β”œβ”€β”€ Program.cs # CLI entry point +β”œβ”€β”€ OpenQuery.cs # Main application logic +β”œβ”€β”€ Services/ # Business logic services +β”œβ”€β”€ Models/ # Data models +β”œβ”€β”€ Tools/ # Search orchestration +└── ConfigManager.cs # Configuration management +``` + +## πŸ—οΈ Architecture + +OpenQuery uses a multi-stage pipeline: + +``` +Query β†’ Multiple Searches β†’ Article Fetching β†’ Embeddings β†’ Ranking β†’ AI Answer +``` + +1. 
**Query Expansion**: LLM generates diverse search queries +2. **Parallel Search**: SearxNG executes all queries simultaneously +3. **Content Extraction**: SmartReader pulls clean article text +4. **Embedding Generation**: Vectorize query and chunks +5. **Semantic Ranking**: Cosine similarity scoring +6. **Answer Synthesis**: Final LLM response with sources + +## πŸ” How It Works + +1. **You ask a question** β†’ OpenQuery generates 3 optimized search queries +2. **Searches the web** β†’ All queries run in parallel via SearxNG +3. **Fetches articles** β†’ Extracts clean content from top results +4. **Splits into chunks** β†’ ~500 character pieces for embedding +5. **Ranks by relevance** β†’ Semantic similarity to your question +6. **Synthesizes answer** β†’ LLM reviews top 3 chunks and responds with citations + +## πŸ› Troubleshooting + +**Common issues** and solutions: + +| Problem | Solution | +|---------|----------| +| "API Key is missing" | Run `openquery configure -i` or set `OPENROUTER_API_KEY` | +| No search results | Check your SearxNG instance is running (`curl http://localhost:8002`) | +| Slow performance | Reduce `--results` or `--queries` count | +| Articles failing to fetch | Some sites block scrapers; try different queries | + +See [docs/troubleshooting.md](docs/troubleshooting.md) for detailed help. + +## πŸ“š Documentation + +- **[Installation Guide](docs/installation.md)** - Build and setup instructions +- **[Configuration](docs/configuration.md)** - All config options and environment variables +- **[Usage Guide](docs/usage.md)** - Complete CLI reference and examples +- **[Architecture](docs/architecture.md)** - System design and patterns +- **[Components](docs/components/)** - Deep dive into each module +- **[Troubleshooting](docs/troubleshooting.md)** - Solve common problems +- **[API Reference](docs/api-reference.md)** - Programmatic interfaces + +## 🀝 Contributing + +Contributions welcome! Please: +1. Fork the repository +2. 
Create a feature branch +3. Make your changes +4. Submit a pull request + +## πŸ“„ License + +MIT License - see LICENSE file for details. + +## πŸ™ Acknowledgments + +- [OpenRouter](https://openrouter.ai) - LLM API aggregation +- [SearxNG](https://searx.space) - Privacy-respecting metasearch +- [SmartReader](https://github.com/kfasten/SmartReader) - Article extraction +- Built with [.NET](https://dotnet.microsoft.com) + +--- + +**Need more details?** Check the comprehensive documentation in the [docs/](docs/) folder. diff --git a/docs/api/cli.md b/docs/api/cli.md new file mode 100644 index 0000000..5e9da38 --- /dev/null +++ b/docs/api/cli.md @@ -0,0 +1,309 @@ +# CLI Reference + +Complete command-line interface reference for OpenQuery. + +## πŸ“‹ Table of Contents + +1. [Command Structure](#command-structure) +2. [Main Command: `openquery`](#main-command-openquery) +3. [Configure Command: `openquery configure`](#configure-command-openquery-configure) +4. [Exit Codes](#exit-codes) +5. [Examples by Use Case](#examples-by-use-case) +6. [Shell Integration](#shell-integration) + +## Command Structure + +OpenQuery uses [System.CommandLine](https://learn.microsoft.com/dotnet/standard/commandline/) for CLI parsing. + +### Syntax +```bash +openquery [GLOBAL-OPTIONS] [COMMAND-OPTIONS] [ARGUMENTS] +``` + +If no command specified, `openquery` (main command) is assumed. + +### Help +```bash +openquery --help +openquery configure --help +``` + +Shows usage, options, examples. + +### Version +```bash +openquery --version # if implemented +``` + +--- + +## Main Command: `openquery` + +Ask a question and get an AI-powered answer. 
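Arguments are split by the shell before System.CommandLine ever sees them, which is worth keeping in mind when a question contains spaces or glob characters. A quick way to see exactly what the parser will receive (POSIX shell; `show_args` is a throwaway helper for illustration, not part of OpenQuery):

```shell
# show_args is an illustrative helper, not part of OpenQuery:
# it prints each argument it receives in brackets, which is
# exactly the argument vector the CLI parser would see.
show_args() { printf '[%s]' "$@"; printf '\n'; }

show_args what is quantum      # -> [what][is][quantum]  (three arguments)
show_args "what is quantum"    # -> [what is quantum]    (one argument)
```

This is why questions containing `?`, `'`, or `*` should be quoted: unquoted, the shell may expand or mangle them before OpenQuery runs.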
+ +### Synopsis +```bash +openquery [OPTIONS] +``` + +### Arguments + +| Name | Arity | Type | Description | +|------|-------|------|-------------| +| `question` | ZeroOrMore | `string[]` | The question to ask (positional, concatenated with spaces) | + +**Notes**: +- `ZeroOrMore` means you can omit the question (shows help) +- Multiple words are combined: `openquery what is quantum` β†’ `"what is quantum"` +- Use quotes for questions with special characters: `openquery "what's the weather?"` + +### Options + +| Option | Aliases | Type | Default | Description | +|--------|---------|------|---------|-------------| +| `--chunks` | `-c` | `int` | `DefaultChunks` (config) | Number of top context chunks to pass to LLM | +| `--results` | `-r` | `int` | `DefaultResults` (config) | Number of search results per query | +| `--queries` | `-q` | `int` | `DefaultQueries` (config) | Number of search queries to generate | +| `--short` | `-s` | `bool` | `false` | Request a concise answer | +| `--long` | `-l` | `bool` | `false` | Request a detailed answer | +| `--verbose` | `-v` | `bool` | `false` | Show detailed progress information | + +**Option Notes**: +- `--short` and `--long` are flags; if both specified, `--long` takes precedence +- Integer options validate as positive numbers (parsed by System.CommandLine) +- Defaults come from config file or hardcoded (3, 5, 3 respectively) + +### Behavior + +1. Loads API key (env `OPENROUTER_API_KEY` or config file) +2. Loads model (env `OPENROUTER_MODEL` or config) +3. Executes workflow: + - Generate queries (if `--queries > 1`) + - Run search pipeline + - Stream final answer +4. Exits with code 0 on success, 1 on error + +### Examples + +```bash +# Basic +openquery "What is the capital of France?" + +# With options +openquery -q 5 -r 10 -c 4 "Explain quantum computing" + +# Short answer +openquery -s "Who won the 2024 election?" + +# Verbose mode +openquery -v "How does photosynthesis work?" 
+ +# Combined +openquery -l -v -q 8 "History of the internet" +``` + +--- + +## Configure Command: `openquery configure` + +Configure OpenQuery settings (API key, model, defaults). + +### Synopsis +```bash +openquery configure [OPTIONS] +``` + +### Options + +| Option | Type | Description | +|--------|------|-------------| +| `--interactive` / `-i` | `bool` | Launch interactive configuration wizard | +| `--key` | `string` | Set OpenRouter API key | +| `--model` | `string` | Set default LLM model | +| `--queries` | `int?` | Set default number of queries | +| `--chunks` | `int?` | Set default number of chunks | +| `--results` | `int?` | Set default number of results | + +**Note**: Nullable options (`int?`) only update if provided. + +### Behavior + +- **Interactive mode** (`-i`): Prompts for each setting with current defaults shown in brackets +- **Non-interactive**: Only updates provided options, leaves others untouched +- Writes to `~/.config/openquery/config` (creates directory if missing) +- Overwrites entire file (not incremental) + +### Interactive Mode Details + +Models presented with numbered menu: + +``` +Available models: +1. qwen/qwen3.5-flash-02-23 +2. qwen/qwen3.5-122b-a10b +3. minimax/minimax-m2.5 +4. google/gemini-3-flash-preview +5. deepseek/deepseek-v3.2 +6. 
moonshotai/kimi-k2.5
+Model [qwen/qwen3.5-flash-02-23]:
+```
+
+- Enter a number (1-6) to select a preset
+- Or enter a custom model string (any OpenRouter model)
+
+### Examples
+
+```bash
+# Interactive wizard
+openquery configure -i
+
+# Set just the API key
+openquery configure --key "sk-or-xxxxxxxxxxxx"
+
+# Set multiple defaults
+openquery configure --model "google/gemini-3-flash-preview" --queries 5 --chunks 4
+
+# Update model only
+openquery configure --model "deepseek/deepseek-v3.2"
+```
+
+---
+
+## Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| `0` | Success - answer generated and streamed |
+| `1` | Error - API key missing, network failure, or exception |
+
+**Usage in scripts**:
+```bash
+openquery "question"
+if [ $? -eq 0 ]; then
+  echo "Success"
+else
+  echo "Failed" >&2
+fi
+```
+
+---
+
+## Examples by Use Case
+
+### Quick Facts
+```bash
+openquery -s "capital of France"
+```
+Fast, concise, minimal tokens.
+
+### Research Paper
+```bash
+openquery -l -q 5 -r 10 -c 4 "quantum entanglement experiments"
+```
+Multiple angles, deep sources, detailed synthesis.
+
+### News & Current Events
+```bash
+openquery -v "latest news about OpenAI"
+```
+See everything: queries, results, and which sources were fetched.
### Troubleshooting
+```bash
+# Reduce scope if errors
+openquery -q 1 -r 2 "test question"
+```
+
+### Save Answer to File
+```bash
+openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md
+```
+
+(Removes spinner characters)
+
+### Batch Processing
+```bash
+# Read one question per line (read -r avoids word-splitting each question)
+while IFS= read -r q; do
+  echo "## $q" >> all-answers.md
+  openquery -s "$q" 2>/dev/null | sed 's/.\x08//g' >> all-answers.md
+  echo "" >> all-answers.md
+done < questions.txt
+```
+
+---
+
+## Shell Integration
+
+### Aliases (add to ~/.bashrc or ~/.zshrc)
+
+```bash
+# Short alias
+alias oq='openquery'
+
+# Presets
+alias oqs='openquery -s'              # short
+alias oql='openquery -l'              # long
+alias oqv='openquery -v'              # verbose
+alias oqr='openquery -q 5 -r 10 -c 4' # research mode
+
+# Config shortcuts
+alias oqcfg='openquery configure -i'
+```
+
+### Functions
+
+```bash
+# Save answer cleanly (removes spinner chars)
+oqsave() {
+  local query="$*"
+  local filename="answer-$(date +%Y%m%d-%H%M%S).md"
+  openquery "$query" 2>/dev/null | sed 's/.\x08//g' > "$filename"
+  echo "Saved to $filename"
+}
+
+# Search and grep results
+oqgrep() {
+  openquery "$1" 2>/dev/null | sed 's/.\x08//g' | grep -i "$2"
+}
+```
+
+### Environment Setup Script
+
+```bash
+# ~/.local/bin/openquery-env.sh
+export OPENROUTER_API_KEY="sk-or-..."
+export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23"
+export SEARXNG_URL="http://localhost:8002"
+```
+
+Source it: `source ~/.local/bin/openquery-env.sh`
+
+---
+
+## Next Steps
+
+- **[Configuration](configuration.md)** - Set up your environment
+- **[Usage](usage.md)** - Learn usage patterns and tips
+- **[Troubleshooting](troubleshooting.md)** - Fix common problems
+
+---
+
+**Quick Reference Card**
+
+```
+# Ask
+openquery "question"
+openquery -s "quick fact"
+openquery -l -q 5 "deep research"
+
+# Configure
+openquery configure -i
+openquery configure --key "..."
+openquery configure --model "..."
+ +# Debug +openquery -v "question" + +# Help +openquery --help +``` diff --git a/docs/api/environment-variables.md b/docs/api/environment-variables.md new file mode 100644 index 0000000..3cb39d2 --- /dev/null +++ b/docs/api/environment-variables.md @@ -0,0 +1,235 @@ +# Environment Variables + +Reference for all environment variables used by OpenQuery. + +## πŸ“‹ Summary + +| Variable | Purpose | Required | Default | Example | +|----------|---------|----------|---------|---------| +| `OPENROUTER_API_KEY` | OpenRouter authentication | **Yes** | (none) | `sk-or-...` | +| `OPENROUTER_MODEL` | Override default LLM model | No | `qwen/qwen3.5-flash-02-23` | `google/gemini-3-flash-preview` | +| `SEARXNG_URL` | SearxNG instance URL | No | `http://localhost:8002` | `https://searx.example.com` | + +## Detailed Reference + +### `OPENROUTER_API_KEY` + +**Purpose**: Your OpenRouter API authentication token. + +**Required**: Yes, unless you have `ApiKey` set in config file. + +**How to Obtain**: +1. Sign up at https://openrouter.ai +2. Go to Dashboard β†’ API Keys +3. Copy your key (starts with `sk-or-`) + +**Priority**: Overrides config file `ApiKey`. + +**Setting**: + +```bash +# Bash/Zsh +export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + +# Fish +set -x OPENROUTER_API_KEY "sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + +# PowerShell +$env:OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + +# Windows CMD +set OPENROUTER_API_KEY=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +``` + +**Security**: +- Never commit API key to version control +- Don't share key publicly +- Use environment variables or config file with restrictive permissions (600) +- Rotate key if accidentally exposed + +**Validation**: OpenQuery checks if key is empty string and exits with error if missing: + +``` +[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up. 
+``` + +### `OPENROUTER_MODEL` + +**Purpose**: Override the default LLM model used for both query generation and final answer. + +**Required**: No. + +**Default**: `qwen/qwen3.5-flash-02-23` + +**Available Models** (from OpenRouter): + +| Model | Provider | Context | Cost (Input/Output per 1M tokens) | +|-------|----------|---------|-----------------------------------| +| `qwen/qwen3.5-flash-02-23` | Alibaba | 200K | \$0.10 / \$0.20 | +| `qwen/qwen3.5-122b-a10b` | Alibaba | 200K | ~\$0.20 / ~\$0.40 | +| `minimax/minimax-m2.5` | MiniMax | 200K | ~\$0.20 / ~\$0.40 | +| `google/gemini-3-flash-preview` | Google | 1M | ~\$0.10 / ~\$0.40 | +| `deepseek/deepseek-v3.2` | DeepSeek | 200K | ~\$0.10 / ~\$0.30 | +| `moonshotai/kimi-k2.5` | Moonshot AI | 200K | ~\$0.10 / ~\$0.30 | + +(See OpenRouter for current pricing.) + +**Setting**: + +```bash +export OPENROUTER_MODEL="google/gemini-3-flash-preview" +``` + +**Interactive Config Models**: The `configure -i` wizard shows only these 6 models for convenience, but you can set any OpenRouter model via environment variable or non-interactive configure. + +**Note**: Different models have different: +- Speed (Flash models faster) +- Cost (check pricing) +- Quality (may vary by task) +- Context window size (Gemini 3 Flash has 1M tokens, others ~200K) + +### `SEARXNG_URL` + +**Purpose**: URL of the SearxNG metasearch instance. + +**Required**: No. + +**Default**: `http://localhost:8002` + +**Format**: Must include protocol (`http://` or `https://`) and host:port. 
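A small guard for this format requirement can catch misconfiguration before the first request fails. A sketch in POSIX shell (`check_searxng_url` is an illustrative helper; OpenQuery itself does not ship this check):

```shell
# check_searxng_url: reject values that lack an explicit protocol.
# Illustrative helper only - OpenQuery does not perform this validation itself.
check_searxng_url() {
  case "$1" in
    http://*|https://*) echo "ok" ;;
    *) echo "error: SEARXNG_URL must start with http:// or https://" >&2; return 1 ;;
  esac
}

check_searxng_url "http://localhost:8002"    # -> ok
check_searxng_url "localhost:8002" || true   # -> error message on stderr
```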
**Setting**:
+
+```bash
+# Local Docker instance
+export SEARXNG_URL="http://localhost:8002"
+
+# Remote instance with HTTPS
+export SEARXNG_URL="https://searx.example.com"
+
+# Custom port
+export SEARXNG_URL="http://localhost:8080"
+```
+
+**Finding a Public Instance**:
+- Visit https://searx.space for a list of public instances
+- Choose one with HTTPS and low latency
+- Note: Public instances may have rate limits or require attribution
+
+**Priority**: Overrides the built-in default. There is currently no config-file setting for the SearxNG URL (environment variable only); one could be added in the future.
+
+**Test Your Instance**:
+```bash
+curl "$SEARXNG_URL/search?q=test&format=json" | head
+```
+
+Expected: JSON with `"results": [...]`.
+
+---
+
+## Configuration Priority Recap
+
+When OpenQuery needs a value:
+
+1. **Command-line option** (`--model`, `--key` on `configure`) - highest
+2. **Environment variable** (`OPENROUTER_MODEL`, `OPENROUTER_API_KEY`, `SEARXNG_URL`)
+3. **Configuration file** (`~/.config/openquery/config`: `Model`, `ApiKey`)
+4. **Hard-coded default** (only for model)
+
+**Example**:
+```bash
+# Config file: Model=qwen/qwen3.5-flash-02-23
+export OPENROUTER_MODEL="deepseek/deepseek-v3.2"
+openquery "question"
+# Uses deepseek/deepseek-v3.2: the environment variable overrides the config file
+```
+
+---
+
+## Troubleshooting Environment Variables
+
+### Variable Not Taking Effect
+
+**Symptom**: `openquery` still uses the old value after `export`.
+
+**Causes**:
+- Exported in a different shell session
+- Exported after running `openquery`
+- Shell profile not reloaded
+
+**Check**:
+```bash
+echo $OPENROUTER_API_KEY
+# Should print the key (or blank if unset)
+```
+
+**Fix**:
+```bash
+# Export in current session
+export OPENROUTER_API_KEY="sk-or-..."
+ +# Or add to ~/.bashrc / ~/.zshrc and restart terminal +``` + +### Special Characters in Values + +If your API key contains special characters (`$`, `!`, etc.), quote properly: + +```bash +export OPENROUTER_API_KEY='sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' +# Single quotes prevent shell expansion +``` + +### Variable Name Typos + +`OPENROUTER_API_KEY` is all caps with underscores. `openrouter_api_key` (lowercase) won't work. + +**Check spelling**: +```bash +env | grep -i openrouter +``` + +### Windows Environment Variables + +On Windows, environment variables are set per-session or user-level: + +**PowerShell** (current session): +```powershell +$env:OPENROUTER_API_KEY="sk-or-..." +``` + +**Persistent** (PowerShell): +```powershell +[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User") +``` + +**CMD**: +```cmd +set OPENROUTER_API_KEY=sk-or-... +``` + +**System Properties** β†’ Advanced β†’ Environment Variables (GUI) + +--- + +## Next Steps + +- **[Configuration File](../configuration.md)** - Persistent configuration +- **[Usage Guide](../usage.md)** - How to use these variables +- **[Troubleshooting](../troubleshooting.md)** - Diagnose environment issues + +--- + +**Quick Reference** + +```bash +# Required +export OPENROUTER_API_KEY="sk-or-..." + +# Optional (override defaults) +export OPENROUTER_MODEL="google/gemini-3-flash-preview" +export SEARXNG_URL="https://searx.example.com" + +# Run +openquery "your question" +``` diff --git a/docs/api/programmatic.md b/docs/api/programmatic.md new file mode 100644 index 0000000..f40ddac --- /dev/null +++ b/docs/api/programmatic.md @@ -0,0 +1,508 @@ +# Programmatic API Reference + +How to use OpenQuery components programmatically in your own C# code. + +## πŸ“‹ Table of Contents + +1. [Overview](#overview) +2. [Using OpenQueryApp Programmatically](#using-openqueryapp-programmatically) +3. [Using Individual Services](#using-individual-services) +4. 
[Custom Implementations](#custom-implementations) +5. [Thread Safety](#thread-safety) +6. [Error Handling](#error-handling) + +## Overview + +OpenQuery is designed as a library of composable services, not just a CLI tool. You can reference the project (or extract the core classes) and use them in your own applications. + +### Core Interfaces + +Currently, OpenQuery uses concrete classes rather than interfaces. To use programmatically: + +1. Reference the `OpenQuery` project/dll +2. Add `using OpenQuery.Services;` and `using OpenQuery.Tools;` +3. Instantiate dependencies +4. Call methods + +### Dependency Chain + +``` +Your Code + β”œβ”€β”€ OpenRouterClient (LLM API) + β”œβ”€β”€ SearxngClient (Search API) + β”œβ”€β”€ EmbeddingService (requires OpenRouterClient) + └── SearchTool (requires SearxngClient + EmbeddingService) + └── (internally uses ArticleService, ChunkingService, RateLimiter) +``` + +--- + +## Using OpenQueryApp Programmatically + +### Minimal Example + +```csharp +using OpenQuery; +using OpenQuery.Services; +using OpenQuery.Tools; +using OpenQuery.Models; + +// 1. Configure +string apiKey = Environment.GetEnvironmentVariable("OPENROUTER_API_KEY") + ?? throw new InvalidOperationException("API key required"); +string searxngUrl = Environment.GetEnvironmentVariable("SEARXNG_URL") + ?? "http://localhost:8002"; +string model = Environment.GetEnvironmentVariable("OPENROUTER_MODEL") + ?? "qwen/qwen3.5-flash-02-23"; + +// 2. Instantiate services +var openRouterClient = new OpenRouterClient(apiKey); +var searxngClient = new SearxngClient(searxngUrl); +var embeddingService = new EmbeddingService(openRouterClient); +var searchTool = new SearchTool(searxngClient, embeddingService); +var openQuery = new OpenQueryApp(openRouterClient, searchTool, model); + +// 3. Execute +var options = new OpenQueryOptions( + Chunks: 3, + Results: 5, + Queries: 3, + Short: false, + Long: false, + Verbose: false, + Question: "What is quantum entanglement?" 
+);
+
+await openQuery.RunAsync(options);
+```
+
+**Output**: Streams answer to `Console.Out` (hardcoded in `OpenQueryApp`). To capture output, modify `OpenQueryApp` or redirect console.
+
+### Capturing Output
+
+`OpenQueryApp.RunAsync` writes directly to `Console`. To capture:
+
+**Option 1**: Redirect Console (hacky)
+```csharp
+var sw = new StringWriter();
+Console.SetOut(sw);
+await openQuery.RunAsync(options);
+string answer = sw.ToString();
+```
+
+**Option 2**: Modify OpenQueryApp to accept TextWriter (not currently supported)
+
+**Option 3**: Reimplement using OpenQuery components without `OpenQueryApp`
+
+```csharp
+public async Task<string> GetAnswerAsync(string question, OpenQueryOptions options)
+{
+    var sb = new StringBuilder();
+    var reporter = new StatusReporter(options.Verbose);
+
+    // Replicate OpenQueryApp.RunAsync but collect output
+    // ... (copy logic from OpenQuery.cs)
+
+    return sb.ToString();
+}
+```
+
+---
+
+## Using Individual Services
+
+### OpenRouterClient
+
+```csharp
+var client = new OpenRouterClient("your-api-key");
+
+// Non-streaming chat completion
+var request = new ChatCompletionRequest(
+    model: "qwen/qwen3.5-flash-02-23",
+    messages: new List<Message>
+    {
+        new Message("system", "You are a helpful assistant."),
+        new Message("user", "What is 2+2?")
+    }
+);
+
+var response = await client.CompleteAsync(request);
+Console.WriteLine(response.Choices[0].Message.Content);
+
+// Streaming chat completion
+var streamRequest = request with { Stream = true };
+await foreach (var chunk in client.StreamAsync(streamRequest))
+{
+    if (chunk.TextDelta != null)
+        Console.Write(chunk.TextDelta);
+}
+
+// Embeddings
+var embeddingRequest = new EmbeddingRequest(
+    model: "openai/text-embedding-3-small",
+    input: new List<string> { "text 1", "text 2" }
+);
+float[][] embeddings = await client.EmbedAsync(embeddingRequest.Model, embeddingRequest.Input);
+// embeddings[0] is the vector for "text 1"
+```
+
+### SearxngClient
+
+```csharp
+var searxng = new 
SearxngClient("http://localhost:8002");
+
+var results = await searxng.SearchAsync("quantum physics", limit: 5);
+
+foreach (var result in results)
+{
+    Console.WriteLine($"{result.Title}");
+    Console.WriteLine($"{result.Url}");
+    Console.WriteLine($"{result.Content}");
+    Console.WriteLine();
+}
+```
+
+### EmbeddingService
+
+```csharp
+var client = new OpenRouterClient("your-api-key");
+var embeddingService = new EmbeddingService(client); // default model: openai/text-embedding-3-small
+
+// Single embedding
+float[] embedding = await embeddingService.GetEmbeddingAsync("Hello world");
+
+// Batch embeddings (with progress)
+List<string> texts = new() { "text 1", "text 2", "text 3" };
+float[][] embeddings = await embeddingService.GetEmbeddingsAsync(
+    texts,
+    onProgress: msg => Console.WriteLine(msg)
+);
+
+// Cosine similarity
+float similarity = EmbeddingService.CosineSimilarity(embedding1, embedding2);
+```
+
+### ArticleService
+
+```csharp
+var article = await ArticleService.FetchArticleAsync("https://example.com/article");
+Console.WriteLine(article.Title);
+Console.WriteLine(article.TextContent);
+Console.WriteLine($"Readable: {article.IsReadable}");
+```
+
+Note: `Article` type comes from SmartReader library (not OpenQuery-specific).
### ChunkingService
+
+```csharp
+List<string> chunks = ChunkingService.ChunkText("Long article text...");
+
+foreach (var chunk in chunks)
+{
+    Console.WriteLine($"Chunk ({chunk.Length} chars): {chunk.Substring(0, 50)}...");
+}
+```
+
+### SearchTool (Orchestration)
+
+```csharp
+var searxngClient = new SearxngClient("http://localhost:8002");
+var embeddingService = new EmbeddingService(openRouterClient);
+var searchTool = new SearchTool(searxngClient, embeddingService);
+
+string context = await searchTool.ExecuteAsync(
+    originalQuery: "What is quantum entanglement?",
+    generatedQueries: new List<string>
+    {
+        "quantum entanglement definition",
+        "how quantum entanglement works"
+    },
+    maxResults: 5,
+    topChunksLimit: 3,
+    onProgress: msg => Console.WriteLine(msg),
+    verbose: true
+);
+
+Console.WriteLine("Context:");
+Console.WriteLine(context);
+```
+
+Output is a formatted string:
+```
+[Source 1: Title](https://example.com/1)
+Content chunk...
+
+[Source 2: Title](https://example.com/2)
+Content chunk...
+```
+
+---
+
+## Custom Implementations
+
+### Custom Progress Reporter
+
+`SearchTool.ExecuteAsync` accepts `Action<string>? onProgress`.
Provide your own:
+
+```csharp
+public class MyProgressReporter
+{
+    public void Report(string message)
+    {
+        // Log to file
+        File.AppendAllText("log.txt", $"{DateTime.UtcNow}: {message}\n");
+
+        // Update UI
+        myLabel.Text = message;
+
+        // Send to telemetry
+        Telemetry.TrackEvent("OpenQueryProgress", new { message });
+    }
+}
+
+// Usage
+var reporter = new MyProgressReporter();
+await searchTool.ExecuteAsync(..., reporter.Report, verbose: false);
+```
+
+### Custom Chunking Strategy
+
+Extend `ChunkingService` or implement your own:
+
+```csharp
+public static class MyChunkingService
+{
+    public static List<string> ChunkText(string text, int maxSize = 500, int overlap = 50)
+    {
+        // Overlapping chunks for better context retrieval
+        var chunks = new List<string>();
+        int start = 0;
+        while (start < text.Length)
+        {
+            int end = Math.Min(start + maxSize, text.Length);
+            var chunk = text.Substring(start, end - start);
+            chunks.Add(chunk);
+            start += maxSize - overlap; // Slide window
+        }
+        return chunks;
+    }
+}
+```
+
+### Custom Rate Limiter
+
+Implement `IAsyncDisposable` with your own strategy (token bucket, leaky bucket):
+
+```csharp
+public class TokenBucketRateLimiter : IAsyncDisposable
+{
+    private readonly SemaphoreSlim _semaphore;
+    private readonly TimeSpan _refillPeriod;
+    private int _tokens;
+    private readonly int _maxTokens;
+
+    // Implementation details...
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct)
+    {
+        await WaitForTokenAsync(ct);
+        try
+        {
+            return await action();
+        }
+        finally
+        {
+            // Return tokens or replenish bucket
+        }
+    }
+}
+```
+
+---
+
+## Thread Safety
+
+**Thread-Safe Components**:
+- `RateLimiter` - `SemaphoreSlim` is thread-safe
+- `StatusReporter` - Channel is thread-safe
+- Static utility classes (`ChunkingService`) - no state
+
+**Not Thread-Safe** (instances should not be shared across threads):
+- `OpenRouterClient` - wraps `HttpClient` (which is thread-safe but the instance may have state)
+- `SearxngClient` - `HttpClient` (thread-safe but reuse recommendations apply)
+- `EmbeddingService` - has mutable fields (`_rateLimiter`, `_retryPipeline`)
+- `SearchTool` - has mutable `_options`
+
+**Recommendation**: Create new instances per operation or use locks if sharing.
+
+### Example: Parallel Queries
+
+```csharp
+var tasks = questions.Select(async question =>
+{
+    var options = new OpenQueryOptions(..., question: question);
+    var query = new OpenQueryApp(client, searchTool, model);
+    await query.RunAsync(options);
+    // Separate instances per task
+});
+
+await Task.WhenAll(tasks);
+```
+
+**Better**: Create a factory that spawns fresh instances.
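If you drive OpenQuery from the shell rather than in-process, one OS process per question gives the same isolation with no code: separate processes share no `OpenRouterClient` or `SearchTool` instances, so the caveats above do not apply. A sketch (`echo` stands in for `openquery -s` so it runs anywhere; output order is nondeterministic under `-P`):

```shell
# One process per question, up to 3 in parallel. In real use, replace
# `echo "answered: {}"` with `openquery -s "{}"`.
printf '%s\n' "question 1" "question 2" "question 3" \
  | xargs -P 3 -I {} echo "answered: {}"
```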
+ +--- + +## Error Handling + +All public async methods may throw: + +- `HttpRequestException` - network errors, non-2xx responses +- `TaskCanceledException` - timeout or cancellation +- `JsonException` - malformed JSON +- `Argument*Exception` - invalid arguments +- `Exception` - any other error + +### Pattern: Try-Catch + +```csharp +try +{ + var response = await client.CompleteAsync(request); + Console.WriteLine(response.Choices[0].Message.Content); +} +catch (HttpRequestException ex) +{ + Console.Error.WriteLine($"Network error: {ex.Message}"); +} +catch (Exception ex) +{ + Console.Error.WriteLine($"Unexpected error: {ex.Message}"); +} +``` + +### Pattern: Resilience with Polly + +`EmbeddingService` already wraps `client.EmbedAsync` with Polly retry. For other calls, you can add your own: + +```csharp +var retryPolicy = Policy + .Handle() + .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))); + +await retryPolicy.ExecuteAsync(async () => +{ + var response = await client.CompleteAsync(request); + // ... 
+}); +``` + +--- + +## Advanced Usage + +### Streaming Responses to Network + +```csharp +var request = new ChatCompletionRequest(model, messages) { Stream = true }; +var response = await client.StreamAsync(request); + +await foreach (var chunk in response) +{ + if (chunk.TextDelta != null) + { + await networkStream.WriteAsync(Encoding.UTF8.GetBytes(chunk.TextDelta)); + } +} +``` + +### Parallel Embedding Batches with Progress + +```csharp +var texts = Enumerable.Range(0, 1000).Select(i => $"Text {i}").ToList(); + +await embeddingService.GetEmbeddingsAsync(texts, + onProgress: progress => + { + Console.WriteLine(progress); // "[Generating embeddings: batch 5/4]" + }); +``` + +### Custom Embedding Service with Different Model + +```csharp +var client = new OpenRouterClient(apiKey); +var customService = new EmbeddingService(client, "your-embedding-model"); + +float[] embedding = await customService.GetEmbeddingAsync("text"); +``` + +--- + +## Limitations + +### No Interface-based Design + +OpenQuery uses concrete classes. For mocking in tests, you'd need to create wrappers or use tools like JustMock/Moq that can mock non-virtual methods (not recommended). Better: define interfaces like `IOpenRouterClient` and have implementations. + +### Hardcoded Concurrency Settings + +`ParallelProcessingOptions` is instantiated in `SearchTool` with hardcoded defaults. To customize, you'd need to: + +1. Subclass `SearchTool` and override access to `_options` +2. Or modify source to accept `ParallelProcessingOptions` in constructor +3. Or use reflection (hacky) + +Suggested improvement: Add constructor parameter. + +### Single Responsibility Blur + +`OpenQueryApp` does query generation + pipeline + streaming. Could split: +- `IQueryGenerator` (for expanding queries) +- `IPipelineExecutor` (for search tool) +- `IAnswerStreamer` (for final LLM streaming) + +Currently, `OpenQueryApp` is the facade. 
+
+---
+
+## Next Steps
+
+- **[Components](../components/overview.md)** - Understand architecture
+- **[CLI Reference](../api/cli.md)** - CLI that uses these APIs
+- **[Source Code](../)** - Read implementation details
+
+---
+
+**Code Snippet: Full Programmatic Flow**
+
+```csharp
+using OpenQuery.Services;
+using OpenQuery.Tools;
+using OpenQuery.Models;
+
+async Task<string> Research(string question)
+{
+    var apiKey = GetApiKey(); // your method
+    var client = new OpenRouterClient(apiKey);
+    var searxng = new SearxngClient("http://localhost:8002");
+    var embeddings = new EmbeddingService(client);
+    var search = new SearchTool(searxng, embeddings);
+    var app = new OpenQueryApp(client, search, "qwen/qwen3.5-flash-02-23");
+
+    var options = new OpenQueryOptions(
+        Chunks: 3,
+        Results: 5,
+        Queries: 3,
+        Short: false,
+        Long: false,
+        Verbose: false,
+        Question: question
+    );
+
+    // Capture output by redirecting Console or modifying OpenQueryApp
+    await app.RunAsync(options);
+    return "streamed to console"; // would need custom capture
+}
+```
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..215c9a2
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,682 @@
+# Architecture
+
+Deep dive into OpenQuery's system design, architectural patterns, and data flow.
+
+## 📋 Table of Contents
+
+1. [System Overview](#system-overview)
+2. [Architectural Patterns](#architectural-patterns)
+3. [Component Architecture](#component-architecture)
+4. [Data Flow](#data-flow)
+5. [Concurrency Model](#concurrency-model)
+6. [Error Handling & Resilience](#error-handling--resilience)
+7. [Performance Considerations](#performance-considerations)
+8. [Design Decisions](#design-decisions)
+
+## System Overview
+
+OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically-ranked content.
+
+### Core Design Principles
+
+1. 
**Separation of Concerns** - Each component has a single, well-defined responsibility
+2. **Parallel First** - Wherever possible, operations are parallelized for speed
+3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
+4. **Configurable** - Most parameters can be adjusted without code changes
+5. **Observable** - Progress reporting and verbose mode for debugging
+
+### High-Level Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                         User Layer                          │
+│  CLI (System.CommandLine) → OpenQueryApp                    │
+└─────────────────────────────┬───────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────┐
+│                     Orchestration Layer                     │
+│  OpenQueryApp → SearchTool (4-phase pipeline)               │
+└─────────────────────────────┬───────────────────────────────┘
+                              │
+        ┌─────────────────────┼─────────────────────┐
+        │                     │                     │
+        ▼                     ▼                     ▼
+┌──────────────┐    ┌──────────────────┐    ┌────────────────┐
+│ Search Layer │    │ Processing Layer │    │    AI Layer    │
+│              │    │                  │    │                │
+│ SearxngClient│    │ ArticleService   │    │OpenRouterClient│
+│              │    │ ChunkingService  │    │                │
+│              │    │ EmbeddingService │    │                │
+└──────────────┘
    └──────────────────┘    └────────────────┘
+```
+
+## Architectural Patterns
+
+### 1. Pipeline Pattern
+
+The main workflow (`SearchTool.ExecuteAsync`) implements a multi-stage pipeline:
+
+```
+Phase 1: ExecuteParallelSearchesAsync
+   ↓ (List<SearxngResult>)
+Phase 2: ExecuteParallelArticleFetchingAsync
+   ↓ (List<Chunk>)
+Phase 3: ExecuteParallelEmbeddingsAsync
+   ↓ ((queryEmbedding, chunkEmbeddings))
+Phase 4: RankAndSelectTopChunks
+   ↓ (List<Chunk> topChunks)
+→ Formatted context string returned
+```
+
+Each phase:
+- Accepts input from the previous phase
+- Processes in parallel where applicable
+- Returns output to the next phase
+- Reports progress via callbacks
+
+### 2. Service Layer Pattern
+
+Services (`Services/` directory) are stateless classes that encapsulate specific operations:
+
+- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
+- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
+- **Extractors**: `ArticleService` (content extraction)
+- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)
+
+All dependencies are explicit (constructor injection), making services easily testable.
+
+### 3. Dependency Injection (Manual)
+
+While not using a DI container, OpenQuery follows DI principles:
+
+```csharp
+// Program.cs: instantiate dependencies with explicit parameters
+var client = new OpenRouterClient(apiKey);
+var searxngClient = new SearxngClient(searxngUrl);
+var embeddingService = new EmbeddingService(client);
+var searchTool = new SearchTool(searxngClient, embeddingService);
+var openQuery = new OpenQueryApp(client, searchTool, model);
+```
+
+Benefits:
+- Clear dependency graph
+- Easy to substitute mocks for testing
+- No magic; visible construction
+
+### 4. 
Observer Pattern (Progress Reporting)
+
+`StatusReporter` and progress callbacks implement the observer pattern:
+
+```csharp
+// SearchTool receives a progress callback
+public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)
+
+// Components invoke the callback at key milestones
+onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");
+
+// Caller (OpenQueryApp) provides reporter.UpdateStatus as the callback
+_searchTool.ExecuteAsync(..., progress => reporter.UpdateStatus(progress), ...);
+```
+
+### 5. Resilience Patterns (Polly)
+
+`EmbeddingService` uses Polly's retry policy:
+
+```csharp
+_retryPipeline = new ResiliencePipelineBuilder()
+    .AddRetry(new RetryStrategyOptions
+    {
+        MaxRetryAttempts = 3,
+        Delay = TimeSpan.FromSeconds(1),
+        BackoffType = DelayBackoffType.Exponential,
+        ShouldHandle = new PredicateBuilder()
+            .Handle<HttpRequestException>()
+    })
+    .Build();
+```
+
+This automatically retries failed embedding requests with exponential backoff.
+
+### 6. Producer-Consumer Pattern (Channel-based)
+
+`StatusReporter` uses `System.Threading.Channels.Channel` for asynchronous progress updates:
+
+- Producer: `UpdateStatus()` writes messages to the channel
+- Consumer: Background task `ProcessStatusUpdatesAsync()` reads and displays them
+- Benefit: No blocking between progress generation and display
+
+### 7. Disposable Pattern
+
+Components that hold unmanaged resources implement `IDisposable` or `IAsyncDisposable`:
+
+- `StatusReporter` - stops the background spinner task
+- `RateLimiter` - disposes its semaphore
+
+Used via `using` statements for deterministic cleanup. 
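+The channel-based flow in pattern 6 can be sketched as follows — a simplified stand-in for `StatusReporter`, not its actual implementation:
+
+```csharp
+using System.Threading.Channels;
+
+// Producers call UpdateStatus from any thread; a single background
+// consumer drains the channel and writes to the console.
+public sealed class MiniStatusReporter : IAsyncDisposable
+{
+    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
+    private readonly Task _consumer;
+
+    public MiniStatusReporter()
+    {
+        _consumer = Task.Run(async () =>
+        {
+            await foreach (var message in _channel.Reader.ReadAllAsync())
+                Console.WriteLine(message);
+        });
+    }
+
+    // Producer side: non-blocking, fire-and-forget write.
+    public void UpdateStatus(string message) => _channel.Writer.TryWrite(message);
+
+    public async ValueTask DisposeAsync()
+    {
+        _channel.Writer.Complete(); // signal: no more messages
+        await _consumer;            // let the consumer finish draining
+    }
+}
+```
+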
+
+## Component Architecture
+
+### OpenQueryApp (OpenQuery.cs)
+
+**Role**: Main orchestrator; coordinates the entire workflow
+
+**Responsibilities**:
+- Parse CLI options into `OpenQueryOptions`
+- Load configuration and resolve API keys/models
+- Optionally generate expanded search queries via LLM
+- Invoke `SearchTool` with progress callbacks
+- Stream final answer from LLM
+
+**Key Methods**:
+- `RunAsync(OpenQueryOptions)` - Main entry point
+
+**Interactions**:
+- Instantiates `OpenRouterClient` (for both query gen and final answer)
+- Instantiates `SearxngClient` (passed to `SearchTool`)
+- Instantiates `EmbeddingService` (passed to `SearchTool`)
+- Instantiates `SearchTool` (orchestration)
+- `StatusReporter` for UI updates
+
+### SearchTool (Tools/SearchTool.cs)
+
+**Role**: Core search-retrieve-rank pipeline orchestrator
+
+**Responsibilities**:
+- Execute the 4-phase pipeline (search → fetch → embed → rank)
+- Manage concurrency limits (via semaphores)
+- Coordinate parallel operations
+- Generate the context string for the final answer
+
+**Interactions**:
+- Uses `SearxngClient` for Phase 1
+- Uses `ArticleService` + `ChunkingService` for Phase 2
+- Uses `EmbeddingService` for Phase 3
+- Has no external UI dependency (pure logic)
+
+**Parallelization Strategy**:
+- **Phase 1**: `Task.WhenAll` on search tasks (unbounded but limited by the SearxNG instance)
+- **Phase 2**: Semaphore (max 10 concurrent fetches)
+- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)
+
+### EmbeddingService (Services/EmbeddingService.cs)
+
+**Role**: Generate vector embeddings with batching, rate limiting, and retries
+
+**Responsibilities**:
+- Batch embedding requests (default: 300 per batch)
+- Parallelize batches (default: 4 concurrent)
+- Apply rate limiting (via `RateLimiter`)
+- Retry failed requests (Polly)
+- Calculate cosine similarity
+
+**Key Methods**:
+- `GetEmbeddingsAsync(List<string> texts, ...)` - batch with progress
+- `GetEmbeddingAsync(string 
text)` - single embedding +- `CosineSimilarity(float[], float[])` - static vector math + +**Design Notes**: +- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint +- Batches of 300 reduce API overhead +- Polly retry handles transient failures (429, 500, network blips) + +### OpenRouterClient (Services/OpenRouterClient.cs) + +**Role**: HTTP client for OpenRouter API (completions + embeddings) + +**Responsibilities**: +- Serialize requests to JSON (source-generated) +- Send HTTP with authorization header +- Stream responses for chat completions (IAsyncEnumerable) +- Return full responses for non-streaming +- Throw on non-2xx status codes + +**Endpoints**: +- POST `/chat/completions` (stream and non-stream) +- POST `/embeddings` + +**Configuration**: +- Base URL: `https://openrouter.ai/api/v1` +- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json` + +**Design**: +- Low-level client; no retry logic (retry is in `EmbeddingService`) +- Thin wrapper around `HttpClient` +- Could be replaced with `HttpClientFactory` in larger apps + +### SearxngClient (Services/SearxngClient.cs) + +**Role**: HTTP client for SearxNG metasearch + +**Responsibilities**: +- Construct search URL with query param +- GET request and deserialize JSON +- Limit results (`.Take(limit)`) +- Return empty list on failure (no exceptions) + +**Endpoint**: `GET /search?q={query}&format=json` + +**Design**: +- Very simple; no retry (failures are acceptable, OpenQuery continues with other queries) +- `DistinctBy(r => r.Url)` deduplication happens upstream + +### ArticleService (Services/ArticleService.cs) + +**Role**: Extract clean article content from URLs + +**Responsibilities**: +- Call `SmartReader.ParseArticleAsync(url)` +- Return `Article` object with `Title`, `TextContent`, `IsReadable` + +**Design**: +- Single responsibility: extraction only (no fetching, no chunking) +- SmartReader handles all complexity (HTML parsing, boilerplate removal) +- Exceptions 
propagate to `SearchTool` (handled there)
+
+### ChunkingService (Services/ChunkingService.cs)
+
+**Role**: Split long text into 500-char chunks at natural boundaries
+
+**Algorithm**:
+1. Start at index 0
+2. Take up to 500 chars
+3. If not at the end, backtrack to the last space/newline/period
+4. Add the chunk, advance the start index
+5. Repeat until done
+
+**Design**:
+- Static class (stateless utility)
+- No dependencies
+- Pure function (input text → output chunks)
+
+### RateLimiter (Services/RateLimiter.cs)
+
+**Role**: Limit concurrent operations via semaphore
+
+**Responsibilities**:
+- Wrap actions with semaphore acquisition
+- Support both sync and async actions
+- Implement `IAsyncDisposable` for cleanup
+
+**Usage Pattern**:
+```csharp
+await _rateLimiter.ExecuteAsync(async () =>
+{
+    // operation limited by semaphore
+    return await SomeApiCall();
+});
+```
+
+**Design**:
+- Generic wrapper (can return `TResult`)
+- `SemaphoreSlim` initialized at construction
+- Used in `EmbeddingService` for parallel embedding batches
+
+### StatusReporter (Services/StatusReporter.cs)
+
+**Role**: Show real-time progress with a spinner or verbose mode
+
+**Responsibilities**:
+- Maintain spinner animation (background task)
+- Receive status updates via channel
+- Display updates with appropriate formatting
+- Stop the spinner on completion
+
+**Features**:
+- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille chars
+- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
+- **Thread-safe**: The channel is safe for concurrent writes
+- **Non-blocking**: The background spinner doesn't block updates
+
+**Design**:
+- `Channel<string>` for asynchronous producer-consumer
+- Background task (`_statusProcessor`) reads from the channel
+- Spinner runs on its own task with a 100ms delay per frame
+- `IDisposable` ensures proper cleanup
+
+### ConfigManager (ConfigManager.cs)
+
+**Role**: Load and save configuration from/to file
+
+**Responsibilities**:
+- Get config path (XDG: 
`~/.config/openquery/config`)
+- Parse key-value pairs (no INI library; manual parsing)
+- Provide an `AppConfig` object with defaults
+- Save settings back to the file
+
+**Design**:
+- Static class (no instances)
+- Creates the config directory if missing
+- Line-by-line parsing (simple, no dependencies)
+- Could be improved with a proper INI parser or JSON
+
+## Data Flow
+
+### End-to-End Data Flow
+
+```
+User: "What is quantum entanglement?"
+
+1. OpenQueryOptions created
+   { Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }
+
+2. Query Generation (if Queries > 1)
+   → ChatCompletionRequest to OpenRouter (system prompt for JSON queries)
+   → Deserialize to List<string> (generatedQueries)
+
+3. Search Phase
+   generatedQueries → parallel SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
+   → DistinctBy(Url) → List<SearxngResult> (15 results = 3 queries × 5 results)
+
+4. Fetch Phase
+   searchResults → parallel ArticleService.FetchArticleAsync → Article
+   → ChunkingService.ChunkText (split into ~500-char pieces)
+   → ConcurrentBag<Chunk> (could be 50-100 chunks from 15 articles)
+
+5. Embedding Phase
+   originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
+   chunk Content values → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)
+
+6. Ranking Phase
+   For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
+   OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)
+
+7. Answer Phase
+   context = string.Join("\n\n", topChunks.Select(...))
+   → ChatCompletionRequest to OpenRouter with context + question
+   → StreamAsync → Console.Write(delta) (real-time display)
+
+Result: User sees the answer with [Source N] citations
+```
+
+### Data Structures
+
+**Chunk** - The core data structure flowing through the pipeline:
+```csharp
+public record Chunk(
+    string Content,      // Text content (~500 chars)
+    string SourceUrl,    // Where it came from
+    string? Title = null // Article title
+)
+{
+    public float[]? 
Embedding { get; set; } // Added in Phase 3
+    public float Score { get; set; }        // Added in Phase 4
+}
+```
+
+**Data Flow State**:
+- Phase 1-2: `Chunk` without embedding
+- Phase 3: `Chunk.Embedding` populated
+- Phase 4: `Chunk.Score` populated
+- Phase 5: Serialized into the context string
+
+### Memory Footprint
+
+**Per 15-article run (approximate)**:
+- Raw HTML (fetched): ~5MB (transient, discarded after extraction)
+- Articles: ~500KB (15 articles × ~30KB extracted text)
+- Chunks: ~50-100 items × 500 chars ≈ 25-50KB text
+- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
+- Total peak: ~1-2MB (excluding OpenRouter's memory usage)
+
+**Note**: AOT compilation reduces runtime memory compared to JIT.
+
+## Concurrency Model
+
+OpenQuery uses multiple parallelization strategies:
+
+### Unbounded Parallelism (Task.WhenAll)
+
+**Where**: Search queries (Phase 1)
+
+```csharp
+var searchTasks = generatedQueries.Select(async query => { ... });
+await Task.WhenAll(searchTasks);
+```
+
+**Rationale**: SearxNG can handle concurrent queries; no need to limit (it's a local/single-user tool). SearxNG itself may throttle internally.
+
+**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). The default of 3 is safe.
+
+### Semaphore-Controlled Parallelism
+
+**Where**: Article fetching (Phase 2)
+
+```csharp
+var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
+await Task.WhenAll(fetchTasks); // Each task waits on the semaphore
+```
+
+**Rationale**: Prevent flooding target websites with requests (DoS-like behavior). 10 concurrent is polite but fast.
+
+**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (compile-time constant currently).
+
+### Parallel.ForEachAsync with MaxDegreeOfParallelism
+
+**Where**: Embedding batch processing (Phase 3)
+
+```csharp
+await Parallel.ForEachAsync(
+    batchIndices,
+    new ParallelOptions { MaxDegreeOfParallelism = 4 },
+    async (batchIndex, ct) => { ... 
}
+);
+```
+
+**Rationale**: Limit API concurrency to respect OpenRouter rate limits. 4 concurrent embedding requests is a safe default.
+
+**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (compile-time).
+
+### Progress Reporting (Channel)
+
+**Where**: All phases pass an `onProgress` callback
+
+**Implementation**:
+- `StatusReporter.UpdateStatus()` → writes to the channel
+- Background task reads the channel and displays updates
+- Non-blocking; callbacks are fire-and-forget (TryWrite)
+
+**Thread Safety**: The channel is thread-safe; multiple phases may write concurrently.
+
+## Error Handling & Resilience
+
+### HTTP Errors
+
+**OpenRouterClient**:
+- Calls `response.EnsureSuccessStatusCode()` → throws `HttpRequestException` on 4xx/5xx
+- No retry (handled at a higher level in `EmbeddingService`)
+
+**SearxngClient**:
+- Returns an empty `List<SearxngResult>` on non-success
+- No exception thrown (searches are non-critical; if some queries fail, others proceed)
+
+### Retry Policy (Polly)
+
+**Location**: `EmbeddingService` constructor
+
+**Scope**: Only embedding requests (`_client.EmbedAsync`)
+
+**Policy**:
+- Max 3 attempts
+- Exponential backoff: 1s, 2s, 4s
+- Only retries `HttpRequestException` (network errors, 429, 5xx)
+
+**Why not on chat completions?**
+- Query generation and the final answer are critical; failures should surface immediately
+- Could be added in the future if transient failures are common
+
+### Graceful Degradation
+
+**Query Generation Failure**:
+```csharp
+try { ... generate queries ... 
}
+catch (Exception ex)
+{
+    // Fall back to the original question as the sole query
+    if (options.Verbose) reporter.WriteLine($"[Failed to generate queries, falling back to original question]");
+}
+```
+
+**Embedding Batch Failure**:
+```csharp
+catch
+{
+    // Return empty embeddings for this batch (chunk will be filtered out)
+    var emptyBatch = new float[batch.Count][];
+    for (int i = 0; i < emptyBatch.Length; i++)
+        emptyBatch[i] = Array.Empty<float>();
+    results.Add((batchIndex, emptyBatch));
+}
+```
+
+**Article Fetch Failure**:
+```csharp
+try { await ArticleService.FetchArticleAsync(url); }
+catch (Exception ex)
+{
+    if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
+    // Chunk not added; continue with others
+}
+```
+
+### User-Facing Errors
+
+Top-level exception handler in `Program.cs`:
+```csharp
+try { await openQuery.RunAsync(options); }
+catch (HttpRequestException ex)
+{
+    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
+    Environment.Exit(1);
+}
+catch (Exception ex)
+{
+    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
+    Environment.Exit(1);
+}
+```
+
+### Cancellation Support
+
+`OpenRouterClient.StreamAsync` and `EmbeddingService` methods accept a `CancellationToken`.
+
+Used in:
+- Streaming the answer (Ctrl+C stops immediately)
+- Parallel embeddings (can be cancelled)
+
+## Performance Considerations
+
+### Latency Breakdown (Typical)
+
+| Stage | Time | Description |
+|-------|------|-------------|
+| Query generation | 2-5s | LLM generates 3-5 queries |
+| Searches | 3-8s | 3-5 parallel SearxNG queries |
+| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
+| Embeddings | 2-4s | 50-100 chunks in 4-parallel batches |
+| Final answer | 5-20s | Depends on answer length (streaming) |
+| **Total** | **15-50s** | Varies widely based on network & content |
+
+### Bottlenecks
+
+1. **Network I/O** (article fetching, API calls) - I/O-bound, not CPU-bound
+2. 
**OpenRouter API latency** - varies by model and load
+3. **SmartReader parsing** - CPU-bound for large HTML
+4. **Embedding API rate** - OpenRouter may rate limit if too many concurrent requests
+
+### Optimization Strategies
+
+- **Parallelism**: Already maximized within API constraints
+- **Caching**: Not implemented; a future enhancement could cache embeddings per URL
+- **Batching**: 300-chunk batches reduce API overhead
+- **AOT**: Native compilation reduces startup overhead vs JIT
+
+### Scalability Limits
+
+- **Memory**: Scales with the number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Can handle thousands, but similarity scoring is O(n) in the number of chunks.
+- **API Limits**: OpenRouter rate limits per API key; you may need to reduce concurrency if hitting 429s.
+- **SearxNG Limits**: A single SearxNG instance can handle ~10-50 QPS; above that you may need load balancing (not in scope).
+
+## Design Decisions
+
+### Why Not Use a DI Container?
+
+OpenQuery manually wires dependencies in `Program.cs`. For a small CLI app, this is:
+- Simpler (no container configuration)
+- More explicit (easy to trace the dependency graph)
+- No runtime overhead
+- Easier to understand for contributors
+
+Container-based DI would be overengineering.
+
+### Why AOT?
+
+.NET 10 AOT provides:
+- **Fast startup** (<100ms vs ~500ms JIT)
+- **Smaller footprint** (trimmed, no JIT)
+- **No runtime dependencies** (self-contained)
+- Better for CLI tools distributed to users
+
+Trade-offs:
+- Longer build time
+- Some reflection-based APIs not supported (not needed here)
+- Less flexible (can't load dynamic assemblies, but not needed)
+
+### Why SmartReader for Article Extraction? 
+
+SmartReader uses a Readability-based algorithm similar to Firefox Reader View:
+- Removes ads, navigation, comments, and boilerplate
+- Extracts the main article content
+- Handles malformed HTML gracefully
+- Pure .NET (no external services or API keys required)
+
+Alternatives considered:
+- `HtmlAgilityPack` (too low-level; you'd need to implement the extraction logic yourself)
+- `AngleSharp` (similar; still need extraction logic)
+- External services (like Diffbot) - require API keys and cost money
+
+SmartReader is the sweet spot: free, good quality, easy integration.
+
+### Why Embeddings + Cosine Similarity vs Full-Text Search?
+
+Full-text search (like Lucene) would:
+- Require an inverted index; more complex
+- Be faster for exact keyword matching
+- Not understand semantic similarity
+
+Embeddings provide:
+- Semantic similarity (understand meaning, not just keywords)
+- Simple math (cosine similarity of float arrays)
+- No index to maintain (just compute on the fly)
+
+Trade-off: embedding API cost and latency; caching embeddings (e.g., per URL) could offset repeat costs.
+
+### Why Not a RAG (Retrieval Augmented Generation) Framework?
+
+OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
+- Add dependency bloat
+- Reduce control
+- Increase abstraction complexity
+
+The custom implementation is ~1000 LOC and matches the project's needs exactly.
+
+### Why System.CommandLine?
+
+Provides:
+- Native-like CLI help (`openquery --help`)
+- Strongly-typed options
+- Command hierarchy (main + subcommands)
+- Good error messages
+
+Alternative: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.
+
+---
+
+## Next Steps
+
+- [Components](components/overview.md) - Deep dive into each module
+- [API Reference](api/cli.md) - Complete command documentation
+- [Troubleshooting](troubleshooting.md) - Debug issues
+
+---
+
+**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece. 
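+---
+
+**Appendix: the "simple math".** The cosine-similarity ranking discussed above is just a normalized dot product; a self-contained sketch (not the actual `EmbeddingService.CosineSimilarity` source):
+
+```csharp
+// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
+// Higher values mean the vectors point in more similar directions,
+// i.e. the texts they embed are more semantically similar.
+static float CosineSimilarity(float[] a, float[] b)
+{
+    if (a.Length != b.Length)
+        throw new ArgumentException("Vectors must have the same length.");
+
+    double dot = 0, normA = 0, normB = 0;
+    for (int i = 0; i < a.Length; i++)
+    {
+        dot += a[i] * b[i];
+        normA += a[i] * a[i];
+        normB += b[i] * b[i];
+    }
+    if (normA == 0 || normB == 0) return 0f; // guard: zero vector
+    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
+}
+```
+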
diff --git a/docs/components/models.md b/docs/components/models.md new file mode 100644 index 0000000..8db2cf6 --- /dev/null +++ b/docs/components/models.md @@ -0,0 +1,528 @@ +# Models Reference + +Complete reference for all data models, DTOs, and records in OpenQuery. + +## πŸ“‹ Table of Contents + +1. [Core Data Models](#core-data-models) +2. [OpenRouter API Models](#openrouter-api-models) +3. [SearxNG API Models](#searxng-api-models) +4. [JSON Serialization](#json-serialization) +5. [Model Relationships](#model-relationships) + +## Core Data Models + +### OpenQueryOptions + +**Location**: `Models/OpenQueryOptions.cs` +**Type**: `record` +**Purpose**: Immutable options object for a single query execution + +```csharp +public record OpenQueryOptions( + int Chunks, // Number of top chunks to include in context + int Results, // Search results per generated query + int Queries, // Number of search queries to generate (if >1) + bool Short, // Request concise answer + bool Long, // Request detailed answer + bool Verbose, // Enable verbose logging + string Question // Original user question (required) +); +``` + +**Lifecycle**: +- Created in `Program.cs` by combining CLI options, config defaults, and environment variables +- Passed to `OpenQueryApp.RunAsync(options)` + +**Validation**: None (assumes valid values from CLI parser/config) + +**Example**: +```csharp +var options = new OpenQueryOptions( + Chunks: 3, + Results: 5, + Queries: 3, + Short: false, + Long: false, + Verbose: true, + Question: "What is quantum entanglement?" +); +``` + +--- + +### Chunk + +**Location**: `Models/Chunk.cs` +**Type**: `record` +**Purpose**: Content chunk with metadata, embedding, and relevance score + +```csharp +public record Chunk( + string Content, // Text content (typically ~500 chars) + string SourceUrl, // Original article URL + string? Title = null // Article title (optional, may be null) +) +{ + public float[]? 
Embedding { get; set; } // Vector embedding (1536-dim for text-embedding-3-small)
+    public float Score { get; set; }        // Relevance score (0-1, higher = more relevant)
+}
+```
+
+**Lifecycle**:
+1. **Created** in `SearchTool.ExecuteParallelArticleFetchingAsync`:
+   ```csharp
+   chunks.Add(new Chunk(chunkText, result.Url, article.Title));
+   ```
+   At this point: `Embedding = null`, `Score = 0`
+
+2. **Embedded** in `SearchTool.ExecuteParallelEmbeddingsAsync`:
+   ```csharp
+   validChunks[i].Embedding = validEmbeddings[i];
+   ```
+
+3. **Scored** in `SearchTool.RankAndSelectTopChunks`:
+   ```csharp
+   chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
+   ```
+
+4. **Formatted** into the context string:
+   ```csharp
+   $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"
+   ```
+
+**Properties**:
+- `Content`: Never null/empty (empty chunks are filtered in `ChunkingService`)
+- `SourceUrl`: Always provided (from `SearxngResult.Url`)
+- `Title`: May be null if article extraction failed to get a title
+- `Embedding`: Null until Phase 3; may remain null if embedding failed
+- `Score`: 0 until Phase 4; irrelevant for non-embedded chunks
+
+**Equality**: Records use value equality, and the compiler-generated `Equals` compares *all* instance fields — including the backing fields of `Embedding` and `Score` declared in the body. Note that `float[]` is compared by reference, not element-wise, so two chunks with the same content/URL/title are only equal if they also share the same `Embedding` array reference and the same `Score`. 
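+The lifecycle above can be compressed into one hypothetical snippet (the literal values and the `queryEmbedding` variable are invented for illustration):
+
+```csharp
+// Phase 2: created from an extracted article.
+var chunk = new Chunk(
+    Content: "Quantum entanglement is a physical phenomenon...",
+    SourceUrl: "https://en.wikipedia.org/wiki/Quantum_entanglement",
+    Title: "Quantum Entanglement - Wikipedia");
+
+// Phase 3: embedding attached (dimension depends on the embedding model).
+chunk.Embedding = new float[] { 0.12f, -0.03f, 0.44f /* ... */ };
+
+// Phase 4: scored against the query embedding.
+float[] queryEmbedding = { 0.10f, -0.01f, 0.40f /* ... */ };
+chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding);
+
+// Phase 5: formatted into the context string.
+var line = $"[Source 1: {chunk.Title ?? "Unknown"}]({chunk.SourceUrl})\n{chunk.Content}";
+```
+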
+
+---
+
+### ParallelProcessingOptions
+
+**Location**: `Models/ParallelOptions.cs`
+**Type**: `class`
+**Purpose**: Configuration for parallel/concurrent operations
+
+```csharp
+public class ParallelProcessingOptions
+{
+    public int MaxConcurrentArticleFetches { get; set; } = 10;
+    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
+    public int EmbeddingBatchSize { get; set; } = 300;
+}
+```
+
+**Usage**:
+- Instantiated in the `SearchTool` constructor (hardcoded `new`)
+- Passed to the `EmbeddingService` constructor
+- Read by `SearchTool` for the article-fetching semaphore
+
+**Default Values**:
+| Property | Default | Effect |
+|----------|---------|--------|
+| `MaxConcurrentArticleFetches` | 10 | Up to 10 articles fetched simultaneously |
+| `MaxConcurrentEmbeddingRequests` | 4 | Up to 4 embedding batches in parallel |
+| `EmbeddingBatchSize` | 300 | Each embedding API call handles up to 300 texts |
+
+**Current Limitation**: These are **compile-time defaults** (hardcoded in `SearchTool.cs`). To make them configurable:
+1. Add them to `AppConfig`
+2. Read them in `ConfigManager`
+3. Pass them through the `SearchTool` constructor
+
+---
+
+## OpenRouter API Models
+
+**Location**: `Models/OpenRouter.cs`
+**Purpose**: DTOs for OpenRouter's REST API (JSON serialization)
+
+### Chat Completion
+
+#### `ChatCompletionRequest`
+```csharp
+public record ChatCompletionRequest(
+    [property: JsonPropertyName("model")] string Model,
+    [property: JsonPropertyName("messages")] List<Message> Messages,
+    [property: JsonPropertyName("tools")] List<ToolDefinition>? Tools = null,
+    [property: JsonPropertyName("stream")] bool Stream = false
+);
+```
+
+**Example**:
+```json
+{
+  "model": "qwen/qwen3.5-flash-02-23",
+  "messages": [
+    { "role": "system", "content": "You are a helpful assistant." },
+    { "role": "user", "content": "What is 2+2?" 
}
+  ],
+  "stream": true
+}
+```
+
+#### `Message`
+```csharp
+public record Message(
+    [property: JsonPropertyName("role")] string Role,
+    [property: JsonPropertyName("content")] string? Content = null,
+    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
+    [property: JsonPropertyName("tool_call_id")] string? ToolCallId = null
+)
+{
+    // Factory method for tool responses
+    public static Message FromTool(string content, string toolCallId) =>
+        new Message("tool", content, null, toolCallId);
+}
+```
+
+**Roles**: `"system"`, `"user"`, `"assistant"`, `"tool"`
+
+**Usage**:
+- `Content` for text messages
+- `ToolCalls` when the assistant requests tool use
+- `ToolCallId` when responding to a tool call
+
+#### `ChatCompletionResponse`
+```csharp
+public record ChatCompletionResponse(
+    [property: JsonPropertyName("choices")] List<Choice> Choices,
+    [property: JsonPropertyName("usage")] Usage? Usage = null
+);
+
+public record Choice(
+    [property: JsonPropertyName("message")] Message Message,
+    [property: JsonPropertyName("finish_reason")] string? FinishReason = null
+);
+```
+
+**Response Example**:
+```json
+{
+  "choices": [
+    {
+      "message": {
+        "role": "assistant",
+        "content": "Answer text..." 
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 100,
+    "completion_tokens": 50,
+    "total_tokens": 150
+  }
+}
+```
+
+#### `Usage`
+```csharp
+public record Usage(
+    [property: JsonPropertyName("prompt_tokens")] int PromptTokens,
+    [property: JsonPropertyName("completion_tokens")] int CompletionTokens,
+    [property: JsonPropertyName("total_tokens")] int TotalTokens
+);
+```
+
+### Tool Calling (Not Currently Used)
+
+#### `ToolDefinition` / `ToolFunction`
+```csharp
+public record ToolDefinition(
+    [property: JsonPropertyName("type")] string Type, // e.g., "function"
+    [property: JsonPropertyName("function")] ToolFunction Function
+);
+
+public record ToolFunction(
+    [property: JsonPropertyName("name")] string Name,
+    [property: JsonPropertyName("description")] string Description,
+    [property: JsonPropertyName("parameters")] JsonElement Parameters // JSON Schema
+);
+```
+
+#### `ToolCall` / `FunctionCall`
+```csharp
+public record ToolCall(
+    [property: JsonPropertyName("id")] string Id,
+    [property: JsonPropertyName("type")] string Type,
+    [property: JsonPropertyName("function")] FunctionCall Function
+);
+
+public record FunctionCall(
+    [property: JsonPropertyName("name")] string Name,
+    [property: JsonPropertyName("arguments")] string Arguments // JSON string
+);
+```
+
+**Note**: OpenQuery doesn't use tools currently, but the models are defined for future tool-calling capability.
+
+### Streaming
+
+#### `StreamChunk`
+```csharp
+public record StreamChunk(
+    string? TextDelta = null,
+    ClientToolCall? Tool = null
+);
+```
+
+Yielded by `OpenRouterClient.StreamAsync()` for each SSE event.
+
+#### `ChatCompletionChunk` (Server Response)
+```csharp
+public record ChatCompletionChunk(
+    [property: JsonPropertyName("choices")] List<ChunkChoice> Choices
+);
+
+public record ChunkChoice(
+    [property: JsonPropertyName("delta")] ChunkDelta Delta
+);
+
+public record ChunkDelta(
+    [property: JsonPropertyName("content")] string? 
Content = null,
+    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null
+);
+```
+
+**Streaming Response Example** (SSE):
+```
+data: {"choices":[{"delta":{"content":"Hello"}}]}
+data: {"choices":[{"delta":{"content":" world"}}]}
+data: [DONE]
+```
+
+`OpenRouterClient.StreamAsync` parses and yields `StreamChunk` with non-null `TextDelta` for content.
+
+### Embeddings
+
+#### `EmbeddingRequest`
+```csharp
+public record EmbeddingRequest(
+    [property: JsonPropertyName("model")] string Model,
+    [property: JsonPropertyName("input")] List<string> Input
+);
+```
+
+**Example**:
+```json
+{
+  "model": "openai/text-embedding-3-small",
+  "input": ["text 1", "text 2", ...]
+}
+```
+
+#### `EmbeddingResponse`
+```csharp
+public record EmbeddingResponse(
+    [property: JsonPropertyName("data")] List<EmbeddingData> Data,
+    [property: JsonPropertyName("usage")] Usage Usage
+);
+
+public record EmbeddingData(
+    [property: JsonPropertyName("embedding")] float[] Embedding,
+    [property: JsonPropertyName("index")] int Index
+);
+```
+
+**Response Example**:
+```json
+{
+  "data": [
+    { "embedding": [0.1, 0.2, ...], "index": 0 },
+    { "embedding": [0.3, 0.4, ...], "index": 1 }
+  ],
+  "usage": {
+    "prompt_tokens": 100,
+    "total_tokens": 100
+  }
+}
+```
+
+**Note**: `_client.EmbedAsync` orders by `index` to match input order.
+
+---
+
+## SearxNG API Models
+
+**Location**: `Models/Searxng.cs`
+**Purpose**: DTOs for SearxNG's JSON response format
+
+### `SearxngRoot`
+```csharp
+public record SearxngRoot(
+    [property: JsonPropertyName("results")] List<SearxngResult> Results
+);
+```
+
+Top-level response object.
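+
+To make the mapping concrete, here is a minimal, runnable sketch of deserializing a SearxNG payload into these records. The sample JSON values are illustrative, and plain reflection-based `Deserialize` stands in for the source-generated `AppJsonContext` that the app itself uses for AOT compatibility:
+
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+
+// Illustrative payload; the field shapes match the records above
+var json = """
+{
+  "results": [
+    {
+      "title": "Quantum Entanglement - Wikipedia",
+      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
+      "content": "Quantum entanglement is a physical phenomenon..."
+    }
+  ]
+}
+""";
+
+// Reflection-based deserialization for brevity; the real app goes through
+// AppJsonContext (source generation) so this also works under PublishAot
+var root = JsonSerializer.Deserialize<SearxngRoot>(json);
+Console.WriteLine(root!.Results[0].Title); // Quantum Entanglement - Wikipedia
+
+public record SearxngRoot(
+    [property: JsonPropertyName("results")] List<SearxngResult> Results);
+
+public record SearxngResult(
+    [property: JsonPropertyName("title")] string Title,
+    [property: JsonPropertyName("url")] string Url,
+    [property: JsonPropertyName("content")] string Content);
+```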
+
+### `SearxngResult`
+```csharp
+public record SearxngResult(
+    [property: JsonPropertyName("title")] string Title,
+    [property: JsonPropertyName("url")] string Url,
+    [property: JsonPropertyName("content")] string Content // Snippet/description
+);
+```
+
+**Fields**:
+- `Title`: Result title (from the page's `<title>` tag or OpenGraph)
+- `Url`: Absolute URL to article
+- `Content`: Short snippet (~200 chars) from search engine
+
+**Usage**:
+- `Url` passed to `ArticleService.FetchArticleAsync`
+- `Title` used as fallback if article extraction fails
+- `Content` currently unused (could be used for a quick answer without fetching)
+
+**Example Response**:
+```json
+{
+  "results": [
+    {
+      "title": "Quantum Entanglement - Wikipedia",
+      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
+      "content": "Quantum entanglement is a physical phenomenon..."
+    }
+  ]
+}
+```
+
+---
+
+## JSON Serialization
+
+### JsonContext (Source Generation)
+
+**Location**: `Models/JsonContexts.cs`
+**Purpose**: Provide source-generated JSON serializer context for AOT compatibility
+
+#### Declaration
+```csharp
+[JsonSerializable(typeof(ChatCompletionRequest))]
+[JsonSerializable(typeof(ChatCompletionResponse))]
+[JsonSerializable(typeof(ChatCompletionChunk))]
+[JsonSerializable(typeof(EmbeddingRequest))]
+[JsonSerializable(typeof(EmbeddingResponse))]
+[JsonSerializable(typeof(SearxngRoot))]
+[JsonSerializable(typeof(List<string>))]
+internal partial class AppJsonContext : JsonSerializerContext
+{
+}
+```
+
+**Usage**:
+```csharp
+var json = JsonSerializer.Serialize(request, AppJsonContext.Default.ChatCompletionRequest);
+var response = JsonSerializer.Deserialize(json, AppJsonContext.Default.ChatCompletionResponse);
+```
+
+**Benefits**:
+- **AOT-compatible**: No reflection, works with PublishAot=true
+- **Performance**: Pre-compiled serializers are faster
+- **Trimming safe**: Unused serializers trimmed automatically
+
+**Generated**: Partial class compiled by source generator (no manual 
implementation) + +**Important**: Must include ALL types that will be serialized/deserialized in `[JsonSerializable]` attributes, otherwise runtime exception in AOT. + +--- + +## Model Relationships + +### Object Graph (Typical Execution) + +``` +OpenQueryOptions + ↓ +OpenQueryApp.RunAsync() + β”‚ + β”œβ”€ queryGenerationMessages (List<Message>) + β”‚ β”œβ”€ system: "You are an expert researcher..." + β”‚ └─ user: "Generate N queries for: {question}" + β”‚ ↓ + β”‚ ChatCompletionRequest β†’ OpenRouter β†’ ChatCompletionResponse + β”‚ ↓ + β”‚ List<string> generatedQueries + β”‚ + β”œβ”€ SearchTool.ExecuteAsync() + β”‚ ↓ + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ Phase 1: Parallel Searches β”‚ + β”‚ β”‚ SearxngClient.SearchAsync(query) Γ— N + β”‚ β”‚ β†’ List<SearxngResult> β”‚ + β”‚ β”‚ (Title, Url, Content) β”‚ + β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ ↓ + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ Phase 2: Article Fetch & Chunking β”‚ + β”‚ β”‚ ArticleService.FetchAsync(Url) Γ— M + β”‚ β”‚ β†’ Article (TextContent, Title) + β”‚ β”‚ β†’ ChunkingService.ChunkText β†’ List<string> chunks + β”‚ β”‚ β†’ Chunk(content, url, title) Γ— K β”‚ + β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ ↓ + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ Phase 3: Embeddings β”‚ + β”‚ β”‚ EmbeddingService.GetEmbeddingsAsync(chunkContents) + β”‚ β”‚ β†’ float[][] chunkEmbeddings β”‚ + β”‚ β”‚ β†’ Set chunk.Embedding for each β”‚ + β”‚ β”‚ Also: GetEmbeddingAsync(question) β†’ float[] queryEmbedding + β”‚ 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ ↓ + β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ β”‚ Phase 4: Ranking β”‚ + β”‚ β”‚ For each chunk: Score = CosineSimilarity(queryEmbedding, chunk.Embedding) + β”‚ β”‚ β†’ Set chunk.Score β”‚ + β”‚ β”‚ β†’ OrderByDescending(Score) β”‚ + β”‚ β”‚ β†’ Take(topChunksLimit) β†’ topChunks (List<Chunk>) + β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ ↓ + β”‚ Context string: formatted topChunks + β”‚ ↓ + └─ OpenQueryApp β†’ final ChatCompletionRequest + System: "Answer based on context..." + User: "Context:\n{context}\n\nQuestion: {question}" + ↓ + StreamAsync() β†’ StreamChunk.TextDelta β†’ Console +``` + +### Record Immutability + +Most DTOs are `record` types: +- **Immutable**: Properties are init-only (`{ get; init; }`) +- **Value semantics**: Equality based on content +- **Thread-safe**: Can be shared across threads + +**Exception**: +- `Chunk`: Has mutable properties `Embedding` and `Score` (set during pipeline) +- `ParallelProcessingOptions`: Class with mutable setters +- `AppConfig`: Class with mutable setters + +--- + +## Next Steps + +- **[API Reference](../../api/cli.md)** - How these models are used in CLI commands +- **[OpenRouterClient](../../services/OpenRouterClient.md)** - Uses OpenRouter models +- **[SearxngClient](../../services/SearxngClient.md)** - Uses Searxng models +- **[SearchTool](../../components/search-tool.md)** - Orchestrates all models + +--- + +**Quick Reference Table** + +| Model | Category | Purpose | Mutable? 
| +|-------|----------|---------|----------| +| `OpenQueryOptions` | Core | CLI options | No (record) | +| `Chunk` | Core | Content + metadata + ranking | Partially (Embedding, Score) | +| `ParallelProcessingOptions` | Config | Concurrency settings | Yes (class) | +| `ChatCompletionRequest/Response` | OpenRouter | LLM API | No | +| `EmbeddingRequest/Response` | OpenRouter | Embeddings API | No | +| `SearxngRoot/Result` | SearxNG | Search results | No | +| `AppJsonContext` | Internal | JSON serialization | No (generated partial) | diff --git a/docs/components/openquery-app.md b/docs/components/openquery-app.md new file mode 100644 index 0000000..41cdeec --- /dev/null +++ b/docs/components/openquery-app.md @@ -0,0 +1,395 @@ +# OpenQueryApp Component + +Deep dive into the `OpenQueryApp` class - the main application orchestrator. + +## Overview + +`OpenQueryApp` is the heart of OpenQuery. It coordinates all components, manages the workflow from question to answer, and handles progress reporting. + +## Location +`OpenQuery.cs` in project root + +## Class Definition + +```csharp +public class OpenQueryApp +{ + private readonly OpenRouterClient _client; + private readonly SearchTool _searchTool; + private readonly string _model; + + public OpenQueryApp( + OpenRouterClient client, + SearchTool searchTool, + string model); + + public async Task RunAsync(OpenQueryOptions options); +} +``` + +**Dependencies**: +- `OpenRouterClient` - for query generation and final answer streaming +- `SearchTool` - for search-retrieve-rank pipeline +- `string _model` - model identifier to use for LLM calls + +**Lifecycle**: Instantiated once per query execution in `Program.cs`, then `RunAsync()` called once. + +## RunAsync Workflow + +```csharp +public async Task RunAsync(OpenQueryOptions options) +{ + // 1. Setup + using var reporter = new StatusReporter(options.Verbose); + reporter.StartSpinner(); + + // 2. 
Query Generation (if needed) + List<string> queries = await GenerateQueriesIfNeededAsync(options, reporter); + + // 3. Search Pipeline + string searchResult = await ExecuteSearchPipelineAsync(options, queries, reporter); + + // 4. Final Answer Streaming + await StreamFinalAnswerAsync(options, searchResult, reporter); +} +``` + +### Step 1: Status Reporter Setup + +```csharp +using var reporter = new StatusReporter(options.Verbose); +reporter.StartSpinner(); +``` + +- Creates `StatusReporter` (implements `IDisposable`) +- Starts spinner animation (unless verbose) +- `using` ensures disposal on exit + +### Step 2: Query Generation + +**When**: `options.Queries > 1` (user wants multiple search queries) + +**Purpose**: Use LLM to generate diverse, optimized search queries from the original question + +**System Prompt** (hardcoded in `OpenQuery.cs`): +``` +You are an expert researcher. The user will ask a question. Your task is to +generate optimal search queries to gather comprehensive information. + +Instructions: +1. Break down complex questions. +2. Use synonyms and alternative phrasing. +3. Target different aspects (entities, mechanisms, pros/cons, history). + +CRITICAL: Output must be a valid JSON array of strings ONLY. No markdown, +explanations, or other text. 
+``` + +**Request**: +```csharp +var queryGenMessages = new List<Message> +{ + new Message("system", systemPrompt), + new Message("user", $"Generate {options.Queries} distinct search queries for:\n{options.Question}") +}; +var request = new ChatCompletionRequest(_model, queryGenMessages); +var response = await _client.CompleteAsync(request); +``` + +**Response Parsing**: +```csharp +var content = response.Choices.FirstOrDefault()?.Message.Content; +if (!string.IsNullOrEmpty(content)) +{ + // Remove markdown code fences if present + content = Regex.Replace(content, @"```json\s*|\s*```", "").Trim(); + + // Deserialize to List<string> + var generatedQueries = JsonSerializer.Deserialize(content, AppJsonContext.Default.ListString); + if (generatedQueries != null && generatedQueries.Count > 0) + { + queries = generatedQueries; + } +} +``` + +**Fallback**: If any step fails (exception, null, empty, invalid JSON), use `new List<string> { options.Question }` (single query = original) + +**Note**: Query generation reuses the same model as final answer. 
This could be optimized:
+- Use cheaper/faster model for query gen
+- Separate model configuration
+- Cache query generation results
+
+### Step 3: Search Pipeline Execution
+
+```csharp
+var searchResult = await _searchTool.ExecuteAsync(
+    options.Question,
+    queries,
+    options.Results,
+    options.Chunks,
+    (progress) => {
+        if (options.Verbose)
+            reporter.WriteLine(progress);
+        else
+            reporter.UpdateStatus(progress);
+    },
+    options.Verbose);
+```
+
+**Parameters**:
+- `originalQuery`: User's original question (used for final embedding)
+- `generatedQueries`: From step 2 (or fallback)
+- `maxResults`: `options.Results` (search results per query)
+- `topChunksLimit`: `options.Chunks` (top N chunks to return)
+- `onProgress`: Callback to update UI
+- `verbose`: Passed through to `SearchTool`
+
+**Returns**: `string context` - formatted context with source citations
+
+**Progress Handling**:
+- In verbose mode: all progress printed as lines (via `reporter.WriteLine()`)
+- In compact mode: progress messages are condensed into a single concise status line (e.g., "Fetching articles 3/10...")
+
+### Step 4: Final Answer Streaming
+
+**Status Update**:
+```csharp
+if (!options.Verbose)
+    reporter.UpdateStatus("Asking AI...");
+else
+{
+    reporter.ClearStatus();
+    Console.WriteLine();
+}
+```
+
+**Build System Prompt**:
+```csharp
+var systemPrompt = "You are a helpful AI assistant. Answer the user's question in depth, based on the provided context. Be precise and accurate. You can mention sources or citations.";
+if (options.Short) systemPrompt += " Give a very short concise answer.";
+if (options.Long) systemPrompt += " Give a long elaborate detailed answer.";
+```
+
+**Prompt Structure**:
+```
+System: {systemPrompt}
+User: Context:
+{searchResult}
+
+Question: {options.Question}
+```
+
+Where `searchResult` is:
+```
+[Source 1: Title](URL)
+Content chunk 1
+
+[Source 2: Title](URL)
+Content chunk 2
+
+...
+``` + +**Streaming**: +```csharp +var requestStream = new ChatCompletionRequest(_model, messages); +var assistantResponse = new StringBuilder(); +var isFirstChunk = true; + +using var streamCts = new CancellationTokenSource(); +await foreach (var chunk in _client.StreamAsync(requestStream, streamCts.Token)) +{ + if (chunk.TextDelta == null) continue; + + if (isFirstChunk) + { + reporter.StopSpinner(); + if (!options.Verbose) reporter.ClearStatus(); + else Console.Write("Assistant: "); + isFirstChunk = false; + } + + Console.Write(chunk.TextDelta); + assistantResponse.Append(chunk.TextDelta); +} +``` + +**Key Points**: +- `StreamAsync` yields `StreamChunk` objects (text deltas) +- First chunk stops spinner and clears status line +- Each delta written to Console immediately (real-time feel) +- Entire response accumulated in `assistantResponse` (though not used elsewhere) +- `CancellationTokenSource` passed but not canceled (Ctrl+C would cancel from outside) + +**Finally Block**: +```csharp +finally +{ + reporter.StopSpinner(); +} +``` +Ensures spinner stops even if streaming fails. + +**End**: +```csharp +Console.WriteLine(); // Newline after complete answer +``` + +## Error Handling + +`RunAsync` itself does not catch exceptions. All exceptions propagate to `Program.cs`: + +```csharp +try +{ + var openQuery = new OpenQueryApp(client, searchTool, model); + await openQuery.RunAsync(options); +} +catch (HttpRequestException ex) +{ + Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}"); + Environment.Exit(1); +} +catch (Exception ex) +{ + Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}"); + Environment.Exit(1); +} +``` + +**Common Exceptions**: +- `HttpRequestException` - network failures, API errors +- `JsonException` - malformed JSON from API +- `TaskCanceledException` - timeout or user interrupt +- `Exception` - anything else + +**No Retries at This Level**: Fail fast; user sees error immediately. 
Lower-level retries exist (embedding service). + +## Performance Characteristics + +**Query Generation**: +- One non-streaming LLM call +- Takes 2-5 seconds depending on model +- Typically <1000 tokens + +**Search Pipeline** (`SearchTool.ExecuteAsync`): +- See `SearchTool.md` for detailed timing breakdown +- Total 10-30 seconds typically + +**Final Answer Streaming**: +- Streaming LLM call +- Time depends on answer length (typically 5-20 seconds) +- User sees words appear progressively + +**Total End-to-End**: 15-50 seconds for typical query + +## Design Decisions + +### Why Not Stream Query Generation? + +Query generation currently uses `CompleteAsync` (non-streaming). Could be streamed but: +- Queries are short (JSON array) +- Streaming offers no UX benefit (user doesn't see intermediate queries) +- Simpler to wait for all queries before proceeding + +### Why Build Prompt Manually Instead of Templates? + +Simple string concatenation is fine for few prompts. Pros: +- No template dependencies +- Easy to read and modify +- No runtime compilation overhead + +Cons: +- No validation +- Could benefit from prompt engineering framework + +### Why Accumulate `assistantResponse` StringBuilder? + +Currently built but not used. Could be: +- Saved to file (future feature: `--output file.md`) +- Analyzed for token counting +- Removed if not needed + +### Could Query Generation Be Cached? + +Yes! For repeated questions (common in scripts), cache query results: +- `Dictionary<string, List<string>>` cache in memory +- Or persistent cache (Redis, file) +- Not implemented (low priority) + +### Single Responsibility Violation? + +`OpenQueryApp` does: +- Query generation +- Pipeline orchestration +- Answer streaming + +That's 3 responsibilities, but they're tightly coupled to the "query β†’ answer" workflow. Separating them would add complexity without clear benefit. Acceptable as "application coordinator". 
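+
+The fence-stripping-and-fallback parsing from step 2 is easy to exercise in isolation. A minimal sketch — `ParseQueriesOrFallback` is a hypothetical helper name, and reflection-based deserialization stands in for `AppJsonContext.Default.ListString`:
+
+```csharp
+using System.Text.Json;
+using System.Text.RegularExpressions;
+
+// Hypothetical helper mirroring the parse-with-fallback logic described above
+static List<string> ParseQueriesOrFallback(string? content, string originalQuestion)
+{
+    if (!string.IsNullOrEmpty(content))
+    {
+        // Strip markdown code fences the model may wrap around the JSON array
+        var cleaned = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();
+        try
+        {
+            var queries = JsonSerializer.Deserialize<List<string>>(cleaned);
+            if (queries is { Count: > 0 }) return queries;
+        }
+        catch (JsonException) { /* fall through to fallback */ }
+    }
+    // Fallback: single query = the original question
+    return new List<string> { originalQuestion };
+}
+
+Console.WriteLine(string.Join(" | ",
+    ParseQueriesOrFallback("```json\n[\"a\", \"b\"]\n```", "q"))); // a | b
+Console.WriteLine(string.Join(" | ",
+    ParseQueriesOrFallback("not json", "q"))); // q
+```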
+ +## Extension Points + +### Adding New Model for Query Generation + +Currently uses same `_model` for queries and answer. To use different models: + +1. Add `queryGenerationModel` parameter to constructor +2. Use it for query gen: `new ChatCompletionRequest(queryGenerationModel, queryGenMessages)` +3. Keep `_model` for final answer + +Or make it configurable via environment variable: `OPENROUTER_QUERY_MODEL` + +### Post-Processing Answer + +Opportunity to add: +- Source citation formatting (footnotes, clickable links) +- Answer summarization +- Export to Markdown/JSON +- Text-to-speech + +Add after streaming loop, before final newline. + +### Progress UI Enhancement + +Current `StatusReporter` is basic. Could add: +- Progress bar with percentage +- ETA calculation +- Colors (ANSI) for different message types +- Logging to file +- Web dashboard + +Would require extending `StatusReporter` or replacing it. + +## Testing Considerations + +**Challenges**: +- `RunAsync` is cohesive (hard to unit test in isolation) +- Depends on many services (need mocks) +- Asynchronous and streaming + +**Recommended Approach**: +1. Extract interfaces: + - `ISearchTool` (wrapper around `SearchTool`) + - `IOpenRouterClient` (wrapper around `OpenRouterClient`) +2. Mock interfaces in tests +3. Test query generation parsing separately +4. Test progress callback counting +5. 
Test final answer prompt construction + +**Integration Tests**: +- End-to-end with real/mocked APIs +- Automated tests with test SearxNG/OpenRouter instances + +## Related Components + +- **[SearchTool](search-tool.md)** - pipeline executed by `OpenQueryApp` +- **[Program.cs](../Program.md)** - creates `OpenQueryApp` +- **[StatusReporter](../services/StatusReporter.md)** - progress UI used by `OpenQueryApp` + +--- + +## Next Steps + +- [SearchTool](search-tool.md) - See the pipeline in detail +- [Services](../services/overview.md) - Understand each service +- [CLI Reference](../../api/cli.md) - How users invoke this diff --git a/docs/components/overview.md b/docs/components/overview.md new file mode 100644 index 0000000..6ee0edb --- /dev/null +++ b/docs/components/overview.md @@ -0,0 +1,603 @@ +# Components Overview + +Detailed documentation for each major component in the OpenQuery system. + +## πŸ“‹ Table of Contents + +1. [Component Hierarchy](#component-hierarchy) +2. [Core Components](#core-components) +3. [Services](#services) +4. [Data Models](#data-models) +5. 
[Component Interactions](#component-interactions) + +## Component Hierarchy + +``` +OpenQuery/ +β”œβ”€β”€ Program.cs [Entry Point, CLI] +β”œβ”€β”€ OpenQuery.cs [OpenQueryApp - Orchestrator] +β”œβ”€β”€ Tools/ +β”‚ └── SearchTool.cs [Pipeline Orchestration] +β”œβ”€β”€ Services/ +β”‚ β”œβ”€β”€ OpenRouterClient.cs [LLM & Embedding API] +β”‚ β”œβ”€β”€ SearxngClient.cs [Search API] +β”‚ β”œβ”€β”€ EmbeddingService.cs [Embedding Generation + Math] +β”‚ β”œβ”€β”€ ChunkingService.cs [Text Splitting] +β”‚ β”œβ”€β”€ ArticleService.cs [Content Extraction] +β”‚ β”œβ”€β”€ RateLimiter.cs [Concurrency Control] +β”‚ └── StatusReporter.cs [Progress Display] +β”œβ”€β”€ Models/ +β”‚ β”œβ”€β”€ OpenQueryOptions.cs [CLI Options Record] +β”‚ β”œβ”€β”€ Chunk.cs [Content + Metadata] +β”‚ β”œβ”€β”€ ParallelOptions.cs [Concurrency Settings] +β”‚ β”œβ”€β”€ OpenRouter.cs [API DTOs] +β”‚ β”œβ”€β”€ Searxng.cs [Search Result DTOs] +β”‚ └── JsonContexts.cs [JSON Context] +└── ConfigManager.cs [Configuration Persistence] +``` + +## Core Components + +### 1. Program.cs + +**Type**: Console Application Entry Point +**Responsibilities**: CLI parsing, dependency wiring, error handling + +**Key Elements**: +- `RootCommand` from System.CommandLine +- Options: `--chunks`, `--results`, `--queries`, `--short`, `--long`, `--verbose` +- Subcommand: `configure` (with interactive mode) +- Configuration loading via `ConfigManager.Load()` +- Environment variable resolution +- Service instantiation and coordination +- Top-level try-catch for error reporting + +**Code Flow**: +1. Load config file +2. Define CLI options and commands +3. Set handler for root command +4. Handler: resolve API key/model β†’ instantiate services β†’ call `OpenQueryApp.RunAsync()` +5. Set handler for configure command (writes config file) +6. Invoke command parser: `await rootCommand.InvokeAsync(args)` + +**Exit Codes**: +- 0 = success +- 1 = error + +### 2. 
OpenQueryApp (OpenQuery.cs) + +**Type**: Main Application Class +**Responsibilities**: Workflow orchestration, query generation, answer streaming + +**Constructor Parameters**: +- `OpenRouterClient client` - for query gen and final answer +- `SearchTool searchTool` - for search-retrieve-rank pipeline +- `string model` - LLM model identifier + +**Main Method**: `RunAsync(OpenQueryOptions options)` + +**Workflow Steps**: +1. Create `StatusReporter` (for progress UI) +2. **Optional Query Generation** (if `options.Queries > 1`): + - Create system message instructing JSON array output + - Create user message with `options.Question` + - Call `client.CompleteAsync()` with query gen model + - Parse JSON response; fall back to original question on failure + - Result: `List<string> queries` (1 or many) +3. **Execute Search Pipeline**: + - Call `_searchTool.ExecuteAsync()` with queries, options + - Receive `string context` (formatted context with source citations) + - Progress reported via callback to `StatusReporter` +4. **Generate Final Answer**: + - Build system prompt (append "short" or "long" modifier) + - Create user message with `Context:\n{context}\n\nQuestion: {options.Question}` + - Stream answer via `client.StreamAsync()` + - Write each `chunk.TextDelta` to Console as it arrives + - Stop spinner on first chunk, continue streaming +5. Dispose reporter + +**Error Handling**: +- Exceptions propagate to `Program.cs` top-level handler +- `HttpRequestException` vs generic `Exception` + +**Note**: Query generation uses the same model as final answer; could be separated for cost/performance. + +### 3. 
SearchTool (Tools/SearchTool.cs) + +**Type**: Pipeline Orchestrator +**Responsibilities**: Execute 4-phase search-retrieve-rank-return workflow + +**Constructor Parameters**: +- `SearxngClient searxngClient` +- `EmbeddingService embeddingService` + +**Main Method**: `ExecuteAsync(originalQuery, generatedQueries, maxResults, topChunksLimit, onProgress, verbose)` + +**Returns**: `Task<string>` - formatted context string with source citations + +**Pipeline Phases**: + +#### Phase 1: ExecuteParallelSearchesAsync +- Parallelize `searxngClient.SearchAsync(query, maxResults)` for each query +- Collect all results in `ConcurrentBag<SearxngResult>` +- Deduplicate by `DistinctBy(r => r.Url)` + +**Output**: `List<SearxngResult>` (aggregated, unique) + +#### Phase 2: ExecuteParallelArticleFetchingAsync +- Semaphore: `MaxConcurrentArticleFetches` (default 10) +- For each `SearxngResult`: fetch URL via `ArticleService.FetchArticleAsync()` +- Extract article text, title +- Chunk via `ChunkingService.ChunkText(article.TextContent)` +- Add each chunk as new `Chunk(content, url, title)` + +**Output**: `List<Chunk>` (potentially 50-100 chunks) + +#### Phase 3: ExecuteParallelEmbeddingsAsync +- Start two parallel tasks: + 1. Query embedding: `embeddingService.GetEmbeddingAsync(originalQuery)` + 2. Chunk embeddings: `embeddingService.GetEmbeddingsWithRateLimitAsync(chunkTexts, onProgress)` +- `Parallel.ForEachAsync` with `MaxConcurrentEmbeddingRequests` (default 4) +- Batch size: 300 chunks per embedding API call +- Filter chunks with empty embeddings (failed batches) + +**Output**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` + +#### Phase 4: RankAndSelectTopChunks +- Calculate cosine similarity for each chunk vs query +- Assign `chunk.Score` +- Order by descending score +- Take `topChunksLimit` (from `--chunks` option) +- Return `List<Chunk>` (top N) + +**Formatting**: +```csharp +string context = string.Join("\n\n", topChunks.Select((c, i) => + $"[Source {i+1}: {c.Title ?? 
"Unknown"}]({c.SourceUrl})\n{c.Content}")); +``` + +**Progress Callbacks**: Invoked at each major step for UI feedback + +## Services + +### OpenRouterClient + +**Purpose**: HTTP client for OpenRouter API (chat completions + embeddings) + +**Base URL**: `https://openrouter.ai/api/v1` + +**Authentication**: `Authorization: Bearer {apiKey}` + +**Methods**: + +#### `StreamAsync(ChatCompletionRequest request, CancellationToken)` +- Sets `request.Stream = true` +- POST to `/chat/completions` +- Reads SSE stream line-by-line +- Parses `data: {json}` chunks +- Yields `StreamChunk` (text delta or tool call) +- Supports cancellation + +#### `CompleteAsync(ChatCompletionRequest request)` +- Sets `request.Stream = false` +- POST to `/chat/completions` +- Deserializes full response +- Returns `ChatCompletionResponse` + +#### `EmbedAsync(string model, List<string> inputs)` +- POST to `/embeddings` +- Returns `float[][]` (ordered by input index) + +**Error Handling**: `EnsureSuccessStatusCode()` throws `HttpRequestException` on failure + +**Design**: Thin wrapper; no retry logic (delegated to EmbeddingService) + +### SearxngClient + +**Purpose**: HTTP client for SearxNG metasearch + +**Base URL**: Configurable (default `http://localhost:8002`) + +**Methods**: + +#### `SearchAsync(string query, int limit = 10)` +- GET `{baseUrl}/search?q={query}&format=json` +- Deserializes to `SearxngRoot` +- Returns `Results.Take(limit).ToList()` +- On failure: returns empty `List<SearxngResult>` (no exception) + +**Design**: Very simple; failures are tolerated (OpenQuery continues with other queries) + +### EmbeddingService + +**Purpose**: Batch embedding generation with rate limiting, parallelization, and retries + +**Configuration** (from `ParallelProcessingOptions`): +- `MaxConcurrentEmbeddingRequests` = 4 +- `EmbeddingBatchSize` = 300 + +**Default Embedding Model**: `openai/text-embedding-3-small` + +**Methods**: + +#### `GetEmbeddingsAsync(List<string> texts, Action<string>? 
onProgress, CancellationToken)` +- Splits `texts` into batches of `EmbeddingBatchSize` +- Parallelizes batches with `Parallel.ForEachAsync` + `MaxConcurrentEmbeddingRequests` +- Each batch: rate-limited + retry-wrapped `client.EmbedAsync(model, batch)` +- Collects results in order (by batch index) +- Returns `float[][]` (same order as input texts) +- Failed batches return empty `float[]` for each text + +#### `GetEmbeddingAsync(string text, CancellationToken)` +- Wraps single-text call in rate limiter + retry +- Returns `float[]` + +#### `CosineSimilarity(float[] v1, float[] v2)` +- Static method using `TensorPrimitives.CosineSimilarity` +- Returns float between -1 and 1 (typically 0-1 for normalized embeddings) + +**Retry Policy** (Polly): +- Max 3 attempts +- 1s base delay, exponential backoff +- Only `HttpRequestException` + +**Rate Limiting**: `RateLimiter` semaphore with `MaxConcurrentEmbeddingRequests` + +**Design Notes**: +- Two similar methods (`GetEmbeddingsAsync` and `GetEmbeddingsWithRateLimitAsync`) - could be consolidated +- Uses Polly for resilience (good pattern) +- Concurrency control prevents overwhelming OpenRouter + +### ChunkingService + +**Purpose**: Split long text into manageable pieces + +**Static Class** (no dependencies, pure function) + +**Algorithm** (in `ChunkText(string text)`): +- Constant `MAX_CHUNK_SIZE = 500` +- While remaining text: + - Take up to 500 chars + - If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']` + - Trim and add non-empty chunk + - Advance start position + +**Rationale**: 500 chars is a sweet spot for embeddings - long enough for context, short enough for semantic coherence. + +**Edge Cases**: Handles text shorter than 500 chars, empty text, text with no natural breaks. 
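+
+The algorithm above can be sketched as a runnable function. The constant and break characters come from the docs; the method shape and backtracking details are an assumption, not the verbatim implementation:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Sketch of ChunkText: cap chunks at 500 chars, backtracking to the last
+// natural break so chunks end on a word or sentence boundary
+static List<string> ChunkText(string text)
+{
+    const int MAX_CHUNK_SIZE = 500;
+    char[] breakChars = { ' ', '\n', '\r', '.', '!' };
+    var chunks = new List<string>();
+    int start = 0;
+    while (start < text.Length)
+    {
+        int length = Math.Min(MAX_CHUNK_SIZE, text.Length - start);
+        if (start + length < text.Length)
+        {
+            // Backtrack within the window to the last natural break
+            int lastBreak = text.LastIndexOfAny(breakChars, start + length - 1, length);
+            if (lastBreak > start) length = lastBreak - start + 1;
+        }
+        var chunk = text.Substring(start, length).Trim();
+        if (chunk.Length > 0) chunks.Add(chunk);
+        start += length;
+    }
+    return chunks;
+}
+
+var sample = string.Join(" ", Enumerable.Repeat("The quick brown fox jumps.", 60));
+var result = ChunkText(sample);
+Console.WriteLine($"{result.Count} chunks, longest {result.Max(c => c.Length)} chars");
+```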
+ +### ArticleService + +**Purpose**: Extract clean article content from URLs + +**Method**: `FetchArticleAsync(string url)` + +**Implementation**: Delegates to `SmartReader.ParseArticleAsync(url)` + +**Returns**: `Article` object (from SmartReader) +- `Title` (string) +- `TextContent` (string) - cleaned article body +- `IsReadable` (bool) - quality indicator +- Other metadata (author, date, etc.) + +**Error Handling**: Exceptions propagate (handled by `SearchTool`) + +**Design**: Thin wrapper around third-party library. Could be extended to add caching, custom extraction rules, etc. + +### RateLimiter + +**Purpose**: Limit concurrent operations via semaphore + +**Interface**: +```csharp +public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken); +public async Task ExecuteAsync(Func<Task> action, CancellationToken); +``` + +**Implementation**: `SemaphoreSlim` with `WaitAsync` and `Release` + +**Disposal**: `IAsyncDisposable` (awaits semaphore disposal) + +**Usage**: Wrap API calls that need concurrency control +```csharp +var result = await _rateLimiter.ExecuteAsync(async () => + await _client.EmbedAsync(model, batch), cancellationToken); +``` + +**Design**: Simple, reusable. Could be replaced with `Polly.RateLimiting` policy but this is lightweight. 
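+
+A self-contained sketch of the semaphore pattern above — the `ExecuteAsync` interface comes from the docs, the internals are an assumption. The demo verifies the cap holds under load:
+
+```csharp
+using System.Linq;
+
+int current = 0, peak = 0;
+await using var limiter = new RateLimiter(2);
+
+// Launch 8 tasks through a limiter capped at 2; track peak concurrency
+var tasks = Enumerable.Range(0, 8).Select(_ => limiter.ExecuteAsync(async () =>
+{
+    int now = Interlocked.Increment(ref current);
+    InterlockedMax(ref peak, now);
+    await Task.Delay(50);
+    Interlocked.Decrement(ref current);
+    return true;
+})).ToArray();
+await Task.WhenAll(tasks);
+Console.WriteLine($"Peak concurrency: {peak}"); // never exceeds 2
+
+// Lock-free max update for the peak counter
+static void InterlockedMax(ref int target, int value)
+{
+    int old;
+    do { old = target; if (old >= value) return; }
+    while (Interlocked.CompareExchange(ref target, value, old) != old);
+}
+
+public sealed class RateLimiter : IAsyncDisposable
+{
+    private readonly SemaphoreSlim _semaphore;
+    public RateLimiter(int maxConcurrency) => _semaphore = new SemaphoreSlim(maxConcurrency);
+
+    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct = default)
+    {
+        await _semaphore.WaitAsync(ct);
+        try { return await action(); }
+        finally { _semaphore.Release(); }
+    }
+
+    public ValueTask DisposeAsync()
+    {
+        _semaphore.Dispose();
+        return ValueTask.CompletedTask;
+    }
+}
+```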
+ +### StatusReporter + +**Purpose**: Real-time progress UI with spinner and verbose modes + +**Architecture**: +- Producer: UpdateStatus(text) β†’ writes to `Channel<string>` +- Consumer: Background task `ProcessStatusUpdatesAsync()` reads from channel +- Spinner: Separate task animates Braille characters every 100ms + +**Modes**: + +**Verbose Mode** (`_verbose = true`): +- All progress messages written as `Console.WriteLine()` +- No spinner +- Full audit trail + +**Compact Mode** (default): +- Status line with spinner (overwrites same line) +- Only latest status visible +- Example: `β ‹ Fetching articles 3/10...` + +**Key Methods**: +- `UpdateStatus(message)` - fire-and-forget, non-blocking +- `WriteLine(text)` - stops spinner temporarily, writes full line +- `StartSpinner()` / `StopSpinner()` - manual control +- `ClearStatus()` - ANSI escape `\r\x1b[K` to clear line +- `Dispose()` - completes channel, waits for background tasks + +**Spinner Chars**: `['β ‹', 'β ™', 'β Ή', 'β Έ', 'β Ό', 'β ΄', 'β ¦', 'β §', 'β ‡', '⠏']` (Braille patterns, smooth animation) + +**ANSI Codes**: `\r` (carriage return), `\x1b[K` (erase to end of line) + +**Thread Safety**: Channel is thread-safe; multiple components can write concurrently without locks + +**Design**: Well-encapsulated; could be reused in other CLI projects. 
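+
+The producer/consumer core of this pattern can be sketched in a few lines: status updates flow through an unbounded channel to a single background consumer (which, in `StatusReporter`, repaints the status line). The message strings are illustrative:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Threading.Channels;
+using System.Threading.Tasks;
+
+var channel = Channel.CreateUnbounded<string>();
+var seen = new List<string>();
+
+// Consumer: a single background task drains the channel in order
+var consumer = Task.Run(async () =>
+{
+    await foreach (var status in channel.Reader.ReadAllAsync())
+        seen.Add(status);
+});
+
+// Producers: fire-and-forget, non-blocking writes from any thread
+channel.Writer.TryWrite("Searching...");
+channel.Writer.TryWrite("Fetching articles 3/10...");
+channel.Writer.TryWrite("Asking AI...");
+
+channel.Writer.Complete();   // Dispose() in StatusReporter does the equivalent
+await consumer;              // drain remaining messages before exit
+Console.WriteLine(string.Join(" -> ", seen));
+```
+
+Because the channel is FIFO and there is one consumer, messages arrive in write order, which is what lets the compact mode always show the latest status.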
+ +### ConfigManager + +**Purpose**: Load/save configuration from XDG-compliant location + +**Config Path**: +- `Environment.SpecialFolder.UserProfile` β†’ `~/.config/openquery/config` + +**Schema** (`AppConfig`): +```csharp +public class AppConfig +{ + public string ApiKey { get; set; } = ""; + public string Model { get; set; } = "qwen/qwen3.5-flash-02-23"; + public int DefaultQueries { get; set; } = 3; + public int DefaultChunks { get; set; } = 3; + public int DefaultResults { get; set; } = 5; +} +``` + +**Format**: Simple `key=value` (no INI parser, manual line split) + +**Methods**: +- `Load()` β†’ reads file if exists, returns `AppConfig` (with defaults) +- `Save(AppConfig)` β†’ writes all 5 keys, overwrites existing + +**Design**: +- Static class (no instances) +- Creates directory if missing +- No validation (writes whatever values given) +- Could be improved with JSON format (but keep simple) + +## Data Models + +### OpenQueryOptions + +**Location**: `Models/OpenQueryOptions.cs` + +**Type**: `record` + +**Purpose**: Immutable options object passed through workflow + +**Properties**: +- `int Chunks` - top N chunks for context +- `int Results` - search results per query +- `int Queries` - number of expanded queries to generate +- `bool Short` - concise answer flag +- `bool Long` - detailed answer flag +- `bool Verbose` - verbose logging flag +- `string Question` - original user question + +**Created**: In `Program.cs` from CLI options + config defaults + +**Used By**: `OpenQueryApp.RunAsync()` + +### Chunk + +**Location**: `Models/Chunk.cs` + +**Type**: `record` + +**Purpose**: Content chunk with metadata and embedding + +**Properties**: +- `string Content` - extracted text (~500 chars) +- `string SourceUrl` - article URL +- `string? Title` - article title (nullable) +- `float[]? Embedding` - vector embedding (populated by EmbeddingService) +- `float Score` - relevance score (populated during ranking) + +**Lifecycle**: +1. 
Instantiated in `SearchTool.ExecuteParallelArticleFetchingAsync` with content, url, title +2. `Embedding` set in `ExecuteParallelEmbeddingsAsync` after batch processing +3. `Score` set in `RankAndSelectTopChunks` after cosine similarity +4. Serialized into context string for final answer + +**Equality**: Records provide value equality (based on all properties) + +### ParallelProcessingOptions + +**Location**: `Models/ParallelOptions.cs` + +**Type**: `class` (mutable) + +**Purpose**: Concurrency settings for parallel operations + +**Properties** (with defaults): +- `MaxConcurrentArticleFetches` = 10 +- `MaxConcurrentEmbeddingRequests` = 4 +- `EmbeddingBatchSize` = 300 + +**Used By**: `EmbeddingService` (for embeddings), `SearchTool` (for article fetching) + +**Currently**: Hardcoded in `SearchTool` constructor; could be made configurable + +### OpenRouter Models (Models/OpenRouter.cs) + +**Purpose**: DTOs for OpenRouter API (JSON serializable) + +**Chat Completion**: +- `ChatCompletionRequest` (model, messages, tools, stream) +- `ChatCompletionResponse` (choices[], usage[]) +- `Message` (role, content, tool_calls, tool_call_id) +- `ToolDefinition`, `ToolFunction`, `ToolCall`, `FunctionCall` +- `Choice`, `Usage` + +**Embedding**: +- `EmbeddingRequest` (model, input[]) +- `EmbeddingResponse` (data[], usage) +- `EmbeddingData` (embedding[], index) + +**Streaming**: +- `StreamChunk` (TextDelta, Tool) +- `ChatCompletionChunk`, `ChunkChoice`, `ChunkDelta` + +**JSON Properties**: Uses `[JsonPropertyName]` to match API + +**Serialization**: System.Text.Json with source generation (AppJsonContext) + +### Searxng Models (Models/Searxng.cs) + +**Purpose**: DTOs for SearxNG search results + +**Records**: +- `SearxngRoot` with `List<SearxngResult> Results` +- `SearxngResult` with `Title`, `Url`, `Content` (snippet) + +**Usage**: Deserialized from SearxNG's JSON response + +### JsonContexts + +**Location**: `Models/JsonContexts.cs` + +**Purpose**: Source-generated JSON serializer 
context for AOT compatibility + +**Pattern**: +```csharp +[JsonSerializable(typeof(ChatCompletionRequest))] +[JsonSerializable(typeof(ChatCompletionResponse))] +... etc ... +internal partial class AppJsonContext : JsonSerializerContext +{ +} +``` + +**Generated**: Partial class compiled by source generator + +**Used By**: All `JsonSerializer.Serialize/Deserialize` calls with `AppJsonContext.Default.{Type}` + +**Benefits**: +- AOT-compatible (no reflection) +- Faster serialization (compiled delegates) +- Smaller binary (trimming-safe) + +## Component Interactions + +### Dependencies Graph + +``` +Program.cs +β”œβ”€β”€ ConfigManager (load/save) +β”œβ”€β”€ OpenRouterClient ──┐ +β”œβ”€β”€ SearxngClient ────── +β”œβ”€β”€ EmbeddingService ─── +└── SearchTool ───────── + β”‚ +OpenQueryApp β—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β”œβ”€β”€ OpenRouterClient (query gen + answer streaming) + β”œβ”€β”€ SearchTool (pipeline) + β”‚ β”œβ”€β”€ SearxngClient (searches) + β”‚ β”œβ”€β”€ ArticleService (fetch) + β”‚ β”œβ”€β”€ ChunkingService (split) + β”‚ β”œβ”€β”€ EmbeddingService (embeddings) + β”‚ β”œβ”€β”€ RateLimiter (concurrency) + β”‚ └── StatusReporter (progress via callback) + └── StatusReporter (UI) +``` + +### Data Flow Between Components + +``` +OpenQueryOptions + ↓ +OpenQueryApp + β”œβ”€ Query Generation + β”‚ └─ OpenRouterClient.CompleteAsync() + β”‚ β†’ List<string> generatedQueries + β”‚ + β”œβ”€ Search Pipeline + β”‚ └─ SearchTool.ExecuteAsync(originalQuery, generatedQueries, ...) 
+ β”‚ ↓ + β”‚ Phase 1: SearxngClient.SearchAsync(query) Γ— N + β”‚ β†’ ConcurrentBag<SearxngResult> + β”‚ β†’ List<SearxngResult> (unique) + β”‚ ↓ + β”‚ Phase 2: ArticleService.FetchArticleAsync(url) Γ— M + β”‚ β†’ ChunkingService.ChunkText(article.TextContent) + β”‚ β†’ ConcurrentBag<Chunk> (content, url, title) + β”‚ ↓ + β”‚ Phase 3: EmbeddingService.GetEmbeddingsAsync(chunkContents) + β”‚ β†’ (queryEmbedding, chunkEmbeddings) + β”‚ ↓ + β”‚ Phase 4: CosineSimilarity + Rank + β”‚ β†’ List<Chunk> topChunks (with Score, Embedding set) + β”‚ ↓ + β”‚ Format: context string with [Source N: Title](Url) + β”‚ β†’ return context string + β”‚ + └─ Final Answer + └─ OpenRouterClient.StreamAsync(prompt with context) + β†’ stream deltas to Console +``` + +### Interface Contracts + +**SearchTool β†’ Progress**: +```csharp +// Invoked as: onProgress?.Invoke("[Fetching article 1/10: example.com]") +Action<string>? onProgress +``` + +**StatusReporter ← Progress**: +```csharp +// Handler in OpenQueryApp: +(progress) => { + if (options.Verbose) reporter.WriteLine(progress); + else reporter.UpdateStatus(parsedShorterMessage); +} +``` + +**SearchTool β†’ ArticleService**: +```csharp +Article article = await ArticleService.FetchArticleAsync(url); +``` + +**SearchTool β†’ EmbeddingService**: +```csharp +(float[] queryEmbedding, float[][] chunkEmbeddings) = await ExecuteParallelEmbeddingsAsync(...); +// Also: embeddingService.GetEmbeddingAsync(text), GetEmbeddingsWithRateLimitAsync(...) 
+```
+
+**SearchTool β†’ ChunkingService**:
+```csharp
+List<string> chunks = ChunkingService.ChunkText(article.TextContent);
+```
+
+**SearchTool β†’ RateLimiter**:
+```csharp
+await _rateLimiter.ExecuteAsync(async () => await _client.EmbedAsync(...), ct);
+```
+
+---
+
+## Next Steps
+
+- [OpenQueryApp](openquery-app.md) - Main orchestrator details
+- [SearchTool](search-tool.md) - Pipeline implementation
+- [Services](services.md) - All service classes documented
+- [Models](models.md) - Complete data model reference
diff --git a/docs/components/search-tool.md b/docs/components/search-tool.md
new file mode 100644
index 0000000..3cd0157
--- /dev/null
+++ b/docs/components/search-tool.md
@@ -0,0 +1,555 @@
+# SearchTool Component
+
+Deep dive into `SearchTool` - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.
+
+## Overview
+
+`SearchTool` is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks by relevance, and returns formatted context for the final AI answer.
+
+## Location
+`Tools/SearchTool.cs`
+
+## Class Definition
+
+```csharp
+public class SearchTool
+{
+    private readonly SearxngClient _searxngClient;
+    private readonly EmbeddingService _embeddingService;
+    private readonly ParallelProcessingOptions _options;
+
+    public static string Name => "search";
+    public static string Description => "Search the web for information on a topic";
+
+    public SearchTool(
+        SearxngClient searxngClient,
+        EmbeddingService embeddingService);
+
+    public Task<string> ExecuteAsync(
+        string originalQuery,
+        List<string> generatedQueries,
+        int maxResults,
+        int topChunksLimit,
+        Action<string>? 
onProgress = null, + bool verbose = true); +} +``` + +**Dependencies**: +- `SearxngClient` - for web searches +- `EmbeddingService` - for vector generation +- `ParallelProcessingOptions` - concurrency settings (hardcoded new instance) + +**Static Properties**: +- `Name` - tool identifier (currently "search") +- `Description` - tool description + +## ExecuteAsync Method + +**Signature**: +```csharp +public async Task<string> ExecuteAsync( + string originalQuery, // User's original question + List<string> generatedQueries, // Expanded search queries + int maxResults, // Results per query + int topChunksLimit, // Top N chunks to return + Action<string>? onProgress, // Progress callback + bool verbose) // Verbose mode flag +``` + +**Returns**: `Task<string>` - formatted context with source citations + +**Contract**: +- Never returns `null` (returns "No search results found." on zero results) +- Progress callback may be invoked frequently (many phases) +- `verbose` passed to sub-components for their own logging + +## The 4-Phase Pipeline + +``` +ExecuteAsync() + β”‚ + β”œβ”€ Phase 1: ExecuteParallelSearchesAsync + β”‚ Input: generatedQueries Γ— maxResults + β”‚ Output: List<SearxngResult> (deduplicated) + β”‚ + β”œβ”€ Phase 2: ExecuteParallelArticleFetchingAsync + β”‚ Input: List<SearxngResult> + β”‚ Output: List<Chunk> (with content, url, title) + β”‚ + β”œβ”€ Phase 3: ExecuteParallelEmbeddingsAsync + β”‚ Input: originalQuery + List<Chunk> + β”‚ Output: (queryEmbedding, chunkEmbeddings) + β”‚ (also sets Chunk.Embedding for valid chunks) + β”‚ + β”œβ”€ Phase 4: RankAndSelectTopChunks + β”‚ Input: List<Chunk> + queryEmbedding + chunkEmbeddings + β”‚ Output: List<Chunk> topChunks (with Score set) + β”‚ + └─ Format Context β†’ return string +``` + +### Phase 1: ExecuteParallelSearchesAsync + +**Purpose**: Execute all search queries in parallel, collect and deduplicate results. 
+
+**Implementation**:
+```csharp
+var allResults = new ConcurrentBag<SearxngResult>();
+
+var searchTasks = generatedQueries.Select(async query =>
+{
+    onProgress?.Invoke($"[Searching web for '{query}'...]");
+    try
+    {
+        var results = await _searxngClient.SearchAsync(query, maxResults);
+        foreach (var result in results)
+        {
+            allResults.Add(result);
+        }
+    }
+    catch (Exception ex)
+    {
+        if (verbose)
+            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
+    }
+});
+
+await Task.WhenAll(searchTasks);
+
+var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
+return uniqueResults;
+```
+
+**Details**:
+- `ConcurrentBag<SearxngResult>` collects results thread-safely
+- `Task.WhenAll` - unbounded parallelism (degree of parallelism equals `generatedQueries.Count`)
+- Each task calls `_searxngClient.SearchAsync(query, maxResults)`
+- Errors caught and logged (verbose only); other queries continue
+- `DistinctBy(r => r.Url)` removes duplicates
+
+**Return**: `List<SearxngResult>` (unique URLs only)
+
+**Progress**: `[Searching web for '{query}'...]`
+
+**Potential Issues**:
+- Could overwhelm local SearxNG if `generatedQueries` is large (100+)
+- SearxNG itself may have its own rate limiting
+
+**Future Enhancement**:
+- Add semaphore to limit search concurrency
+- Add timeout per search task
+- Cache search results (same query across runs)
+
+### Phase 2: ExecuteParallelArticleFetchingAsync
+
+**Purpose**: Fetch each search result URL, extract article content, split into chunks.
+ +**Implementation**: +```csharp +var chunks = new ConcurrentBag<Chunk>(); +var completedFetches = 0; +var totalFetches = searchResults.Count; +var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10 + +var fetchTasks = searchResults.Select(async result => +{ + await semaphore.WaitAsync(); + try + { + var current = Interlocked.Increment(ref completedFetches); + var uri = new Uri(result.Url); + var domain = uri.Host; + onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]"); + + try + { + var article = await ArticleService.FetchArticleAsync(result.Url); + if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent)) + return; + + var textChunks = ChunkingService.ChunkText(article.TextContent); + foreach (var chunkText in textChunks) + { + chunks.Add(new Chunk(chunkText, result.Url, article.Title)); + } + } + catch (Exception ex) + { + if (verbose) + Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}"); + } + } + finally + { + semaphore.Release(); + } +}); + +await Task.WhenAll(fetchTasks); +return chunks.ToList(); +``` + +**Details**: +- `SemaphoreSlim` limits concurrency to `MaxConcurrentArticleFetches` (10) +- `Interlocked.Increment` for thread-safe progress counting +- Progress: `[Fetching article X/Y: domain]` (extracts host from URL) +- `ArticleService.FetchArticleAsync` uses SmartReader +- Article must be `IsReadable` and have `TextContent` +- `ChunkingService.ChunkText` splits into ~500-char pieces +- Each chunk becomes a `Chunk(content, url, article.Title)` +- Errors logged (verbose only); failed URLs yield no chunks + +**Return**: `List<Chunk>` (potentially many per article) + +**Chunk Count Estimate**: +- 15 articles Γ— average 3000 chars/article = 45,000 chars +- With 500-char chunks β‰ˆ 90 chunks +- With natural breaks β†’ maybe 70-80 chunks + +**Potential Issues**: +- Some sites block SmartReader (JS-heavy, paywalls) +- Slow article fetches may cause long tail latency +- 
Large articles create many chunks β†’ memory + embedding cost
+
+**Future Enhancements**:
+- Add per-URL timeout
+- Filter chunks by length threshold (skip tiny chunks)
+- Deduplicate chunks across articles (same content on different sites)
+- Cache article fetches by URL
+
+### Phase 3: ExecuteParallelEmbeddingsAsync
+
+**Purpose**: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.
+
+**Implementation**:
+```csharp
+onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");
+
+// Start query embedding (single) and chunk embeddings (batch) concurrently
+var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);
+
+var chunkTexts = chunks.Select(c => c.Content).ToList();
+var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
+    chunkTexts, onProgress);
+
+await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);
+
+var queryEmbedding = await queryEmbeddingTask;
+var chunkEmbeddings = await chunkEmbeddingsTask;
+
+// Filter out chunks with empty embeddings
+var validChunks = new List<Chunk>();
+var validEmbeddings = new List<float[]>();
+
+for (var i = 0; i < chunks.Count; i++)
+{
+    if (chunkEmbeddings[i].Length > 0)
+    {
+        validChunks.Add(chunks[i]);
+        validEmbeddings.Add(chunkEmbeddings[i]);
+    }
+}
+
+// Update chunks with embeddings
+for (var i = 0; i < validChunks.Count; i++)
+{
+    validChunks[i].Embedding = validEmbeddings[i];
+}
+
+return (queryEmbedding, validEmbeddings.ToArray());
+```
+
+**Details**:
+- **Query embedding**: Single request for the original question (one embedding)
+- **Chunk embeddings**: Batch processing of all chunk texts
+- Both run concurrently via `Task.WhenAll`
+- `_embeddingService.GetEmbeddingsWithRateLimitAsync` uses:
+  - Batch size: 300 (default)
+  - Max concurrent batches: 4 (default)
+  - Polly retry (3 attempts, exponential backoff)
+  - `RateLimiter` (semaphore) for API concurrency
+- Failed batches return empty `float[]` (length 0)
+- Filters out failed chunks (won't be ranked)
+- `validChunks[i].Embedding = validEmbeddings[i]` attaches embedding to chunk
+
+**Return**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` where:
+- `chunkEmbeddings` length = `validChunks.Count` (filtered)
+- Order matches `validChunks` order (since we filtered parallel arrays)
+
+**Progress**: Interleaved from embedding service's own progress callbacks (batch X/Y)
+
+**Potential Issues**:
+- `GetEmbeddingsWithRateLimitAsync` writes `results[batchIndex] = ...` from multiple parallel tasks; this is safe without locks because each task writes a distinct array index, so the writes never overlap
+- The filtering loop assumes `chunkEmbeddings` has the same count as `chunks`; `GetEmbeddingsWithRateLimitAsync` returns `results.SelectMany(r => r).ToArray()`, which preserves the input count (failed batches contribute empty arrays), so the assumption holds
+
+**Memory Consideration**:
+- `chunkTexts` list holds all chunk strings (may be large, but still in memory)
+- `chunkEmbeddings` holds all float arrays (~600KB for 100 chunks)
+- Total: modest (~few MB)
+
+**Future Enhancements**:
+- Stream embeddings? (No benefit; all are needed for ranking)
+- Cache embeddings by content hash (cross-run)
+- Support different embedding model per query
+
+### Phase 4: RankAndSelectTopChunks
+
+**Purpose**: Score chunks by semantic relevance to query, sort, and select top N.
+ +**Implementation**: +```csharp +var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList(); + +foreach (var chunk in chunksWithEmbeddings) +{ + chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!); +} + +var topChunks = chunksWithEmbeddings + .OrderByDescending(c => c.Score) + .Take(topChunksLimit) + .ToList(); + +return topChunks; +``` + +**Details**: +- Filters to chunks that have embeddings (successful phase 3) +- For each: `Score = CosineSimilarity(queryEmbedding, chunkEmbedding)` + - Uses `TensorPrimitives.CosineSimilarity` (SIMD-accelerated) + - Returns float typically 0-1 (higher = more relevant) +- `OrderByDescending` - highest scores first +- `Take(topChunksLimit)` - select top N (from `--chunks` option) +- Returns `List<Chunk>` (now with `Score` set) + +**Return**: Top N chunks ready for context formatting + +**Complexity**: +- O(n) for scoring (where n = valid chunks, typically 50-100) +- O(n log n) for sorting (fast for n=100) +- Negligible CPU time + +**Edge Cases**: +- If `topChunksLimit` > `chunksWithEmbeddings.Count`, returns all (no padding) +- If all embeddings failed, returns empty list +- Should handle `topChunksLimit == 0` (returns empty) + +### Context Formatting (After Phase 4) + +**Location**: In `ExecuteAsync`, after ranking: + +```csharp +var context = string.Join("\n\n", topChunks.Select((c, i) => + $"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}")); + +return context; +``` + +**Format**: +``` +[Source 1: Article Title](https://example.com/article) +Chunk content text... + +[Source 2: Another Title](https://example.com/another) +Chunk content text... + +[Source 3: Third Title](https://example.com/third) +Chunk content text... +``` + +**Features**: +- Each source numbered 1, 2, 3... 
(matches order of topChunks = descending relevance) +- Title or "Unknown" if null +- Title is markdown link to original URL +- Chunk content as plain text (may contain its own formatting) +- Double newline between sources + +**Rationale**: +- Markdown links allow copy-pasting to browsers +- Numbers allow LLM to cite `[Source 1]` in answer +- Original title helps user recognize source + +**Potential Issues**: +- LLM might misinterpret "Source 1" as literal citation required +- If chunks contain markdown, may conflict (no escaping) +- Some titles may have markdown special chars (unlikely but possible) + +**Alternative**: Could use XML-style tags or more robust citation format. + +## Error Handling & Edge Cases + +### Empty Results Handling + +At end of `ExecuteAsync`: +```csharp +if (searchResults.Count == 0) + return "No search results found."; + +if (chunks.Count == 0) + return "Found search results but could not extract readable content."; +``` + +These messages appear in final answer (LLM will respond to these contexts). + +### Partial Failures + +- Some search queries fail β†’ proceed with others +- Some articles fail to fetch β†’ continue +- Some embedding batches fail β†’ those chunks filtered out +- Ranking proceeds with whatever valid embeddings exist + +### Verbose vs Compact Progress + +`verbose` parameter affects what's passed to phases: +- **Article fetching**: errors only shown if `verbose` +- **Embeddings**: always shows batch progress via `onProgress` (from EmbeddingService) +- **Searches**: no error suppression (warning always logged to Console, not through callback) + +### Progress Callback Pattern + +`onProgress` is invoked at major milestones: +- Searching: `[Searching web for '{query}'...]` +- Article fetch: `[Fetching article X/Y: domain]` +- Embeddings: `[Generating embeddings: batch X/Y]` +- Final: `[Found top X most relevant chunks overall. Generating answer...]` + +Each phase may invoke many times (e.g., embedding batches). 
`StatusReporter` handles these appropriately.
+
+## Performance Characteristics
+
+### Time Estimate per Phase (for typical 3 queries, 5 results each, ~15 articles):
+
+| Phase | Time | Dominated By |
+|-------|------|--------------|
+| Searches | 3-8s | Network latency to SearxNG |
+| Article Fetching | 5-15s | Network + SmartReader CPU |
+| Embeddings | 2-4s | OpenRouter API latency (4 concurrent batches) |
+| Ranking | <0.1s | CPU (O(n log n) sort, n~100) |
+| **Total Pipeline** | **10-30s** | Articles + Searches |
+
+### Concurrency Limits Effect
+
+**Article Fetching** (`MaxConcurrentArticleFetches` = 10):
+- 15 articles β†’ 2 waves (10 then 5)
+- If each takes 2s β†’ ~4s total (vs 30s sequential)
+
+**Embedding Batching** (`MaxConcurrentEmbeddingRequests` = 4, `EmbeddingBatchSize` = 300):
+- A typical run of ~80 chunks fits in a single batch of 300, so batch-level parallelism goes unused
+- The concurrency limit (4) only matters when the chunk count exceeds the batch size and multiple batches are in flight
+
+### Memory Usage
+
+- `searchResults` (15 items) β†’ ~30KB
+- `chunks` (80 items Γ— 500 chars) β†’ ~40KB text + embeddings ~400KB (80 Γ— 1536 Γ— 4 bytes)
+- Total β‰ˆ 500KB excluding temporary HTTP buffers
+
+## Design Decisions
+
+### Why Use ConcurrentBag for Results/Chunks?
+
+A thread-safe collection lets parallel tasks add items without locks. `ConcurrentBag` does not guarantee enumeration order, so the order in which results reach `DistinctBy` (which keeps the first occurrence) is nondeterministic. This is acceptable here because ordering does not matter: relevance is decided later by semantic ranking. If order mattered, a `ConcurrentQueue` or an explicit sort by source would be needed.
+
+### Why Not Use Parallel.ForEach for Article Fetching?
+
+We use `Task.WhenAll` with `Select` + semaphore. `Parallel.ForEachAsync` would also work, but the chosen pattern matches the other phases.
Semaphore gives explicit concurrency control. + +### Why Separate Query Embedding from Chunk Embeddings? + +`GetEmbeddingAsync` is called directly (not batched) because there's only one query. Could be batched with chunks but: +- Query is small (single string) +- Batch API has overhead (request structure) +- Separate call allows independent completion (no need to wait for chunks to start query embedding) + +### Why Two Different Embedding Methods? + +`EmbeddingService` has: +- `GetEmbeddingsWithRateLimitAsync` (used in SearchTool) +- `GetEmbeddingsAsync` (similar but different implementation) + +Probably legacy/refactor artifact. Could consolidate. + +### Why Not Deduplicate URLs Earlier? + +Deduplication happens after search aggregation. Could also deduplicate within each search result (SearxNG might already dedupe across engines). But global dedupe is necessary. + +### Why Not Early Filtering (e.g., by domain, length)? + +Possibly could improve quality: +- Filter by domain reputation +- Filter articles too short (<200 chars) or too long (>50KB) +- Not implemented (keep simple) + +## Testing Considerations + +**Unit Testability**: `SearchTool` is fairly testable with mocks: +- Mock `SearxngClient` to return predetermined results +- Mock `ArticleService` via `EmbeddingService` (or mock that too) +- Verify progress callback invocations +- Verify final context format + +**Integration Testing**: +- End-to-end with real/mocked external services +- Need test SearxNG instance and test OpenRouter key (or mock responses) + +**Performance Testing**: +- Benchmark with different concurrency settings +- Profile memory for large result sets (1000+ articles) +- Measure embedding API latency impact + +## Known Issues + +### Bug in ExecuteParallelEmbeddingsAsync? 
+
+The actual source of `ExecuteParallelEmbeddingsAsync` embeds the chunk *content*:
+
+```csharp
+var chunkTexts = chunks.Select(c => c.Content).ToList();
+var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
+    chunkTexts, onProgress);
+```
+
+This is correct: the embedding input is the chunk text, not a previously computed embedding. No bug here.
+
+### Potential Race Condition in GetEmbeddingsWithRateLimitAsync
+
+```csharp
+results[batchIndex] = batchResults;
+```
+
+This writes to an array index from multiple parallel tasks. The writes never overlap because each task owns a distinct index, so no synchronization is required.
+
+### Progress Callback May Overwhelm
+
+If invoked synchronously from many parallel tasks, callbacks could saturate the status channel. `Channel.TryWrite` returns `false` when the buffer is full, and the return value is ignored, so messages can be dropped under heavy load. This is acceptable for a CLI UI: some messages may be lost, but overall progress stays visible.
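+
+The drop semantics of `TryWrite` on a bounded channel can be demonstrated directly (a small stand-alone example, assuming the default `Wait` full mode):
+
+```csharp
+using System;
+using System.Threading.Channels;
+
+// With a bounded channel, TryWrite succeeds until the buffer is full, then returns false.
+var ch = Channel.CreateBounded<string>(new BoundedChannelOptions(2)
+{
+    FullMode = BoundedChannelFullMode.Wait // TryWrite returns false instead of blocking
+});
+
+Console.WriteLine(ch.Writer.TryWrite("a")); // True
+Console.WriteLine(ch.Writer.TryWrite("b")); // True
+Console.WriteLine(ch.Writer.TryWrite("c")); // False - buffer full, message dropped
+```
+
+Ignoring the `false` return is exactly the "drop under load" behavior described above.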
+ +## Related Components + +- **[OpenQueryApp](openquery-app.md)** - calls this +- **[SearxngClient](../../services/SearxngClient.md)** - phase 1 +- **[ArticleService](../../services/ArticleService.md)** - phase 2a +- **[ChunkingService](../../services/ChunkingService.md)** - phase 2b +- **[EmbeddingService](../../services/EmbeddingService.md)** - phase 3 +- **[Ranking](../../services/EmbeddingService.md#cosinesimilarity)** - cosine similarity + +--- + +## Next Steps + +- [Services Overview](../services/overview.md) - See supporting services +- [CLI Reference](../../api/cli.md) - How users trigger this pipeline +- [Performance](../performance.md) - Optimize pipeline settings diff --git a/docs/components/services.md b/docs/components/services.md new file mode 100644 index 0000000..c13d7d6 --- /dev/null +++ b/docs/components/services.md @@ -0,0 +1,471 @@ +# Services Overview + +Comprehensive reference for all service classes in OpenQuery. + +## πŸ“‹ Table of Contents + +1. [Service Catalog](#service-catalog) +2. [Client Services](#client-services) +3. [Processing Services](#processing-services) +4. [Infrastructure Services](#infrastructure-services) +5. [Service Interactions](#service-interactions) + +## Service Catalog + +OpenQuery's services are organized into three categories: + +| Category | Services | Purpose | +|-----------|----------|---------| +| **Clients** | `OpenRouterClient`, `SearxngClient` | External API communication | +| **Processors** | `EmbeddingService`, `ChunkingService`, `ArticleService` | Data transformation & extraction | +| **Infrastructure** | `RateLimiter`, `StatusReporter` | Cross-cutting concerns | + +All services are **stateless** (except for internal configuration) and can be safely reused across multiple operations. 
+ +--- + +## Client Services + +### OpenRouterClient + +**Location**: `Services/OpenRouterClient.cs` +**Purpose**: HTTP client for OpenRouter AI APIs (chat completions & embeddings) + +#### API Endpoints + +| Method | Endpoint | Purpose | +|--------|----------|---------| +| POST | `/chat/completions` | Chat completion (streaming or non-streaming) | +| POST | `/embeddings` | Embedding generation for text inputs | + +#### Authentication +``` +Authorization: Bearer {apiKey} +Accept: application/json +``` + +#### Public Methods + +##### `StreamAsync(ChatCompletionRequest request, CancellationToken cancellationToken)` +- **Returns**: `IAsyncEnumerable<StreamChunk>` +- **Behavior**: Sets `request.Stream = true`, posts, reads Server-Sent Events stream +- **Use Case**: Final answer streaming, real-time responses +- **Stream Format**: SSE lines `data: {json}`; yields `TextDelta` or `ToolCall` + +##### `CompleteAsync(ChatCompletionRequest request)` +- **Returns**: `Task<ChatCompletionResponse>` +- **Behavior**: Sets `request.Stream = false`, posts, returns full response +- **Use Case**: Query generation (non-streaming) + +##### `EmbedAsync(string model, List<string> inputs)` +- **Returns**: `Task<float[][]>` +- **Behavior**: POST `/embeddings`, returns array of vectors (ordered by input index) +- **Use Case**: Batch embedding generation + +##### `HttpClient` +- **Property**: Internal `_httpClient` (created per instance) +- **Note**: Could use `IHttpClientFactory` for pooling (not needed for CLI) + +#### Error Handling +- `EnsureSuccessStatusCode()` throws `HttpRequestException` on 4xx/5xx +- No retry logic (handled by `EmbeddingService`) + +#### Configuration +```csharp +public OpenRouterClient(string apiKey) +{ + _apiKey = apiKey; + _httpClient = new HttpClient(); + _httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey); + _httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json")); +} +``` 
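+
+The SSE framing that `StreamAsync` consumes can be illustrated with a self-contained parse of a captured stream body. The `delta` payload shape here is a deliberate simplification; the real chunk DTOs live in `Models/OpenRouter.cs`:
+
+```csharp
+using System;
+using System.Text;
+using System.Text.Json;
+
+// Simplified SSE body; real OpenRouter chunks are full ChatCompletionChunk JSON objects.
+var sseBody = "data: {\"delta\":\"Hello\"}\ndata: {\"delta\":\" world\"}\ndata: [DONE]\n";
+
+var answer = new StringBuilder();
+foreach (var line in sseBody.Split('\n'))
+{
+    if (!line.StartsWith("data: ")) continue;        // SSE payload lines only
+    var payload = line["data: ".Length..];
+    if (payload == "[DONE]") break;                  // OpenAI-style stream terminator
+    using var doc = JsonDocument.Parse(payload);
+    answer.Append(doc.RootElement.GetProperty("delta").GetString());
+}
+
+Console.WriteLine(answer); // Hello world
+```
+
+The real implementation does the same line-by-line scan over the HTTP response stream, yielding each delta as a `StreamChunk` instead of accumulating it.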
+
+#### Example Usage
+```csharp
+var client = new OpenRouterClient("sk-or-...");
+var request = new ChatCompletionRequest("model", new List<Message> { ... });
+await foreach (var chunk in client.StreamAsync(request))
+{
+    Console.Write(chunk.TextDelta);
+}
+```
+
+---
+
+### SearxngClient
+
+**Location**: `Services/SearxngClient.cs`
+**Purpose**: HTTP client for SearxNG metasearch engine
+
+#### API Endpoint
+```
+GET /search?q={query}&format=json
+```
+
+#### Constructor
+```csharp
+public SearxngClient(string baseUrl) // e.g., "http://localhost:8002"
+```
+- `baseUrl` trimmed of trailing `/`
+
+#### Public Methods
+
+##### `SearchAsync(string query, int limit = 10)`
+- **Returns**: `Task<List<SearxngResult>>`
+- **Behavior**: GET request, deserialize JSON, take up to `limit` results
+- **On Failure**: Returns empty `List<SearxngResult>` (no exception)
+
+#### Error Handling
+- The client does not call `EnsureSuccessStatusCode()`, so HTTP error statuses do not throw
+- If the JSON is invalid or `Results` is missing, returns an empty list
+- Failures are **tolerated** - individual search queries may fail without aborting the whole operation
+
+#### Example Searxng Response
+```json
+{
+  "results": [
+    {
+      "title": "Quantum Entanglement - Wikipedia",
+      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
+      "content": "Quantum entanglement is a physical phenomenon..."
+    },
+    ... 
+
+  ]
+}
+```
+
+---
+
+## Processing Services
+
+### EmbeddingService
+
+**Location**: `Services/EmbeddingService.cs`
+**Purpose**: Generate embeddings with batching, rate limiting, and retry logic
+
+#### Configuration
+
+**Embedding Model**: `openai/text-embedding-3-small` (default, configurable via constructor)
+
+**ParallelProcessingOptions** (hardcoded defaults):
+```csharp
+public class ParallelProcessingOptions
+{
+    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
+    public int EmbeddingBatchSize { get; set; } = 300;
+}
+```
+
+#### Public Methods
+
+##### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
+- **Returns**: `Task<float[][]>`
+- **Behavior**:
+  - Splits `texts` into batches of `EmbeddingBatchSize`
+  - Parallel executes batches (max `MaxConcurrentEmbeddingRequests` concurrent)
+  - Each batch: rate-limited, retry-wrapped `client.EmbedAsync(model, batch)`
+  - Reassembles in original order
+  - Failed batches β†’ empty `float[]` for each text
+- **Progress**: Invokes `onProgress` for each batch: `"[Generating embeddings: batch X/Y]"`
+- **Thread-Safe**: Uses lock for collecting results
+
+##### `GetEmbeddingAsync(string text, CancellationToken)`
+- **Returns**: `Task<float[]>`
+- **Behavior**: Single embedding with rate limiting and retry
+- **Use Case**: Query embedding
+
+##### `CosineSimilarity(float[] vector1, float[] vector2)`
+```csharp
+public static float CosineSimilarity(float[] vector1, float[] vector2)
+```
+Uses `System.Numerics.Tensors.TensorPrimitives.CosineSimilarity`
+
+Returns float between -1 and 1 (typically 0-1 for normalized embeddings)
+
+**Implementation**: Single line calling SIMD-accelerated tensor primitive
+
+---
+
+### ArticleService
+
+**Location**: `Services/ArticleService.cs`
+**Purpose**: Extract clean article content from web URLs
+
+#### Public Methods
+
+##### `FetchArticleAsync(string url)`
+- **Returns**: `Task<Article>`
+- **Behavior**: Delegates to `SmartReader.ParseArticleAsync(url)`
+- **Result**: `Article` with `Title`, 
`TextContent`, `IsReadable`, and metadata + +#### Errors +- Propagates exceptions (SmartReader may throw on network failures, malformed HTML) +- `SearchTool` catches and logs + +#### SmartReader Notes +- Open-source article extraction library (bundled via NuGet) +- Uses Readability algorithm (similar to Firefox Reader View) +- Removes ads, navigation, boilerplate +- `IsReadable` indicates quality (e.g., not a 404 page, not too short) + +--- + +### ChunkingService + +**Location**: `Services/ChunkingService.cs` +**Purpose**: Split text into 500-character chunks at natural boundaries + +#### Public Methods + +##### `ChunkText(string text)` +- **Returns**: `List<string>` +- **Algorithm**: + - Constant `MAX_CHUNK_SIZE = 500` + - While remaining text: + - Take up to 500 chars + - If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']` + - Trim, add if non-empty + - Advance start + - Returns all chunks + +#### Characteristics +- Static class (no instances) +- Pure function (no side effects) +- Zero dependencies +- Handles edge cases (empty text, short text, text without breaks) + +--- + +## Infrastructure Services + +### RateLimiter + +**Location**: `Services/RateLimiter.cs` +**Purpose**: Limit concurrent operations using semaphore + +#### Constructor +```csharp +public RateLimiter(int maxConcurrentRequests) +``` +Creates `SemaphoreSlim` with `maxConcurrentRequests` + +#### Public Methods + +##### `ExecuteAsync<T>(Func<Task<T>> action, CancellationToken)` +```csharp +public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken cancellationToken = default) +{ + await _semaphore.WaitAsync(cancellationToken); + try + { + return await action(); + } + finally + { + _semaphore.Release(); + } +} +``` +- Waits for semaphore slot +- Executes `action` (typically an API call) +- Releases semaphore (even if exception) +- Returns result from `action` + +##### `ExecuteAsync(Func<Task> action, CancellationToken)` +- Non-generic version (for void-returning actions) 
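+
+Both overloads fit in a small self-contained class. The following sketch mirrors the documented surface; it is illustrative, and details such as how the non-generic overload is implemented may differ from the actual source:
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Minimal stand-in mirroring the documented RateLimiter surface (illustrative, not the actual source).
+public sealed class MiniRateLimiter : IAsyncDisposable
+{
+    private readonly SemaphoreSlim _semaphore;
+
+    public MiniRateLimiter(int maxConcurrentRequests) =>
+        _semaphore = new SemaphoreSlim(maxConcurrentRequests);
+
+    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct = default)
+    {
+        await _semaphore.WaitAsync(ct);   // wait for a free slot
+        try { return await action(); }
+        finally { _semaphore.Release(); } // always release, even on exception
+    }
+
+    // The void overload can simply delegate to the generic one with a dummy result.
+    public Task ExecuteAsync(Func<Task> action, CancellationToken ct = default) =>
+        ExecuteAsync(async () => { await action(); return 0; }, ct);
+
+    public ValueTask DisposeAsync()
+    {
+        _semaphore.Dispose();
+        return ValueTask.CompletedTask;
+    }
+}
+```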
+ +#### Disposal +```csharp +public async ValueTask DisposeAsync() +{ + _semaphore.Dispose(); +} +``` +Implements `IAsyncDisposable` for async cleanup + +#### Usage Pattern +```csharp +var result = await _rateLimiter.ExecuteAsync(async () => +{ + return await SomeApiCall(); +}, cancellationToken); +``` + +#### Where Used +- `EmbeddingService`: Limits concurrent embedding batch requests (default 4) + +--- + +### StatusReporter + +**Location**: `Services/StatusReporter.cs` +**Purpose**: Real-time progress display with spinner (compact) or verbose lines + +#### Constructor +```csharp +public StatusReporter(bool verbose) +``` +- `verbose = true`: all progress via `WriteLine()` (no spinner) +- `verbose = false`: spinner with latest status + +#### Architecture + +**Components**: +- `Channel<string> _statusChannel` - producer-consumer queue +- `Task _statusProcessor` - background task reading from channel +- `CancellationTokenSource _spinnerCts` - spinner task cancellation +- `Task _spinnerTask` - spinner animation task +- `char[] _spinnerChars` - Braille spinner pattern + +**Spinner Animation**: +- Runs at 10 FPS (100ms interval) +- Cycles through `['β ‹','β ™','β Ή','β Έ','β Ό','β ΄','β ¦','β §','β ‡','⠏']` +- Displays: `β ‹ Fetching articles...` +- Updates in place using ANSI: `\r\x1b[K` (carriage return + erase line) + +#### Public Methods + +##### `UpdateStatus(string message)` +- Fire-and-forget: writes to channel via `TryWrite` (non-blocking) +- If channel full, message dropped (acceptable loss for UI) + +##### `WriteLine(string text)` +- Stops spinner temporarily +- Clears current status line +- Writes `text` with newline +- In verbose mode: just `Console.WriteLine(text)` + +##### `ClearStatus()` +- In compact mode: `Console.Write("\r\x1b[K")` (erase line) +- In verbose: no-op +- Sets `_currentMessage = null` + +##### `StartSpinner()` / `StopSpinner()` +- Manual control (usually `StartSpinner` constructor call, `StopSpinner` by `Dispose`) + +##### `Dispose()` +- 
Completes channel writer +- Awaits `_statusProcessor` completion +- Calls `StopSpinner()` + +#### Background Processing + +**Status Processor**: +```csharp +private async Task ProcessStatusUpdatesAsync() +{ + await foreach (var message in _statusChannel.Reader.ReadAllAsync()) + { + if (_verbose) + { + Console.WriteLine(message); + continue; + } + Console.Write("\r\x1b[K"); // Clear line + Console.Write($"{_spinnerChars[0]} {message}"); // Static spinner + _currentMessage = message; + } +} +``` + +**Spinner Task**: +```csharp +_spinnerTask = Task.Run(async () => +{ + while (_spinnerCts is { Token.IsCancellationRequested: false }) + { + if (_currentMessage != null) + { + Console.Write("\r\x1b[K"); + var charIndex = index++ % spinner.Length; + Console.Write($"{spinner[charIndex]} {_currentMessage}"); + } + await Task.Delay(100, _spinnerCts.Token); + } +}); +``` + +#### Thread Safety +- `UpdateStatus` (producer) writes to channel +- `ProcessStatusUpdatesAsync` (consumer) reads from channel +- `_spinnerTask` runs concurrently +- All UI writes happen in consumer/spinner task context (single-threaded UI) + +#### Design Notes +- Could be simplified: just use `Console.CursorLeft` for spinner, no channel +- Channel allows random `UpdateStatus` calls from any thread without blocking +- Braille spinner requires terminal that supports Unicode (most modern terminals do) + +--- + +## Service Interactions + +### Dependency Graph + +``` +OpenQueryApp +β”œβ”€β”€ OpenRouterClient ← (used for query gen + final answer) +└── SearchTool + β”œβ”€β”€ SearxngClient + β”œβ”€β”€ ArticleService (uses SmartReader) + β”œβ”€β”€ ChunkingService (static) + β”œβ”€β”€ EmbeddingService + β”‚ └── OpenRouterClient (different instance) + β”‚ └── RateLimiter + └── ParallelProcessingOptions (config) +``` + +### Service Lifetimes + +All services are **transient** (new instance per query execution): +- `OpenRouterClient` β†’ 1 instance for query gen + answer +- `SearxngClient` β†’ 1 instance for all searches 
+- `EmbeddingService` β†’ 1 instance with its own `OpenRouterClient` and `RateLimiter` +- `SearchTool` β†’ 1 instance per query (constructed in `Program.cs`) + +No singleton or static state (except static utility classes like `ChunkingService`). + +### Data Flow Through Services + +``` +OpenQueryApp + β”‚ + β”œβ”€ OpenRouterClient.CompleteAsync() β†’ query generation + β”‚ Messages β†’ JSON β†’ HTTP request β†’ response β†’ JSON β†’ Messages + β”‚ + └─ SearchTool.ExecuteAsync() + β”‚ + β”œβ”€ SearxngClient.SearchAsync() Γ— N + β”‚ query β†’ URL encode β†’ GET β†’ JSON β†’ SearxngResult[] + β”‚ + β”œβ”€ ArticleService.FetchArticleAsync() Γ— M + β”‚ URL β†’ HTTP GET β†’ SmartReader β†’ Article + β”‚ + β”œβ”€ ChunkingService.ChunkText() Γ— M + β”‚ Article.TextContent β†’ List<string> chunks + β”‚ + β”œβ”€ EmbeddingService.GetEmbeddingAsync(query) + GetEmbeddingsAsync(chunks[]) + β”‚ texts β†’ batches β†’ rate-limited HTTP POST β†’ JSON β†’ float[][] + β”‚ + β”œβ”€ CosineSimilarity(queryEmbedding, chunkEmbedding) Γ— M + β”‚ Vectors β†’ dot product β†’ magnitude β†’ score + β”‚ + └─ return context string (formatted chunks) +``` + +--- + +## Next Steps + +- **[OpenQueryApp](../components/openquery-app.md)** - Orchestrates services +- **[SearchTool](../components/search-tool.md)** - Coordinates pipeline +- **[Models](../components/models.md)** - Data structures passed between services +- **[API Reference](../../api/cli.md)** - CLI that uses these services + +--- + +**Service Design Principles**: +- Single Responsibility: Each service does one thing well +- Stateless: No instance state beyond constructor args +- Composable: Services depend on abstractions (other services) not implementations +- Testable: Can mock dependencies for unit testing diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000..dae589e --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,356 @@ +# Configuration + +Complete guide to configuring OpenQuery for your 
environment. + +## πŸ“‹ Table of Contents + +1. [Configuration Methods](#configuration-methods) +2. [Configuration File](#configuration-file) +3. [Environment Variables](#environment-variables) +4. [Command-Line Options](#command-line-options) +5. [Configuration Priority](#configuration-priority) +6. [Recommended Settings](#recommended-settings) +7. [Advanced Configuration](#advanced-configuration) + +## Configuration Methods + +OpenQuery can be configured through three methods, which merge together with clear priority: + +| Method | Persistence | Use Case | +|--------|-------------|----------| +| Configuration File | Permanent | Default values you use daily | +| Environment Variables | Session/Shell | CI/CD, scripting, temporary overrides | +| Command-Line Options | Per-execution | One-off customizations | + +## Configuration File + +### Location +OpenQuery follows the [XDG Base Directory](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html) specification: + +- **Linux/macOS**: `~/.config/openquery/config` +- **Windows**: `%APPDATA%\openquery\config` (e.g., `C:\Users\<user>\AppData\Roaming\openquery\config`) + +### Format +Simple `key=value` pairs, one per line: + +```ini +ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +Model=qwen/qwen3.5-flash-02-23 +DefaultQueries=3 +DefaultChunks=3 +DefaultResults=5 +``` + +### Schema + +| Key | Type | Default | Description | +|-----|------|---------|-------------| +| `ApiKey` | string | "" | OpenRouter API authentication key | +| `Model` | string | `qwen/qwen3.5-flash-02-23` | Default LLM model to use | +| `DefaultQueries` | int | 3 | Number of search queries to generate | +| `DefaultChunks` | int | 3 | Number of top context chunks to include | +| `DefaultResults` | int | 5 | Number of search results per query | + +### Example Configurations + +**Minimal** (just API key): +```ini +ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +``` + +**Optimized for Research**: +```ini 
+ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +Model=google/gemini-3-flash-preview +DefaultQueries=5 +DefaultChunks=4 +DefaultResults=10 +``` + +**Cost-Conscious**: +```ini +ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +Model=qwen/qwen3.5-flash-02-23 +DefaultQueries=2 +DefaultChunks=2 +DefaultResults=3 +``` + +## Environment Variables + +Environment variables override the configuration file and can be set temporarily or permanently in your shell profile. + +### Available Variables + +| Variable | Purpose | Required | Example | +|----------|---------|----------|---------| +| `OPENROUTER_API_KEY` | OpenRouter API key | **Yes** (unless in config file) | `export OPENROUTER_API_KEY="sk-or-..."` | +| `OPENROUTER_MODEL` | Override default LLM model | No | `export OPENROUTER_MODEL="deepseek/deepseek-v3.2"` | +| `SEARXNG_URL` | URL of SearxNG instance | No (default: `http://localhost:8002`) | `export SEARXNG_URL="https://searx.example.com"` | + +### Setting Environment Variables + +#### Temporary (Current Session) +```bash +# Linux/macOS +export OPENROUTER_API_KEY="sk-or-..." +export SEARXNG_URL="http://localhost:8002" + +# Windows PowerShell +$env:OPENROUTER_API_KEY="sk-or-..." +$env:SEARXNG_URL="http://localhost:8002" +``` + +#### Permanent (Shell Profile) + +**bash** (`~/.bashrc` or `~/.bash_profile`): +```bash +export OPENROUTER_API_KEY="sk-or-..." +export SEARXNG_URL="http://localhost:8002" +``` + +**zsh** (`~/.zshrc`): +```zsh +export OPENROUTER_API_KEY="sk-or-..." +export SEARXNG_URL="http://localhost:8002" +``` + +**fish** (`~/.config/fish/config.fish`): +```fish +set -x OPENROUTER_API_KEY "sk-or-..." 
+set -x SEARXNG_URL "http://localhost:8002" +``` + +**Windows** (PowerShell profile): +```powershell +[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User") +[Environment]::SetEnvironmentVariable("SEARXNG_URL", "http://localhost:8002", "User") +``` + +After editing profile files, restart your terminal or run `source ~/.bashrc` (or equivalent). + +### Security Note +Never commit your API key to version control. Use environment variables or config file that's in `.gitignore`. The default `.gitignore` already excludes common build directories but doesn't include the config file since it's outside the project directory (`~/.config/`). + +## Command-Line Options + +Options passed directly to the `openquery` command override both config file and environment variables for that specific execution. + +### Main Command Options + +```bash +openquery [OPTIONS] <question> +``` + +| Option | Aliases | Type | Default Source | Description | +|--------|---------|------|----------------|-------------| +| `--chunks` | `-c` | int | Config `DefaultChunks` | Number of top context chunks | +| `--results` | `-r` | int | Config `DefaultResults` | Search results per query | +| ``--queries` | `-q` | int | Config `DefaultQueries` | Number of search queries | +| `--short` | `-s` | bool | false | Request concise answer | +| `--long` | `-l` | bool | false | Request detailed answer | +| `--verbose` | `-v` | bool | false | Show detailed progress | + +### Configure Command Options + +```bash +openquery configure [OPTIONS] +``` + +| Option | Type | Description | +|--------|------|-------------| +| `--interactive` / `-i` | bool | Launch interactive configuration wizard | +| `--key` | string | Set API key | +| `--model` | string | Set default model | +| `--queries` | int? | Set default queries | +| `--chunks` | int? | Set default chunks | +| `--results` | int? 
| Set default results | + +## Configuration Priority + +When OpenQuery needs a value, it checks sources in this order (highest to lowest priority): + +1. **Command-line option** (if provided) +2. **Environment variable** (if set) +3. **Configuration file** (if key exists) +4. **Hard-coded default** (if all above missing) + +### Examples + +**Example 1**: Environment overrides config +```bash +# config file: DefaultQueries=5 +export OPENROUTER_MODEL="deepseek/deepseek-v3.2" +openquery --queries 2 "question" # Uses: queries=2 (CLI), model=deepseek (env), chunks=3 (config) +``` + +**Example 2**: CLI overrides everything +```bash +export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23" +openquery --model "google/gemini-3-flash-preview" --chunks 5 "question" +# Uses: model=google (CLI), chunks=5 (CLI), queries=3 (default) +``` + +**Example 3**: All sources combined +```bash +# config: DefaultChunks=4 +# env: OPENROUTER_MODEL="moonshotai/kimi-k2.5", SEARXNG_URL="http://custom:8002" +# CLI: --queries 6 --short +openquery "question" +# Uses: queries=6 (CLI), chunks=4 (config), results=5 (config), +# model=kimi-k2.5 (env), searxng=custom (env), short=true (CLI) +``` + +## Recommended Settings + +### For Quick Questions (Facts, Definitions) +```bash +openquery -q 2 -r 3 -c 2 "What is the capital of France?" +``` +- Few queries (2) for straightforward facts +- Few results (3) to minimize processing +- Few chunks (2) for focused answer + +### For Research (Complex Topics) +```bash +openquery -q 5 -r 10 -c 4 -l "Explain the causes of the French Revolution" +``` +- More queries (5) for diverse perspectives +- More results (10) for comprehensive coverage +- More chunks (4) for rich context +- Long format for depth + +### For Exploration (Broad Topics) +```bash +openquery -q 8 -r 15 -c 5 "What are the latest developments in AI?" 
+``` +- Many queries (8) to explore different angles +- Many results (15) for breadth +- More chunks (5) for extensive context + +### Cost Optimization +```bash +openquery configure --model "qwen/qwen3.5-flash-02-23" +# Keep defaults: -q 3 -r 5 -c 3 +``` +- Qwen Flash is very cost-effective +- Default parameters provide good balance + +### Performance Optimization +```bash +# Adjust ParallelProcessingOptions in SearchTool.cs if needed +# Default: MaxConcurrentArticleFetches=10, MaxConcurrentEmbeddingRequests=4 +``` +- Reduce these values if you see rate limits or memory pressure +- Increase them if you have fast network/API and want more speed + +## Advanced Configuration + +### Changing Concurrency Limits + +Concurrency limits are currently hardcoded in `SearchTool.cs` but can be adjusted: + +```csharp +public class ParallelProcessingOptions +{ + public int MaxConcurrentArticleFetches { get; set; } = 10; // ← Change this + public int MaxConcurrentEmbeddingRequests { get; set; } = 4; // ← Change this + public int EmbeddingBatchSize { get; set; } = 300; // ← Change this +} +``` + +To make these configurable, you could: +1. Add fields to `AppConfig` +2. Read from config file +3. Pass through to `SearchTool` constructor + +### Custom Embedding Model + +The embedding model is hardcoded to `openai/text-embedding-3-small`. To change: + +Edit the `EmbeddingService` constructor: +```csharp +public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model") +``` + +Or make it configurable via CLI/config (future enhancement). + +### Changing Chunk Size + +Chunk size (500 chars) is defined in `ChunkingService.cs`: +```csharp +private const int MAX_CHUNK_SIZE = 500; +``` + +Modify this constant to change how articles are split. 
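The backtracking behaviour is easier to see in a runnable sketch. This Python port of the algorithm described above is illustrative only (the shipped implementation is the C# `ChunkingService`):

```python
def chunk_text(text: str, max_chunk_size: int = 500) -> list[str]:
    """Boundary-aware splitter (illustrative port of ChunkingService)."""
    breaks = {" ", "\n", "\r", ".", "!"}
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chunk_size, len(text))
        if end < len(text):
            # Backtrack to the last natural boundary so words stay intact
            boundary = end
            while boundary > start and text[boundary - 1] not in breaks:
                boundary -= 1
            if boundary > start:  # hard-split only if no boundary was found
                end = boundary
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        start = end
    return chunks

article = "First sentence. " * 80  # 1280 characters
pieces = chunk_text(article)
print(len(pieces), max(len(p) for p in pieces))  # 3 chunks, all <= 500 chars
```

Every chunk ends at a space or sentence boundary rather than mid-word, which is exactly what keeps embeddings of the chunks semantically coherent.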
Larger chunks: +- βœ… More context per chunk +- ❌ Fewer chunks for same article +- ❌ Higher token usage in final answer + +Smaller chunks: +- βœ… More granular matching +- ❌ May lose context across chunk boundaries + +### Using a Custom SearxNG Instance + +Some SearxNG deployments may require HTTPS, authentication, or custom paths: + +```bash +# With authentication (if supported) +export SEARXNG_URL="https://user:pass@searx.example.com:8080" + +# With custom path +export SEARXNG_URL="https://searx.example.com/custom-path" +``` + +Note: Most SearxNG instances don't require auth as they're designed for privacy. + +### OpenRouter Settings + +OpenRouter supports additional parameters (not yet exposed in OpenQuery): + +- `temperature` - Randomness (0-2, default ~1) +- `max_tokens` - Response length limit +- `top_p` - Nucleus sampling +- `frequency_penalty` / `presence_penalty` + +These could be added to `ChatCompletionRequest` in future versions. + +## Managing Multiple Configurations + +You can maintain multiple config files and symlink or set per-project: + +```bash +# Create project-specific config +cp ~/.config/openquery/config ~/myproject/openquery.config + +# Use it temporarily +OPENQUERY_CONFIG=~/myproject/openquery.config openquery "question" +``` + +**Note**: Currently OpenQuery only looks at `~/.config/openquery/config`. Multi-config support would require code changes (reading from `OPENQUERY_CONFIG` env var). + +## Configuration Validation + +OpenQuery doesn't strictly validate config values. 
Invalid settings may cause runtime errors: + +- `DefaultQueries <= 0` β†’ May cause exceptions or zero queries +- `DefaultChunks <= 0` β†’ May return no context +- `DefaultResults <= 0` β†’ No search results + +Validate manually: +```bash +# Test your config loads +cat ~/.config/openquery/config + +# Test with verbose mode +openquery -v "test" +``` + +--- + +## Next Steps + +- [Usage Guide](usage.md) - Learn how to use the CLI +- [Architecture](architecture.md) - Understand the system design +- [Troubleshooting](troubleshooting.md) - Fix common issues diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..eaf2cbc --- /dev/null +++ b/docs/index.md @@ -0,0 +1,173 @@ +# OpenQuery Documentation + +Welcome to the comprehensive documentation for OpenQuery - the AI-powered search and answer system. + +## πŸ“š Documentation Overview + +### Getting Started +- **[Installation Guide](installation.md)** - Build, install, and setup instructions +- **[Configuration](configuration.md)** - Configure API keys, models, and settings +- **[Usage Guide](usage.md)** - Complete CLI reference with examples + +### Deep Dive +- **[Architecture](architecture.md)** - System design, patterns, and data flow +- **[Components](components/overview.md)** - Detailed component documentation + - [OpenQueryApp](components/openquery-app.md) + - [SearchTool](components/search-tool.md) + - [Services](components/services.md) + - [Models](components/models.md) +- **[API Reference](api/cli.md)** - Complete command-line interface reference + - [Environment Variables](api/environment-variables.md) + - [Programmatic APIs](api/programmatic.md) + +### Support +- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions +- **[Performance](performance.md)** - Performance characteristics and optimization + +## 🎯 Quick Links + +### For Users +- [Install OpenQuery](installation.md) in 5 minutes +- [Configure your API key](configuration.md) +- [Learn the basics](usage.md) +- [Solve 
common problems](troubleshooting.md) + +### For Developers +- [Understand the architecture](architecture.md) +- [Explore components](components/overview.md) +- [Use the APIs programmatically](api/programmatic.md) +- [Performance tuning](performance.md) + +## πŸ“‹ Table of Contents + +1. [Project Overview](#project-overview) +2. [Key Concepts](#key-concepts) +3. [Technology Stack](#technology-stack) +4. [System Workflow](#system-workflow) + +## Project Overview + +**OpenQuery** is a sophisticated CLI tool that combines the power of large language models with web search to provide accurate, well-sourced answers to complex questions. + +### What It Does +- Takes a natural language question as input +- Generates multiple diverse search queries +- Searches the web via SearxNG +- Extracts and processes article content +- Uses semantic similarity to rank relevance +- Synthesizes a comprehensive AI-generated answer with citations + +### Why Use OpenQuery? +- **Accuracy**: Multiple search queries reduce bias and increase coverage +- **Transparency**: Sources are cited in the final answer +- **Speed**: Parallel processing minimizes latency +- **Control**: Fine-tune every aspect from query count to chunk selection +- **Privacy**: SearxNG provides anonymous, aggregating search + +## Key Concepts + +### Search Queries +Instead of using your exact question, OpenQuery generates multiple optimized search queries (default: 3). For example, "What is quantum entanglement?" might become: +- "quantum entanglement definition" +- "how quantum entanglement works" +- "quantum entanglement experiments" + +### Content Chunks +Long articles are split into ~500-character chunks. Each chunk is: +- Stored with its source URL and title +- Converted to a vector embedding (1536 dimensions) +- Scored against your query embedding + +### Semantic Ranking +Using cosine similarity between embeddings, OpenQuery ranks chunks by relevance and selects the top N (default: 3) for the final context. 
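The ranking step is plain cosine similarity between the query vector and each chunk vector. A minimal, self-contained sketch — toy 3-dimensional vectors stand in for the real 1536-dimensional embeddings, and OpenQuery itself does this in C# with `System.Numerics.Tensors`:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|); 0.0 for a zero-magnitude vector
    dot = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / mag if mag else 0.0

def top_chunks(query_vec, chunk_vecs, chunks, n=3):
    # Score every chunk against the query, sort descending, keep the top n
    scored = sorted(
        zip(chunks, (cosine_similarity(query_vec, v) for v in chunk_vecs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [text for text, _ in scored[:n]]

query = [1.0, 0.0, 0.0]
vectors = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
texts = ["relevant chunk", "unrelated chunk", "partially relevant chunk"]
print(top_chunks(query, vectors, texts, n=2))
# ['relevant chunk', 'partially relevant chunk']
```

Because only relative order matters for selecting the top N, the raw similarity scores never need to be normalized or thresholded.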
+ +### Streaming Answer +The LLM receives your question plus the top chunks as context and streams the answer in real-time, citing sources like `[Source 1]`. + +## Technology Stack + +| Layer | Technology | Purpose | +|-------|------------|---------| +| Runtime | .NET 10.0 AOT | Native performance, minimal footprint | +| LLM | OpenRouter API | Chat completions and embeddings | +| Search | SearxNG | Metasearch engine | +| Content Extraction | SmartReader | Article text extraction | +| Vector Math | System.Numerics.Tensors | High-performance cosine similarity | +| Resilience | Polly | Retry and circuit breaker policies | +| CLI | System.CommandLine | Command parsing and help | +| JSON | System.Text.Json (source-gen) | Fast serialization | + +## System Workflow + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ OpenQuery Workflow β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ β”‚ +β”‚ 1. User Query: "What is quantum entanglement?" β”‚ +β”‚ β”‚ +β”‚ 2. Query Generation (Optional) β”‚ +β”‚ LLM generates: ["quantum entanglement physics", β”‚ +β”‚ "quantum entanglement definition", β”‚ +β”‚ "how does quantum entanglement work"] β”‚ +β”‚ β”‚ +β”‚ 3. 
Parallel Searches β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Query 1 β†’ β”‚β†’ β”‚ SearxNG β”‚β†’ β”‚ Results β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Query 2 β†’ β”‚β†’ β”‚ SearxNG β”‚β†’ β”‚ Results β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ Query 3 β†’ β”‚β†’ β”‚ SearxNG β”‚β†’ β”‚ Results (combined) β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β”‚ 4. Parallel Article Fetching β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ URL 1 β†’ β”‚β†’ β”‚ Article β”‚β†’ β”‚ Chunks β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ URL 2 β†’ β”‚β†’ β”‚ Article β”‚β†’ β”‚ Chunks β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ ... (concurrent, max 10 at a time) β”‚ +β”‚ β”‚ +β”‚ 5. 
Parallel Embeddings β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Chunks β”‚β†’ β”‚ Embed- β”‚β†’ β”‚ Vectors β”‚ β”‚ +β”‚ β”‚ Batch 1 β”‚ β”‚ ding API β”‚ β”‚ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Chunks β”‚β†’ β”‚ Embed- β”‚β†’ β”‚ Vectors β”‚ β”‚ +β”‚ β”‚ Batch 2 β”‚ β”‚ ding API β”‚ β”‚ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ (batches of 300, up to 4 concurrent) β”‚ +β”‚ β”‚ +β”‚ 6. Semantic Ranking β”‚ +β”‚ Query Embedding + Chunk Embeddings β†’ Cosine Similarity β†’ β”‚ +β”‚ Score β†’ Sort Descending β†’ Top 3 Chunks β”‚ +β”‚ β”‚ +β”‚ 7. Final Answer Generation β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ System: "Answer based on this context:" β”‚ β”‚ +β”‚ β”‚ Context: [Top 3 chunks with sources] β”‚ β”‚ +β”‚ β”‚ Question: "What is quantum entanglement?" β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ ↓ β”‚ +β”‚ LLM Streams Answer β”‚ +β”‚ "Quantum entanglement is..." β”‚ +β”‚ with citations like [Source 1] β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Next Steps + +1. **[Install OpenQuery](installation.md)** +2. **[Configure it](configuration.md)** +3. 
**[Start asking questions](usage.md)** + +For detailed technical information, continue to [the architecture guide](architecture.md). + +--- + +**Need help?** Check the [Troubleshooting](troubleshooting.md) guide. diff --git a/docs/installation.md b/docs/installation.md new file mode 100644 index 0000000..15e66f8 --- /dev/null +++ b/docs/installation.md @@ -0,0 +1,358 @@ +# Installation Guide + +This guide covers how to build, install, and configure OpenQuery on your system. + +## πŸ“‹ Table of Contents + +1. [Prerequisites](#prerequisites) +2. [Quick Install](#quick-install) +3. [Manual Build](#manual-build) +4. [Platform-Specific Instructions](#platform-specific-instructions) +5. [Post-Installation](#post-installation) +6. [Verification](#verification) +7. [Uninstallation](#uninstallation) + +## Prerequisites + +### Required Software +- **.NET SDK 10.0** or later + - Download from [dotnet.microsoft.com](https://dotnet.microsoft.com/download) + - Verify: `dotnet --version` should show 10.x or higher + +### External Services (Setup Required) +1. **SearxNG Instance** - Metasearch engine + - **Docker (Recommended)**: + ```bash + docker run -d \ + --name searxng \ + -p 8002:8080 \ + -v searxng-data:/etc/searxng \ + searxng/searxng:latest + ``` + - Access at `http://localhost:8002` + + - **Alternative**: Use a public SearxNG instance from [searx.space](https://searx.space) + +2. **OpenRouter API Key** - AI model provider + - Sign up at [openrouter.ai](https://openrouter.ai) + - Get your API key from dashboard + - Free tier available with rate limits + +## Quick Install + +The easiest way to get OpenQuery up and running: + +```bash +# 1. Clone the repository +git clone <your-repo-url> +cd OpenQuery + +# 2. Make install script executable and run +chmod +x install.sh +./install.sh + +# 3. Configure your API key +openquery configure -i + +# 4. 
Test it +openquery "Hello world" +``` + +**What the install script does**: +- Builds the project in Release mode +- Publishes as self-contained AOT binary +- Copies to `~/.local/bin/OpenQuery` (Linux/macOS) +- Creates config directory `~/.config/openquery/` + +## Manual Build + +If you prefer to build manually or need a specific platform: + +### Step 1: Restore Dependencies +```bash +dotnet restore +``` + +### Step 2: Build +```bash +dotnet build -c Release +``` + +### Step 3: Publish + +#### For Current Platform (Self-Contained AOT) +```bash +dotnet publish -c Release \ + --self-contained true \ + /p:PublishAot=true +``` + +The binary will be at: +``` +bin/Release/net10.0/<rid>/publish/OpenQuery +``` + +#### For Specific Platform (Cross-Compilation) + +**Runtime Identifiers (RIDs)**: +| Platform | RID | +|----------|-----| +| Linux x64 | `linux-x64` | +| Linux ARM64 | `linux-arm64` | +| macOS x64 | `osx-x64` | +| macOS ARM64 | `osx-arm64` | +| Windows x64 | `win-x64` | +| Windows ARM64 | `win-arm64` | + +Example for Linux x64: +```bash +dotnet publish -c Release \ + -r linux-x64 \ + --self-contained true \ + /p:PublishAot=true +``` + +### Step 4: Deploy + +Copy the binary to a directory in your PATH: + +```bash +# Linux/macOS +sudo cp bin/Release/net10.0/linux-x64/publish/OpenQuery /usr/local/bin/ +chmod +x /usr/local/bin/OpenQuery + +# Windows (PowerShell as Admin) +Copy-Item bin\Release\net10.0\win-x64\publish\OpenQuery.exe C:\Program Files\OpenQuery\ +``` + +Or use a local bin directory: +```bash +mkdir -p ~/.local/bin +cp bin/Release/net10.0/linux-x64/publish/OpenQuery ~/.local/bin/ +# Add to PATH if not already: export PATH="$HOME/.local/bin:$PATH" +``` + +## Platform-Specific Instructions + +### Linux + +#### Ubuntu/Debian +```bash +# Install .NET SDK 10.0 +wget https://dot.net/v10/dotnet-install.sh -O dotnet-install.sh +chmod +x dotnet-install.sh +./dotnet-install.sh --channel 10.0 + +# Add to PATH +export PATH="$HOME/.dotnet:$PATH" + +# Build and install 
(as shown above) +``` + +#### With Systemd Service (Optional) +If you run SearxNG locally, you might want it as a service: +```bash +# Create systemd service for SearxNG (if using Docker) +sudo nano /etc/systemd/system/searxng.service +``` + +```ini +[Unit] +Description=SearxNG Search Engine +Requires=docker.service +After=docker.service + +[Service] +Restart=always +ExecStart=/usr/bin/docker start -a searxng +ExecStop=/usr/bin/docker stop -t 2 searxng + +[Install] +WantedBy=multi-user.target +``` + +```bash +sudo systemctl enable searxng +sudo systemctl start searxng +``` + +### macOS + +#### Homebrew Install (if .NET available) +```bash +brew install dotnet-sdk +``` + +#### M1/M2 (ARM64) Notes +- Use RID: `osx-arm64` +- Ensure you have the ARM64 version of .NET SDK + +### Windows + +#### Using Winget (Windows 10/11) +```powershell +winget install Microsoft.DotNet.SDK.10 +``` + +#### Manual Install +1. Download installer from [dotnet.microsoft.com](https://dotnet.microsoft.com/download) +2. Run installer +3. Verify in PowerShell: + ```powershell + dotnet --version + ``` + +#### Building +```powershell +dotnet publish -c Release -r win-x64 --self-contained true /p:PublishAot=true +``` + +## Post-Installation + +### 1. Verify SearxNG is Running +```bash +curl "http://localhost:8002/search?q=test&format=json" +``` + +Expected: JSON response with results array. + +### 2. Configure OpenQuery +```bash +# Interactive setup +openquery configure -i + +# Or via environment variables +setx OPENROUTER_API_KEY "sk-or-..." # Windows +export OPENROUTER_API_KEY="sk-or-..." # Linux/macOS +``` + +### 3. 
Optional: Set Defaults +```bash +openquery configure --queries 5 --chunks 4 --results 10 +``` + +## Verification + +### Test Installation +```bash +# Check binary exists and is executable +which openquery # Linux/macOS +where openquery # Windows + +# If installed as OpenQuery (capital O) +which OpenQuery +``` + +### Test Configuration +```bash +# Should show your config or defaults +cat ~/.config/openquery/config +``` + +### Test the System +```bash +# Simple query (should work with any API key) +openquery "What is 2+2?" + +# More complex query +openquery -v "What are the benefits of exercise?" +``` + +Expected output: +- Spinner animation with status updates +- Streaming answer from the AI +- Citations like `[Source 1](url)` in the answer + +## Uninstallation + +### Using Uninstall Script +```bash +chmod +x uninstall.sh +./uninstall.sh +``` + +The script will: +- Remove binary from `~/.local/bin/` +- Ask if you want to delete config directory + +### Manual Removal +```bash +# Remove binary +rm ~/.local/bin/OpenQuery + +# Remove config (optional) +rm -r ~/.config/openquery +``` + +### Remove SearxNG (if no longer needed) +```bash +docker rm -f searxng +docker volume rm searxng-data +``` + +## Advanced Build Options + +### Reduce Binary Size +Edit `OpenQuery.csproj`: +```xml +<PropertyGroup> + <PublishAot>true</PublishAot> + <InvariantGlobalization>true</InvariantGlobalization> <!-- Already set --> + <StripSymbols>true</StripSymbols> +</PropertyGroup> +``` + +### Debug Build +```bash +dotnet build -c Debug +dotnet run -- "your question" +``` + +### With Symbols (for debugging) +```bash +dotnet publish -c Release -r linux-x64 \ + --self-contained true \ + /p:PublishAot=true \ + /p:DebugType=portable +``` + +## Troubleshooting Installation + +### "dotnet: command not found" +- Add `.dotnet` to PATH: `export PATH="$HOME/.dotnet:$PATH"` +- Restart terminal or source shell config + +### "The SDK 'Microsoft.NET.Sdk' was not found" +- .NET SDK not installed correctly +- 
Re-run installer or use `dotnet-install.sh` + +### AOT Build Fails +- Some platforms may not support AOT yet +- Remove `/p:PublishAot=true` to use JIT +- Check [.NET AOT support](https://docs.microsoft.com/dotnet/core/deploying/native-aot/) + +### Docker Pull Fails (SearxNG) +```bash +# Pull image separately first +docker pull searxng/searxng:latest +# Then run container +docker run -d --name searxng -p 8002:8080 searxng/searxng +``` + +### Port 8002 Already in Use +Change port in docker command: +```bash +docker run -d --name searxng -p 8080:8080 searxng/searxng +# Then set SEARXNG_URL=http://localhost:8080 +``` + +## Next Steps + +After successful installation: +1. [Configure OpenQuery](configuration.md) +2. [Learn how to use it](usage.md) +3. Read the [Architecture](architecture.md) to understand how it works + +--- + +**Need help?** See [Troubleshooting](troubleshooting.md) or open an issue. diff --git a/docs/performance.md b/docs/performance.md new file mode 100644 index 0000000..beb23be --- /dev/null +++ b/docs/performance.md @@ -0,0 +1,522 @@ +# Performance + +Performance characteristics, optimization strategies, and scalability considerations for OpenQuery. + +## πŸ“‹ Table of Contents + +1. [Performance Overview](#performance-overview) +2. [Latency Breakdown](#latency-breakdown) +3. [Throughput](#throughput) +4. [Memory Usage](#memory-usage) +5. [Benchmarking](#benchmarking) +6. [Optimization Strategies](#optimization-strategies) +7. [Scalability Limits](#scalability-limits) + +## Performance Overview + +OpenQuery is designed for **low-latency interactive use** (15-50 seconds end-to-end) while maximizing parallelization to minimize wait time. 
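The concurrency model described above can be sketched in a few lines of Python (illustrative only: the real implementation is C# with `Task`s and a `SemaphoreSlim`, and every function name below is hypothetical):

```python
import asyncio

async def search(query: str) -> list[str]:
    """Stand-in for a SearxNG call; returns result URLs."""
    await asyncio.sleep(0)
    return [f"https://example.com/{query}/{i}" for i in range(5)]

async def pipeline(queries: list[str]) -> list[str]:
    # Stage 2a: all searches run fully concurrently.
    results = await asyncio.gather(*(search(q) for q in queries))
    urls = sorted({u for batch in results for u in batch})  # dedupe across queries

    # Stage 2b: article fetches are bounded by a semaphore,
    # mirroring MaxConcurrentArticleFetches = 10.
    limit = asyncio.Semaphore(10)

    async def fetch(url: str) -> str:
        async with limit:
            await asyncio.sleep(0)  # stand-in for HTTP GET + SmartReader parse
            return f"text of {url}"

    return await asyncio.gather(*(fetch(u) for u in urls))

articles = asyncio.run(pipeline(["a", "b", "c"]))
print(len(articles))  # 15
```

The key property mirrored here: searches fan out fully, while article fetches are capped at 10 in flight.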
+ +### Key Metrics + +| Metric | Typical | Best Case | Worst Case | +|--------|---------|-----------|------------| +| **End-to-End Latency** | 15-50s | 10s | 120s+ | +| **API Cost** | $0.01-0.05 | $0.005 | $0.20+ | +| **Memory Footprint** | 100-300MB | 50MB | 1GB+ | +| **Network I/O** | 5-20MB | 1MB | 100MB+ | + +**Note**: Wide variance due to network latency, content size, and LLM speed. + +--- + +## Latency Breakdown + +### Default Configuration + +`-q 3 -r 5 -c 3` (3 queries, 5 results each, 3 final chunks) + +| Stage | Operation | Parallelism | Time (p50) | Time (p95) | Dominant Factor | +|-------|-----------|-------------|------------|------------|-----------------| +| 1 | Query Generation | 1 | 2-5s | 10s | LLM inference speed | +| 2a | Searches (3 queries Γ— 5 results) | 3 concurrent | 3-8s | 15s | SearxNG latency | +| 2b | Article Fetching (β‰ˆ15 URLs) | 10 concurrent | 5-15s | 30s | Each site's response time | +| 2c | Chunking | 10 concurrent | <1s | 2s | CPU (HTML parsing) | +| 3a | Query Embedding | 1 | 0.5-1s | 3s | Embedding API latency | +| 3b | Chunk Embeddings (β‰ˆ50 chunks) | 4 concurrent | 1-3s | 10s | Batch API latency | +| 4 | Ranking | 1 | <0.1s | 0.5s | CPU (vector math) | +| 5 | Final Answer Streaming | 1 | 5-20s | 40s | LLM generation speed | +| **Total** | | | **16-50s** | **~60s** | | + +### Phase Details + +#### Phase 1: Query Generation (2-5s) +- Single non-streaming LLM call +- Input: system prompt + user question (~200 tokens) +- Output: JSON array of 3-5 short strings (~50 tokens) +- Fast because small context and output + +#### Phase 2a: Searches (3-8s) +- 3 parallel `SearxngClient.SearchAsync` calls +- Each: query β†’ SearxNG β†’ aggregator engines β†’ scraped results +- Latency highly variable based on: + - SearxNG instance performance + - Network distance to SearxNG + - SearxNG's upstream search engines + +#### Phase 2b: Article Fetching (5-15s) +- β‰ˆ15 URLs to fetch (3 queries Γ— 5 results minus duplicates) +- Up to 10 
concurrent fetches (semaphore)
+- Each: TCP connect + TLS handshake + HTTP GET + SmartReader parse
+- Latency:
+  - Fast sites (CDN, cached): 200-500ms
+  - Normal sites: 1-3s
+  - Slow/unresponsive sites: timeout after ~30s
+
+Why 5-15s for 15 URLs with 10 concurrent fetches?
+- First wave (10 URLs): bounded by the slowest among them, ≈ 3s
+- Second wave (5 URLs): another ≈ 3s → total ≈ 6s
+- Many URLs respond faster (≈ 500ms), pulling the total toward 2-3s
+- But a few sites taking 5-10s can dominate
+
+**Tail latency**: The slowest few URLs can dominate total time. The pipeline cannot proceed until all fetch attempts complete (or fail).
+
+#### Phase 2c: Chunking (<1s)
+- CPU-bound HTML cleaning and splitting
+- SmartReader's C# HTML parser is fast
+- Typically 100-300 chunks total
+- <1s on a modern CPU
+
+#### Phase 3: Embeddings (1.5-4s)
+- **Query embedding**: 1 call, ~200 tokens, ≈ 0.5-1s
+- **Chunk embeddings**: ≈50 chunks → 1 batch of 50 (batch size 300 unused here)
+  - Batch of 50: still a single API call, ≈ 25K chars (50 × ~500 chars) ≈ 6K tokens
+  - If using `text-embedding-3-small`: $0.00002 per 1K tokens → ~$0.0001 per batch
+  - Latency: 1-3s for the embedding API
+
+With more chunks (say 500), that becomes 2 batches → maybe 2-4s.
+
+Parallel batches (4 concurrent) help only when there are many batches (1500+ chunks).
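A tiny helper makes the batch arithmetic above explicit (hypothetical code; the defaults B=300 and P=4 mirror `EmbeddingBatchSize` and `MaxConcurrentEmbeddingRequests`):

```python
import math

def embedding_steps(chunks: int, batch_size: int = 300, parallel: int = 4) -> tuple[int, int]:
    """Return (api_calls, sequential_waves) for a given chunk count.

    Each API call embeds up to batch_size chunks; up to `parallel`
    calls are in flight at once, so latency scales with the waves.
    """
    batches = math.ceil(chunks / batch_size)
    waves = math.ceil(batches / parallel)
    return batches, waves

print(embedding_steps(50))    # (1, 1) - a typical default run
print(embedding_steps(500))   # (2, 1) - two calls, still one wave
print(embedding_steps(1500))  # (5, 2) - parallelism starts to matter
```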
+
+#### Phase 4: Ranking (<0.1s)
+- Cosine similarity for 50-100 chunks
+- Each: dot product + normalization (O(dim), dim = 1536)
+- 100 × 1536 ≈ 150K FLOPs → negligible on a modern CPU
+- SIMD acceleration from `TensorPrimitives`
+
+#### Phase 5: Final Answer (5-20s)
+- Streaming chat completion
+- Input: system prompt + context (3 × 500-char chunks ≈ 400 tokens) + question
+- Output: varies widely (typically 200-2000 tokens)
+- Longer context slightly increases latency
+- Model choice is the major factor:
+  - Qwen Flash: fast (5-10s for 1000 output tokens)
+  - Gemini Flash: moderate (10-15s)
+  - Llama-class: slower (20-40s)
+
+---
+
+## Throughput
+
+### Sequential Execution
+
+Running queries one after another (default CLI behavior):
+- Latency per query: 16-50s
+- Throughput: 1 query / 20s ≈ 180 queries/hour (theoretically)
+
+But API rate limits will kick in before that:
+- OpenRouter free tier: limited RPM/TPM
+- Even paid plans: soft limits
+
+### Concurrent Execution (Multiple OpenQuery Instances)
+
+You can run multiple OpenQuery processes in parallel (different terminals), but they share:
+- The same API key (OpenRouter rate limits are per API key, not per process)
+- The same SearxNG instance (which they could saturate)
+
+**Practical**: 3-5 concurrent processes before hitting diminishing returns or rate limits.
+
+### Throughput Optimization
+
+To maximize queries per hour:
+1. Use the fastest model (Qwen Flash)
+2. Reduce `--chunks` to 1-2
+3. Reduce `--queries` to 1
+4. Use a local/fast SearxNG
+5. Cache embedding results (not implemented)
+6. Batch multiple questions in one process (not implemented; would require a redesign)
+
+**Achievable**: Maybe 500-1000 queries/hour on a paid OpenRouter plan with aggressive settings.
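The Phase 4 ranking step above is ordinary cosine similarity over the chunk embeddings; here is a pure-Python sketch of the math (the real code is C# using `TensorPrimitives` for SIMD):

```python
import math

def cosine(a, b):
    # dot product divided by the product of the two vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=3):
    # Score every chunk against the query, keep the k best indices.
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: cosine(query_vec, chunk_vecs[i]),
                   reverse=True)
    return order[:k]

query = [1.0, 0.0]  # toy 2-dim vectors; the real ones are 1536-dim
chunks = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(top_k(query, chunks, k=2))  # [0, 2]
```

Ranking is O(C × d) overall, which is why it stays under 0.1s even for hundreds of chunks.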
+ +--- + +## Memory Usage + +### Baseline + +.NET 10 AOT app with dependencies: +- **Code**: ~30MB (AOT compiled native code) +- **Runtime**: ~20MB (.NET runtime overhead) +- **Base Memory**: ~50MB + +### Per-Query Memory + +| Component | Memory | Lifetime | +|-----------|--------|----------| +| Search results (15 items) | ~30KB | Pipeline | +| Articles (raw HTML) | ~5MB (transient) | Freed after parse | +| Articles (extracted text) | ~500KB | Until pipeline complete | +| Chunks (β‰ˆ100 items) | ~50KB text + embeddings 600KB | Until pipeline complete | +| Embeddings (100 Γ— 1536 floats) | ~600KB | Until pipeline complete | +| HTTP buffers | ~1MB per concurrent request | Short-lived | +| **Total per query** | **~2-5MB** (excluding base) | Released after complete | + +**Peak**: When all articles fetched but not yet embedded, we have text ~500KB + chunks ~650KB = ~1.2MB + overhead β‰ˆ 2-3MB. + +**If processing many queries in parallel** (unlikely for CLI), memory would scale linearly. + +### Memory Leak Risks + +- `HttpClient` instances: Created per `OpenRouterClient` and `SearxngClient`. Should be disposed (not happening). But short-lived process exits anyway. +- `StatusReporter` background task: Disposed via `using` +- `RateLimiter` semaphore: Disposed via `IAsyncDisposable` if wrapped in `using` (not currently, but short-lived) + +No major leaks observed. + +### Memory Optimization Opportunities + +1. **Reuse HttpClient** with `IHttpClientFactory` (but not needed for CLI) +2. **Stream article fetching** instead of buffering all articles before embedding (possible: embed as URLs complete) +3. **Early chunk filtering**: Discard low-quality chunks before embedding to reduce embedding count +4. **Cache embeddings**: By content hash, avoid re-embedding seen text (would need persistent storage) + +--- + +## Benchmarking + +### Methodology + +Measure with `time` command and verbose logging: + +```bash +time openquery -v "What is quantum entanglement?" 
2>&1 | tee log.txt +``` + +Parse log for timestamps (or add them manually by modifying code). + +### Sample Benchmark + +**Environment**: +- Linux x64, .NET 10 AOT +- SearxNG local Docker (localhost:8002) +- OpenRouter API (US East) +- Model: qwen/qwen3.5-flash-02-23 + +**Run 1**: +``` +real 0m23.4s +user 0m1.2s +sys 0m0.3s +``` +Log breakdown: +- Query generation: 3.2s +- Searches: 4.1s +- Article fetching: 8.7s (12 URLs) +- Embeddings: 2.8s (45 chunks) +- Final answer: 4.6s (325 tokens) + +**Run 2** (cached SearxNG results, same URLs): +``` +real 0m15.8s +``` +Faster article fetching (2.3s) because sites cached or faster second request. + +**Run 3** (verbose `-s` short answer): +``` +real 0m18.2s +``` +Final answer faster (2.1s instead of 4.6s) due to shorter output. + +### Benchmarking Tips + +1. **Warm up**: First run slower (JIT or AOT cold start). Discard first measurement. +2. **Network variance**: Run multiple times and average. +3. **Control variables**: Same question, same SearxNG instance, same network conditions. +4. **Measure API costs**: Check OpenRouter dashboard for token counts. +5. **Profile with dotTrace** or `perf` if investigating CPU bottlenecks. + +--- + +## Optimization Strategies + +### 1. 
Tune Concurrent Limits + +Edit `SearchTool.cs` where `_options` is created: + +```csharp +var _options = new ParallelProcessingOptions +{ + MaxConcurrentArticleFetches = 5, // ↓ from 10 + MaxConcurrentEmbeddingRequests = 2, // ↓ from 4 + EmbeddingBatchSize = 300 // ↑ or ↓ (rarely matters) +}; +``` + +**Why tune down?** +- Hit OpenRouter rate limits +- Network bandwidth saturated +- Too many concurrent fetches overwhelm target sites (ethical/scraping etiquette) + +**Why tune up?** +- Fast network, powerful CPU, no rate limits +- Many chunks (>500) needing parallel embedding batches + +**Monitor**: +- `openquery -v` shows embedding progress: `[Generating embeddings: batch X/Y]` +- If Y=1 (all fitted in one batch), batch size is fine +- If Y>1 and max concurrent = Y, you're using full parallelism + +### 2. Reduce Data Volume + +**Fewer search results**: +```bash +openquery -r 3 "question" # instead of 5 or 10 +``` +Effect: Fetches fewer URLs, extracts fewer chunks. Linear reduction in work. + +**Fewer queries**: +```bash +openquery -q 1 "question" +``` +Effect: One search instead of N. Quality may suffer (less diverse sources). + +**Fewer chunks**: +```bash +openquery -c 1 "question" +``` +Effect: Only top 1 chunk in context β†’ fewer tokens β†’ faster final answer, but may miss relevant info. + +**Chunk size** (compile-time constant): +Edit `ChunkingService.cs`: +```csharp +private const int MAX_CHUNK_SIZE = 300; // instead of 500 +``` +Effect: More chunks (more granular ranking) but each chunk shorter β†’ more chunks to rank, more embeddings to generate. Could increase or decrease total time. Likely more tokens overall (more chunks in context if `-c` is fixed number). + +### 3. Change Embedding Model + +Currently hardcoded to `openai/text-embedding-3-small`. 
Could use: +- `openai/text-embedding-3-large` (higher quality, slower, more expensive) +- `intfloat/multilingual-e5-large` (multilingual, smaller) + +Modify `EmbeddingService` constructor: +```csharp +public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model") +``` + +Then pass: +```csharp +var embeddingService = new EmbeddingService(client, "intfloat/multilingual-e5-large"); +``` + +**Impact**: Different dimensionality (1536 vs 1024 vs 4096). Memory scales with dim. Quality may vary for non-English queries. + +### 4. Caching + +**Current**: No caching. Every query hits all APIs. + +**Embedding cache** (by text hash): +- Could store in memory: `Dictionary<string, float[]>` +- Or disk: `~/.cache/openquery/embeddings/` +- Invalidation: embeddings are deterministic per model, so long-term cache viable + +**Search cache** (by query hash): +- Cache `List<SearxngResult>` for identical queries +- TTL: maybe 1 hour (search results change over time) + +**Article cache** (by URL hash): +- Cache `Article` (text content) per URL +- Invalidation: could check `Last-Modified` header or use TTL (1 day) + +**Implementation effort**: Medium. Would need cache abstraction (interface, in-memory + disk options). + +**Benefit**: Repeat queries (common in testing or similar questions) become instant. + +### 5. Parallelize More (Aggressive) + +**Currently**: +- Searches: unbounded (as many as `--queries`) +- Fetches: max 10 +- Embeddings: max 4 + +Could increase: +- Fetches to 20 or 50 (if network/CPU can handle) +- Embeddings to 8-16 (if OpenRouter rate limit allows) + +**Risk**: +- Overwhelming target sites (unethical scraping) +- API rate limits β†’ 429 errors +- Local bandwidth saturation + +### 6. 
Local Models (Self-Hosted) + +Replace OpenRouter with local LLM: +- **Query generation**: Could run tiny model locally (no API latency) +- **Embeddings**: Could run `all-MiniLM-L6-v2` locally (fast, free after setup) +- **Answer**: Could run Llama 3 8B locally (no cost, but slower than GPT-4/Gemini) + +**Benefits**: +- Zero API costs (after hardware) +- No network latency +- Unlimited queries + +**Drawbacks**: +- GPU required for decent speed (or CPU very slow) +- Setup complexity (Ollama, llama.cpp, vLLM, etc.) +- Model quality may lag behind commercial APIs + +**Integration**: Would need to implement local inference backends (separate project scope). + +--- + +## Scalability Limits + +### API Rate Limits + +**OpenRouter**: +- Free tier: Very limited (few RPM) +- Paid: Varies by model, but typical ~10-30 requests/second +- Embedding API has separate limits + +**Mitigation**: +- Reduce concurrency (see tuning) +- Add exponential backoff (already have for embeddings) +- Batch embedding requests (already done) + +### SearxNG Limits + +**Single instance**: +- Can handle ~10-50 QPS depending on hardware +- Upstream search engines may rate limit per instance +- Memory ~100-500MB + +**Mitigation**: +- Run multiple SearxNG instances behind load balancer +- Use different public instances +- Implement client-side rate limiting (currently only per-URL fetches limited, not searches) + +### Network Bandwidth + +**Typical data transfer**: +- Searches: 1KB per query Γ— 3 = 3KB +- Articles: 100-500KB per fetch Γ— 15 = 1.5-7.5MB (raw HTML) +- Extracted text: ~10% of HTML size = 150-750KB +- Embeddings: 100 chunks Γ— 1536 Γ— 4 bytes = 600KB (request + response) +- Final answer: 2-10KB + +**Total**: ~3-10MB per query + +**100 queries/hour**: ~300MB-1GB data transfer + +**Not an issue** for broadband, but could matter on metered connections. 
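The transfer numbers above are easy to recompute for other settings; a rough model using this section's estimates (all constants are the approximations quoted here, not measurements):

```python
def estimated_transfer_mb(queries=3, results=5, embedded_chunks=100,
                          avg_article_kb=300, dim=1536):
    search_kb = queries * 1                          # ~1KB of JSON per search
    article_kb = queries * results * avg_article_kb  # raw HTML dominates
    embedding_kb = embedded_chunks * dim * 4 / 1024  # float32 vectors, both directions
    answer_kb = 10
    return (search_kb + article_kb + embedding_kb + answer_kb) / 1024

print(round(estimated_transfer_mb(), 1))  # 5.0
```

With defaults this lands around 5MB, squarely inside the 3-10MB range quoted above; raw HTML is the dominant term, so `--results` is the main lever.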
+
+---
+
+## Scaling with Chunk Count
+
+Let:
+- C = number of chunks with valid embeddings
+- d = embedding dimension (1536)
+- B = embedding batch size (300)
+- P = max parallel embedding batches (4)
+
+**Embedding Time** ≈ `O(C/B * 1/P)` (batches divided by parallelism)
+
+**Ranking Time** ≈ `O(C * d)` (dot product per chunk)
+
+**Context Tokens** (for final answer) ≈ `k * avg_chunk_tokens`, where k = `--chunks` (≈ 500 chars = 125 tokens per chunk). This grows with C only if you raise `--chunks` to include more of the ranked chunks.
+
+**As C increases**:
+- Embedding time: linear in C/B (flat while everything fits in one batch)
+- Ranking time: linear in C
+- Final answer latency: unchanged unless `--chunks` also increases; more context tokens mean longer context processing and potentially a longer answer (more relevant chunks to synthesize)
+
+**Practical limit**:
+- With defaults, C ~ 50-100 (from 15 articles)
+- Could reach C ~ 500-1000 if:
+  - `--queries` = 10
+  - `--results` = 20 (200 URLs)
+  - Many long articles → many chunks each
+- At C = 1000:
+  - Embeddings: 1000/300 ≈ 4 batches; with 4 parallel, all run at once → time ≈ one batch duration
+  - But OpenRouter may have per-minute limits on embedding requests
+  - Ranking: 1000 × 1536 ≈ 1.5M FLOPs → still <0.01s
+  - Context tokens (only if you also set `-c 1000`): 1000 × 125 = 125K tokens! Many LLMs have 200K context, so it fits, but it is expensive and slow.
+
+**Conclusion**: Current defaults scale to C ~ 100-200 comfortably.
Beyond that:
+- Need to increase batch size or parallelism for embeddings
+- May hit embedding API rate limits
+- Context token count becomes expensive and may degrade answer quality (LLMs lose focus in very long contexts)
+
+---
+
+## Profiling
+
+### CPU Profiling
+
+Use `dotnet-trace` or `perf`:
+
+```bash
+# Collect a trace for 30 seconds while a query is running
+dotnet-trace collect --process-id $(pgrep OpenQuery) --duration 00:00:30 -o trace.nettrace
+
+# Analyze with Visual Studio or PerfView
+```
+
+Look for:
+- Hot methods: `ChunkingService.ChunkText`, `EmbeddingService.GetEmbeddingsAsync`, cosine similarity
+- Allocation hotspots
+
+### Memory Profiling
+
+```bash
+dotnet-gcdump collect -p <pid>
+# Open the .gcdump in Visual Studio, or: dotnet-gcdump report <file>.gcdump
+```
+
+Check heap size and object counts (look for large `string` objects from article content).
+
+### Network Profiling
+
+Use `tcpdump` or Wireshark:
+```bash
+tcpdump -i any port 8002 or port 443 -w capture.pcap
+```
+
+Or simpler: run `time` on individual curl commands to measure latency components.
+
+---
+
+## Next Steps
+
+- [Configuration](configuration.md) - Tune for your environment
+- [Troubleshooting](troubleshooting.md) - Diagnose slow performance
+- [Architecture](architecture.md) - Understand pipeline bottlenecks
+
+---
+
+**Quick Tuning Cheatsheet**
+
+```bash
+# Fast & cheap (factual Q&A)
+openquery -q 1 -r 3 -c 2 -s "What is X?"
+
+# Thorough (research)
+openquery -q 5 -r 10 -c 5 -l "Deep dive on X"
+```
+
+```csharp
+// Custom code edit for concurrency, in SearchTool.cs:
+_options = new ParallelProcessingOptions {
+    MaxConcurrentArticleFetches = 20,  // if the network can handle it
+    MaxConcurrentEmbeddingRequests = 8 // if the API allows
+};
+```
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
new file mode 100644
index 0000000..5c976b1
--- /dev/null
+++ b/docs/troubleshooting.md
@@ -0,0 +1,699 @@
+# Troubleshooting
+
+Solve common issues, errors, and performance problems with OpenQuery.
+
+## 📋 Table of Contents
+
+1.
[Common Errors](#common-errors) +2. [Performance Issues](#performance-issues) +3. [Debugging Strategies](#debugging-strategies) +4. [Getting Help](#getting-help) + +## Common Errors + +### ❌ "API Key is missing" + +**Error Message**: +``` +[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up. +``` + +**Cause**: No API key available from environment or config file. + +**Solutions**: + +1. **Set environment variable** (temporary): +```bash +export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" +``` + +2. **Configure interactively** (persistent): +```bash +openquery configure -i +# Follow prompts to enter API key +``` + +3. **Check config file**: +```bash +cat ~/.config/openquery/config +# Should contain: ApiKey=sk-or-... +``` + +4. **Verify environment**: +```bash +echo $OPENROUTER_API_KEY +# If empty, you didn't export or exported in wrong shell +``` + +--- + +### ❌ "Network request failed" + +**Error Message**: +``` +[Error] Network request failed. Details: Name or service not known +``` + +**Cause**: Cannot reach OpenRouter or SearxNG API endpoints. + +**Solutions**: + +1. **Check internet connectivity**: +```bash +ping 8.8.8.8 +curl https://openrouter.ai +``` + +2. **Verify SearxNG is running**: +```bash +curl "http://localhost:8002/search?q=test&format=json" +# Should return JSON +``` + +If connection refused: +```bash +# Start SearxNG if using Docker +docker start searxng +# Or run fresh +docker run -d --name searxng -p 8002:8080 searxng/searxng:latest +``` + +3. **Check firewall/proxy**: +```bash +# Test OpenRouter API +curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \ + https://openrouter.ai/api/v1/models +``` + +4. **Test from different network** (if behind restrictive firewall) + +--- + +### ❌ "No search results found" + +**Error Message**: +``` +No search results found. +``` + +**Cause**: Search queries returned zero results from SearxNG. + +**Solutions**: + +1. 
**Test SearxNG manually**: +```bash +curl "http://localhost:8002/search?q=test&format=json" | jq '.results | length' +# Should be > 0 +``` + +2. **Check SearxNG configuration**: +- If self-hosted: ensure internet access is enabled in `/etc/searxng/settings.yml` +- Some public instances disable certain engines or have rate limits + +3. **Try a different SearxNG instance**: +```bash +export SEARXNG_URL="https://searx.example.com" +openquery "question" +``` + +4. **Use simpler queries**: Some queries may be too obscure or malformed + +5. **Verbose mode to see queries**: +```bash +openquery -v "complex question" +# See what queries were generated +``` + +--- + +### ❌ "Found search results but could not extract readable content." + +**Cause**: SearxNG returned results but `ArticleService` failed to extract content from all URLs. + +**Common Reasons**: +- JavaScript-heavy sites (React, Vue apps) where content loaded dynamically +- Paywalled sites (NYT, academic journals) +- PDFs or non-HTML content +- Malformed HTML +- Server returned error (404, 403, 500) +- `robots.txt` blocked crawler + +**Solutions**: +1. **Accept that some sites can't be scraped** - try different query to get different results +2. **Use site:reddit.com or site:wikipedia.org** - these are usually scrape-friendly +3. **Increase `--results`** to get more URLs (some will work) +4. **Check verbose output**: +```bash +openquery -v "question" +# Look for "Warning: Failed to fetch article" +``` +5. **Try a local SearxNG instance with more engines** - some engines fetch different sources + +--- + +### ❌ Rate Limiting (429 Too Many Requests) + +**Symptoms**: +```bash +[Error] Response status code does not indicate success: 429 (Too Many Requests). +``` + +Or retries exhausting after Polly attempts. + +**Cause**: Too many concurrent requests to OpenRouter API. + +**Solutions**: + +1. 
**Reduce concurrency** (edit `SearchTool.cs`): +```csharp +var _options = new ParallelProcessingOptions +{ + MaxConcurrentArticleFetches = 5, // reduce from 10 + MaxConcurrentEmbeddingRequests = 2, // reduce from 4 + EmbeddingBatchSize = 150 // reduce from 300 +}; +``` + +2. **Add delay** between embedding batches (custom implementation) + +3. **Upgrade OpenRouter plan** to higher rate limits + +4. **Wait and retry** - rate limits reset after time window + +--- + +### ❌ Slow Performance + +**Symptom**: Queries take 60+ seconds when they usually take 20s. + +**Diagnosis Steps**: + +1. **Run with verbose mode**: +```bash +openquery -v "question" +``` +Watch which phase takes longest: +- Query generation? +- Searching? +- Fetching articles? +- Embeddings? + +2. **Check network latency**: +```bash +time curl "https://openrouter.ai/api/v1/models" +time curl "http://localhost:8002/search?q=test&format=json" +``` + +**Common Causes & Fixes**: + +| Phase | Cause | Fix | +|-------|-------|-----| +| Searches | SearxNG overloaded/slow | Check CPU/memory, restart container | +| Fetching | Target sites slow | Reduce `--results` to fewer URLs | +| Embeddings | API rate limited | Reduce concurrency (see above) | +| Answer | Heavy model/load | Switch to faster model (e.g., Qwen Flash) | + +3. **Resource monitoring**: +```bash +htop # CPU/memory usage +iftop # network throughput +``` + +4. **Reduce parameters**: +```bash +openquery -q 2 -r 3 -c 2 "question" # lighter load +``` + +--- + +### ❌ Out of Memory + +**Symptoms**: +- Process killed by OOM killer (Linux) +- `System.OutOfMemoryException` +- System becomes unresponsive + +**Cause**: Processing too many large articles simultaneously. + +**Why**: Each article can be 100KB+ of text, split into many chunks, embeddings are 6KB per chunk (1536 floats Γ— 4 bytes). 200 chunks = 1.2MB embeddings, plus text ~100KB = 1.3MB. Not huge, but many large articles could create thousands of chunks. + +**Solutions**: + +1. 
**Reduce `--results`** (fewer URLs per query): +```bash +openquery -r 3 "question" # instead of 10 +``` + +2. **Reduce `--queries`** (fewer search queries): +```bash +openquery -q 2 "question" +``` + +3. **Fetches already limited** to 10 concurrent by default, which is reasonable + +4. **Check article size**: Some sites (PDFs, long documents) may yield megabytes of text; SmartReader should truncate but may not + +--- + +### ❌ Invalid JSON from Query Generation + +**Symptom**: Query generation fails silently, falls back to original question. + +**Cause**: LLM returned non-JSON (even though instructed). Could be: +- Model not instruction-following +- Output exceeded context window +- API error in response + +**Detection**: Run with `-v` to see: +``` +[Failed to generate queries, falling back to original question. Error: ...] +``` + +**Solutions**: +- Try a different model (configure to use Gemini or DeepSeek) +- Reduce `--queries` count (simpler task) +- Tune system prompt (would require code change) +- Accept fallback - the original question often works as sole query + +--- + +### ❌ Spinner Artifacts in Output + +**Symptom**: When redirecting output to file, you see weird characters like `β ‹`, `οΏ½`, etc. + +**Cause**: Spinner uses Unicode Braille characters and ANSI escape codes. + +**Fix**: Use `2>/dev/null | sed 's/.\x08//g'` to clean: +```bash +openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md +``` + +Or run with `--verbose` (no spinner, only newline-separated messages): +```bash +openquery -v "question" > answer.txt +``` + +--- + +### ❌ "The type or namespace name '...' does not exist" (Build Error) + +**Cause**: Missing NuGet package or wrong .NET SDK version. + +**Solution**: + +1. **Verify .NET SDK 10.0**: +```bash +dotnet --version +# Should be 10.x +``` + +If lower: https://dotnet.microsoft.com/download/dotnet/10.0 + +2. **Restore packages**: +```bash +dotnet restore +``` + +3. 
**Clean and rebuild**: +```bash +dotnet clean +dotnet build +``` + +4. **Check OpenQuery.csproj** for package references: +```xml +<PackageReference Include="Polly.Core" Version="8.6.6" /> +<PackageReference Include="Polly.RateLimiting" Version="8.6.6" /> +<PackageReference Include="SmartReader" Version="0.11.0" /> +<PackageReference Include="System.CommandLine" Version="2.0.0-beta4.22272.1" /> +<PackageReference Include="System.Numerics.Tensors" Version="9.0.0" /> +``` + +If restore fails, these packages may not be available for .NET 10 preview. Consider: +- Downgrade to .NET 8.0 (if packages incompatible) +- Or find package versions compatible with .NET 10 + +--- + +### ❌ AOT Compilation Fails + +**Error**: `error NETSDK1085: The current .NET SDK does not support targeting .NET 10.0.` + +**Cause**: Using .NET SDK older than 10.0. + +**Fix**: Install .NET SDK 10.0 preview. + +**Or**: Disable AOT for development (edit `.csproj`): +```xml +<!-- Remove or set to false --> +<PublishAot>false</PublishAot> +``` + +--- + +## Performance Issues + +### Slow First Request + +**Expected**: First query slower (JIT compilation for .NET runtime if not AOT, or initial API connections). + +If not using AOT: +- Consider publishing with `/p:PublishAot=true` for production distribution +- Development builds use JIT, which adds 500ms-2s warmup + +**Mitigation**: Accept as warmup cost, or pre-warm with dummy query. 
+
+---
+
+### High Memory Usage
+
+**Check**:
+```bash
+ps aux | grep OpenQuery
+# Look at RSS (resident set size)
+```
+
+**Typical**: 50-200MB (including the .NET runtime, AOT code, and data structures)
+
+**If >500MB**:
+- Likely processing very many articles
+- Check `--results` and `--queries` values
+- Use `--verbose` to see counts: `[Fetched X search results]`, `[Extracted Y chunks]`
+
+**Reduce**:
+- `--queries 2` instead of 10
+- `--results 3` instead of 15
+- These directly limit the number of URLs to fetch
+
+---
+
+### High CPU Usage
+
+**Cause**:
+- SmartReader HTML parsing (CPU-bound)
+- Cosine similarity calculations (many chunks, but usually fast)
+- Spinner animation (negligible)
+
+**Check**: `htop` → which core is at 100%? A single core suggests parsing; all cores suggest parallel fetching.
+
+**Mitigation**:
+- Ensure `MaxConcurrentArticleFetches` is not excessively high (the default of 10 is fine)
+- Accept it: CPU spikes are normal during the fetch phase
+
+---
+
+### API Costs Higher Than Expected
+
+**Symptom**: OpenRouter dashboard shows high token usage.
+
+**Causes**:
+1. Using an expensive model (check `OPENROUTER_MODEL`)
+2. High `--chunks` → more tokens in context
+3. High `--queries` + `--results` → many articles → many embedding tokens (usually cheap)
+4. Long answers (many completion tokens), especially with `--long`
+
+**Mitigation**:
+- Use `qwen/qwen3.5-flash-02-23` (cheapest good option)
+- Reduce `--chunks` to 2-3
+- Use `--short` when a detailed answer is not needed
+- Set `MaxTokens` in the request (would need a code change)
+
+---
+
+## Debugging Strategies
+
+### 1. Enable Verbose Mode
+
+Always start with:
+```bash
+openquery -v "question" 2>&1 | tee debug.log
+```
+
+Logs everything:
+- Generated queries
+- URLs fetched
+- Progress counts
+- Errors/warnings
+
+**Analyze the log**:
+- How many queries were generated? (Should match `--queries`)
+- How many search results per query? (Should be ≤ `--results`)
+- How many articles fetched successfully?
+- How many chunks extracted? +- Any warnings? + +--- + +### 2. Isolate Components + +**Test SearxNG**: +```bash +curl "http://localhost:8002/search?q=test&format=json" | jq '.results[0]' +``` + +**Test OpenRouter API**: +```bash +curl -X POST https://openrouter.ai/api/v1/chat/completions \ + -H "Authorization: Bearer $OPENROUTER_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"model":"qwen/qwen3.5-flash-02-23","messages":[{"role":"user","content":"Hello"}]}' +``` + +**Test Article Fetching** (with known good URL): +```bash +curl -L "https://example.com/article" | head -50 +``` +Then check if SmartReader can parse. + +--- + +### 3. Reduce Scope + +Test with minimal parameters to isolate failing phase: + +```bash +# 1 query, 2 results, 1 chunk - should be fast and simple +openquery -q 1 -r 2 -c 1 "simple test question" -v + +# If that works, gradually increase: +openquery -q 1 -r 5 -c 1 "simple question" +openquery -q 3 -r 5 -c 1 "simple question" +openquery -q 3 -r 5 -c 3 "simple question" + +# Then try complex question +``` + +--- + +### 4. Check Resource Limits + +**File descriptors**: If fetching many articles, may hit limit. +```bash +ulimit -n # usually 1024, should be fine +``` + +**Memory**: Monitor with `free -h` while running. + +**Disk space**: Not much disk use, but logs could fill if verbose mode used repeatedly. + +--- + +### 5. Examine Config File + +```bash +cat ~/.config/openquery/config +# Ensure no spaces around '=' +# Correct: ApiKey=sk-or-... +# Wrong: ApiKey = sk-or-... (spaces become part of value) +``` + +Reconfigure if needed: +```bash +openquery configure --key "sk-or-..." +``` + +--- + +### 6. Clear Cache / Reset + +No persistent cache exists, but: +- Re-start SearxNG container: `docker restart searxng` +- Clear DNS cache if network issues: `sudo systemd-resolve --flush-caches` + +--- + +## Getting Help + +### Before Asking + +Gather information: + +1. **OpenQuery version** (commit or build date if available) +2. 
**OS and architecture**: `uname -a` (Linux/macOS) or `systeminfo` (Windows) +3. **Full command** you ran +4. **Verbose output**: `openquery -v "question" 2>&1 | tee log.txt` +5. **Config** (redact API key): +```bash +sed 's/ApiKey=.*/ApiKey=REDACTED/' ~/.config/openquery/config +``` +6. **SearxNG test**: +```bash +curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length' +``` +7. **OpenRouter test**: +```bash +curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \ + https://openrouter.ai/api/v1/models | jq '.data[0].id' +``` + +--- + +### Where to Ask + +1. **GitHub Issues** (if repository hosted there): + - Search existing issues first + - Provide all info from above + - Include log file (or link to gist) + +2. **Community Forum** (if exists) + +3. **Self-Diagnose**: + - Check `docs/troubleshooting.md` (this file) + - Check `docs/configuration.md` + - Check `docs/usage.md` + +--- + +### Example Bug Report + +``` +Title: OpenQuery hangs on "Fetching article X/Y" + +Platform: Ubuntu 22.04, .NET 10.0, OpenQuery built from commit abc123 +Command: openquery -v "What is Docker?" 2>&1 | tee log.txt + +Verbose output shows: +[...] 
+[Fetching article 1/15: docker.com]
+[Fetching article 2/15: hub.docker.com]
+[Fetching article 3/15: docs.docker.com]
+# Hangs here indefinitely, no more progress
+
+SearxNG test:
+$ curl "http://localhost:8002/search?q=docker&format=json" | jq '.results | length'
+15  # SearxNG works
+
+Config:
+ApiKey=sk-or-xxxx (redacted)
+Model=qwen/qwen3.5-flash-02-23
+DefaultQueries=3
+DefaultChunks=3
+DefaultResults=5
+
+Observation:
+- Fetches 3 articles fine, then stalls
+- Nothing in log after "Fetching article 3/15"
+- Process uses ~150MB memory, CPU 0% (idle)
+- Ctrl+C exits immediately
+
+Expected: Should fetch remaining 12 articles (up to 10 concurrently)
+Actual: Only 3 fetched, then silent hang
+```
+
+---
+
+## Known Issues
+
+### Issue: Spinner Characters Not Displaying
+
+Some terminals don't support Braille Unicode patterns.
+
+**Symptoms**: The spinner shows as `?` or boxes.
+
+**Fix**: Use a font with Unicode support, or disable the spinner by setting `TERM=dumb` or using `--verbose`.
+
+---
+
+### Issue: Progress Messages Overwritten
+
+In very fast operations, progress updates may overlap.
+
+**Cause**: `StatusReporter` uses `Console.Write` without a lock in compact mode; concurrent writes from the channel processor and the spinner task could interleave.
+
+**Mitigation**: Unlikely in practice (the channel serializes updates, and the spinner only redraws when `_currentMessage` is set). If it becomes a problem, add a lock around console operations.
+
+---
+
+### Issue: Articles with No Text Content
+
+Some URLs return articles with an empty `TextContent`.
+
+**Cause**: SmartReader's quality heuristic (`IsReadable`) failed, or the article truly has no text (an image, a script, an error page).
+
+**Effect**: Those URLs contribute zero chunks.
+
+**Acceptable**: This is part of normal operation; not all URLs yield readable content.
+
+---
+
+### Issue: Duplicate Sources in Answer
+
+The same website may appear multiple times (as different articles).
+
+**Cause**: URLs from different search results may come from the same domain but point to different pages.
+
+**Effect**: `[Source 1]` and `[Source 3]` could both be `example.com`. Not necessarily bad - they're different articles.
+
+---
+
+## Performance Tuning Reference
+
+| Setting | Default | Fastest | Most Thorough | Notes |
+|---------|---------|---------|---------------|-------|
+| `--queries` | 3 | 1 | 8+ | More queries = more searches |
+| `--results` | 5 | 2 | 15+ | More results = more articles to fetch |
+| `--chunks` | 3 | 1 | 5+ | More chunks = more context tokens |
+| `MaxConcurrentArticleFetches` | 10 | 5 | 20 | Higher = more parallel fetches |
+| `MaxConcurrentEmbeddingRequests` | 4 | 2 | 8 | Higher = faster embeddings (may hit rate limits) |
+| `EmbeddingBatchSize` | 300 | 100 | 1000 | Larger = fewer API calls, more data per call |
+
+**Start**: The defaults are balanced.
+
+**Adjust if**:
+- Slow: Reduce `--results`, `--queries`, or the concurrency limits
+- Poor quality: Increase `--chunks`, `--results`, `--queries`
+- Rate limited: Reduce the concurrency limits
+- High cost: Use `--short`, reduce `--chunks`, choose a cheaper model
+
+---
+
+## Next Steps
+
+- [Performance](performance.md) - Detailed performance analysis
+- [Configuration](configuration.md) - Adjust settings
+- [Usage](usage.md) - Optimize your workflow
+
+---
+
+**Quick Diagnostic Checklist**
+
+```bash
+# 1. Check API key
+echo $OPENROUTER_API_KEY | head -c 10
+
+# 2. Test SearxNG
+curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
+
+# 3. Test OpenRouter
+curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
+  https://openrouter.ai/api/v1/models | jq '.data[0].id'
+
+# 4. Run verbose
+openquery -v "test" 2>&1 | grep -E "Fetching|Generated|Found"
+
+# 5. Check resource usage while running
+htop
+
+# 6. 
Reduce scope and retry
+openquery -q 1 -r 2 -c 1 "simple test"
+```
diff --git a/docs/usage.md b/docs/usage.md
new file mode 100644
index 0000000..e89cdb5
--- /dev/null
+++ b/docs/usage.md
@@ -0,0 +1,483 @@
+# Usage Guide
+
+Complete reference for using the OpenQuery command-line interface.
+
+## 📋 Table of Contents
+
+1. [Basic Usage](#basic-usage)
+2. [Command Reference](#command-reference)
+3. [Examples](#examples)
+4. [Output Format](#output-format)
+5. [Tips and Tricks](#tips-and-tricks)
+
+## Basic Usage
+
+### Simplest Form
+```bash
+openquery "your question here"
+```
+
+That's it! OpenQuery will:
+1. Generate search queries
+2. Search the web
+3. Extract relevant content
+4. Stream an answer with sources
+
+### Common Pattern
+```bash
+openquery [OPTIONS] "your question"
+```
+
+Quotes around the question are recommended to preserve spaces.
+
+## Command Reference
+
+### Main Command
+
+#### `openquery [options] <question>`
+
+Ask a question and get an AI-powered answer with citations.
+
+**Arguments**:
+- `question` (positional, one or more words) - The question to ask
+
+**Options**:
+
+| Option | Aliases | Type | Default | Description |
+|--------|---------|------|---------|-------------|
+| `--chunks` | `-c` | int | 3 (from config) | Number of top relevant content chunks to include in context |
+| `--results` | `-r` | int | 5 (from config) | Number of search results to fetch per generated query |
+| `--queries` | `-q` | int | 3 (from config) | Number of search queries to generate from your question |
+| `--short` | `-s` | bool | false | Request a concise, to-the-point answer |
+| `--long` | `-l` | bool | false | Request a detailed, comprehensive answer |
+| `--verbose` | `-v` | bool | false | Show detailed progress information and debug output |
+
+**Behavior**:
+- If neither `--short` nor `--long` is given, the answer length is balanced
+- If both `--short` and `--long` are specified, `--long` takes precedence
+- Command-line options override configuration file defaults
+
+#### `openquery configure [options]`
+
+Configure OpenQuery settings (API key, model, defaults).
+
+**Options**:
+
+| Option | Type | Description |
+|--------|------|-------------|
+| `--interactive` / `-i` | bool | Launch interactive configuration wizard |
+| `--key` | string | Set the OpenRouter API key |
+| `--model` | string | Set the default model |
+| `--queries` | int? | Set default number of queries |
+| `--chunks` | int? | Set default number of chunks |
+| `--results` | int? | Set default number of results |
+
+**Examples**:
+```bash
+# Interactive wizard
+openquery configure -i
+
+# Set just the API key
+openquery configure --key "sk-or-..."
+
+# Set multiple defaults non-interactively
+openquery configure --model "deepseek/deepseek-v3.2" --queries 5 --chunks 4
+```
+
+**Note**: Options with `?` are nullable; only the provided values are updated.
+
+## Examples
+
+### Everyday Queries
+
+**Simple factual question**:
+```bash
+openquery "What is the speed of light?"
+```
+
+**Multi-word question**:
+```bash
+openquery "How do solar panels work?"
+```
+
+**Question with special characters**:
+```bash
+openquery "What's the weather in New York?"
+```
+
+### Customizing Output
+
+**Get a quick answer**:
+```bash
+openquery -s "Who is the CEO of Tesla?"
+```
+Output: "Elon Musk is the CEO of Tesla." (minimal explanation)
+
+**Get detailed analysis**:
+```bash
+openquery -l "Explain how nuclear fusion works"
+```
+Output: A multi-paragraph explanation with scientific detail
+
+**See everything**:
+```bash
+openquery -v "What is machine learning?"
+```
+Output: Shows all progress messages alongside the answer
+
+### Adjusting Search Depth
+
+**Minimal search** (fast, cheap):
+```bash
+openquery -q 1 -r 2 -c 1 "What time is it in London?"
+```
+- 1 generated query
+- 2 results per query
+- 1 context chunk
+
+**Thorough research** (slow, comprehensive):
+```bash
+openquery -q 8 -r 15 -c 5 "History and applications of cryptography"
+```
+- 8 diverse queries
+- 15 results each
+- 5 top chunks
+
+**Balanced (recommended defaults)**:
+```bash
+openquery "Latest advancements in CRISPR technology"
+```
+- 3 queries
+- 5 results each
+- 3 top chunks
+
+### Combining Options
+
+**Verbose custom search**:
+```bash
+openquery -v -q 5 -r 10 -c 4 "What are the ethical implications of AI?"
+```
+
+**Short answer with more context**:
+```bash
+openquery -s -c 5 "Python vs JavaScript for web development"
+```
+
+**Long answer, lots of research**:
+```bash
+openquery -l -q 10 -r 20 -c 6 "Complete guide to quantum computing"
+```
+
+### Practical Use Cases
+
+**News and Current Events**:
+```bash
+openquery "Latest developments in the Ukraine conflict"
+```
+
+**Technical Questions**:
+```bash
+openquery "How to set up a PostgreSQL replication cluster"
+```
+
+**Health Information** (verify with a doctor!):
+```bash
+openquery "What are the symptoms of vitamin D deficiency?"
+```
+
+**Cooking**:
+```bash
+openquery "How to make authentic Italian pizza dough"
+```
+
+**Travel**:
+```bash
+openquery "Best things to do in Tokyo in spring"
+```
+
+**Programming**:
+```bash
+openquery "Rust vs Go for backend development in 2025"
+```
+
+### Configuration Examples
+
+**Set up for the first time**:
+```bash
+openquery configure -i
+# Follow prompts to enter API key, choose model, set defaults
+```
+
+**Switch to a different model**:
+```bash
+openquery configure --model "google/gemini-3-flash-preview"
+```
+
+**Update default number of queries**:
+```bash
+openquery configure --queries 5
+```
+
+**Set cost-effective defaults**:
+```bash
+openquery configure --model "qwen/qwen3.5-flash-02-23" --queries 2 --chunks 2 --results 3
+```
+
+**Check your configuration**:
+```bash
+cat ~/.config/openquery/config
+```
+
+## Output Format
+
+### Standard Output (Streaming)
+
+The answer streams in real-time, character by character, like this:
+
+```
+⠋ Generating search queries... (spinner with status)
+⠹ Searching web...
+⠸ Fetching articles...
+⠼ Processing embeddings...
+⠴ Generating answer...
+Assistant: Quantum entanglement is a phenomenon where pairs or groups of
+particles interact in ways such that the quantum state of each particle
+cannot be described independently of the others, even when separated by
+large distances.
+
+[Source 1: Understanding Quantum Mechanics](https://example.com/quantum)
+[Source 2: Quantum Physics Overview](https://example.com/physics)
+```
+
+### Verbose Mode Output (`-v`)
+
+When `--verbose` is enabled, you see detailed progress:
+
+```
+[Generating 3 search queries based on your question...]
+[Generated queries:
+  1. quantum entanglement definition
+  2. how quantum entanglement works
+  3. quantum entanglement Bell's theorem
+]
+[Searching web for 'quantum entanglement definition'...]
+[Searching web for 'how quantum entanglement works'...]
+[Searching web for 'quantum entanglement Bell's theorem'...]
+[Fetched 15 search results total]
+[Fetching article 1/12: physicsworld.com]
+[Fetching article 2/12: nature.com]
+...
+[Fetching article 12/12: scientificamerican.com]
+[Extracted 48 content chunks]
+[Generating embeddings: batch 1/4]
+[Generating embeddings: batch 2/4]
+[Generating embeddings: batch 3/4]
+[Generating embeddings: batch 4/4]
+[Ranked chunks by relevance]
+[Found top 3 most relevant chunks overall. Generating answer...]
+
+Assistant: Quantum entanglement is a fundamental phenomenon in quantum
+mechanics where...
+```
+
+### Source Citations
+
+Sources are formatted as markdown links in the answer:
+```
+[Source 1: Article Title](https://example.com/article)
+```
+
+These appear inline where the AI references that source. Multiple sources can be cited in a single paragraph.
+
+### Error Output
+
+Errors are written to stderr, and the program exits with a non-zero status:
+
+```
+[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i'.
+```
+
+## Tips and Tricks
+
+### Speed Tips
+
+1. **Reduce concurrency limits** (edit `SearchTool.cs` if constantly rate-limited)
+2. **Reduce `--results`** - fewer articles to fetch and process
+3. **Reduce `--queries`** - fewer parallel searches
+4. **Use local SearxNG** - minimize network latency to the search backend
+5. **Cache results** - a future enhancement could add caching
+
+### Quality Tips
+
+1. **Increase `--chunks`** to 4-5 for complex topics
+2. **Increase `--queries`** to 5-8 for broad exploration
+3. **Use `--long`** for deep topics that need elaboration
+4. **Check `-v` output** to see which sources were selected
+5. **Try different models** - some are better at synthesis, others at facts
+
+### Cost Tips
+
+1. **Use `qwen/qwen3.5-flash-02-23`** - cheapest good model
+2. **Reduce `--chunks` and `--results`** - fewer tokens in context
+3. **Use `--short`** - shorter answers use fewer completion tokens
+4. 
**Monitor usage** at [openrouter.ai](https://openrouter.ai) dashboard + +### Workflow Tips + +**Iterative deepening**: +```bash +# Start broad +openquery -v "machine learning" + +# Identify subtopics from answer, then dive deeper +openquery "What is transformer architecture in LLMs?" +``` + +**Compare answers**: +```bash +# Same question with different models +OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23" openquery "question" +OPENROUTER_MODEL="google/gemini-3-flash-preview" openquery "question" +``` + +**Save answers**: +```bash +openquery "What is Docker?" > answer.md +# answer.md will contain the streamed output (including spinner chars, so filter): +openquery "What is Docker?" 2>/dev/null | sed 's/.\x08//g' > clean-answer.md +``` + +### Shell Aliases and Functions + +Add to `~/.bashrc` or `~/.zshrc`: + +```bash +# Short alias +alias oq='openquery' + +# With common options +alias oql='openquery -l -q 5 -r 10' # long, thorough +alias oqs='openquery -s' # short +alias oqv='openquery -v' # verbose + +# Function to save output cleanly +oqsave() { + openquery "$@" 2>/dev/null | sed 's/.\x08//g' > "answer-$(date +%Y%m%d-%H%M%S).md" +} +``` + +### Scripting + +```bash +#!/bin/bash +# batch-questions.sh + +while IFS= read -r question; do + echo "## $question" >> research.md + echo "" >> research.md + openquery -l "$question" 2>/dev/null | sed 's/.\x08//g' >> research.md + echo "" >> research.md +done < questions.txt +``` + +### Chaining with Other Tools + +Pipe to `jq` (if you modify to output JSON): +```bash +# Future: openquery --json "question" | jq '.answer' +``` + +Pipe to `pbcopy` (macOS) or `xclip` (Linux): +```bash +openquery "quick fact" 2>/dev/null | sed 's/.\x08//g' | pbcopy +``` + +Filter sources: +```bash +openquery "topic" 2>/dev/null | sed 's/.\x08//g' | grep -E '^\[Source' +``` + +## Keyboard Interrupts + +- **Ctrl+C** during processing: Cancels current operation, exits gracefully +- **Ctrl+C** during streaming answer: Stops streaming, shows partial answer 
+- **Ctrl+Z** (suspend): Not recommended; may leave background tasks running
+
+OpenQuery uses proper cancellation tokens to clean up resources on interrupt.
+
+## Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| 0 | Success - answer was generated |
+| 1 | Error - see stderr message |
+| 2 | Configuration error (missing API key) |
+
+You can check the exit code in shell scripts:
+```bash
+openquery "question"
+if [ $? -eq 0 ]; then
+  echo "Success!"
+else
+  echo "Failed"
+fi
+```
+
+## Limitations and Workarounds
+
+### Question Length
+Very long questions (>2000 chars) may exceed the LLM's context window or hit token limits.
+
+**Workaround**: Keep questions concise; ask complex multi-part questions as separate queries.
+
+### Answer Length Limits
+The LLM may hit `max_tokens` limits for very complex questions.
+
+**Workaround**: Use the `--long` flag (it already maximizes the allowed tokens) or break the question into sub-questions.
+
+### Rate Limiting
+OpenRouter may rate-limit you if you send too many requests too quickly.
+
+**Symptoms**: 429 errors, occasional timeouts.
+
+**Workaround**: The built-in retry (Polly) handles this automatically. For persistent issues:
+- Reduce concurrency (edit code)
+- Add delays between queries
+- Upgrade your OpenRouter plan
+
+### SearxNG Timeouts
+Large SearxNG responses or slow targets may time out.
+
+**Workaround**: Reduce `--results` or check the SearxNG logs. There is nothing to change on the OpenQuery side (the HTTP client timeout defaults to ~30s).
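+
+To check whether SearxNG itself is the slow link, curl's built-in timing variables can measure a single search round-trip. A diagnostic sketch, assuming the default local instance URL:
+
+```bash
+# Time one SearxNG query; the response body is discarded
+curl -o /dev/null -s \
+  -w 'total: %{time_total}s connect: %{time_connect}s\n' \
+  "http://localhost:8002/search?q=test&format=json"
+```
+
+If `total` regularly approaches the ~30s client timeout while `connect` stays low, the bottleneck is the search backend rather than OpenQuery.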
+ +--- + +## Next Steps + +- [Architecture](architecture.md) - Understand how OpenQuery works under the hood +- [Configuration](configuration.md) - Fine-tune your setup +- [Troubleshooting](troubleshooting.md) - Solve common problems + +--- + +**Quick Reference Card** + +```bash +# Basic +openquery "question" + +# Quick fact +openquery -s "question" + +# Deep research +openquery -l -q 5 -r 10 -c 4 "question" + +# See progress +openquery -v "question" + +# Configure +openquery configure -i + +# Check config +cat ~/.config/openquery/config +```
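+
+# Save a clean answer (strips backspace-erased spinner characters)
+openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md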