docs: add comprehensive documentation with README and detailed guides
- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
README.md (new file, 196 lines)
# OpenQuery

**AI-powered search and answer system** that finds accurate, well-sourced answers to your questions by searching the web, extracting relevant content, and synthesizing intelligent responses.

![.NET](https://img.shields.io/badge/.NET-10.0-blue)
![AOT](https://img.shields.io/badge/AOT-Compiled-green)
![License](https://img.shields.io/badge/license-MIT-green)
## ✨ Features

- 🤖 **Smart Query Generation** - Automatically creates multiple diverse search queries from your question
- ⚡ **Parallel Processing** - Fast concurrent searches, article fetching, and embedding generation
- 🎯 **Semantic Search** - Uses vector embeddings to find the most relevant information
- 📚 **Clean Article Extraction** - Intelligently extracts article content using SmartReader
- 🔄 **Streaming Responses** - Watch the AI answer generate in real-time
- ⚙️ **Fully Configurable** - Control queries, results, and context chunks
- 🛡️ **Production Ready** - Built with rate limiting, retries, and error handling
## 🚀 Quick Start

### 1. Prerequisites

- A **SearxNG** instance (Docker recommended):

```bash
docker run -d --name searxng -p 8002:8080 searxng/searxng:latest
```

- An **OpenRouter API key** from [openrouter.ai](https://openrouter.ai)
### 2. Installation

```bash
# Clone and build
git clone <your-repo-url>
cd OpenQuery
chmod +x install.sh
./install.sh

# Or build manually
dotnet publish -c Release -r linux-x64 --self-contained true /p:PublishAot=true
```
### 3. Configuration

```bash
# Interactive setup
openquery configure -i

# Or set environment variables
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"  # default
```
### 4. Ask a Question

```bash
openquery "What is quantum entanglement and how does it work?"
```

That's it! The system will:

1. Generate 3 search queries (configurable)
2. Search the web via SearxNG
3. Extract and chunk relevant articles
4. Rank content by semantic relevance
5. Stream a comprehensive answer with citations
## 📖 Usage Examples

```bash
# Concise answer
openquery -s "Who won the 2024 US presidential election?"

# Detailed research
openquery -l -q 5 -r 10 "Explain quantum computing and its applications"

# See everything
openquery -v "What are the health benefits of meditation?"

# Customize
openquery -c 5 -r 8 "Current state of SpaceX Starship development"
```
## 🔧 Options

```
-c, --chunks N     Number of top context chunks (default: 3)
-r, --results N    Search results per query (default: 5)
-q, --queries N    Number of search queries to generate (default: 3)
-s, --short        Give a concise answer
-l, --long         Give a detailed answer
-v, --verbose      Show detailed progress
```
## 🌐 Supported Models

OpenQuery works with any OpenRouter model. Popular choices:

- `qwen/qwen3.5-flash-02-23` (default, fast & affordable)
- `google/gemini-3-flash-preview`
- `deepseek/deepseek-v3.2`
- `moonshotai/kimi-k2.5`

Configure your preferred model:

```bash
openquery configure --model "google/gemini-3-flash-preview"
```
## 📁 Project Structure

```
OpenQuery/
├── README.md            # This file
├── docs/                # Detailed documentation
│   ├── installation.md
│   ├── configuration.md
│   ├── usage.md
│   ├── architecture.md
│   ├── components/
│   └── troubleshooting.md
├── Program.cs           # CLI entry point
├── OpenQuery.cs         # Main application logic
├── Services/            # Business logic services
├── Models/              # Data models
├── Tools/               # Search orchestration
└── ConfigManager.cs     # Configuration management
```
## 🏗️ Architecture

OpenQuery uses a multi-stage pipeline:

```
Query → Multiple Searches → Article Fetching → Embeddings → Ranking → AI Answer
```

1. **Query Expansion**: LLM generates diverse search queries
2. **Parallel Search**: SearxNG executes all queries simultaneously
3. **Content Extraction**: SmartReader pulls clean article text
4. **Embedding Generation**: Vectorize query and chunks
5. **Semantic Ranking**: Cosine similarity scoring
6. **Answer Synthesis**: Final LLM response with sources
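
Step 5 scores each chunk against the question by the cosine similarity of their embedding vectors, where `q` is the question embedding and `c` a chunk embedding:

```
cos(q, c) = (q · c) / (‖q‖ ‖c‖)
```

Scores range from -1 to 1; higher means more semantically similar, and the top-scoring chunks become the answer context.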

## 🔍 How It Works

1. **You ask a question** → OpenQuery generates 3 optimized search queries
2. **Searches the web** → All queries run in parallel via SearxNG
3. **Fetches articles** → Extracts clean content from top results
4. **Splits into chunks** → ~500 character pieces for embedding
5. **Ranks by relevance** → Semantic similarity to your question
6. **Synthesizes answer** → LLM reviews top 3 chunks and responds with citations
## 🐛 Troubleshooting

**Common issues** and solutions:

| Problem | Solution |
|---------|----------|
| "API Key is missing" | Run `openquery configure -i` or set `OPENROUTER_API_KEY` |
| No search results | Check your SearxNG instance is running (`curl http://localhost:8002`) |
| Slow performance | Reduce `--results` or `--queries` count |
| Articles failing to fetch | Some sites block scrapers; try different queries |

See [docs/troubleshooting.md](docs/troubleshooting.md) for detailed help.
## 📚 Documentation

- **[Installation Guide](docs/installation.md)** - Build and setup instructions
- **[Configuration](docs/configuration.md)** - All config options and environment variables
- **[Usage Guide](docs/usage.md)** - Complete CLI reference and examples
- **[Architecture](docs/architecture.md)** - System design and patterns
- **[Components](docs/components/)** - Deep dive into each module
- **[Troubleshooting](docs/troubleshooting.md)** - Solve common problems
- **[API Reference](docs/api-reference.md)** - Programmatic interfaces
## 🤝 Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## 📄 License

MIT License - see LICENSE file for details.
## 🙏 Acknowledgments

- [OpenRouter](https://openrouter.ai) - LLM API aggregation
- [SearxNG](https://searx.space) - Privacy-respecting metasearch
- [SmartReader](https://github.com/kfasten/SmartReader) - Article extraction
- Built with [.NET](https://dotnet.microsoft.com)

---

**Need more details?** Check the comprehensive documentation in the [docs/](docs/) folder.
docs/api/cli.md (new file, 309 lines)
# CLI Reference

Complete command-line interface reference for OpenQuery.

## 📋 Table of Contents

1. [Command Structure](#command-structure)
2. [Main Command: `openquery`](#main-command-openquery)
3. [Configure Command: `openquery configure`](#configure-command-openquery-configure)
4. [Exit Codes](#exit-codes)
5. [Examples by Use Case](#examples-by-use-case)
6. [Shell Integration](#shell-integration)
## Command Structure

OpenQuery uses [System.CommandLine](https://learn.microsoft.com/dotnet/standard/commandline/) for CLI parsing.

### Syntax
```bash
openquery [GLOBAL-OPTIONS] <COMMAND> [COMMAND-OPTIONS] [ARGUMENTS]
```

If no command is specified, the main `openquery` command is assumed.

### Help
```bash
openquery --help
openquery configure --help
```

Shows usage, options, and examples.

### Version
```bash
openquery --version  # if implemented
```

---

## Main Command: `openquery`

Ask a question and get an AI-powered answer.

### Synopsis
```bash
openquery [OPTIONS] <question>
```

### Arguments

| Name | Arity | Type | Description |
|------|-------|------|-------------|
| `question` | ZeroOrMore | `string[]` | The question to ask (positional, concatenated with spaces) |

**Notes**:
- `ZeroOrMore` means you can omit the question (shows help)
- Multiple words are combined: `openquery what is quantum` → `"what is quantum"`
- Use quotes for questions with special characters: `openquery "what's the weather?"`

### Options

| Option | Aliases | Type | Default | Description |
|--------|---------|------|---------|-------------|
| `--chunks` | `-c` | `int` | `DefaultChunks` (config) | Number of top context chunks to pass to LLM |
| `--results` | `-r` | `int` | `DefaultResults` (config) | Number of search results per query |
| `--queries` | `-q` | `int` | `DefaultQueries` (config) | Number of search queries to generate |
| `--short` | `-s` | `bool` | `false` | Request a concise answer |
| `--long` | `-l` | `bool` | `false` | Request a detailed answer |
| `--verbose` | `-v` | `bool` | `false` | Show detailed progress information |

**Option Notes**:
- `--short` and `--long` are flags; if both are specified, `--long` takes precedence
- Integer options validate as positive numbers (parsed by System.CommandLine)
- Defaults come from the config file or are hardcoded (3, 5, 3 respectively)

### Behavior

1. Loads API key (env `OPENROUTER_API_KEY` or config file)
2. Loads model (env `OPENROUTER_MODEL` or config)
3. Executes workflow:
   - Generate queries (if `--queries > 1`)
   - Run search pipeline
   - Stream final answer
4. Exits with code 0 on success, 1 on error

### Examples

```bash
# Basic
openquery "What is the capital of France?"

# With options
openquery -q 5 -r 10 -c 4 "Explain quantum computing"

# Short answer
openquery -s "Who won the 2024 election?"

# Verbose mode
openquery -v "How does photosynthesis work?"

# Combined
openquery -l -v -q 8 "History of the internet"
```

---

## Configure Command: `openquery configure`

Configure OpenQuery settings (API key, model, defaults).

### Synopsis
```bash
openquery configure [OPTIONS]
```

### Options

| Option | Type | Description |
|--------|------|-------------|
| `--interactive` / `-i` | `bool` | Launch interactive configuration wizard |
| `--key` | `string` | Set OpenRouter API key |
| `--model` | `string` | Set default LLM model |
| `--queries` | `int?` | Set default number of queries |
| `--chunks` | `int?` | Set default number of chunks |
| `--results` | `int?` | Set default number of results |

**Note**: Nullable options (`int?`) only update if provided.

### Behavior

- **Interactive mode** (`-i`): Prompts for each setting with current defaults shown in brackets
- **Non-interactive**: Only updates provided options, leaves others untouched
- Writes to `~/.config/openquery/config` (creates directory if missing)
- Overwrites entire file (not incremental)
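
A plausible sketch of the resulting file, assuming a simple `Key=Value` layout with the setting names used elsewhere in these docs (`ApiKey`, `Model`, `DefaultQueries`, `DefaultChunks`, `DefaultResults`) — the exact on-disk format is not documented here:

```
ApiKey=sk-or-xxxxxxxxxxxx
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=3
DefaultChunks=3
DefaultResults=5
```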

### Interactive Mode Details

Models presented with numbered menu:

```
Available models:
1. qwen/qwen3.5-flash-02-23
2. qwen/qwen3.5-122b-a10b
3. minimax/minimax-m2.5
4. google/gemini-3-flash-preview
5. deepseek/deepseek-v3.2
6. moonshotai/kimi-k2.5
Model [qwen/qwen3.5-flash-02-23]:
```

- Enter a number (1-6) to select a preset
- Or enter a custom model string (any OpenRouter model)

### Examples

```bash
# Interactive wizard
openquery configure -i

# Set just API key
openquery configure --key "sk-or-xxxxxxxxxxxx"

# Set multiple defaults
openquery configure --model "google/gemini-3-flash-preview" --queries 5 --chunks 4

# Update model only
openquery configure --model "deepseek/deepseek-v3.2"
```

---

## Exit Codes

| Code | Meaning |
|------|---------|
| `0` | Success - answer generated and streamed |
| `1` | Error - API key missing, network failure, or exception |

**Usage in scripts**:
```bash
openquery "question"
if [ $? -eq 0 ]; then
    echo "Success"
else
    echo "Failed" >&2
fi
```
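
The exit status can also be tested inline rather than through `$?`. A stand-in function is used here so the sketch is self-contained; substitute the real `openquery` invocation:

```bash
# Hypothetical stand-in for `openquery`; replace with the real command.
run_query() { return 0; }

if run_query "question"; then
    echo "Success"
else
    echo "Failed" >&2
fi
```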
---

## Examples by Use Case

### Quick Facts
```bash
openquery -s "capital of France"
```
Fast, concise, minimal tokens.

### Research Paper
```bash
openquery -l -q 5 -r 10 -c 4 "quantum entanglement experiments"
```
Multiple angles, deep sources, detailed synthesis.

### News & Current Events
```bash
openquery -v "latest news about OpenAI"
```
See everything: the generated queries, search results, and which sources were fetched.

### Troubleshooting
```bash
# Reduce scope if errors
openquery -q 1 -r 2 "test question"
```

### Save Answer to File
```bash
openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md
```

(Removes spinner characters)
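
To see what that `sed` expression does: a terminal spinner animates by printing a character and then erasing it with a backspace (`\x08`), so deleting every character-plus-backspace pair leaves only the final text. The `\x08` escape assumes GNU sed:

```bash
# Each "X<backspace>" pair is a spinner frame that erased itself; sed removes them.
printf 'a\b-\b|\bdone\n' | sed 's/.\x08//g'
# prints: done
```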

### Batch Processing
```bash
while IFS= read -r q; do
    echo "## $q" >> all-answers.md
    openquery -s "$q" 2>/dev/null | sed 's/.\x08//g' >> all-answers.md
    echo "" >> all-answers.md
done < questions.txt
```

(Reads one question per line; a `for q in $(cat ...)` loop would split multi-word questions into separate words.)

---

## Shell Integration

### Aliases (add to ~/.bashrc or ~/.zshrc)

```bash
# Short alias
alias oq='openquery'

# Presets
alias oqs='openquery -s'              # short
alias oql='openquery -l'              # long
alias oqv='openquery -v'              # verbose
alias oqr='openquery -q 5 -r 10 -c 4' # research mode

# Config shortcuts
alias oqcfg='openquery configure -i'
```

### Functions

```bash
# Save answer cleanly (removes spinner chars)
oqsave() {
    local query="$*"
    local filename="answer-$(date +%Y%m%d-%H%M%S).md"
    openquery "$query" 2>/dev/null | sed 's/.\x08//g' > "$filename"
    echo "Saved to $filename"
}

# Search and grep results
oqgrep() {
    openquery "$1" 2>/dev/null | sed 's/.\x08//g' | grep -i "$2"
}
```

### Environment Setup Script

```bash
# ~/.local/bin/openquery-env.sh
export OPENROUTER_API_KEY="sk-or-..."
export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23"
export SEARXNG_URL="http://localhost:8002"
```

Source it: `source ~/.local/bin/openquery-env.sh`

---

## Next Steps

- **[Configuration](configuration.md)** - Set up your environment
- **[Usage](usage.md)** - Learn usage patterns and tips
- **[Troubleshooting](troubleshooting.md)** - Fix common problems

---

**Quick Reference Card**

```
# Ask
openquery "question"
openquery -s "quick fact"
openquery -l -q 5 "deep research"

# Configure
openquery configure -i
openquery configure --key "..."
openquery configure --model "..."

# Debug
openquery -v "question"

# Help
openquery --help
```
docs/api/environment-variables.md (new file, 235 lines)
# Environment Variables

Reference for all environment variables used by OpenQuery.

## 📋 Summary

| Variable | Purpose | Required | Default | Example |
|----------|---------|----------|---------|---------|
| `OPENROUTER_API_KEY` | OpenRouter authentication | **Yes** | (none) | `sk-or-...` |
| `OPENROUTER_MODEL` | Override default LLM model | No | `qwen/qwen3.5-flash-02-23` | `google/gemini-3-flash-preview` |
| `SEARXNG_URL` | SearxNG instance URL | No | `http://localhost:8002` | `https://searx.example.com` |

## Detailed Reference

### `OPENROUTER_API_KEY`

**Purpose**: Your OpenRouter API authentication token.

**Required**: Yes, unless you have `ApiKey` set in the config file.

**How to Obtain**:
1. Sign up at https://openrouter.ai
2. Go to Dashboard → API Keys
3. Copy your key (starts with `sk-or-`)

**Priority**: Overrides config file `ApiKey`.

**Setting**:

```bash
# Bash/Zsh
export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Fish
set -x OPENROUTER_API_KEY "sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# PowerShell
$env:OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Windows CMD
set OPENROUTER_API_KEY=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

**Security**:
- Never commit your API key to version control
- Don't share the key publicly
- Use environment variables, or a config file with restrictive permissions (600)
- Rotate the key if it is accidentally exposed

**Validation**: OpenQuery checks for a missing or empty key and exits with an error:

```
[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up.
```

### `OPENROUTER_MODEL`

**Purpose**: Override the default LLM model used for both query generation and the final answer.

**Required**: No.

**Default**: `qwen/qwen3.5-flash-02-23`

**Available Models** (from OpenRouter):

| Model | Provider | Context | Cost (Input/Output per 1M tokens) |
|-------|----------|---------|-----------------------------------|
| `qwen/qwen3.5-flash-02-23` | Alibaba | 200K | \$0.10 / \$0.20 |
| `qwen/qwen3.5-122b-a10b` | Alibaba | 200K | ~\$0.20 / ~\$0.40 |
| `minimax/minimax-m2.5` | MiniMax | 200K | ~\$0.20 / ~\$0.40 |
| `google/gemini-3-flash-preview` | Google | 1M | ~\$0.10 / ~\$0.40 |
| `deepseek/deepseek-v3.2` | DeepSeek | 200K | ~\$0.10 / ~\$0.30 |
| `moonshotai/kimi-k2.5` | Moonshot AI | 200K | ~\$0.10 / ~\$0.30 |

(See OpenRouter for current pricing.)

**Setting**:

```bash
export OPENROUTER_MODEL="google/gemini-3-flash-preview"
```

**Interactive Config Models**: The `configure -i` wizard shows only these 6 models for convenience, but you can set any OpenRouter model via the environment variable or a non-interactive `configure`.

**Note**: Different models have different:
- Speed (Flash models are faster)
- Cost (check pricing)
- Quality (may vary by task)
- Context window size (Gemini 3 Flash has 1M tokens, others ~200K)

### `SEARXNG_URL`

**Purpose**: URL of the SearxNG metasearch instance.

**Required**: No.

**Default**: `http://localhost:8002`

**Format**: Must include the protocol (`http://` or `https://`) and host:port.
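
A small guard function, as a sketch, to catch a missing protocol before running OpenQuery (the function name is ours, not part of OpenQuery):

```bash
# Fail fast if SEARXNG_URL lacks the required protocol prefix.
check_searxng_url() {
    case "$1" in
        http://*|https://*) return 0 ;;
        *) echo "SEARXNG_URL must start with http:// or https://" >&2; return 1 ;;
    esac
}

check_searxng_url "http://localhost:8002" && echo "URL looks valid"
```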

**Setting**:

```bash
# Local Docker instance
export SEARXNG_URL="http://localhost:8002"

# Remote instance with HTTPS
export SEARXNG_URL="https://searx.example.com"

# Custom port
export SEARXNG_URL="http://localhost:8080"
```

**Finding a Public Instance**:
- Visit https://searx.space for a list of public instances
- Choose one with HTTPS and low latency
- Note: Public instances may have rate limits or require attribution

**Priority**: Overrides the built-in default. There is currently no config-file setting for the SearxNG URL (environment variable only); one could be added in the future.

**Test Your Instance**:
```bash
curl "$SEARXNG_URL/search?q=test&format=json" | head
```

Expected: JSON with `"results": [...]`.

---

## Configuration Priority Recap

When OpenQuery needs a value:

1. **Command-line option** (`--model`, `--key` from `configure`) - highest
2. **Environment variable** (`OPENROUTER_MODEL`, `OPENROUTER_API_KEY`, `SEARXNG_URL`)
3. **Configuration file** (`~/.config/openquery/config`: `Model`, `ApiKey`)
4. **Hard-coded default** (only for model)

**Example**:
```bash
# Config file: Model=qwen/qwen3.5-flash-02-23
export OPENROUTER_MODEL="deepseek/deepseek-v3.2"
openquery --model "google/gemini-3-flash-preview" "question"
# Uses google/gemini-3-flash-preview: the CLI option overrides both env and config
```
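
The lookup order amounts to a "first non-empty value wins" scan. This is an illustration of the rule above, not OpenQuery's actual code; the function name and the hard-coded default are assumptions based on this document:

```bash
# Resolve the model: CLI option > env var > config file > built-in default.
resolve_model() {
    cli="$1"; env_val="$2"; cfg="$3"
    for v in "$cli" "$env_val" "$cfg" "qwen/qwen3.5-flash-02-23"; do
        if [ -n "$v" ]; then printf '%s\n' "$v"; return 0; fi
    done
}

# No CLI option given, so the env var wins over the config value:
resolve_model "" "deepseek/deepseek-v3.2" "qwen/qwen3.5-flash-02-23"
# prints: deepseek/deepseek-v3.2
```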

---

## Troubleshooting Environment Variables

### Variable Not Taking Effect

**Symptom**: `openquery` still uses the old value after an export.

**Causes**:
- Exported in a different shell session
- Exported after running `openquery`
- Shell profile not reloaded

**Check**:
```bash
echo $OPENROUTER_API_KEY
# Should print the key (or blank if unset)
```

**Fix**:
```bash
# Export in current session
export OPENROUTER_API_KEY="sk-or-..."

# Or add to ~/.bashrc / ~/.zshrc and restart the terminal
```

### Special Characters in Values

If your API key contains special characters (`$`, `!`, etc.), quote it properly:

```bash
export OPENROUTER_API_KEY='sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
# Single quotes prevent shell expansion
```

### Variable Name Typos

`OPENROUTER_API_KEY` is all caps with underscores. `openrouter_api_key` (lowercase) won't work.

**Check spelling**:
```bash
env | grep -i openrouter
```

### Windows Environment Variables

On Windows, environment variables are set per-session or at the user level:

**PowerShell** (current session):
```powershell
$env:OPENROUTER_API_KEY="sk-or-..."
```

**Persistent** (PowerShell):
```powershell
[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User")
```

**CMD**:
```cmd
set OPENROUTER_API_KEY=sk-or-...
```

**System Properties** → Advanced → Environment Variables (GUI)

---

## Next Steps

- **[Configuration File](../configuration.md)** - Persistent configuration
- **[Usage Guide](../usage.md)** - How to use these variables
- **[Troubleshooting](../troubleshooting.md)** - Diagnose environment issues

---

**Quick Reference**

```bash
# Required
export OPENROUTER_API_KEY="sk-or-..."

# Optional (override defaults)
export OPENROUTER_MODEL="google/gemini-3-flash-preview"
export SEARXNG_URL="https://searx.example.com"

# Run
openquery "your question"
```
docs/api/programmatic.md (new file, 508 lines)
# Programmatic API Reference

How to use OpenQuery components programmatically in your own C# code.

## 📋 Table of Contents

1. [Overview](#overview)
2. [Using OpenQueryApp Programmatically](#using-openqueryapp-programmatically)
3. [Using Individual Services](#using-individual-services)
4. [Custom Implementations](#custom-implementations)
5. [Thread Safety](#thread-safety)
6. [Error Handling](#error-handling)

## Overview

OpenQuery is designed as a library of composable services, not just a CLI tool. You can reference the project (or extract the core classes) and use them in your own applications.

### Core Interfaces

Currently, OpenQuery uses concrete classes rather than interfaces. To use it programmatically:

1. Reference the `OpenQuery` project/dll
2. Add `using OpenQuery.Services;` and `using OpenQuery.Tools;`
3. Instantiate dependencies
4. Call methods

### Dependency Chain

```
Your Code
├── OpenRouterClient (LLM API)
├── SearxngClient (Search API)
├── EmbeddingService (requires OpenRouterClient)
└── SearchTool (requires SearxngClient + EmbeddingService)
    └── (internally uses ArticleService, ChunkingService, RateLimiter)
```

---

## Using OpenQueryApp Programmatically

### Minimal Example

```csharp
using OpenQuery;
using OpenQuery.Services;
using OpenQuery.Tools;
using OpenQuery.Models;

// 1. Configure
string apiKey = Environment.GetEnvironmentVariable("OPENROUTER_API_KEY")
    ?? throw new InvalidOperationException("API key required");
string searxngUrl = Environment.GetEnvironmentVariable("SEARXNG_URL")
    ?? "http://localhost:8002";
string model = Environment.GetEnvironmentVariable("OPENROUTER_MODEL")
    ?? "qwen/qwen3.5-flash-02-23";

// 2. Instantiate services
var openRouterClient = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(openRouterClient);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(openRouterClient, searchTool, model);

// 3. Execute
var options = new OpenQueryOptions(
    Chunks: 3,
    Results: 5,
    Queries: 3,
    Short: false,
    Long: false,
    Verbose: false,
    Question: "What is quantum entanglement?"
);

await openQuery.RunAsync(options);
```

**Output**: Streams the answer to `Console.Out` (hardcoded in `OpenQueryApp`). To capture output, modify `OpenQueryApp` or redirect the console.

### Capturing Output

`OpenQueryApp.RunAsync` writes directly to `Console`. To capture:

**Option 1**: Redirect Console (hacky)
```csharp
var sw = new StringWriter();
Console.SetOut(sw);
await openQuery.RunAsync(options);
string answer = sw.ToString();
```

**Option 2**: Modify OpenQueryApp to accept a TextWriter (not currently supported)

**Option 3**: Reimplement using OpenQuery components without `OpenQueryApp`

```csharp
public async Task<string> GetAnswerAsync(string question, OpenQueryOptions options)
{
    var sb = new StringBuilder();
    var reporter = new StatusReporter(options.Verbose);

    // Replicate OpenQueryApp.RunAsync but collect output
    // ... (copy logic from OpenQuery.cs)

    return sb.ToString();
}
```

---

## Using Individual Services

### OpenRouterClient

```csharp
var client = new OpenRouterClient("your-api-key");

// Non-streaming chat completion
var request = new ChatCompletionRequest(
    model: "qwen/qwen3.5-flash-02-23",
    messages: new List<Message>
    {
        new Message("system", "You are a helpful assistant."),
        new Message("user", "What is 2+2?")
    }
);

var response = await client.CompleteAsync(request);
Console.WriteLine(response.Choices[0].Message.Content);

// Streaming chat completion
var streamRequest = request with { Stream = true };
await foreach (var chunk in client.StreamAsync(streamRequest))
{
    if (chunk.TextDelta != null)
        Console.Write(chunk.TextDelta);
}

// Embeddings
var embeddingRequest = new EmbeddingRequest(
    model: "openai/text-embedding-3-small",
    input: new List<string> { "text 1", "text 2" }
);
float[][] embeddings = await client.EmbedAsync(embeddingRequest.Model, embeddingRequest.Input);
// embeddings[0] is the vector for "text 1"
```

### SearxngClient

```csharp
var searxng = new SearxngClient("http://localhost:8002");

List<SearxngResult> results = await searxng.SearchAsync("quantum physics", limit: 5);

foreach (var result in results)
{
    Console.WriteLine($"{result.Title}");
    Console.WriteLine($"{result.Url}");
    Console.WriteLine($"{result.Content}");
    Console.WriteLine();
}
```

### EmbeddingService

```csharp
var client = new OpenRouterClient("your-api-key");
var embeddingService = new EmbeddingService(client); // default model: openai/text-embedding-3-small

// Single embedding
float[] embedding = await embeddingService.GetEmbeddingAsync("Hello world");

// Batch embeddings (with progress)
List<string> texts = new() { "text 1", "text 2", "text 3" };
float[][] embeddings = await embeddingService.GetEmbeddingsAsync(
    texts,
    onProgress: msg => Console.WriteLine(msg)
);

// Cosine similarity
float similarity = EmbeddingService.CosineSimilarity(embedding1, embedding2);
```

### ArticleService

```csharp
var article = await ArticleService.FetchArticleAsync("https://example.com/article");
Console.WriteLine(article.Title);
Console.WriteLine(article.TextContent);
Console.WriteLine($"Readable: {article.IsReadable}");
```

Note: the `Article` type comes from the SmartReader library (not OpenQuery-specific).
|
||||
|
||||
### ChunkingService

```csharp
List<string> chunks = ChunkingService.ChunkText("Long article text...");

foreach (var chunk in chunks)
{
    // Guard the preview length: the last chunk may be shorter than 50 chars
    Console.WriteLine($"Chunk ({chunk.Length} chars): {chunk[..Math.Min(50, chunk.Length)]}...");
}
```

### SearchTool (Orchestration)

```csharp
var searxngClient = new SearxngClient("http://localhost:8002");
var embeddingService = new EmbeddingService(openRouterClient);
var searchTool = new SearchTool(searxngClient, embeddingService);

string context = await searchTool.ExecuteAsync(
    originalQuery: "What is quantum entanglement?",
    generatedQueries: new List<string>
    {
        "quantum entanglement definition",
        "how quantum entanglement works"
    },
    maxResults: 5,
    topChunksLimit: 3,
    onProgress: msg => Console.WriteLine(msg),
    verbose: true
);

Console.WriteLine("Context:");
Console.WriteLine(context);
```

The output is a formatted context string:

```
[Source 1: Title](https://example.com/1)
Content chunk...

[Source 2: Title](https://example.com/2)
Content chunk...
```

---

## Custom Implementations

### Custom Progress Reporter

`SearchTool.ExecuteAsync` accepts an `Action<string>? onProgress` callback. Provide your own:

```csharp
public class MyProgressReporter
{
    public void Report(string message)
    {
        // Log to file
        File.AppendAllText("log.txt", $"{DateTime.UtcNow}: {message}\n");

        // Update UI
        myLabel.Text = message;

        // Send to telemetry
        Telemetry.TrackEvent("OpenQueryProgress", new { message });
    }
}

// Usage
var reporter = new MyProgressReporter();
await searchTool.ExecuteAsync(..., reporter.Report, verbose: false);
```

### Custom Chunking Strategy

Extend `ChunkingService` or implement your own:

```csharp
public static class MyChunkingService
{
    public static List<string> ChunkText(string text, int maxSize = 500, int overlap = 50)
    {
        // Overlapping chunks for better context retrieval
        var chunks = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int end = Math.Min(start + maxSize, text.Length);
            chunks.Add(text.Substring(start, end - start));
            start += maxSize - overlap; // Slide the window
        }
        return chunks;
    }
}
```

### Custom Rate Limiter

Implement `IAsyncDisposable` with your own strategy (token bucket, leaky bucket):

```csharp
public class TokenBucketRateLimiter : IAsyncDisposable
{
    private readonly SemaphoreSlim _semaphore;
    private readonly TimeSpan _refillPeriod;
    private readonly int _maxTokens;
    private int _tokens;

    // Implementation details...

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct)
    {
        await WaitForTokenAsync(ct);
        try
        {
            return await action();
        }
        finally
        {
            // Return tokens or replenish the bucket
        }
    }
}
```

---

## Thread Safety

**Thread-safe components**:
- `RateLimiter` - backed by `SemaphoreSlim`, which is thread-safe
- `StatusReporter` - backed by a thread-safe channel
- Static utility classes (`ChunkingService`) - stateless

**Not thread-safe** (do not share instances across threads):
- `OpenRouterClient` - wraps `HttpClient` (`HttpClient` itself is thread-safe, but the instance may carry state)
- `SearxngClient` - same caveat; the usual `HttpClient` reuse recommendations apply
- `EmbeddingService` - holds mutable fields (`_rateLimiter`, `_retryPipeline`)
- `SearchTool` - holds mutable `_options`

**Recommendation**: create new instances per operation, or guard shared instances with locks.

### Example: Parallel Queries

```csharp
var tasks = questions.Select(async question =>
{
    // Separate instances per task
    var options = new OpenQueryOptions(..., question: question);
    var query = new OpenQueryApp(client, searchTool, model);
    await query.RunAsync(options);
});

await Task.WhenAll(tasks);
```

**Better**: create a factory that spawns fresh instances.

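One way to sketch such a factory (the class and its constructor parameters are illustrative, not part of OpenQuery):

```csharp
public sealed class OpenQueryAppFactory
{
    private readonly string _apiKey;
    private readonly string _searxngUrl;
    private readonly string _model;

    public OpenQueryAppFactory(string apiKey, string searxngUrl, string model)
        => (_apiKey, _searxngUrl, _model) = (apiKey, searxngUrl, model);

    // Each call wires a fresh, unshared dependency graph
    public OpenQueryApp Create()
    {
        var client = new OpenRouterClient(_apiKey);
        var searxng = new SearxngClient(_searxngUrl);
        var embeddings = new EmbeddingService(client);
        var search = new SearchTool(searxng, embeddings);
        return new OpenQueryApp(client, search, _model);
    }
}
```

Each parallel task then calls `factory.Create()` instead of sharing mutable instances.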
---

## Error Handling

All public async methods may throw:

- `HttpRequestException` - network errors, non-2xx responses
- `TaskCanceledException` - timeout or cancellation
- `JsonException` - malformed JSON
- `Argument*Exception` - invalid arguments
- `Exception` - any other error

### Pattern: Try-Catch

```csharp
try
{
    var response = await client.CompleteAsync(request);
    Console.WriteLine(response.Choices[0].Message.Content);
}
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"Network error: {ex.Message}");
}
catch (Exception ex)
{
    Console.Error.WriteLine($"Unexpected error: {ex.Message}");
}
```

### Pattern: Resilience with Polly

`EmbeddingService` already wraps `client.EmbedAsync` with a Polly retry. For other calls, you can add your own:

```csharp
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

await retryPolicy.ExecuteAsync(async () =>
{
    var response = await client.CompleteAsync(request);
    // ...
});
```

---

## Advanced Usage

### Streaming Responses to a Network Stream

```csharp
var request = new ChatCompletionRequest(model, messages) { Stream = true };

await foreach (var chunk in client.StreamAsync(request))
{
    if (chunk.TextDelta != null)
    {
        await networkStream.WriteAsync(Encoding.UTF8.GetBytes(chunk.TextDelta));
    }
}
```

### Parallel Embedding Batches with Progress

```csharp
var texts = Enumerable.Range(0, 1000).Select(i => $"Text {i}").ToList();

await embeddingService.GetEmbeddingsAsync(texts,
    onProgress: progress =>
    {
        Console.WriteLine(progress); // e.g. "[Generating embeddings: batch 2/4]"
    });
```

### Custom Embedding Service with a Different Model

```csharp
var client = new OpenRouterClient(apiKey);
var customService = new EmbeddingService(client, "your-embedding-model");

float[] embedding = await customService.GetEmbeddingAsync("text");
```

---

## Limitations

### No Interface-Based Design

OpenQuery uses concrete classes. To mock them in tests, you would need to write wrappers, or use tools like JustMock/Moq that can mock non-virtual methods (not recommended). A better approach: define interfaces such as `IOpenRouterClient` and code against them.

### Hardcoded Concurrency Settings

`ParallelProcessingOptions` is instantiated in `SearchTool` with hardcoded defaults. To customize it, you would need to:

1. Subclass `SearchTool` and override access to `_options`
2. Modify the source to accept `ParallelProcessingOptions` in the constructor
3. Use reflection (hacky)

Suggested improvement: add a constructor parameter.

### Single Responsibility Blur

`OpenQueryApp` handles query generation, the pipeline, and streaming. It could be split into:
- `IQueryGenerator` (expanding queries)
- `IPipelineExecutor` (running the search tool)
- `IAnswerStreamer` (streaming the final LLM answer)

Currently, `OpenQueryApp` acts as the facade.

---

## Next Steps

- **[Components](../components/overview.md)** - Understand the architecture
- **[CLI Reference](../api/cli.md)** - The CLI built on these APIs
- **[Source Code](../)** - Read the implementation details

---

**Code Snippet: Full Programmatic Flow**

```csharp
using OpenQuery.Services;
using OpenQuery.Tools;
using OpenQuery.Models;

async Task<string> Research(string question)
{
    var apiKey = GetApiKey(); // your own method
    var client = new OpenRouterClient(apiKey);
    var searxng = new SearxngClient("http://localhost:8002");
    var embeddings = new EmbeddingService(client);
    var search = new SearchTool(searxng, embeddings);
    var app = new OpenQueryApp(client, search, "qwen/qwen3.5-flash-02-23");

    var options = new OpenQueryOptions(
        Chunks: 3,
        Results: 5,
        Queries: 3,
        Short: false,
        Long: false,
        Verbose: false,
        Question: question
    );

    // Capture output by redirecting Console or modifying OpenQueryApp
    await app.RunAsync(options);
    return "streamed to console"; // capturing the answer would need custom plumbing
}
```
682
docs/architecture.md
Normal file
@@ -0,0 +1,682 @@
# Architecture

Deep dive into OpenQuery's system design, architectural patterns, and data flow.

## 📋 Table of Contents

1. [System Overview](#system-overview)
2. [Architectural Patterns](#architectural-patterns)
3. [Component Architecture](#component-architecture)
4. [Data Flow](#data-flow)
5. [Concurrency Model](#concurrency-model)
6. [Error Handling & Resilience](#error-handling--resilience)
7. [Performance Considerations](#performance-considerations)
8. [Design Decisions](#design-decisions)

## System Overview

OpenQuery is a **pipeline-based AI application** that orchestrates multiple external services (OpenRouter, SearxNG) to answer user questions with web-sourced, semantically ranked content.

### Core Design Principles

1. **Separation of Concerns** - Each component has a single, well-defined responsibility
2. **Parallel First** - Wherever possible, operations are parallelized for speed
3. **Resilient by Default** - Built-in retries, rate limiting, and graceful degradation
4. **Configurable** - Most parameters can be adjusted without code changes
5. **Observable** - Progress reporting and verbose mode for debugging

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│           CLI (System.CommandLine) → OpenQueryApp           │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Orchestration Layer                     │
│         OpenQueryApp → SearchTool (4-phase pipeline)        │
└─────────────────────────────┬───────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌──────────────┐   ┌──────────────────┐   ┌────────────────┐
│ Search Layer │   │ Processing Layer │   │    AI Layer    │
│              │   │                  │   │                │
│ SearxngClient│   │ ArticleService   │   │OpenRouterClient│
│              │   │ ChunkingService  │   │                │
│              │   │ EmbeddingService │   │                │
└──────────────┘   └──────────────────┘   └────────────────┘
```

## Architectural Patterns

### 1. Pipeline Pattern

The main workflow (`SearchTool.ExecuteAsync`) implements a multi-stage pipeline:

```
Phase 1: ExecuteParallelSearchesAsync
    ↓ (List<SearxngResult>)
Phase 2: ExecuteParallelArticleFetchingAsync
    ↓ (List<Chunk>)
Phase 3: ExecuteParallelEmbeddingsAsync
    ↓ ((queryEmbedding, chunkEmbeddings))
Phase 4: RankAndSelectTopChunks
    ↓ (List<Chunk> topChunks)
→ Formatted context string returned
```

Each phase:
- Accepts input from the previous phase
- Processes in parallel where applicable
- Passes output to the next phase
- Reports progress via callbacks

### 2. Service Layer Pattern

Services (in the `Services/` directory) are focused classes that each encapsulate a specific operation:

- **Clients**: `OpenRouterClient`, `SearxngClient` (HTTP communication)
- **Processors**: `EmbeddingService`, `ChunkingService` (data transformation)
- **Extractors**: `ArticleService` (content extraction)
- **Infrastructure**: `RateLimiter`, `StatusReporter` (cross-cutting concerns)

All dependencies are explicit (constructor injection), making services easy to test.

### 3. Dependency Injection (Manual)

While it does not use a DI container, OpenQuery follows DI principles:

```csharp
// Program.cs: instantiate dependencies with explicit parameters
var client = new OpenRouterClient(apiKey);
var searxngClient = new SearxngClient(searxngUrl);
var embeddingService = new EmbeddingService(client);
var searchTool = new SearchTool(searxngClient, embeddingService);
var openQuery = new OpenQueryApp(client, searchTool, model);
```

Benefits:
- Clear dependency graph
- Easy to substitute mocks for testing
- No magic; construction is visible

### 4. Observer Pattern (Progress Reporting)

`StatusReporter` and progress callbacks implement the observer pattern:

```csharp
// SearchTool receives a progress callback
public Task<string> ExecuteAsync(..., Action<string>? onProgress = null, ...)

// Components invoke the callback at key milestones
onProgress?.Invoke($"[Fetching article {current}/{total}: {domain}]");

// The caller (OpenQueryApp) supplies the reporter as the callback
_searchTool.ExecuteAsync(..., (progress) => reporter.WriteLine(progress), ...);
```

### 5. Resilience Patterns (Polly)

`EmbeddingService` uses Polly's retry policy:

```csharp
_retryPipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
    })
    .Build();
```

This automatically retries failed embedding requests with exponential backoff.

### 6. Producer-Consumer Pattern (Channel-based)

`StatusReporter` uses a `System.Threading.Channels.Channel<string>` for asynchronous progress updates:

- Producer: `UpdateStatus()` writes messages to the channel
- Consumer: a background task, `ProcessStatusUpdatesAsync()`, reads and displays them
- Benefit: no blocking between progress generation and display

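The pattern in miniature looks like this (a standalone sketch, not OpenQuery's actual implementation):

```csharp
using System.Threading.Channels;

var channel = Channel.CreateUnbounded<string>();

// Consumer: a background task drains the channel and displays messages
var consumer = Task.Run(async () =>
{
    await foreach (var message in channel.Reader.ReadAllAsync())
        Console.WriteLine(message);
});

// Producers: any thread can write without blocking
channel.Writer.TryWrite("[Phase 1: searching]");
channel.Writer.TryWrite("[Phase 2: fetching articles]");

channel.Writer.Complete(); // signal that no more messages will arrive
await consumer;            // wait for the consumer to finish draining
```

`TryWrite` never blocks on an unbounded channel, which is exactly why fire-and-forget progress updates are cheap for the producers.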

### 7. Disposable Pattern

Components that hold background tasks or synchronization primitives implement `IDisposable` or `IAsyncDisposable`:

- `StatusReporter` - stops the background spinner task
- `RateLimiter` - disposes its semaphore

Both are used via `using` statements for deterministic cleanup.

## Component Architecture

### OpenQueryApp (OpenQuery.cs)

**Role**: Main orchestrator; coordinates the entire workflow

**Responsibilities**:
- Parse CLI options into `OpenQueryOptions`
- Load configuration and resolve API keys/models
- Optionally generate expanded search queries via the LLM
- Invoke `SearchTool` with progress callbacks
- Stream the final answer from the LLM

**Key Methods**:
- `RunAsync(OpenQueryOptions)` - Main entry point

**Interactions**:
- Instantiates `OpenRouterClient` (for both query generation and the final answer)
- Instantiates `SearxngClient` (passed to `SearchTool`)
- Instantiates `EmbeddingService` (passed to `SearchTool`)
- Instantiates `SearchTool` (orchestration)
- Uses `StatusReporter` for UI updates

### SearchTool (Tools/SearchTool.cs)

**Role**: Core search-retrieve-rank pipeline orchestrator

**Responsibilities**:
- Execute the 4-phase pipeline (search → fetch → embed → rank)
- Manage concurrency limits (via semaphores)
- Coordinate parallel operations
- Generate the context string for the final answer

**Interactions**:
- Uses `SearxngClient` for Phase 1
- Uses `ArticleService` + `ChunkingService` for Phase 2
- Uses `EmbeddingService` for Phase 3
- Has no external UI dependency (pure logic)

**Parallelization Strategy**:
- **Phase 1**: `Task.WhenAll` on search tasks (unbounded, but limited in practice by the SearxNG instance)
- **Phase 2**: Semaphore (max 10 concurrent fetches)
- **Phase 3**: `Parallel.ForEachAsync` (max 4 concurrent embedding batches)

### EmbeddingService (Services/EmbeddingService.cs)

**Role**: Generate vector embeddings with batching, rate limiting, and retries

**Responsibilities**:
- Batch embedding requests (default: 300 per batch)
- Parallelize batches (default: 4 concurrent)
- Apply rate limiting (via `RateLimiter`)
- Retry failed requests (Polly)
- Calculate cosine similarity

**Key Methods**:
- `GetEmbeddingsAsync(List<string> texts, ...)` - batch, with progress
- `GetEmbeddingAsync(string text)` - single embedding
- `CosineSimilarity(float[], float[])` - static vector math

**Design Notes**:
- Rate limiting is crucial to avoid overwhelming OpenRouter's embedding endpoint
- Batches of 300 reduce API overhead
- Polly retry handles transient failures (429, 500, network blips)

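Cosine similarity itself is a short piece of vector math; a minimal sketch (the signature mirrors the static helper described above, but the exact implementation is an assumption):

```csharp
public static float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Treat an all-zero vector as having zero similarity (avoids division by zero)
    if (normA == 0f || normB == 0f) return 0f;
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```

The result lies in [-1, 1]; higher means the chunk's embedding points in a more similar direction to the query's.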

### OpenRouterClient (Services/OpenRouterClient.cs)

**Role**: HTTP client for the OpenRouter API (completions + embeddings)

**Responsibilities**:
- Serialize requests to JSON (source-generated)
- Send HTTP requests with the authorization header
- Stream chat-completion responses (`IAsyncEnumerable`)
- Return full responses for non-streaming calls
- Throw on non-2xx status codes

**Endpoints**:
- POST `/chat/completions` (streaming and non-streaming)
- POST `/embeddings`

**Configuration**:
- Base URL: `https://openrouter.ai/api/v1`
- Headers: `Authorization: Bearer {apiKey}`, `Accept: application/json`

**Design**:
- Low-level client with no retry logic (retries live in `EmbeddingService`)
- Thin wrapper around `HttpClient`
- Could be replaced with `HttpClientFactory` in larger apps

### SearxngClient (Services/SearxngClient.cs)

**Role**: HTTP client for SearxNG metasearch

**Responsibilities**:
- Construct the search URL with the query parameter
- Send the GET request and deserialize the JSON response
- Limit results (`.Take(limit)`)
- Return an empty list on failure (no exceptions)

**Endpoint**: `GET /search?q={query}&format=json`

**Design**:
- Very simple; no retry (failures are acceptable - if some queries fail, OpenQuery continues with the others)
- `DistinctBy(r => r.Url)` deduplication happens upstream

### ArticleService (Services/ArticleService.cs)

**Role**: Extract clean article content from URLs

**Responsibilities**:
- Call `SmartReader.ParseArticleAsync(url)`
- Return an `Article` object with `Title`, `TextContent`, and `IsReadable`

**Design**:
- Single responsibility: extraction only (no fetch orchestration, no chunking)
- SmartReader handles the complexity (HTML parsing, boilerplate removal)
- Exceptions propagate to `SearchTool`, where they are handled

### ChunkingService (Services/ChunkingService.cs)

**Role**: Split long text into ~500-char chunks at natural boundaries

**Algorithm**:
1. Start at index 0
2. Take up to 500 chars
3. If not at the end of the text, backtrack to the last space/newline/period
4. Add the chunk and advance the start index
5. Repeat until done

**Design**:
- Static class (stateless utility)
- No dependencies
- Pure function (input text → output chunks)

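A hedged sketch of the algorithm above (the real implementation may differ in its boundary characters and edge-case handling):

```csharp
public static List<string> ChunkTextSketch(string text, int maxSize = 500)
{
    var chunks = new List<string>();
    int start = 0;
    while (start < text.Length)
    {
        int length = Math.Min(maxSize, text.Length - start);
        if (start + length < text.Length)
        {
            // Backtrack to the last natural boundary within the current window
            int boundary = text.LastIndexOfAny(
                new[] { ' ', '\n', '.' }, start + length - 1, length);
            if (boundary > start)
                length = boundary - start + 1;
        }
        chunks.Add(text.Substring(start, length).Trim());
        start += length;
    }
    return chunks;
}
```

If no boundary exists in the window, the sketch falls back to a hard cut so it always makes progress.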

### RateLimiter (Services/RateLimiter.cs)

**Role**: Limit concurrent operations via a semaphore

**Responsibilities**:
- Wrap actions with semaphore acquisition
- Support both sync and async actions
- Implement `IAsyncDisposable` for cleanup

**Usage Pattern**:
```csharp
await _rateLimiter.ExecuteAsync(async () =>
{
    // operation limited by the semaphore
    return await SomeApiCall();
});
```

**Design**:
- Generic wrapper (can return a `TResult`)
- `SemaphoreSlim` initialized at construction
- Used in `EmbeddingService` for parallel embedding batches

### StatusReporter (Services/StatusReporter.cs)

**Role**: Show real-time progress with a spinner or in verbose mode

**Responsibilities**:
- Maintain the spinner animation (background task)
- Receive status updates via a channel
- Display updates with appropriate formatting
- Stop the spinner on completion

**Features**:
- **Spinner mode** (non-verbose): `⠋ Fetching...` with animated Braille characters
- **Verbose mode**: `[Fetching article 1/10: example.com]` on separate lines
- **Thread-safe**: the channel is safe for concurrent writes
- **Non-blocking**: the background spinner doesn't block updates

**Design**:
- `Channel<string>` for an asynchronous producer-consumer flow
- A background task (`_statusProcessor`) reads from the channel
- The spinner runs on its own task with a 100ms delay per frame
- `IDisposable` ensures proper cleanup

### ConfigManager (ConfigManager.cs)

**Role**: Load and save configuration from/to a file

**Responsibilities**:
- Resolve the config path (XDG: `~/.config/openquery/config`)
- Parse key-value pairs (manual parsing, no INI library)
- Provide an `AppConfig` object with defaults
- Save settings back to the file

**Design**:
- Static class (no instances)
- Creates the config directory if missing
- Line-by-line parsing (simple, no dependencies)
- Could be improved with a proper INI parser or JSON

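A minimal sketch of line-by-line key-value parsing of the kind described (the comment syntax and exact file format are assumptions, not OpenQuery's actual rules):

```csharp
public static Dictionary<string, string> ParseConfig(string path)
{
    var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    if (!File.Exists(path)) return settings;

    foreach (var rawLine in File.ReadAllLines(path))
    {
        var line = rawLine.Trim();
        // Skip blank lines and comments
        if (line.Length == 0 || line.StartsWith('#')) continue;

        int eq = line.IndexOf('=');
        if (eq <= 0) continue; // malformed line; ignore it

        settings[line[..eq].Trim()] = line[(eq + 1)..].Trim();
    }
    return settings;
}
```

Missing keys then fall back to the defaults baked into `AppConfig`.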

## Data Flow

### End-to-End Data Flow

```
User: "What is quantum entanglement?"

1. OpenQueryOptions created
   { Question = "...", Queries = 3, Results = 5, Chunks = 3, ... }

2. Query Generation (if Queries > 1)
   → ChatCompletionRequest to OpenRouter (system prompt asks for JSON queries)
   → Deserialize to List<string> (generatedQueries)

3. Search Phase
   generatedQueries → parallel SearxngClient.SearchAsync → ConcurrentBag<SearxngResult>
   → DistinctBy(Url) → List<SearxngResult> (up to 15 results = 3 queries × 5 results)

4. Fetch Phase
   searchResults → parallel ArticleService.FetchArticleAsync → Article
   → ChunkingService.ChunkText (split into ~500-char pieces)
   → ConcurrentBag<Chunk> (typically 50-100 chunks from 15 articles)

5. Embedding Phase
   originalQuery → EmbeddingService.GetEmbeddingAsync → float[] (queryEmbedding)
   chunk.Contents → EmbeddingService.GetEmbeddingsAsync → float[][] (chunkEmbeddings)

6. Ranking Phase
   For each Chunk: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
   OrderByDescending(Score).Take(3) → topChunks (final 3 chunks)

7. Answer Phase
   context = string.Join("\n\n", topChunks.Select(...))
   → ChatCompletionRequest to OpenRouter with context + question
   → StreamAsync → Console.Write(delta) (real-time display)

Result: the user sees the answer with [Source N] citations
```

### Data Structures

**Chunk** - the core data structure flowing through the pipeline:

```csharp
public record Chunk(
    string Content,        // Text content (~500 chars)
    string SourceUrl,      // Where it came from
    string? Title = null   // Article title
)
{
    public float[]? Embedding { get; set; } // Added in Phase 3
    public float Score { get; set; }        // Added in Phase 4
}
```

**Data Flow State**:
- Phases 1-2: `Chunk` without an embedding
- Phase 3: `Chunk.Embedding` populated
- Phase 4: `Chunk.Score` populated
- Answer phase: serialized into the context string

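Given this record, the ranking phase reduces to a few LINQ calls; a hedged sketch (the method name is illustrative):

```csharp
static List<Chunk> RankAndSelectTop(List<Chunk> chunks, float[] queryEmbedding, int limit)
{
    // Score only the chunks that received a non-empty embedding
    foreach (var chunk in chunks.Where(c => c.Embedding is { Length: > 0 }))
        chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);

    return chunks
        .OrderByDescending(c => c.Score)
        .Take(limit)
        .ToList();
}
```

Chunks whose embedding batch failed keep the default `Score` of 0 and naturally fall to the bottom of the ordering.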

### Memory Footprint

**Per 15-article run (approximate)**:
- Raw HTML (fetched): ~5MB (transient; discarded after extraction)
- Articles: ~500KB (15 articles × ~30KB of extracted text)
- Chunks: ~50-100 items × 500 chars ≈ 25-50KB of text
- Embeddings: ~50-100 × 1536 floats × 4 bytes ≈ 300-600KB
- Total peak: ~1-2MB (excluding the runtime's baseline usage)

**Note**: AOT compilation reduces runtime memory compared to JIT.

## Concurrency Model

OpenQuery uses multiple parallelization strategies:

### Unbounded Parallelism (Task.WhenAll)

**Where**: Search queries (Phase 1)

```csharp
var searchTasks = generatedQueries.Select(async query => { ... });
await Task.WhenAll(searchTasks);
```

**Rationale**: SearxNG can handle concurrent queries, so there is no need to limit them for a local, single-user tool; SearxNG may also throttle internally.

**Risk**: Could overwhelm SearxNG if `--queries` is set very high (100+). The default of 3 is safe.


### Semaphore-Controlled Parallelism

**Where**: Article fetching (Phase 2)

```csharp
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10
await Task.WhenAll(fetchTasks); // Each task waits on the semaphore
```

**Rationale**: Prevent flooding target websites with requests (DoS-like behavior). 10 concurrent fetches is polite but fast.

**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentArticleFetches` (currently a compile-time constant).

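The shape of a semaphore-throttled fetch loop, as a standalone sketch (the fetch method and variable names are illustrative):

```csharp
using var semaphore = new SemaphoreSlim(10); // at most 10 fetches in flight

var fetchTasks = urls.Select(async url =>
{
    await semaphore.WaitAsync(); // block here until a slot frees up
    try
    {
        return await FetchArticleAsync(url); // your fetch method
    }
    finally
    {
        semaphore.Release(); // always release, even on failure
    }
});

var articles = await Task.WhenAll(fetchTasks);
```

All tasks start eagerly, but only 10 at a time get past `WaitAsync`; the rest queue on the semaphore.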

### Parallel.ForEachAsync with MaxDegreeOfParallelism

**Where**: Embedding batch processing (Phase 3)

```csharp
await Parallel.ForEachAsync(
    batchIndices,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (batchIndex, ct) => { ... }
);
```

**Rationale**: Limit API concurrency to respect OpenRouter rate limits. Four concurrent embedding requests is a safe default.

**Configurable**: Yes, via `ParallelProcessingOptions.MaxConcurrentEmbeddingRequests` (compile-time).


### Progress Reporting (Channel)

**Where**: All phases receive an `onProgress` callback

**Implementation**:
- `StatusReporter.UpdateStatus()` → writes to the channel
- A background task reads the channel and displays messages
- Non-blocking; callbacks are fire-and-forget (`TryWrite`)

**Thread Safety**: the channel is thread-safe; multiple phases may write concurrently.

## Error Handling & Resilience

### HTTP Errors

**OpenRouterClient**:
- Calls `response.EnsureSuccessStatusCode()`, which throws `HttpRequestException` on 4xx/5xx
- No retry (handled at a higher level, in `EmbeddingService`)

**SearxngClient**:
- Returns an empty `List<SearxngResult>` on a non-success response
- No exception is thrown (searches are non-critical; if some queries fail, others proceed)

### Retry Policy (Polly)

**Location**: `EmbeddingService` constructor

**Scope**: Only embedding requests (`_client.EmbedAsync`)

**Policy**:
- Max 3 attempts
- Exponential backoff: 1s, 2s, 4s
- Only retries `HttpRequestException` (network errors, 429, 5xx)

**Why not on chat completions?**
- Query generation and the final answer are critical; failures should surface immediately
- Retries could be added later if transient failures prove common

### Graceful Degradation

**Query Generation Failure**:
```csharp
try { ... generate queries ... }
catch (Exception ex)
{
    // Fall back to the original question as the sole query
    if (options.Verbose) reporter.WriteLine("[Failed to generate queries, falling back to original question]");
}
```

**Embedding Batch Failure**:
```csharp
catch
{
    // Return empty embeddings for this batch (the chunks will be filtered out)
    var emptyBatch = new float[batch.Count][];
    // fill with empty arrays
    results.Add((batchIndex, emptyBatch));
}
```

**Article Fetch Failure**:
```csharp
try { await ArticleService.FetchArticleAsync(url); }
catch (Exception ex)
{
    if (verbose) Console.WriteLine($"Warning: Failed to fetch article {url}: {ex.Message}");
    // Chunk not added; continue with the others
}
```


### User-Facing Errors

The top-level exception handler in `Program.cs`:
```csharp
try { await openQuery.RunAsync(options); }
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```

### Cancellation Support

`OpenRouterClient.StreamAsync` and the `EmbeddingService` methods accept a `CancellationToken`.

Used in:
- Answer streaming (Ctrl+C stops immediately)
- Parallel embeddings (can be cancelled)

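A hedged sketch of wiring Ctrl+C to a `CancellationToken` (how the token is threaded into `StreamAsync` is an assumption based on the signatures described above):

```csharp
using var cts = new CancellationTokenSource();

// Translate Ctrl+C into a cancellation request instead of killing the process
Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true; // keep the process alive so cleanup can run
    cts.Cancel();
};

try
{
    await foreach (var chunk in client.StreamAsync(request, cts.Token))
    {
        if (chunk.TextDelta != null)
            Console.Write(chunk.TextDelta);
    }
}
catch (OperationCanceledException)
{
    Console.Error.WriteLine("\n[Cancelled]");
}
```

The `e.Cancel = true` line is what lets `using` blocks and finalizers run instead of the default hard exit.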

## Performance Considerations

### Latency Breakdown (Typical)

| Stage | Time | Description |
|-------|------|-------------|
| Query generation | 2-5s | LLM generates 3-5 queries |
| Searches | 3-8s | 3-5 parallel SearxNG queries |
| Article fetching | 5-15s | 10-20 parallel fetches (network + parse) |
| Embeddings | 2-4s | 50-100 chunks in 4 parallel batches |
| Final answer | 5-20s | Depends on answer length (streaming) |
| **Total** | **15-50s** | Varies widely with network and content |

### Bottlenecks

1. **Network I/O** (article fetching, API calls) - inherently I/O-bound, not CPU-bound
2. **OpenRouter API latency** - varies by model and load
3. **SmartReader parsing** - CPU-bound for large HTML
4. **Embedding API rate** - OpenRouter may rate-limit if too many requests are concurrent

||||
### Optimization Strategies
|
||||
|
||||
- **Parallelism**: Already maximized within API constraints
|
||||
- **Caching**: Not implemented; future enhancement could cache embeddings per URL
|
||||
- **Batching**: 300-chunk batches reduce API overhead
|
||||
- **AOT**: Native compilation reduces startup overhead vs JIT
|
||||
|
||||
### Scalability Limits
|
||||
|
||||
- **Memory**: Scales with number of chunks. 100 chunks × 1536 floats × 4 bytes ≈ 600KB. Can handle thousands but CPU for similarity calculation becomes O(n).
|
||||
- **API Limits**: OpenRouter rate limits per API key; may need to reduce concurrency if hitting 429s.
|
||||
- **SearxNG Limits**: Single SearxNG instance can handle ~10-50 QPS; above that may need load balancing (not in scope).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Why Not Use a DI Container?
|
||||
|
||||
OpenQuery manually wires dependencies in `Program.cs`. For a small CLI app, this is:
|
||||
- Simpler (no container configuration)
|
||||
- More explicit (easy to trace dependency graph)
|
||||
- No runtime overhead
|
||||
- Easier to understand for contributors
|
||||
|
||||
Container-based DI would be overengineering.
|
||||
|
||||
### Why AOT?
|
||||
|
||||
.NET 10 AOT provides:
|
||||
- **Fast startup** (<100ms vs ~500ms JIT)
|
||||
- **Smaller footprint** (trimmed, no JIT)
|
||||
- **No runtime dependencies** (self-contained)
|
||||
- Better for CLI tools distributed to users
|
||||
|
||||
Trade-offs:
|
||||
- Longer build time
|
||||
- Some reflection-based APIs not supported (not needed here)
|
||||
- Less flexible (can't load dynamic assemblies, but not needed)
|
||||
|
||||
### Why SmartReader for Article Extraction?
|
||||
|
||||
SmartReader uses Readability-based algorithm similar to Firefox Reader View:
|
||||
- Removes ads, navigation, comments, boilerplate
|
||||
- Extracts main article content
|
||||
- Handles malformed HTML gracefully
|
||||
- Zero dependencies (pure .NET)
|
||||
|
||||
Alternatives considered:
|
||||
- `HtmlAgilityPack` (too low-level, need to implement extraction logic)
|
||||
- `AngleSharp` (similar, still need extraction)
|
||||
- External services (like diffbot) - require API keys, costs money
|
||||
|
||||
SmartReader is the sweet spot: free, good quality, easy integration.
|
||||
|
||||
### Why Embeddings + Cosine Similarity vs Full-Text Search?
|
||||
|
||||
Full-text search (like Lucene) would:
|
||||
- Require inverted index, more complex
|
||||
- Be faster for exact keyword matching
|
||||
- Not understand semantic similarity
|
||||
|
||||
Embeddings provide:
|
||||
- Semantic similarity (understand meaning, not just keywords)
|
||||
- Simple math (cosine similarity of float arrays)
|
||||
- No index to maintain (just compute on-the-fly)
|
||||
|
||||
Trade-off: Embedding API cost and latency vs LRU cache potential.
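The "simple math" above can be sketched in a few lines (a minimal version; the actual `EmbeddingService.CosineSimilarity` may differ in detail):

```csharp
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```

Scores land in [-1, 1]; for embedding vectors they are typically in (0, 1), so ranking by descending score works directly.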

### Why Not RAG (Retrieval Augmented Generation) Framework?

OpenQuery is essentially a lightweight custom RAG system. Using a full framework (like LangChain) would:
- Add dependency bloat
- Reduce control
- Increase abstraction complexity

The custom implementation is ~1000 LOC and matches the project's needs exactly.

### Why System.CommandLine?

Provides:
- Native-like CLI help (`openquery --help`)
- Strongly-typed options
- Command hierarchy (main + subcommands)
- Good error messages

Alternatives: `CommandLineParser` (older) or manual parsing. System.CommandLine is modern and actively developed.

---

## Next Steps

- [Components](components/overview.md) - Deep dive into each module
- [API Reference](api/cli.md) - Complete command documentation
- [Troubleshooting](troubleshooting.md) - Debug issues

---

**Want to extend OpenQuery?** Check the [Components](components/overview.md) guide to understand each piece.

528
docs/components/models.md
Normal file
@@ -0,0 +1,528 @@

# Models Reference

Complete reference for all data models, DTOs, and records in OpenQuery.

## 📋 Table of Contents

1. [Core Data Models](#core-data-models)
2. [OpenRouter API Models](#openrouter-api-models)
3. [SearxNG API Models](#searxng-api-models)
4. [JSON Serialization](#json-serialization)
5. [Model Relationships](#model-relationships)

## Core Data Models

### OpenQueryOptions

**Location**: `Models/OpenQueryOptions.cs`
**Type**: `record`
**Purpose**: Immutable options object for a single query execution

```csharp
public record OpenQueryOptions(
    int Chunks,      // Number of top chunks to include in context
    int Results,     // Search results per generated query
    int Queries,     // Number of search queries to generate (if >1)
    bool Short,      // Request concise answer
    bool Long,       // Request detailed answer
    bool Verbose,    // Enable verbose logging
    string Question  // Original user question (required)
);
```

**Lifecycle**:
- Created in `Program.cs` by combining CLI options, config defaults, and environment variables
- Passed to `OpenQueryApp.RunAsync(options)`

**Validation**: None (assumes valid values from the CLI parser/config)

**Example**:
```csharp
var options = new OpenQueryOptions(
    Chunks: 3,
    Results: 5,
    Queries: 3,
    Short: false,
    Long: false,
    Verbose: true,
    Question: "What is quantum entanglement?"
);
```

---

### Chunk

**Location**: `Models/Chunk.cs`
**Type**: `record`
**Purpose**: Content chunk with metadata, embedding, and relevance score

```csharp
public record Chunk(
    string Content,       // Text content (typically ~500 chars)
    string SourceUrl,     // Original article URL
    string? Title = null  // Article title (optional, may be null)
)
{
    public float[]? Embedding { get; set; } // Vector embedding (1536-dim for text-embedding-3-small)
    public float Score { get; set; }        // Relevance score (0-1, higher = more relevant)
}
```

**Lifecycle**:
1. **Created** in `SearchTool.ExecuteParallelArticleFetchingAsync`:
   ```csharp
   chunks.Add(new Chunk(chunkText, result.Url, article.Title));
   ```
   At this point: `Embedding = null`, `Score = 0`

2. **Embedded** in `SearchTool.ExecuteParallelEmbeddingsAsync`:
   ```csharp
   validChunks[i].Embedding = validEmbeddings[i];
   ```

3. **Scored** in `SearchTool.RankAndSelectTopChunks`:
   ```csharp
   chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
   ```

4. **Formatted** into the context string:
   ```csharp
   $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"
   ```

**Properties**:
- `Content`: Never null/empty (`ChunkingService` filters out empty chunks)
- `SourceUrl`: Always provided (from `SearxngResult.Url`)
- `Title`: May be null if article extraction failed to find a title
- `Embedding`: Null until phase 3; may remain null if embedding failed
- `Score`: 0 until phase 4; irrelevant for non-embedded chunks

**Equality**: Records use value equality over all instance fields, including the fields backing the mutable `Embedding` and `Score` properties (and `float[]` compares by reference, not by contents). Two chunks are equal only if content/url/title match *and* they carry the same score and the same embedding reference.
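A small illustration of this behavior (assuming the `Chunk` record above):

```csharp
var a = new Chunk("text", "https://example.com");
var b = new Chunk("text", "https://example.com");
Console.WriteLine(a == b);   // True: same positional values, Embedding/Score both defaults

b.Score = 0.9f;
Console.WriteLine(a == b);   // False: the mutable Score field participates in equality
```

This is why ranked chunks should not be used as dictionary keys once scoring has started.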

---

### ParallelProcessingOptions

**Location**: `Models/ParallelOptions.cs`
**Type**: `class`
**Purpose**: Configuration for parallel/concurrent operations

```csharp
public class ParallelProcessingOptions
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
    public int EmbeddingBatchSize { get; set; } = 300;
}
```

**Usage**:
- Instantiated in the `SearchTool` constructor (hardcoded `new`)
- Passed to the `EmbeddingService` constructor
- Read by `SearchTool` for the article-fetching semaphore

**Default Values**:
| Property | Default | Effect |
|----------|---------|--------|
| `MaxConcurrentArticleFetches` | 10 | Up to 10 articles fetched simultaneously |
| `MaxConcurrentEmbeddingRequests` | 4 | Up to 4 embedding batches in parallel |
| `EmbeddingBatchSize` | 300 | Each embedding API call handles up to 300 texts |
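The article-fetching semaphore mentioned above can be sketched like this (a simplified version; the real `SearchTool` logic differs, and `urls`/`articleService` are assumed to be in scope):

```csharp
var options = new ParallelProcessingOptions();
using var gate = new SemaphoreSlim(options.MaxConcurrentArticleFetches);

// Start all fetches, but let at most MaxConcurrentArticleFetches run at once.
var tasks = urls.Select(async url =>
{
    await gate.WaitAsync();
    try { return await articleService.FetchArticleAsync(url); }
    finally { gate.Release(); }
});
var articles = await Task.WhenAll(tasks);
```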

**Current Limitation**: These are **compile-time defaults** (hardcoded in `SearchTool.cs`). To make them configurable:
1. Add them to `AppConfig`
2. Read them in `ConfigManager`
3. Pass them through the `SearchTool` constructor

---

## OpenRouter API Models

**Location**: `Models/OpenRouter.cs`
**Purpose**: DTOs for OpenRouter's REST API (JSON serialization)

### Chat Completion

#### `ChatCompletionRequest`
```csharp
public record ChatCompletionRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] List<Message> Messages,
    [property: JsonPropertyName("tools")] List<ToolDefinition>? Tools = null,
    [property: JsonPropertyName("stream")] bool Stream = false
);
```

**Example**:
```json
{
  "model": "qwen/qwen3.5-flash-02-23",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 2+2?" }
  ],
  "stream": true
}
```

#### `Message`
```csharp
public record Message(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null,
    [property: JsonPropertyName("tool_call_id")] string? ToolCallId = null
)
{
    // Factory method for tool responses
    public static Message FromTool(string content, string toolCallId) =>
        new Message("tool", content, null, toolCallId);
}
```

**Roles**: `"system"`, `"user"`, `"assistant"`, `"tool"`

**Usage**:
- `Content` for text messages
- `ToolCalls` when the assistant requests tool use
- `ToolCallId` when responding to a tool call

#### `ChatCompletionResponse`
```csharp
public record ChatCompletionResponse(
    [property: JsonPropertyName("choices")] List<Choice> Choices,
    [property: JsonPropertyName("usage")] Usage? Usage = null
);

public record Choice(
    [property: JsonPropertyName("message")] Message Message,
    [property: JsonPropertyName("finish_reason")] string? FinishReason = null
);
```

**Response Example**:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Answer text..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
```

#### `Usage`
```csharp
public record Usage(
    [property: JsonPropertyName("prompt_tokens")] int PromptTokens,
    [property: JsonPropertyName("completion_tokens")] int CompletionTokens,
    [property: JsonPropertyName("total_tokens")] int TotalTokens
);
```

### Tool Calling (Not Currently Used)

#### `ToolDefinition` / `ToolFunction`
```csharp
public record ToolDefinition(
    [property: JsonPropertyName("type")] string Type, // e.g., "function"
    [property: JsonPropertyName("function")] ToolFunction Function
);

public record ToolFunction(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("description")] string Description,
    [property: JsonPropertyName("parameters")] JsonElement Parameters // JSON Schema
);
```

#### `ToolCall` / `FunctionCall`
```csharp
public record ToolCall(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("function")] FunctionCall Function
);

public record FunctionCall(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("arguments")] string Arguments // JSON string
);
```

**Note**: OpenQuery doesn't use tools currently, but the models are defined for future tool-calling capability.

### Streaming

#### `StreamChunk`
```csharp
public record StreamChunk(
    string? TextDelta = null,
    ClientToolCall? Tool = null
);
```

Yielded by `OpenRouterClient.StreamAsync()` for each SSE event.

#### `ChatCompletionChunk` (Server Response)
```csharp
public record ChatCompletionChunk(
    [property: JsonPropertyName("choices")] List<ChunkChoice> Choices
);

public record ChunkChoice(
    [property: JsonPropertyName("delta")] ChunkDelta Delta
);

public record ChunkDelta(
    [property: JsonPropertyName("content")] string? Content = null,
    [property: JsonPropertyName("tool_calls")] List<ToolCall>? ToolCalls = null
);
```

**Streaming Response Example** (SSE):
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
```

`OpenRouterClient.StreamAsync` parses each event and yields a `StreamChunk` with a non-null `TextDelta` for content.
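A minimal sketch of parsing this SSE format (the real `StreamAsync` implementation may differ; it assumes the `AppJsonContext` and `StreamChunk` types described in this document and the usual `System.Text.Json` usings):

```csharp
static async IAsyncEnumerable<StreamChunk> ParseSseAsync(Stream body)
{
    using var reader = new StreamReader(body);
    string? line;
    while ((line = await reader.ReadLineAsync()) != null)
    {
        if (!line.StartsWith("data: ")) continue;      // skip blank lines and comments
        var payload = line["data: ".Length..];
        if (payload == "[DONE]") yield break;          // end-of-stream sentinel

        var chunk = JsonSerializer.Deserialize(payload, AppJsonContext.Default.ChatCompletionChunk);
        var delta = chunk?.Choices.FirstOrDefault()?.Delta.Content;
        if (delta != null) yield return new StreamChunk(delta);
    }
}
```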

### Embeddings

#### `EmbeddingRequest`
```csharp
public record EmbeddingRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("input")] List<string> Input
);
```

**Example**:
```json
{
  "model": "openai/text-embedding-3-small",
  "input": ["text 1", "text 2", ...]
}
```

#### `EmbeddingResponse`
```csharp
public record EmbeddingResponse(
    [property: JsonPropertyName("data")] List<EmbeddingData> Data,
    [property: JsonPropertyName("usage")] Usage Usage
);

public record EmbeddingData(
    [property: JsonPropertyName("embedding")] float[] Embedding,
    [property: JsonPropertyName("index")] int Index
);
```

**Response Example**:
```json
{
  "data": [
    { "embedding": [0.1, 0.2, ...], "index": 0 },
    { "embedding": [0.3, 0.4, ...], "index": 1 }
  ],
  "usage": {
    "prompt_tokens": 100,
    "total_tokens": 100
  }
}
```

**Note**: `_client.EmbedAsync` orders the results by `index` to match the input order.
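That reordering can be as simple as (a sketch; the actual `EmbedAsync` may differ):

```csharp
// The API may return items out of order; `Index` restores input order.
float[][] ordered = response.Data
    .OrderBy(d => d.Index)
    .Select(d => d.Embedding)
    .ToArray();
```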

---

## SearxNG API Models

**Location**: `Models/Searxng.cs`
**Purpose**: DTOs for SearxNG's JSON response format

### `SearxngRoot`
```csharp
public record SearxngRoot(
    [property: JsonPropertyName("results")] List<SearxngResult> Results
);
```

The top-level response object.

### `SearxngResult`
```csharp
public record SearxngResult(
    [property: JsonPropertyName("title")] string Title,
    [property: JsonPropertyName("url")] string Url,
    [property: JsonPropertyName("content")] string Content // Snippet/description
);
```

**Fields**:
- `Title`: Result title (from the page `<title>` or OpenGraph)
- `Url`: Absolute URL to the article
- `Content`: Short snippet (~200 chars) from the search engine

**Usage**:
- `Url` is passed to `ArticleService.FetchArticleAsync`
- `Title` is used as a fallback if article extraction fails
- `Content` is currently unused (it could power a quick answer without fetching)

**Example Response**:
```json
{
  "results": [
    {
      "title": "Quantum Entanglement - Wikipedia",
      "url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
      "content": "Quantum entanglement is a physical phenomenon..."
    }
  ]
}
```

---

## JSON Serialization

### JsonContext (Source Generation)

**Location**: `Models/JsonContexts.cs`
**Purpose**: Provide a source-generated JSON serializer context for AOT compatibility

#### Declaration
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
[JsonSerializable(typeof(ChatCompletionChunk))]
[JsonSerializable(typeof(EmbeddingRequest))]
[JsonSerializable(typeof(EmbeddingResponse))]
[JsonSerializable(typeof(SearxngRoot))]
[JsonSerializable(typeof(List<string>))]
internal partial class AppJsonContext : JsonSerializerContext
{
}
```

**Usage**:
```csharp
var json = JsonSerializer.Serialize(request, AppJsonContext.Default.ChatCompletionRequest);
var response = JsonSerializer.Deserialize(json, AppJsonContext.Default.ChatCompletionResponse);
```

**Benefits**:
- **AOT-compatible**: No reflection; works with `PublishAot=true`
- **Performance**: Pre-compiled serializers are faster
- **Trimming-safe**: Unused serializers are trimmed automatically

**Generated**: The partial class is completed by the source generator (no manual implementation).

**Important**: ALL types that will be serialized/deserialized must appear in `[JsonSerializable]` attributes, otherwise you get a runtime exception under AOT.
---
|
||||
|
||||
## Model Relationships
|
||||
|
||||
### Object Graph (Typical Execution)
|
||||
|
||||
```
|
||||
OpenQueryOptions
|
||||
↓
|
||||
OpenQueryApp.RunAsync()
|
||||
│
|
||||
├─ queryGenerationMessages (List<Message>)
|
||||
│ ├─ system: "You are an expert researcher..."
|
||||
│ └─ user: "Generate N queries for: {question}"
|
||||
│ ↓
|
||||
│ ChatCompletionRequest → OpenRouter → ChatCompletionResponse
|
||||
│ ↓
|
||||
│ List<string> generatedQueries
|
||||
│
|
||||
├─ SearchTool.ExecuteAsync()
|
||||
│ ↓
|
||||
│ ┌─────────────────────────────────────┐
|
||||
│ │ Phase 1: Parallel Searches │
|
||||
│ │ SearxngClient.SearchAsync(query) × N
|
||||
│ │ → List<SearxngResult> │
|
||||
│ │ (Title, Url, Content) │
|
||||
│ └─────────────────────────────────────┘
|
||||
│ ↓
|
||||
│ ┌─────────────────────────────────────┐
|
||||
│ │ Phase 2: Article Fetch & Chunking │
|
||||
│ │ ArticleService.FetchAsync(Url) × M
|
||||
│ │ → Article (TextContent, Title)
|
||||
│ │ → ChunkingService.ChunkText → List<string> chunks
|
||||
│ │ → Chunk(content, url, title) × K │
|
||||
│ └─────────────────────────────────────┘
|
||||
│ ↓
|
||||
│ ┌─────────────────────────────────────┐
|
||||
│ │ Phase 3: Embeddings │
|
||||
│ │ EmbeddingService.GetEmbeddingsAsync(chunkContents)
|
||||
│ │ → float[][] chunkEmbeddings │
|
||||
│ │ → Set chunk.Embedding for each │
|
||||
│ │ Also: GetEmbeddingAsync(question) → float[] queryEmbedding
|
||||
│ └─────────────────────────────────────┘
|
||||
│ ↓
|
||||
│ ┌─────────────────────────────────────┐
|
||||
│ │ Phase 4: Ranking │
|
||||
│ │ For each chunk: Score = CosineSimilarity(queryEmbedding, chunk.Embedding)
|
||||
│ │ → Set chunk.Score │
|
||||
│ │ → OrderByDescending(Score) │
|
||||
│ │ → Take(topChunksLimit) → topChunks (List<Chunk>)
|
||||
│ └─────────────────────────────────────┘
|
||||
│ ↓
|
||||
│ Context string: formatted topChunks
|
||||
│ ↓
|
||||
└─ OpenQueryApp → final ChatCompletionRequest
|
||||
System: "Answer based on context..."
|
||||
User: "Context:\n{context}\n\nQuestion: {question}"
|
||||
↓
|
||||
StreamAsync() → StreamChunk.TextDelta → Console
|
||||
```
|
||||
|
||||
### Record Immutability
|
||||
|
||||
Most DTOs are `record` types:
|
||||
- **Immutable**: Properties are init-only (`{ get; init; }`)
|
||||
- **Value semantics**: Equality based on content
|
||||
- **Thread-safe**: Can be shared across threads
|
||||
|
||||
**Exception**:
|
||||
- `Chunk`: Has mutable properties `Embedding` and `Score` (set during pipeline)
|
||||
- `ParallelProcessingOptions`: Class with mutable setters
|
||||
- `AppConfig`: Class with mutable setters
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[API Reference](../../api/cli.md)** - How these models are used in CLI commands
|
||||
- **[OpenRouterClient](../../services/OpenRouterClient.md)** - Uses OpenRouter models
|
||||
- **[SearxngClient](../../services/SearxngClient.md)** - Uses Searxng models
|
||||
- **[SearchTool](../../components/search-tool.md)** - Orchestrates all models
|
||||
|
||||
---
|
||||
|
||||
**Quick Reference Table**
|
||||
|
||||
| Model | Category | Purpose | Mutable? |
|
||||
|-------|----------|---------|----------|
|
||||
| `OpenQueryOptions` | Core | CLI options | No (record) |
|
||||
| `Chunk` | Core | Content + metadata + ranking | Partially (Embedding, Score) |
|
||||
| `ParallelProcessingOptions` | Config | Concurrency settings | Yes (class) |
|
||||
| `ChatCompletionRequest/Response` | OpenRouter | LLM API | No |
|
||||
| `EmbeddingRequest/Response` | OpenRouter | Embeddings API | No |
|
||||
| `SearxngRoot/Result` | SearxNG | Search results | No |
|
||||
| `AppJsonContext` | Internal | JSON serialization | No (generated partial) |
|
||||
395
docs/components/openquery-app.md
Normal file
395
docs/components/openquery-app.md
Normal file
@@ -0,0 +1,395 @@

# OpenQueryApp Component

Deep dive into the `OpenQueryApp` class - the main application orchestrator.

## Overview

`OpenQueryApp` is the heart of OpenQuery. It coordinates all components, manages the workflow from question to answer, and handles progress reporting.

## Location
`OpenQuery.cs` in the project root

## Class Definition

```csharp
public class OpenQueryApp
{
    private readonly OpenRouterClient _client;
    private readonly SearchTool _searchTool;
    private readonly string _model;

    public OpenQueryApp(
        OpenRouterClient client,
        SearchTool searchTool,
        string model);

    public async Task RunAsync(OpenQueryOptions options);
}
```

**Dependencies**:
- `OpenRouterClient` - for query generation and final-answer streaming
- `SearchTool` - for the search-retrieve-rank pipeline
- `string _model` - model identifier to use for LLM calls

**Lifecycle**: Instantiated once per query execution in `Program.cs`; then `RunAsync()` is called once.

## RunAsync Workflow

```csharp
public async Task RunAsync(OpenQueryOptions options)
{
    // 1. Setup
    using var reporter = new StatusReporter(options.Verbose);
    reporter.StartSpinner();

    // 2. Query Generation (if needed)
    List<string> queries = await GenerateQueriesIfNeededAsync(options, reporter);

    // 3. Search Pipeline
    string searchResult = await ExecuteSearchPipelineAsync(options, queries, reporter);

    // 4. Final Answer Streaming
    await StreamFinalAnswerAsync(options, searchResult, reporter);
}
```

### Step 1: Status Reporter Setup

```csharp
using var reporter = new StatusReporter(options.Verbose);
reporter.StartSpinner();
```

- Creates a `StatusReporter` (implements `IDisposable`)
- Starts the spinner animation (unless verbose)
- `using` ensures disposal on exit
### Step 2: Query Generation
|
||||
|
||||
**When**: `options.Queries > 1` (user wants multiple search queries)
|
||||
|
||||
**Purpose**: Use LLM to generate diverse, optimized search queries from the original question
|
||||
|
||||
**System Prompt** (hardcoded in `OpenQuery.cs`):
|
||||
```
|
||||
You are an expert researcher. The user will ask a question. Your task is to
|
||||
generate optimal search queries to gather comprehensive information.
|
||||
|
||||
Instructions:
|
||||
1. Break down complex questions.
|
||||
2. Use synonyms and alternative phrasing.
|
||||
3. Target different aspects (entities, mechanisms, pros/cons, history).
|
||||
|
||||
CRITICAL: Output must be a valid JSON array of strings ONLY. No markdown,
|
||||
explanations, or other text.
|
||||
```
|
||||
|
||||
**Request**:
|
||||
```csharp
|
||||
var queryGenMessages = new List<Message>
|
||||
{
|
||||
new Message("system", systemPrompt),
|
||||
new Message("user", $"Generate {options.Queries} distinct search queries for:\n{options.Question}")
|
||||
};
|
||||
var request = new ChatCompletionRequest(_model, queryGenMessages);
|
||||
var response = await _client.CompleteAsync(request);
|
||||
```
|
||||
|
||||
**Response Parsing**:
|
||||
```csharp
|
||||
var content = response.Choices.FirstOrDefault()?.Message.Content;
|
||||
if (!string.IsNullOrEmpty(content))
|
||||
{
|
||||
// Remove markdown code fences if present
|
||||
content = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();
|
||||
|
||||
// Deserialize to List<string>
|
||||
var generatedQueries = JsonSerializer.Deserialize(content, AppJsonContext.Default.ListString);
|
||||
if (generatedQueries != null && generatedQueries.Count > 0)
|
||||
{
|
||||
queries = generatedQueries;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Fallback**: If any step fails (exception, null, empty, invalid JSON), use `new List<string> { options.Question }` (single query = original)
|
||||
|
||||
**Note**: Query generation reuses the same model as final answer. This could be optimized:
|
||||
- Use cheaper/faster model for query gen
|
||||
- Separate model configuration
|
||||
- Cache query generation results
|
||||
|
||||
### Step 3: Search Pipeline Execution

```csharp
var searchResult = await _searchTool.ExecuteAsync(
    options.Question,
    queries,
    options.Results,
    options.Chunks,
    (progress) => {
        if (options.Verbose)
            reporter.WriteLine(progress);
        else
            reporter.UpdateStatus(parsedMessage); // parsedMessage: condensed status derived from progress
    },
    options.Verbose);
```

**Parameters**:
- `originalQuery`: The user's original question (used for the final embedding)
- `generatedQueries`: From step 2 (or the fallback)
- `maxResults`: `options.Results` (search results per query)
- `topChunksLimit`: `options.Chunks` (top N chunks to return)
- `onProgress`: Callback to update the UI
- `verbose`: Passed through to `SearchTool`

**Returns**: `string context` - formatted context with source citations

**Progress Handling**:
- In verbose mode: all progress is printed as lines (via `reporter.WriteLine()`)
- In compact mode: progress messages are parsed into a concise status (e.g., "Fetching articles 3/10...")
### Step 4: Final Answer Streaming

**Status Update**:
```csharp
if (!options.Verbose)
    reporter.UpdateStatus("Asking AI...");
else
{
    reporter.ClearStatus();
    Console.WriteLine();
}
```

**Build System Prompt**:
```csharp
var systemPrompt = "You are a helpful AI assistant. Answer the user's question in depth, based on the provided context. Be precise and accurate. You can mention sources or citations.";
if (options.Short) systemPrompt += " Give a very short concise answer.";
if (options.Long) systemPrompt += " Give a long elaborate detailed answer.";
```

**Prompt Structure**:
```
System: {systemPrompt}
User: Context:
{searchResult}

Question: {options.Question}
```

Where `searchResult` is:
```
[Source 1: Title](URL)
Content chunk 1

[Source 2: Title](URL)
Content chunk 2

...
```

**Streaming**:
```csharp
var requestStream = new ChatCompletionRequest(_model, messages);
var assistantResponse = new StringBuilder();
var isFirstChunk = true;

using var streamCts = new CancellationTokenSource();
await foreach (var chunk in _client.StreamAsync(requestStream, streamCts.Token))
{
    if (chunk.TextDelta == null) continue;

    if (isFirstChunk)
    {
        reporter.StopSpinner();
        if (!options.Verbose) reporter.ClearStatus();
        else Console.Write("Assistant: ");
        isFirstChunk = false;
    }

    Console.Write(chunk.TextDelta);
    assistantResponse.Append(chunk.TextDelta);
}
```

**Key Points**:
- `StreamAsync` yields `StreamChunk` objects (text deltas)
- The first chunk stops the spinner and clears the status line
- Each delta is written to the console immediately (real-time feel)
- The entire response is accumulated in `assistantResponse` (though not used elsewhere)
- A `CancellationTokenSource` is passed but never cancelled here (Ctrl+C would cancel from outside)

**Finally Block**:
```csharp
finally
{
    reporter.StopSpinner();
}
```
Ensures the spinner stops even if streaming fails.

**End**:
```csharp
Console.WriteLine(); // Newline after the complete answer
```
## Error Handling
|
||||
|
||||
`RunAsync` itself does not catch exceptions. All exceptions propagate to `Program.cs`:
|
||||
|
||||
```csharp
|
||||
try
|
||||
{
|
||||
var openQuery = new OpenQueryApp(client, searchTool, model);
|
||||
await openQuery.RunAsync(options);
|
||||
}
|
||||
catch (HttpRequestException ex)
|
||||
{
|
||||
Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
|
||||
Environment.Exit(1);
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
|
||||
Environment.Exit(1);
|
||||
}
|
||||
```
|
||||
|
||||
**Common Exceptions**:
|
||||
- `HttpRequestException` - network failures, API errors
|
||||
- `JsonException` - malformed JSON from API
|
||||
- `TaskCanceledException` - timeout or user interrupt
|
||||
- `Exception` - anything else
|
||||
|
||||
**No Retries at This Level**: Fail fast; user sees error immediately. Lower-level retries exist (embedding service).
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
**Query Generation**:
|
||||
- One non-streaming LLM call
|
||||
- Takes 2-5 seconds depending on model
|
||||
- Typically <1000 tokens
|
||||
|
||||
**Search Pipeline** (`SearchTool.ExecuteAsync`):
|
||||
- See `SearchTool.md` for detailed timing breakdown
|
||||
- Total 10-30 seconds typically
|
||||
|
||||
**Final Answer Streaming**:
|
||||
- Streaming LLM call
|
||||
- Time depends on answer length (typically 5-20 seconds)
|
||||
- User sees words appear progressively
|
||||
|
||||
**Total End-to-End**: 15-50 seconds for typical query
|
||||
|
||||
## Design Decisions

### Why Not Stream Query Generation?

Query generation currently uses `CompleteAsync` (non-streaming). It could be streamed, but:
- Queries are short (JSON array)
- Streaming offers no UX benefit (the user doesn't see intermediate queries)
- It is simpler to wait for all queries before proceeding

### Why Build Prompts Manually Instead of Using Templates?

Simple string concatenation is fine for a small number of prompts. Pros:
- No template dependencies
- Easy to read and modify
- No runtime compilation overhead

Cons:
- No validation
- Could benefit from a prompt engineering framework

### Why Accumulate the `assistantResponse` StringBuilder?

It is currently built but not used. It could be:
- Saved to a file (future feature: `--output file.md`)
- Analyzed for token counting
- Removed if not needed

### Could Query Generation Be Cached?

Yes! For repeated questions (common in scripts), cache query results:
- `Dictionary<string, List<string>>` cache in memory
- Or a persistent cache (Redis, file)
- Not implemented (low priority)
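
An in-memory version of such a cache is a small sketch. The `QueryCache` class and its `GetOrGenerateAsync` method shown here are hypothetical, not part of the current codebase:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical caching wrapper around query generation.
public class QueryCache
{
    private readonly ConcurrentDictionary<string, List<string>> _cache = new();

    public async Task<List<string>> GetOrGenerateAsync(
        string question,
        Func<string, Task<List<string>>> generate)
    {
        // Normalize the key so trivially different inputs hit the same entry.
        var key = question.Trim().ToLowerInvariant();

        if (_cache.TryGetValue(key, out var cached))
            return cached;

        var queries = await generate(question);
        _cache[key] = queries;
        return queries;
    }
}
```

A persistent variant would swap the dictionary for a file- or Redis-backed store behind the same method.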

### Single Responsibility Violation?

`OpenQueryApp` does:
- Query generation
- Pipeline orchestration
- Answer streaming

That's 3 responsibilities, but they're tightly coupled to the "query → answer" workflow. Separating them would add complexity without clear benefit. Acceptable as "application coordinator".

## Extension Points

### Adding a New Model for Query Generation

Currently the same `_model` is used for queries and the answer. To use different models:

1. Add a `queryGenerationModel` parameter to the constructor
2. Use it for query generation: `new ChatCompletionRequest(queryGenerationModel, queryGenMessages)`
3. Keep `_model` for the final answer

Or make it configurable via an environment variable: `OPENROUTER_QUERY_MODEL`

### Post-Processing the Answer

Opportunities to add:
- Source citation formatting (footnotes, clickable links)
- Answer summarization
- Export to Markdown/JSON
- Text-to-speech

Add these after the streaming loop, before the final newline.

### Progress UI Enhancement

The current `StatusReporter` is basic. It could add:
- A progress bar with percentage
- ETA calculation
- Colors (ANSI) for different message types
- Logging to file
- A web dashboard

This would require extending `StatusReporter` or replacing it.

## Testing Considerations

**Challenges**:
- `RunAsync` is cohesive (hard to unit test in isolation)
- Depends on many services (need mocks)
- Asynchronous and streaming

**Recommended Approach**:
1. Extract interfaces:
   - `ISearchTool` (wrapper around `SearchTool`)
   - `IOpenRouterClient` (wrapper around `OpenRouterClient`)
2. Mock the interfaces in tests
3. Test query generation parsing separately
4. Test progress callback counting
5. Test final answer prompt construction

**Integration Tests**:
- End-to-end with real/mocked APIs
- Automated tests against test SearxNG/OpenRouter instances

## Related Components

- **[SearchTool](search-tool.md)** - pipeline executed by `OpenQueryApp`
- **[Program.cs](../Program.md)** - creates `OpenQueryApp`
- **[StatusReporter](../services/StatusReporter.md)** - progress UI used by `OpenQueryApp`

---

## Next Steps

- [SearchTool](search-tool.md) - See the pipeline in detail
- [Services](../services/overview.md) - Understand each service
- [CLI Reference](../../api/cli.md) - How users invoke this

603
docs/components/overview.md
Normal file
@@ -0,0 +1,603 @@

# Components Overview

Detailed documentation for each major component in the OpenQuery system.

## 📋 Table of Contents

1. [Component Hierarchy](#component-hierarchy)
2. [Core Components](#core-components)
3. [Services](#services)
4. [Data Models](#data-models)
5. [Component Interactions](#component-interactions)

## Component Hierarchy

```
OpenQuery/
├── Program.cs              [Entry Point, CLI]
├── OpenQuery.cs            [OpenQueryApp - Orchestrator]
├── Tools/
│   └── SearchTool.cs       [Pipeline Orchestration]
├── Services/
│   ├── OpenRouterClient.cs [LLM & Embedding API]
│   ├── SearxngClient.cs    [Search API]
│   ├── EmbeddingService.cs [Embedding Generation + Math]
│   ├── ChunkingService.cs  [Text Splitting]
│   ├── ArticleService.cs   [Content Extraction]
│   ├── RateLimiter.cs      [Concurrency Control]
│   └── StatusReporter.cs   [Progress Display]
├── Models/
│   ├── OpenQueryOptions.cs [CLI Options Record]
│   ├── Chunk.cs            [Content + Metadata]
│   ├── ParallelOptions.cs  [Concurrency Settings]
│   ├── OpenRouter.cs       [API DTOs]
│   ├── Searxng.cs          [Search Result DTOs]
│   └── JsonContexts.cs     [JSON Context]
└── ConfigManager.cs        [Configuration Persistence]
```

## Core Components

### 1. Program.cs

**Type**: Console Application Entry Point
**Responsibilities**: CLI parsing, dependency wiring, error handling

**Key Elements**:
- `RootCommand` from System.CommandLine
- Options: `--chunks`, `--results`, `--queries`, `--short`, `--long`, `--verbose`
- Subcommand: `configure` (with interactive mode)
- Configuration loading via `ConfigManager.Load()`
- Environment variable resolution
- Service instantiation and coordination
- Top-level try-catch for error reporting

**Code Flow**:
1. Load the config file
2. Define CLI options and commands
3. Set the handler for the root command
4. Handler: resolve API key/model → instantiate services → call `OpenQueryApp.RunAsync()`
5. Set the handler for the configure command (writes the config file)
6. Invoke the command parser: `await rootCommand.InvokeAsync(args)`

**Exit Codes**:
- 0 = success
- 1 = error

### 2. OpenQueryApp (OpenQuery.cs)

**Type**: Main Application Class
**Responsibilities**: Workflow orchestration, query generation, answer streaming

**Constructor Parameters**:
- `OpenRouterClient client` - for query generation and the final answer
- `SearchTool searchTool` - for the search-retrieve-rank pipeline
- `string model` - LLM model identifier

**Main Method**: `RunAsync(OpenQueryOptions options)`

**Workflow Steps**:
1. Create a `StatusReporter` (for the progress UI)
2. **Optional Query Generation** (if `options.Queries > 1`):
   - Create a system message instructing JSON array output
   - Create a user message with `options.Question`
   - Call `client.CompleteAsync()` with the query generation model
   - Parse the JSON response; fall back to the original question on failure
   - Result: `List<string> queries` (1 or many)
3. **Execute Search Pipeline**:
   - Call `_searchTool.ExecuteAsync()` with queries and options
   - Receive `string context` (formatted context with source citations)
   - Progress reported via callback to `StatusReporter`
4. **Generate Final Answer**:
   - Build the system prompt (append a "short" or "long" modifier)
   - Create a user message with `Context:\n{context}\n\nQuestion: {options.Question}`
   - Stream the answer via `client.StreamAsync()`
   - Write each `chunk.TextDelta` to the console as it arrives
   - Stop the spinner on the first chunk, continue streaming
5. Dispose the reporter
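
Step 2's parse-with-fallback can be sketched as a pure helper. This is a minimal sketch; the `QueryParser` class name is hypothetical and the real method may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: parse the model's JSON array of queries,
// falling back to the original question if the output is malformed.
public static class QueryParser
{
    public static List<string> ParseQueries(string llmOutput, string originalQuestion)
    {
        try
        {
            var queries = JsonSerializer.Deserialize<List<string>>(llmOutput);
            if (queries is { Count: > 0 })
                return queries;
        }
        catch (JsonException)
        {
            // Malformed JSON: ignore and fall through to the fallback.
        }
        return new List<string> { originalQuestion };
    }
}
```

The fallback guarantees the pipeline always has at least one query to run.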

**Error Handling**:
- Exceptions propagate to the `Program.cs` top-level handler
- `HttpRequestException` vs generic `Exception`

**Note**: Query generation uses the same model as the final answer; could be separated for cost/performance.

### 3. SearchTool (Tools/SearchTool.cs)

**Type**: Pipeline Orchestrator
**Responsibilities**: Execute the 4-phase search-retrieve-rank-return workflow

**Constructor Parameters**:
- `SearxngClient searxngClient`
- `EmbeddingService embeddingService`

**Main Method**: `ExecuteAsync(originalQuery, generatedQueries, maxResults, topChunksLimit, onProgress, verbose)`

**Returns**: `Task<string>` - formatted context string with source citations

**Pipeline Phases**:

#### Phase 1: ExecuteParallelSearchesAsync
- Parallelize `searxngClient.SearchAsync(query, maxResults)` for each query
- Collect all results in a `ConcurrentBag<SearxngResult>`
- Deduplicate by `DistinctBy(r => r.Url)`

**Output**: `List<SearxngResult>` (aggregated, unique)

#### Phase 2: ExecuteParallelArticleFetchingAsync
- Semaphore: `MaxConcurrentArticleFetches` (default 10)
- For each `SearxngResult`: fetch the URL via `ArticleService.FetchArticleAsync()`
- Extract the article text and title
- Chunk via `ChunkingService.ChunkText(article.TextContent)`
- Add each chunk as a new `Chunk(content, url, title)`

**Output**: `List<Chunk>` (potentially 50-100 chunks)

#### Phase 3: ExecuteParallelEmbeddingsAsync
- Start two parallel tasks:
  1. Query embedding: `embeddingService.GetEmbeddingAsync(originalQuery)`
  2. Chunk embeddings: `embeddingService.GetEmbeddingsWithRateLimitAsync(chunkTexts, onProgress)`
- `Parallel.ForEachAsync` with `MaxConcurrentEmbeddingRequests` (default 4)
- Batch size: 300 chunks per embedding API call
- Filter out chunks with empty embeddings (failed batches)

**Output**: `(float[] queryEmbedding, float[][] chunkEmbeddings)`

#### Phase 4: RankAndSelectTopChunks
- Calculate the cosine similarity of each chunk vs the query
- Assign `chunk.Score`
- Order by descending score
- Take `topChunksLimit` (from the `--chunks` option)
- Return `List<Chunk>` (top N)
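
Phase 4 is essentially score-then-sort over LINQ. A minimal sketch, assuming a chunk type with a mutable `Score` and an injectable similarity function (the `RankedChunk` and `Ranker` names here are illustrative, not the real types):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record RankedChunk(string Content, string SourceUrl)
{
    public float Score { get; set; }
}

public static class Ranker
{
    // Score each chunk against the query embedding and keep the top N.
    public static List<RankedChunk> TopChunks(
        List<RankedChunk> chunks,
        float[][] chunkEmbeddings,
        float[] queryEmbedding,
        int topN,
        Func<float[], float[], float> similarity)
    {
        for (int i = 0; i < chunks.Count; i++)
            chunks[i].Score = similarity(queryEmbedding, chunkEmbeddings[i]);

        return chunks.OrderByDescending(c => c.Score).Take(topN).ToList();
    }
}
```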

**Formatting**:
```csharp
string context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i+1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
```

**Progress Callbacks**: Invoked at each major step for UI feedback

## Services

### OpenRouterClient

**Purpose**: HTTP client for the OpenRouter API (chat completions + embeddings)

**Base URL**: `https://openrouter.ai/api/v1`

**Authentication**: `Authorization: Bearer {apiKey}`

**Methods**:

#### `StreamAsync(ChatCompletionRequest request, CancellationToken)`
- Sets `request.Stream = true`
- POST to `/chat/completions`
- Reads the SSE stream line by line
- Parses `data: {json}` chunks
- Yields `StreamChunk` (text delta or tool call)
- Supports cancellation
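
The per-line SSE handling can be sketched as a pure helper. This is a simplified sketch: the real client reads these lines from the HTTP response stream, and the `[DONE]` sentinel is the OpenAI-style SSE convention that OpenRouter follows:

```csharp
using System;

public static class SseParser
{
    // Extract the JSON payload from one SSE line, or null if the line
    // carries no payload (blank line, comment, or the [DONE] sentinel).
    public static string? ParsePayload(string line)
    {
        if (string.IsNullOrWhiteSpace(line) || !line.StartsWith("data: "))
            return null;

        var payload = line.Substring("data: ".Length).Trim();
        return payload == "[DONE]" ? null : payload;
    }
}
```

Each non-null payload is then deserialized into a chunk DTO and its delta yielded to the caller.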

#### `CompleteAsync(ChatCompletionRequest request)`
- Sets `request.Stream = false`
- POST to `/chat/completions`
- Deserializes the full response
- Returns `ChatCompletionResponse`

#### `EmbedAsync(string model, List<string> inputs)`
- POST to `/embeddings`
- Returns `float[][]` (ordered by input index)

**Error Handling**: `EnsureSuccessStatusCode()` throws `HttpRequestException` on failure

**Design**: Thin wrapper; no retry logic (delegated to `EmbeddingService`)

### SearxngClient

**Purpose**: HTTP client for SearxNG metasearch

**Base URL**: Configurable (default `http://localhost:8002`)

**Methods**:

#### `SearchAsync(string query, int limit = 10)`
- GET `{baseUrl}/search?q={query}&format=json`
- Deserializes to `SearxngRoot`
- Returns `Results.Take(limit).ToList()`
- On failure: returns an empty `List<SearxngResult>` (no exception)

**Design**: Very simple; failures are tolerated (OpenQuery continues with other queries)
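
A minimal sketch of such a client, assuming the DTO shapes described above (the `MinimalSearxClient`, `SearxResult`, and `SearxRoot` names are illustrative; failures collapse to an empty list, matching the behavior described):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public record SearxResult(string Title, string Url, string Content);
public record SearxRoot(List<SearxResult> Results);

public class MinimalSearxClient
{
    private readonly HttpClient _http = new();
    private readonly string _baseUrl;

    public MinimalSearxClient(string baseUrl) => _baseUrl = baseUrl.TrimEnd('/');

    public async Task<List<SearxResult>> SearchAsync(string query, int limit = 10)
    {
        try
        {
            var url = $"{_baseUrl}/search?q={Uri.EscapeDataString(query)}&format=json";
            var json = await _http.GetStringAsync(url);
            var root = JsonSerializer.Deserialize<SearxRoot>(
                json, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
            return root?.Results.Take(limit).ToList() ?? new List<SearxResult>();
        }
        catch (Exception)
        {
            return new List<SearxResult>(); // tolerate failures: caller continues
        }
    }
}
```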

### EmbeddingService

**Purpose**: Batch embedding generation with rate limiting, parallelization, and retries

**Configuration** (from `ParallelProcessingOptions`):
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300

**Default Embedding Model**: `openai/text-embedding-3-small`

**Methods**:

#### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
- Splits `texts` into batches of `EmbeddingBatchSize`
- Parallelizes batches with `Parallel.ForEachAsync` + `MaxConcurrentEmbeddingRequests`
- Each batch: rate-limited + retry-wrapped `client.EmbedAsync(model, batch)`
- Collects results in order (by batch index)
- Returns `float[][]` (same order as the input texts)
- Failed batches return an empty `float[]` for each text

#### `GetEmbeddingAsync(string text, CancellationToken)`
- Wraps a single-text call in the rate limiter + retry
- Returns `float[]`

#### `CosineSimilarity(float[] v1, float[] v2)`
- Static method using `TensorPrimitives.CosineSimilarity`
- Returns a float between -1 and 1 (typically 0-1 for normalized embeddings)
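
For reference, the underlying computation is equivalent to this hand-rolled version (the production code delegates to `TensorPrimitives.CosineSimilarity`; the zero-vector guard here is our own defensive choice):

```csharp
using System;

public static class VectorMath
{
    // cos(theta) = (a . b) / (|a| * |b|)
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0f; // degenerate vectors
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```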

**Retry Policy** (Polly):
- Max 3 attempts
- 1s base delay, exponential backoff
- Only `HttpRequestException`
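
Without taking the Polly dependency, an equivalent policy is a short loop. This sketches the same behavior (up to 3 attempts, exponential backoff from a 1s base, retrying only `HttpRequestException`); the injectable `delay` parameter is our addition for testability:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class Retry
{
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> action,
        int maxAttempts = 3,
        double baseDelaySeconds = 1.0,
        Func<int, Task>? delay = null) // injectable for testing
    {
        delay ??= attempt => Task.Delay(
            TimeSpan.FromSeconds(baseDelaySeconds * Math.Pow(2, attempt - 1)));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                await delay(attempt); // 1s, 2s, ... between attempts
            }
        }
    }
}
```

The exception filter (`when`) rethrows on the final attempt, so the caller still sees the original failure.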

**Rate Limiting**: `RateLimiter` semaphore with `MaxConcurrentEmbeddingRequests`

**Design Notes**:
- Two similar methods (`GetEmbeddingsAsync` and `GetEmbeddingsWithRateLimitAsync`) - could be consolidated
- Uses Polly for resilience (a good pattern)
- Concurrency control prevents overwhelming OpenRouter

### ChunkingService

**Purpose**: Split long text into manageable pieces

**Static Class** (no dependencies, pure function)

**Algorithm** (in `ChunkText(string text)`):
- Constant `MAX_CHUNK_SIZE = 500`
- While text remains:
  - Take up to 500 chars
  - If not at the end, backtrack to the last of `[' ', '\n', '\r', '.', '!']`
  - Trim and add the non-empty chunk
  - Advance the start position

**Rationale**: 500 chars is a sweet spot for embeddings - long enough for context, short enough for semantic coherence.

**Edge Cases**: Handles text shorter than 500 chars, empty text, text with no natural breaks.
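
The algorithm above can be sketched directly from that description (a faithful-in-spirit sketch, not a copy of the actual implementation):

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    private const int MaxChunkSize = 500;
    private static readonly char[] BreakChars = { ' ', '\n', '\r', '.', '!' };

    public static List<string> ChunkText(string text)
    {
        var chunks = new List<string>();
        int start = 0;

        while (start < text.Length)
        {
            int length = Math.Min(MaxChunkSize, text.Length - start);

            // Backtrack to a natural break unless we're at the end of the text.
            if (start + length < text.Length)
            {
                int lastBreak = text.LastIndexOfAny(BreakChars, start + length - 1, length);
                if (lastBreak > start)
                    length = lastBreak - start + 1;
            }

            var chunk = text.Substring(start, length).Trim();
            if (chunk.Length > 0)
                chunks.Add(chunk);

            start += length;
        }
        return chunks;
    }
}
```

If no break character exists in the window, the chunk is cut hard at 500 chars, which covers the "no natural breaks" edge case.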

### ArticleService

**Purpose**: Extract clean article content from URLs

**Method**: `FetchArticleAsync(string url)`

**Implementation**: Delegates to `SmartReader.ParseArticleAsync(url)`

**Returns**: `Article` object (from SmartReader)
- `Title` (string)
- `TextContent` (string) - cleaned article body
- `IsReadable` (bool) - quality indicator
- Other metadata (author, date, etc.)

**Error Handling**: Exceptions propagate (handled by `SearchTool`)

**Design**: Thin wrapper around a third-party library. Could be extended to add caching, custom extraction rules, etc.

### RateLimiter

**Purpose**: Limit concurrent operations via a semaphore

**Interface**:
```csharp
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken);
public async Task ExecuteAsync(Func<Task> action, CancellationToken);
```

**Implementation**: `SemaphoreSlim` with `WaitAsync` and `Release`

**Disposal**: `IAsyncDisposable` (awaits semaphore disposal)

**Usage**: Wrap API calls that need concurrency control
```csharp
var result = await _rateLimiter.ExecuteAsync(async () =>
    await _client.EmbedAsync(model, batch), cancellationToken);
```

**Design**: Simple, reusable. Could be replaced with a `Polly.RateLimiting` policy, but this is lightweight.
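
A minimal sketch of this pattern, assuming the interface above (the real class also implements `IAsyncDisposable`; this sketch uses plain `IDisposable` for brevity):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class SimpleRateLimiter : IDisposable
{
    private readonly SemaphoreSlim _semaphore;

    public SimpleRateLimiter(int maxConcurrency) =>
        _semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct = default)
    {
        await _semaphore.WaitAsync(ct); // block until a slot frees up
        try
        {
            return await action();
        }
        finally
        {
            _semaphore.Release(); // always release, even on failure
        }
    }

    public void Dispose() => _semaphore.Dispose();
}
```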

### StatusReporter

**Purpose**: Real-time progress UI with spinner and verbose modes

**Architecture**:
- Producer: `UpdateStatus(text)` → writes to a `Channel<string>`
- Consumer: background task `ProcessStatusUpdatesAsync()` reads from the channel
- Spinner: a separate task animates Braille characters every 100 ms

**Modes**:

**Verbose Mode** (`_verbose = true`):
- All progress messages written as `Console.WriteLine()`
- No spinner
- Full audit trail

**Compact Mode** (default):
- Status line with spinner (overwrites the same line)
- Only the latest status is visible
- Example: `⠋ Fetching articles 3/10...`

**Key Methods**:
- `UpdateStatus(message)` - fire-and-forget, non-blocking
- `WriteLine(text)` - stops the spinner temporarily, writes a full line
- `StartSpinner()` / `StopSpinner()` - manual control
- `ClearStatus()` - ANSI escape `\r\x1b[K` to clear the line
- `Dispose()` - completes the channel, waits for background tasks

**Spinner Chars**: `['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']` (Braille patterns, smooth animation)

**ANSI Codes**: `\r` (carriage return), `\x1b[K` (erase to end of line)

**Thread Safety**: The channel is thread-safe; multiple components can write concurrently without locks

**Design**: Well-encapsulated; could be reused in other CLI projects.
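
The producer/consumer core can be sketched with `System.Threading.Channels` (a stripped-down sketch without the spinner; the `MiniStatusReporter` name and the injected `render` callback are illustrative):

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class MiniStatusReporter : IAsyncDisposable
{
    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private readonly Task _consumer;

    public MiniStatusReporter(Action<string> render)
    {
        // Consumer: drain messages and render each status update.
        _consumer = Task.Run(async () =>
        {
            await foreach (var message in _channel.Reader.ReadAllAsync())
                render(message);
        });
    }

    // Producer: fire-and-forget, never blocks the pipeline.
    public void UpdateStatus(string message) =>
        _channel.Writer.TryWrite(message);

    public async ValueTask DisposeAsync()
    {
        _channel.Writer.Complete(); // no more messages
        await _consumer;            // wait for the drain to finish
    }
}
```

Because the channel is the only shared state, any number of pipeline components can call `UpdateStatus` concurrently without locks.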

### ConfigManager

**Purpose**: Load/save configuration from an XDG-compliant location

**Config Path**:
- `Environment.SpecialFolder.UserProfile` → `~/.config/openquery/config`

**Schema** (`AppConfig`):
```csharp
public class AppConfig
{
    public string ApiKey { get; set; } = "";
    public string Model { get; set; } = "qwen/qwen3.5-flash-02-23";
    public int DefaultQueries { get; set; } = 3;
    public int DefaultChunks { get; set; } = 3;
    public int DefaultResults { get; set; } = 5;
}
```

**Format**: Simple `key=value` (no INI parser; manual line split)

**Methods**:
- `Load()` → reads the file if it exists, returns `AppConfig` (with defaults)
- `Save(AppConfig)` → writes all 5 keys, overwrites the existing file

**Design**:
- Static class (no instances)
- Creates the directory if missing
- No validation (writes whatever values are given)
- Could be improved with a JSON format (but keep it simple)
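
The `key=value` parsing can be sketched as a pure function (a sketch of the described format, not the exact implementation; the duplicate-wins and skip-malformed-lines behavior is our assumption):

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueConfig
{
    // Parse "key=value" lines; later duplicates win, malformed lines are skipped.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var line in lines)
        {
            int eq = line.IndexOf('=');
            if (eq <= 0) continue; // no separator, or empty key

            var key = line.Substring(0, eq).Trim();
            var value = line.Substring(eq + 1).Trim();
            values[key] = value;
        }
        return values;
    }
}
```

`Load()` would then map the parsed keys onto an `AppConfig`, leaving defaults in place for any missing key.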

## Data Models

### OpenQueryOptions

**Location**: `Models/OpenQueryOptions.cs`

**Type**: `record`

**Purpose**: Immutable options object passed through the workflow

**Properties**:
- `int Chunks` - top N chunks for context
- `int Results` - search results per query
- `int Queries` - number of expanded queries to generate
- `bool Short` - concise answer flag
- `bool Long` - detailed answer flag
- `bool Verbose` - verbose logging flag
- `string Question` - original user question

**Created**: In `Program.cs` from CLI options + config defaults

**Used By**: `OpenQueryApp.RunAsync()`

### Chunk

**Location**: `Models/Chunk.cs`

**Type**: `record`

**Purpose**: Content chunk with metadata and embedding

**Properties**:
- `string Content` - extracted text (~500 chars)
- `string SourceUrl` - article URL
- `string? Title` - article title (nullable)
- `float[]? Embedding` - vector embedding (populated by `EmbeddingService`)
- `float Score` - relevance score (populated during ranking)

**Lifecycle**:
1. Instantiated in `SearchTool.ExecuteParallelArticleFetchingAsync` with content, url, title
2. `Embedding` set in `ExecuteParallelEmbeddingsAsync` after batch processing
3. `Score` set in `RankAndSelectTopChunks` after cosine similarity
4. Serialized into the context string for the final answer

**Equality**: Records provide value equality (based on all properties)

### ParallelProcessingOptions

**Location**: `Models/ParallelOptions.cs`

**Type**: `class` (mutable)

**Purpose**: Concurrency settings for parallel operations

**Properties** (with defaults):
- `MaxConcurrentArticleFetches` = 10
- `MaxConcurrentEmbeddingRequests` = 4
- `EmbeddingBatchSize` = 300

**Used By**: `EmbeddingService` (for embeddings), `SearchTool` (for article fetching)

**Currently**: Hardcoded in the `SearchTool` constructor; could be made configurable

### OpenRouter Models (Models/OpenRouter.cs)

**Purpose**: DTOs for the OpenRouter API (JSON serializable)

**Chat Completion**:
- `ChatCompletionRequest` (model, messages, tools, stream)
- `ChatCompletionResponse` (choices[], usage)
- `Message` (role, content, tool_calls, tool_call_id)
- `ToolDefinition`, `ToolFunction`, `ToolCall`, `FunctionCall`
- `Choice`, `Usage`

**Embedding**:
- `EmbeddingRequest` (model, input[])
- `EmbeddingResponse` (data[], usage)
- `EmbeddingData` (embedding[], index)

**Streaming**:
- `StreamChunk` (TextDelta, Tool)
- `ChatCompletionChunk`, `ChunkChoice`, `ChunkDelta`

**JSON Properties**: Uses `[JsonPropertyName]` to match the API

**Serialization**: System.Text.Json with source generation (`AppJsonContext`)

### Searxng Models (Models/Searxng.cs)

**Purpose**: DTOs for SearxNG search results

**Records**:
- `SearxngRoot` with `List<SearxngResult> Results`
- `SearxngResult` with `Title`, `Url`, `Content` (snippet)

**Usage**: Deserialized from SearxNG's JSON response

### JsonContexts

**Location**: `Models/JsonContexts.cs`

**Purpose**: Source-generated JSON serializer context for AOT compatibility

**Pattern**:
```csharp
[JsonSerializable(typeof(ChatCompletionRequest))]
[JsonSerializable(typeof(ChatCompletionResponse))]
// ... etc ...
internal partial class AppJsonContext : JsonSerializerContext
{
}
```

**Generated**: Partial class compiled by the source generator

**Used By**: All `JsonSerializer.Serialize/Deserialize` calls with `AppJsonContext.Default.{Type}`

**Benefits**:
- AOT-compatible (no reflection)
- Faster serialization (compiled delegates)
- Smaller binary (trimming-safe)

## Component Interactions

### Dependencies Graph

```
Program.cs
├── ConfigManager (load/save)
├── OpenRouterClient ──┐
├── SearxngClient ─────┤
├── EmbeddingService ──┤
└── SearchTool ────────┤
                       │
OpenQueryApp ◄─────────┘
│
├── OpenRouterClient (query gen + answer streaming)
├── SearchTool (pipeline)
│   ├── SearxngClient (searches)
│   ├── ArticleService (fetch)
│   ├── ChunkingService (split)
│   ├── EmbeddingService (embeddings)
│   ├── RateLimiter (concurrency)
│   └── StatusReporter (progress via callback)
└── StatusReporter (UI)
```

### Data Flow Between Components

```
OpenQueryOptions
    ↓
OpenQueryApp
├─ Query Generation
│   └─ OpenRouterClient.CompleteAsync()
│       → List<string> generatedQueries
│
├─ Search Pipeline
│   └─ SearchTool.ExecuteAsync(originalQuery, generatedQueries, ...)
│       ↓
│       Phase 1: SearxngClient.SearchAsync(query) × N
│         → ConcurrentBag<SearxngResult>
│         → List<SearxngResult> (unique)
│       ↓
│       Phase 2: ArticleService.FetchArticleAsync(url) × M
│         → ChunkingService.ChunkText(article.TextContent)
│         → ConcurrentBag<Chunk> (content, url, title)
│       ↓
│       Phase 3: EmbeddingService.GetEmbeddingsAsync(chunkContents)
│         → (queryEmbedding, chunkEmbeddings)
│       ↓
│       Phase 4: CosineSimilarity + Rank
│         → List<Chunk> topChunks (with Score, Embedding set)
│       ↓
│       Format: context string with [Source N: Title](Url)
│       → return context string
│
└─ Final Answer
    └─ OpenRouterClient.StreamAsync(prompt with context)
        → stream deltas to Console
```

### Interface Contracts

**SearchTool → Progress**:
```csharp
// Invoked as: onProgress?.Invoke("[Fetching article 1/10: example.com]")
Action<string>? onProgress
```

**StatusReporter ← Progress**:
```csharp
// Handler in OpenQueryApp:
(progress) => {
    if (options.Verbose) reporter.WriteLine(progress);
    else reporter.UpdateStatus(parsedShorterMessage);
}
```

**SearchTool → ArticleService**:
```csharp
Article article = await ArticleService.FetchArticleAsync(url);
```

**SearchTool → EmbeddingService**:
```csharp
(float[] queryEmbedding, float[][] chunkEmbeddings) = await ExecuteParallelEmbeddingsAsync(...);
// Also: embeddingService.GetEmbeddingAsync(text), GetEmbeddingsWithRateLimitAsync(...)
```

**SearchTool → ChunkingService**:
```csharp
List<string> chunks = ChunkingService.ChunkText(article.TextContent);
```

**SearchTool → RateLimiter**:
```csharp
await _rateLimiter.ExecuteAsync(async () => await _client.EmbedAsync(...), ct);
```

---

## Next Steps

- [OpenQueryApp](openquery-app.md) - Main orchestrator details
- [SearchTool](search-tool.md) - Pipeline implementation
- [Services](services.md) - All service classes documented
- [Models](models.md) - Complete data model reference

555
docs/components/search-tool.md
Normal file
@@ -0,0 +1,555 @@

# SearchTool Component

Deep dive into `SearchTool` - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.

## Overview

`SearchTool` is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks by relevance, and returns formatted context for the final AI answer.

## Location

`Tools/SearchTool.cs`

## Class Definition

```csharp
public class SearchTool
{
    private readonly SearxngClient _searxngClient;
    private readonly EmbeddingService _embeddingService;
    private readonly ParallelProcessingOptions _options;

    public static string Name => "search";
    public static string Description => "Search the web for information on a topic";

    public SearchTool(
        SearxngClient searxngClient,
        EmbeddingService embeddingService);

    public Task<string> ExecuteAsync(
        string originalQuery,
        List<string> generatedQueries,
        int maxResults,
        int topChunksLimit,
        Action<string>? onProgress = null,
        bool verbose = true);
}
```

**Dependencies**:
- `SearxngClient` - for web searches
- `EmbeddingService` - for vector generation
- `ParallelProcessingOptions` - concurrency settings (hardcoded new instance)

**Static Properties**:
- `Name` - tool identifier (currently "search")
- `Description` - tool description

## ExecuteAsync Method

**Signature**:
```csharp
public async Task<string> ExecuteAsync(
    string originalQuery,          // User's original question
    List<string> generatedQueries, // Expanded search queries
    int maxResults,                // Results per query
    int topChunksLimit,            // Top N chunks to return
    Action<string>? onProgress,    // Progress callback
    bool verbose)                  // Verbose mode flag
```

**Returns**: `Task<string>` - formatted context with source citations

**Contract**:
- Never returns `null` (returns "No search results found." on zero results)
- The progress callback may be invoked frequently (many phases)
- `verbose` is passed to sub-components for their own logging

## The 4-Phase Pipeline

```
ExecuteAsync()
│
├─ Phase 1: ExecuteParallelSearchesAsync
│    Input:  generatedQueries × maxResults
│    Output: List<SearxngResult> (deduplicated)
│
├─ Phase 2: ExecuteParallelArticleFetchingAsync
│    Input:  List<SearxngResult>
│    Output: List<Chunk> (with content, url, title)
│
├─ Phase 3: ExecuteParallelEmbeddingsAsync
│    Input:  originalQuery + List<Chunk>
│    Output: (queryEmbedding, chunkEmbeddings)
│    (also sets Chunk.Embedding for valid chunks)
│
├─ Phase 4: RankAndSelectTopChunks
│    Input:  List<Chunk> + queryEmbedding + chunkEmbeddings
│    Output: List<Chunk> topChunks (with Score set)
│
└─ Format Context → return string
```

### Phase 1: ExecuteParallelSearchesAsync

**Purpose**: Execute all search queries in parallel, collect and deduplicate the results.

**Implementation**:
```csharp
var allResults = new ConcurrentBag<SearxngResult>();

var searchTasks = generatedQueries.Select(async query =>
{
    onProgress?.Invoke($"[Searching web for '{query}'...]");
    try
    {
        var results = await _searxngClient.SearchAsync(query, maxResults);
        foreach (var result in results)
        {
            allResults.Add(result);
        }
    }
    catch (Exception ex)
    {
        if (verbose)
            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
    }
});

await Task.WhenAll(searchTasks);

var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
return uniqueResults;
```

**Details**:
- `ConcurrentBag<SearxngResult>` collects results thread-safely
- `Task.WhenAll` - unbounded parallelism (one task per query in `generatedQueries`)
- Each task calls `_searxngClient.SearchAsync(query, maxResults)`
- Errors are caught and logged (verbose only); other queries continue
- `DistinctBy(r => r.Url)` removes duplicates

**Return**: `List<SearxngResult>` (unique URLs only)

**Progress**: `[Searching web for '{query}'...]`

**Potential Issues**:
- Could overwhelm a local SearxNG instance if `generatedQueries` is large (100+)
- SearxNG itself may have its own rate limiting

**Future Enhancements**:
- Add a semaphore to limit search concurrency
- Add a timeout per search task
- Cache search results (same query across runs)

### Phase 2: ExecuteParallelArticleFetchingAsync

**Purpose**: Fetch each search result URL, extract the article content, and split it into chunks.

**Implementation**:
```csharp
var chunks = new ConcurrentBag<Chunk>();
var completedFetches = 0;
var totalFetches = searchResults.Count;
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10

var fetchTasks = searchResults.Select(async result =>
{
    await semaphore.WaitAsync();
    try
    {
        var current = Interlocked.Increment(ref completedFetches);
        var uri = new Uri(result.Url);
        var domain = uri.Host;
        onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]");

        try
        {
            var article = await ArticleService.FetchArticleAsync(result.Url);
            if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent))
                return;

            var textChunks = ChunkingService.ChunkText(article.TextContent);
            foreach (var chunkText in textChunks)
            {
                chunks.Add(new Chunk(chunkText, result.Url, article.Title));
            }
        }
        catch (Exception ex)
        {
            if (verbose)
                Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}");
        }
    }
    finally
    {
        semaphore.Release();
    }
});

await Task.WhenAll(fetchTasks);
return chunks.ToList();
```
|
||||
|
||||
**Details**:
|
||||
- `SemaphoreSlim` limits concurrency to `MaxConcurrentArticleFetches` (10)
|
||||
- `Interlocked.Increment` for thread-safe progress counting
|
||||
- Progress: `[Fetching article X/Y: domain]` (extracts host from URL)
|
||||
- `ArticleService.FetchArticleAsync` uses SmartReader
|
||||
- Article must be `IsReadable` and have `TextContent`
|
||||
- `ChunkingService.ChunkText` splits into ~500-char pieces
|
||||
- Each chunk becomes a `Chunk(content, url, article.Title)`
|
||||
- Errors logged (verbose only); failed URLs yield no chunks
|
||||
|
||||
**Return**: `List<Chunk>` (potentially many per article)
|
||||
|
||||
**Chunk Count Estimate**:
|
||||
- 15 articles × average 3000 chars/article = 45,000 chars
|
||||
- With 500-char chunks ≈ 90 chunks
|
||||
- With natural breaks → maybe 70-80 chunks
|
||||
|
||||
**Potential Issues**:
|
||||
- Some sites block SmartReader (JS-heavy, paywalls)
|
||||
- Slow article fetches may cause long tail latency
|
||||
- Large articles create many chunks → memory + embedding cost
|
||||
|
||||
**Future Enhancements**:
|
||||
- Add per-URL timeout
|
||||
- Filter chunks by length threshold (skip tiny chunks)
|
||||
- Deduplicate chunks across articles (same content on different sites)
|
||||
- Cache article fetches by URL
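The per-URL timeout and length filter could be sketched like this inside the fetch task's inner `try` (a hypothetical change, not the current code; the 15-second budget and 50-character threshold are illustrative):

```csharp
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(15)); // per-URL budget
try
{
    var article = await ArticleService.FetchArticleAsync(result.Url).WaitAsync(cts.Token);
    if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent))
        return;

    foreach (var chunkText in ChunkingService.ChunkText(article.TextContent))
    {
        if (chunkText.Length < 50)   // skip tiny chunks (illustrative threshold)
            continue;
        chunks.Add(new Chunk(chunkText, result.Url, article.Title));
    }
}
catch (OperationCanceledException) { /* fetch exceeded budget: skip this URL */ }
```

The existing outer `catch` and `finally` (semaphore release) stay unchanged.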
|
||||
|
||||
### Phase 3: ExecuteParallelEmbeddingsAsync
|
||||
|
||||
**Purpose**: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.
|
||||
|
||||
**Implementation**:
|
||||
```csharp
|
||||
onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");
|
||||
|
||||
// Start query embedding (single) and chunk embeddings (batch) concurrently
|
||||
var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);
|
||||
|
||||
var chunkTexts = chunks.Select(c => c.Content).ToList();
|
||||
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
|
||||
chunkTexts, onProgress);
|
||||
|
||||
await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);
|
||||
|
||||
var queryEmbedding = await queryEmbeddingTask;
|
||||
var chunkEmbeddings = await chunkEmbeddingsTask;
|
||||
|
||||
// Filter out chunks with empty embeddings
|
||||
var validChunks = new List<Chunk>();
|
||||
var validEmbeddings = new List<float[]>();
|
||||
|
||||
for (var i = 0; i < chunks.Count; i++)
|
||||
{
|
||||
if (chunkEmbeddings[i].Length > 0)
|
||||
{
|
||||
validChunks.Add(chunks[i]);
|
||||
validEmbeddings.Add(chunkEmbeddings[i]);
|
||||
}
|
||||
}
|
||||
|
||||
// Update chunks with embeddings
|
||||
for (var i = 0; i < validChunks.Count; i++)
|
||||
{
|
||||
validChunks[i].Embedding = validEmbeddings[i];
|
||||
}
|
||||
|
||||
return (queryEmbedding, validEmbeddings.ToArray());
|
||||
```
|
||||
|
||||
|
||||
|
||||
**Details**:
|
||||
- **Query embedding**: Single request for original question (one embedding)
|
||||
- **Chunk embeddings**: Batch processing of all chunk texts
|
||||
- Both run concurrently via `Task.WhenAll`
|
||||
- `_embeddingService.GetEmbeddingsWithRateLimitAsync` uses:
|
||||
- Batch size: 300 (default)
|
||||
- Max concurrent batches: 4 (default)
|
||||
- Polly retry (3 attempts, exponential backoff)
|
||||
- `RateLimiter` (semaphore) for API concurrency
|
||||
- Failed batches return empty `float[]` (length 0)
|
||||
- Filters out failed chunks (won't be ranked)
|
||||
- `validChunks[i].Embedding = validEmbeddings[i]` attaches embedding to chunk
|
||||
|
||||
**Return**: `(float[] queryEmbedding, float[][] chunkEmbeddings)` where:
|
||||
- `chunkEmbeddings` length = `validChunks.Count` (filtered)
|
||||
- Order matches `validChunks` order (since we filtered parallel arrays)
|
||||
|
||||
**Progress**: Interleaved from embedding service's own progress callbacks (batch X/Y)
|
||||
|
||||
**Potential Issues**:
|
||||
- `GetEmbeddingsWithRateLimitAsync` assigns `results[batchIndex] = ...` from parallel tasks; this is safe because each task writes a distinct array index, so the writes never overlap
- The filtering loop assumes `chunkEmbeddings` has the same count as `chunks`; `GetEmbeddingsWithRateLimitAsync` returns `results.SelectMany(r => r).ToArray()`, which preserves the input count (failed batches contribute an empty array per text), so no out-of-range indexing occurs
|
||||
|
||||
**Memory Consideration**:
|
||||
- `chunkTexts` list holds all chunk strings (may be large, but still in memory)
|
||||
- `chunkEmbeddings` holds all float arrays (600KB for 100 chunks)
|
||||
- Total: modest (~few MB)
|
||||
|
||||
**Future Enhancements**:
|
||||
- Stream embeddings? (No benefit, need all for ranking)
|
||||
- Cache embeddings by content hash (cross-run)
|
||||
- Support different embedding model per query
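Caching embeddings by content hash could look like the following. This is a hypothetical addition (the dictionary, method name, and last-writer-wins policy are all illustrative), layered over the `GetEmbeddingAsync` signature documented below:

```csharp
// Hypothetical content-hash cache in front of the embedding service (not in the current code).
private readonly ConcurrentDictionary<string, float[]> _embeddingCache = new();

public async Task<float[]> GetCachedEmbeddingAsync(string text, CancellationToken ct = default)
{
    // SHA-256 of the text is the cache key, so identical chunks across runs hit the cache.
    var key = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));
    if (_embeddingCache.TryGetValue(key, out var cached))
        return cached;

    var embedding = await _embeddingService.GetEmbeddingAsync(text, ct);
    _embeddingCache[key] = embedding;   // last writer wins; duplicate work is harmless
    return embedding;
}
```

For cross-run persistence the dictionary would need to be backed by disk (e.g., serialized to a file keyed by hash).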
|
||||
|
||||
### Phase 4: RankAndSelectTopChunks
|
||||
|
||||
**Purpose**: Score chunks by semantic relevance to query, sort, and select top N.
|
||||
|
||||
**Implementation**:
|
||||
```csharp
|
||||
var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList();
|
||||
|
||||
foreach (var chunk in chunksWithEmbeddings)
|
||||
{
|
||||
chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
|
||||
}
|
||||
|
||||
var topChunks = chunksWithEmbeddings
|
||||
.OrderByDescending(c => c.Score)
|
||||
.Take(topChunksLimit)
|
||||
.ToList();
|
||||
|
||||
return topChunks;
|
||||
```
|
||||
|
||||
**Details**:
|
||||
- Filters to chunks that have embeddings (successful phase 3)
|
||||
- For each: `Score = CosineSimilarity(queryEmbedding, chunkEmbedding)`
|
||||
- Uses `TensorPrimitives.CosineSimilarity` (SIMD-accelerated)
|
||||
- Returns float typically 0-1 (higher = more relevant)
|
||||
- `OrderByDescending` - highest scores first
|
||||
- `Take(topChunksLimit)` - select top N (from `--chunks` option)
|
||||
- Returns `List<Chunk>` (now with `Score` set)
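The similarity call itself can be exercised in isolation with the `System.Numerics.Tensors` package the code uses (the vectors here are toy examples):

```csharp
using System;
using System.Numerics.Tensors;

float[] query = { 1f, 0f, 0f };
float[] same  = { 2f, 0f, 0f };   // same direction, different magnitude
float[] ortho = { 0f, 1f, 0f };   // orthogonal

Console.WriteLine(TensorPrimitives.CosineSimilarity(query, same));  // 1
Console.WriteLine(TensorPrimitives.CosineSimilarity(query, ortho)); // 0
```

Cosine similarity ignores magnitude, which is why `same` scores 1 despite being twice as long; only direction (semantic content, for embeddings) matters.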
|
||||
|
||||
**Return**: Top N chunks ready for context formatting
|
||||
|
||||
**Complexity**:
|
||||
- O(n) for scoring (where n = valid chunks, typically 50-100)
|
||||
- O(n log n) for sorting (fast for n=100)
|
||||
- Negligible CPU time
|
||||
|
||||
**Edge Cases**:
|
||||
- If `topChunksLimit` > `chunksWithEmbeddings.Count`, returns all (no padding)
|
||||
- If all embeddings failed, returns empty list
|
||||
- Should handle `topChunksLimit == 0` (returns empty)
|
||||
|
||||
### Context Formatting (After Phase 4)
|
||||
|
||||
**Location**: In `ExecuteAsync`, after ranking:
|
||||
|
||||
```csharp
|
||||
var context = string.Join("\n\n", topChunks.Select((c, i) =>
|
||||
$"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));
|
||||
|
||||
return context;
|
||||
```
|
||||
|
||||
**Format**:
|
||||
```
|
||||
[Source 1: Article Title](https://example.com/article)
|
||||
Chunk content text...
|
||||
|
||||
[Source 2: Another Title](https://example.com/another)
|
||||
Chunk content text...
|
||||
|
||||
[Source 3: Third Title](https://example.com/third)
|
||||
Chunk content text...
|
||||
```
|
||||
|
||||
**Features**:
|
||||
- Each source numbered 1, 2, 3... (matches order of topChunks = descending relevance)
|
||||
- Title or "Unknown" if null
|
||||
- Title is markdown link to original URL
|
||||
- Chunk content as plain text (may contain its own formatting)
|
||||
- Double newline between sources
|
||||
|
||||
**Rationale**:
|
||||
- Markdown links allow copy-pasting to browsers
|
||||
- Numbers allow LLM to cite `[Source 1]` in answer
|
||||
- Original title helps user recognize source
|
||||
|
||||
**Potential Issues**:
|
||||
- The LLM might treat "Source 1" as a literal citation it must reproduce
|
||||
- If chunks contain markdown, may conflict (no escaping)
|
||||
- Some titles may have markdown special chars (unlikely but possible)
|
||||
|
||||
**Alternative**: Could use XML-style tags or more robust citation format.
|
||||
|
||||
## Error Handling & Edge Cases
|
||||
|
||||
### Empty Results Handling
|
||||
|
||||
At end of `ExecuteAsync`:
|
||||
```csharp
|
||||
if (searchResults.Count == 0)
|
||||
return "No search results found.";
|
||||
|
||||
if (chunks.Count == 0)
|
||||
return "Found search results but could not extract readable content.";
|
||||
```
|
||||
|
||||
These messages appear in final answer (LLM will respond to these contexts).
|
||||
|
||||
### Partial Failures
|
||||
|
||||
- Some search queries fail → proceed with others
|
||||
- Some articles fail to fetch → continue
|
||||
- Some embedding batches fail → those chunks filtered out
|
||||
- Ranking proceeds with whatever valid embeddings exist
|
||||
|
||||
### Verbose vs Compact Progress
|
||||
|
||||
`verbose` parameter affects what's passed to phases:
|
||||
- **Article fetching**: errors only shown if `verbose`
|
||||
- **Embeddings**: always shows batch progress via `onProgress` (from EmbeddingService)
|
||||
- **Searches**: failure warnings go directly to the Console (not through the callback), and per the Phase 1 `catch` block only when `verbose`
|
||||
|
||||
### Progress Callback Pattern
|
||||
|
||||
`onProgress` is invoked at major milestones:
|
||||
- Searching: `[Searching web for '{query}'...]`
|
||||
- Article fetch: `[Fetching article X/Y: domain]`
|
||||
- Embeddings: `[Generating embeddings: batch X/Y]`
|
||||
- Final: `[Found top X most relevant chunks overall. Generating answer...]`
|
||||
|
||||
Each phase may invoke many times (e.g., embedding batches). `StatusReporter` handles these appropriately.
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Time Estimate per Phase (for typical 3 queries, 5 results each, ~15 articles):
|
||||
|
||||
| Phase | Time | Dominated By |
|
||||
|-------|------|--------------|
|
||||
| Searches | 3-8s | Network latency to SearxNG |
|
||||
| Article Fetching | 5-15s | Network + SmartReader CPU |
|
||||
| Embeddings | 2-4s | OpenRouter API latency (4 concurrent batches) |
|
||||
| Ranking | <0.1s | CPU (O(n log n) sort, n~100) |
|
||||
| **Total Pipeline** | **10-30s** | Articles + Searches |
|
||||
|
||||
### Concurrency Limits Effect
|
||||
|
||||
**Article Fetching** (`MaxConcurrentArticleFetches` = 10):
|
||||
- 15 articles → 2 waves (10 then 5)
|
||||
- If each takes 2s → ~4s total (vs 30s sequential)
|
||||
|
||||
**Embedding Batching** (`MaxConcurrentEmbeddingRequests` = 4, `EmbeddingBatchSize` = 300):
|
||||
- 80 chunks fit in a single batch of 300, so only one embedding request is made (no batch parallelism needed)
- Batch parallelism (up to 4 concurrent requests) only applies when more than 300 chunks produce multiple batches
|
||||
|
||||
### Memory Usage
|
||||
|
||||
- `searchResults` (15 items) → ~30KB
|
||||
- `chunks` (80 items × 500 chars) → ~40KB text + embeddings ~400KB (80 × 1536 × 4)
|
||||
- Total ≈ 500KB excluding temporary HTTP buffers
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Why Use ConcurrentBag for Results/Chunks?
|
||||
|
||||
Thread-safe collection allows parallel tasks to add without locks. `ConcurrentBag` does not guarantee enumeration order, so the order seen by `ToList()` and `DistinctBy` (which keeps the first occurrence it encounters) is nondeterministic. This is acceptable because insertion order doesn't matter here: ranking is semantic. If order mattered, a `ConcurrentQueue` or an explicit sort by source would be needed.
|
||||
|
||||
### Why Not Use Parallel.ForEach for Article Fetching?
|
||||
|
||||
We use `Task.WhenAll` with `Select` + semaphore. `Parallel.ForEachAsync` could also work but requires .NET 6+ and we want to use the same pattern as the other phases. The semaphore gives explicit concurrency control.
|
||||
|
||||
### Why Separate Query Embedding from Chunk Embeddings?
|
||||
|
||||
`GetEmbeddingAsync` is called directly (not batched) because there's only one query. Could be batched with chunks but:
|
||||
- Query is small (single string)
|
||||
- Batch API has overhead (request structure)
|
||||
- Separate call allows independent completion (no need to wait for chunks to start query embedding)
|
||||
|
||||
### Why Two Different Embedding Methods?
|
||||
|
||||
`EmbeddingService` has:
|
||||
- `GetEmbeddingsWithRateLimitAsync` (used in SearchTool)
|
||||
- `GetEmbeddingsAsync` (similar but different implementation)
|
||||
|
||||
This is likely a legacy/refactoring artifact; the two could be consolidated.
|
||||
|
||||
### Why Not Deduplicate URLs Earlier?
|
||||
|
||||
Deduplication happens after search aggregation. Could also deduplicate within each search result (SearxNG might already dedupe across engines). But global dedupe is necessary.
|
||||
|
||||
### Why Not Early Filtering (e.g., by domain, length)?
|
||||
|
||||
Possibly could improve quality:
|
||||
- Filter by domain reputation
|
||||
- Filter articles too short (<200 chars) or too long (>50KB)
|
||||
- Not implemented (keep simple)
|
||||
|
||||
## Testing Considerations
|
||||
|
||||
**Unit Testability**: `SearchTool` is fairly testable with mocks:
|
||||
- Mock `SearxngClient` to return predetermined results
|
||||
- Mock `ArticleService` via `EmbeddingService` (or mock that too)
|
||||
- Verify progress callback invocations
|
||||
- Verify final context format
|
||||
|
||||
**Integration Testing**:
|
||||
- End-to-end with real/mocked external services
|
||||
- Need test SearxNG instance and test OpenRouter key (or mock responses)
|
||||
|
||||
**Performance Testing**:
|
||||
- Benchmark with different concurrency settings
|
||||
- Profile memory for large result sets (1000+ articles)
|
||||
- Measure embedding API latency impact
|
||||
|
||||
## Known Issues
|
||||
|
||||
### Bug in ExecuteParallelEmbeddingsAsync?
|
||||
|
||||
Looking at the actual source code of `ExecuteParallelEmbeddingsAsync` **in the core SearchTool**:
|
||||
|
||||
```csharp
|
||||
var chunkTexts = chunks.Select(c => c.Content).ToList();
|
||||
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
|
||||
chunkTexts, onProgress);
|
||||
```
|
||||
|
||||
This is correct - the source selects `c.Content`, not `c.Embedding` as an earlier draft of this page mistakenly showed.
|
||||
|
||||
### Potential Race Condition in GetEmbeddingsWithRateLimitAsync
|
||||
|
||||
```csharp
|
||||
results[batchIndex] = batchResults;
|
||||
```
|
||||
|
||||
This writes to an array index from multiple parallel tasks. Writes to distinct indices of the same array never overlap, so no synchronization is needed. This is fine.
|
||||
|
||||
### Progress Callback May Overwhelm
|
||||
|
||||
If invoked synchronously from many parallel tasks, could saturate the channel. `Channel.TryWrite` will return false if buffer full; we ignore return value. Could drop messages under heavy load. Acceptable for CLI UI (some messages may be lost but overall progress visible).
|
||||
|
||||
## Related Components
|
||||
|
||||
- **[OpenQueryApp](openquery-app.md)** - calls this
|
||||
- **[SearxngClient](../../services/SearxngClient.md)** - phase 1
|
||||
- **[ArticleService](../../services/ArticleService.md)** - phase 2a
|
||||
- **[ChunkingService](../../services/ChunkingService.md)** - phase 2b
|
||||
- **[EmbeddingService](../../services/EmbeddingService.md)** - phase 3
|
||||
- **[Ranking](../../services/EmbeddingService.md#cosinesimilarity)** - cosine similarity
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- [Services Overview](../services/overview.md) - See supporting services
|
||||
- [CLI Reference](../../api/cli.md) - How users trigger this pipeline
|
||||
- [Performance](../performance.md) - Optimize pipeline settings
|
||||
471
docs/components/services.md
Normal file
@@ -0,0 +1,471 @@
|
||||
# Services Overview
|
||||
|
||||
Comprehensive reference for all service classes in OpenQuery.
|
||||
|
||||
## 📋 Table of Contents
|
||||
|
||||
1. [Service Catalog](#service-catalog)
|
||||
2. [Client Services](#client-services)
|
||||
3. [Processing Services](#processing-services)
|
||||
4. [Infrastructure Services](#infrastructure-services)
|
||||
5. [Service Interactions](#service-interactions)
|
||||
|
||||
## Service Catalog
|
||||
|
||||
OpenQuery's services are organized into three categories:
|
||||
|
||||
| Category | Services | Purpose |
|
||||
|-----------|----------|---------|
|
||||
| **Clients** | `OpenRouterClient`, `SearxngClient` | External API communication |
|
||||
| **Processors** | `EmbeddingService`, `ChunkingService`, `ArticleService` | Data transformation & extraction |
|
||||
| **Infrastructure** | `RateLimiter`, `StatusReporter` | Cross-cutting concerns |
|
||||
|
||||
All services are **stateless** (except for internal configuration) and can be safely reused across multiple operations.
|
||||
|
||||
---
|
||||
|
||||
## Client Services
|
||||
|
||||
### OpenRouterClient
|
||||
|
||||
**Location**: `Services/OpenRouterClient.cs`
|
||||
**Purpose**: HTTP client for OpenRouter AI APIs (chat completions & embeddings)
|
||||
|
||||
#### API Endpoints
|
||||
|
||||
| Method | Endpoint | Purpose |
|
||||
|--------|----------|---------|
|
||||
| POST | `/chat/completions` | Chat completion (streaming or non-streaming) |
|
||||
| POST | `/embeddings` | Embedding generation for text inputs |
|
||||
|
||||
#### Authentication
|
||||
```
|
||||
Authorization: Bearer {apiKey}
|
||||
Accept: application/json
|
||||
```
|
||||
|
||||
#### Public Methods
|
||||
|
||||
##### `StreamAsync(ChatCompletionRequest request, CancellationToken cancellationToken)`
|
||||
- **Returns**: `IAsyncEnumerable<StreamChunk>`
|
||||
- **Behavior**: Sets `request.Stream = true`, posts, reads Server-Sent Events stream
|
||||
- **Use Case**: Final answer streaming, real-time responses
|
||||
- **Stream Format**: SSE lines `data: {json}`; yields `TextDelta` or `ToolCall`
|
||||
|
||||
##### `CompleteAsync(ChatCompletionRequest request)`
|
||||
- **Returns**: `Task<ChatCompletionResponse>`
|
||||
- **Behavior**: Sets `request.Stream = false`, posts, returns full response
|
||||
- **Use Case**: Query generation (non-streaming)
|
||||
|
||||
##### `EmbedAsync(string model, List<string> inputs)`
|
||||
- **Returns**: `Task<float[][]>`
|
||||
- **Behavior**: POST `/embeddings`, returns array of vectors (ordered by input index)
|
||||
- **Use Case**: Batch embedding generation
|
||||
|
||||
##### `HttpClient`
|
||||
- **Property**: Internal `_httpClient` (created per instance)
|
||||
- **Note**: Could use `IHttpClientFactory` for pooling (not needed for CLI)
|
||||
|
||||
#### Error Handling
|
||||
- `EnsureSuccessStatusCode()` throws `HttpRequestException` on 4xx/5xx
|
||||
- No retry logic (handled by `EmbeddingService`)
|
||||
|
||||
#### Configuration
|
||||
```csharp
|
||||
public OpenRouterClient(string apiKey)
|
||||
{
|
||||
_apiKey = apiKey;
|
||||
_httpClient = new HttpClient();
|
||||
_httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
|
||||
_httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
|
||||
}
|
||||
```
|
||||
|
||||
#### Example Usage
|
||||
```csharp
|
||||
var client = new OpenRouterClient("sk-or-...");
|
||||
var request = new ChatCompletionRequest("model", new List<Message> { ... });
|
||||
await foreach (var chunk in client.StreamAsync(request))
|
||||
{
|
||||
Console.Write(chunk.TextDelta);
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### SearxngClient
|
||||
|
||||
**Location**: `Services/SearxngClient.cs`
|
||||
**Purpose**: HTTP client for SearxNG metasearch engine
|
||||
|
||||
#### API Endpoint
|
||||
```
|
||||
GET /search?q={query}&format=json
|
||||
```
|
||||
|
||||
#### Constructor
|
||||
```csharp
|
||||
public SearxngClient(string baseUrl) // e.g., "http://localhost:8002"
|
||||
```
|
||||
- `baseUrl` trimmed of trailing `/`
|
||||
|
||||
#### Public Methods
|
||||
|
||||
##### `SearchAsync(string query, int limit = 10)`
|
||||
- **Returns**: `Task<List<SearxngResult>>`
|
||||
- **Behavior**: GET request, deserialize JSON, take up to `limit` results
|
||||
- **On Failure**: Returns empty `List<SearxngResult>` (no exception)
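A usage sketch mirroring the `OpenRouterClient` example above (the base URL is the documented default; output format is illustrative):

```csharp
var searxng = new SearxngClient("http://localhost:8002");
var results = await searxng.SearchAsync("quantum entanglement", limit: 5);
foreach (var r in results)
    Console.WriteLine($"{r.Title} -> {r.Url}");
// On failure this loop simply prints nothing: SearchAsync returns an empty list instead of throwing.
```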
|
||||
|
||||
#### Error Handling
|
||||
- The code never calls `response.EnsureSuccessStatusCode()`, so HTTP error statuses do not throw
|
||||
- If invalid JSON or missing `Results`, returns empty list
|
||||
- Failures are **tolerated** - individual search queries may fail without aborting whole operation
|
||||
|
||||
#### Example Searxng Response
|
||||
```json
|
||||
{
|
||||
"results": [
|
||||
{
|
||||
"title": "Quantum Entanglement - Wikipedia",
|
||||
"url": "https://en.wikipedia.org/wiki/Quantum_entanglement",
|
||||
"content": "Quantum entanglement is a physical phenomenon..."
|
||||
},
|
||||
...
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Processing Services
|
||||
|
||||
### EmbeddingService
|
||||
|
||||
**Location**: `Services/EmbeddingService.cs`
|
||||
**Purpose**: Generate embeddings with batching, rate limiting, and retry logic
|
||||
|
||||
#### Configuration
|
||||
|
||||
**Embedding Model**: `openai/text-embedding-3-small` (default, configurable via constructor)
|
||||
|
||||
**ParallelProcessingOptions** (hardcoded defaults):
|
||||
```csharp
|
||||
public class ParallelProcessingOptions
|
||||
{
|
||||
public int MaxConcurrentEmbeddingRequests { get; set; } = 4;
|
||||
public int EmbeddingBatchSize { get; set; } = 300;
|
||||
}
|
||||
```
|
||||
|
||||
#### Public Methods
|
||||
|
||||
##### `GetEmbeddingsAsync(List<string> texts, Action<string>? onProgress, CancellationToken)`
|
||||
- **Returns**: `Task<float[][]>`
|
||||
- **Behavior**:
|
||||
- Splits `texts` into batches of `EmbeddingBatchSize`
|
||||
- Parallel executes batches (max `MaxConcurrentEmbeddingRequests` concurrent)
|
||||
- Each batch: rate-limited, retry-wrapped `client.EmbedAsync(model, batch)`
|
||||
- Reassembles in original order
|
||||
- Failed batches → empty `float[]` for each text
|
||||
- **Progress**: Invokes `onProgress` for each batch: `"[Generating embeddings: batch X/Y]"`
|
||||
- **Thread-Safe**: Uses lock for collecting results
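The batch-split / bounded-parallel / reassemble-in-order pattern this describes can be sketched generically. This is a simplification under the document's description, not the verbatim source; `embedBatch` stands in for the rate-limited, retry-wrapped `client.EmbedAsync` call:

```csharp
// Split inputs into fixed-size batches, run up to maxConcurrent batches at once,
// and reassemble results in the original order via the batch index.
static async Task<float[][]> EmbedInBatchesAsync(
    List<string> texts, int batchSize, int maxConcurrent,
    Func<List<string>, Task<float[][]>> embedBatch)
{
    var batches = texts
        .Select((t, i) => (t, i))
        .GroupBy(x => x.i / batchSize)          // GroupBy preserves batch order
        .Select(g => g.Select(x => x.t).ToList())
        .ToList();

    var results = new float[batches.Count][][];
    var gate = new SemaphoreSlim(maxConcurrent);

    await Task.WhenAll(batches.Select(async (batch, batchIndex) =>
    {
        await gate.WaitAsync();
        try
        {
            results[batchIndex] = await embedBatch(batch); // distinct index per task: safe
        }
        catch
        {
            // Failed batch: one empty vector per input text, so counts still line up.
            results[batchIndex] = batch.Select(_ => Array.Empty<float>()).ToArray();
        }
        finally { gate.Release(); }
    }));

    return results.SelectMany(r => r).ToArray(); // flattened, original order
}
```

The final `SelectMany` is why the caller in `SearchTool` can safely index embeddings parallel to its chunk list.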
|
||||
|
||||
##### `GetEmbeddingAsync(string text, CancellationToken)`
|
||||
- **Returns**: `Task<float[]>`
|
||||
- **Behavior**: Single embedding with rate limiting and retry
|
||||
- **Use Case**: Query embedding
|
||||
|
||||
##### `CosineSimilarity(float[] vector1, float[] vector2)` (static)
- **Returns**: `float` between -1 and 1 (typically 0-1 for normalized embeddings)
- **Behavior**: Uses `System.Numerics.Tensors.TensorPrimitives.CosineSimilarity`
|
||||
|
||||
**Implementation**: Single line calling SIMD-accelerated tensor primitive
|
||||
|
||||
---
|
||||
|
||||
### ArticleService
|
||||
|
||||
**Location**: `Services/ArticleService.cs`
|
||||
**Purpose**: Extract clean article content from web URLs
|
||||
|
||||
#### Public Methods
|
||||
|
||||
##### `FetchArticleAsync(string url)`
|
||||
- **Returns**: `Task<Article>`
|
||||
- **Behavior**: Delegates to `SmartReader.ParseArticleAsync(url)`
|
||||
- **Result**: `Article` with `Title`, `TextContent`, `IsReadable`, and metadata
|
||||
|
||||
#### Errors
|
||||
- Propagates exceptions (SmartReader may throw on network failures, malformed HTML)
|
||||
- `SearchTool` catches and logs
|
||||
|
||||
#### SmartReader Notes
|
||||
- Open-source article extraction library (bundled via NuGet)
|
||||
- Uses Readability algorithm (similar to Firefox Reader View)
|
||||
- Removes ads, navigation, boilerplate
|
||||
- `IsReadable` indicates quality (e.g., not a 404 page, not too short)
|
||||
|
||||
---
|
||||
|
||||
### ChunkingService
|
||||
|
||||
**Location**: `Services/ChunkingService.cs`
|
||||
**Purpose**: Split text into 500-character chunks at natural boundaries
|
||||
|
||||
#### Public Methods
|
||||
|
||||
##### `ChunkText(string text)`
|
||||
- **Returns**: `List<string>`
|
||||
- **Algorithm**:
|
||||
- Constant `MAX_CHUNK_SIZE = 500`
|
||||
- While remaining text:
|
||||
- Take up to 500 chars
|
||||
- If not at end, backtrack to last `[' ', '\n', '\r', '.', '!']`
|
||||
- Trim, add if non-empty
|
||||
- Advance start
|
||||
- Returns all chunks
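The algorithm above can be sketched as follows. This is a reconstruction from the description, not the verbatim source; constant and variable names are assumptions:

```csharp
public static class ChunkingService
{
    private const int MaxChunkSize = 500;
    private static readonly char[] BreakChars = { ' ', '\n', '\r', '.', '!' };

    public static List<string> ChunkText(string text)
    {
        var chunks = new List<string>();
        if (string.IsNullOrEmpty(text)) return chunks;

        var start = 0;
        while (start < text.Length)
        {
            var length = Math.Min(MaxChunkSize, text.Length - start);

            // If not at the end of the text, backtrack to the last natural break.
            if (start + length < text.Length)
            {
                var slice = text.Substring(start, length);
                var lastBreak = slice.LastIndexOfAny(BreakChars);
                if (lastBreak > 0) length = lastBreak + 1;
            }

            var chunk = text.Substring(start, length).Trim();
            if (chunk.Length > 0) chunks.Add(chunk);
            start += length;
        }
        return chunks;
    }
}
```

The `lastBreak > 0` guard keeps the loop making progress even when a 500-character window starts with a break character or contains none at all.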
|
||||
|
||||
#### Characteristics
|
||||
- Static class (no instances)
|
||||
- Pure function (no side effects)
|
||||
- Zero dependencies
|
||||
- Handles edge cases (empty text, short text, text without breaks)
|
||||
|
||||
---
|
||||
|
||||
## Infrastructure Services
|
||||
|
||||
### RateLimiter
|
||||
|
||||
**Location**: `Services/RateLimiter.cs`
|
||||
**Purpose**: Limit concurrent operations using semaphore
|
||||
|
||||
#### Constructor
|
||||
```csharp
|
||||
public RateLimiter(int maxConcurrentRequests)
|
||||
```
|
||||
Creates `SemaphoreSlim` with `maxConcurrentRequests`
|
||||
|
||||
#### Public Methods
|
||||
|
||||
##### `ExecuteAsync<T>(Func<Task<T>> action, CancellationToken)`
|
||||
```csharp
|
||||
public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken cancellationToken = default)
|
||||
{
|
||||
await _semaphore.WaitAsync(cancellationToken);
|
||||
try
|
||||
{
|
||||
return await action();
|
||||
}
|
||||
finally
|
||||
{
|
||||
_semaphore.Release();
|
||||
}
|
||||
}
|
||||
```
|
||||
- Waits for semaphore slot
|
||||
- Executes `action` (typically an API call)
|
||||
- Releases semaphore (even if exception)
|
||||
- Returns result from `action`
|
||||
|
||||
##### `ExecuteAsync(Func<Task> action, CancellationToken)`
|
||||
- Non-generic version (for void-returning actions)
|
||||
|
||||
#### Disposal
|
||||
```csharp
|
||||
public async ValueTask DisposeAsync()
|
||||
{
|
||||
_semaphore.Dispose();
|
||||
}
|
||||
```
|
||||
Implements `IAsyncDisposable` for async cleanup
|
||||
|
||||
#### Usage Pattern
|
||||
```csharp
|
||||
var result = await _rateLimiter.ExecuteAsync(async () =>
|
||||
{
|
||||
return await SomeApiCall();
|
||||
}, cancellationToken);
|
||||
```
|
||||
|
||||
#### Where Used
|
||||
- `EmbeddingService`: Limits concurrent embedding batch requests (default 4)
|
||||
|
||||
---
|
||||
|
||||
### StatusReporter
|
||||
|
||||
**Location**: `Services/StatusReporter.cs`
|
||||
**Purpose**: Real-time progress display with spinner (compact) or verbose lines
|
||||
|
||||
#### Constructor
|
||||
```csharp
|
||||
public StatusReporter(bool verbose)
|
||||
```
|
||||
- `verbose = true`: all progress via `WriteLine()` (no spinner)
|
||||
- `verbose = false`: spinner with latest status
|
||||
|
||||
#### Architecture
|
||||
|
||||
**Components**:
|
||||
- `Channel<string> _statusChannel` - producer-consumer queue
|
||||
- `Task _statusProcessor` - background task reading from channel
|
||||
- `CancellationTokenSource _spinnerCts` - spinner task cancellation
|
||||
- `Task _spinnerTask` - spinner animation task
|
||||
- `char[] _spinnerChars` - Braille spinner pattern
|
||||
|
||||
**Spinner Animation**:
|
||||
- Runs at 10 FPS (100ms interval)
|
||||
- Cycles through `['⠋','⠙','⠹','⠸','⠼','⠴','⠦','⠧','⠇','⠏']`
|
||||
- Displays: `⠋ Fetching articles...`
|
||||
- Updates in place using ANSI: `\r\x1b[K` (carriage return + erase line)
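The `\r\x1b[K` technique can be demonstrated standalone, independent of `StatusReporter` (the message and loop count are illustrative):

```csharp
// Overwrite a single console line repeatedly: carriage return moves the
// cursor to column 0, and the ANSI sequence ESC[K erases to end of line.
for (var i = 1; i <= 10; i++)
{
    Console.Write($"\r\x1b[KProgress {i}/10");
    await Task.Delay(100);
}
Console.WriteLine();
```

This requires a terminal that honors ANSI escape codes (most modern terminals, including Windows Terminal).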
|
||||
|
||||
#### Public Methods
|
||||
|
||||
##### `UpdateStatus(string message)`
|
||||
- Fire-and-forget: writes to channel via `TryWrite` (non-blocking)
|
||||
- If channel full, message dropped (acceptable loss for UI)
|
||||
|
||||
##### `WriteLine(string text)`
|
||||
- Stops spinner temporarily
|
||||
- Clears current status line
|
||||
- Writes `text` with newline
|
||||
- In verbose mode: just `Console.WriteLine(text)`
|
||||
|
||||
##### `ClearStatus()`
|
||||
- In compact mode: `Console.Write("\r\x1b[K")` (erase line)
|
||||
- In verbose: no-op
|
||||
- Sets `_currentMessage = null`
|
||||
|
||||
##### `StartSpinner()` / `StopSpinner()`
|
||||
- Manual control (`StartSpinner` is typically invoked from the constructor, `StopSpinner` from `Dispose`)
|
||||
|
||||
##### `Dispose()`
|
||||
- Completes channel writer
|
||||
- Awaits `_statusProcessor` completion
|
||||
- Calls `StopSpinner()`
|
||||
|
||||
#### Background Processing
|
||||
|
||||
**Status Processor**:
|
||||
```csharp
|
||||
private async Task ProcessStatusUpdatesAsync()
|
||||
{
|
||||
await foreach (var message in _statusChannel.Reader.ReadAllAsync())
|
||||
{
|
||||
if (_verbose)
|
||||
{
|
||||
Console.WriteLine(message);
|
||||
continue;
|
||||
}
|
||||
Console.Write("\r\x1b[K"); // Clear line
|
||||
Console.Write($"{_spinnerChars[0]} {message}"); // Static spinner
|
||||
_currentMessage = message;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Spinner Task**:
|
||||
```csharp
|
||||
_spinnerTask = Task.Run(async () =>
{
    var index = 0;
    while (_spinnerCts is { Token.IsCancellationRequested: false })
    {
        if (_currentMessage != null)
        {
            Console.Write("\r\x1b[K");
            var charIndex = index++ % _spinnerChars.Length;
            Console.Write($"{_spinnerChars[charIndex]} {_currentMessage}");
        }
        await Task.Delay(100, _spinnerCts.Token);
    }
});
|
||||
```
|
||||
|
||||
#### Thread Safety
|
||||
- `UpdateStatus` (producer) writes to channel
|
||||
- `ProcessStatusUpdatesAsync` (consumer) reads from channel
|
||||
- `_spinnerTask` runs concurrently
|
||||
- All UI writes happen in consumer/spinner task context (single-threaded UI)
|
||||
|
||||
#### Design Notes
|
||||
- Could be simplified: just use `Console.CursorLeft` for spinner, no channel
|
||||
- Channel allows random `UpdateStatus` calls from any thread without blocking
|
||||
- Braille spinner requires terminal that supports Unicode (most modern terminals do)
|
||||
|
||||
---
|
||||
|
||||
## Service Interactions
|
||||
|
||||
### Dependency Graph
|
||||
|
||||
```
|
||||
OpenQueryApp
|
||||
├── OpenRouterClient ← (used for query gen + final answer)
|
||||
└── SearchTool
|
||||
├── SearxngClient
|
||||
├── ArticleService (uses SmartReader)
|
||||
├── ChunkingService (static)
|
||||
├── EmbeddingService
|
||||
│ └── OpenRouterClient (different instance)
|
||||
│ └── RateLimiter
|
||||
└── ParallelProcessingOptions (config)
|
||||
```

### Service Lifetimes

All services are **transient** (a new instance per query execution):

- `OpenRouterClient` → 1 instance for query generation + answer
- `SearxngClient` → 1 instance for all searches
- `EmbeddingService` → 1 instance with its own `OpenRouterClient` and `RateLimiter`
- `SearchTool` → 1 instance per query (constructed in `Program.cs`)

No singleton or static state (except static utility classes like `ChunkingService`).

### Data Flow Through Services

```
OpenQueryApp
│
├─ OpenRouterClient.CompleteAsync() → query generation
│    Messages → JSON → HTTP request → response → JSON → Messages
│
└─ SearchTool.ExecuteAsync()
   │
   ├─ SearxngClient.SearchAsync() × N
   │    query → URL encode → GET → JSON → SearxngResult[]
   │
   ├─ ArticleService.FetchArticleAsync() × M
   │    URL → HTTP GET → SmartReader → Article
   │
   ├─ ChunkingService.ChunkText() × M
   │    Article.TextContent → List<string> chunks
   │
   ├─ EmbeddingService.GetEmbeddingAsync(query) + GetEmbeddingsAsync(chunks[])
   │    texts → batches → rate-limited HTTP POST → JSON → float[][]
   │
   ├─ CosineSimilarity(queryEmbedding, chunkEmbedding) × M
   │    vectors → dot product → magnitudes → score
   │
   └─ return context string (formatted chunks)
```

---

## Next Steps

- **[OpenQueryApp](../components/openquery-app.md)** - Orchestrates services
- **[SearchTool](../components/search-tool.md)** - Coordinates the pipeline
- **[Models](../components/models.md)** - Data structures passed between services
- **[API Reference](../../api/cli.md)** - CLI that uses these services

---

**Service Design Principles**:
- Single Responsibility: each service does one thing well
- Stateless: no instance state beyond constructor args
- Composable: services depend on abstractions (other services), not implementations
- Testable: dependencies can be mocked for unit testing
356  docs/configuration.md  Normal file
@@ -0,0 +1,356 @@
# Configuration

Complete guide to configuring OpenQuery for your environment.

## 📋 Table of Contents

1. [Configuration Methods](#configuration-methods)
2. [Configuration File](#configuration-file)
3. [Environment Variables](#environment-variables)
4. [Command-Line Options](#command-line-options)
5. [Configuration Priority](#configuration-priority)
6. [Recommended Settings](#recommended-settings)
7. [Advanced Configuration](#advanced-configuration)

## Configuration Methods

OpenQuery can be configured through three methods, merged with a clear priority order:

| Method | Persistence | Use Case |
|--------|-------------|----------|
| Configuration File | Permanent | Default values you use daily |
| Environment Variables | Session/Shell | CI/CD, scripting, temporary overrides |
| Command-Line Options | Per-execution | One-off customizations |

## Configuration File

### Location
OpenQuery follows the [XDG Base Directory](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html) specification:

- **Linux/macOS**: `~/.config/openquery/config`
- **Windows**: `%APPDATA%\openquery\config` (e.g., `C:\Users\<user>\AppData\Roaming\openquery\config`)
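
The Linux/macOS path above is the XDG default when `$XDG_CONFIG_HOME` is unset; the spec says a set `$XDG_CONFIG_HOME` takes precedence. Whether OpenQuery honors that override is not documented here, so treat this resolution as the spec's behavior rather than a guarantee:

```shell
# XDG-style resolution: honor $XDG_CONFIG_HOME, fall back to ~/.config
config_file="${XDG_CONFIG_HOME:-$HOME/.config}/openquery/config"
echo "$config_file"
```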

### Format
Simple `key=value` pairs, one per line:

```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=3
DefaultChunks=3
DefaultResults=5
```

### Schema

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ApiKey` | string | `""` | OpenRouter API authentication key |
| `Model` | string | `qwen/qwen3.5-flash-02-23` | Default LLM model to use |
| `DefaultQueries` | int | 3 | Number of search queries to generate |
| `DefaultChunks` | int | 3 | Number of top context chunks to include |
| `DefaultResults` | int | 5 | Number of search results per query |
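
Because the format is plain `key=value`, it is easy to inspect from the shell. This is only a debugging convenience sketch (OpenQuery itself parses the file in C#):

```shell
# print each key=value pair, skipping blank lines and comments
parse_config() {
  while IFS='=' read -r key value; do
    case "$key" in ''|\#*) continue ;; esac
    printf '%s -> %s\n' "$key" "$value"
  done < "$1"
}

# inspect your real config with: parse_config ~/.config/openquery/config
```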

### Example Configurations

**Minimal** (just the API key):
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

**Optimized for Research**:
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=google/gemini-3-flash-preview
DefaultQueries=5
DefaultChunks=4
DefaultResults=10
```

**Cost-Conscious**:
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=2
DefaultChunks=2
DefaultResults=3
```

## Environment Variables

Environment variables override the configuration file and can be set temporarily or permanently in your shell profile.

### Available Variables

| Variable | Purpose | Required | Example |
|----------|---------|----------|---------|
| `OPENROUTER_API_KEY` | OpenRouter API key | **Yes** (unless in config file) | `export OPENROUTER_API_KEY="sk-or-..."` |
| `OPENROUTER_MODEL` | Override default LLM model | No | `export OPENROUTER_MODEL="deepseek/deepseek-v3.2"` |
| `SEARXNG_URL` | URL of SearxNG instance | No (default: `http://localhost:8002`) | `export SEARXNG_URL="https://searx.example.com"` |

### Setting Environment Variables

#### Temporary (Current Session)
```bash
# Linux/macOS
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"

# Windows PowerShell
$env:OPENROUTER_API_KEY="sk-or-..."
$env:SEARXNG_URL="http://localhost:8002"
```

#### Permanent (Shell Profile)

**bash** (`~/.bashrc` or `~/.bash_profile`):
```bash
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"
```

**zsh** (`~/.zshrc`):
```zsh
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"
```

**fish** (`~/.config/fish/config.fish`):
```fish
set -x OPENROUTER_API_KEY "sk-or-..."
set -x SEARXNG_URL "http://localhost:8002"
```

**Windows** (persistent, user scope):
```powershell
[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User")
[Environment]::SetEnvironmentVariable("SEARXNG_URL", "http://localhost:8002", "User")
```

After editing profile files, restart your terminal or run `source ~/.bashrc` (or the equivalent for your shell).

### Security Note
Never commit your API key to version control. Keep it in environment variables or in the config file, which lives outside the project directory (`~/.config/openquery/`) and therefore cannot be committed accidentally. If you copy the key anywhere inside the repository, make sure that file is listed in `.gitignore`.
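
One cheap safeguard before committing is to grep the tracked files for the `sk-or-` key prefix. This is an illustrative sketch, not something OpenQuery ships:

```shell
# warn if an OpenRouter-style key appears anywhere in tracked files
if git grep -n "sk-or-" -- . >/dev/null 2>&1; then
  echo "possible API key in tracked files" >&2
else
  echo "clean"
fi
```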

## Command-Line Options

Options passed directly to the `openquery` command override both the config file and environment variables for that specific execution.

### Main Command Options

```bash
openquery [OPTIONS] <question>
```

| Option | Aliases | Type | Default Source | Description |
|--------|---------|------|----------------|-------------|
| `--chunks` | `-c` | int | Config `DefaultChunks` | Number of top context chunks |
| `--results` | `-r` | int | Config `DefaultResults` | Search results per query |
| `--queries` | `-q` | int | Config `DefaultQueries` | Number of search queries |
| `--short` | `-s` | bool | false | Request a concise answer |
| `--long` | `-l` | bool | false | Request a detailed answer |
| `--verbose` | `-v` | bool | false | Show detailed progress |

### Configure Command Options

```bash
openquery configure [OPTIONS]
```

| Option | Type | Description |
|--------|------|-------------|
| `--interactive` / `-i` | bool | Launch interactive configuration wizard |
| `--key` | string | Set API key |
| `--model` | string | Set default model |
| `--queries` | int? | Set default queries |
| `--chunks` | int? | Set default chunks |
| `--results` | int? | Set default results |

## Configuration Priority

When OpenQuery needs a value, it checks sources in this order (highest to lowest priority):

1. **Command-line option** (if provided)
2. **Environment variable** (if set)
3. **Configuration file** (if the key exists)
4. **Hard-coded default** (if all of the above are missing)
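
The lookup order amounts to "first non-empty value wins". This shell sketch makes the merge concrete (OpenQuery's actual resolution happens in C#; the variable names here are illustrative):

```shell
# first non-empty value wins: CLI flag, env var, config file entry, built-in default
resolve() {
  for v in "$@"; do
    if [ -n "$v" ]; then echo "$v"; return; fi
  done
}

cli_val=""       # no --queries flag on this run
env_val=""       # no environment override set
config_val="5"   # DefaultQueries from the config file
resolve "$cli_val" "$env_val" "$config_val" "3"   # prints 5 (the config value)
```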

### Examples

**Example 1**: Environment overrides config
```bash
# config file: DefaultQueries=5
export OPENROUTER_MODEL="deepseek/deepseek-v3.2"
openquery --queries 2 "question"
# Uses: queries=2 (CLI), model=deepseek (env), chunks=3 (default)
```

**Example 2**: CLI overrides everything
```bash
export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23"
openquery --model "google/gemini-3-flash-preview" --chunks 5 "question"
# Uses: model=google (CLI), chunks=5 (CLI), queries=3 (default)
```

**Example 3**: All sources combined
```bash
# config: DefaultChunks=4
# env: OPENROUTER_MODEL="moonshotai/kimi-k2.5", SEARXNG_URL="http://custom:8002"
openquery -q 6 -s "question"
# Uses: queries=6 (CLI), chunks=4 (config), results=5 (config),
#       model=kimi-k2.5 (env), searxng=custom (env), short=true (CLI)
```

## Recommended Settings

### For Quick Questions (Facts, Definitions)
```bash
openquery -q 2 -r 3 -c 2 "What is the capital of France?"
```
- Few queries (2) for straightforward facts
- Few results (3) to minimize processing
- Few chunks (2) for a focused answer

### For Research (Complex Topics)
```bash
openquery -q 5 -r 10 -c 4 -l "Explain the causes of the French Revolution"
```
- More queries (5) for diverse perspectives
- More results (10) for comprehensive coverage
- More chunks (4) for rich context
- Long format for depth

### For Exploration (Broad Topics)
```bash
openquery -q 8 -r 15 -c 5 "What are the latest developments in AI?"
```
- Many queries (8) to explore different angles
- Many results (15) for breadth
- More chunks (5) for extensive context

### Cost Optimization
```bash
openquery configure --model "qwen/qwen3.5-flash-02-23"
# Keep defaults: -q 3 -r 5 -c 3
```
- Qwen Flash is very cost-effective
- The default parameters provide a good balance

### Performance Optimization
```bash
# Adjust ParallelProcessingOptions in SearchTool.cs if needed
# Default: MaxConcurrentArticleFetches=10, MaxConcurrentEmbeddingRequests=4
```
- Reduce these values if you see rate limits or memory pressure
- Increase them if you have a fast network/API and want more speed

## Advanced Configuration

### Changing Concurrency Limits

Concurrency limits are currently hardcoded in `SearchTool.cs` but can be adjusted:

```csharp
public class ParallelProcessingOptions
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;   // ← Change this
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4; // ← Change this
    public int EmbeddingBatchSize { get; set; } = 300;           // ← Change this
}
```

To make these configurable, you could:
1. Add fields to `AppConfig`
2. Read them from the config file
3. Pass them through to the `SearchTool` constructor

### Custom Embedding Model

The embedding model is hardcoded to `openai/text-embedding-3-small`. To change it, edit the `EmbeddingService` constructor:

```csharp
public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model")
```

Or make it configurable via CLI/config (a future enhancement).

### Changing Chunk Size

The chunk size (500 characters) is defined in `ChunkingService.cs`:

```csharp
private const int MAX_CHUNK_SIZE = 500;
```

Modify this constant to change how articles are split. Larger chunks:
- ✅ More context per chunk
- ❌ Fewer chunks for the same article
- ❌ Higher token usage in the final answer

Smaller chunks:
- ✅ More granular matching
- ❌ May lose context across chunk boundaries
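
To get a feel for the trade-off, you can approximate chunk counts with `fold`, which splits on word boundaries, roughly like a fixed-size chunker. OpenQuery's real `ChunkingService` logic may differ in details:

```shell
# rough chunk counts for the same ~2000-character text at different sizes
text=$(printf 'lorem ipsum %.0s' $(seq 1 170))   # about 2040 characters
for size in 250 500 1000; do
  count=$(printf '%s' "$text" | fold -s -w "$size" | wc -l)
  echo "chunk size $size -> about $count chunks"
done
```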

### Using a Custom SearxNG Instance

Some SearxNG deployments may require HTTPS, authentication, or custom paths:

```bash
# With authentication (if supported)
export SEARXNG_URL="https://user:pass@searx.example.com:8080"

# With a custom path
export SEARXNG_URL="https://searx.example.com/custom-path"
```

Note: most SearxNG instances don't require auth, as they're designed for privacy.

### OpenRouter Settings

OpenRouter supports additional parameters (not yet exposed in OpenQuery):

- `temperature` - Randomness (0-2, default ~1)
- `max_tokens` - Response length limit
- `top_p` - Nucleus sampling
- `frequency_penalty` / `presence_penalty`

These could be added to `ChatCompletionRequest` in future versions.

## Managing Multiple Configurations

OpenQuery currently reads only `~/.config/openquery/config`. Per-project configurations would require code changes (for example, reading an `OPENQUERY_CONFIG` environment variable). If such support existed, usage might look like:

```bash
# Create a project-specific config
cp ~/.config/openquery/config ~/myproject/openquery.config

# Use it temporarily (hypothetical; not yet supported)
OPENQUERY_CONFIG=~/myproject/openquery.config openquery "question"
```

## Configuration Validation

OpenQuery doesn't strictly validate config values. Invalid settings may cause runtime errors:

- `DefaultQueries <= 0` → May cause exceptions or zero queries
- `DefaultChunks <= 0` → May return no context
- `DefaultResults <= 0` → No search results
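
A small pre-flight check in the shell can catch these before a query runs. This is illustrative only; OpenQuery does not ship such a script:

```shell
# flag non-positive integer settings before running a query
check_positive() {
  case "$2" in
    ''|*[!0-9]*) echo "$1: not a positive integer ($2)"; return 1 ;;
    0)           echo "$1: must be greater than zero"; return 1 ;;
  esac
}

check_positive DefaultQueries 3 && echo "DefaultQueries ok"
```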

Validate manually:
```bash
# Check your config contents
cat ~/.config/openquery/config

# Test with verbose mode
openquery -v "test"
```

---

## Next Steps

- [Usage Guide](usage.md) - Learn how to use the CLI
- [Architecture](architecture.md) - Understand the system design
- [Troubleshooting](troubleshooting.md) - Fix common issues
173  docs/index.md  Normal file
@@ -0,0 +1,173 @@
# OpenQuery Documentation

Welcome to the comprehensive documentation for OpenQuery - the AI-powered search and answer system.

## 📚 Documentation Overview

### Getting Started
- **[Installation Guide](installation.md)** - Build, install, and setup instructions
- **[Configuration](configuration.md)** - Configure API keys, models, and settings
- **[Usage Guide](usage.md)** - Complete CLI reference with examples

### Deep Dive
- **[Architecture](architecture.md)** - System design, patterns, and data flow
- **[Components](components/overview.md)** - Detailed component documentation
  - [OpenQueryApp](components/openquery-app.md)
  - [SearchTool](components/search-tool.md)
  - [Services](components/services.md)
  - [Models](components/models.md)
- **[API Reference](api/cli.md)** - Complete command-line interface reference
  - [Environment Variables](api/environment-variables.md)
  - [Programmatic APIs](api/programmatic.md)

### Support
- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- **[Performance](performance.md)** - Performance characteristics and optimization

## 🎯 Quick Links

### For Users
- [Install OpenQuery](installation.md) in 5 minutes
- [Configure your API key](configuration.md)
- [Learn the basics](usage.md)
- [Solve common problems](troubleshooting.md)

### For Developers
- [Understand the architecture](architecture.md)
- [Explore components](components/overview.md)
- [Use the APIs programmatically](api/programmatic.md)
- [Performance tuning](performance.md)

## 📋 Table of Contents

1. [Project Overview](#project-overview)
2. [Key Concepts](#key-concepts)
3. [Technology Stack](#technology-stack)
4. [System Workflow](#system-workflow)

## Project Overview

**OpenQuery** is a sophisticated CLI tool that combines the power of large language models with web search to provide accurate, well-sourced answers to complex questions.

### What It Does
- Takes a natural language question as input
- Generates multiple diverse search queries
- Searches the web via SearxNG
- Extracts and processes article content
- Uses semantic similarity to rank relevance
- Synthesizes a comprehensive AI-generated answer with citations

### Why Use OpenQuery?
- **Accuracy**: Multiple search queries reduce bias and increase coverage
- **Transparency**: Sources are cited in the final answer
- **Speed**: Parallel processing minimizes latency
- **Control**: Fine-tune every aspect, from query count to chunk selection
- **Privacy**: SearxNG provides anonymous, aggregated search

## Key Concepts

### Search Queries
Instead of using your exact question, OpenQuery generates multiple optimized search queries (default: 3). For example, "What is quantum entanglement?" might become:
- "quantum entanglement definition"
- "how quantum entanglement works"
- "quantum entanglement experiments"

### Content Chunks
Long articles are split into ~500-character chunks. Each chunk is:
- Stored with its source URL and title
- Converted to a vector embedding (1536 dimensions)
- Scored against your query embedding

### Semantic Ranking
Using cosine similarity between embeddings, OpenQuery ranks chunks by relevance and selects the top N (default: 3) for the final context.
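
Cosine similarity is just the dot product of two vectors divided by the product of their magnitudes. A toy version over small vectors shows the idea (OpenQuery itself uses `System.Numerics.Tensors` on 1536-dimensional embeddings):

```shell
# cosine similarity of two space-separated vectors, via awk
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) { dot += x[i]*y[i]; ma += x[i]^2; mb += y[i]^2 }
    printf "%.4f\n", dot / (sqrt(ma) * sqrt(mb))
  }'
}

cosine "1 0 1" "1 1 0"   # prints 0.5000 (identical directions would give 1.0000)
```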

### Streaming Answer
The LLM receives your question plus the top chunks as context and streams the answer in real time, citing sources like `[Source 1]`.

## Technology Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| Runtime | .NET 10.0 AOT | Native performance, minimal footprint |
| LLM | OpenRouter API | Chat completions and embeddings |
| Search | SearxNG | Metasearch engine |
| Content Extraction | SmartReader | Article text extraction |
| Vector Math | System.Numerics.Tensors | High-performance cosine similarity |
| Resilience | Polly | Retry and circuit breaker policies |
| CLI | System.CommandLine | Command parsing and help |
| JSON | System.Text.Json (source-gen) | Fast serialization |

## System Workflow

```
OpenQuery Workflow
──────────────────

1. User Query: "What is quantum entanglement?"

2. Query Generation (Optional)
   LLM generates: ["quantum entanglement physics",
                   "quantum entanglement definition",
                   "how does quantum entanglement work"]

3. Parallel Searches
   ┌─────────┐    ┌─────────┐    ┌────────────────────┐
   │ Query 1 │ →  │ SearxNG │ →  │ Results            │
   └─────────┘    └─────────┘    └────────────────────┘
   ┌─────────┐    ┌─────────┐    ┌────────────────────┐
   │ Query 2 │ →  │ SearxNG │ →  │ Results            │
   └─────────┘    └─────────┘    └────────────────────┘
   ┌─────────┐    ┌─────────┐    ┌────────────────────┐
   │ Query 3 │ →  │ SearxNG │ →  │ Results (combined) │
   └─────────┘    └─────────┘    └────────────────────┘

4. Parallel Article Fetching
   ┌─────────┐    ┌─────────┐    ┌────────┐
   │ URL 1   │ →  │ Article │ →  │ Chunks │
   └─────────┘    └─────────┘    └────────┘
   ┌─────────┐    ┌─────────┐    ┌────────┐
   │ URL 2   │ →  │ Article │ →  │ Chunks │
   └─────────┘    └─────────┘    └────────┘
   ... (concurrent, max 10 at a time)

5. Parallel Embeddings
   ┌─────────┐    ┌──────────┐    ┌─────────┐
   │ Chunks  │ →  │ Embed-   │ →  │ Vectors │
   │ Batch 1 │    │ ding API │    │         │
   └─────────┘    └──────────┘    └─────────┘
   ┌─────────┐    ┌──────────┐    ┌─────────┐
   │ Chunks  │ →  │ Embed-   │ →  │ Vectors │
   │ Batch 2 │    │ ding API │    │         │
   └─────────┘    └──────────┘    └─────────┘
   (batches of 300, up to 4 concurrent)

6. Semantic Ranking
   Query Embedding + Chunk Embeddings → Cosine Similarity →
   Score → Sort Descending → Top 3 Chunks

7. Final Answer Generation
   ┌────────────────────────────────────────────┐
   │ System: "Answer based on this context:"    │
   │ Context: [Top 3 chunks with sources]       │
   │ Question: "What is quantum entanglement?"  │
   └────────────────────────────────────────────┘
                        ↓
   LLM Streams Answer
   "Quantum entanglement is..."
   with citations like [Source 1]
```

## Next Steps

1. **[Install OpenQuery](installation.md)**
2. **[Configure it](configuration.md)**
3. **[Start asking questions](usage.md)**

For detailed technical information, continue to [the architecture guide](architecture.md).

---

**Need help?** Check the [Troubleshooting](troubleshooting.md) guide.
358  docs/installation.md  Normal file
@@ -0,0 +1,358 @@
# Installation Guide

This guide covers how to build, install, and configure OpenQuery on your system.

## 📋 Table of Contents

1. [Prerequisites](#prerequisites)
2. [Quick Install](#quick-install)
3. [Manual Build](#manual-build)
4. [Platform-Specific Instructions](#platform-specific-instructions)
5. [Post-Installation](#post-installation)
6. [Verification](#verification)
7. [Uninstallation](#uninstallation)

## Prerequisites

### Required Software
- **.NET SDK 10.0** or later
  - Download from [dotnet.microsoft.com](https://dotnet.microsoft.com/download)
  - Verify: `dotnet --version` should show 10.x or higher

### External Services (Setup Required)
1. **SearxNG Instance** - Metasearch engine
   - **Docker (Recommended)**:
     ```bash
     docker run -d \
       --name searxng \
       -p 8002:8080 \
       -v searxng-data:/etc/searxng \
       searxng/searxng:latest
     ```
   - Access at `http://localhost:8002`
   - **Alternative**: Use a public SearxNG instance from [searx.space](https://searx.space)

2. **OpenRouter API Key** - AI model provider
   - Sign up at [openrouter.ai](https://openrouter.ai)
   - Get your API key from the dashboard
   - Free tier available with rate limits

## Quick Install

The easiest way to get OpenQuery up and running:

```bash
# 1. Clone the repository
git clone <your-repo-url>
cd OpenQuery

# 2. Make the install script executable and run it
chmod +x install.sh
./install.sh

# 3. Configure your API key
openquery configure -i

# 4. Test it
openquery "Hello world"
```

**What the install script does**:
- Builds the project in Release mode
- Publishes a self-contained AOT binary
- Copies it to `~/.local/bin/OpenQuery` (Linux/macOS)
- Creates the config directory `~/.config/openquery/`

## Manual Build

If you prefer to build manually or need a specific platform:

### Step 1: Restore Dependencies
```bash
dotnet restore
```

### Step 2: Build
```bash
dotnet build -c Release
```

### Step 3: Publish

#### For Current Platform (Self-Contained AOT)
```bash
dotnet publish -c Release \
  --self-contained true \
  /p:PublishAot=true
```

The binary will be at:
```
bin/Release/net10.0/<rid>/publish/OpenQuery
```

#### For Specific Platform (Cross-Compilation)

**Runtime Identifiers (RIDs)**:
| Platform | RID |
|----------|-----|
| Linux x64 | `linux-x64` |
| Linux ARM64 | `linux-arm64` |
| macOS x64 | `osx-x64` |
| macOS ARM64 | `osx-arm64` |
| Windows x64 | `win-x64` |
| Windows ARM64 | `win-arm64` |

Example for Linux x64:
```bash
dotnet publish -c Release \
  -r linux-x64 \
  --self-contained true \
  /p:PublishAot=true
```

### Step 4: Deploy

Copy the binary to a directory in your PATH:

```bash
# Linux/macOS
sudo cp bin/Release/net10.0/linux-x64/publish/OpenQuery /usr/local/bin/
chmod +x /usr/local/bin/OpenQuery

# Windows (PowerShell as Admin)
Copy-Item bin\Release\net10.0\win-x64\publish\OpenQuery.exe "C:\Program Files\OpenQuery\"
```

Or use a local bin directory:
```bash
mkdir -p ~/.local/bin
cp bin/Release/net10.0/linux-x64/publish/OpenQuery ~/.local/bin/
# Add to PATH if not already: export PATH="$HOME/.local/bin:$PATH"
```
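
The PATH hint in the comment above can be made idempotent, so it is safe to keep in a shell profile without stacking duplicate entries on every session. This is a generic shell idiom, not something the install script does:

```shell
# prepend ~/.local/bin to PATH only if it is not already present
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;
  *) PATH="$HOME/.local/bin:$PATH" ;;
esac
export PATH
```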

## Platform-Specific Instructions

### Linux

#### Ubuntu/Debian
```bash
# Install .NET SDK 10.0
wget https://dot.net/v10/dotnet-install.sh -O dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 10.0

# Add to PATH
export PATH="$HOME/.dotnet:$PATH"

# Build and install (as shown above)
```

#### With Systemd Service (Optional)
If you run SearxNG locally, you might want it as a service:
```bash
# Create a systemd service for SearxNG (if using Docker)
sudo nano /etc/systemd/system/searxng.service
```

```ini
[Unit]
Description=SearxNG Search Engine
Requires=docker.service
After=docker.service

[Service]
Restart=always
ExecStart=/usr/bin/docker start -a searxng
ExecStop=/usr/bin/docker stop -t 2 searxng

[Install]
WantedBy=multi-user.target
```

```bash
sudo systemctl enable searxng
sudo systemctl start searxng
```

### macOS

#### Homebrew Install (if .NET available)
```bash
brew install dotnet-sdk
```

#### M1/M2 (ARM64) Notes
- Use RID: `osx-arm64`
- Ensure you have the ARM64 version of the .NET SDK

### Windows

#### Using Winget (Windows 10/11)
```powershell
winget install Microsoft.DotNet.SDK.10
```

#### Manual Install
1. Download the installer from [dotnet.microsoft.com](https://dotnet.microsoft.com/download)
2. Run the installer
3. Verify in PowerShell:
   ```powershell
   dotnet --version
   ```

#### Building
```powershell
dotnet publish -c Release -r win-x64 --self-contained true /p:PublishAot=true
```

## Post-Installation

### 1. Verify SearxNG is Running
```bash
curl "http://localhost:8002/search?q=test&format=json"
```

Expected: a JSON response with a results array.

### 2. Configure OpenQuery
```bash
# Interactive setup
openquery configure -i

# Or via environment variables
setx OPENROUTER_API_KEY "sk-or-..."      # Windows (persistent)
export OPENROUTER_API_KEY="sk-or-..."    # Linux/macOS
```

### 3. Optional: Set Defaults
```bash
openquery configure --queries 5 --chunks 4 --results 10
```

## Verification

### Test Installation
```bash
# Check the binary exists and is executable
which openquery   # Linux/macOS
where openquery   # Windows

# If installed as OpenQuery (capital O)
which OpenQuery
```

### Test Configuration
```bash
# Should show your config or defaults
cat ~/.config/openquery/config
```

### Test the System
```bash
# Simple query (should work with any API key)
openquery "What is 2+2?"

# More complex query
openquery -v "What are the benefits of exercise?"
```

Expected output:
- A spinner animation with status updates
- A streaming answer from the AI
- Citations like `[Source 1](url)` in the answer

## Uninstallation

### Using the Uninstall Script
```bash
chmod +x uninstall.sh
./uninstall.sh
```

The script will:
- Remove the binary from `~/.local/bin/`
- Ask if you want to delete the config directory

### Manual Removal
```bash
# Remove the binary
rm ~/.local/bin/OpenQuery

# Remove the config (optional)
rm -r ~/.config/openquery
```

### Remove SearxNG (if no longer needed)
```bash
docker rm -f searxng
docker volume rm searxng-data
```

## Advanced Build Options

### Reduce Binary Size
Edit `OpenQuery.csproj`:
```xml
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <InvariantGlobalization>true</InvariantGlobalization> <!-- Already set -->
  <StripSymbols>true</StripSymbols>
</PropertyGroup>
```

### Debug Build
```bash
dotnet build -c Debug
dotnet run -- "your question"
```

### With Symbols (for debugging)
```bash
dotnet publish -c Release -r linux-x64 \
  --self-contained true \
  /p:PublishAot=true \
  /p:DebugType=portable
```

## Troubleshooting Installation

### "dotnet: command not found"
- Add `.dotnet` to PATH: `export PATH="$HOME/.dotnet:$PATH"`
- Restart the terminal or source your shell config

### "The SDK 'Microsoft.NET.Sdk' was not found"
- The .NET SDK was not installed correctly
- Re-run the installer or use `dotnet-install.sh`

### AOT Build Fails
- Some platforms may not support AOT yet
- Remove `/p:PublishAot=true` to use JIT
- Check [.NET AOT support](https://docs.microsoft.com/dotnet/core/deploying/native-aot/)

### Docker Pull Fails (SearxNG)
```bash
# Pull the image separately first
docker pull searxng/searxng:latest
# Then run the container
docker run -d --name searxng -p 8002:8080 searxng/searxng
```

### Port 8002 Already in Use
Change the port in the docker command:
```bash
docker run -d --name searxng -p 8080:8080 searxng/searxng
# Then set SEARXNG_URL=http://localhost:8080
```
|
||||
|
||||
## Next Steps

After successful installation:

1. [Configure OpenQuery](configuration.md)
2. [Learn how to use it](usage.md)
3. Read the [Architecture](architecture.md) to understand how it works

---

**Need help?** See [Troubleshooting](troubleshooting.md) or open an issue.

522
docs/performance.md
Normal file
@@ -0,0 +1,522 @@
# Performance

Performance characteristics, optimization strategies, and scalability considerations for OpenQuery.

## 📋 Table of Contents

1. [Performance Overview](#performance-overview)
2. [Latency Breakdown](#latency-breakdown)
3. [Throughput](#throughput)
4. [Memory Usage](#memory-usage)
5. [Benchmarking](#benchmarking)
6. [Optimization Strategies](#optimization-strategies)
7. [Scalability Limits](#scalability-limits)

## Performance Overview

OpenQuery is designed for **interactive use** (typically 15-50 seconds end-to-end), maximizing parallelization to minimize wait time.

### Key Metrics

| Metric | Typical | Best Case | Worst Case |
|--------|---------|-----------|------------|
| **End-to-End Latency** | 15-50s | 10s | 120s+ |
| **API Cost** | $0.01-0.05 | $0.005 | $0.20+ |
| **Memory Footprint** | 100-300MB | 50MB | 1GB+ |
| **Network I/O** | 5-20MB | 1MB | 100MB+ |

**Note**: Wide variance is expected due to network latency, content size, and LLM speed.

---

## Latency Breakdown

### Default Configuration

`-q 3 -r 5 -c 3` (3 queries, 5 results each, 3 final chunks)

| Stage | Operation | Parallelism | Time (p50) | Time (p95) | Dominant Factor |
|-------|-----------|-------------|------------|------------|-----------------|
| 1 | Query Generation | 1 | 2-5s | 10s | LLM inference speed |
| 2a | Searches (3 queries × 5 results) | 3 concurrent | 3-8s | 15s | SearxNG latency |
| 2b | Article Fetching (≈15 URLs) | 10 concurrent | 5-15s | 30s | Each site's response time |
| 2c | Chunking | 10 concurrent | <1s | 2s | CPU (HTML parsing) |
| 3a | Query Embedding | 1 | 0.5-1s | 3s | Embedding API latency |
| 3b | Chunk Embeddings (≈50 chunks) | 4 concurrent | 1-3s | 10s | Batch API latency |
| 4 | Ranking | 1 | <0.1s | 0.5s | CPU (vector math) |
| 5 | Final Answer Streaming | 1 | 5-20s | 40s | LLM generation speed |
| **Total** | | | **16-50s** | **~60s** | |

### Phase Details

#### Phase 1: Query Generation (2-5s)
- Single non-streaming LLM call
- Input: system prompt + user question (~200 tokens)
- Output: JSON array of 3-5 short strings (~50 tokens)
- Fast because both context and output are small

#### Phase 2a: Searches (3-8s)
- 3 parallel `SearxngClient.SearchAsync` calls
- Each: query → SearxNG → aggregator engines → scraped results
- Latency is highly variable, depending on:
  - SearxNG instance performance
  - Network distance to SearxNG
  - SearxNG's upstream search engines

#### Phase 2b: Article Fetching (5-15s)
- ≈15 URLs to fetch (3 queries × 5 results, minus duplicates)
- Up to 10 concurrent fetches (semaphore)
- Each: TCP connect + TLS handshake + HTTP GET + SmartReader parse
- Latency:
  - Fast sites (CDN, cached): 200-500ms
  - Normal sites: 1-3s
  - Slow/unresponsive sites: timeout after ~30s

Why 5-15s for 15 URLs with 10 concurrent?
- First wave (10 URLs): bounded by the slowest among them, ≈3s
- Second wave (5 URLs): another ≈3s → ~6s total
- Many URLs are faster (~500ms), so waves often finish in 2-3s
- However, a few sites take 5-10s and dominate the total

**Tail latency**: The slowest few URLs can dominate total time. The pipeline cannot proceed until all fetch attempts complete (or fail).

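The wave behavior above can be simulated with a semaphore; a minimal Python sketch (the project itself is C#, and the URLs and latencies here are made up and scaled down for illustration):

```python
import asyncio
import random

async def fetch(url: str, sem: asyncio.Semaphore) -> float:
    """Stand-in for one article fetch; the sleep represents network latency."""
    async with sem:  # at most 10 in flight, like the fetch semaphore
        latency = random.choice([0.01, 0.02, 0.06])  # fast / normal / slow site
        await asyncio.sleep(latency)
        return latency

async def run() -> tuple[float, float]:
    sem = asyncio.Semaphore(10)
    urls = [f"https://example.com/{i}" for i in range(15)]  # hypothetical URLs
    loop = asyncio.get_running_loop()
    start = loop.time()
    latencies = await asyncio.gather(*(fetch(u, sem) for u in urls))
    return loop.time() - start, sum(latencies)

elapsed, serial = asyncio.run(run())
print(f"parallel {elapsed:.2f}s vs serial {serial:.2f}s")
```

The elapsed time tracks the slowest URL in each wave rather than the sum of all latencies, which is why a handful of slow sites dominate the total.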
#### Phase 2c: Chunking (<1s)
- CPU-bound HTML cleaning and splitting
- SmartReader's C# HTML parser is fast
- Typically 100-300 chunks total
- <1s on a modern CPU

#### Phase 3: Embeddings (1.5-4s)
- **Query embedding**: 1 call, ~200 tokens, ≈0.5-1s
- **Chunk embeddings**: ≈50 chunks → 1 batch of 50 (the batch size of 300 is not reached here)
  - A batch of 50 is still a single API call, ~25K chars (50 × ~500-char chunks) ≈ 6K tokens
  - With `text-embedding-3-small` at $0.00002 per 1K tokens → ≈$0.0001 per batch
  - Latency: 1-3s for the embedding API

With more chunks (say 500), this becomes 2 batches → maybe 2-4s.

Parallel batches (4 concurrent) help only when there are many batches (1500+ chunks).

#### Phase 4: Ranking (<0.1s)
- Cosine similarity for 50-100 chunks
- Each: dot product + normalization, O(d) with d = 1536
- 100 × 1536 ≈ 150K FLOPs → negligible on a modern CPU
- SIMD acceleration from `TensorPrimitives`

#### Phase 5: Final Answer (5-20s)
- Streaming chat completion
- Input: system prompt + context (~400 tokens for 3 × 500-char chunks) + question
- Output varies wildly (typically 200-2000 tokens)
- Longer context slightly increases latency
- Model choice is the major factor:
  - Qwen Flash: fast (5-10s for 1000 output tokens)
  - Gemini Flash: moderate (10-15s)
  - Llama-class: slower (20-40s)

---

## Throughput

### Sequential Execution

Running queries one after another (the default CLI behavior):
- Latency per query: 16-50s
- Throughput: 1 query / 20s ≈ 180 queries/hour (theoretically)

But API rate limits kick in well before that:
- OpenRouter free tier: limited RPM/TPM
- Even paid plans: soft limits

### Concurrent Execution (Multiple OpenQuery Instances)

You can run multiple OpenQuery processes in parallel (different terminals), but they share:
- The same API key (OpenRouter rate limits are per API key, not per process)
- The same SearxNG instance (which they could saturate)

**Practical limit**: 3-5 concurrent processes before hitting diminishing returns or rate limits.

### Throughput Optimization

To maximize queries per hour:
1. Use the fastest model (Qwen Flash)
2. Reduce `--chunks` to 1-2
3. Reduce `--queries` to 1
4. Use a local/fast SearxNG
5. Cache embedding results (not implemented)
6. Batch multiple questions in one process (not implemented; would require a redesign)

**Achievable**: Maybe 500-1000 queries/hour on a paid OpenRouter plan with aggressive settings.

---

## Memory Usage

### Baseline

.NET 10 AOT app with dependencies:
- **Code**: ~30MB (AOT-compiled native code)
- **Runtime**: ~20MB (.NET runtime overhead)
- **Base Memory**: ~50MB

### Per-Query Memory

| Component | Memory | Lifetime |
|-----------|--------|----------|
| Search results (15 items) | ~30KB | Pipeline |
| Articles (raw HTML) | ~5MB (transient) | Freed after parse |
| Articles (extracted text) | ~500KB | Until pipeline complete |
| Chunks (≈100 items, text) | ~50KB | Until pipeline complete |
| Embeddings (100 × 1536 floats) | ~600KB | Until pipeline complete |
| HTTP buffers | ~1MB per concurrent request | Short-lived |
| **Total per query** | **~2-5MB** (excluding base) | Released after completion |

**Peak**: When all articles are fetched but not yet embedded: text ~500KB + chunks ~650KB ≈ 1.2MB, plus overhead ≈ 2-3MB.

**If processing many queries in parallel** (unlikely for the CLI), memory would scale linearly.

### Memory Leak Risks

- `HttpClient` instances: created per `OpenRouterClient` and `SearxngClient` and never disposed; the short-lived process exits anyway
- `StatusReporter` background task: disposed via `using`
- `RateLimiter` semaphore: would need an `IAsyncDisposable` `using` wrapper (not done currently, but the process is short-lived)

No major leaks observed.

### Memory Optimization Opportunities

1. **Reuse HttpClient** with `IHttpClientFactory` (not needed for a CLI)
2. **Stream article fetching** instead of buffering all articles before embedding (possible: embed as URLs complete)
3. **Early chunk filtering**: discard low-quality chunks before embedding to reduce the embedding count
4. **Cache embeddings** by content hash to avoid re-embedding seen text (would need persistent storage)

---

## Benchmarking

### Methodology

Measure with the `time` command and verbose logging:

```bash
time openquery -v "What is quantum entanglement?" 2>&1 | tee log.txt
```

Parse the log for timestamps (or add them manually by modifying the code).

### Sample Benchmark

**Environment**:
- Linux x64, .NET 10 AOT
- SearxNG in local Docker (localhost:8002)
- OpenRouter API (US East)
- Model: qwen/qwen3.5-flash-02-23

**Run 1**:
```
real    0m23.4s
user    0m1.2s
sys     0m0.3s
```
Log breakdown:
- Query generation: 3.2s
- Searches: 4.1s
- Article fetching: 8.7s (12 URLs)
- Embeddings: 2.8s (45 chunks)
- Final answer: 4.6s (325 tokens)

**Run 2** (cached SearxNG results, same URLs):
```
real    0m15.8s
```
Article fetching was faster (2.3s) because sites were cached or responded faster on the second request.

**Run 3** (with `-s` short answer):
```
real    0m18.2s
```
The final answer was faster (2.1s instead of 4.6s) due to the shorter output.

### Benchmarking Tips

1. **Warm up**: the first run is slower (JIT or AOT cold start). Discard the first measurement.
2. **Network variance**: run multiple times and average.
3. **Control variables**: same question, same SearxNG instance, same network conditions.
4. **Measure API costs**: check the OpenRouter dashboard for token counts.
5. **Profile with dotTrace** or `perf` when investigating CPU bottlenecks.

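The tips above can be wrapped in a small harness; a Python sketch (illustrative only — substitute the real `openquery` command for the placeholder):

```python
import statistics
import subprocess
import time

def bench(cmd: list[str], runs: int = 3) -> tuple[float, float]:
    """Time a command several times; returns (mean, stdev) wall seconds.
    A warm-up run is executed first and discarded, per tip 1."""
    subprocess.run(cmd, capture_output=True)  # warm-up, not measured
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

# Substitute ["openquery", "-v", "your question"] for a real measurement
mean, dev = bench(["echo", "warm"])
print(f"{mean:.3f}s ± {dev:.3f}s")
```

Averaging over several runs smooths out the network variance that dominates single measurements.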
---

## Optimization Strategies

### 1. Tune Concurrent Limits

Edit `SearchTool.cs` where `_options` is created:

```csharp
_options = new ParallelProcessingOptions
{
    MaxConcurrentArticleFetches = 5,     // ↓ from 10
    MaxConcurrentEmbeddingRequests = 2,  // ↓ from 4
    EmbeddingBatchSize = 300             // ↑ or ↓ (rarely matters)
};
```

**Why tune down?**
- You hit OpenRouter rate limits
- Network bandwidth is saturated
- Too many concurrent fetches overwhelm target sites (scraping etiquette)

**Why tune up?**
- Fast network, powerful CPU, no rate limits
- Many chunks (>500) needing parallel embedding batches

**Monitor**:
- `openquery -v` shows embedding progress: `[Generating embeddings: batch X/Y]`
- If Y = 1 (everything fit in one batch), the batch size is fine
- If Y > 1 and the max concurrency equals Y, you are using full parallelism

### 2. Reduce Data Volume

**Fewer search results**:
```bash
openquery -r 3 "question"  # instead of 5 or 10
```
Effect: fetches fewer URLs and extracts fewer chunks. Linear reduction in work.

**Fewer queries**:
```bash
openquery -q 1 "question"
```
Effect: one search instead of N. Quality may suffer (less diverse sources).

**Fewer chunks**:
```bash
openquery -c 1 "question"
```
Effect: only the top chunk goes into the context → fewer tokens → a faster final answer, but relevant information may be missed.

**Chunk size** (compile-time constant):
Edit `ChunkingService.cs`:
```csharp
private const int MAX_CHUNK_SIZE = 300; // instead of 500
```
Effect: more, shorter chunks → more granular ranking, but more embeddings to generate. Total time could go either way; with a fixed `-c` count, shorter chunks also mean less text in the final context.

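To see why a smaller `MAX_CHUNK_SIZE` yields more chunks, here is a greedy sentence-packing sketch in Python (illustrative only — `ChunkingService`'s actual splitting logic may differ):

```python
def chunk_text(text: str, max_chunk_size: int = 500) -> list[str]:
    """Pack sentences into chunks of at most max_chunk_size characters."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks, current = [], ""
    for s in sentences:
        # Start a new chunk when adding this sentence would overflow
        if current and len(current) + 1 + len(s) > max_chunk_size:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

text = "Short sentence. " * 80  # ~1280 chars of toy input
print(len(chunk_text(text, 500)), len(chunk_text(text, 300)))  # smaller limit → more chunks
```

Each extra chunk is one more embedding to generate and one more candidate to rank, which is the trade-off the section describes.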
### 3. Change Embedding Model

Currently hardcoded to `openai/text-embedding-3-small`. Alternatives:
- `openai/text-embedding-3-large` (higher quality, slower, more expensive)
- `intfloat/multilingual-e5-large` (multilingual, smaller)

Modify the `EmbeddingService` constructor:
```csharp
public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model")
```

Then pass:
```csharp
var embeddingService = new EmbeddingService(client, "intfloat/multilingual-e5-large");
```

**Impact**: different dimensionality (1536 vs 1024 vs 4096). Memory scales with the dimension. Quality may vary for non-English queries.

### 4. Caching

**Current**: no caching. Every query hits all APIs.

**Embedding cache** (by text hash):
- Could be in memory: `Dictionary<string, float[]>`
- Or on disk: `~/.cache/openquery/embeddings/`
- Invalidation: embeddings are deterministic per model, so a long-term cache is viable

**Search cache** (by query hash):
- Cache `List<SearxngResult>` for identical queries
- TTL: maybe 1 hour (search results change over time)

**Article cache** (by URL hash):
- Cache the `Article` (text content) per URL
- Invalidation: check the `Last-Modified` header or use a TTL (1 day)

**Implementation effort**: medium. Would need a cache abstraction (an interface with in-memory and disk options).

**Benefit**: repeat queries (common in testing or with similar questions) become near-instant.

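A minimal version of the embedding cache could look like the following Python sketch (illustrative only — `embed_fn` stands in for the real API call, and the cache directory is passed in rather than fixed):

```python
import hashlib
import json
import tempfile
from pathlib import Path

def cached_embedding(text, embed_fn, cache_dir, model="text-embedding-3-small"):
    """Disk-backed embedding cache keyed by a (model, content) hash.
    Embeddings are deterministic per model, so entries never expire."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: no API call
    vector = embed_fn(text)                  # cache miss: expensive API call
    path.write_text(json.dumps(vector))
    return vector

calls = []
def fake_embed(t):                           # stand-in for the embedding API
    calls.append(t)
    return [0.1, 0.2, 0.3]

cache = Path(tempfile.mkdtemp())
v1 = cached_embedding("hello world", fake_embed, cache)
v2 = cached_embedding("hello world", fake_embed, cache)  # served from disk
print(v1 == v2, len(calls))  # → True 1
```

The second call never reaches `embed_fn`, which is exactly the saving repeat queries would see.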
### 5. Parallelize More (Aggressive)

**Currently**:
- Searches: unbounded (as many as `--queries`)
- Fetches: max 10
- Embeddings: max 4

Could increase:
- Fetches to 20 or 50 (if the network/CPU can handle it)
- Embeddings to 8-16 (if the OpenRouter rate limit allows)

**Risk**:
- Overwhelming target sites (unethical scraping)
- API rate limits → 429 errors
- Local bandwidth saturation

### 6. Local Models (Self-Hosted)

Replace OpenRouter with local inference:
- **Query generation**: a tiny local model (no API latency)
- **Embeddings**: `all-MiniLM-L6-v2` locally (fast, free after setup)
- **Answer**: Llama 3 8B locally (no cost, but slower than GPT-4/Gemini)

**Benefits**:
- Zero API costs (after hardware)
- No network latency
- Unlimited queries

**Drawbacks**:
- GPU required for decent speed (CPU-only is very slow)
- Setup complexity (Ollama, llama.cpp, vLLM, etc.)
- Model quality may lag behind commercial APIs

**Integration**: would require implementing local inference backends (out of scope for this project).

---

## Scalability Limits

### API Rate Limits

**OpenRouter**:
- Free tier: very limited (a few RPM)
- Paid: varies by model, but typically ~10-30 requests/second
- The embedding API has separate limits

**Mitigation**:
- Reduce concurrency (see the tuning section)
- Add exponential backoff (already in place for embeddings)
- Batch embedding requests (already done)

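The backoff behavior (handled by Polly in the C# code) amounts to the following; a Python sketch with a simulated 429 error (the `RuntimeError` stands in for an HTTP failure):

```python
import random
import time

def with_backoff(call, max_attempts: int = 5, base: float = 0.5):
    """Retry on 429-style errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:  # stand-in for an HTTP 429 response
            if attempt == max_attempts - 1:
                raise
            # 0.5s, 1s, 2s, ... plus a little jitter to avoid thundering herds
            delay = base * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(with_backoff(flaky, base=0.01))  # → ok, after two retried failures
```

Doubling the delay on each attempt gives the rate-limit window time to reset instead of hammering the API.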
### SearxNG Limits

**Single instance**:
- Can handle ~10-50 QPS depending on hardware
- Upstream search engines may rate-limit per instance
- Memory: ~100-500MB

**Mitigation**:
- Run multiple SearxNG instances behind a load balancer
- Use different public instances
- Implement client-side rate limiting (currently only per-URL fetches are limited, not searches)

### Network Bandwidth

**Typical data transfer**:
- Searches: 1KB per query × 3 = 3KB
- Articles: 100-500KB per fetch × 15 = 1.5-7.5MB (raw HTML)
- Extracted text: ~10% of the HTML size = 150-750KB
- Embeddings: 100 chunks × 1536 × 4 bytes = 600KB (request + response)
- Final answer: 2-10KB

**Total**: ~3-10MB per query

**100 queries/hour**: ~300MB-1GB of data transfer

**Not an issue** on broadband, but it can matter on metered connections.

---

## Scaling with Chunk Count

Let:
- C = number of chunks with valid embeddings
- d = embedding dimension (1536)
- B = embedding batch size (300)
- P = max parallel embedding batches (4)

**Embedding Time** ≈ `O(⌈C/B⌉ / P)` (batches divided by parallelism)

**Ranking Time** ≈ `O(C × d)` (one dot product per chunk)

**Context Tokens** (for the final answer) ≈ `C × avg_chunk_tokens` (≈500 chars = 125 tokens)

**As C increases**:
- Embedding time: linear in C/B (constant while everything fits in one batch)
- Ranking time: linear in C
- Final answer latency: more tokens in context → longer context processing and potentially a longer answer (more relevant chunks to synthesize)

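These relations can be turned into a back-of-envelope calculator; an illustrative Python sketch (the constants mirror the definitions above; nothing here calls the real pipeline):

```python
import math

def pipeline_cost(c: int, d: int = 1536, b: int = 300, p: int = 4,
                  chars_per_chunk: int = 500) -> dict:
    """Back-of-envelope scaling model for C chunks, per the formulas above."""
    batches = math.ceil(c / b)
    return {
        "embedding_steps": math.ceil(batches / p),   # sequential batch waves
        "ranking_flops": c * d,                      # one dot product per chunk
        "context_tokens": c * chars_per_chunk // 4,  # ~4 chars per token
    }

print(pipeline_cost(100))   # defaults: everything in one embedding wave
print(pipeline_cost(1000))  # 4 batches still run as one parallel wave
```

At C = 1000 the model reproduces the numbers below: one embedding wave, ~1.5M ranking FLOPs, and 125K context tokens.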
**Practical limit**:
- With defaults, C ≈ 50-100 (from 15 articles)
- C could reach 500-1000 if:
  - `--queries` = 10
  - `--results` = 20 (200 URLs)
  - Many articles are long → many chunks each
- At C = 1000:
  - Embeddings: 1000/300 ≈ 4 batches; with 4 running in parallel, wall time is still roughly one batch duration
  - But OpenRouter may impose per-minute limits on embedding requests
  - Ranking: 1000 × 1536 ≈ 1.5M FLOPs → still <0.01s
  - Context tokens: 1000 × 125 = 125K tokens! Many LLMs have 200K context, so it fits, but it is expensive and slow

**Conclusion**: the current defaults scale to C ≈ 100-200 comfortably. Beyond that:
- Increase the batch size or parallelism for embeddings
- Expect to hit embedding API rate limits
- The context token count becomes expensive and may degrade answer quality (LLMs lose focus in very long contexts)

---

## Profiling

### CPU Profiling

Use `dotnet-trace` or `perf`:

```bash
# Collect a trace for 30 seconds while running a query
dotnet-trace collect --process-id $(pgrep OpenQuery) --duration 00:00:30 -o trace.nettrace

# Analyze with Visual Studio or PerfView
```

Look for:
- Hot methods: `ChunkingService.ChunkText`, `EmbeddingService.GetEmbeddingsAsync`, cosine similarity
- Allocation hotspots

### Memory Profiling

```bash
dotnet-gcdump collect -p <pid>
# Open in Visual Studio or use dotnet-gcdump analyze
```

Check heap size and object counts (look for large `string` objects from article content).

### Network Profiling

Use `tcpdump` or `wireshark`:
```bash
tcpdump -i any port 8002 or port 443 -w capture.pcap
```

Or simpler: run `time` on individual curl commands to measure latency components.

---

## Next Steps

- [Configuration](../configuration.md) - Tune for your environment
- [Troubleshooting](../troubleshooting.md) - Diagnose slow performance
- [Architecture](../architecture.md) - Understand pipeline bottlenecks

---

**Quick Tuning Cheatsheet**

```bash
# Fast & cheap (factual Q&A)
openquery -q 1 -r 3 -c 2 -s "What is X?"

# Thorough (research)
openquery -q 5 -r 10 -c 5 -l "Deep dive on X"
```

For custom concurrency, edit `SearchTool.cs`:

```csharp
_options = new ParallelProcessingOptions {
    MaxConcurrentArticleFetches = 20,   // if the network can handle it
    MaxConcurrentEmbeddingRequests = 8  // if the API allows it
};
```
699
docs/troubleshooting.md
Normal file
@@ -0,0 +1,699 @@
# Troubleshooting

Solve common issues, errors, and performance problems with OpenQuery.

## 📋 Table of Contents

1. [Common Errors](#common-errors)
2. [Performance Issues](#performance-issues)
3. [Debugging Strategies](#debugging-strategies)
4. [Getting Help](#getting-help)

## Common Errors

### ❌ "API Key is missing"

**Error Message**:
```
[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up.
```

**Cause**: no API key is available from the environment or the config file.

**Solutions**:

1. **Set the environment variable** (temporary):
   ```bash
   export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
   ```

2. **Configure interactively** (persistent):
   ```bash
   openquery configure -i
   # Follow the prompts to enter your API key
   ```

3. **Check the config file**:
   ```bash
   cat ~/.config/openquery/config
   # Should contain: ApiKey=sk-or-...
   ```

4. **Verify the environment**:
   ```bash
   echo $OPENROUTER_API_KEY
   # If empty, the variable was not exported, or was exported in a different shell
   ```

---

### ❌ "Network request failed"

**Error Message**:
```
[Error] Network request failed. Details: Name or service not known
```

**Cause**: cannot reach the OpenRouter or SearxNG API endpoints.

**Solutions**:

1. **Check internet connectivity**:
   ```bash
   ping 8.8.8.8
   curl https://openrouter.ai
   ```

2. **Verify SearxNG is running**:
   ```bash
   curl "http://localhost:8002/search?q=test&format=json"
   # Should return JSON
   ```

   If the connection is refused:
   ```bash
   # Start SearxNG if using Docker
   docker start searxng
   # Or run fresh
   docker run -d --name searxng -p 8002:8080 searxng/searxng:latest
   ```

3. **Check firewall/proxy**:
   ```bash
   # Test the OpenRouter API
   curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \
     https://openrouter.ai/api/v1/models
   ```

4. **Test from a different network** (if behind a restrictive firewall)

---

### ❌ "No search results found"

**Error Message**:
```
No search results found.
```

**Cause**: the search queries returned zero results from SearxNG.

**Solutions**:

1. **Test SearxNG manually**:
   ```bash
   curl "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
   # Should be > 0
   ```

2. **Check the SearxNG configuration**:
   - If self-hosted: ensure internet access is enabled in `/etc/searxng/settings.yml`
   - Some public instances disable certain engines or have rate limits

3. **Try a different SearxNG instance**:
   ```bash
   export SEARXNG_URL="https://searx.example.com"
   openquery "question"
   ```

4. **Use simpler queries**: some queries may be too obscure or malformed

5. **Use verbose mode to see the queries**:
   ```bash
   openquery -v "complex question"
   # Shows which queries were generated
   ```

---

### ❌ "Found search results but could not extract readable content."

**Cause**: SearxNG returned results, but `ArticleService` failed to extract content from every URL.

**Common Reasons**:
- JavaScript-heavy sites (React, Vue apps) where content is loaded dynamically
- Paywalled sites (NYT, academic journals)
- PDFs or other non-HTML content
- Malformed HTML
- Server errors (404, 403, 500)
- `robots.txt` blocked the crawler

**Solutions**:
1. **Accept that some sites can't be scraped** - try a different query to get different results
2. **Use `site:reddit.com` or `site:wikipedia.org`** - these are usually scrape-friendly
3. **Increase `--results`** to get more URLs (some will work)
4. **Check the verbose output**:
   ```bash
   openquery -v "question"
   # Look for "Warning: Failed to fetch article"
   ```
5. **Try a local SearxNG instance with more engines** - some engines surface different sources

---

### ❌ Rate Limiting (429 Too Many Requests)

**Symptoms**:
```bash
[Error] Response status code does not indicate success: 429 (Too Many Requests).
```

Or retries exhausted after Polly's attempts.

**Cause**: too many concurrent requests to the OpenRouter API.

**Solutions**:

1. **Reduce concurrency** (edit `SearchTool.cs`):
   ```csharp
   _options = new ParallelProcessingOptions
   {
       MaxConcurrentArticleFetches = 5,     // reduce from 10
       MaxConcurrentEmbeddingRequests = 2,  // reduce from 4
       EmbeddingBatchSize = 150             // reduce from 300
   };
   ```

2. **Add a delay** between embedding batches (custom implementation)

3. **Upgrade your OpenRouter plan** for higher rate limits

4. **Wait and retry** - rate limits reset after the time window

---

### ❌ Slow Performance

**Symptom**: queries take 60+ seconds when they usually take 20s.

**Diagnosis Steps**:

1. **Run with verbose mode**:
   ```bash
   openquery -v "question"
   ```
   Watch which phase takes longest:
   - Query generation?
   - Searching?
   - Fetching articles?
   - Embeddings?

2. **Check network latency**:
   ```bash
   time curl "https://openrouter.ai/api/v1/models"
   time curl "http://localhost:8002/search?q=test&format=json"
   ```

**Common Causes & Fixes**:

| Phase | Cause | Fix |
|-------|-------|-----|
| Searches | SearxNG overloaded/slow | Check CPU/memory, restart the container |
| Fetching | Target sites slow | Reduce `--results` to fetch fewer URLs |
| Embeddings | API rate limited | Reduce concurrency (see above) |
| Answer | Heavy model/load | Switch to a faster model (e.g., Qwen Flash) |

3. **Monitor resources**:
   ```bash
   htop   # CPU/memory usage
   iftop  # network throughput
   ```

4. **Reduce parameters**:
   ```bash
   openquery -q 2 -r 3 -c 2 "question"  # lighter load
   ```

---

### ❌ Out of Memory

**Symptoms**:
- Process killed by the OOM killer (Linux)
- `System.OutOfMemoryException`
- System becomes unresponsive

**Cause**: processing too many large articles simultaneously.

**Why**: each article can be 100KB+ of text, split into many chunks; each embedding is 6KB (1536 floats × 4 bytes). 200 chunks = 1.2MB of embeddings plus ~100KB of text ≈ 1.3MB. Not huge on its own, but many large articles can create thousands of chunks.

**Solutions**:

1. **Reduce `--results`** (fewer URLs per query):
   ```bash
   openquery -r 3 "question"  # instead of 10
   ```

2. **Reduce `--queries`** (fewer search queries):
   ```bash
   openquery -q 2 "question"
   ```

3. **Fetches are already limited** to 10 concurrent by default, which is reasonable

4. **Check article size**: some sources (PDFs, long documents) may yield megabytes of text; SmartReader should truncate, but may not

---

### ❌ Invalid JSON from Query Generation

**Symptom**: query generation fails silently and falls back to the original question.

**Cause**: the LLM returned non-JSON (despite being instructed otherwise). Possible reasons:
- The model does not follow instructions well
- The output exceeded the context window
- An API error in the response

**Detection**: run with `-v` to see:
```
[Failed to generate queries, falling back to original question. Error: ...]
```

**Solutions**:
- Try a different model (configure to use Gemini or DeepSeek)
- Reduce the `--queries` count (a simpler task)
- Tune the system prompt (requires a code change)
- Accept the fallback - the original question often works fine as the sole query

---

### ❌ Spinner Artifacts in Output

**Symptom**: when redirecting output to a file, you see stray characters like `⠋`, `<60>`, etc.

**Cause**: the spinner uses Unicode Braille characters and ANSI escape codes.

**Fix**: clean the stream with `sed`:
```bash
openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md
```

Or run with `--verbose` (no spinner, only newline-separated messages):
```bash
openquery -v "question" > answer.txt
```

---

### ❌ "The type or namespace name '...' does not exist" (Build Error)

**Cause**: a missing NuGet package or the wrong .NET SDK version.

**Solution**:

1. **Verify .NET SDK 10.0**:
   ```bash
   dotnet --version
   # Should be 10.x
   ```

   If lower: https://dotnet.microsoft.com/download/dotnet/10.0

2. **Restore packages**:
   ```bash
   dotnet restore
   ```

3. **Clean and rebuild**:
   ```bash
   dotnet clean
   dotnet build
   ```

4. **Check `OpenQuery.csproj`** for package references:
   ```xml
   <PackageReference Include="Polly.Core" Version="8.6.6" />
   <PackageReference Include="Polly.RateLimiting" Version="8.6.6" />
   <PackageReference Include="SmartReader" Version="0.11.0" />
   <PackageReference Include="System.CommandLine" Version="2.0.0-beta4.22272.1" />
   <PackageReference Include="System.Numerics.Tensors" Version="9.0.0" />
   ```

If restore fails, these packages may not be available for the .NET 10 preview. Consider:
- Downgrading to .NET 8.0 (if packages are incompatible)
- Or finding package versions compatible with .NET 10

---

### ❌ AOT Compilation Fails
|
||||
|
||||
**Error**: `error NETSDK1085: The current .NET SDK does not support targeting .NET 10.0.`
|
||||
|
||||
**Cause**: Using .NET SDK older than 10.0.
|
||||
|
||||
**Fix**: Install .NET SDK 10.0 preview.
|
||||
|
||||
**Or**: Disable AOT for development (edit `.csproj`):
|
||||
```xml
|
||||
<!-- Remove or set to false -->
|
||||
<PublishAot>false</PublishAot>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Performance Issues
|
||||
|
||||
### Slow First Request
|
||||
|
||||
**Expected**: First query slower (JIT compilation for .NET runtime if not AOT, or initial API connections).
|
||||
|
||||
If not using AOT:
|
||||
- Consider publishing with `/p:PublishAot=true` for production distribution
|
||||
- Development builds use JIT, which adds 500ms-2s warmup
|
||||
|
||||
**Mitigation**: Accept as warmup cost, or pre-warm with dummy query.
|
||||
|
||||
---
|
||||
|
||||
### High Memory Usage
|
||||
|
||||
**Check**:
|
||||
```bash
|
||||
ps aux | grep OpenQuery
|
||||
# Look at RSS (resident set size)
|
||||
```
|
||||
|
||||
**Typical**: 50-200MB (including .NET runtime, AOT code, data structures)
|
||||
|
||||
**If >500MB**:
|
||||
- Likely processing very many articles
|
||||
- Check `--results` and `--queries` values
|
||||
- Use `--verbose` to see counts: `[Fetched X search results]`, `[Extracted Y chunks]`
|
||||
|
||||
**Reduce**:
|
||||
- `--queries 2` instead of 10
|
||||
- `--results 3` instead of 15
|
||||
- These directly limit number of URLs to fetch
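
As a rule of thumb, the fetch workload scales with the product of these two flags: each generated query contributes up to `--results` URLs (the figures below are illustrative, before deduplication):

```bash
queries=2
results=3
echo "max URLs fetched: $((queries * results))"   # prints: max URLs fetched: 6
```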

---

### High CPU Usage

**Cause**:
- SmartReader HTML parsing (CPU-bound)
- Cosine similarity calculations (many chunks, but usually fast)
- Spinner animation (negligible)

**Check**: `htop` → which core is at 100%? If a single core, likely parsing. If all cores, the parallel fetch.

**Mitigation**:
- Ensure `MaxConcurrentArticleFetches` is not excessively high (the default of 10 is okay)
- Accept it - CPU spikes are normal during the fetch phase

---

### API Costs Higher Than Expected

**Symptom**: OpenRouter dashboard shows high token usage.

**Causes**:
1. Using an expensive model (check `OPENROUTER_MODEL`)
2. High `--chunks` → more tokens in context
3. High `--queries` + `--results` → many articles → many embedding tokens (usually cheap)
4. Long answers (many completion tokens) - especially with `--long`

**Mitigation**:
- Use `qwen/qwen3.5-flash-02-23` (cheapest good option)
- Reduce `--chunks` to 2-3
- Use `--short` when a detailed answer is not needed
- Set `MaxTokens` on the request (would require a code change)

---

## Debugging Strategies

### 1. Enable Verbose Mode

Always start with:

```bash
openquery -v "question" 2>&1 | tee debug.log
```

Logs everything:
- Generated queries
- URLs fetched
- Progress counts
- Errors/warnings

**Analyze the log**:
- How many queries were generated? (Should match `--queries`)
- How many search results per query? (Should be ≤ `--results`)
- How many articles fetched successfully?
- How many chunks extracted?
- Any warnings?
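
Those counts can be pulled out of a saved log with `grep`; the sketch below fabricates a tiny log in the bracketed format shown elsewhere in this guide (message wording is illustrative):

```bash
# build a sample verbose log, then count fetch attempts
cat > /tmp/debug.log <<'EOF'
[Generated queries: 3]
[Fetched 15 search results]
[Fetching article 1/12: example.com]
[Fetching article 2/12: example.org]
[Extracted 48 content chunks]
EOF
grep -c 'Fetching article' /tmp/debug.log   # prints: 2
```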

---

### 2. Isolate Components

**Test SearxNG**:

```bash
curl "http://localhost:8002/search?q=test&format=json" | jq '.results[0]'
```

**Test OpenRouter API**:

```bash
curl -X POST https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen/qwen3.5-flash-02-23","messages":[{"role":"user","content":"Hello"}]}'
```

**Test Article Fetching** (with a known good URL):

```bash
curl -L "https://example.com/article" | head -50
```

Then check whether SmartReader can parse it.

---

### 3. Reduce Scope

Test with minimal parameters to isolate the failing phase:

```bash
# 1 query, 2 results, 1 chunk - should be fast and simple
openquery -q 1 -r 2 -c 1 "simple test question" -v

# If that works, gradually increase:
openquery -q 1 -r 5 -c 1 "simple question"
openquery -q 3 -r 5 -c 1 "simple question"
openquery -q 3 -r 5 -c 3 "simple question"

# Then try the complex question
```

---

### 4. Check Resource Limits

**File descriptors**: If fetching many articles, you may hit the limit.

```bash
ulimit -n  # usually 1024, should be fine
```

**Memory**: Monitor with `free -h` while running.

**Disk space**: OpenQuery uses little disk, but logs could fill up if verbose mode is used repeatedly.

---

### 5. Examine Config File

```bash
cat ~/.config/openquery/config
# Ensure no spaces around '='
# Correct: ApiKey=sk-or-...
# Wrong:   ApiKey = sk-or-... (spaces become part of value)
```

Reconfigure if needed:

```bash
openquery configure --key "sk-or-..."
```
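
A quick way to spot the spaces-around-`=` mistake is a `grep` over the file; the sketch below writes a sample config with one bad line rather than touching your real one:

```bash
cat > /tmp/openquery-config <<'EOF'
ApiKey=sk-or-xxxx
Model = qwen/qwen3.5-flash-02-23
EOF
# any hit here is a line whose key or value will silently include spaces
grep -nE '[[:space:]]+=|=[[:space:]]+' /tmp/openquery-config
# prints: 2:Model = qwen/qwen3.5-flash-02-23
```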

---

### 6. Clear Cache / Reset

No persistent cache exists, but:
- Restart the SearxNG container: `docker restart searxng`
- Clear the DNS cache if you have network issues: `sudo systemd-resolve --flush-caches`

---

## Getting Help

### Before Asking

Gather information:

1. **OpenQuery version** (commit or build date if available)
2. **OS and architecture**: `uname -a` (Linux/macOS) or `systeminfo` (Windows)
3. **Full command** you ran
4. **Verbose output**: `openquery -v "question" 2>&1 | tee log.txt`
5. **Config** (redact the API key):
   ```bash
   sed 's/ApiKey=.*/ApiKey=REDACTED/' ~/.config/openquery/config
   ```
6. **SearxNG test**:
   ```bash
   curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
   ```
7. **OpenRouter test**:
   ```bash
   curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
     https://openrouter.ai/api/v1/models | jq '.data[0].id'
   ```
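
The items above can be collected in one pass; this sketch writes a shareable bundle and assumes the config path used throughout this guide (the file may be absent, which is fine):

```bash
# gather a minimal, shareable diagnostic bundle (redacts the API key)
{
  echo "## uname"
  uname -a
  echo "## config (redacted)"
  sed 's/ApiKey=.*/ApiKey=REDACTED/' ~/.config/openquery/config 2>/dev/null || true
} > /tmp/oq-diag.txt
grep -c '^## ' /tmp/oq-diag.txt   # prints: 2 (two sections captured)
```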

---

### Where to Ask

1. **GitHub Issues** (if the repository is hosted there):
   - Search existing issues first
   - Provide all of the information above
   - Include the log file (or link to a gist)

2. **Community Forum** (if one exists)

3. **Self-Diagnose**:
   - Check `docs/troubleshooting.md` (this file)
   - Check `docs/configuration.md`
   - Check `docs/usage.md`

---

### Example Bug Report

```
Title: OpenQuery hangs on "Fetching article X/Y"

Platform: Ubuntu 22.04, .NET 10.0, OpenQuery built from commit abc123
Command: openquery -v "What is Docker?" 2>&1 | tee log.txt

Verbose output shows:
[...]
[Fetching article 1/15: docker.com]
[Fetching article 2/15: hub.docker.com]
[Fetching article 3/15: docs.docker.com]
# Hangs here indefinitely, no more progress

SearxNG test:
$ curl "http://localhost:8002/search?q=docker&format=json" | jq '.results | length'
15  # SearxNG works

Config:
ApiKey=sk-or-xxxx (redacted)
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=3
DefaultChunks=3
DefaultResults=5

Observation:
- Fetches 3 articles fine, then stalls
- Nothing in log after "Fetching article 3/15"
- Process uses ~150MB memory, CPU 0% (idle)
- Ctrl+C exits immediately

Expected: Should fetch remaining 12 articles (concurrent up to 10)
Actual: Only 3 fetched, then silent hang
```

---

## Known Issues

### Issue: Spinner Characters Not Displaying

Some terminals don't support Braille Unicode patterns.

**Symptoms**: The spinner shows as `?` or boxes.

**Fix**: Use a font with Unicode support, set `TERM=dumb`, or use `--verbose` to disable the spinner.

---

### Issue: Progress Messages Overwritten

In very fast operations, progress updates may overlap.

**Cause**: `StatusReporter` uses `Console.Write` without a lock in compact mode; concurrent writes from the channel processor and the spinner task could interleave.

**Mitigation**: Unlikely in practice (the channel serializes, and the spinner only updates when `_currentMessage` is set). If problematic, add a lock around Console operations.

---

### Issue: Articles with No Text Content

Some URLs return articles with an empty `TextContent`.

**Cause**: SmartReader's quality heuristic (`IsReadable`) failed, or the article truly has no text (image, script, error page).

**Effect**: Those URLs contribute zero chunks.

**Acceptable**: Part of normal operation; not all URLs yield readable content.

---

### Issue: Duplicate Sources in Answer

The same website may appear multiple times (different articles).

**Cause**: Different URLs from different search results may be from the same domain but different pages.

**Effect**: `[Source 1]` and `[Source 3]` could both be `example.com`. Not necessarily bad - they're different articles.
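
To see how many distinct domains actually back the citations, strip the URLs out of a saved answer (the answer file here is fabricated for illustration):

```bash
cat > /tmp/answer.md <<'EOF'
[Source 1: A](https://example.com/a)
[Source 2: B](https://other.org/b)
[Source 3: C](https://example.com/c)
EOF
# count distinct source domains
grep -o 'https://[^/)]*' /tmp/answer.md | sort -u | wc -l   # prints: 2
```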

---

## Performance Tuning Reference

| Setting | Default | Fastest | Most Thorough | Notes |
|---------|---------|---------|---------------|-------|
| `--queries` | 3 | 1 | 8+ | More queries = more searches |
| `--results` | 5 | 2 | 15+ | Fewer = fewer articles to fetch |
| `--chunks` | 3 | 1 | 5+ | More chunks = more context tokens |
| `MaxConcurrentArticleFetches` | 10 | 5 | 20 | Higher = more parallel fetches |
| `MaxConcurrentEmbeddingRequests` | 4 | 2 | 8 | Higher = faster embeddings (may hit rate limits) |
| `EmbeddingBatchSize` | 300 | 100 | 1000 | Larger = fewer API calls, more data per call |

**Start**: The defaults are balanced.

**Adjust if**:
- Slow: Reduce `--results`, `--queries`, or the concurrency limits
- Poor quality: Increase `--chunks`, `--results`, `--queries`
- Rate limited: Reduce the concurrency limits
- High cost: Use `--short`, reduce `--chunks`, choose a cheaper model
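
These profiles are easy to capture as shell functions; the names below are hypothetical, and the functions `echo` the command they would run so the sketch is runnable without OpenQuery installed (drop the `echo` to run for real):

```bash
# hypothetical speed/quality presets
oq_fast()     { echo openquery -q 1 -r 2 -c 1 "$@"; }
oq_thorough() { echo openquery -q 8 -r 15 -c 5 "$@"; }

oq_fast "test question"   # prints: openquery -q 1 -r 2 -c 1 test question
```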

---

## Next Steps

- [Performance](../performance.md) - Detailed performance analysis
- [Configuration](../configuration.md) - Adjust settings
- [Usage](../usage.md) - Optimize your workflow

---

**Quick Diagnostic Checklist**

```bash
# 1. Check API key
echo $OPENROUTER_API_KEY | head -c 10

# 2. Test SearxNG
curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length'

# 3. Test OpenRouter
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  https://openrouter.ai/api/v1/models | jq '.data[0].id'

# 4. Run verbose
openquery -v "test" 2>&1 | grep -E "Fetching|Generated|Found"

# 5. Check resource usage while running
htop

# 6. Reduce scope and retry
openquery -q 1 -r 2 -c 1 "simple test"
```
483
docs/usage.md
Normal file
@@ -0,0 +1,483 @@

# Usage Guide

Complete reference for using the OpenQuery command-line interface.

## 📋 Table of Contents

1. [Basic Usage](#basic-usage)
2. [Command Reference](#command-reference)
3. [Examples](#examples)
4. [Output Format](#output-format)
5. [Tips and Tricks](#tips-and-tricks)

## Basic Usage

### Simplest Form

```bash
openquery "your question here"
```

That's it! OpenQuery will:
1. Generate search queries
2. Search the web
3. Extract relevant content
4. Stream an answer with sources

### Common Pattern

```bash
openquery [OPTIONS] "your question"
```

Quotes around the question are recommended to preserve spaces.
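
Why the quotes matter: without them, the shell splits the question into separate arguments before OpenQuery ever sees it. Using `set --` to stand in for the argument list:

```bash
set -- how do solar panels work
echo "argc=$#"   # prints: argc=5 (each word is its own argument)
set -- "how do solar panels work"
echo "argc=$#"   # prints: argc=1 (the whole question is one argument)
```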

## Command Reference

### Main Command

#### `openquery [options] <question>`

Ask a question and get an AI-powered answer with citations.

**Arguments**:
- `question` (positional, one or more words) - The question to ask

**Options**:

| Option | Aliases | Type | Default | Description |
|--------|---------|------|---------|-------------|
| `--chunks` | `-c` | int | 3 (from config) | Number of top relevant content chunks to include in context |
| `--results` | `-r` | int | 5 (from config) | Number of search results to fetch per generated query |
| `--queries` | `-q` | int | 3 (from config) | Number of search queries to generate from your question |
| `--short` | `-s` | bool | false | Request a concise, to-the-point answer |
| `--long` | `-l` | bool | false | Request a detailed, comprehensive answer |
| `--verbose` | `-v` | bool | false | Show detailed progress information and debug output |

**Behavior**:
- `--short` and `--long` conflict; omit both for a balanced answer
- If both `--short` and `--long` are specified anyway, `--long` takes precedence
- Options override configuration file defaults

#### `openquery configure [options]`

Configure OpenQuery settings (API key, model, defaults).

**Options**:

| Option | Type | Description |
|--------|------|-------------|
| `--interactive` / `-i` | bool | Launch interactive configuration wizard |
| `--key` | string | Set the OpenRouter API key |
| `--model` | string | Set the default model |
| `--queries` | int? | Set default number of queries |
| `--chunks` | int? | Set default number of chunks |
| `--results` | int? | Set default number of results |

**Examples**:

```bash
# Interactive wizard
openquery configure -i

# Set just the API key
openquery configure --key "sk-or-..."

# Set multiple defaults non-interactively
openquery configure --model "deepseek/deepseek-v3.2" --queries 5 --chunks 4
```

**Note**: Options with `?` are nullable; only provided values are updated.

## Examples

### Everyday Queries

**Simple factual question**:

```bash
openquery "What is the speed of light?"
```

**Multi-word question**:

```bash
openquery "How do solar panels work?"
```

**Question with special characters**:

```bash
openquery "What's the weather in New York?"
```

### Customizing Output

**Get a quick answer**:

```bash
openquery -s "Who is the CEO of Tesla?"
```

Output: "Elon Musk is the CEO of Tesla." (minimal explanation)

**Get detailed analysis**:

```bash
openquery -l "Explain how nuclear fusion works"
```

Output: Multi-paragraph detailed explanation with scientific details

**See everything**:

```bash
openquery -v "What is machine learning?"
```

Output: Shows all progress messages alongside the answer

### Adjusting Search Depth

**Minimal search** (fast, cheap):

```bash
openquery -q 1 -r 2 -c 1 "What time is it in London?"
```

- 1 generated query
- 2 results per query
- 1 context chunk

**Thorough research** (slow, comprehensive):

```bash
openquery -q 8 -r 15 -c 5 "History and applications of cryptography"
```

- 8 diverse queries
- 15 results each
- 5 top chunks

**Balanced (recommended defaults)**:

```bash
openquery "Latest advancements in CRISPR technology"
```

- 3 queries
- 5 results each
- 3 top chunks

### Combining Options

**Verbose custom search**:

```bash
openquery -v -q 5 -r 10 -c 4 "What are the ethical implications of AI?"
```

**Short answer with more context**:

```bash
openquery -s -c 5 "Python vs JavaScript for web development"
```

**Long answer, lots of research**:

```bash
openquery -l -q 10 -r 20 -c 6 "Complete guide to quantum computing"
```

### Practical Use Cases

**News and Current Events**:

```bash
openquery "Latest developments in the Ukraine conflict"
```

**Technical Questions**:

```bash
openquery "How to set up a PostgreSQL replication cluster"
```

**Health Information** (verify with a doctor!):

```bash
openquery "What are the symptoms of vitamin D deficiency?"
```

**Cooking**:

```bash
openquery "How to make authentic Italian pizza dough"
```

**Travel**:

```bash
openquery "Best things to do in Tokyo in spring"
```

**Programming**:

```bash
openquery "Rust vs Go for backend development in 2025"
```

### Configuration Examples

**Set up for the first time**:

```bash
openquery configure -i
# Follow prompts to enter API key, choose model, set defaults
```

**Switch to a different model**:

```bash
openquery configure --model "google/gemini-3-flash-preview"
```

**Update the default number of queries**:

```bash
openquery configure --queries 5
```

**Set cost-effective defaults**:

```bash
openquery configure --model "qwen/qwen3.5-flash-02-23" --queries 2 --chunks 2 --results 3
```

**Check your configuration**:

```bash
cat ~/.config/openquery/config
```

## Output Format

### Standard Output (Streaming)

The answer streams in real-time, character by character, like this:

```
⠋ Generating search queries... (spinner with status)
⠹ Searching web...
⠸ Fetching articles...
⠼ Processing embeddings...
⠴ Generating answer...
Assistant: Quantum entanglement is a phenomenon where pairs or groups of
particles interact in ways such that the quantum state of each particle
cannot be described independently of the others, even when separated by
large distances.

[Source 1: Understanding Quantum Mechanics](https://example.com/quantum)
[Source 2: Quantum Physics Overview](https://example.com/physics)
```

### Verbose Mode Output (`-v`)

When `--verbose` is enabled, you see detailed progress:

```
[Generating 3 search queries based on your question...]
[Generated queries:
1. quantum entanglement definition
2. how quantum entanglement works
3. quantum entanglement Bell's theorem
]
[Searching web for 'quantum entanglement definition'...]
[Searching web for 'how quantum entanglement works'...]
[Searching web for 'quantum entanglement Bell's theorem'...]
[Fetched 15 search results total]
[Fetching article 1/12: physicsworld.com]
[Fetching article 2/12: nature.com]
...
[Fetching article 12/12: scientificamerican.com]
[Extracted 48 content chunks]
[Generating embeddings: batch 1/4]
[Generating embeddings: batch 2/4]
[Generating embeddings: batch 3/4]
[Generating embeddings: batch 4/4]
[Ranked chunks by relevance]
[Found top 3 most relevant chunks overall. Generating answer...]

Assistant: Quantum entanglement is a fundamental phenomenon in quantum
mechanics where...
```

### Source Citations

Sources are formatted as markdown links in the answer:

```
[Source 1: Article Title](https://example.com/article)
```

These appear inline where the AI references that source. Multiple sources can be cited in a single paragraph.
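
Because citations follow a fixed `[Source N: ...]` shape, they are easy to pull out of a saved answer; the answer file below is fabricated for illustration:

```bash
cat > /tmp/answer.md <<'EOF'
Quantum entanglement links particle states. [Source 1: Understanding Quantum Mechanics](https://example.com/quantum)
[Source 2: Quantum Physics Overview](https://example.com/physics)
EOF
# one line per citation, link targets stripped
grep -o '\[Source [0-9]*:[^]]*\]' /tmp/answer.md
```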

### Error Output

Errors are written to stderr and exit with a non-zero status:

```
[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i'.
```

## Tips and Tricks

### Speed Tips

1. **Reduce concurrency limits** (edit `SearchTool.cs` if you are constantly rate-limited)
2. **Reduce `--results`** - fewer articles to fetch and process
3. **Reduce `--queries`** - fewer parallel searches
4. **Use a local SearxNG** - minimize network latency to the search backend
5. **Cache results** - a future enhancement could add caching

### Quality Tips

1. **Increase `--chunks`** to 4-5 for complex topics
2. **Increase `--queries`** to 5-8 for broad exploration
3. **Use `--long`** for deep topics that need elaboration
4. **Check `-v` output** to see which sources were selected
5. **Try different models** - some are better at synthesis, others at facts

### Cost Tips

1. **Use `qwen/qwen3.5-flash-02-23`** - the cheapest good model
2. **Reduce `--chunks` and `--results`** - fewer tokens in context
3. **Use `--short`** - shorter answers use fewer completion tokens
4. **Monitor usage** at the [openrouter.ai](https://openrouter.ai) dashboard

### Workflow Tips

**Iterative deepening**:

```bash
# Start broad
openquery -v "machine learning"

# Identify subtopics from the answer, then dive deeper
openquery "What is transformer architecture in LLMs?"
```

**Compare answers**:

```bash
# Same question with different models
OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23" openquery "question"
OPENROUTER_MODEL="google/gemini-3-flash-preview" openquery "question"
```

**Save answers**:

```bash
openquery "What is Docker?" > answer.md
# answer.md will contain the streamed output (including spinner chars, so filter):
openquery "What is Docker?" 2>/dev/null | sed 's/.\x08//g' > clean-answer.md
```

### Shell Aliases and Functions

Add to `~/.bashrc` or `~/.zshrc`:

```bash
# Short alias
alias oq='openquery'

# With common options
alias oql='openquery -l -q 5 -r 10'  # long, thorough
alias oqs='openquery -s'             # short
alias oqv='openquery -v'             # verbose

# Function to save output cleanly
oqsave() {
  openquery "$@" 2>/dev/null | sed 's/.\x08//g' > "answer-$(date +%Y%m%d-%H%M%S).md"
}
```

### Scripting

```bash
#!/bin/bash
# batch-questions.sh

while IFS= read -r question; do
  echo "## $question" >> research.md
  echo "" >> research.md
  openquery -l "$question" 2>/dev/null | sed 's/.\x08//g' >> research.md
  echo "" >> research.md
done < questions.txt
```

### Chaining with Other Tools

Pipe to `jq` (if you modify OpenQuery to output JSON):

```bash
# Future: openquery --json "question" | jq '.answer'
```

Pipe to `pbcopy` (macOS) or `xclip` (Linux):

```bash
openquery "quick fact" 2>/dev/null | sed 's/.\x08//g' | pbcopy
```

Filter sources:

```bash
openquery "topic" 2>/dev/null | sed 's/.\x08//g' | grep -E '^\[Source'
```

## Keyboard Interrupts

- **Ctrl+C** during processing: Cancels the current operation, exits gracefully
- **Ctrl+C** during a streaming answer: Stops streaming, shows the partial answer
- **Ctrl+Z** (suspend): Not recommended; may leave background tasks running

OpenQuery uses proper cancellation tokens to clean up resources on interrupt.

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success - answer was generated |
| 1 | Error - see stderr message |
| 2 | Configuration error (missing API key) |

You can check the exit code in shell scripts:

```bash
openquery "question"
if [ $? -eq 0 ]; then
  echo "Success!"
else
  echo "Failed"
fi
```

## Limitations and Workarounds

### Question Length

Very long questions (>2000 chars) may be truncated by the LLM's context window or exceed token limits.

**Workaround**: Keep questions concise; handle complex multi-part questions as separate queries.
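
A quick way to check a question's length before asking (the threshold is the rough figure above, not a hard limit):

```bash
q="How do solar panels convert sunlight into electricity?"
echo "${#q}"   # prints the character count; keep it well under 2000
```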

### Answer Length Limits

The LLM may hit `max_tokens` limits for very complex questions.

**Workaround**: Use the `--long` flag (already maximizes the allowed tokens) or break the question into sub-questions.

### Rate Limiting

OpenRouter may rate limit if you send too many requests too quickly.

**Symptoms**: 429 errors, occasional timeouts.

**Workaround**: The built-in retry (Polly) handles this automatically. For persistent issues:
- Reduce concurrency (edit the code)
- Add delays between queries
- Upgrade your OpenRouter plan

### SearxNG Timeouts

Large SearxNG responses or slow targets may time out.

**Workaround**: Reduce `--results` or check the SearxNG logs. There is nothing to do on the OpenQuery side (the HTTP client timeout defaults to ~30s).

---

## Next Steps

- [Architecture](architecture.md) - Understand how OpenQuery works under the hood
- [Configuration](configuration.md) - Fine-tune your setup
- [Troubleshooting](troubleshooting.md) - Solve common problems

---

**Quick Reference Card**

```bash
# Basic
openquery "question"

# Quick fact
openquery -s "question"

# Deep research
openquery -l -q 5 -r 10 -c 4 "question"

# See progress
openquery -v "question"

# Configure
openquery configure -i

# Check config
cat ~/.config/openquery/config
```