docs: add comprehensive documentation with README and detailed guides

- Add user-friendly README.md with quick start guide
- Create docs/ folder with structured technical documentation:
  - installation.md: Build and setup instructions
  - configuration.md: Complete config reference
  - usage.md: CLI usage guide with examples
  - architecture.md: System design and patterns
  - components/: Deep dive into each component (OpenQueryApp, SearchTool, Services, Models)
  - api/: CLI reference, environment variables, programmatic API
  - troubleshooting.md: Common issues and solutions
  - performance.md: Latency, throughput, and optimization
- All documentation fully cross-referenced with internal links
- Covers project overview, architecture, components, APIs, and support

See individual files for complete documentation.
OpenQuery Documentation
2026-03-19 10:01:58 +01:00
parent b28d8998f7
commit 65ca2401ae
16 changed files with 7073 additions and 0 deletions

docs/configuration.md

@@ -0,0 +1,356 @@
# Configuration
Complete guide to configuring OpenQuery for your environment.
## 📋 Table of Contents
1. [Configuration Methods](#configuration-methods)
2. [Configuration File](#configuration-file)
3. [Environment Variables](#environment-variables)
4. [Command-Line Options](#command-line-options)
5. [Configuration Priority](#configuration-priority)
6. [Recommended Settings](#recommended-settings)
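7. [Advanced Configuration](#advanced-configuration)
8. [Managing Multiple Configurations](#managing-multiple-configurations)
9. [Configuration Validation](#configuration-validation)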
## Configuration Methods
OpenQuery reads settings from three sources and merges them according to a [fixed priority order](#configuration-priority):
| Method | Persistence | Use Case |
|--------|-------------|----------|
| Configuration File | Permanent | Default values you use daily |
| Environment Variables | Session/Shell | CI/CD, scripting, temporary overrides |
| Command-Line Options | Per-execution | One-off customizations |
## Configuration File
### Location
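OpenQuery stores its config file in the per-user configuration directory. On Linux/macOS this follows the [XDG Base Directory](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html) specification; on Windows it uses `%APPDATA%`: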
- **Linux/macOS**: `~/.config/openquery/config`
- **Windows**: `%APPDATA%\openquery\config` (e.g., `C:\Users\<user>\AppData\Roaming\openquery\config`)
### Format
Simple `key=value` pairs, one per line:
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=3
DefaultChunks=3
DefaultResults=5
```
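To inspect a single setting from the shell, the format is simple enough to parse by hand. The following is a minimal sketch, assuming the file contains only plain `key=value` lines with no quoting or sections; `read_config_value` is a hypothetical helper and not part of OpenQuery, whose own parser may treat edge cases differently.

```bash
# Hypothetical helper: print the value stored for KEY in a key=value file.
# Everything after the first '=' on the matching line is treated as the value.
read_config_value() {
  local file="$1" key="$2" line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "$key="*) printf '%s\n' "${line#*=}"; return 0 ;;
    esac
  done < "$file"
  return 1  # key not present in the file
}

# Usage:
#   read_config_value ~/.config/openquery/config Model
```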
### Schema
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ApiKey` | string | "" | OpenRouter API authentication key |
| `Model` | string | `qwen/qwen3.5-flash-02-23` | Default LLM model to use |
| `DefaultQueries` | int | 3 | Number of search queries to generate |
| `DefaultChunks` | int | 3 | Number of top context chunks to include |
| `DefaultResults` | int | 5 | Number of search results per query |
### Example Configurations
**Minimal** (just API key):
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Optimized for Research**:
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=google/gemini-3-flash-preview
DefaultQueries=5
DefaultChunks=4
DefaultResults=10
```
**Cost-Conscious**:
```ini
ApiKey=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=2
DefaultChunks=2
DefaultResults=3
```
## Environment Variables
Environment variables override the configuration file and can be set temporarily or permanently in your shell profile.
### Available Variables
| Variable | Purpose | Required | Example |
|----------|---------|----------|---------|
| `OPENROUTER_API_KEY` | OpenRouter API key | **Yes** (unless in config file) | `export OPENROUTER_API_KEY="sk-or-..."` |
| `OPENROUTER_MODEL` | Override default LLM model | No | `export OPENROUTER_MODEL="deepseek/deepseek-v3.2"` |
| `SEARXNG_URL` | URL of SearxNG instance | No (default: `http://localhost:8002`) | `export SEARXNG_URL="https://searx.example.com"` |
### Setting Environment Variables
#### Temporary (Current Session)
```bash
# Linux/macOS
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"
# Windows PowerShell
$env:OPENROUTER_API_KEY="sk-or-..."
$env:SEARXNG_URL="http://localhost:8002"
```
#### Permanent (Shell Profile)
**bash** (`~/.bashrc` or `~/.bash_profile`):
```bash
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"
```
**zsh** (`~/.zshrc`):
```zsh
export OPENROUTER_API_KEY="sk-or-..."
export SEARXNG_URL="http://localhost:8002"
```
**fish** (`~/.config/fish/config.fish`):
```fish
set -x OPENROUTER_API_KEY "sk-or-..."
set -x SEARXNG_URL "http://localhost:8002"
```
**Windows** (run once in PowerShell; the values persist for the current user and apply to newly opened terminals):
```powershell
[Environment]::SetEnvironmentVariable("OPENROUTER_API_KEY", "sk-or-...", "User")
[Environment]::SetEnvironmentVariable("SEARXNG_URL", "http://localhost:8002", "User")
```
After editing profile files, restart your terminal or run `source ~/.bashrc` (or equivalent).
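To confirm that a new shell actually exports the variables, a small check like the one below can help. It is a sketch only: `check_openquery_env` is a hypothetical helper, and it reports whether each variable is set without printing its value, so the API key never reaches the terminal or logs.

```bash
# Hypothetical helper: report which of the variables OpenQuery reads are
# exported in the current environment, without echoing their values.
check_openquery_env() {
  local name
  for name in OPENROUTER_API_KEY OPENROUTER_MODEL SEARXNG_URL; do
    # printenv only sees exported variables, which is what a child
    # process such as openquery would inherit.
    if [ -n "$(printenv "$name")" ]; then
      echo "$name: set"
    else
      echo "$name: not set"
    fi
  done
}

check_openquery_env
```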
### Security Note
Never commit your API key to version control. Supply it through an environment variable or the config file. The config file lives outside the project directory (under `~/.config/`), so the project's `.gitignore`, which covers common build directories, does not need to list it. If you copy the config into a repository, add that copy to `.gitignore`.
## Command-Line Options
Options passed directly to the `openquery` command override both config file and environment variables for that specific execution.
### Main Command Options
```bash
openquery [OPTIONS] <question>
```
| Option | Aliases | Type | Default Source | Description |
|--------|---------|------|----------------|-------------|
| `--chunks` | `-c` | int | Config `DefaultChunks` | Number of top context chunks |
| `--results` | `-r` | int | Config `DefaultResults` | Search results per query |
| `--queries` | `-q` | int | Config `DefaultQueries` | Number of search queries |
| `--short` | `-s` | bool | false | Request concise answer |
| `--long` | `-l` | bool | false | Request detailed answer |
| `--verbose` | `-v` | bool | false | Show detailed progress |
### Configure Command Options
```bash
openquery configure [OPTIONS]
```
| Option | Type | Description |
|--------|------|-------------|
| `--interactive` / `-i` | bool | Launch interactive configuration wizard |
| `--key` | string | Set API key |
| `--model` | string | Set default model |
| `--queries` | int? | Set default queries |
| `--chunks` | int? | Set default chunks |
| `--results` | int? | Set default results |
## Configuration Priority
When OpenQuery needs a value, it checks sources in this order (highest to lowest priority):
1. **Command-line option** (if provided)
2. **Environment variable** (if set)
3. **Configuration file** (if key exists)
4. **Hard-coded default** (if all above missing)
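The lookup order above amounts to "take the first source that provides a value". The sketch below illustrates that rule in shell; it is not OpenQuery's implementation. `first_set`, `cli_model`, and `file_model` are hypothetical names, and an empty string is assumed to mean "not provided".

```bash
# Hypothetical helper: print the first non-empty argument.
# Call it with candidates ordered from highest to lowest priority.
first_set() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1  # no source provided a value
}

# Resolve the model: CLI value, then environment, then config file, then default.
cli_model=""                                 # nothing passed on the command line
file_model="google/gemini-3-flash-preview"   # value of Model= in the config file
model=$(first_set "$cli_model" "${OPENROUTER_MODEL:-}" "$file_model" "qwen/qwen3.5-flash-02-23")
echo "model=$model"
```

With `OPENROUTER_MODEL` unset, the config-file value is chosen; exporting the variable makes it take precedence over the file.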
### Examples
**Example 1**: CLI and environment override the config file
```bash
# config file: DefaultQueries=5
export OPENROUTER_MODEL="deepseek/deepseek-v3.2"
openquery --queries 2 "question"
# Uses: queries=2 (CLI, overrides the config's 5), model=deepseek (env), chunks=3 (default)
```
**Example 2**: CLI overrides everything
```bash
export OPENROUTER_MODEL="qwen/qwen3.5-flash-02-23"
openquery --model "google/gemini-3-flash-preview" --chunks 5 "question"
# Uses: model=google (CLI), chunks=5 (CLI), queries=3 (default)
```
**Example 3**: All sources combined
```bash
# config: DefaultChunks=4
# env: OPENROUTER_MODEL="moonshotai/kimi-k2.5", SEARXNG_URL="http://custom:8002"
openquery --queries 6 --short "question"
# Uses: queries=6 (CLI), chunks=4 (config), results=5 (default),
#       model=kimi-k2.5 (env), searxng=custom (env), short=true (CLI)
```
## Recommended Settings
### For Quick Questions (Facts, Definitions)
```bash
openquery -q 2 -r 3 -c 2 "What is the capital of France?"
```
- Few queries (2) for straightforward facts
- Few results (3) to minimize processing
- Few chunks (2) for focused answer
### For Research (Complex Topics)
```bash
openquery -q 5 -r 10 -c 4 -l "Explain the causes of the French Revolution"
```
- More queries (5) for diverse perspectives
- More results (10) for comprehensive coverage
- More chunks (4) for rich context
- Long format for depth
### For Exploration (Broad Topics)
```bash
openquery -q 8 -r 15 -c 5 "What are the latest developments in AI?"
```
- Many queries (8) to explore different angles
- Many results (15) for breadth
- More chunks (5) for extensive context
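If you switch between these profiles often, the option sets can be kept in one place. The helper below is a sketch; `preset_flags` and the preset names `quick`, `research`, and `explore` are hypothetical and simply mirror the three recommendations above.

```bash
# Hypothetical helper: print the recommended options for a named preset.
preset_flags() {
  case "$1" in
    quick)    echo "-q 2 -r 3 -c 2" ;;
    research) echo "-q 5 -r 10 -c 4 -l" ;;
    explore)  echo "-q 8 -r 15 -c 5" ;;
    *)        echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Usage (the unquoted substitution is intentional so the flags split into words):
#   openquery $(preset_flags research) "Explain the causes of the French Revolution"
```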
### Cost Optimization
```bash
openquery configure --model "qwen/qwen3.5-flash-02-23"
# Keep defaults: -q 3 -r 5 -c 3
```
- Qwen Flash is very cost-effective
- Default parameters provide good balance
### Performance Optimization
```bash
# Adjust ParallelProcessingOptions in SearchTool.cs if needed
# Default: MaxConcurrentArticleFetches=10, MaxConcurrentEmbeddingRequests=4
```
- Reduce these values if you see rate limits or memory pressure
- Increase them if you have fast network/API and want more speed
## Advanced Configuration
### Changing Concurrency Limits
Concurrency limits are currently hardcoded in `SearchTool.cs`. Changing them means editing the defaults below and rebuilding:
```csharp
public class ParallelProcessingOptions
{
    public int MaxConcurrentArticleFetches { get; set; } = 10;    // ← Change this
    public int MaxConcurrentEmbeddingRequests { get; set; } = 4;  // ← Change this
    public int EmbeddingBatchSize { get; set; } = 300;            // ← Change this
}
```
To make these configurable, you could:
1. Add fields to `AppConfig`
2. Read from config file
3. Pass through to `SearchTool` constructor
### Custom Embedding Model
The embedding model is hardcoded to `openai/text-embedding-3-small`. To change it, edit the default value of the `embeddingModel` parameter in the `EmbeddingService` constructor:
```csharp
public EmbeddingService(OpenRouterClient client, string embeddingModel = "your-model")
```
Or make it configurable via CLI/config (future enhancement).
### Changing Chunk Size
Chunk size (500 chars) is defined in `ChunkingService.cs`:
```csharp
private const int MAX_CHUNK_SIZE = 500;
```
Modify this constant to change how articles are split. Larger chunks:
- ✅ More context per chunk
- ❌ Fewer chunks for same article
- ❌ Higher token usage in final answer
Smaller chunks:
- ✅ More granular matching
- ❌ May lose context across chunk boundaries
### Using a Custom SearxNG Instance
Some SearxNG deployments may require HTTPS, authentication, or custom paths:
```bash
# With authentication (if supported)
export SEARXNG_URL="https://user:pass@searx.example.com:8080"
# With custom path
export SEARXNG_URL="https://searx.example.com/custom-path"
```
Note: Most SearxNG instances, including a default local install, don't require authentication.
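Before pointing OpenQuery at a custom instance, it can help to check that the URL answers search requests. The sketch below builds a probe URL against SearxNG's `/search` endpoint with `format=json`. Whether OpenQuery itself uses the JSON format is an assumption here, and `searxng_probe_url` is a hypothetical helper.

```bash
# Hypothetical helper: build a SearxNG search URL for a quick reachability test.
# A single trailing slash on the base URL is tolerated.
searxng_probe_url() {
  local base="${1%/}"
  printf '%s/search?q=test&format=json\n' "$base"
}

# Usage: print only the HTTP status code returned by the instance.
#   curl -s -o /dev/null -w '%{http_code}\n' "$(searxng_probe_url "$SEARXNG_URL")"
```

A `200` indicates the instance is reachable and serving JSON. A `403` usually means the JSON output format is not enabled in the instance's `settings.yml`.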
### OpenRouter Settings
OpenRouter supports additional parameters (not yet exposed in OpenQuery):
- `temperature` - Randomness (0-2, default ~1)
- `max_tokens` - Response length limit
- `top_p` - Nucleus sampling
- `frequency_penalty` / `presence_penalty` - Penalties that discourage repeated tokens
These could be added to `ChatCompletionRequest` in future versions.
## Managing Multiple Configurations
OpenQuery currently reads only the default config path (`~/.config/openquery/config` on Linux/macOS), so per-project config files are not supported out of the box. Supporting them would require a code change, for example reading the path from an `OPENQUERY_CONFIG` environment variable. With such a change in place, usage could look like this:
```bash
# Create project-specific config
cp ~/.config/openquery/config ~/myproject/openquery.config
# Hypothetical: OPENQUERY_CONFIG is not read by the current version
OPENQUERY_CONFIG=~/myproject/openquery.config openquery "question"
```
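Until multi-config support exists, a shell wrapper can approximate per-project settings using only the documented environment variables and command-line options. The sketch below assumes a project file in the same `key=value` format; `openquery_with` is a hypothetical name. Because the values are passed as CLI options, they take priority over the global config file.

```bash
# Hypothetical wrapper: run openquery with settings taken from a per-project
# key=value file, mapped onto documented env vars and CLI options.
openquery_with() {
  local file="$1" line
  shift
  (
    # The subshell keeps the exported variables from leaking into the caller.
    while IFS= read -r line || [ -n "$line" ]; do
      case "$line" in
        ApiKey=*)         export OPENROUTER_API_KEY="${line#*=}" ;;
        Model=*)          export OPENROUTER_MODEL="${line#*=}" ;;
        DefaultQueries=*) set -- --queries "${line#*=}" "$@" ;;
        DefaultChunks=*)  set -- --chunks "${line#*=}" "$@" ;;
        DefaultResults=*) set -- --results "${line#*=}" "$@" ;;
      esac
    done < "$file"
    openquery "$@"
  )
}

# Usage:
#   openquery_with ~/myproject/openquery.config "question"
```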
## Configuration Validation
OpenQuery doesn't strictly validate config values. Invalid settings may cause runtime errors:
- `DefaultQueries <= 0` → May cause exceptions or zero queries
- `DefaultChunks <= 0` → May return no context
- `DefaultResults <= 0` → No search results
Validate manually:
```bash
# Inspect the config file contents
cat ~/.config/openquery/config
# Test with verbose mode
openquery -v "test"
```
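Since OpenQuery does not reject these values itself, a pre-flight check can catch them before a run. The following is a sketch under the assumption that the three `Default*` keys must be positive integers; `validate_openquery_config` is a hypothetical helper, not a built-in command.

```bash
# Hypothetical check: report Default* keys whose value is not a positive integer.
# Returns non-zero if any problem is found.
validate_openquery_config() {
  local file="$1" line key value status=0
  while IFS= read -r line || [ -n "$line" ]; do
    key="${line%%=*}"
    value="${line#*=}"
    case "$key" in
      DefaultQueries|DefaultChunks|DefaultResults)
        case "$value" in
          ''|*[!0-9]*)
            echo "$key: '$value' is not a positive integer" >&2
            status=1 ;;
          *)
            [ "$value" -gt 0 ] || { echo "$key: must be greater than 0" >&2; status=1; } ;;
        esac ;;
    esac
  done < "$file"
  return "$status"
}

# Usage:
#   validate_openquery_config ~/.config/openquery/config && echo "config looks sane"
```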
---
## Next Steps
- [Usage Guide](usage.md) - Learn how to use the CLI
- [Architecture](architecture.md) - Understand the system design
- [Troubleshooting](troubleshooting.md) - Fix common issues