# OpenQuery Documentation

Welcome to the comprehensive documentation for OpenQuery, the AI-powered search and answer system.
## 📚 Documentation Overview

### Getting Started

- [Installation Guide](installation.md) - Build, install, and setup instructions
- [Configuration](configuration.md) - Configure API keys, models, and settings
- [Usage Guide](usage.md) - Complete CLI reference with examples

### Deep Dive

- [Architecture](architecture.md) - System design, patterns, and data flow
- [Components](components/) - Detailed component documentation
- [API Reference](api/) - Complete command-line interface reference

### Support

- [Troubleshooting](troubleshooting.md) - Common issues and solutions
- [Performance](performance.md) - Performance characteristics and optimization
## Project Overview

OpenQuery is a CLI tool that combines large language models with web search to provide accurate, well-sourced answers to complex questions.
### What It Does
- Takes a natural language question as input
- Generates multiple diverse search queries
- Searches the web via SearxNG
- Extracts and processes article content
- Uses semantic similarity to rank relevance
- Synthesizes a comprehensive AI-generated answer with citations
### Why Use OpenQuery?

- Accuracy: Multiple search queries reduce bias and increase coverage
- Transparency: Sources are cited in the final answer
- Speed: Parallel processing minimizes latency
- Control: Fine-tune every aspect, from query count to chunk selection
- Privacy: SearxNG aggregates search results without tracking you
## Key Concepts

### Search Queries
Instead of using your exact question, OpenQuery generates multiple optimized search queries (default: 3). For example, "What is quantum entanglement?" might become:
- "quantum entanglement definition"
- "how quantum entanglement works"
- "quantum entanglement experiments"
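Conceptually, this step is just a prompt plus a small parser. The sketch below is in Python for brevity (the actual implementation is C#/.NET), and `build_query_prompt` and `parse_queries` are hypothetical names, not OpenQuery's real API:

```python
def build_query_prompt(question: str, n: int = 3) -> str:
    """Ask the LLM for n diverse search queries, one per line."""
    return (
        f"Generate {n} diverse web search queries for the question below, "
        f"one per line, with no numbering.\n\nQuestion: {question}"
    )

def parse_queries(llm_response: str, n: int = 3) -> list[str]:
    """Keep the first n non-empty lines of the model's reply."""
    lines = [line.strip() for line in llm_response.splitlines() if line.strip()]
    return lines[:n]
```

Parsing defensively (stripping blanks, capping at `n`) matters because LLM output formatting is not guaranteed.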
### Content Chunks
Long articles are split into ~500-character chunks. Each chunk is:
- Stored with its source URL and title
- Converted to a vector embedding (1536 dimensions)
- Scored against your query embedding
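A minimal sketch of the chunking step, in Python with invented names (`Chunk`, `chunk_article`) rather than the real C# types:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    url: str    # source URL the chunk came from
    title: str  # title of the source article

def chunk_article(text: str, url: str, title: str, size: int = 500) -> list[Chunk]:
    """Split article text into ~size-character chunks, each tagged with its source."""
    return [Chunk(text[i:i + size], url, title) for i in range(0, len(text), size)]
```

Keeping the URL and title on every chunk is what makes per-source citations possible later, after ranking has discarded most of the article.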
### Semantic Ranking
Using cosine similarity between embeddings, OpenQuery ranks chunks by relevance and selects the top N (default: 3) for the final context.
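The ranking step reduces to cosine similarity plus a sort. A plain-Python sketch of the same math (the production code uses System.Numerics.Tensors for this):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_chunks(query_vec: list[float],
               chunk_vecs: list[list[float]],
               chunks: list[str],
               n: int = 3) -> list[str]:
    """Score each chunk against the query embedding and keep the top n."""
    scored = sorted(zip(chunks, (cosine_similarity(query_vec, v) for v in chunk_vecs)),
                    key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in scored[:n]]
```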
### Streaming Answer
The LLM receives your question plus the top chunks as context and streams the answer in real-time, citing sources like [Source 1].
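The final prompt can be assembled along these lines; this Python sketch uses an invented `build_prompt` helper and illustrative wording, not the exact prompt OpenQuery sends:

```python
def build_prompt(question: str, ranked_chunks: list[tuple[str, str]]) -> str:
    """ranked_chunks: (text, url) pairs, already sorted by relevance."""
    context = "\n\n".join(
        f"[Source {i}] ({url})\n{text}"
        for i, (text, url) in enumerate(ranked_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources like [Source 1].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the sources in the prompt is what lets the model emit stable `[Source N]` citations that map back to URLs.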
## Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Runtime | .NET 10.0 AOT | Native performance, minimal footprint |
| LLM | OpenRouter API | Chat completions and embeddings |
| Search | SearxNG | Metasearch engine |
| Content Extraction | SmartReader | Article text extraction |
| Vector Math | System.Numerics.Tensors | High-performance cosine similarity |
| Resilience | Polly | Retry and circuit breaker policies |
| CLI | System.CommandLine | Command parsing and help |
| JSON | System.Text.Json (source-gen) | Fast serialization |
## System Workflow

```
OpenQuery Workflow
==================

1. User Query: "What is quantum entanglement?"

2. Query Generation (Optional)
   LLM generates: ["quantum entanglement physics",
                   "quantum entanglement definition",
                   "how does quantum entanglement work"]

3. Parallel Searches
   Query 1 ──→ SearxNG ──→ Results ─┐
   Query 2 ──→ SearxNG ──→ Results ─┼─→ Results (combined)
   Query 3 ──→ SearxNG ──→ Results ─┘

4. Parallel Article Fetching
   URL 1 ──→ Article ──→ Chunks
   URL 2 ──→ Article ──→ Chunks
   ... (concurrent, max 10 at a time)

5. Parallel Embeddings
   Chunk Batch 1 ──→ Embedding API ──→ Vectors
   Chunk Batch 2 ──→ Embedding API ──→ Vectors
   (batches of 300, up to 4 concurrent)

6. Semantic Ranking
   Query Embedding + Chunk Embeddings → Cosine Similarity
   → Score → Sort Descending → Top 3 Chunks

7. Final Answer Generation
   System:   "Answer based on this context:"
   Context:  [Top 3 chunks with sources]
   Question: "What is quantum entanglement?"
        ↓
   LLM streams the answer in real time:
   "Quantum entanglement is..." with citations like [Source 1]
```
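The "max 10 at a time" bound in the fetching step is a standard semaphore pattern. Sketched here in Python with asyncio (the .NET implementation uses its own concurrency primitives; `fetch_all` and `fetch_one` are hypothetical names):

```python
import asyncio

MAX_CONCURRENT_FETCHES = 10  # the "max 10 at a time" bound from the fetching step

async def fetch_all(urls, fetch_one):
    """Run fetch_one for every URL concurrently, never exceeding the bound."""
    sem = asyncio.Semaphore(MAX_CONCURRENT_FETCHES)

    async def bounded(url):
        async with sem:  # at most MAX_CONCURRENT_FETCHES tasks pass at once
            return await fetch_one(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(u) for u in urls))
```

The same shape applies to the embedding step, just with a bound of 4 concurrent batches instead of 10 fetches.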
## Next Steps

For detailed technical information, continue to the [architecture guide](architecture.md).

Need help? Check the [Troubleshooting](troubleshooting.md) guide.