OpenQuery/docs/components/search-tool.md

SearchTool Component

Deep dive into SearchTool - the core pipeline orchestrator that implements the 4-phase search-retrieve-rank workflow.

Overview

SearchTool is the workhorse of OpenQuery. It takes search queries, fetches articles, generates embeddings, ranks chunks by relevance, and returns formatted context for the final AI answer.

Location

Tools/SearchTool.cs

Class Definition

public class SearchTool
{
    private readonly SearxngClient _searxngClient;
    private readonly EmbeddingService _embeddingService;
    private readonly ParallelProcessingOptions _options;
    
    public static string Name => "search";
    public static string Description => "Search the web for information on a topic";
    
    public SearchTool(
        SearxngClient searxngClient,
        EmbeddingService embeddingService);
    
    public Task<string> ExecuteAsync(
        string originalQuery,
        List<string> generatedQueries,
        int maxResults,
        int topChunksLimit,
        Action<string>? onProgress = null,
        bool verbose = true);
}

Dependencies:

  • SearxngClient - for web searches
  • EmbeddingService - for vector generation
  • ParallelProcessingOptions - concurrency settings (hardcoded new instance)

Static Properties:

  • Name - tool identifier (currently "search")
  • Description - tool description

ExecuteAsync Method

Signature:

public async Task<string> ExecuteAsync(
    string originalQuery,        // User's original question
    List<string> generatedQueries, // Expanded search queries
    int maxResults,              // Results per query
    int topChunksLimit,          // Top N chunks to return
    Action<string>? onProgress,  // Progress callback
    bool verbose)                // Verbose mode flag

Returns: Task<string> - formatted context with source citations

Contract:

  • Never returns null (returns "No search results found." on zero results)
  • Progress callback may be invoked frequently (many phases)
  • verbose passed to sub-components for their own logging
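For orientation, a typical invocation might look like this (construction of the dependencies is elided, and the query strings are made up for illustration):

```csharp
// Illustrative usage; constructing SearxngClient and EmbeddingService is elided.
var tool = new SearchTool(searxngClient, embeddingService);

var context = await tool.ExecuteAsync(
    originalQuery: "What is retrieval-augmented generation?",
    generatedQueries: new List<string>
    {
        "retrieval augmented generation explained",
        "RAG pipeline architecture"
    },
    maxResults: 5,        // results per query
    topChunksLimit: 10,   // top N chunks in the final context
    onProgress: msg => Console.WriteLine(msg),
    verbose: true);
```

The returned string is the formatted, citation-annotated context described under "Context Formatting" below.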

The 4-Phase Pipeline

ExecuteAsync()
  │
  ├─ Phase 1: ExecuteParallelSearchesAsync
  │   Input: generatedQueries × maxResults
  │   Output: List<SearxngResult> (deduplicated)
  │
  ├─ Phase 2: ExecuteParallelArticleFetchingAsync
  │   Input: List<SearxngResult>
  │   Output: List<Chunk> (with content, url, title)
  │
  ├─ Phase 3: ExecuteParallelEmbeddingsAsync
  │   Input: originalQuery + List<Chunk>
  │   Output: (queryEmbedding, chunkEmbeddings)
  │   (also sets Chunk.Embedding for valid chunks)
  │
  ├─ Phase 4: RankAndSelectTopChunks
  │   Input: List<Chunk> + queryEmbedding + chunkEmbeddings
  │   Output: List<Chunk> topChunks (with Score set)
  │
  └─ Format Context → return string

Phase 1: ExecuteParallelSearchesAsync

Purpose: Execute all search queries in parallel, collect and deduplicate results.

Implementation:

var allResults = new ConcurrentBag<SearxngResult>();

var searchTasks = generatedQueries.Select(async query =>
{
    onProgress?.Invoke($"[Searching web for '{query}'...]");
    try
    {
        var results = await _searxngClient.SearchAsync(query, maxResults);
        foreach (var result in results)
        {
            allResults.Add(result);
        }
    }
    catch (Exception ex)
    {
        if (verbose)
            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
    }
});

await Task.WhenAll(searchTasks);

var uniqueResults = allResults.DistinctBy(r => r.Url).ToList();
return uniqueResults;

Details:

  • ConcurrentBag<SearxngResult> collects results thread-safely
  • Task.WhenAll - unbounded parallelism (one task per entry in generatedQueries)
  • Each task: calls _searxngClient.SearchAsync(query, maxResults)
  • Errors caught and logged (verbose only); other queries continue
  • DistinctBy(r => r.Url) removes duplicates

Return: List<SearxngResult> (unique URLs only)

Progress: [Searching web for '{query}'...]

Potential Issues:

  • Could overwhelm local SearxNG if generatedQueries is large (100+)
  • SearxNG itself may have its own rate limiting

Future Enhancement:

  • Add semaphore to limit search concurrency
  • Add timeout per search task
  • Cache search results (same query across runs)
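The first enhancement could be sketched as follows, mirroring the semaphore pattern Phase 2 already uses (the limit of 5 is an assumption, not an existing option):

```csharp
var searchSemaphore = new SemaphoreSlim(5); // hypothetical cap on concurrent searches

var searchTasks = generatedQueries.Select(async query =>
{
    await searchSemaphore.WaitAsync();
    try
    {
        onProgress?.Invoke($"[Searching web for '{query}'...]");
        var results = await _searxngClient.SearchAsync(query, maxResults);
        foreach (var result in results)
            allResults.Add(result);
    }
    catch (Exception ex)
    {
        if (verbose)
            Console.WriteLine($"Warning: Search failed for query '{query}': {ex.Message}");
    }
    finally
    {
        searchSemaphore.Release();
    }
});

await Task.WhenAll(searchTasks);
```

This keeps at most 5 requests in flight against the local SearxNG instance regardless of how many queries were generated.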

Phase 2: ExecuteParallelArticleFetchingAsync

Purpose: Fetch each search result URL, extract article content, split into chunks.

Implementation:

var chunks = new ConcurrentBag<Chunk>();
var completedFetches = 0;
var totalFetches = searchResults.Count;
var semaphore = new SemaphoreSlim(_options.MaxConcurrentArticleFetches); // 10

var fetchTasks = searchResults.Select(async result =>
{
    await semaphore.WaitAsync();
    try
    {
        var current = Interlocked.Increment(ref completedFetches);
        var uri = new Uri(result.Url);
        var domain = uri.Host;
        onProgress?.Invoke($"[Fetching article {current}/{totalFetches}: {domain}]");
        
        try
        {
            var article = await ArticleService.FetchArticleAsync(result.Url);
            if (!article.IsReadable || string.IsNullOrEmpty(article.TextContent))
                return;
            
            var textChunks = ChunkingService.ChunkText(article.TextContent);
            foreach (var chunkText in textChunks)
            {
                chunks.Add(new Chunk(chunkText, result.Url, article.Title));
            }
        }
        catch (Exception ex)
        {
            if (verbose)
                Console.WriteLine($"Warning: Failed to fetch article {result.Url}: {ex.Message}");
        }
    }
    finally
    {
        semaphore.Release();
    }
});

await Task.WhenAll(fetchTasks);
return chunks.ToList();

Details:

  • SemaphoreSlim limits concurrency to MaxConcurrentArticleFetches (10)
  • Interlocked.Increment for thread-safe progress counting
  • Progress: [Fetching article X/Y: domain] (extracts host from URL)
  • ArticleService.FetchArticleAsync uses SmartReader
  • Article must be IsReadable and have TextContent
  • ChunkingService.ChunkText splits into ~500-char pieces
  • Each chunk becomes a Chunk(content, url, article.Title)
  • Errors logged (verbose only); failed URLs yield no chunks

Return: List<Chunk> (potentially many per article)

Chunk Count Estimate:

  • 15 articles × average 3000 chars/article = 45,000 chars
  • With 500-char chunks ≈ 90 chunks
  • With natural breaks → maybe 70-80 chunks

Potential Issues:

  • Some sites block SmartReader (JS-heavy, paywalls)
  • Slow article fetches may cause long tail latency
  • Large articles create many chunks → memory + embedding cost

Future Enhancements:

  • Add per-URL timeout
  • Filter chunks by length threshold (skip tiny chunks)
  • Deduplicate chunks across articles (same content on different sites)
  • Cache article fetches by URL
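The per-URL timeout could look roughly like this, assuming FetchArticleAsync gains a CancellationToken overload (it may not have one today):

```csharp
// Hypothetical: bound each fetch with a 15-second budget.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(15));
try
{
    var article = await ArticleService.FetchArticleAsync(result.Url, cts.Token);
    // ... proceed with readability check and chunking as in Phase 2 ...
}
catch (OperationCanceledException)
{
    if (verbose)
        Console.WriteLine($"Warning: Timed out fetching {result.Url}");
}
```

This would cap the long-tail latency noted above without affecting the other fetch tasks.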

Phase 3: ExecuteParallelEmbeddingsAsync

Purpose: Generate embeddings for the original query and all chunks, with batching, rate limiting, and concurrency control.

Implementation:

onProgress?.Invoke($"[Generating embeddings for {chunks.Count} chunks and query...]");

// Start query embedding (single) and chunk embeddings (batch) concurrently
var queryEmbeddingTask = _embeddingService.GetEmbeddingAsync(originalQuery);

var chunkTexts = chunks.Select(c => c.Content).ToList();
var chunkEmbeddingsTask = _embeddingService.GetEmbeddingsWithRateLimitAsync(
    chunkTexts, onProgress);

await Task.WhenAll(queryEmbeddingTask, chunkEmbeddingsTask);

var queryEmbedding = await queryEmbeddingTask;
var chunkEmbeddings = await chunkEmbeddingsTask;

// Filter out chunks with empty embeddings
var validChunks = new List<Chunk>();
var validEmbeddings = new List<float[]>();

for (var i = 0; i < chunks.Count; i++)
{
    if (chunkEmbeddings[i].Length > 0)
    {
        validChunks.Add(chunks[i]);
        validEmbeddings.Add(chunkEmbeddings[i]);
    }
}

// Update chunks with embeddings
for (var i = 0; i < validChunks.Count; i++)
{
    validChunks[i].Embedding = validEmbeddings[i];
}

return (queryEmbedding, validEmbeddings.ToArray());

Details:

  • Query embedding: Single request for original question (one embedding)
  • Chunk embeddings: Batch processing of all chunk texts
  • Both run concurrently via Task.WhenAll
  • _embeddingService.GetEmbeddingsWithRateLimitAsync uses:
    • Batch size: 300 (default)
    • Max concurrent batches: 4 (default)
    • Polly retry (3 attempts, exponential backoff)
    • RateLimiter (semaphore) for API concurrency
  • Failed batches return empty float[] (length 0)
  • Filters out failed chunks (won't be ranked)
  • validChunks[i].Embedding = validEmbeddings[i] attaches embedding to chunk

Return: (float[] queryEmbedding, float[][] chunkEmbeddings) where:

  • chunkEmbeddings length = validChunks.Count (filtered)
  • Order matches validChunks order (since we filtered parallel arrays)

Progress: Interleaved from embedding service's own progress callbacks (batch X/Y)
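As a rough sketch (not the actual source), the batching described above plausibly has this shape; CallEmbeddingApiAsync is a hypothetical helper standing in for the real API call:

```csharp
var batches = texts.Chunk(300).ToArray();   // EmbeddingBatchSize
var results = new float[batches.Length][][];
var limiter = new SemaphoreSlim(4);         // MaxConcurrentEmbeddingRequests

await Task.WhenAll(batches.Select(async (batch, batchIndex) =>
{
    await limiter.WaitAsync();
    try
    {
        // Each task writes a distinct index, so no lock is needed.
        results[batchIndex] = await CallEmbeddingApiAsync(batch); // hypothetical helper
    }
    catch
    {
        // Failed batch: one empty embedding per input text.
        results[batchIndex] = batch.Select(_ => Array.Empty<float>()).ToArray();
    }
    finally
    {
        limiter.Release();
    }
}));

return results.SelectMany(r => r).ToArray(); // flattened in batch order
```

Because each batch writes its own slot and the final SelectMany flattens in batch order, the output count and order always match the input.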

Potential Issues:

  • results[batchIndex] = ... in GetEmbeddingsWithRateLimitAsync is written from multiple parallel tasks, but each task writes a distinct array index and the writes never overlap, so no synchronization is needed
  • The filtering loop assumes chunkEmbeddings has the same count as chunks; GetEmbeddingsWithRateLimitAsync returns results.SelectMany(r => r).ToArray(), which preserves the input count (failed batches contribute empty arrays per input), so indexing is safe

Memory Consideration:

  • chunkTexts list holds all chunk strings (may be large, but still in memory)
  • chunkEmbeddings holds all float arrays (600KB for 100 chunks)
  • Total: modest (~few MB)

Future Enhancements:

  • Stream embeddings? (No benefit, need all for ranking)
  • Cache embeddings by content hash (cross-run)
  • Support different embedding model per query
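The content-hash cache idea could be keyed like this (the dictionary, key scheme, and in-memory scope are all assumptions; a real implementation would persist across runs):

```csharp
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

// Hypothetical cross-run cache; a real one would be backed by disk storage.
static readonly ConcurrentDictionary<string, float[]> EmbeddingCache = new();

static string CacheKey(string text) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));
```

Identical chunk text (common when multiple queries surface the same article) would then cost only one embedding call.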

Phase 4: RankAndSelectTopChunks

Purpose: Score chunks by semantic relevance to query, sort, and select top N.

Implementation:

var chunksWithEmbeddings = chunks.Where(c => c.Embedding != null).ToList();

foreach (var chunk in chunksWithEmbeddings)
{
    chunk.Score = EmbeddingService.CosineSimilarity(queryEmbedding, chunk.Embedding!);
}

var topChunks = chunksWithEmbeddings
    .OrderByDescending(c => c.Score)
    .Take(topChunksLimit)
    .ToList();

return topChunks;

Details:

  • Filters to chunks that have embeddings (successful phase 3)
  • For each: Score = CosineSimilarity(queryEmbedding, chunkEmbedding)
    • Uses TensorPrimitives.CosineSimilarity (SIMD-accelerated)
    • Returns float typically 0-1 (higher = more relevant)
  • OrderByDescending - highest scores first
  • Take(topChunksLimit) - select top N (from --chunks option)
  • Returns List<Chunk> (now with Score set)

Return: Top N chunks ready for context formatting

Complexity:

  • O(n) for scoring (where n = valid chunks, typically 50-100)
  • O(n log n) for sorting (fast for n=100)
  • Negligible CPU time
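For reference, a plain scalar version of cosine similarity, equivalent in result to the SIMD-accelerated TensorPrimitives.CosineSimilarity call used by EmbeddingService:

```csharp
// Scalar equivalent of TensorPrimitives.CosineSimilarity, for illustration.
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```

Embeddings from most providers are near-normalized, which is why scores typically land in the 0-1 range mentioned above.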

Edge Cases:

  • If topChunksLimit > chunksWithEmbeddings.Count, returns all (no padding)
  • If all embeddings failed, returns empty list
  • topChunksLimit == 0 returns an empty list (Take(0) yields no elements)

Context Formatting (After Phase 4)

Location: In ExecuteAsync, after ranking:

var context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"[Source {i + 1}: {c.Title ?? "Unknown"}]({c.SourceUrl})\n{c.Content}"));

return context;

Format:

[Source 1: Article Title](https://example.com/article)
Chunk content text...

[Source 2: Another Title](https://example.com/another)
Chunk content text...

[Source 3: Third Title](https://example.com/third)
Chunk content text...

Features:

  • Each source numbered 1, 2, 3... (matches order of topChunks = descending relevance)
  • Title or "Unknown" if null
  • Title is markdown link to original URL
  • Chunk content as plain text (may contain its own formatting)
  • Double newline between sources

Rationale:

  • Markdown links allow copy-pasting to browsers
  • Numbers allow LLM to cite [Source 1] in answer
  • Original title helps user recognize source

Potential Issues:

  • The LLM might misinterpret "Source 1" as a literal citation format it must reproduce
  • If chunks contain markdown, may conflict (no escaping)
  • Some titles may have markdown special chars (unlikely but possible)

Alternative: Could use XML-style tags or more robust citation format.
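A sketch of that alternative (not implemented; note that attribute values are still unescaped here):

```csharp
// Hypothetical XML-style context format, sidestepping markdown conflicts.
var context = string.Join("\n\n", topChunks.Select((c, i) =>
    $"<source id=\"{i + 1}\" title=\"{c.Title ?? "Unknown"}\" url=\"{c.SourceUrl}\">\n" +
    $"{c.Content}\n</source>"));
```

XML-style delimiters are harder for the LLM to confuse with chunk content, at the cost of slightly more verbose context.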

Error Handling & Edge Cases

Empty Results Handling

At end of ExecuteAsync:

if (searchResults.Count == 0)
    return "No search results found.";

if (chunks.Count == 0)
    return "Found search results but could not extract readable content.";

These messages appear in final answer (LLM will respond to these contexts).

Partial Failures

  • Some search queries fail → proceed with others
  • Some articles fail to fetch → continue
  • Some embedding batches fail → those chunks filtered out
  • Ranking proceeds with whatever valid embeddings exist

Verbose vs Compact Progress

verbose parameter affects what's passed to phases:

  • Article fetching: errors only shown if verbose
  • Embeddings: always shows batch progress via onProgress (from EmbeddingService)
  • Searches: errors logged to Console only when verbose (not routed through the callback)

Progress Callback Pattern

onProgress is invoked at major milestones:

  • Searching: [Searching web for '{query}'...]
  • Article fetch: [Fetching article X/Y: domain]
  • Embeddings: [Generating embeddings: batch X/Y]
  • Final: [Found top X most relevant chunks overall. Generating answer...]

Each phase may invoke many times (e.g., embedding batches). StatusReporter handles these appropriately.

Performance Characteristics

Time Estimate per Phase (for typical 3 queries, 5 results each, ~15 articles):

| Phase | Time | Dominated By |
|---|---|---|
| Searches | 3-8s | Network latency to SearxNG |
| Article Fetching | 5-15s | Network + SmartReader CPU |
| Embeddings | 2-4s | OpenRouter API latency (4 concurrent batches) |
| Ranking | <0.1s | CPU (O(n log n) sort, n~100) |
| Total Pipeline | 10-30s | Articles + Searches |

Concurrency Limits Effect

Article Fetching (MaxConcurrentArticleFetches = 10):

  • 15 articles → 2 waves (10 then 5)
  • If each takes 2s → ~4s total (vs 30s sequential)

Embedding Batching (MaxConcurrentEmbeddingRequests = 4, EmbeddingBatchSize = 300):

  • 80 chunks → a single batch (well under the 300 batch size); no batch-level parallelism needed
  • More than 300 chunks → multiple batches, up to 4 running concurrently
  • In a typical run, chunk embedding is therefore a single API call

Memory Usage

  • searchResults (15 items) → ~30KB
  • chunks (80 items × 500 chars) → ~40KB text + embeddings ~490KB (80 × 1536 floats × 4 bytes)
  • Total ≈ 550KB excluding temporary HTTP buffers

Design Decisions

Why Use ConcurrentBag for Results/Chunks?

A thread-safe collection lets parallel tasks add items without explicit locks. ConcurrentBag does not preserve insertion order, so the later DistinctBy keeps the first occurrence in a nondeterministic enumeration order. That is acceptable here because ordering is irrelevant: relevance is decided semantically in Phase 4. If order mattered, a ConcurrentQueue or an explicit sort by source would be needed.

Why Not Use Parallel.ForEach for Article Fetching?

We use Task.WhenAll over Select with a semaphore. Parallel.ForEachAsync would also work but requires .NET 6+, and the Task.WhenAll pattern matches the other phases. The semaphore gives explicit concurrency control.

Why Separate Query Embedding from Chunk Embeddings?

GetEmbeddingAsync is called directly (not batched) because there's only one query. Could be batched with chunks but:

  • Query is small (single string)
  • Batch API has overhead (request structure)
  • Separate call allows independent completion (no need to wait for chunks to start query embedding)

Why Two Different Embedding Methods?

EmbeddingService has:

  • GetEmbeddingsWithRateLimitAsync (used in SearchTool)
  • GetEmbeddingsAsync (similar but different implementation)

Probably legacy/refactor artifact. Could consolidate.

Why Not Deduplicate URLs Earlier?

Deduplication happens after search aggregation. Could also deduplicate within each search result (SearxNG might already dedupe across engines). But global dedupe is necessary.

Why Not Early Filtering (e.g., by domain, length)?

Possibly could improve quality:

  • Filter by domain reputation
  • Filter articles too short (<200 chars) or too long (>50KB)
  • Not implemented (keep simple)
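The length filter, if added, could be a one-line guard in Phase 2 after the fetch succeeds (the thresholds below are illustrative, not existing config):

```csharp
// Hypothetical guard after FetchArticleAsync, before chunking:
if (article.TextContent.Length < 200 || article.TextContent.Length > 50_000)
    return; // skip boilerplate pages and enormous documents
```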

Testing Considerations

Unit Testability: SearchTool is fairly testable with mocks:

  • Mock SearxngClient to return predetermined results
  • ArticleService and ChunkingService are called statically, so they are harder to mock (wrapper interfaces would help)
  • Verify progress callback invocations
  • Verify final context format

Integration Testing:

  • End-to-end with real/mocked external services
  • Need test SearxNG instance and test OpenRouter key (or mock responses)

Performance Testing:

  • Benchmark with different concurrency settings
  • Profile memory for large result sets (1000+ articles)
  • Measure embedding API latency impact

Known Issues

Potential Race Condition in GetEmbeddingsWithRateLimitAsync

results[batchIndex] = batchResults;

This line is executed concurrently by multiple tasks, but each task writes to a distinct array index and the writes never overlap, so no synchronization is required. This is not a race.

Progress Callback May Overwhelm

If invoked synchronously from many parallel tasks, progress messages can arrive faster than the UI drains them. Channel.TryWrite returns false when the buffer is full, and the return value is ignored, so messages may be dropped under heavy load. This is acceptable for a CLI UI: some updates are lost, but overall progress remains visible.
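The drop-on-full behavior corresponds roughly to a bounded channel like this (the capacity of 64 and the consumer shape are assumptions):

```csharp
using System.Threading.Channels;

// Default FullMode.Wait: WriteAsync would block, but TryWrite simply
// returns false when the buffer is full.
var channel = Channel.CreateBounded<string>(capacity: 64);

// Producer side: return value ignored, so messages drop silently under load.
Action<string> onProgress = msg => channel.Writer.TryWrite(msg);

// Consumer side (e.g., a status reporter) drains asynchronously.
await foreach (var msg in channel.Reader.ReadAllAsync())
    Console.WriteLine(msg);
```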


Next Steps