# OpenQueryApp Component

Deep dive into the `OpenQueryApp` class, the main application orchestrator.
## Overview

`OpenQueryApp` is the heart of OpenQuery. It coordinates all components, manages the workflow from question to answer, and handles progress reporting.
## Location

`OpenQuery.cs` in the project root.
## Class Definition

```csharp
public class OpenQueryApp
{
    private readonly OpenRouterClient _client;
    private readonly SearchTool _searchTool;
    private readonly string _model;

    public OpenQueryApp(
        OpenRouterClient client,
        SearchTool searchTool,
        string model);

    public async Task RunAsync(OpenQueryOptions options);
}
```
**Dependencies:**

- `OpenRouterClient` - for query generation and final answer streaming
- `SearchTool` - for the search-retrieve-rank pipeline
- `string _model` - model identifier to use for LLM calls
**Lifecycle:** Instantiated once per query execution in `Program.cs`; `RunAsync()` is then called once.
## `RunAsync` Workflow
```csharp
public async Task RunAsync(OpenQueryOptions options)
{
    // 1. Setup
    using var reporter = new StatusReporter(options.Verbose);
    reporter.StartSpinner();

    // 2. Query generation (if needed)
    List<string> queries = await GenerateQueriesIfNeededAsync(options, reporter);

    // 3. Search pipeline
    string searchResult = await ExecuteSearchPipelineAsync(options, queries, reporter);

    // 4. Final answer streaming
    await StreamFinalAnswerAsync(options, searchResult, reporter);
}
```
### Step 1: Status Reporter Setup

```csharp
using var reporter = new StatusReporter(options.Verbose);
reporter.StartSpinner();
```

- Creates `StatusReporter` (implements `IDisposable`)
- Starts the spinner animation (unless verbose)
- `using` ensures disposal on exit
### Step 2: Query Generation

**When:** `options.Queries > 1` (user wants multiple search queries)

**Purpose:** Use the LLM to generate diverse, optimized search queries from the original question
**System Prompt** (hardcoded in `OpenQuery.cs`):

```text
You are an expert researcher. The user will ask a question. Your task is to
generate optimal search queries to gather comprehensive information.

Instructions:
1. Break down complex questions.
2. Use synonyms and alternative phrasing.
3. Target different aspects (entities, mechanisms, pros/cons, history).

CRITICAL: Output must be a valid JSON array of strings ONLY. No markdown,
explanations, or other text.
```
**Request:**

```csharp
var queryGenMessages = new List<Message>
{
    new Message("system", systemPrompt),
    new Message("user", $"Generate {options.Queries} distinct search queries for:\n{options.Question}")
};
var request = new ChatCompletionRequest(_model, queryGenMessages);
var response = await _client.CompleteAsync(request);
```
**Response Parsing:**

````csharp
var content = response.Choices.FirstOrDefault()?.Message.Content;
if (!string.IsNullOrEmpty(content))
{
    // Remove markdown code fences if present
    content = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();

    // Deserialize to List<string>
    var generatedQueries = JsonSerializer.Deserialize(content, AppJsonContext.Default.ListString);
    if (generatedQueries != null && generatedQueries.Count > 0)
    {
        queries = generatedQueries;
    }
}
````
**Fallback:** If any step fails (exception, null, empty, or invalid JSON), use `new List<string> { options.Question }` (a single query: the original question).
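The parse-then-fall-back logic above can be condensed into a self-contained helper. This sketch uses plain `JsonSerializer.Deserialize<List<string>>` rather than the app's source-generated `AppJsonContext`, and the helper name is illustrative, not from the codebase:

````csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

static class QueryParsing
{
    // Parse an LLM response expected to be a JSON array of strings,
    // stripping any ```json fences; fall back to the original question.
    public static List<string> ParseQueries(string? content, string fallbackQuestion)
    {
        if (!string.IsNullOrEmpty(content))
        {
            var cleaned = Regex.Replace(content, @"```json\s*|\s*```", "").Trim();
            try
            {
                var queries = JsonSerializer.Deserialize<List<string>>(cleaned);
                if (queries is { Count: > 0 })
                    return queries;
            }
            catch (JsonException)
            {
                // Malformed JSON: fall through to the fallback below.
            }
        }
        // Any failure (null, empty, invalid JSON) yields the single original query.
        return new List<string> { fallbackQuestion };
    }
}
````

Centralizing the fallback this way also makes the parsing path easy to unit test without any LLM call.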
**Note:** Query generation reuses the same model as the final answer. This could be optimized:

- Use a cheaper/faster model for query generation
- Separate model configuration
- Cache query generation results
### Step 3: Search Pipeline Execution

```csharp
var searchResult = await _searchTool.ExecuteAsync(
    options.Question,
    queries,
    options.Results,
    options.Chunks,
    (progress) =>
    {
        if (options.Verbose)
            reporter.WriteLine(progress);
        else
            reporter.UpdateStatus(parsedMessage); // parsedMessage: concise status derived from progress
    },
    options.Verbose);
```
**Parameters:**

- `originalQuery`: user's original question (used for the final embedding)
- `generatedQueries`: from step 2 (or the fallback)
- `maxResults`: `options.Results` (search results per query)
- `topChunksLimit`: `options.Chunks` (top N chunks to return)
- `onProgress`: callback to update the UI
- `verbose`: passed through to `SearchTool`
**Returns:** `string context` - formatted context with source citations
**Progress Handling:**

- In verbose mode: all progress is printed as lines (via `reporter.WriteLine()`)
- In compact mode: progress messages are parsed into a concise status (e.g., "Fetching articles 3/10...")
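The compact-mode parsing could look like the sketch below. The verbose message format ("Fetching article 3 of 10: ...") is an assumption here, not taken from `SearchTool`:

```csharp
using System.Text.RegularExpressions;

static class ProgressParsing
{
    // Turn an assumed verbose line like "Fetching article 3 of 10: <url>"
    // into a short status like "Fetching articles 3/10...".
    public static string ToCompactStatus(string progress)
    {
        var m = Regex.Match(progress, @"^(\w+) article (\d+) of (\d+)");
        return m.Success
            ? $"{m.Groups[1].Value} articles {m.Groups[2].Value}/{m.Groups[3].Value}..."
            : progress; // unrecognized messages pass through unchanged
    }
}
```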
### Step 4: Final Answer Streaming

**Status Update:**

```csharp
if (!options.Verbose)
    reporter.UpdateStatus("Asking AI...");
else
{
    reporter.ClearStatus();
    Console.WriteLine();
}
```
**Build System Prompt:**

```csharp
var systemPrompt = "You are a helpful AI assistant. Answer the user's question in depth, based on the provided context. Be precise and accurate. You can mention sources or citations.";
if (options.Short) systemPrompt += " Give a very short concise answer.";
if (options.Long) systemPrompt += " Give a long elaborate detailed answer.";
```
**Prompt Structure:**

```text
System: {systemPrompt}

User: Context:
{searchResult}

Question: {options.Question}
```
Where `searchResult` looks like:

```text
[Source 1: Title](URL)
Content chunk 1

[Source 2: Title](URL)
Content chunk 2
...
```
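Assembling the user message from these pieces is a single string interpolation; this sketch follows the structure above, though the exact whitespace between sections is an assumption rather than copied from the codebase:

```csharp
static class PromptBuilder
{
    // Combine the pipeline's formatted context and the original question
    // into the single user message sent alongside the system prompt.
    public static string BuildUserPrompt(string searchResult, string question) =>
        $"Context:\n{searchResult}\n\nQuestion: {question}";
}
```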
**Streaming:**

```csharp
var requestStream = new ChatCompletionRequest(_model, messages);
var assistantResponse = new StringBuilder();
var isFirstChunk = true;
using var streamCts = new CancellationTokenSource();

await foreach (var chunk in _client.StreamAsync(requestStream, streamCts.Token))
{
    if (chunk.TextDelta == null) continue;

    if (isFirstChunk)
    {
        reporter.StopSpinner();
        if (!options.Verbose) reporter.ClearStatus();
        else Console.Write("Assistant: ");
        isFirstChunk = false;
    }

    Console.Write(chunk.TextDelta);
    assistantResponse.Append(chunk.TextDelta);
}
```
**Key Points:**

- `StreamAsync` yields `StreamChunk` objects (text deltas)
- The first chunk stops the spinner and clears the status line
- Each delta is written to the console immediately (real-time feel)
- The entire response is accumulated in `assistantResponse` (though not used elsewhere)
- A `CancellationTokenSource` is passed but never canceled (Ctrl+C would cancel from outside)
**Finally Block:**

```csharp
finally
{
    reporter.StopSpinner();
}
```

Ensures the spinner stops even if streaming fails.
**End:**

```csharp
Console.WriteLine(); // Newline after the complete answer
```
## Error Handling

`RunAsync` itself does not catch exceptions. All exceptions propagate to `Program.cs`:
```csharp
try
{
    var openQuery = new OpenQueryApp(client, searchTool, model);
    await openQuery.RunAsync(options);
}
catch (HttpRequestException ex)
{
    Console.Error.WriteLine($"\n[Error] Network request failed. Details: {ex.Message}");
    Environment.Exit(1);
}
catch (Exception ex)
{
    Console.Error.WriteLine($"\n[Error] An unexpected error occurred: {ex.Message}");
    Environment.Exit(1);
}
```
**Common Exceptions:**

- `HttpRequestException` - network failures, API errors
- `JsonException` - malformed JSON from the API
- `TaskCanceledException` - timeout or user interruption
- `Exception` - anything else

**No Retries at This Level:** Fail fast; the user sees the error immediately. Lower-level retries exist (embedding service).
## Performance Characteristics

**Query Generation:**

- One non-streaming LLM call
- Takes 2-5 seconds depending on the model
- Typically <1000 tokens

**Search Pipeline** (`SearchTool.ExecuteAsync`):

- See `SearchTool.md` for a detailed timing breakdown
- Typically 10-30 seconds total

**Final Answer Streaming:**

- Streaming LLM call
- Time depends on answer length (typically 5-20 seconds)
- User sees words appear progressively

**Total End-to-End:** 15-50 seconds for a typical query
## Design Decisions

### Why Not Stream Query Generation?

Query generation currently uses `CompleteAsync` (non-streaming). It could be streamed, but:

- Queries are short (a JSON array)
- Streaming offers no UX benefit (the user doesn't see intermediate queries)
- It's simpler to wait for all queries before proceeding
### Why Build Prompts Manually Instead of with Templates?

Simple string concatenation is fine for a handful of prompts.

Pros:

- No template dependencies
- Easy to read and modify
- No runtime compilation overhead

Cons:

- No validation
- Could benefit from a prompt engineering framework
### Why Accumulate `assistantResponse` in a StringBuilder?

It is currently built but not used. It could be:

- Saved to a file (future feature: `--output file.md`)
- Analyzed for token counting
- Removed if not needed
### Could Query Generation Be Cached?

Yes! For repeated questions (common in scripts), cache the generated queries:

- `Dictionary<string, List<string>>` cache in memory
- Or a persistent cache (Redis, file)
- Not implemented (low priority)
### Single Responsibility Violation?

`OpenQueryApp` does:

- Query generation
- Pipeline orchestration
- Answer streaming

That is three responsibilities, but they are tightly coupled to the "query → answer" workflow. Separating them would add complexity without clear benefit; it is acceptable as an "application coordinator".
## Extension Points

### Adding a Separate Model for Query Generation

Currently the same `_model` is used for queries and the answer. To use different models:

1. Add a `queryGenerationModel` parameter to the constructor
2. Use it for query generation: `new ChatCompletionRequest(queryGenerationModel, queryGenMessages)`
3. Keep `_model` for the final answer

Or make it configurable via an environment variable: `OPENROUTER_QUERY_MODEL`
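The environment-variable approach could be as small as this sketch (`OPENROUTER_QUERY_MODEL` is the proposed variable from above, not one the app currently reads):

```csharp
using System;

static class ModelConfig
{
    // Use OPENROUTER_QUERY_MODEL for query generation when set,
    // otherwise fall back to the main model.
    public static string ResolveQueryModel(string defaultModel) =>
        Environment.GetEnvironmentVariable("OPENROUTER_QUERY_MODEL")
            is { Length: > 0 } configured ? configured : defaultModel;
}
```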
### Post-Processing the Answer

Opportunities to add:

- Source citation formatting (footnotes, clickable links)
- Answer summarization
- Export to Markdown/JSON
- Text-to-speech

Add after the streaming loop, before the final newline.
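For example, the Markdown export could reuse the `assistantResponse` StringBuilder that is already accumulated during streaming. This is a hypothetical sketch; the `--output` flag and helper name are not implemented:

```csharp
using System.IO;
using System.Text;

static class AnswerExport
{
    // Write the question and the accumulated answer as a Markdown file.
    public static void SaveAnswer(string path, string question, StringBuilder assistantResponse)
    {
        var markdown = $"# {question}\n\n{assistantResponse}\n";
        File.WriteAllText(path, markdown);
    }
}
```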
### Progress UI Enhancement

The current `StatusReporter` is basic. It could add:

- A progress bar with percentage
- ETA calculation
- ANSI colors for different message types
- Logging to file
- A web dashboard

This would require extending `StatusReporter` or replacing it.
## Testing Considerations

**Challenges:**

- `RunAsync` is cohesive (hard to unit test in isolation)
- Depends on many services (needs mocks)
- Asynchronous and streaming

**Recommended Approach:**

1. Extract interfaces:
   - `ISearchTool` (wrapper around `SearchTool`)
   - `IOpenRouterClient` (wrapper around `OpenRouterClient`)
2. Mock the interfaces in tests
3. Test query-generation parsing separately
4. Test progress callback counting
5. Test final-answer prompt construction
**Integration Tests:**

- End-to-end with real or mocked APIs
- Automated tests against test SearxNG/OpenRouter instances
## Related Components

- `SearchTool` - pipeline executed by `OpenQueryApp`
- `Program.cs` - creates `OpenQueryApp`
- `StatusReporter` - progress UI used by `OpenQueryApp`
## Next Steps

- SearchTool - see the pipeline in detail
- Services - understand each service
- CLI Reference - how users invoke this