# Troubleshooting
Solve common issues, errors, and performance problems with OpenQuery.
## 📋 Table of Contents
1. [Common Errors](#common-errors)
2. [Performance Issues](#performance-issues)
3. [Debugging Strategies](#debugging-strategies)
4. [Getting Help](#getting-help)
## Common Errors
### ❌ "API Key is missing"
**Error Message**:
```
[Error] API Key is missing. Set OPENROUTER_API_KEY environment variable or run 'configure -i' to set it up.
```
**Cause**: No API key available from environment or config file.
**Solutions**:
1. **Set environment variable** (temporary):
```bash
export OPENROUTER_API_KEY="sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```
2. **Configure interactively** (persistent):
```bash
openquery configure -i
# Follow prompts to enter API key
```
3. **Check config file**:
```bash
cat ~/.config/openquery/config
# Should contain: ApiKey=sk-or-...
```
4. **Verify environment**:
```bash
echo $OPENROUTER_API_KEY
# If empty, the variable was not exported, or it was exported in a different shell
```
---
### ❌ "Network request failed"
**Error Message**:
```
[Error] Network request failed. Details: Name or service not known
```
**Cause**: Cannot reach OpenRouter or SearxNG API endpoints.
**Solutions**:
1. **Check internet connectivity**:
```bash
ping 8.8.8.8
curl https://openrouter.ai
```
2. **Verify SearxNG is running**:
```bash
curl "http://localhost:8002/search?q=test&format=json"
# Should return JSON
```
If connection refused:
```bash
# Start SearxNG if using Docker
docker start searxng
# Or run fresh
docker run -d --name searxng -p 8002:8080 searxng/searxng:latest
```
3. **Check firewall/proxy**:
```bash
# Test OpenRouter API
curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \
https://openrouter.ai/api/v1/models
```
4. **Test from different network** (if behind restrictive firewall)
---
### ❌ "No search results found"
**Error Message**:
```
No search results found.
```
**Cause**: Search queries returned zero results from SearxNG.
**Solutions**:
1. **Test SearxNG manually**:
```bash
curl "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
# Should be > 0
```
2. **Check SearxNG configuration**:
- If self-hosted: ensure internet access is enabled in `/etc/searxng/settings.yml`
- Some public instances disable certain engines or have rate limits
3. **Try a different SearxNG instance**:
```bash
export SEARXNG_URL="https://searx.example.com"
openquery "question"
```
4. **Use simpler queries**: Some queries may be too obscure or malformed
5. **Verbose mode to see queries**:
```bash
openquery -v "complex question"
# See what queries were generated
```
---
### ❌ "Found search results but could not extract readable content."
**Cause**: SearxNG returned results but `ArticleService` failed to extract content from all URLs.
**Common Reasons**:
- JavaScript-heavy sites (React, Vue apps) where content loaded dynamically
- Paywalled sites (NYT, academic journals)
- PDFs or non-HTML content
- Malformed HTML
- Server returned error (404, 403, 500)
- `robots.txt` blocked crawler
**Solutions**:
1. **Accept that some sites can't be scraped** - try a different query to get different results
2. **Use site:reddit.com or site:wikipedia.org** - these are usually scrape-friendly
3. **Increase `--results`** to get more URLs (some will work)
4. **Check verbose output**:
```bash
openquery -v "question"
# Look for "Warning: Failed to fetch article"
```
5. **Try a local SearxNG instance with more engines** - some engines fetch different sources
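Several of the failure reasons above can be detected before any parsing happens. A rough pre-check in illustrative Python (this is not OpenQuery's actual logic, just the idea of filtering by status code and content type):

```python
def is_parseable(status_code, content_type):
    """Rough pre-check: only successful HTML responses are worth handing to a reader."""
    if status_code != 200:
        return False  # covers the 403/404/500 cases listed above
    # PDFs, images, and other non-HTML content types are skipped.
    return content_type.split(";")[0].strip() in ("text/html", "application/xhtml+xml")

print(is_parseable(200, "text/html; charset=utf-8"))  # True
print(is_parseable(200, "application/pdf"))           # False: non-HTML content
print(is_parseable(403, "text/html"))                 # False: paywall/forbidden
```

JavaScript-heavy sites pass this check but still fail extraction, which is why some losses are unavoidable without a headless browser.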
---
### ❌ Rate Limiting (429 Too Many Requests)
**Symptoms**:
```bash
[Error] Response status code does not indicate success: 429 (Too Many Requests).
```
Or retries may be exhausted after Polly's configured attempts.
**Cause**: Too many concurrent requests to OpenRouter API.
**Solutions**:
1. **Reduce concurrency** (edit `SearchTool.cs`):
```csharp
var _options = new ParallelProcessingOptions
{
    MaxConcurrentArticleFetches = 5,    // reduce from 10
    MaxConcurrentEmbeddingRequests = 2, // reduce from 4
    EmbeddingBatchSize = 150            // reduce from 300
};
```
2. **Add delay** between embedding batches (custom implementation)
3. **Upgrade OpenRouter plan** to higher rate limits
4. **Wait and retry** - rate limits reset after time window
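The "wait and retry" remedy is usually implemented as exponential backoff with jitter, which is the pattern Polly's retry policies provide on the C# side. A language-neutral sketch of the idea in Python (illustrative only, not OpenQuery code):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response."""

def with_backoff(call, max_attempts=5, base_delay=1.0, jitter=0.5):
    """Retry `call` on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Exponential backoff plus jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, jitter))
```

The jitter matters when several workers hit the limit at once: without it, they all retry at the same instant and trip the limiter again.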
---
### ❌ Slow Performance
**Symptom**: Queries take 60+ seconds when they usually take 20s.
**Diagnosis Steps**:
1. **Run with verbose mode**:
```bash
openquery -v "question"
```
Watch which phase takes longest:
- Query generation?
- Searching?
- Fetching articles?
- Embeddings?
2. **Check network latency**:
```bash
time curl "https://openrouter.ai/api/v1/models"
time curl "http://localhost:8002/search?q=test&format=json"
```
**Common Causes & Fixes**:
| Phase | Cause | Fix |
|-------|-------|-----|
| Searches | SearxNG overloaded/slow | Check CPU/memory, restart container |
| Fetching | Target sites slow | Reduce `--results` to fewer URLs |
| Embeddings | API rate limited | Reduce concurrency (see above) |
| Answer | Heavy model/load | Switch to faster model (e.g., Qwen Flash) |
3. **Resource monitoring**:
```bash
htop # CPU/memory usage
iftop # network throughput
```
4. **Reduce parameters**:
```bash
openquery -q 2 -r 3 -c 2 "question" # lighter load
```
---
### ❌ Out of Memory
**Symptoms**:
- Process killed by OOM killer (Linux)
- `System.OutOfMemoryException`
- System becomes unresponsive
**Cause**: Processing too many large articles simultaneously.
**Why**: Each article can be 100KB+ of text, split into many chunks, embeddings are 6KB per chunk (1536 floats × 4 bytes). 200 chunks = 1.2MB embeddings, plus text ~100KB = 1.3MB. Not huge, but many large articles could create thousands of chunks.
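The arithmetic above is easy to verify with a back-of-envelope calculation (same figures as in the text: 1536-float embeddings, 4 bytes per float):

```python
# Back-of-envelope memory cost per embedded chunk.
FLOATS_PER_EMBEDDING = 1536
BYTES_PER_FLOAT = 4

bytes_per_chunk = FLOATS_PER_EMBEDDING * BYTES_PER_FLOAT   # 6144 bytes ≈ 6 KB
chunks = 200
embedding_mb = chunks * bytes_per_chunk / (1024 * 1024)    # ≈ 1.2 MB
print(f"{bytes_per_chunk} bytes per chunk; {embedding_mb:.2f} MB for {chunks} chunks")
```

So embeddings alone stay small; memory pressure comes from holding many large article texts in flight at once.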
**Solutions**:
1. **Reduce `--results`** (fewer URLs per query):
```bash
openquery -r 3 "question" # instead of 10
```
2. **Reduce `--queries`** (fewer search queries):
```bash
openquery -q 2 "question"
```
3. **Fetches are already limited** to 10 concurrent by default, which is reasonable
4. **Check article size**: Some sites (PDFs, long documents) may yield megabytes of text; SmartReader should truncate but may not
---
### ❌ Invalid JSON from Query Generation
**Symptom**: Query generation fails silently, falls back to original question.
**Cause**: LLM returned non-JSON (even though instructed). Could be:
- Model not instruction-following
- Output exceeded context window
- API error in response
**Detection**: Run with `-v` to see:
```
[Failed to generate queries, falling back to original question. Error: ...]
```
**Solutions**:
- Try a different model (configure to use Gemini or DeepSeek)
- Reduce `--queries` count (simpler task)
- Tune system prompt (would require code change)
- Accept the fallback - the original question often works as the sole query
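The parse-or-fall-back behaviour described above can be sketched like this (illustrative Python; the actual parsing lives in OpenQuery's C# code, and the exact JSON shape the model is asked for is an assumption here):

```python
import json

def parse_queries(llm_output, original_question):
    """Parse the model's JSON query list; fall back to the original question."""
    try:
        queries = json.loads(llm_output)
        # Only accept a list of strings; anything else counts as malformed output.
        if isinstance(queries, list) and all(isinstance(q, str) for q in queries):
            return queries
    except json.JSONDecodeError:
        pass  # model returned prose instead of JSON
    return [original_question]

print(parse_queries('["docker basics", "docker vs vm"]', "What is Docker?"))
print(parse_queries("Sure! Here are some queries:", "What is Docker?"))  # fallback
```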
---
### ❌ Spinner Artifacts in Output
**Symptom**: When redirecting output to file, you see weird characters like `⠋`, `<60>`, etc.
**Cause**: Spinner uses Unicode Braille characters and ANSI escape codes.
**Fix**: Use `2>/dev/null | sed 's/.\x08//g'` to clean:
```bash
openquery "question" 2>/dev/null | sed 's/.\x08//g' > answer.md
```
Or run with `--verbose` (no spinner, only newline-separated messages):
```bash
openquery -v "question" > answer.txt
```
---
### ❌ "The type or namespace name '...' does not exist" (Build Error)
**Cause**: Missing NuGet package or wrong .NET SDK version.
**Solution**:
1. **Verify .NET SDK 10.0**:
```bash
dotnet --version
# Should be 10.x
```
If lower: https://dotnet.microsoft.com/download/dotnet/10.0
2. **Restore packages**:
```bash
dotnet restore
```
3. **Clean and rebuild**:
```bash
dotnet clean
dotnet build
```
4. **Check OpenQuery.csproj** for package references:
```xml
<PackageReference Include="Polly.Core" Version="8.6.6" />
<PackageReference Include="Polly.RateLimiting" Version="8.6.6" />
<PackageReference Include="SmartReader" Version="0.11.0" />
<PackageReference Include="System.CommandLine" Version="2.0.0-beta4.22272.1" />
<PackageReference Include="System.Numerics.Tensors" Version="9.0.0" />
```
If restore fails, these packages may not be available for .NET 10 preview. Consider:
- Downgrade to .NET 8.0 (if packages incompatible)
- Or find package versions compatible with .NET 10
---
### ❌ AOT Compilation Fails
**Error**: `error NETSDK1085: The current .NET SDK does not support targeting .NET 10.0.`
**Cause**: Using .NET SDK older than 10.0.
**Fix**: Install .NET SDK 10.0 preview.
**Or**: Disable AOT for development (edit `.csproj`):
```xml
<!-- Remove or set to false -->
<PublishAot>false</PublishAot>
```
---
## Performance Issues
### Slow First Request
**Expected**: First query slower (JIT compilation for .NET runtime if not AOT, or initial API connections).
If not using AOT:
- Consider publishing with `/p:PublishAot=true` for production distribution
- Development builds use JIT, which adds 500ms-2s warmup
**Mitigation**: Accept as warmup cost, or pre-warm with dummy query.
---
### High Memory Usage
**Check**:
```bash
ps aux | grep OpenQuery
# Look at RSS (resident set size)
```
**Typical**: 50-200MB (including .NET runtime, AOT code, data structures)
**If >500MB**:
- Likely processing an unusually large number of articles
- Check `--results` and `--queries` values
- Use `--verbose` to see counts: `[Fetched X search results]`, `[Extracted Y chunks]`
**Reduce**:
- `--queries 2` instead of 10
- `--results 3` instead of 15
- These directly limit number of URLs to fetch
---
### High CPU Usage
**Cause**:
- SmartReader HTML parsing (CPU-bound)
- Cosine similarity calculations (many chunks, but usually fast)
- Spinner animation (negligible)
**Check**: `htop` → which core at 100%? If single core, likely parsing. If all cores, parallel fetch.
**Mitigation**:
- Ensure `MaxConcurrentArticleFetches` not excessively high (default 10 is okay)
- Accept it - CPU spikes are normal during the fetch phase
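For reference, the cosine-similarity step is just a dot product over two vectors divided by their norms - cheap per pair, though it runs once per chunk-query combination. OpenQuery does this in C# (the project references `System.Numerics.Tensors`); this Python sketch only shows the math:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0: identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0: orthogonal
```

With 1536-dimensional embeddings and a few hundred chunks, this is microseconds of work, which is why similarity scoring rarely shows up in CPU profiles.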
---
### API Costs Higher Than Expected
**Symptom**: OpenRouter dashboard shows high token usage.
**Causes**:
1. Using expensive model (check `OPENROUTER_MODEL`)
2. High `--chunks` → more tokens in context
3. High `--queries` + `--results` → many articles → many embedding tokens (usually cheap)
4. Long answers (many completion tokens) - especially with `--long`
**Mitigation**:
- Use `qwen/qwen3.5-flash-02-23` (cheapest good option)
- Reduce `--chunks` to 2-3
- Use `--short` when detailed answer not needed
- Set `MaxTokens` on the request (would require a code change)
---
## Debugging Strategies
### 1. Enable Verbose Mode
Always start with:
```bash
openquery -v "question" 2>&1 | tee debug.log
```
Logs everything:
- Generated queries
- URLs fetched
- Progress counts
- Errors/warnings
**Analyze log**:
- How many queries generated? (Should match `--queries`)
- How many search results per query? (Should be ≤ `--results`)
- How many articles fetched successfully?
- How many chunks extracted?
- Any warnings?
---
### 2. Isolate Components
**Test SearxNG**:
```bash
curl "http://localhost:8002/search?q=test&format=json" | jq '.results[0]'
```
**Test OpenRouter API**:
```bash
curl -X POST https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"qwen/qwen3.5-flash-02-23","messages":[{"role":"user","content":"Hello"}]}'
```
**Test Article Fetching** (with known good URL):
```bash
curl -L "https://example.com/article" | head -50
```
Then check if SmartReader can parse.
---
### 3. Reduce Scope
Test with minimal parameters to isolate failing phase:
```bash
# 1 query, 2 results, 1 chunk - should be fast and simple
openquery -q 1 -r 2 -c 1 "simple test question" -v
# If that works, gradually increase:
openquery -q 1 -r 5 -c 1 "simple question"
openquery -q 3 -r 5 -c 1 "simple question"
openquery -q 3 -r 5 -c 3 "simple question"
# Then try complex question
```
---
### 4. Check Resource Limits
**File descriptors**: Fetching many articles concurrently may hit the open-file limit.
```bash
ulimit -n # usually 1024, should be fine
```
**Memory**: Monitor with `free -h` while running.
**Disk space**: Not much disk use, but logs could fill if verbose mode used repeatedly.
---
### 5. Examine Config File
```bash
cat ~/.config/openquery/config
# Ensure no spaces around '='
# Correct: ApiKey=sk-or-...
# Wrong: ApiKey = sk-or-... (spaces become part of value)
```
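The "spaces become part of the value" behaviour comes from naive `key=value` splitting. A sketch of how such a parser behaves (assumed behaviour, illustrated in Python; OpenQuery's actual parser is C#):

```python
def parse_config(text):
    """Parse key=value lines the way a naive splitter would (no trimming)."""
    config = {}
    for line in text.splitlines():
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key] = value  # no .strip(): surrounding spaces survive
    return config

print(parse_config("ApiKey=sk-or-abc"))    # {'ApiKey': 'sk-or-abc'}
print(parse_config("ApiKey = sk-or-abc"))  # {'ApiKey ': ' sk-or-abc'} - both polluted
```

A key of `ApiKey ` (trailing space) never matches the lookup for `ApiKey`, so the key silently appears to be missing.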
Reconfigure if needed:
```bash
openquery configure --key "sk-or-..."
```
---
### 6. Clear Cache / Reset
No persistent cache exists, but:
- Re-start SearxNG container: `docker restart searxng`
- Clear DNS cache if network issues persist: `sudo resolvectl flush-caches` (or `sudo systemd-resolve --flush-caches` on older systems)
---
## Getting Help
### Before Asking
Gather information:
1. **OpenQuery version** (commit or build date if available)
2. **OS and architecture**: `uname -a` (Linux/macOS) or `systeminfo` (Windows)
3. **Full command** you ran
4. **Verbose output**: `openquery -v "question" 2>&1 | tee log.txt`
5. **Config** (redact API key):
```bash
sed 's/ApiKey=.*/ApiKey=REDACTED/' ~/.config/openquery/config
```
6. **SearxNG test**:
```bash
curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
```
7. **OpenRouter test**:
```bash
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
https://openrouter.ai/api/v1/models | jq '.data[0].id'
```
---
### Where to Ask
1. **GitHub Issues** (if repository hosted there):
- Search existing issues first
- Provide all info from above
- Include log file (or link to gist)
2. **Community Forum** (if exists)
3. **Self-Diagnose**:
- Check `docs/troubleshooting.md` (this file)
- Check `docs/configuration.md`
- Check `docs/usage.md`
---
### Example Bug Report
```
Title: OpenQuery hangs on "Fetching article X/Y"
Platform: Ubuntu 22.04, .NET 10.0, OpenQuery built from commit abc123
Command: openquery -v "What is Docker?" 2>&1 | tee log.txt
Verbose output shows:
[...]
[Fetching article 1/15: docker.com]
[Fetching article 2/15: hub.docker.com]
[Fetching article 3/15: docs.docker.com]
# Hangs here indefinitely, no more progress
SearxNG test:
$ curl "http://localhost:8002/search?q=docker&format=json" | jq '.results | length'
15 # SearxNG works
Config:
ApiKey=sk-or-xxxx (redacted)
Model=qwen/qwen3.5-flash-02-23
DefaultQueries=3
DefaultChunks=3
DefaultResults=5
Observation:
- Fetches 3 articles fine, then stalls
- Nothing in log after "Fetching article 3/15"
- Process uses ~150MB memory, CPU 0% (idle)
- Ctrl+C exits immediately
Expected: Should fetch remaining 12 articles (concurrent up to 10)
Actual: Only 3 fetched, then silent hang
```
---
## Known Issues
### Issue: Spinner Characters Not Displaying
Some terminals don't support Braille Unicode patterns.
**Symptoms**: Spinner shows as `?` or boxes.
**Fix**: Use font with Unicode support, or disable spinner by setting `TERM=dumb` or use `--verbose`.
---
### Issue: Progress Messages Overwritten
In very fast operations, progress updates may overlap.
**Cause**: `StatusReporter` uses `Console.Write` without lock in compact mode; concurrent writes from channel processor and spinner task could interleave.
**Mitigation**: Unlikely in practice (channel serializes, spinner only updates when `_currentMessage` set). If problematic, add lock around Console operations.
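The suggested mitigation, a lock around console writes, is the same in any language. An illustrative Python sketch of serializing writes from concurrent tasks (the real fix would go in `StatusReporter` in C#):

```python
import threading

_console_lock = threading.Lock()

def write_status(message):
    """Serialize console writes so spinner and progress updates don't interleave."""
    with _console_lock:
        print(message, end="", flush=True)
```

The lock guarantees each message is emitted whole; without it, two writers can interleave partial output mid-line.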
---
### Issue: Articles with No Text Content
Some URLs return articles with empty `TextContent`.
**Cause**: SmartReader's quality heuristic (`IsReadable`) failed, or article truly has no text (image, script, error page).
**Effect**: Those URLs contribute zero chunks.
**Acceptable**: Part of normal operation; not all URLs yield readable content.
---
### Issue: Duplicate Sources in Answer
Same website may appear multiple times (different articles).
**Cause**: Different URLs from different search results may be from same domain but different pages.
**Effect**: `[Source 1]` and `[Source 3]` could both be `example.com`. Not necessarily bad - they're different articles.
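If per-domain repeats are unwanted, one possible mitigation (not implemented in OpenQuery) is to keep only the first result seen per domain, sketched here in Python:

```python
from urllib.parse import urlparse

def dedupe_by_domain(urls):
    """Keep only the first URL seen per domain."""
    seen, kept = set(), []
    for url in urls:
        domain = urlparse(url).netloc
        if domain not in seen:
            seen.add(domain)
            kept.append(url)
    return kept

urls = [
    "https://example.com/a",
    "https://example.com/b",  # same domain, dropped
    "https://other.org/c",
]
print(dedupe_by_domain(urls))
```

Note the trade-off: different pages on one domain can carry genuinely different content, so deduplicating may discard useful sources.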
---
## Performance Tuning Reference
| Setting | Default | Fastest | Most Thorough | Notes |
|---------|---------|---------|---------------|-------|
| `--queries` | 3 | 1 | 8+ | More queries = more searches |
| `--results` | 5 | 2 | 15+ | Fewer = fewer articles to fetch |
| `--chunks` | 3 | 1 | 5+ | More chunks = more context tokens |
| `MaxConcurrentArticleFetches` | 10 | 5 | 20 | Higher = more parallel fetches |
| `MaxConcurrentEmbeddingRequests` | 4 | 2 | 8 | Higher = faster embeddings (may hit rate limits) |
| `EmbeddingBatchSize` | 300 | 100 | 1000 | Larger = fewer API calls, more data per call |
**Start**: Defaults are balanced.
**Adjust if**:
- Slow: Reduce `--results`, `--queries`, or concurrency limits
- Poor quality: Increase `--chunks`, `--results`, `--queries`
- Rate limited: Reduce concurrency limits
- High cost: Use `--short`, reduce `--chunks`, choose cheaper model
---
## Next Steps
- [Performance](../performance.md) - Detailed performance analysis
- [Configuration](../configuration.md) - Adjust settings
- [Usage](../usage.md) - Optimize workflow
---
**Quick Diagnostic Checklist**
```bash
# 1. Check API key
echo $OPENROUTER_API_KEY | head -c 10
# 2. Test SearxNG
curl -s "http://localhost:8002/search?q=test&format=json" | jq '.results | length'
# 3. Test OpenRouter
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" \
https://openrouter.ai/api/v1/models | jq '.data[0].id'
# 4. Run verbose
openquery -v "test" 2>&1 | grep -E "Fetching|Generated|Found"
# 5. Check resource usage while running
htop
# 6. Reduce scope and retry
openquery -q 1 -r 2 -c 1 "simple test"
```