# AGENTS.md - Toak Project Guide

This document helps AI agents work effectively in the Toak codebase.

## Project Overview

**Toak** is a high-speed Linux dictation system written in C#/.NET 10. It captures audio via ffmpeg, transcribes via Groq's Whisper API, refines via Llama 3.1, and types the result into the active window using xdotool/wtype.

**Repository**: C# console application using .NET 10 SDK

**Platform**: Linux only (requires ALSA/PulseAudio, notify-send, xdotool/wtype)

---

## Essential Commands

### Build & Run

```bash
# Build the project
dotnet build

# Build for release
dotnet build -c Release

# Run with arguments
dotnet run -- toggle    # Start/stop recording
dotnet run -- setup     # Interactive configuration wizard
dotnet run -- show      # Display current configuration
dotnet run -- config    # Update a config setting
```

### Test (No Test Project Currently)

There is no test project configured. Tests would need to be added manually if required.

### Dependencies (Linux System Packages)

The application requires these system binaries in PATH:

- `ffmpeg` - Audio recording from ALSA
- `notify-send` - Desktop notifications
- `xdotool` OR `wtype` - Text injection (X11 vs Wayland)

---

## Code Organization

```
Toak/
├── Program.cs              # Entry point, CLI argument handling
├── AudioRecorder.cs        # ffmpeg process wrapper for recording
├── GroqApiClient.cs        # HTTP client for Whisper + Llama APIs
├── PromptBuilder.cs        # Dynamic system prompt construction
├── TextInjector.cs         # xdotool/wtype wrapper for typing text
├── ConfigManager.cs        # JSON config load/save (~/.config/toak/)
├── StateTracker.cs         # PID-based recording state via /tmp/
├── Notifications.cs        # notify-send wrapper
├── Toak.csproj             # .NET 10 SDK project
├── PROJECT_PLAN.md         # Original architecture document
└── IMPLEMENTATION_PLAN.md  # Implementation phases document
```

---

## Code Patterns & Conventions

### Namespace Style

- Use **file-scoped namespaces**: `namespace Toak;` at the top of the file
- Never use block-style namespace declarations

### Class Structure

- **Static classes** for stateless utilities: `ConfigManager`, `StateTracker`, `Notifications`, `TextInjector`, `PromptBuilder`, `AudioRecorder`
- **Instance classes** for stateful clients: `GroqApiClient` (holds HttpClient)
- **POCOs** for JSON serialization at bottom of `GroqApiClient.cs`

### Naming Conventions

- PascalCase for classes, methods, properties
- Private fields prefixed with underscore: `_httpClient`
- Constants use PascalCase: `ConfigDir`, `StateFilePath`
- JSON property names use camelCase with `[JsonPropertyName]` attributes

### Error Handling

- Try/catch with console logging: `Console.WriteLine($"[ClassName] Error: {ex.Message}");` (note that `Console.WriteLine` writes to stdout, not stderr)
- User-facing errors go through `Notifications.Notify()` for desktop alerts
- Silent failures are acceptable for non-critical paths (notifications, cleanup)

### Async Patterns

- Use `async Task` for I/O operations (API calls)
- Use synchronous methods for process spawning where `Process.Start()` is fire-and-forget

---

## Key Implementation Details

### State Management (Critical)

Recording state is tracked via **file-based PID tracking** (not in-memory):

- State file: `/tmp/toak_state.pid` (contains ffmpeg process ID)
- Audio file: `/tmp/toak_recording.wav`
- Toggle mechanism: New process checks state file, signals existing ffmpeg process to stop

### Configuration Storage

- Location: `~/.config/toak/config.json`
- Format: JSON with PascalCase property names
- Default values are set in the `ToakConfig` class (constructor pattern)

### API Integration (Groq)

- Base URL: `https://api.groq.com/openai/v1/`
- Authentication: Bearer token via `Authorization` header
- Models: `whisper-large-v3-turbo` (STT), `llama-3.1-8b-instant` (refinement)
- Temperature: Always 0.0 for deterministic output
- Security: Transcript wrapped in `` tags to prevent prompt injection

### Process Wrappers

All external tool calls use `ProcessStartInfo` with:

- `UseShellExecute = false`
- `CreateNoWindow = true`
- Arguments properly escaped (quote replacement for text injection)

---

## Testing Approach

**No automated tests currently exist.** The application relies on:

1. Manual testing via `dotnet run -- toggle`
2. Checking `/tmp/toak_recording.wav` exists during recording
3. Verifying `notify-send` displays status messages
4. Confirming text appears in active window after transcription

---

## Important Gotchas

1. **Linux Only**: This application cannot run on Windows/Mac - it depends on `ffmpeg` with ALSA, `notify-send`, and X11/Wayland tools
2. **Process Kill Behavior**: `process.Kill()` sends SIGKILL to ffmpeg. This is intentional for immediate stop, but means graceful shutdown isn't attempted
3. **State File Orphaning**: If the app crashes, `/tmp/toak_state.pid` may be left behind. The next run will attempt to use a stale PID (handled by try/catch in `StopRecording`)
4. **API Key Required**: Without `GroqApiKey` configured via `toak setup`, the app will fail with a notification error
5. **Quote Escaping in TextInjector**: Text containing quotes is escaped as `\"` for shell safety
6. **ImplicitUsings Enabled**: No explicit `using System;` etc. required - .NET 10 implicit usings handle common namespaces
7. **Nullable Enabled**: Nullable reference types are enabled (`Nullable` set to `enable`) - handle nulls properly

---

## Adding New Features

When modifying this codebase:

1. **Maintain static/instance pattern**: Stateless utilities = static, Stateful clients = instance
2. **Follow file-scoped namespace**: Single `namespace Toak;` at top
3. **Use System.Text.Json**: Prefer over Newtonsoft.Json (already configured)
4. **Add config options**: Update `ToakConfig` class, then wire in `Program.cs` CLI handling
5. **External dependencies**: If adding new system tool calls, follow `ProcessStartInfo` pattern in existing classes
6. **Error handling**: Use `Notifications` for user-visible errors, `Console.WriteLine` for debug info

---

## Documentation References

- `PROJECT_PLAN.md` - Original architecture and design goals
- `IMPLEMENTATION_PLAN.md` - Detailed phase-by-phase implementation notes
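
---

## Example: Process Wrapper Sketch

The process wrapper and error handling conventions described in this guide can be sketched roughly as follows. This is a minimal illustration, not code from the Toak repository: the `ProcessRunner` class and its `EscapeQuotes`, `BuildStartInfo`, and `TryStart` methods are hypothetical names invented here. The explicit `using` directives are included only so the snippet compiles on its own.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the Toak codebase.
// Inside the project it would sit under the file-scoped `namespace Toak;`.
public static class ProcessRunner
{
    // Replace each double quote with \" so the text stays inside one quoted argument.
    public static string EscapeQuotes(string text) => text.Replace("\"", "\\\"");

    // Build start info with the two settings this guide requires for tool calls.
    public static ProcessStartInfo BuildStartInfo(string fileName, string arguments) =>
        new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            UseShellExecute = false,
            CreateNoWindow = true,
        };

    // Fire-and-forget start: returns false instead of throwing when the tool cannot be launched.
    public static bool TryStart(string fileName, string arguments)
    {
        try
        {
            // Disposing the Process object releases the handle; it does not stop the child.
            using var process = Process.Start(BuildStartInfo(fileName, arguments));
            return process is not null;
        }
        catch (Exception ex)
        {
            // Same log format as the Error Handling convention.
            Console.WriteLine($"[ProcessRunner] Error: {ex.Message}");
            return false;
        }
    }
}
```

A call might look like `ProcessRunner.TryStart("notify-send", $"\"Toak\" \"{ProcessRunner.EscapeQuotes(message)}\"")`, where `message` is whatever text the caller wants to show. Quote replacement alone does not handle a backslash directly before a quote or at the end of the text. If that matters for a new tool call, `ProcessStartInfo.ArgumentList` passes each argument separately and avoids manual escaping.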