AGENTS.md - Toak Project Guide
This document helps AI agents work effectively in the Toak codebase.
Project Overview
Toak is a high-speed Linux dictation system written in C#/.NET 10. It captures audio via ffmpeg, transcribes via Groq's Whisper API, refines via Llama 3.1, and types the result into the active window using xdotool/wtype.
Repository: C# console application using .NET 10 SDK
Platform: Linux only (requires ALSA/PulseAudio, notify-send, xdotool/wtype)
Essential Commands
Build & Run
# Build the project
dotnet build
# Build for release
dotnet build -c Release
# Run with arguments
dotnet run -- toggle # Start/stop recording
dotnet run -- setup # Interactive configuration wizard
dotnet run -- show # Display current configuration
dotnet run -- config <key> <value> # Update a config setting
Test (No Test Project Currently)
There is no test project configured. Tests would need to be added manually if required.
Dependencies (Linux System Packages)
The application requires these system binaries in PATH:
- ffmpeg - Audio recording from ALSA
- notify-send - Desktop notifications
- xdotool OR wtype - Text injection (X11 vs Wayland)
Code Organization
Toak/
├── Program.cs # Entry point, CLI argument handling
├── AudioRecorder.cs # ffmpeg process wrapper for recording
├── GroqApiClient.cs # HTTP client for Whisper + Llama APIs
├── PromptBuilder.cs # Dynamic system prompt construction
├── TextInjector.cs # xdotool/wtype wrapper for typing text
├── ConfigManager.cs # JSON config load/save (~/.config/toak/)
├── StateTracker.cs # PID-based recording state via /tmp/
├── Notifications.cs # notify-send wrapper
├── Toak.csproj # .NET 10 SDK project
├── PROJECT_PLAN.md # Original architecture document
└── IMPLEMENTATION_PLAN.md # Implementation phases document
Code Patterns & Conventions
Namespace Style
- Use file-scoped namespaces: namespace Toak; at the top of the file
- Never use block-style namespace declarations
Class Structure
- Static classes for stateless utilities: ConfigManager, StateTracker, Notifications, TextInjector, PromptBuilder, AudioRecorder
- Instance classes for stateful clients: GroqApiClient (holds HttpClient)
- POCOs for JSON serialization at bottom of GroqApiClient.cs
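As a rough illustration of the namespace and class-structure conventions above, a new file might be laid out as follows. This is a sketch only: the type names ending in Sketch are hypothetical and do not exist in the repository.

```csharp
using System;
using System.Net.Http;

namespace Toak; // file-scoped: covers the whole file, no braces

// Stateless utility: static class, no state kept between calls.
public static class TextUtilSketch // hypothetical name
{
    // Collapses any run of whitespace into a single space and trims the ends.
    public static string Collapse(string text) =>
        string.Join(' ', text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries));
}

// Stateful client: instance class that owns an HttpClient.
public sealed class ApiClientSketch : IDisposable // hypothetical name
{
    private readonly HttpClient _httpClient = new();

    public ApiClientSketch(string baseUrl) => _httpClient.BaseAddress = new Uri(baseUrl);

    public Uri? BaseAddress => _httpClient.BaseAddress;

    public void Dispose() => _httpClient.Dispose();
}
```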
Naming Conventions
- PascalCase for classes, methods, properties
- Private fields prefixed with underscore: _httpClient
- Constants use PascalCase: ConfigDir, StateFilePath
- JSON property names use camelCase with [JsonPropertyName] attributes
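The naming rules above can be seen together in a small sketch. The POCO, its single field, and the sample JSON are illustrative assumptions, not the actual response types in GroqApiClient.cs.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

namespace Toak;

// Hypothetical POCO: PascalCase property mapped to a camelCase JSON name.
public class TranscriptionResultSketch
{
    [JsonPropertyName("text")]
    public string Text { get; set; } = string.Empty;
}

public static class NamingSketch // hypothetical name
{
    // Constant in PascalCase.
    private const string SampleJson = "{\"text\":\"hello world\"}";

    // Private field with underscore prefix.
    private static readonly JsonSerializerOptions _options = new();

    // PascalCase method name; returns the "text" field of the sample JSON.
    public static string ReadSampleText()
    {
        var result = JsonSerializer.Deserialize<TranscriptionResultSketch>(SampleJson, _options);
        return result?.Text ?? string.Empty;
    }
}
```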
Error Handling
- Try/catch with console logging: Console.WriteLine($"[ClassName] Error: {ex.Message}"); Note that Console.WriteLine writes to stdout; use Console.Error.WriteLine if the message must reach stderr
- User-facing errors go through Notifications.Notify() for desktop alerts
- Silent failures are acceptable for non-critical paths (notifications, cleanup)
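A minimal sketch of the error-handling pattern above. Because the signature of Notifications.Notify() is not shown in this guide, the notifier is passed in as a delegate; the helper name TryRun is hypothetical.

```csharp
using System;

namespace Toak;

public static class ErrorHandlingSketch // hypothetical name
{
    // Runs an action. On failure, logs with a class-name tag, forwards a
    // user-facing message to the notifier, and reports false to the caller.
    public static bool TryRun(Action action, Action<string> notify)
    {
        try
        {
            action();
            return true;
        }
        catch (Exception ex)
        {
            // Same call the guide shows; Console.WriteLine targets stdout.
            Console.WriteLine($"[ErrorHandlingSketch] Error: {ex.Message}");
            try
            {
                notify($"Toak error: {ex.Message}");
            }
            catch
            {
                // Notification failures are non-critical and stay silent.
            }
            return false;
        }
    }
}
```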
Async Patterns
- Use async Task<T> for I/O operations (API calls)
- Use synchronous methods for process spawning where Process.Start() is fire-and-forget
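The split between the two styles might look like the sketch below. A file read stands in for the I/O-bound call so the example has no network dependency; both method names are hypothetical.

```csharp
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

namespace Toak;

public static class AsyncPatternSketch // hypothetical name
{
    // I/O-bound work returns Task<T> and is awaited by the caller.
    public static async Task<int> CountBytesAsync(string path)
    {
        byte[] data = await File.ReadAllBytesAsync(path);
        return data.Length;
    }

    // Fire-and-forget spawn: synchronous, nothing to await afterwards.
    public static bool TrySpawn(string fileName, string arguments)
    {
        try
        {
            // Disposing the Process object releases the handle only;
            // it does not stop the child process.
            using var process = Process.Start(new ProcessStartInfo(fileName, arguments)
            {
                UseShellExecute = false,
                CreateNoWindow = true,
            });
            return process is not null;
        }
        catch
        {
            return false; // e.g. the tool is not installed or not in PATH
        }
    }
}
```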
Key Implementation Details
State Management (Critical)
Recording state is tracked via file-based PID tracking (not in-memory):
- State file: /tmp/toak_state.pid (contains ffmpeg process ID)
- Audio file: /tmp/toak_recording.wav
- Toggle mechanism: New process checks state file, signals existing ffmpeg process to stop
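The file-based state tracking described above could be sketched like this. It is not the real StateTracker: the method names are assumptions, and the state file path is a parameter so the sketch can be tried against a temporary file instead of /tmp/toak_state.pid.

```csharp
using System.IO;

namespace Toak;

public static class StateFileSketch // hypothetical; not the real StateTracker
{
    // Records the PID of the running recorder process.
    public static void Save(string stateFile, int pid) =>
        File.WriteAllText(stateFile, pid.ToString());

    // Returns the stored PID, or null if the file is missing or malformed.
    public static int? Load(string stateFile)
    {
        if (!File.Exists(stateFile)) return null;
        return int.TryParse(File.ReadAllText(stateFile).Trim(), out int pid) ? pid : null;
    }

    // Removes the state file once the recording has been stopped.
    public static void Clear(string stateFile)
    {
        if (File.Exists(stateFile)) File.Delete(stateFile);
    }

    // Toggle decision: a recording is treated as active when a PID is stored.
    public static bool IsRecording(string stateFile) => Load(stateFile) is not null;
}
```

A toggle invocation would then call IsRecording first: if false, start ffmpeg and Save its PID; if true, Load the PID, stop that process, and Clear the file.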
Configuration Storage
- Location: ~/.config/toak/config.json
- Format: JSON with PascalCase property names
- Default values are set in the ToakConfig class (constructor pattern)
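A minimal load/save sketch for the storage scheme above, assuming System.Text.Json with its default naming (which keeps PascalCase property names). The class names and the Language property are hypothetical; only GroqApiKey is mentioned in this guide.

```csharp
using System;
using System.IO;
using System.Text.Json;

namespace Toak;

public class ToakConfigSketch // hypothetical stand-in for ToakConfig
{
    public string GroqApiKey { get; set; } = string.Empty;
    public string Language { get; set; } = "en"; // hypothetical option and default
}

public static class ConfigStoreSketch // hypothetical name
{
    private static readonly JsonSerializerOptions _options = new() { WriteIndented = true };

    // Resolves to ~/.config/toak/config.json for the current user.
    public static string DefaultPath => Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
        ".config", "toak", "config.json");

    // Falls back to defaults when the file is missing or not valid JSON.
    public static ToakConfigSketch Load(string path)
    {
        if (!File.Exists(path)) return new ToakConfigSketch();
        try
        {
            return JsonSerializer.Deserialize<ToakConfigSketch>(File.ReadAllText(path), _options)
                   ?? new ToakConfigSketch();
        }
        catch (JsonException)
        {
            return new ToakConfigSketch();
        }
    }

    // Expects a path that includes a directory component.
    public static void Save(string path, ToakConfigSketch config)
    {
        Directory.CreateDirectory(Path.GetDirectoryName(path)!);
        File.WriteAllText(path, JsonSerializer.Serialize(config, _options));
    }
}
```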
API Integration (Groq)
- Base URL: https://api.groq.com/openai/v1/
- Authentication: Bearer token via Authorization header
- Models: whisper-large-v3-turbo (STT), llama-3.1-8b-instant (refinement)
- Temperature: Always 0.0 to keep output as deterministic as possible
- Security: Transcript wrapped in <transcript> tags to reduce the risk of prompt injection
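The settings above might translate into request setup along these lines. This is a sketch, not the contents of GroqApiClient.cs: the helper names are hypothetical, and the body shape assumes the OpenAI-compatible chat completions format that the base URL implies.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;

namespace Toak;

public static class GroqRequestSketch // hypothetical name
{
    public const string BaseUrl = "https://api.groq.com/openai/v1/";

    // HttpClient with the base address and Bearer token preconfigured.
    public static HttpClient CreateClient(string apiKey)
    {
        var client = new HttpClient { BaseAddress = new Uri(BaseUrl) };
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", apiKey);
        return client;
    }

    // Delimits untrusted dictated text so the system prompt can refer to it.
    public static string WrapTranscript(string transcript) =>
        $"<transcript>{transcript}</transcript>";

    // JSON body for a refinement request (assumed to be POSTed to chat/completions).
    public static string BuildRefinementBody(string systemPrompt, string transcript) =>
        JsonSerializer.Serialize(new
        {
            model = "llama-3.1-8b-instant",
            temperature = 0.0,
            messages = new object[]
            {
                new { role = "system", content = systemPrompt },
                new { role = "user", content = WrapTranscript(transcript) },
            },
        });
}
```

Wrapping alone does not stop a transcript that itself contains a closing </transcript> tag, so the system prompt still has to instruct the model to treat the tagged content as data.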
Process Wrappers
All external tool calls use ProcessStartInfo with:
- UseShellExecute = false
- CreateNoWindow = true
- Arguments properly escaped (quote replacement for text injection)
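A sketch of what such a wrapper could look like for text injection. The helper names are hypothetical, and the xdotool type invocation is an assumption about how TextInjector calls the tool; only the two ProcessStartInfo flags and the quote replacement come from this guide.

```csharp
using System.Diagnostics;

namespace Toak;

public static class ProcessWrapperSketch // hypothetical name
{
    // Quote replacement: each double quote becomes \" so the text stays one argument.
    public static string EscapeQuotes(string text) => text.Replace("\"", "\\\"");

    // Builds (but does not start) the command that would type the text.
    public static ProcessStartInfo BuildTypeCommand(string text) =>
        new ProcessStartInfo
        {
            FileName = "xdotool",
            Arguments = $"type \"{EscapeQuotes(text)}\"",
            UseShellExecute = false,
            CreateNoWindow = true,
        };
}
```

ProcessStartInfo.ArgumentList is an alternative that takes each argument separately and needs no manual quoting; whether the project would prefer it is not stated here.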
Testing Approach
No automated tests currently exist. The application relies on:
- Manual testing via dotnet run -- toggle
- Checking /tmp/toak_recording.wav exists during recording
- Verifying notify-send displays status messages
- Confirming text appears in active window after transcription
Important Gotchas
- Linux Only: This application cannot run on Windows/Mac - it depends on ffmpeg with ALSA, notify-send, and X11/Wayland tools
- Process Kill Behavior: process.Kill() sends SIGKILL to ffmpeg. This is intentional for immediate stop, but means graceful shutdown isn't attempted
- State File Orphaning: If the app crashes, /tmp/toak_state.pid may be left behind. The next run will attempt to use a stale PID (handled by try/catch in StopRecording)
- API Key Required: Without GroqApiKey configured via toak setup, the app will fail with a notification error
- Quote Escaping in TextInjector: Text containing quotes is escaped as \" for shell safety
- ImplicitUsings Enabled: No explicit using System; etc. required - .NET 10 implicit usings handle common namespaces
- Nullable Enabled: All projects use <Nullable>enable</Nullable> - handle nulls properly
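The kill and stale-PID gotchas above can be sketched together. This is not the actual StopRecording code: the method name is hypothetical, and the process-name check is an added assumption to avoid killing an unrelated process that has reused an orphaned PID.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

namespace Toak;

public static class StalePidSketch // hypothetical name
{
    // Returns true if a live process with the expected name was killed,
    // false if the PID was stale, reused by another program, or not killable.
    public static bool TryKill(int pid, string expectedName)
    {
        try
        {
            using var process = Process.GetProcessById(pid); // throws if no such process
            if (!process.ProcessName.Contains(expectedName, StringComparison.OrdinalIgnoreCase))
                return false; // PID now belongs to something else
            process.Kill();            // SIGKILL on Linux, no graceful shutdown
            process.WaitForExit(2000); // give the OS up to 2 s to reap it
            return true;
        }
        catch (ArgumentException) { return false; }         // stale PID
        catch (InvalidOperationException) { return false; } // exited in the meantime
        catch (Win32Exception) { return false; }            // could not be terminated
    }
}
```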
Adding New Features
When modifying this codebase:
- Maintain static/instance pattern: Stateless utilities = static, Stateful clients = instance
- Follow file-scoped namespace: Single namespace Toak; at top
- Use System.Text.Json: Prefer over Newtonsoft.Json (already configured)
- Add config options: Update ToakConfig class, then wire in Program.cs CLI handling
- External dependencies: If adding new system tool calls, follow ProcessStartInfo pattern in existing classes
- Error handling: Use Notifications for user-visible errors, Console.WriteLine for debug info
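For the config-option step above, the wiring behind config <key> <value> might resemble the sketch below. The key names, the RefineWithLlm option, and both class names are hypothetical; the real handling lives in Program.cs and may differ.

```csharp
using System;

namespace Toak;

public class ConfigOptionsSketch // hypothetical stand-in for ToakConfig
{
    public string GroqApiKey { get; set; } = string.Empty;
    public bool RefineWithLlm { get; set; } = true; // hypothetical new option
}

public static class ConfigCommandSketch // hypothetical name
{
    // Applies one key/value pair; returns false for an unknown key
    // or a value that cannot be parsed for that key.
    public static bool TryApply(ConfigOptionsSketch config, string key, string value)
    {
        switch (key.ToLowerInvariant())
        {
            case "groqapikey":
                config.GroqApiKey = value;
                return true;
            case "refinewithllm":
                if (!bool.TryParse(value, out bool flag)) return false;
                config.RefineWithLlm = flag;
                return true;
            default:
                return false;
        }
    }
}
```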
Documentation References
- PROJECT_PLAN.md - Original architecture and design goals
- IMPLEMENTATION_PLAN.md - Detailed phase-by-phase implementation notes