Toak/AGENTS.md
2026-02-25 21:51:27 +01:00

AGENTS.md - Toak Project Guide

This document helps AI agents work effectively in the Toak codebase.

Project Overview

Toak is a high-speed Linux dictation system written in C#/.NET 10. It captures audio via ffmpeg, transcribes via Groq's Whisper API, refines via Llama 3.1, and types the result into the active window using xdotool/wtype.

Repository: C# console application using .NET 10 SDK
Platform: Linux only (requires ALSA/PulseAudio, notify-send, xdotool/wtype)


Essential Commands

Build & Run

# Build the project
dotnet build

# Build for release
dotnet build -c Release

# Run with arguments
dotnet run -- toggle      # Start/stop recording
dotnet run -- setup       # Interactive configuration wizard
dotnet run -- show        # Display current configuration
dotnet run -- config <key> <value>  # Update a config setting

Test (No Test Project Currently)

There is no test project configured. Tests would need to be added manually if required.

Dependencies (Linux System Packages)

The application requires these system binaries in PATH:

  • ffmpeg - Audio recording from ALSA
  • notify-send - Desktop notifications
  • xdotool OR wtype - Text injection (X11 vs Wayland)

Code Organization

Toak/
├── Program.cs           # Entry point, CLI argument handling
├── AudioRecorder.cs     # ffmpeg process wrapper for recording
├── GroqApiClient.cs     # HTTP client for Whisper + Llama APIs
├── PromptBuilder.cs     # Dynamic system prompt construction
├── TextInjector.cs      # xdotool/wtype wrapper for typing text
├── ConfigManager.cs     # JSON config load/save (~/.config/toak/)
├── StateTracker.cs      # PID-based recording state via /tmp/
├── Notifications.cs     # notify-send wrapper
├── Toak.csproj          # .NET 10 SDK project
├── PROJECT_PLAN.md      # Original architecture document
└── IMPLEMENTATION_PLAN.md  # Implementation phases document

Code Patterns & Conventions

Namespace Style

  • Use file-scoped namespaces: namespace Toak; at the top of the file
  • Never use block-style namespace declarations

Class Structure

  • Static classes for stateless utilities: ConfigManager, StateTracker, Notifications, TextInjector, PromptBuilder, AudioRecorder
  • Instance classes for stateful clients: GroqApiClient (holds HttpClient)
  • POCOs for JSON serialization at bottom of GroqApiClient.cs

Naming Conventions

  • PascalCase for classes, methods, properties
  • Private fields prefixed with underscore: _httpClient
  • Constants use PascalCase: ConfigDir, StateFilePath
  • JSON property names use camelCase with [JsonPropertyName] attributes
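
The conventions above can be sketched together in one small file. This is only an illustration: ExampleResponse and ExampleClient are hypothetical names, not types from the Toak source, and the real POCOs at the bottom of GroqApiClient.cs may look different.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical POCO, for illustration only.
public class ExampleResponse
{
    // PascalCase property in C#, explicit lowercase name in the JSON payload.
    [JsonPropertyName("text")]
    public string Text { get; set; } = string.Empty;
}

// Hypothetical instance class that holds state (an HttpClient).
public class ExampleClient
{
    // Constants use PascalCase.
    private const string BaseUrl = "https://api.groq.com/openai/v1/";

    // Private fields are prefixed with an underscore.
    private readonly HttpClient _httpClient = new() { BaseAddress = new Uri(BaseUrl) };

    public Uri? BaseAddress => _httpClient.BaseAddress;

    // Maps the "text" JSON property onto ExampleResponse.Text.
    public static ExampleResponse? Parse(string json) =>
        JsonSerializer.Deserialize<ExampleResponse>(json);
}
```

With this shape, ExampleClient.Parse("{\"text\":\"hello\"}") yields an object whose Text property is "hello".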

Error Handling

  • Try/catch with console logging: Console.WriteLine($"[ClassName] Error: {ex.Message}"); (note that Console.WriteLine writes to stdout; writing to stderr would require Console.Error.WriteLine)
  • User-facing errors go through Notifications.Notify() for desktop alerts
  • Silent failures are acceptable for non-critical paths (notifications, cleanup)
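
A minimal sketch of this error-handling pattern follows. NotificationsStub and ExampleWorker are hypothetical stand-ins, and the Notify(summary, body) signature is an assumption; the real Notifications class wraps notify-send and may take different parameters.

```csharp
using System;

// Hypothetical stand-in for the project's Notifications class.
// It records the message instead of spawning notify-send.
public static class NotificationsStub
{
    public static string? LastMessage { get; private set; }

    public static void Notify(string summary, string body = "")
    {
        try
        {
            LastMessage = $"{summary}: {body}";
        }
        catch
        {
            // Silent failure: a broken notification must not crash the app.
        }
    }
}

// Hypothetical worker showing the try/catch + log + notify pattern.
public static class ExampleWorker
{
    public static bool TryRun(Action work)
    {
        try
        {
            work();
            return true;
        }
        catch (Exception ex)
        {
            // Debug output for the developer, prefixed with the class name.
            Console.WriteLine($"[ExampleWorker] Error: {ex.Message}");
            // User-facing alert.
            NotificationsStub.Notify("Toak Error", ex.Message);
            return false;
        }
    }
}
```

The caller gets a simple success flag, while the log line and the desktop alert carry the details.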

Async Patterns

  • Use async Task<T> for I/O operations (API calls)
  • Use synchronous methods for process spawning where Process.Start() is fire-and-forget

Key Implementation Details

State Management (Critical)

Recording state is tracked through a PID file on disk rather than in memory, because each toggle invocation runs as a separate process:

  • State file: /tmp/toak_state.pid (contains ffmpeg process ID)
  • Audio file: /tmp/toak_recording.wav
  • Toggle mechanism: New process checks state file, signals existing ffmpeg process to stop
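
The toggle mechanism can be sketched roughly as below. StateSketch and its method names are hypothetical, and the state file path is passed in as a parameter so the sketch can be tried against a temporary file; the real StateTracker and AudioRecorder may divide this work differently.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;

// Hypothetical sketch of PID-file based state tracking.
public static class StateSketch
{
    // Returns the recorded PID, or null if the file is missing or malformed.
    public static int? ReadPid(string stateFilePath)
    {
        if (!File.Exists(stateFilePath)) return null;
        var content = File.ReadAllText(stateFilePath).Trim();
        return int.TryParse(content, out var pid) ? pid : (int?)null;
    }

    // Called after the recorder process has been started.
    public static void WritePid(string stateFilePath, int pid) =>
        File.WriteAllText(stateFilePath, pid.ToString());

    // Stops the recorded process if one is tracked.
    // Returns false when no recording was tracked.
    public static bool Stop(string stateFilePath)
    {
        var pid = ReadPid(stateFilePath);
        if (pid is null) return false;

        try
        {
            using var process = Process.GetProcessById(pid.Value);
            process.Kill();
        }
        catch (Exception ex) when (ex is ArgumentException
                                      or InvalidOperationException
                                      or Win32Exception)
        {
            // Stale PID: the process is already gone or cannot be signalled.
            Console.WriteLine($"[StateSketch] Error: {ex.Message}");
        }
        finally
        {
            // Always clear the state file so the next toggle starts fresh.
            File.Delete(stateFilePath);
        }
        return true;
    }
}
```

A toggle then reduces to: if ReadPid returns a value, call Stop; otherwise start the recorder and call WritePid with its process ID.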

Configuration Storage

  • Location: ~/.config/toak/config.json
  • Format: JSON with PascalCase property names
  • Default values are set in the ToakConfig class constructor
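
As a rough sketch of how such a config file can be loaded and saved with System.Text.Json: ToakConfigSketch and ConfigSketch are hypothetical names, ExampleSetting is an invented option, and only GroqApiKey is taken from this guide. The actual ConfigManager may handle errors differently.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical config class; defaults are applied when a key is absent.
public class ToakConfigSketch
{
    public string GroqApiKey { get; set; } = string.Empty;

    // Invented option, shown only to illustrate a default value.
    public string ExampleSetting { get; set; } = "default";
}

public static class ConfigSketch
{
    private static readonly JsonSerializerOptions Options = new() { WriteIndented = true };

    // Resolves to ~/.config/toak/config.json for the current user.
    public static string DefaultPath => Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
        ".config", "toak", "config.json");

    // Falls back to defaults if the file is missing or not valid JSON.
    public static ToakConfigSketch Load(string path)
    {
        if (!File.Exists(path)) return new ToakConfigSketch();
        try
        {
            return JsonSerializer.Deserialize<ToakConfigSketch>(File.ReadAllText(path))
                   ?? new ToakConfigSketch();
        }
        catch (JsonException)
        {
            return new ToakConfigSketch();
        }
    }

    // Creates the directory on first save; property names stay PascalCase.
    public static void Save(string path, ToakConfigSketch config)
    {
        var dir = Path.GetDirectoryName(path);
        if (!string.IsNullOrEmpty(dir)) Directory.CreateDirectory(dir);
        File.WriteAllText(path, JsonSerializer.Serialize(config, Options));
    }
}
```

System.Text.Json keeps C# property names as-is by default, which matches the PascalCase format described above.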

API Integration (Groq)

  • Base URL: https://api.groq.com/openai/v1/
  • Authentication: Bearer token via Authorization header
  • Models: whisper-large-v3-turbo (STT), llama-3.1-8b-instant (refinement)
  • Temperature: Always 0.0 for deterministic output
  • Security: Transcript wrapped in <transcript> tags to prevent prompt injection
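
The settings above might translate into a refinement request along these lines. RefinementRequestSketch is a hypothetical helper, and the message layout (a system prompt plus one user message) is an assumption based on the OpenAI-compatible chat format, not a copy of GroqApiClient.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;

// Hypothetical helper showing how the documented settings fit together.
public static class RefinementRequestSketch
{
    // Base URL and Bearer authentication as described in this guide.
    public static HttpClient CreateClient(string apiKey)
    {
        var client = new HttpClient
        {
            BaseAddress = new Uri("https://api.groq.com/openai/v1/")
        };
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", apiKey);
        return client;
    }

    // Builds the JSON body: fixed model, temperature 0.0, and the raw
    // transcript wrapped in <transcript> tags so the model treats it as
    // data rather than as instructions.
    public static string BuildPayload(string systemPrompt, string rawTranscript)
    {
        var payload = new
        {
            model = "llama-3.1-8b-instant",
            temperature = 0.0,
            messages = new object[]
            {
                new { role = "system", content = systemPrompt },
                new { role = "user", content = $"<transcript>{rawTranscript}</transcript>" }
            }
        };
        return JsonSerializer.Serialize(payload);
    }
}
```

The serialized body would then be posted to the chat completions endpoint relative to the base URL. Note that System.Text.Json escapes < and > as \u003C and \u003E in the raw JSON text by default; the receiving side decodes them back to the literal tags.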

Process Wrappers

All external tool calls use ProcessStartInfo with:

  • UseShellExecute = false
  • CreateNoWindow = true
  • Arguments properly escaped (quote replacement for text injection)
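
A small sketch of this wrapper pattern, using text injection as the example. ProcessSketch and its methods are hypothetical, and the exact xdotool arguments used by TextInjector are an assumption.

```csharp
using System.Diagnostics;

// Hypothetical sketch of the ProcessStartInfo pattern.
public static class ProcessSketch
{
    // Quote replacement: each double quote becomes \" so the text can sit
    // inside one quoted argument.
    public static string EscapeQuotes(string text) => text.Replace("\"", "\\\"");

    // Builds the start info without launching anything.
    public static ProcessStartInfo BuildTypeCommand(string text) => new ProcessStartInfo
    {
        FileName = "xdotool",
        Arguments = $"type \"{EscapeQuotes(text)}\"",
        UseShellExecute = false,   // start the binary directly, no shell
        CreateNoWindow = true
    };
}
```

Passing the result to Process.Start launches the tool. ProcessStartInfo.ArgumentList is a possible alternative that avoids manual escaping, though this guide describes the quote-replacement approach.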

Testing Approach

No automated tests currently exist. The application relies on:

  1. Manual testing via dotnet run -- toggle
  2. Checking /tmp/toak_recording.wav exists during recording
  3. Verifying notify-send displays status messages
  4. Confirming text appears in active window after transcription

Important Gotchas

  1. Linux Only: This application cannot run on Windows or macOS. It depends on ffmpeg with ALSA, notify-send, and X11/Wayland tools

  2. Process Kill Behavior: process.Kill() sends SIGKILL to ffmpeg. This is intentional for immediate stop, but means graceful shutdown isn't attempted

  3. State File Orphaning: If the app crashes, /tmp/toak_state.pid may be left behind. The next run will attempt to use a stale PID (handled by try/catch in StopRecording)

  4. API Key Required: Without GroqApiKey configured via toak setup, the app will fail with a notification error

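  5. Quote Escaping in TextInjector: Double quotes in the text are escaped as \" so the text stays a single quoted argument (processes are started with UseShellExecute = false, so no shell parses it)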

  6. ImplicitUsings Enabled: No explicit using System; etc. required - .NET 10 implicit usings handle common namespaces

  7. Nullable Enabled: The project sets <Nullable>enable</Nullable> in Toak.csproj - handle nulls properly


Adding New Features

When modifying this codebase:

  1. Maintain static/instance pattern: Stateless utilities = static, Stateful clients = instance
  2. Follow file-scoped namespace: Single namespace Toak; at top
  3. Use System.Text.Json: Prefer over Newtonsoft.Json (already configured)
  4. Add config options: Update ToakConfig class, then wire in Program.cs CLI handling
  5. External dependencies: If adding new system tool calls, follow ProcessStartInfo pattern in existing classes
  6. Error handling: Use Notifications for user-visible errors, Console.WriteLine for debug info

Documentation References

  • PROJECT_PLAN.md - Original architecture and design goals
  • IMPLEMENTATION_PLAN.md - Detailed phase-by-phase implementation notes