# AGENTS.md - Toak Project Guide

This document helps AI agents work effectively in the Toak codebase.

## Project Overview

**Toak** is a high-speed Linux dictation system written in C#/.NET 10. It captures audio via ffmpeg, transcribes via Groq's Whisper API, refines via Llama 3.1, and types the result into the active window using xdotool/wtype.

**Repository**: C# console application using .NET 10 SDK
**Platform**: Linux only (requires ALSA/PulseAudio, notify-send, xdotool/wtype)

---

## Essential Commands

### Build & Run
```bash
# Build the project
dotnet build

# Build for release
dotnet build -c Release

# Run with arguments
dotnet run -- toggle                 # Start/stop recording
dotnet run -- setup                  # Interactive configuration wizard
dotnet run -- show                   # Display current configuration
dotnet run -- config <key> <value>   # Update a config setting
```

### Test (No Test Project Currently)
There is no test project configured. Tests would need to be added manually if required.

### Dependencies (Linux System Packages)
The application requires these system binaries in PATH:
- `ffmpeg` - Audio recording from ALSA
- `notify-send` - Desktop notifications
- `xdotool` (X11) OR `wtype` (Wayland) - Text injection

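A missing binary is easier to diagnose up front than through a failed process launch. The sketch below shows one way to check the list above against PATH. `DependencySketch`, `IsOnPath`, and `MissingTools` are hypothetical names, not part of the Toak codebase, and the check only tests that a file exists, not that it is executable.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DependencySketch
{
    // True when a file named `binary` exists in one of the PATH directories.
    // Existence only: this sketch does not inspect the execute permission bit.
    public static bool IsOnPath(string binary)
    {
        var path = Environment.GetEnvironmentVariable("PATH") ?? string.Empty;
        return path
            .Split(Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries)
            .Any(dir => File.Exists(Path.Combine(dir, binary)));
    }

    // Returns the required tools that were not found.
    public static string[] MissingTools()
    {
        var missing = new[] { "ffmpeg", "notify-send" }
            .Where(tool => !IsOnPath(tool))
            .ToList();

        // Either text injector is enough: xdotool for X11, wtype for Wayland.
        if (!IsOnPath("xdotool") && !IsOnPath("wtype"))
        {
            missing.Add("xdotool or wtype");
        }

        return missing.ToArray();
    }
}
```
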
---

## Code Organization

```
Toak/
├── Program.cs               # Entry point, CLI argument handling
├── AudioRecorder.cs         # ffmpeg process wrapper for recording
├── GroqApiClient.cs         # HTTP client for Whisper + Llama APIs
├── PromptBuilder.cs         # Dynamic system prompt construction
├── TextInjector.cs          # xdotool/wtype wrapper for typing text
├── ConfigManager.cs         # JSON config load/save (~/.config/toak/)
├── StateTracker.cs          # PID-based recording state via /tmp/
├── Notifications.cs         # notify-send wrapper
├── Toak.csproj              # .NET 10 SDK project
├── PROJECT_PLAN.md          # Original architecture document
└── IMPLEMENTATION_PLAN.md   # Implementation phases document
```

---

## Code Patterns & Conventions

### Namespace Style
- Use **file-scoped namespaces**: `namespace Toak;` at the top of the file
- Never use block-style namespace declarations

### Class Structure
- **Static classes** for stateless utilities: `ConfigManager`, `StateTracker`, `Notifications`, `TextInjector`, `PromptBuilder`, `AudioRecorder`
- **Instance classes** for stateful clients: `GroqApiClient` (holds HttpClient)
- **POCOs** for JSON serialization at bottom of `GroqApiClient.cs`

### Naming Conventions
- PascalCase for classes, methods, properties
- Private fields prefixed with underscore: `_httpClient`
- Constants use PascalCase: `ConfigDir`, `StateFilePath`
- JSON property names use camelCase with `[JsonPropertyName]` attributes

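The namespace, class-structure, and naming rules above can be seen together in a small skeleton. Every type and member name here (`ExampleUtility`, `ExampleClient`, `ExampleResponse`, the `text` JSON field) is hypothetical and only illustrates the conventions. None of it is taken from the Toak sources.

```csharp
using System;
using System.Net.Http;
using System.Text.Json.Serialization;

namespace Toak; // file-scoped namespace, no braces

// Stateless utility: static class, PascalCase members and constants.
public static class ExampleUtility
{
    private const string ExampleDir = "/tmp";

    public static string BuildPath(string fileName) => $"{ExampleDir}/{fileName}";
}

// Stateful client: instance class with an underscore-prefixed private field.
public class ExampleClient
{
    private readonly HttpClient _httpClient;

    public ExampleClient(HttpClient httpClient) => _httpClient = httpClient;

    public Uri? BaseAddress => _httpClient.BaseAddress;
}

// POCO for JSON: PascalCase C# property, JSON name supplied by the attribute.
public class ExampleResponse
{
    [JsonPropertyName("text")]
    public string Text { get; set; } = string.Empty;
}
```

With this shape, `JsonSerializer.Deserialize<ExampleResponse>("{\"text\":\"hi\"}")` fills `Text` from the lowercase JSON field while the C# side stays PascalCase.
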
### Error Handling
- Try/catch with console logging: `Console.WriteLine($"[ClassName] Error: {ex.Message}");` (this writes to stdout; use `Console.Error.WriteLine` when the message should go to stderr)
- User-facing errors go through `Notifications.Notify()` for desktop alerts
- Silent failures are acceptable for non-critical paths (notifications, cleanup)

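One way to combine the three rules above is a small wrapper that logs, notifies, and swallows notification failures. This is a sketch under assumptions: `ErrorHandlingSketch` and `TryRun` are hypothetical, and the `notify` callback stands in for `Notifications.Notify()`, whose real signature is not shown in this guide.

```csharp
using System;

public static class ErrorHandlingSketch
{
    // Runs `action`; on failure logs to stderr, alerts the user, and returns false.
    public static bool TryRun(string className, Action action, Action<string> notify)
    {
        try
        {
            action();
            return true;
        }
        catch (Exception ex)
        {
            // Console.Error targets stderr; Console.WriteLine would target stdout.
            Console.Error.WriteLine($"[{className}] Error: {ex.Message}");

            try
            {
                notify($"Toak error: {ex.Message}");
            }
            catch
            {
                // Notifications are non-critical, so a failure here is ignored.
            }

            return false;
        }
    }
}
```

A caller would pass the real notifier, for example `ErrorHandlingSketch.TryRun("AudioRecorder", StartRecording, msg => Notifications.Notify(msg))`, assuming `Notify` accepts a single message string.
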
### Async Patterns
- Use `async Task<T>` for I/O operations (API calls)
- Use synchronous methods for process spawning where `Process.Start()` is fire-and-forget

---

## Key Implementation Details

### State Management (Critical)
Recording state is tracked via **file-based PID tracking** (not in-memory):
- State file: `/tmp/toak_state.pid` (contains ffmpeg process ID)
- Audio file: `/tmp/toak_recording.wav`
- Toggle mechanism: New process checks state file, signals existing ffmpeg process to stop

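The toggle decision described above could look roughly like the following. This is a sketch, not the actual `StateTracker`: the helper names (`StateSketch`, `ReadPid`, `IsAlive`, `ShouldStopRecording`) are hypothetical, and only the state file path comes from this guide.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class StateSketch
{
    public const string StateFilePath = "/tmp/toak_state.pid";

    // Returns the recorded PID, or null when the file is missing or malformed.
    public static int? ReadPid(string path = StateFilePath)
    {
        if (!File.Exists(path)) return null;
        return int.TryParse(File.ReadAllText(path).Trim(), out var pid) ? pid : (int?)null;
    }

    // True when a process with that PID still exists.
    public static bool IsAlive(int pid)
    {
        try
        {
            using var process = Process.GetProcessById(pid);
            return !process.HasExited;
        }
        catch (Exception ex) when (ex is ArgumentException or InvalidOperationException)
        {
            return false; // no running process has this PID
        }
    }

    // True: a recording is in progress and should be stopped.
    // False: nothing is recording, so the caller should start one.
    public static bool ShouldStopRecording(string path = StateFilePath)
    {
        var pid = ReadPid(path);
        if (pid is null) return false;
        if (IsAlive(pid.Value)) return true;

        File.Delete(path); // stale PID left behind by a crash
        return false;
    }
}
```

A PID check alone cannot tell ffmpeg apart from an unrelated process that later reused the same PID, so a stricter version might also compare the process name.
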
### Configuration Storage
- Location: `~/.config/toak/config.json`
- Format: JSON with PascalCase property names
- Default values are set in the `ToakConfig` class constructor

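A minimal load/save round trip with `System.Text.Json` might look like this. It is a sketch under assumptions: `ToakConfigSketch` models only the `GroqApiKey` setting named elsewhere in this guide, and `ConfigSketch` is a hypothetical stand-in for `ConfigManager`. By default the serializer writes C# property names unchanged, which gives the PascalCase JSON described above.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical config shape; the real ToakConfig may hold more settings.
public class ToakConfigSketch
{
    public string GroqApiKey { get; set; } = string.Empty; // default value
}

public static class ConfigSketch
{
    // Resolves ~/.config/toak/config.json from the user's home directory.
    public static string DefaultPath => Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
        ".config", "toak", "config.json");

    private static readonly JsonSerializerOptions Options = new() { WriteIndented = true };

    // A missing or malformed file falls back to defaults.
    public static ToakConfigSketch Load(string path)
    {
        if (!File.Exists(path)) return new ToakConfigSketch();
        try
        {
            return JsonSerializer.Deserialize<ToakConfigSketch>(File.ReadAllText(path), Options)
                   ?? new ToakConfigSketch();
        }
        catch (JsonException)
        {
            return new ToakConfigSketch();
        }
    }

    // Creates the parent directory if needed, then writes the JSON file.
    public static void Save(ToakConfigSketch config, string path)
    {
        var dir = Path.GetDirectoryName(path);
        if (!string.IsNullOrEmpty(dir)) Directory.CreateDirectory(dir);
        File.WriteAllText(path, JsonSerializer.Serialize(config, Options));
    }
}
```

Typical use would be `var config = ConfigSketch.Load(ConfigSketch.DefaultPath);`, followed by a change to a property and a call to `ConfigSketch.Save(config, ConfigSketch.DefaultPath)`.
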
### API Integration (Groq)
- Base URL: `https://api.groq.com/openai/v1/`
- Authentication: Bearer token via `Authorization` header
- Models: `whisper-large-v3-turbo` (STT), `llama-3.1-8b-instant` (refinement)
- Temperature: Always 0.0 for deterministic output
- Security: Transcript wrapped in `<transcript>` tags to prevent prompt injection

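To make the settings above concrete, here is a sketch of how a refinement request could be assembled. It assumes the OpenAI-compatible `chat/completions` path and message format. `GroqRequestSketch` and its methods are hypothetical and do not reproduce `GroqApiClient`. No request is sent; the code only builds the client and the request.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

public static class GroqRequestSketch
{
    // Client with the base URL and bearer token from the list above.
    public static HttpClient CreateClient(string apiKey)
    {
        var client = new HttpClient { BaseAddress = new Uri("https://api.groq.com/openai/v1/") };
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return client;
    }

    // JSON body for the refinement call (assumed OpenAI-compatible shape).
    public static string BuildRefinementBody(string systemPrompt, string transcript)
    {
        var payload = new
        {
            model = "llama-3.1-8b-instant",
            temperature = 0.0,
            messages = new object[]
            {
                new { role = "system", content = systemPrompt },
                // The tags mark the transcript as data rather than instructions.
                new { role = "user", content = $"<transcript>{transcript}</transcript>" },
            },
        };

        // The default encoder writes < and > as \u003C and \u003E, which is
        // still valid JSON and decodes back to the original characters.
        return JsonSerializer.Serialize(payload);
    }

    // Relative URI, resolved against the client's BaseAddress when sent.
    public static HttpRequestMessage BuildRefinementRequest(string systemPrompt, string transcript) =>
        new(HttpMethod.Post, "chat/completions")
        {
            Content = new StringContent(
                BuildRefinementBody(systemPrompt, transcript), Encoding.UTF8, "application/json"),
        };
}
```

The tags only help if the system prompt also tells the model to treat everything inside `<transcript>` as text to clean up. A stricter version might strip a literal `</transcript>` from the input first.
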
### Process Wrappers
All external tool calls use `ProcessStartInfo` with:
- `UseShellExecute = false`
- `CreateNoWindow = true`
- Arguments properly escaped (quote replacement for text injection)

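The settings above can be sketched as a small helper. Note that this example deliberately swaps in `ProcessStartInfo.ArgumentList` instead of the quote replacement the codebase uses. Each list entry reaches the tool as one argument, so no manual escaping is needed. `ProcessSketch` and `BuildStartInfo` are hypothetical names.

```csharp
using System.Diagnostics;

public static class ProcessSketch
{
    // Builds start info for an external tool that is launched without a shell.
    public static ProcessStartInfo BuildStartInfo(string fileName, params string[] args)
    {
        var psi = new ProcessStartInfo
        {
            FileName = fileName,
            UseShellExecute = false, // start the binary directly
            CreateNoWindow = true,
        };

        // One entry per argument; quotes and spaces are passed through as-is.
        foreach (var arg in args)
        {
            psi.ArgumentList.Add(arg);
        }

        return psi;
    }
}

// Possible use (notify-send takes a summary followed by an optional body):
//   using var p = Process.Start(
//       ProcessSketch.BuildStartInfo("notify-send", "Toak", "Recording started"));
```
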
---

## Testing Approach

**No automated tests currently exist.** The application relies on:
1. Manual testing via `dotnet run -- toggle`
2. Checking `/tmp/toak_recording.wav` exists during recording
3. Verifying `notify-send` displays status messages
4. Confirming text appears in active window after transcription

---

## Important Gotchas

1. **Linux Only**: This application cannot run on Windows or macOS. It depends on `ffmpeg` with ALSA, `notify-send`, and X11/Wayland tools

2. **Process Kill Behavior**: `process.Kill()` sends SIGKILL to ffmpeg. This is intentional for an immediate stop, but it means ffmpeg gets no chance to shut down gracefully

3. **State File Orphaning**: If the app crashes, `/tmp/toak_state.pid` may be left behind. The next run will attempt to use a stale PID (handled by try/catch in `StopRecording`)

4. **API Key Required**: Without `GroqApiKey` configured via `toak setup`, the app will fail with a notification error

5. **Quote Escaping in TextInjector**: Text containing quotes is escaped as `\"` so it is passed safely as a process argument

6. **ImplicitUsings Enabled**: No explicit `using System;` etc. required - .NET 10 implicit usings handle common namespaces

7. **Nullable Enabled**: The project uses `<Nullable>enable</Nullable>` - handle nulls properly

---

## Adding New Features

When modifying this codebase:

1. **Maintain static/instance pattern**: Stateless utilities = static, Stateful clients = instance
2. **Follow file-scoped namespace**: Single `namespace Toak;` at top
3. **Use System.Text.Json**: Prefer it over Newtonsoft.Json (already configured)
4. **Add config options**: Update `ToakConfig` class, then wire in `Program.cs` CLI handling
5. **External dependencies**: If adding new system tool calls, follow the `ProcessStartInfo` pattern in existing classes
6. **Error handling**: Use `Notifications.Notify()` for user-visible errors, `Console.WriteLine` for debug info

---

## Documentation References

- `PROJECT_PLAN.md` - Original architecture and design goals
- `IMPLEMENTATION_PLAN.md` - Detailed phase-by-phase implementation notes