initial commit

2026-02-25 21:51:27 +01:00
commit 863063f124
15 changed files with 1330 additions and 0 deletions

AGENTS.md Normal file

@@ -0,0 +1,161 @@
# AGENTS.md - Toak Project Guide
This document helps AI agents work effectively in the Toak codebase.
## Project Overview
**Toak** is a high-speed Linux dictation system written in C#/.NET 10. It captures audio via ffmpeg, transcribes via Groq's Whisper API, refines via Llama 3.1, and types the result into the active window using xdotool/wtype.
**Repository**: C# console application using .NET 10 SDK
**Platform**: Linux only (requires ALSA/PulseAudio, notify-send, xdotool/wtype)
---
## Essential Commands
### Build & Run
```bash
# Build the project
dotnet build
# Build for release
dotnet build -c Release
# Run with arguments
dotnet run -- toggle # Start/stop recording
dotnet run -- setup # Interactive configuration wizard
dotnet run -- show # Display current configuration
dotnet run -- config <key> <value> # Update a config setting
```
### Test (No Test Project Currently)
There is no test project configured. Tests would need to be added manually if required.
### Dependencies (Linux System Packages)
The application requires these system binaries in PATH:
- `ffmpeg` - Audio recording from ALSA
- `notify-send` - Desktop notifications
- `xdotool` OR `wtype` - Text injection (X11 vs Wayland)
---
## Code Organization
```
Toak/
├── Program.cs # Entry point, CLI argument handling
├── AudioRecorder.cs # ffmpeg process wrapper for recording
├── GroqApiClient.cs # HTTP client for Whisper + Llama APIs
├── PromptBuilder.cs # Dynamic system prompt construction
├── TextInjector.cs # xdotool/wtype wrapper for typing text
├── ConfigManager.cs # JSON config load/save (~/.config/toak/)
├── StateTracker.cs # PID-based recording state via /tmp/
├── Notifications.cs # notify-send wrapper
├── Toak.csproj # .NET 10 SDK project
├── PROJECT_PLAN.md # Original architecture document
└── IMPLEMENTATION_PLAN.md # Implementation phases document
```
---
## Code Patterns & Conventions
### Namespace Style
- Use **file-scoped namespaces**: `namespace Toak;` at the top of the file
- Never use block-style namespace declarations
### Class Structure
- **Static classes** for stateless utilities: `ConfigManager`, `StateTracker`, `Notifications`, `TextInjector`, `PromptBuilder`, `AudioRecorder`
- **Instance classes** for stateful clients: `GroqApiClient` (holds HttpClient)
- **POCOs** for JSON serialization at bottom of `GroqApiClient.cs`
### Naming Conventions
- PascalCase for classes, methods, properties
- Private fields prefixed with underscore: `_httpClient`
- Constants use PascalCase: `ConfigDir`, `StateFilePath`
- JSON property names use camelCase with `[JsonPropertyName]` attributes
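The naming conventions above can be illustrated with a small POCO. This is a minimal sketch, not the project's actual code: `TranscriptionResponse` and `TranscriptionParser` are hypothetical names, and the single `text` field is an assumption about the shape of the transcription response.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical POCO (the real ones live at the bottom of GroqApiClient.cs).
// PascalCase property in C#, lower-case name on the wire via [JsonPropertyName].
public class TranscriptionResponse
{
    [JsonPropertyName("text")]
    public string Text { get; set; } = string.Empty;
}

// Hypothetical static helper: stateless, so it follows the static-class convention.
public static class TranscriptionParser
{
    // Returns the transcribed text, or an empty string if the field is missing.
    public static string ParseText(string json)
    {
        var response = JsonSerializer.Deserialize<TranscriptionResponse>(json);
        return response?.Text ?? string.Empty;
    }
}
```

Initializing `Text` to `string.Empty` keeps the property non-nullable, which fits a project with nullable reference types enabled.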
### Error Handling
- Try/catch with console logging: `Console.WriteLine($"[ClassName] Error: {ex.Message}");` (this writes to stdout; use `Console.Error.WriteLine` if the message must go to stderr)
- User-facing errors go through `Notifications.Notify()` for desktop alerts
- Silent failures are acceptable for non-critical paths (notifications, cleanup)
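A minimal sketch of the "log and continue" pattern for non-critical paths follows. `SafeRunner` and `TryRun` are hypothetical names introduced for illustration; the codebase may simply inline the try/catch in each class.

```csharp
using System;

// Hypothetical helper showing the error-handling convention described above.
public static class SafeRunner
{
    // Runs a non-critical action. On failure it logs in the
    // "[ClassName] Error: message" format and reports false instead of throwing.
    public static bool TryRun(string className, Action action)
    {
        try
        {
            action();
            return true;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"[{className}] Error: {ex.Message}");
            return false;
        }
    }
}
```

A caller could wrap a cleanup step as `SafeRunner.TryRun("StateTracker", () => File.Delete(path));` and ignore the result when the failure does not matter.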
### Async Patterns
- Use `async Task<T>` for I/O operations (API calls)
- Use synchronous methods for process spawning where `Process.Start()` is fire-and-forget
---
## Key Implementation Details
### State Management (Critical)
Recording state is tracked via **file-based PID tracking** (not in-memory):
- State file: `/tmp/toak_state.pid` (contains ffmpeg process ID)
- Audio file: `/tmp/toak_recording.wav`
- Toggle mechanism: New process checks state file, signals existing ffmpeg process to stop
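The file-based PID tracking described above might look roughly like this. It is a sketch under assumptions: `PidStateFile` is a hypothetical name, and the path is passed in as a parameter so the example can be exercised without touching `/tmp/toak_state.pid`.

```csharp
using System.Globalization;
using System.IO;

// Hypothetical stand-in for the state-tracking logic; not the actual StateTracker.
public static class PidStateFile
{
    // Records the recorder's process ID so a later invocation can find it.
    public static void Write(string path, int pid) =>
        File.WriteAllText(path, pid.ToString(CultureInfo.InvariantCulture));

    // Returns the stored PID, or null if no recording is in progress
    // (file missing) or the file content is not a valid integer.
    public static int? Read(string path)
    {
        if (!File.Exists(path)) return null;
        var text = File.ReadAllText(path).Trim();
        return int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out var pid)
            ? (int?)pid
            : null;
    }

    // Removes the state file once recording has stopped.
    public static void Clear(string path)
    {
        if (File.Exists(path)) File.Delete(path);
    }
}
```

A toggle would then reduce to: if `Read` returns a PID, stop that process and call `Clear`; otherwise start ffmpeg and call `Write` with its ID.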
### Configuration Storage
- Location: `~/.config/toak/config.json`
- Format: JSON with PascalCase property names
- Defaults: set in the `ToakConfig` class constructor
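As an illustration of the storage scheme above, here is a minimal load/save sketch using System.Text.Json. `SketchConfig` and `SketchConfigStore` are hypothetical names, and only the `GroqApiKey` property is taken from this guide; the real `ToakConfig` likely has more fields.

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical config type. Default serialization keeps the PascalCase property name.
public class SketchConfig
{
    public string GroqApiKey { get; set; } = string.Empty;
}

// Hypothetical stand-in for ConfigManager; the path is a parameter for testability.
public static class SketchConfigStore
{
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions { WriteIndented = true };

    // Returns defaults when the file is missing or does not contain valid JSON.
    public static SketchConfig Load(string path)
    {
        if (!File.Exists(path)) return new SketchConfig();
        try
        {
            return JsonSerializer.Deserialize<SketchConfig>(File.ReadAllText(path)) ?? new SketchConfig();
        }
        catch (JsonException)
        {
            return new SketchConfig();
        }
    }

    // Creates the config directory if needed, then writes indented JSON.
    public static void Save(string path, SketchConfig config)
    {
        var dir = Path.GetDirectoryName(path);
        if (!string.IsNullOrEmpty(dir)) Directory.CreateDirectory(dir);
        File.WriteAllText(path, JsonSerializer.Serialize(config, Options));
    }
}
```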
### API Integration (Groq)
- Base URL: `https://api.groq.com/openai/v1/`
- Authentication: Bearer token via `Authorization` header
- Models: `whisper-large-v3-turbo` (STT), `llama-3.1-8b-instant` (refinement)
- Temperature: Always 0.0 for output that is as deterministic as the API allows
- Security: Transcript wrapped in `<transcript>` tags to reduce the risk of prompt injection (a mitigation, not a guarantee)
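The settings above could be combined into a refinement request along these lines. This is a sketch, not the contents of `GroqApiClient.cs`: `RefinementRequestSketch` is a hypothetical name, and the `chat/completions` path and message layout are assumed from the OpenAI-compatible base URL. The request is only built here, never sent.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical request builder illustrating the API settings listed above.
public static class RefinementRequestSketch
{
    // Delimits untrusted transcript text so the system prompt can refer to it as data.
    public static string WrapTranscript(string transcript) =>
        $"<transcript>\n{transcript}\n</transcript>";

    public static HttpRequestMessage Build(string apiKey, string systemPrompt, string transcript)
    {
        var payload = new
        {
            model = "llama-3.1-8b-instant",
            temperature = 0.0,
            messages = new object[]
            {
                new { role = "system", content = systemPrompt },
                new { role = "user", content = WrapTranscript(transcript) }
            }
        };

        // Assumed endpoint path under the documented base URL.
        var request = new HttpRequestMessage(
            HttpMethod.Post,
            new Uri("https://api.groq.com/openai/v1/chat/completions"));
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

        // Note: the default System.Text.Json encoder writes '<' and '>' as \u003C and \u003E.
        // That is still valid JSON and decodes to the same string on the server.
        request.Content = new StringContent(
            JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json");
        return request;
    }
}
```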
### Process Wrappers
All external tool calls use `ProcessStartInfo` with:
- `UseShellExecute = false`
- `CreateNoWindow = true`
- Arguments properly escaped (quote replacement for text injection)
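A minimal sketch of the `ProcessStartInfo` pattern above, using text injection as the example. `ProcessSketch` and its methods are hypothetical names, and the exact `xdotool` arguments used by `TextInjector.cs` are an assumption.

```csharp
using System.Diagnostics;

// Hypothetical illustration of the process-wrapper pattern; it only builds
// the start info and does not launch anything.
public static class ProcessSketch
{
    // Escapes double quotes so the text can sit inside a quoted argument.
    public static string EscapeQuotes(string text) => text.Replace("\"", "\\\"");

    public static ProcessStartInfo BuildTypeCommand(string text) => new ProcessStartInfo
    {
        FileName = "xdotool",
        Arguments = $"type \"{EscapeQuotes(text)}\"",
        UseShellExecute = false, // start the binary directly, no shell in between
        CreateNoWindow = true
    };
}
```

Quote replacement alone does not cover every input (a trailing backslash, for example). `ProcessStartInfo.ArgumentList` passes each argument through without manual escaping and may be worth considering for new wrappers.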
---
## Testing Approach
**No automated tests currently exist.** The application relies on:
1. Manual testing via `dotnet run -- toggle`
2. Checking `/tmp/toak_recording.wav` exists during recording
3. Verifying `notify-send` displays status messages
4. Confirming text appears in active window after transcription
---
## Important Gotchas
1. **Linux Only**: This application cannot run on Windows or macOS - it depends on `ffmpeg` with ALSA, `notify-send`, and X11/Wayland tools
2. **Process Kill Behavior**: `process.Kill()` sends SIGKILL to ffmpeg. This is intentional for immediate stop, but means graceful shutdown isn't attempted
3. **State File Orphaning**: If the app crashes, `/tmp/toak_state.pid` may be left behind. The next run will attempt to use a stale PID (handled by try/catch in `StopRecording`)
4. **API Key Required**: Without `GroqApiKey` configured via `toak setup`, the app will fail with a notification error
5. **Quote Escaping in TextInjector**: Text containing quotes is escaped as `\"` so that it survives argument parsing. No shell is involved, since `UseShellExecute = false`
6. **ImplicitUsings Enabled**: No explicit `using System;` etc. required - .NET 10 implicit usings handle common namespaces
7. **Nullable Enabled**: The project uses `<Nullable>enable</Nullable>` - handle nulls properly
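Gotchas 2 and 3 interact: stopping a recording means killing a PID that may no longer exist. The sketch below shows one way a stale PID could be tolerated. `StalePidSketch` is a hypothetical name, and the actual `StopRecording` may catch exceptions more broadly.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical illustration of tolerating a stale PID from an orphaned state file.
public static class StalePidSketch
{
    // Returns true if a process with this ID was found and killed,
    // false if the PID is stale or the process could not be terminated.
    public static bool TryStop(int pid)
    {
        try
        {
            // Throws ArgumentException when no process with this ID is running.
            using var process = Process.GetProcessById(pid);
            process.Kill(); // immediate kill on Linux; no graceful shutdown is attempted
            return true;
        }
        catch (Exception ex) when (ex is ArgumentException
                                   || ex is InvalidOperationException
                                   || ex is Win32Exception)
        {
            Console.WriteLine($"[StalePidSketch] Error: {ex.Message}");
            return false;
        }
    }
}
```

One caveat this sketch does not address: the operating system can reuse a stale PID for an unrelated process, so a stricter version might check `process.ProcessName` before killing.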
---
## Adding New Features
When modifying this codebase:
1. **Maintain static/instance pattern**: Stateless utilities = static, Stateful clients = instance
2. **Follow file-scoped namespace**: Single `namespace Toak;` at top
3. **Use System.Text.Json**: Prefer over Newtonsoft.Json (already configured)
4. **Add config options**: Update `ToakConfig` class, then wire in `Program.cs` CLI handling
5. **External dependencies**: If adding new system tool calls, follow `ProcessStartInfo` pattern in existing classes
6. **Error handling**: Use Notifications for user-visible errors, Console.WriteLine for debug info
---
## Documentation References
- `PROJECT_PLAN.md` - Original architecture and design goals
- `IMPLEMENTATION_PLAN.md` - Detailed phase-by-phase implementation notes