# AGENTS.md - Toak Project Guide

This document helps AI agents work effectively in the Toak codebase.

## Project Overview

**Toak** is a high-speed Linux dictation system written in C#/.NET 10. It captures audio via ffmpeg, transcribes via Groq's Whisper API, refines via Llama 3.1, and types the result into the active window using xdotool/wtype.

**Repository**: C# console application using .NET 10 SDK
**Platform**: Linux only (requires ALSA/PulseAudio, notify-send, xdotool/wtype)

---

## Essential Commands

### Build & Run
```bash
# Build the project
dotnet build

# Build for release
dotnet build -c Release

# Run with arguments
dotnet run -- toggle                  # Start/stop recording
dotnet run -- setup                   # Interactive configuration wizard
dotnet run -- show                    # Display current configuration
dotnet run -- config <key> <value>    # Update a config setting
```

### Test (No Test Project Currently)
There is no test project configured. Tests would need to be added manually if required.

### Dependencies (Linux System Packages)
The application requires these system binaries in PATH:
- `ffmpeg` - Audio recording from ALSA
- `notify-send` - Desktop notifications
- `xdotool` OR `wtype` - Text injection (X11 vs Wayland)

---

## Code Organization

```
Toak/
├── Program.cs              # Entry point, CLI argument handling
├── AudioRecorder.cs        # ffmpeg process wrapper for recording
├── GroqApiClient.cs        # HTTP client for Whisper + Llama APIs
├── PromptBuilder.cs        # Dynamic system prompt construction
├── TextInjector.cs         # xdotool/wtype wrapper for typing text
├── ConfigManager.cs        # JSON config load/save (~/.config/toak/)
├── StateTracker.cs         # PID-based recording state via /tmp/
├── Notifications.cs        # notify-send wrapper
├── Toak.csproj             # .NET 10 SDK project
├── PROJECT_PLAN.md         # Original architecture document
└── IMPLEMENTATION_PLAN.md  # Implementation phases document
```

---

## Code Patterns & Conventions

### Namespace Style
- Use **file-scoped namespaces**: `namespace Toak;` at the top of the file
- Never use block-style namespace declarations

### Class Structure
- **Static classes** for stateless utilities: `ConfigManager`, `StateTracker`, `Notifications`, `TextInjector`, `PromptBuilder`, `AudioRecorder`
- **Instance classes** for stateful clients: `GroqApiClient` (holds HttpClient)
- **POCOs** for JSON serialization at bottom of `GroqApiClient.cs`

### Naming Conventions
- PascalCase for classes, methods, properties
- Private fields prefixed with underscore: `_httpClient`
- Constants use PascalCase: `ConfigDir`, `StateFilePath`
- JSON property names on the API POCOs use camelCase, set via `[JsonPropertyName]` attributes (the config file keeps PascalCase names)

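
The conventions above can be sketched in one small file. This is only an illustration: `ExampleUtility`, `ExampleClient`, and `ExampleMessage` are hypothetical names that do not exist in the codebase, and the explicit `using` lines are shown only so the snippet compiles even without implicit usings.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text.Json.Serialization;

namespace Toak; // file-scoped namespace, never block-style

// Hypothetical stateless utility: static class, PascalCase members and constants.
public static class ExampleUtility
{
    private const string ExampleDir = "/tmp";

    public static string BuildPath(string fileName) => Path.Combine(ExampleDir, fileName);
}

// Hypothetical stateful client: instance class, underscore-prefixed private field.
public class ExampleClient
{
    private readonly HttpClient _httpClient;

    public ExampleClient(HttpClient httpClient) => _httpClient = httpClient;

    public Uri? BaseAddress => _httpClient.BaseAddress;
}

// Hypothetical POCO: the C# property stays PascalCase, the JSON name is set explicitly.
public class ExampleMessage
{
    [JsonPropertyName("content")]
    public string Content { get; set; } = string.Empty;
}
```

Serializing `new ExampleMessage { Content = "hi" }` with `System.Text.Json` uses the attribute name (`content`) rather than the C# property name.
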

### Error Handling
- Try/catch with console logging: `Console.WriteLine($"[ClassName] Error: {ex.Message}");` (note that `Console.WriteLine` writes to stdout; use `Console.Error.WriteLine` if stderr output is intended)
- User-facing errors go through `Notifications.Notify()` for desktop alerts
- Silent failures are acceptable for non-critical paths (notifications, cleanup)

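
A minimal sketch of the error-handling pattern described above. `ErrorHandlingSketch` and its methods are hypothetical, and because the real signature of `Notifications.Notify()` is not shown in this guide, the notifier is passed in as an `Action<string>` instead of being called directly.

```csharp
using System;
using System.IO;

namespace Toak;

// Hypothetical helper, for illustration only.
public static class ErrorHandlingSketch
{
    // Critical path: log with a [ClassName] prefix and forward a
    // user-facing message to the supplied notifier.
    public static bool TryRun(Action step, Action<string> notify)
    {
        try
        {
            step();
            return true;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"[ErrorHandlingSketch] Error: {ex.Message}");
            notify($"Toak error: {ex.Message}");
            return false;
        }
    }

    // Non-critical path (e.g. cleanup): failures are swallowed on purpose.
    public static void TryDelete(string path)
    {
        try { File.Delete(path); }
        catch { /* silent failure is acceptable here */ }
    }
}
```

In the real code the `notify` argument would presumably wrap the project's notification helper; that wiring is an assumption.
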

### Async Patterns
- Use `async Task<T>` for I/O operations (API calls)
- Use synchronous methods for process spawning where `Process.Start()` is fire-and-forget

---

## Key Implementation Details

### State Management (Critical)
Recording state is tracked via **file-based PID tracking** (not in-memory):
- State file: `/tmp/toak_state.pid` (contains ffmpeg process ID)
- Audio file: `/tmp/toak_recording.wav`
- Toggle mechanism: New process checks state file, signals existing ffmpeg process to stop

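
The PID-file mechanism above could look roughly like the following. This is a sketch under assumptions, not the actual `StateTracker`: the class name `StateSketch`, its method names, and the optional `stateFile` parameter (added so the logic can be tried against a temporary file) are all hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

namespace Toak;

// Hypothetical stand-in for the PID-based state tracking described above.
public static class StateSketch
{
    // Path taken from this guide.
    public const string StateFilePath = "/tmp/toak_state.pid";

    public static bool IsRecording(string stateFile = StateFilePath) => File.Exists(stateFile);

    public static void SetRecording(int ffmpegPid, string stateFile = StateFilePath) =>
        File.WriteAllText(stateFile, ffmpegPid.ToString());

    // Returns the stored PID, or null if the file is missing or unparsable.
    public static int? GetRecordingPid(string stateFile = StateFilePath)
    {
        if (!File.Exists(stateFile)) return null;
        return int.TryParse(File.ReadAllText(stateFile).Trim(), out var pid) ? pid : (int?)null;
    }

    // Kills the recorded process and always clears the state file.
    // A stale PID (process already gone) is tolerated via try/catch.
    public static void StopRecording(string stateFile = StateFilePath)
    {
        var pid = GetRecordingPid(stateFile);
        try
        {
            if (pid is int id) Process.GetProcessById(id).Kill();
        }
        catch (Exception ex)
        {
            Console.WriteLine($"[StateSketch] Error: {ex.Message}");
        }
        finally
        {
            File.Delete(stateFile);
        }
    }
}
```

A toggle would call `StopRecording()` when `IsRecording()` is true and otherwise start ffmpeg and call `SetRecording(pid)`. One caveat of any PID-file scheme: a stale PID may have been reused by an unrelated process, which this sketch does not check.
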

### Configuration Storage
- Location: `~/.config/toak/config.json`
- Format: JSON with PascalCase property names
- Default values are defined in the `ToakConfig` class

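
As a rough illustration of JSON config load/save with defaults, assuming `System.Text.Json` and a config class whose defaults come from property initializers. `ToakConfigSketch` and `ConfigSketch` are hypothetical; only the `GroqApiKey` setting and the file location are taken from this guide, and the real `ToakConfig` may hold more fields.

```csharp
using System;
using System.IO;
using System.Text.Json;

namespace Toak;

// Hypothetical subset of the config; defaults live on the class itself.
public class ToakConfigSketch
{
    public string GroqApiKey { get; set; } = string.Empty;
}

public static class ConfigSketch
{
    // ~/.config/toak/config.json, resolved from the user's home directory.
    public static string DefaultPath => Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
        ".config", "toak", "config.json");

    // Falls back to defaults when the file is missing or not valid JSON.
    public static ToakConfigSketch Load(string path)
    {
        if (!File.Exists(path)) return new ToakConfigSketch();
        try
        {
            return JsonSerializer.Deserialize<ToakConfigSketch>(File.ReadAllText(path))
                   ?? new ToakConfigSketch();
        }
        catch (JsonException ex)
        {
            Console.WriteLine($"[ConfigSketch] Error: {ex.Message}");
            return new ToakConfigSketch();
        }
    }

    // Expects a path that includes a directory component.
    public static void Save(ToakConfigSketch config, string path)
    {
        Directory.CreateDirectory(Path.GetDirectoryName(path)!);
        var options = new JsonSerializerOptions { WriteIndented = true };
        File.WriteAllText(path, JsonSerializer.Serialize(config, options));
    }
}
```

With default serializer options, `System.Text.Json` writes property names exactly as declared, which yields the PascalCase keys mentioned above without any attributes.
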

### API Integration (Groq)
- Base URL: `https://api.groq.com/openai/v1/`
- Authentication: Bearer token via `Authorization` header
- Models: `whisper-large-v3-turbo` (STT), `llama-3.1-8b-instant` (refinement)
- Temperature: Always 0.0 for deterministic output
- Security: Transcript wrapped in `<transcript>` tags to prevent prompt injection

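
To make the refinement request concrete, here is a sketch of building the chat payload with the transcript wrapped in tags and the temperature pinned to 0.0. It assumes the OpenAI-compatible chat completions body shape (`model`, `temperature`, `messages` with `role`/`content`); the type and method names are hypothetical and not taken from `GroqApiClient.cs`. No network call is made.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

namespace Toak;

// Hypothetical POCOs for the request body.
public class ChatMessageSketch
{
    [JsonPropertyName("role")] public string Role { get; set; } = string.Empty;
    [JsonPropertyName("content")] public string Content { get; set; } = string.Empty;
}

public class ChatRequestSketch
{
    [JsonPropertyName("model")] public string Model { get; set; } = "llama-3.1-8b-instant";
    [JsonPropertyName("temperature")] public double Temperature { get; set; } = 0.0;
    [JsonPropertyName("messages")] public List<ChatMessageSketch> Messages { get; set; } = new();
}

public static class RefinementSketch
{
    // Marks the transcript as data so the system prompt can refer to it by tag.
    public static string WrapTranscript(string rawTranscript) =>
        $"<transcript>{rawTranscript}</transcript>";

    // System prompt first, then the wrapped transcript as the user message.
    public static string BuildRequestJson(string systemPrompt, string rawTranscript)
    {
        var request = new ChatRequestSketch
        {
            Messages =
            {
                new ChatMessageSketch { Role = "system", Content = systemPrompt },
                new ChatMessageSketch { Role = "user", Content = WrapTranscript(rawTranscript) },
            },
        };
        return JsonSerializer.Serialize(request);
    }
}
```

The wrapper alone does not sanitize a transcript that itself contains a closing `</transcript>` tag, so the system prompt still has to instruct the model to treat the tagged content as text only.
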

### Process Wrappers
All external tool calls use `ProcessStartInfo` with:
- `UseShellExecute = false`
- `CreateNoWindow = true`
- Arguments properly escaped (quote replacement for text injection)

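
A sketch of a wrapper with the settings listed above. Note the swapped-in technique: instead of the quote replacement the codebase is described as using, this example passes arguments through `ProcessStartInfo.ArgumentList`, which lets .NET handle escaping. `ProcessSketch` and its methods are hypothetical; a call such as `ProcessSketch.Run("notify-send", "Toak", "Recording started")` would be one way to use it.

```csharp
using System;
using System.Diagnostics;

namespace Toak;

// Hypothetical wrapper, for illustration only.
public static class ProcessSketch
{
    // Builds start info with no shell and no window; each argument is
    // added separately so embedded quotes need no manual escaping.
    public static ProcessStartInfo BuildStartInfo(string fileName, params string[] args)
    {
        var psi = new ProcessStartInfo
        {
            FileName = fileName,
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        foreach (var arg in args) psi.ArgumentList.Add(arg);
        return psi;
    }

    // Runs the tool synchronously; a missing binary is logged, not thrown.
    public static void Run(string fileName, params string[] args)
    {
        try
        {
            using var process = Process.Start(BuildStartInfo(fileName, args));
            process?.WaitForExit();
        }
        catch (Exception ex)
        {
            Console.WriteLine($"[ProcessSketch] Error: {ex.Message}");
        }
    }
}
```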

---

## Testing Approach

**No automated tests currently exist.** The application relies on:
1. Manual testing via `dotnet run -- toggle`
2. Checking `/tmp/toak_recording.wav` exists during recording
3. Verifying `notify-send` displays status messages
4. Confirming text appears in active window after transcription

---

## Important Gotchas

1. **Linux Only**: This application cannot run on Windows or macOS; it depends on `ffmpeg` with ALSA, `notify-send`, and X11/Wayland tools

2. **Process Kill Behavior**: `process.Kill()` sends SIGKILL to ffmpeg. This is intentional for immediate stop, but means graceful shutdown isn't attempted

3. **State File Orphaning**: If the app crashes, `/tmp/toak_state.pid` may be left behind. The next run will attempt to use a stale PID (handled by try/catch in `StopRecording`)

4. **API Key Required**: Without `GroqApiKey` configured via `toak setup`, the app will fail with a notification error

5. **Quote Escaping in TextInjector**: Text containing quotes is escaped as `\"` so it reaches the tool as a single intact argument (no shell is involved, since `UseShellExecute = false`)

6. **ImplicitUsings Enabled**: No explicit `using System;` etc. required - .NET 10 implicit usings handle common namespaces

7. **Nullable Enabled**: The project uses `<Nullable>enable</Nullable>` - handle nulls properly

---

## Adding New Features

When modifying this codebase:

1. **Maintain static/instance pattern**: Stateless utilities = static, Stateful clients = instance
2. **Follow file-scoped namespace**: Single `namespace Toak;` at top
3. **Use System.Text.Json**: Prefer it over Newtonsoft.Json (it is already in use in this codebase)
4. **Add config options**: Update `ToakConfig` class, then wire in `Program.cs` CLI handling
5. **External dependencies**: If adding new system tool calls, follow `ProcessStartInfo` pattern in existing classes
6. **Error handling**: Use `Notifications.Notify()` for user-visible errors, `Console.WriteLine` for debug info

---

## Documentation References

- `PROJECT_PLAN.md` - Original architecture and design goals
- `IMPLEMENTATION_PLAN.md` - Detailed phase-by-phase implementation notes