Toak Project Structure
This document outlines the high-level architecture and directory structure of the Toak project to help contributors navigate the codebase.
Overview
Toak is a fast, Linux-native dictation application built with C# Native AOT (Ahead-Of-Time compilation) for minimal latency. It uses a client-daemon architecture: a long-running daemon process holds the application state, while short-lived CLI commands send it control messages over a Unix domain socket.
Directory Structure
Toak/
├── Api/
│   ├── GroqApiClient.cs # Client for external transcription and LLM API calls (Groq/Whisper)
│   ├── OpenAiCompatibleClient.cs # Generic OpenAI-compatible client for Groq and Together AI
│   └── Models/ # API payload representations
├── Assets/ # Sound files or other static resources
├── Audio/
│   ├── PipewireAudioRecorder.cs # Handles audio capture via PipeWire (pw-record)
│   └── FfmpegAudioRecorder.cs # Universal audio capture via ffmpeg
├── Commands/
│   ├── ToggleCommand.cs # Client command to start/stop recording via socket
│   ├── DiscardCommand.cs # Client command to abort current recording
│   ├── OnboardCommand.cs # Interactive configuration setup wizard
│   ├── ConfigUpdaterCommand.cs # Direct configuration modifications
│   ├── ShowCommand.cs # Display current configuration
│   ├── SkillCommand.cs # CLI controller for managing JSON Skills
│   ├── LatencyTestCommand.cs # Pipeline benchmark tool
│   ├── HistoryCommand.cs # Interface to query past transcriptions
│   └── StatsCommand.cs # Aggregated usage analytics
├── Configuration/
│   ├── ConfigManager.cs # Loads/saves JSON configuration
│   └── ToakConfig.cs # Data model for user preferences
├── Core/
│   ├── DaemonService.cs # Background daemon maintaining the socket server
│   ├── TranscriptionOrchestrator.cs # Coordinates audio recording, STT, LLM, and output
│   ├── Logger.cs # Logging utility
│   ├── HistoryManager.cs # Thread-safe history management (.jsonl)
│   ├── HistoryEntry.cs # Data model for transcription history
│   ├── PromptBuilder.cs # Constructs LLM system prompts
│   ├── StateTracker.cs # Tracks application state and recording PIDs
│   ├── Interfaces/ # Core abstractions (ILlmClient, IAudioRecorder, etc.)
│   └── Skills/ # Data-driven JSON skill integrations
├── IO/
│   ├── ClipboardManager.cs # Cross-session clipboard manipulation (wl-copy, xclip)
│   ├── TextInjector.cs # Native keyboard injection (wtype, xdotool, ydotool)
│   └── Notifications.cs # System notifications and sound playback
├── Serialization/
│   └── AppJsonSerializerContext.cs # System.Text.Json source generation for AOT
├── bin/ # Compiler output
├── docs/ # Documentation
├── install.sh # Native AOT build and installation script
├── toak.service # systemd user service definition
└── Program.cs # Application entry point using System.CommandLine
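The tree lists Core/HistoryManager.cs as thread-safe history management backed by a .jsonl file. As a rough illustration of that idea only, the sketch below appends one JSON object per line under a lock. The class name HistorySketch and the fields timestamp and text are assumptions made for this example; the real HistoryManager and HistoryEntry may look quite different.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical sketch; Toak's real HistoryManager and HistoryEntry fields are not shown in this document.
public sealed class HistorySketch
{
    private readonly object _gate = new();
    private readonly string _path;

    public HistorySketch(string path) => _path = path;

    // Appends one JSON object per line (JSON Lines format).
    public void Append(DateTimeOffset timestamp, string text)
    {
        // Utf8JsonWriter needs no reflection, so it is also usable under Native AOT.
        using var buffer = new MemoryStream();
        using (var writer = new Utf8JsonWriter(buffer))
        {
            writer.WriteStartObject();
            writer.WriteString("timestamp", timestamp); // assumed field name
            writer.WriteString("text", text);           // assumed field name
            writer.WriteEndObject();
        }
        string line = Encoding.UTF8.GetString(buffer.ToArray());

        // The lock serializes writers within this process only.
        lock (_gate)
        {
            File.AppendAllText(_path, line + "\n");
        }
    }
}
```

The writer is not indented by default and escapes newlines inside strings, so each entry stays on a single line. The lock does not protect against other processes writing to the same file.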
Key Architectural Concepts
The Daemon Process
The DaemonService (started with toak daemon) is the core of Toak. It listens on a Unix domain socket for IPC messages, so toak toggle only has to connect, send a message, and exit. Recording, transcription, and state management all happen in the already-running background process.
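To make the listening side concrete, here is a minimal sketch of a daemon-style accept loop on a Unix domain socket. It is not Toak's DaemonService: the class name DaemonSketch, the one-byte message size, and the callback shape are all assumptions chosen for illustration.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a daemon-style listener; not Toak's actual DaemonService.
public static class DaemonSketch
{
    // Accepts connections on a Unix domain socket and hands the first byte of
    // each connection to the callback until cancellation is requested.
    public static async Task RunAsync(string socketPath, Action<byte> onCommand, CancellationToken ct)
    {
        // A stale socket file from a previous run would make Bind fail.
        // File.Delete does not throw when the file is absent.
        File.Delete(socketPath);

        using var listener = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
        listener.Bind(new UnixDomainSocketEndPoint(socketPath));
        listener.Listen(8);

        try
        {
            while (!ct.IsCancellationRequested)
            {
                using Socket client = await listener.AcceptAsync(ct);
                var buffer = new byte[1]; // assumed one-byte message, for illustration
                int read = await client.ReceiveAsync(buffer.AsMemory(), SocketFlags.None, ct);
                if (read == 1) onCommand(buffer[0]);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path.
        }
        finally
        {
            File.Delete(socketPath);
        }
    }
}
```

Because the socket is bound before the first await, a caller can connect as soon as RunAsync returns its task. Handling one connection at a time keeps the sketch short; a real daemon would likely need concurrency and error handling around each client.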
Unix Sockets IPC
Client commands communicate with the daemon over Unix domain sockets. The byte payloads used for this communication are documented in PROTOCOL.md.
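The client side of such an exchange can be very small, which is part of why the CLI commands stay short-lived. The sketch below connects, writes a single byte, and disconnects. The class name ClientSketch and the value 0x01 are placeholders invented for this example; the actual command bytes are whatever PROTOCOL.md specifies.

```csharp
using System.Net.Sockets;

// Hypothetical sketch of a short-lived client command; not Toak's actual ToggleCommand.
public static class ClientSketch
{
    // Placeholder value for illustration only; the real payloads are defined in PROTOCOL.md.
    public const byte HypotheticalToggle = 0x01;

    // Connects to the daemon's socket, sends one command byte, and closes the connection.
    public static void Send(string socketPath, byte command)
    {
        using var socket = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
        socket.Connect(new UnixDomainSocketEndPoint(socketPath));
        socket.Send(new[] { command });
    }
}
```

A caller would pass the daemon's socket path and a command byte, for example ClientSketch.Send(path, ClientSketch.HypotheticalToggle). If the daemon is not running, Connect throws a SocketException, which a real command would presumably turn into a readable error message.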
AOT Compilation
The project relies on Native AOT compilation (dotnet publish -c Release -r linux-x64 -p:PublishAot=true) to avoid JIT startup cost on each CLI invocation, which keeps toak toggle fast enough to bind to a hotkey.
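Native AOT restricts runtime reflection, which is why the tree includes Serialization/AppJsonSerializerContext.cs for System.Text.Json source generation. The sketch below shows the general pattern with made-up types: SketchConfig, its two properties, and SketchJsonContext are assumptions for illustration and do not reflect the contents of ToakConfig or AppJsonSerializerContext.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical settings type; the fields of Toak's real ToakConfig are not shown in this document.
public sealed class SketchConfig
{
    public string Language { get; set; } = "en";
    public bool PlaySounds { get; set; } = true;
}

// The source generator emits serialization code at build time for each type
// listed in a [JsonSerializable] attribute, so no runtime reflection is needed.
[JsonSerializable(typeof(SketchConfig))]
public partial class SketchJsonContext : JsonSerializerContext
{
}

public static class ConfigJson
{
    // Passing the generated JsonTypeInfo selects the AOT-friendly code path.
    public static string Save(SketchConfig config) =>
        JsonSerializer.Serialize(config, SketchJsonContext.Default.SketchConfig);

    public static SketchConfig? Load(string json) =>
        JsonSerializer.Deserialize(json, SketchJsonContext.Default.SketchConfig);
}
```

Every type that is serialized needs its own [JsonSerializable] entry on the context. Overloads of JsonSerializer that take only a type or a generic argument fall back to reflection and produce trimming warnings under AOT.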