# Toak Project Structure

This document outlines the high-level architecture and directory structure of the Toak project to help contributors navigate the codebase.

## Overview

Toak is a fast, Linux-native dictation application written in C# and compiled ahead of time (Native AOT) to keep latency minimal. It is structured primarily as a client-daemon architecture: a long-running daemon process manages application state in the background, while short-lived CLI commands send it control messages over a Unix domain socket.

## Directory Structure

```text
Toak/
├── Api/
│   ├── GroqApiClient.cs              # Client for external transcription and LLM API calls (Groq/Whisper)
│   ├── OpenAiCompatibleClient.cs     # Generic OpenAI-compatible client for Groq and Together AI
│   └── Models/                       # API payload representations
├── Assets/                           # Sound files or other static resources
├── Audio/
│   ├── AudioRecorder.cs              # Handles audio capture via PipeWire (pw-record)
│   └── FfmpegAudioRecorder.cs        # Universal audio capture via ffmpeg
├── Commands/
│   ├── ToggleCommand.cs              # Client command to start/stop recording via socket
│   ├── DiscardCommand.cs             # Client command to abort current recording
│   ├── OnboardCommand.cs             # Interactive configuration setup wizard
│   ├── ConfigUpdaterCommand.cs       # Direct configuration modifications
│   ├── ShowCommand.cs                # Display current configuration
│   ├── SkillCommand.cs               # CLI controller for managing JSON Skills
│   ├── LatencyTestCommand.cs         # Pipeline benchmark tool
│   ├── HistoryCommand.cs             # Interface to query past transcriptions
│   └── StatsCommand.cs               # Aggregated usage analytics
├── Configuration/
│   ├── ConfigManager.cs              # Loads/saves JSON configuration
│   └── ToakConfig.cs                 # Data model for user preferences
├── Core/
│   ├── DaemonService.cs              # Background daemon maintaining the socket server
│   ├── TranscriptionOrchestrator.cs  # Coordinates audio recording, STT, LLM, and output
│   ├── Logger.cs                     # Logging utility
│   ├── HistoryManager.cs             # Thread-safe history management (.jsonl)
│   ├── HistoryEntry.cs               # Data model for transcription history
│   ├── PromptBuilder.cs              # Constructs LLM system prompts
│   ├── StateTracker.cs               # Tracks application state and recording PIDs
│   ├── Interfaces/                   # Core abstractions (ILlmClient, IAudioRecorder, etc.)
│   └── Skills/                       # Data-driven JSON skill integrations
├── IO/
│   ├── ClipboardManager.cs           # Cross-session clipboard manipulation (wl-copy, xclip)
│   ├── TextInjector.cs               # Native keyboard injection (wtype, xdotool, ydotool)
│   └── Notifications.cs              # System notifications and sound playback
├── Serialization/
│   └── AppJsonSerializerContext.cs   # System.Text.Json source generation for AOT
├── bin/                              # Compiler output
├── docs/                             # Documentation
├── install.sh                        # Native AOT build and installation script
├── toak.service                      # systemd user service definition
└── Program.cs                        # Application entry point using System.CommandLine
```

## Key Architectural Concepts

### The Daemon Process
The `DaemonService` (started with `toak daemon`) is the heart of Toak. It listens on a Unix domain socket for IPC messages. Because the daemon is already running and holds all application state, `toak toggle` only has to send a message and exit, so it completes almost instantly while the heavy lifting happens in the background process.
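As a rough illustration of this pattern, the sketch below shows a long-running listener on a Unix domain socket that hands each incoming command byte to a callback. It is not Toak's actual `DaemonService`: the `DaemonSketch` class, the one-byte-per-connection framing, and the socket path are assumptions made for this example. The real payload format is the one defined in PROTOCOL.md.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: a hypothetical stand-in, not Toak's DaemonService.
public static class DaemonSketch
{
    // Accepts connections until cancelled and reads one command byte per connection.
    public static async Task RunAsync(string socketPath, Action<byte> onCommand, CancellationToken ct)
    {
        // A stale socket file from a previous run would make Bind fail.
        // File.Delete does nothing if the file does not exist.
        File.Delete(socketPath);

        using var listener = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
        listener.Bind(new UnixDomainSocketEndPoint(socketPath));
        listener.Listen(8);

        try
        {
            while (!ct.IsCancellationRequested)
            {
                using Socket client = await listener.AcceptAsync(ct);

                // Assumed framing for this sketch: exactly one byte per connection.
                var buffer = new byte[1];
                int read = await client.ReceiveAsync(buffer.AsMemory(), SocketFlags.None, ct);
                if (read == 1)
                {
                    onCommand(buffer[0]);
                }
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path when the token is cancelled.
        }
        finally
        {
            File.Delete(socketPath);
        }
    }
}
```

Because `Bind` and `Listen` run before the first `await`, the socket is ready for clients as soon as `RunAsync` returns its task. Cancelling the token stops the loop and removes the socket file.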

### Unix Sockets IPC
Client commands communicate with the daemon over Unix domain sockets. For details on the byte payloads exchanged, refer to [PROTOCOL.md](./PROTOCOL.md).
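A client in this model can be very small. The sketch below connects, writes a single command byte, and exits. The `ClientSketch` class and the `HypotheticalToggle` value are placeholders invented for illustration; the actual byte payloads are those documented in PROTOCOL.md.

```csharp
using System.Net.Sockets;

// Illustrative only: not Toak's actual client code.
public static class ClientSketch
{
    // Placeholder command value; the real values are defined in PROTOCOL.md.
    public const byte HypotheticalToggle = 0x01;

    // Connects to the daemon's socket, sends one command byte, and disconnects.
    public static void SendCommand(string socketPath, byte command)
    {
        using var socket = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
        socket.Connect(new UnixDomainSocketEndPoint(socketPath));
        socket.Send(new[] { command });
    }
}
```

A hotkey binding would then amount to something like `ClientSketch.SendCommand(path, ClientSketch.HypotheticalToggle)`, where `path` is wherever the daemon created its socket.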
### AOT Compilation
The project relies on Native AOT compilation (`dotnet publish -c Release -r linux-x64 -p:PublishAot=true`) to avoid JIT startup cost on every CLI invocation, making `toak toggle` fast enough to bind directly to a hotkey.
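Reflection-based JSON serialization is a poor fit for Native AOT, which is why the directory tree lists a source-generated `System.Text.Json` context (`AppJsonSerializerContext.cs`). The sketch below shows the general shape of that technique using a made-up `SketchConfig` type. The type, its properties, and the `SketchJsonContext` name are assumptions for illustration and do not reflect the real `ToakConfig` or `AppJsonSerializerContext`.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical settings type; the real ToakConfig fields are not shown in this document.
public sealed class SketchConfig
{
    public string Language { get; set; } = "en";
    public bool PlaySounds { get; set; } = true;
}

// The System.Text.Json source generator emits serialization code for each
// [JsonSerializable] type at build time, so no reflection is needed at run time.
[JsonSerializable(typeof(SketchConfig))]
internal partial class SketchJsonContext : JsonSerializerContext
{
}

public static class SketchConfigJson
{
    // Serializes using the generated type metadata instead of reflection.
    public static string Save(SketchConfig config) =>
        JsonSerializer.Serialize(config, SketchJsonContext.Default.SketchConfig);

    // Deserializes using the same generated metadata; returns null for a JSON "null".
    public static SketchConfig? Load(string json) =>
        JsonSerializer.Deserialize(json, SketchJsonContext.Default.SketchConfig);
}
```

Passing the generated `JsonTypeInfo` (here `SketchJsonContext.Default.SketchConfig`) explicitly keeps every serialization call on the AOT-friendly path.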