Toak: High-speed Linux Dictation
Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq or Fireworks AI) with LLM refinement (Llama/DeepSeek/GPT via Groq, Together AI, Cerebras, or Fireworks AI) to provide a seamless, articulate, and highly configurable dictation experience.
Built with .NET 10 and compiled to Native AOT, Toak runs as a standalone native binary with fast startup and no separate .NET runtime to install. Its client-daemon architecture keeps per-command latency low.
🚀 Key Features
- Blazing Fast: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
- Native AOT: Compiled to a native Linux binary for instant startup and minimal footprint.
- Intelligent Refinement: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
- Multi-Provider LLM: Supports Groq, Together AI, Cerebras, and Fireworks AI for text refinement, including Llama 3 and DeepSeek models.
- Multi-Provider Whisper: Easily swap between Groq and Fireworks AI for state-of-the-art STT APIs.
- Reasoning Capabilities: Optional reasoning effort settings for complex text processing.
- Modular Skills: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
- Multiple Backends: Types directly into your active window (`wtype`, `xdotool`, or `ydotool`), copies to the clipboard, or pipes to stdout.
- High-Quality Audio: Native support for PipeWire (`pw-record`) and FFmpeg for universal compatibility.
- Beautiful CLI: Interactive onboarding and configuration powered by Spectre.Console.
🛠 Prerequisites
- .NET 10 SDK (for building from source)
- Audio Capture: `pipewire`/`pw-record` (recommended) or `ffmpeg`
- Typing Backend: `wtype` (Wayland), `xdotool` (X11), or `ydotool` (virtual input)
- Clipboard: `wl-copy` (Wayland) or `xclip` (X11)
- API Keys: Keys for your chosen providers (Groq, Together AI, Cerebras, or Fireworks AI).
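Before running the installer, it can help to confirm which of these tools are on your `PATH`. The snippet below is a minimal sketch, not something shipped with Toak: `check_tools` and `suggest_typing_backend` are hypothetical helper names, and choosing a backend from `XDG_SESSION_TYPE` is a heuristic assumption (most, but not all, login managers set that variable).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Toak) for checking prerequisites.

# check_tools: print "found" or "missing" for each tool named as an argument.
check_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%-10s found\n' "$tool"
    else
      printf '%-10s missing\n' "$tool"
    fi
  done
}

# suggest_typing_backend: wtype on Wayland, xdotool on X11, and ydotool
# (which works through a virtual input device) when the session type is
# unknown. XDG_SESSION_TYPE is a heuristic, not a guarantee.
suggest_typing_backend() {
  case "${XDG_SESSION_TYPE:-}" in
    wayland) echo wtype ;;
    x11)     echo xdotool ;;
    *)       echo ydotool ;;
  esac
}
```

For example, `check_tools pw-record ffmpeg wtype xdotool ydotool wl-copy xclip` prints one line per tool, and `suggest_typing_backend` prints a value you could use for the `TypingBackend` setting.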
📦 Installation
Toak includes a self-contained installation script that handles the native compilation and setup:
git clone https://github.com/your-repo/toak.git
cd toak
./install.sh
The script will:
- Publish the project as a Native AOT Release binary.
- Install the executable to `/usr/bin/toak`.
- Install Zsh completions to `/usr/share/zsh/site-functions/`.
- Install and enable the `toak.service` systemd user service so the background daemon runs automatically.
To remove Toak from your system, simply run:
./uninstall.sh
🎮 Usage
Toak operates as a persistent daemon that manages state and API calls, while the CLI sends commands to it via Unix sockets.
Core Commands
- `toak toggle`: The primary command. Run it once to start recording; run it again to stop, transcribe, and type, copy, or print the result.
- `toak start`: Explicitly start recording.
- `toak stop`: Explicitly stop recording.
- `toak status`: Output the daemon's current state (`Recording`, `Idle`, or JSON).
- `toak daemon`: Run the background service manually (usually managed by systemd).
- `toak onboard`: Launch the interactive configuration wizard for providers, models, and backends.
- `toak discard`: Abort the current recording immediately without transcribing it.
- `toak latency-test`: Benchmark your network and API latency.
- `toak show`: Display your current configuration in a table.
- `toak config <key> <value>`: Update settings such as `llm`, `whisper`, `lang`, `backend`, `punctuation`, or `tech`.
- `toak skill`: Manage dynamic JSON skills via `list`, `add`, or `remove`.
- `toak history`: Query transcription history (`-n <count>`, `--grep <pattern>`, `--export <file>`, `--shred`).
- `toak stats`: Display usage statistics such as total recordings, latency, and top words.
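The `toggle` and `discard` behaviour amounts to a two-state machine (`Idle` and `Recording`). The sketch below models only that control flow: the function names, the printed messages, and the state file are hypothetical stand-ins for state that the Toak daemon tracks itself, and no audio is recorded or transcribed.

```shell
#!/usr/bin/env bash
# Hypothetical model of `toak toggle` semantics: Idle -> Recording on the
# first call, Recording -> Idle (after stop + transcribe) on the second.
# STATE_FILE stands in for the state the real daemon keeps.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/toak-toggle-sketch.state}"

toggle_sketch() {
  local state
  # A missing state file means we have never started: treat it as Idle.
  state=$(cat "$STATE_FILE" 2>/dev/null || echo Idle)
  if [ "$state" = "Recording" ]; then
    echo "stop -> transcribe -> output"
    echo Idle > "$STATE_FILE"
  else
    echo "start recording"
    echo Recording > "$STATE_FILE"
  fi
}

# discard_sketch: like `toak discard`, return to Idle without transcribing.
discard_sketch() {
  echo Idle > "$STATE_FILE"
}
```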
Global Flags
- `-p, --pipe`: Stop recording and output the finalized text to `stdout` instead of typing it.
- `--copy`: Stop recording and copy the result to the system clipboard.
- `-v, --verbose`: Enable detailed debug logging for troubleshooting.
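Because `--pipe` sends the refined text to `stdout`, Toak's output can feed other command-line tools. The wrapper below is a hypothetical example, not part of Toak: it assumes a recording is already running and that `--pipe` can be combined with `toak toggle`, as the Global Flags description suggests, and it appends the result with a timestamp to a notes file.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Toak): stop the current recording with
# --pipe and append the refined text, prefixed by a timestamp, to a file.
# Assumes a recording was already started with `toak start` or `toak toggle`.
append_dictation() {
  local notes_file="$1" text
  text=$(toak toggle --pipe) || return 1
  # Nothing came back (e.g. the recording was empty): do not write a line.
  [ -n "$text" ] || return 1
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M')" "$text" >> "$notes_file"
}
```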
🤖 Skills System
Toak includes a robust, data-driven skills system triggered by saying a hotword (such as "System") at the start of your dictation. Skills are defined in plain JSON files, so they can be configured without modifying the C# codebase.
Default Skills
| Skill | Hotwords | Description | Type |
|---|---|---|---|
| Terminal | "System terminal", "System run" | Translates the request into a bash command and executes it via `terminal_action.sh`. | script |
| Translate | "System translate to [language]" | Translates your dictation into the target language. | type |
| Professional | "System professional", "System formalize" | Rewrites your text to be articulate and formal. | type |
| Summary | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. | type |
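Hotword routing can be pictured as prefix matching on the transcript. The function below is a minimal sketch of that idea using the default hotwords from the table; `match_skill` is a hypothetical name, and stripping punctuation and lowercasing before matching are assumptions (Whisper output often looks like "System, translate to Spanish: ..."). Toak's real matcher and its normalization may differ.

```shell
#!/usr/bin/env bash
# Hypothetical prefix-based hotword router. Prints the skill name, or
# "none" when no hotword leads the transcript.
match_skill() {
  local lower
  # Normalize: drop punctuation, then lowercase.
  lower=$(printf '%s' "$1" | tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    "system terminal "*|"system run "*)           echo Terminal ;;
    "system translate to "*)                       echo Translate ;;
    "system professional "*|"system formalize "*) echo Professional ;;
    "system summary "*|"system concise "*)         echo Summary ;;
    *)                                             echo none ;;
  esac
}
```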
Customizing & Adding Skills
Run `toak skill add` to create a custom skill interactively. All skills live in `~/.config/toak/skills/`.
⚙️ Configuration
Key settings in `ToakConfig.cs` (managed via `toak onboard` or `toak config`):
- `LlmProvider`: One of `groq` (default), `together`, `cerebras`, or `fireworks`.
- `LlmModel`: The refinement model (e.g., `llama-3.3-70b-versatile` or `accounts/fireworks/models/deepseek-v3p1`).
- `ReasoningEffort`: Set to `none` (default) or `low` for O1-style reasoning.
- `WhisperProvider`: One of `groq` (default) or `fireworks`.
- `WhisperModel`: The STT model (default: `whisper-large-v3-turbo`).
- `WhisperLanguage`: The spoken language (e.g., `en`, `es`, `fr`).
- `TypingBackend`: One of `wtype`, `xdotool`, or `ydotool`.
- `AudioBackend`: Either `pw-record` (PipeWire) or `ffmpeg`.
- `MinRecordingDuration`: The minimum recording duration in ms (default: `500`).
- `ModulePunctuation`: Toggle automatic grammar and punctuation fixing.
- `ModuleTechnicalSanitization`: Toggle formatting of technical terms.
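To illustrate what `MinRecordingDuration` guards against, the check below treats recordings shorter than the threshold as accidental taps that should be skipped rather than sent for transcription. This is an assumption about how the setting is applied; `should_transcribe` is a hypothetical helper, and `500` simply mirrors the documented default.

```shell
#!/usr/bin/env bash
# Hypothetical MinRecordingDuration gate: succeed (exit 0) only when the
# recording is at least min_ms long. Both arguments are in milliseconds.
should_transcribe() {
  local duration_ms="$1" min_ms="${2:-500}"
  [ "$duration_ms" -ge "$min_ms" ]
}
```

A caller would then write something like `should_transcribe "$elapsed_ms" && transcribe`, where `elapsed_ms` and `transcribe` are placeholders for your own variables and commands.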