Toak: High-speed Linux Dictation
Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT) to provide a seamless, articulate, and highly configurable dictation experience.
Built with .NET 10 and compiled to Native AOT, Toak runs as a fast-starting standalone binary that does not need a separately installed .NET runtime.
🚀 Key Features
- Blazing Fast: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
- Native AOT: Compiled to a native Linux binary for instant startup and minimal footprint.
- Intelligent Refinement: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
- Modular Skills: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
- Multiple Backends: Types directly into your active window (`wtype` or `xdotool`), copies to clipboard, or pipes to stdout.
- Beautiful CLI: Interactive onboarding and configuration powered by Spectre.Console.
🛠 Prerequisites
- .NET 10 SDK (for building from source)
- pipewire / pw-record (for native Wayland/Linux audio capture)
- Typing Backend: `wtype` (Wayland) or `xdotool` (X11)
- Groq API Key: Get one at console.groq.com
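A small shell check can confirm these tools are present before installing. The sketch below is only illustrative: `missing_tools` and `suggest_backend` are hypothetical helper names, and it assumes the session type is exposed through the `XDG_SESSION_TYPE` environment variable, which is common on systemd-based desktops but not guaranteed.

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Suggest a typing backend from the session type.
# Assumption: XDG_SESSION_TYPE is set; anything other than "wayland" falls back to xdotool.
suggest_backend() {
    case "${XDG_SESSION_TYPE:-}" in
        wayland) echo "wtype" ;;
        *)       echo "xdotool" ;;
    esac
}

backend="$(suggest_backend)"
missing="$(missing_tools dotnet pw-record "$backend")"
if [ -n "$missing" ]; then
    echo "Missing prerequisites:"
    echo "$missing"
else
    echo "All prerequisites found (typing backend: $backend)"
fi
```

The Groq API key is not checked here, since how Toak stores it is handled by the onboarding wizard.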
📦 Installation
Toak includes a self-contained installation script that handles the native compilation and setup:
git clone https://github.com/your-repo/toak.git
cd toak
./install.sh
The script will:
- Publish the project as a Native AOT Release binary.
- Install the executable to `/usr/bin/toak`.
- Install Zsh completions to `/usr/share/zsh/site-functions/`.
- Install and enable the `toak.service` systemd user service so the background daemon runs automatically.
To remove Toak from your system, simply run:
./uninstall.sh
🎮 Usage
Core Commands
- `toak toggle`: The primary command. Run it to start recording; run it again to stop, transcribe, and type/copy the result.
- `toak discard`: Instantly aborts the current recording without performing any transcription.
- `toak onboard`: Launches the interactive configuration wizard.
- `toak latency-test`: Benchmarks your network and API latency to ensure optimal performance.
- `toak show`: Displays your current configuration in a clean table.
- `toak config <key> <value>`: Quickly update a specific setting (e.g., `toak config whisper whisper-large-v3-turbo`).
- `toak skill`: Manage dynamic JSON skills via `list`, `add`, or `remove` subcommands.
Flags
- `-p, --pipe`: Output the finalized text to `stdout` instead of typing it.
- `--copy`: Copy the result to the system clipboard.
- `-v, --verbose`: Enable detailed debug logging.
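Because `--pipe` writes the finalized text to stdout, the result can be handed to other tools. The following is a minimal sketch that assumes the flag is passed to `toak toggle` on the invocation that stops the recording. This README does not spell out where flags go, so treat that placement as an assumption. `dictate_to_file` is a hypothetical helper name.

```shell
#!/bin/sh
# Append one dictation result to a file instead of typing it.
# Assumption: --pipe is accepted by `toak toggle` and the text arrives on stdout.
dictate_to_file() {
    target="$1"
    text="$(toak toggle --pipe)" || return 1
    # Skip empty results, e.g. when the call only started a recording.
    [ -n "$text" ] || return 0
    printf '%s\n' "$text" >> "$target"
}
```

A call such as `dictate_to_file "$HOME/notes.txt"` would then collect dictations in a plain text file.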
🤖 Skills System
Toak includes a data-driven skills system triggered by saying hotwords (like "System") at the start of your dictation. Skills are defined as simple JSON files, so they can be configured without modifying the C# codebase.
At runtime, skills are loaded from ~/.config/toak/skills/.
Default Skills
| Skill | Hotwords | Description | Type |
|---|---|---|---|
| Terminal | "System terminal", "System run" | Passes the command to terminal_action.sh to execute the spoken shell command. | script |
| Translate | "System translate to [language]" | Translates your dictation into the target language. | type |
| Professional | "System professional", "System formalize" | Rewrites your text to be articulate and formal. | type |
| Summary | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. | type |
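To make the trigger rule concrete, here is a rough sketch of hotword matching at the start of a transcript. It is not Toak's implementation, which lives in the C# codebase and reads JSON skill files. The `match_skill` function and its hard-coded hotword list are illustrative assumptions derived from the table above.

```shell
#!/bin/sh
# Map the start of a transcript to a skill name; print "none" if nothing matches.
match_skill() {
    # Lowercase the transcript so matching is case-insensitive.
    lowered="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$lowered" in
        "system terminal"*|"system run"*)           echo "Terminal" ;;
        "system translate to "*)                    echo "Translate" ;;
        "system professional"*|"system formalize"*) echo "Professional" ;;
        "system summary"*|"system concise"*)        echo "Summary" ;;
        *)                                          echo "none" ;;
    esac
}

match_skill "System translate to German good morning"   # prints: Translate
match_skill "Please restart the system"                 # prints: none
```

The second call shows why the hotword must come first: a "system" later in the sentence is treated as ordinary dictation.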
Customizing & Adding Skills
You can add your own skills using the CLI wizard:
toak skill add
This generates a JSON file in ~/.config/toak/skills/. Skills support two actions:
- `typeAction`: Transforms the dictated text via a custom System Prompt and types it into the active window (or pipes/copies if flags are requested).
- `scriptAction`: Transforms the text and passes the result as `$1` to a local shell script, allowing Toak to control system functions, APIs, or smart home devices completely autonomously.
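As an illustration of the script side of a `scriptAction`, the sketch below shows a hypothetical target script that appends the received text to a notes file. Only the `$1` convention comes from the description above. The `append_note` function, the `NOTES_FILE` variable, and the default file location are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical scriptAction target: appends the refined dictation to a notes file.
# Toak passes the transformed text as the first positional argument ($1).

# NOTES_FILE is an assumption for this sketch; override it from the environment.
NOTES_FILE="${NOTES_FILE:-$HOME/toak-notes.txt}"

append_note() {
    text="$1"
    # Reject empty dictations so blank lines never reach the file.
    if [ -z "$text" ]; then
        echo "no text received" >&2
        return 1
    fi
    printf '%s\n' "$text" >> "$NOTES_FILE"
}

# Run only when an argument was supplied, as Toak would do.
if [ "$#" -gt 0 ]; then
    append_note "$1"
fi
```

A script like this would be made executable and referenced from a script-type skill created with `toak skill add`. The exact fields the wizard asks for are not covered here.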
⚙️ Configuration
Toak's behavior is defined in ToakConfig.cs and can be managed via the CLI. Key settings include:
- `WhisperModel`: The STT model (default: `whisper-large-v3-turbo`).
- `LlmModel`: The refinement model (default: `openai/gpt-oss-20b`).
- `TypingBackend`: Choose between `wtype` (Wayland) and `xdotool` (X11).
- `ModulePunctuation`: Toggle automatic grammar and punctuation fixing.
- `ModuleTechnicalSanitization`: Ensures technical terms like `C#`, `SQL`, or `API` are formatted correctly.