# Toak: High-speed Linux Dictation

Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT) to provide a seamless, articulate, and highly configurable dictation experience.

Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-fast standalone binary with zero runtime overhead.

---

## 🚀 Key Features

- **Blazing Fast**: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
- **Native AOT**: Compiled to a native Linux binary for instant startup and minimal footprint.
- **Intelligent Refinement**: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
- **Modular Skills**: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
- **Multiple Backends**: Types directly into your active window (`wtype` or `xdotool`), copies to clipboard, or pipes to stdout.
- **Beautiful CLI**: Interactive onboarding and configuration powered by `Spectre.Console`.

---

## 🛠 Prerequisites

- **.NET 10 SDK** (for building from source)
- **pipewire** / **pw-record** (for native Wayland/Linux audio capture)
- **Typing Backend**: `wtype` (Wayland) or `xdotool` (X11)
- **Groq API Key**: Get one at [console.groq.com](https://console.groq.com/)

---

## 📦 Installation

Toak includes a self-contained installation script that handles the native compilation and setup:

```bash
git clone https://github.com/your-repo/toak.git
cd toak
./install.sh
```

The script will:

1. Publish the project as a **Native AOT** Release binary.
2. Install the executable to `/usr/bin/toak`.
3. Install **Zsh completions** to `/usr/share/zsh/site-functions/`.
4. Install and enable the `toak.service` systemd user service so the background daemon runs automatically.

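Before running the installer, it can help to confirm that the tools from the Prerequisites list are on your `PATH`. The following is a minimal sketch and not part of Toak itself: the `missing_tools` helper is a hypothetical name, and `dotnet` is assumed to be the command provided by the .NET SDK.

```bash
#!/usr/bin/env bash
# Sketch only: checks for the tools named under Prerequisites.
# missing_tools is a hypothetical helper, not something shipped with Toak.

# Print each given command name that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Needed for building from source and for audio capture.
missing=$(missing_tools dotnet pw-record)
if [ -n "$missing" ]; then
  printf 'Missing required tools:\n%s\n' "$missing"
fi

# Only one typing backend is needed: wtype (Wayland) or xdotool (X11).
if ! command -v wtype >/dev/null 2>&1 && ! command -v xdotool >/dev/null 2>&1; then
  echo "Missing typing backend: install wtype or xdotool"
fi
```

The script only reports what it finds and does not install anything, so it is safe to run repeatedly.
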
To remove Toak from your system, run:

```bash
./uninstall.sh
```

---

## 🎮 Usage

### Core Commands

- **`toak toggle`**: The primary command. Run it to start recording; run it again to stop, transcribe, and type/copy the result.
- **`toak discard`**: Instantly aborts the current recording without performing any transcription.
- **`toak onboard`**: Launches the interactive configuration wizard.
- **`toak latency-test`**: Benchmarks your network and API latency to ensure optimal performance.
- **`toak show`**: Displays your current configuration in a clean table.
- **`toak config`**: Quickly update a specific setting (e.g., `toak config whisper whisper-large-v3-turbo`).
- **`toak skill`**: Manage dynamic JSON skills via `list`, `add`, or `remove` subcommands.

### Flags

- `-p, --pipe`: Output the finalized text to `stdout` instead of typing it.
- `--copy`: Copy the result to the system clipboard.
- `-v, --verbose`: Enable detailed debug logging.

---

## 🤖 Skills System

Toak includes a robust, data-driven skills system triggered by saying hotwords (like **"System"**) at the start of your dictation. Skills are defined as simple JSON files and are entirely configurable without modifying the C# codebase.

At runtime, skills are loaded from `~/.config/toak/skills/`.

### Default Skills

| Skill | Hotwords | Description | Type |
| :--- | :--- | :--- | :--- |
| **Terminal** | "System terminal", "System run" | Passes the command to `terminal_action.sh` to execute the spoken shell command. | `script` |
| **Translate** | "System translate to [language]" | Translates your dictation into the target language. | `type` |
| **Professional** | "System professional", "System formalize" | Rewrites your text to be articulate and formal. | `type` |
| **Summary** | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. | `type` |

### Customizing & Adding Skills

You can add your own skills using the CLI wizard:

```bash
toak skill add
```

This generates a JSON file in `~/.config/toak/skills/`. Skills support two actions:

1. **`type` Action**: Transforms the dictated text via a custom System Prompt and types it into the active window (or pipes/copies it if the corresponding flags are set).
2. **`script` Action**: Transforms the text and passes the result as `$1` to a local shell script, allowing Toak to control system functions, APIs, or smart home devices.

---

## ⚙️ Configuration

Toak's behavior is defined in `ToakConfig.cs` and can be managed via the CLI. Key settings include:

- `WhisperModel`: The STT model (default: `whisper-large-v3-turbo`).
- `LlmModel`: The refinement model (default: `openai/gpt-oss-20b`).
- `TypingBackend`: Choose between `wtype` (Wayland) or `xdotool` (X11).
- `ModulePunctuation`: Toggle automatic grammar and punctuation fixing.
- `ModuleTechnicalSanitization`: Ensures technical terms like `C#`, `SQL`, or `API` are formatted correctly.
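
To illustrate the `script` action described under Customizing & Adding Skills, here is a minimal sketch of a script that a custom skill could point at. It relies only on the documented behavior that the transformed text arrives as `$1`. The script's purpose (appending to a notes file), the `append_note` function, and the `TOAK_NOTES_FILE` variable are all hypothetical and not part of Toak.

```bash
#!/usr/bin/env bash
# Sketch of a hypothetical script-action target (e.g. append_note.sh).
# Documented behavior relied on: Toak passes the transformed text as "$1".
# Assumptions: the notes-file idea and TOAK_NOTES_FILE are illustrative only.

# Append one line of text to the notes file (default: ~/toak-notes.txt).
append_note() {
  notes_file="${TOAK_NOTES_FILE:-$HOME/toak-notes.txt}"
  printf '%s\n' "$1" >> "$notes_file"
}

# Do nothing when no text was passed, so an empty dictation is a no-op.
if [ -n "${1:-}" ]; then
  append_note "$1"
fi
```

Quoting `"$1"` matters here: dictated text routinely contains spaces and punctuation, and an unquoted expansion would be split or glob-expanded by the shell.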