# Toak: High-speed Linux Dictation

Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art speech-to-text (Whisper via Groq) with LLM refinement (Llama/GPT via Groq or Together AI) to provide a seamless, articulate, and highly configurable dictation experience.

Built with **.NET 10** and compiled with **Native AOT**, Toak ships as a standalone native binary that starts quickly and needs no installed .NET runtime. It uses a client-daemon architecture to keep latency low: a persistent daemon manages state and API calls, and the CLI simply sends it commands.

---

## 🚀 Key Features

- **Blazing Fast**: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
- **Native AOT**: Compiled to a native Linux binary for near-instant startup and a small footprint.
- **Intelligent Refinement**: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
- **Multi-Provider LLM**: Supports both **Groq** and **Together AI** for text refinement, including Llama 3 models.
- **Reasoning Capabilities**: Optional reasoning-effort setting for complex text processing.
- **Modular Skills**: Voice-triggered "System" commands for translation, terminal execution, professional rewriting, and summarization.
- **Multiple Output Backends**: Types directly into your active window (`wtype`, `xdotool`, or `ydotool`), copies to the clipboard, or pipes to stdout.
- **High-Quality Audio**: Native support for **PipeWire** (`pw-record`), plus **FFmpeg** for broader compatibility.
- **Beautiful CLI**: Interactive onboarding and configuration powered by `Spectre.Console`.

---

## 🛠 Prerequisites

- **.NET 10 SDK** (only needed to build from source)
- **Audio capture**: `pipewire` / `pw-record` (recommended) or `ffmpeg`
- **Typing backend**: `wtype` (Wayland), `xdotool` (X11), or `ydotool` (virtual input)
- **Clipboard**: `wl-copy` (Wayland) or `xclip` (X11)
- **API keys**: Groq (required for Whisper) and, optionally, Together AI

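Before installing, you may want to confirm that at least one tool from each prerequisite group is on your `PATH`. The sketch below is illustrative and is not shipped with Toak: `check_group` and `check_prereqs` are hypothetical names, and the script only looks for the commands listed in the prerequisites (it does not check API keys or the .NET SDK).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Toak): report which prerequisite tools are on PATH.

# check_group LABEL TOOL...
# Prints "LABEL: ok (TOOL)" for the first TOOL found and returns 0;
# otherwise prints which tools are missing and returns 1.
check_group() {
  local label="$1"; shift
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$label: ok ($tool)"
      return 0
    fi
  done
  echo "$label: missing (need one of: $*)"
  return 1
}

# Checks every group from the prerequisites list; non-zero if any group is missing.
check_prereqs() {
  local status=0
  check_group "audio capture" pw-record ffmpeg || status=1
  check_group "typing backend" wtype xdotool ydotool || status=1
  check_group "clipboard" wl-copy xclip || status=1
  return "$status"
}
```

Source the snippet and run `check_prereqs`; a non-zero exit status means at least one group has no usable tool installed.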
---

## 📦 Installation

Toak includes a self-contained installation script that handles native compilation and setup:

```bash
git clone https://github.com/your-repo/toak.git
cd toak
./install.sh
```

The script will:

1. Publish the project as a **Native AOT** Release binary.
2. Install the executable to `/usr/bin/toak`.
3. Install **Zsh completions** to `/usr/share/zsh/site-functions/`.
4. Install and enable the `toak.service` systemd user service so the background daemon starts automatically.

To remove Toak from your system, run:

```bash
./uninstall.sh
```

---

## 🎮 Usage

Toak runs as a persistent **daemon** that manages state and API calls, while the **CLI** sends it commands via Unix sockets.

### Core Commands

- **`toak toggle`**: The primary command. Starts recording; run it again to stop, transcribe, and deliver the result (typed, copied, or printed to stdout).
- **`toak daemon`**: Runs the background service manually (normally managed by systemd).
- **`toak onboard`**: Launches the interactive configuration wizard for providers, models, and backends.
- **`toak discard`**: Immediately aborts the current recording without transcribing it.
- **`toak latency-test`**: Benchmarks your network and API latency.
- **`toak show`**: Displays your current configuration in a table.
- **`toak config `**: Updates settings such as `llm`, `whisper`, `lang`, `backend`, `punctuation`, or `tech`.
- **`toak skill`**: Manages dynamic JSON skills via `list`, `add`, or `remove`.
- **`toak history`**: Queries transcription history (`-n `, `--grep `, `--export `, `--shred`).
- **`toak stats`**: Displays usage statistics such as total recordings, latency, and top words.

### Global Flags

- `-p, --pipe`: Stop recording and print the finalized text to `stdout` instead of typing it.
- `--copy`: Stop recording and copy the result to the system clipboard.
- `-v, --verbose`: Enable detailed debug logging for troubleshooting.

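Because `toak toggle` both starts and stops a recording, it maps naturally onto a single hotkey. The hypothetical wrapper below chooses the output mode when you stop; it assumes the global `--pipe` and `--copy` flags can be passed alongside `toggle`. `toak_dictate` and `TOAK_BIN` are names introduced here for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical hotkey wrapper around `toak toggle` (not shipped with Toak).
# Assumption: the global --pipe / --copy flags are accepted together with `toggle`.
# TOAK_BIN is a hypothetical override; it defaults to the installed `toak` binary.
toak_dictate() {
  local mode="${1:-type}"
  local bin="${TOAK_BIN:-toak}"
  case "$mode" in
    type) "$bin" toggle ;;          # default: type into the active window
    pipe) "$bin" toggle --pipe ;;   # print the finalized text to stdout
    copy) "$bin" toggle --copy ;;   # copy the finalized text to the clipboard
    *)
      echo "usage: toak_dictate [type|pipe|copy]" >&2
      return 2
      ;;
  esac
}
```

For example, you might bind `toak_dictate` to one key and `toak_dictate copy` to another in your window manager. Setting `TOAK_BIN=echo` prints the arguments instead of running Toak, which is handy for checking a binding.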
---

## 🤖 Skills System

Toak includes a data-driven skills system: start your dictation with a hotword such as **"System"** to trigger a skill. Skills are defined as plain JSON files, so you can add or change them without modifying the C# codebase.

### Default Skills

| Skill | Hotwords | Description | Type |
| :--- | :--- | :--- | :--- |
| **Terminal** | "System terminal", "System run" | Translates your request into a bash command and executes it via `terminal_action.sh`. | `script` |
| **Translate** | "System translate to [language]" | Translates your dictation into the target language. | `type` |
| **Professional** | "System professional", "System formalize" | Rewrites your text to be articulate and formal. | `type` |
| **Summary** | "System summary", "System concise" | Strips filler and provides a direct, crisp summary. | `type` |

### Customizing & Adding Skills

Run `toak skill add` to create a custom skill interactively. All skills live in `~/.config/toak/skills/`.

---

## ⚙️ Configuration

Key settings in `ToakConfig.cs` (managed via `toak onboard` or `toak config`):

- `LlmProvider`: `groq` (default) or `together`.
- `LlmModel`: The refinement model (e.g., `llama-3.3-70b-versatile`).
- `ReasoningEffort`: `none` (default) or `low` for o1-style reasoning.
- `WhisperModel`: The speech-to-text model (default: `whisper-large-v3-turbo`).
- `WhisperLanguage`: The spoken language (e.g., `en`, `es`, `fr`).
- `TypingBackend`: `wtype`, `xdotool`, or `ydotool`.
- `AudioBackend`: `pw-record` (PipeWire) or `ffmpeg`.
- `ModulePunctuation`: Toggles automatic grammar and punctuation fixing.
- `ModuleTechnicalSanitization`: Ensures technical terms are formatted correctly.
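If you are unsure which `TypingBackend` to choose, the session type is a reasonable starting point, following the pairings in the prerequisites: `wtype` for Wayland, `xdotool` for X11, and `ydotool` as a virtual-input fallback. This is a hypothetical helper, not something Toak ships; it assumes your login manager sets `XDG_SESSION_TYPE`, or that `WAYLAND_DISPLAY` / `DISPLAY` are exported in your shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Toak): suggest a TypingBackend value
# from the current session type.
suggest_typing_backend() {
  # Check Wayland first: XWayland also sets DISPLAY on many Wayland sessions.
  if [ "${XDG_SESSION_TYPE:-}" = "wayland" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
    echo wtype
  elif [ "${XDG_SESSION_TYPE:-}" = "x11" ] || [ -n "${DISPLAY:-}" ]; then
    echo xdotool
  else
    # ydotool writes to the kernel uinput device, so it does not depend on X11 or Wayland.
    echo ydotool
  fi
}
```

Run `suggest_typing_backend` and enter the printed value when `toak onboard` asks for a typing backend.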