
Toak: High-speed Linux Dictation

Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT) to provide a seamless, articulate, and highly configurable dictation experience.

Built with .NET 10 and compiled to Native AOT, Toak runs as a standalone native binary: it starts quickly and does not need a .NET runtime installed on the target machine.


🚀 Key Features

  • Blazing Fast: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
  • Native AOT: Compiled to a native Linux binary for instant startup and minimal footprint.
  • Intelligent Refinement: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
  • Modular Skills: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
  • Multiple Backends: Types directly into your active window (wtype or xdotool), copies to clipboard, or pipes to stdout.
  • Beautiful CLI: Interactive onboarding and configuration powered by Spectre.Console.

🛠 Prerequisites

  • .NET 10 SDK (for building from source)
  • pipewire / pw-record (for native Wayland/Linux audio capture)
  • Typing Backend: wtype (Wayland) or xdotool (X11)
  • Groq API Key: Get one at console.groq.com
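Before installing, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not something shipped with Toak: the helper names check_cmd and have_any are hypothetical, and it only tests for the presence of the commands, not their versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Toak): report which prerequisite tools are on PATH.

# Print "ok <name>" or "missing <name>" for a single command.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok $1"
  else
    echo "missing $1"
  fi
}

# Succeed if at least one of the given commands exists.
have_any() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

check_cmd dotnet      # only needed when building from source
check_cmd pw-record   # audio capture

# Only one typing backend is required: wtype (Wayland) or xdotool (X11).
if have_any wtype xdotool; then
  echo "ok typing backend"
else
  echo "missing typing backend (install wtype or xdotool)"
fi
```

Any line starting with "missing" points at a package to install before running the installer.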

📦 Installation

Toak includes a self-contained installation script that handles the native compilation and setup:

git clone https://github.com/your-repo/toak.git
cd toak
./install.sh

The script will:

  1. Publish the project as a Native AOT Release binary.
  2. Install the executable to /usr/bin/toak.
  3. Install Zsh completions to /usr/share/zsh/site-functions/.
  4. Install and enable the toak.service systemd user service so the background daemon runs automatically.
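Once the script has finished, the results of the steps above can be spot-checked. This is a rough sketch under a few assumptions: check_file is a hypothetical helper, the name of the completion file is not stated here (so only the directory is checked), and systemctl may not be able to reach a user session in every environment.

```shell
#!/bin/sh
# Hypothetical post-install check (not shipped with Toak).

# Report whether something exists at the given path.
check_file() {
  if [ -e "$1" ]; then
    echo "found $1"
  else
    echo "not found $1"
  fi
}

check_file /usr/bin/toak
# The completion file name is an unknown, so only the directory is checked.
check_file /usr/share/zsh/site-functions

# systemctl may be absent (for example in containers), so guard the call.
if command -v systemctl >/dev/null 2>&1; then
  # is-enabled exits non-zero for disabled units; keep its printed state anyway.
  state="$(systemctl --user is-enabled toak.service 2>/dev/null || true)"
  echo "toak.service: ${state:-unknown}"
else
  echo "toak.service: systemctl not available"
fi
```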

To remove Toak from your system, simply run:

./uninstall.sh

🎮 Usage

Core Commands

  • toak toggle: The primary command. Run it to start recording; run it again to stop, transcribe, and type/copy the result.
  • toak discard: Instantly aborts the current recording without performing any transcription.
  • toak onboard: Launches the interactive configuration wizard.
  • toak latency-test: Benchmarks your network and API latency to ensure optimal performance.
  • toak show: Displays your current configuration in a clean table.
  • toak config <key> <value>: Quickly update a specific setting (e.g., toak config whisper whisper-large-v3-turbo).
  • toak skill: Manage dynamic JSON skills via list, add, or remove subcommands.

Flags

  • -p, --pipe: Output the finalized text to stdout instead of typing it.
  • --copy: Copy the result to the system clipboard.
  • -v, --verbose: Enable detailed debug logging.

🤖 Skills System

Toak includes a data-driven skills system triggered by saying a hotword (such as "System") at the start of your dictation. Skills are defined as simple JSON files, so they can be added or changed without modifying the C# codebase.

At runtime, skills are loaded from ~/.config/toak/skills/.

Default Skills

| Skill | Hotwords | Description | Type |
|---|---|---|---|
| Terminal | "System terminal", "System run" | Passes the command to terminal_action.sh to execute the spoken shell command. | script |
| Translate | "System translate to [language]" | Translates your dictation into the target language. | type |
| Professional | "System professional", "System formalize" | Rewrites your text to be articulate and formal. | type |
| Summary | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. | type |

Customizing & Adding Skills

You can add your own skills using the CLI wizard:

toak skill add

This generates a JSON file in ~/.config/toak/skills/. Skills support two actions:

  1. type Action: Transforms the dictated text via a custom System Prompt and types it into the active window (or pipes/copies if flags are requested).
  2. script Action: Transforms the text and passes the result as $1 to a local shell script, allowing Toak to control system functions, APIs, or smart home devices completely autonomously.
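Because a script action hands the transformed text to the script as $1, the target script can be very small. The example below is a sketch only: the script name append_note.sh, the function append_note, the NOTES_FILE variable, and the default notes path are all hypothetical and are not Toak settings.

```shell
#!/bin/sh
# append_note.sh: hypothetical target for a "script" skill.
# Toak is described as passing the transformed text as $1.

# Append one timestamped line to a notes file.
append_note() {
  text="$1"
  # NOTES_FILE is an assumption of this sketch, not a Toak option.
  file="${NOTES_FILE:-$HOME/notes/dictation.txt}"
  if [ -z "$text" ]; then
    echo "append_note: no text received" >&2
    return 1
  fi
  mkdir -p "$(dirname "$file")"
  printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$text" >> "$file"
}

# Run only when text was actually supplied as the first argument.
if [ "$#" -gt 0 ]; then
  append_note "$1"
fi
```

You can try the script outside Toak by calling it directly, for example ./append_note.sh "buy milk", and then inspecting the notes file. Quoting "$1" everywhere keeps dictated text containing spaces or shell metacharacters from being split or interpreted.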

⚙️ Configuration

Toak's behavior is defined in ToakConfig.cs and can be managed via the CLI. Key settings include:

  • WhisperModel: The STT model (default: whisper-large-v3-turbo).
  • LlmModel: The refinement model (default: openai/gpt-oss-20b).
  • TypingBackend: Choose between wtype (Wayland) or xdotool (X11).
  • ModulePunctuation: Toggle automatic grammar and punctuation fixing.
  • ModuleTechnicalSanitization: Ensures technical terms like C#, SQL, or API are formatted correctly.