2026-02-25 21:51:27 +01:00

Toak: High-speed Linux Dictation

Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT via Groq or Together AI) to provide a seamless, articulate, and highly configurable dictation experience.

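Built with .NET 10 and compiled with Native AOT, Toak ships as a standalone native binary that needs no installed .NET runtime and has no JIT warm-up. A client-daemon architecture keeps state and API handling in a long-running process, so each CLI invocation adds very little latency.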


🚀 Key Features

  • Blazing Fast: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
  • Native AOT: Compiled to a native Linux binary for instant startup and minimal footprint.
  • Intelligent Refinement: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
  • Multi-Provider LLM: Supports both Groq and Together AI for text refinement, including Llama 3 models.
  • Reasoning Capabilities: Optional reasoning effort settings for complex text processing.
  • Modular Skills: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
  • Multiple Backends: Types directly into your active window (wtype, xdotool, or ydotool), copies to clipboard, or pipes to stdout.
  • High-Quality Audio: Native support for PipeWire (pw-record) and FFmpeg for universal compatibility.
  • Beautiful CLI: Interactive onboarding and configuration powered by Spectre.Console.

🛠 Prerequisites

  • .NET 10 SDK (for building from source)
  • Audio Capture: pipewire / pw-record (recommended) or ffmpeg
  • Typing Backend: wtype (Wayland), xdotool (X11), or ydotool (Virtual Input)
  • Clipboard: wl-copy (Wayland) or xclip (X11)
  • API Keys: Groq (required for Whisper) and optionally Together AI.
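
Before installing, it can help to check which of these tools are already present. The following is a minimal sketch, not part of Toak: pick_typing_backend and missing_tools are hypothetical helper names, and falling back to ydotool when the session type is unknown is an assumption based on the list above.

```shell
#!/bin/sh
# Hypothetical helper: suggests a typing backend from the session type.
# Assumption: ydotool is used as the fallback when the session type is unknown.
pick_typing_backend() {
    case "${XDG_SESSION_TYPE:-}" in
        wayland) echo "wtype" ;;
        x11)     echo "xdotool" ;;
        *)       echo "ydotool" ;;
    esac
}

# Hypothetical helper: prints each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

backend=$(pick_typing_backend)
echo "Suggested typing backend: $backend"

# Check the suggested backend plus the recommended audio capture tool.
missing=$(missing_tools "$backend" pw-record)
[ -z "$missing" ] || echo "Not found on PATH: $missing"
```

The same missing_tools call can be extended with wl-copy, xclip, or ffmpeg, depending on which clipboard and audio tools you intend to use.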

📦 Installation

Toak includes a self-contained installation script that handles the native compilation and setup:

git clone https://github.com/your-repo/toak.git
cd toak
./install.sh

The script will:

  1. Publish the project as a Native AOT Release binary.
  2. Install the executable to /usr/bin/toak.
  3. Install Zsh completions to /usr/share/zsh/site-functions/.
  4. Install and enable the toak.service systemd user service so the background daemon runs automatically.
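
After the script finishes, the results of steps 2 and 4 can be checked by hand. This is a rough sketch rather than something shipped with Toak: report_executable is a hypothetical helper, and the systemctl query assumes a running systemd user session.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a path exists and is executable.
report_executable() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Step 2: the binary is expected at /usr/bin/toak.
report_executable /usr/bin/toak

# Step 4: ask systemd whether the user service is enabled.
# Skipped when systemctl is unavailable; a failure here is not fatal.
if command -v systemctl >/dev/null 2>&1; then
    systemctl --user is-enabled toak.service || true
fi
```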

To remove Toak from your system, simply run:

./uninstall.sh

🎮 Usage

Toak operates as a persistent daemon that manages state and API calls, while the CLI sends commands to it via Unix sockets.

Core Commands

  • toak toggle: The primary command. Starts recording; run it again to stop, transcribe, and deliver the result (typed, copied to the clipboard, or written to stdout).
  • toak start: Explicitly start recording.
  • toak stop: Explicitly stop recording.
  • toak status: Outputs the daemon's current state (Recording or Idle), optionally as JSON.
  • toak daemon: Runs the background service manually (usually managed by systemd).
  • toak onboard: Launches the interactive configuration wizard for providers, models, and backends.
  • toak discard: Instantly aborts the current recording without performing any transcription.
  • toak latency-test: Benchmarks your network and API latency to ensure optimal performance.
  • toak show: Displays your current configuration in a clean table.
  • toak config <key> <value>: Update settings like llm, whisper, lang, backend, punctuation, or tech.
  • toak skill: Manage dynamic JSON skills via list, add, or remove.
  • toak history: Query transcription history (-n <count>, --grep <pattern>, --export <file>, --shred).
  • toak stats: Display usage statistics like total recordings, latency, and top words.
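
As one example of scripting these commands, toak status can feed a status bar or panel widget. The sketch below assumes the plain output contains the word "Recording" or "Idle"; state_label is a hypothetical helper and the labels are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: maps a daemon state string to a short status-bar label.
# Assumption: the output of `toak status` contains "Recording" or "Idle".
state_label() {
    case "$1" in
        *Recording*) echo "REC" ;;
        *Idle*)      echo "idle" ;;
        *)           echo "?" ;;
    esac
}

# Only query the daemon when toak is actually installed.
if command -v toak >/dev/null 2>&1; then
    state_label "$(toak status 2>/dev/null)"
fi
```

A desktop keybinding would typically call toak toggle directly; a script like this one only reads the state.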

Global Flags

  • -p, --pipe: Stop recording and output the finalized text to stdout instead of typing it.
  • --copy: Stop recording and copy the result to the system clipboard.
  • -v, --verbose: Enable detailed debug logging for troubleshooting.
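
The --pipe flag makes it possible to route dictation into other tools. The sketch below appends each dictation to a notes file. It assumes that --pipe can be combined with toak stop, which this README does not spell out; append_dictation and the TOAK_BIN override are hypothetical names used for illustration.

```shell
#!/bin/sh
# TOAK_BIN is a hypothetical override so the sketch can be tried with a stub.
TOAK_BIN="${TOAK_BIN:-toak}"

# Stops the current recording and appends the finalized text to a notes file.
# Assumption: the global --pipe flag can be combined with `stop`.
append_dictation() {
    notes_file="$1"
    text=$("$TOAK_BIN" stop --pipe) || return 1
    # Skip empty results so the file does not collect blank lines.
    [ -n "$text" ] || return 1
    printf '%s\n' "$text" >> "$notes_file"
}

# Usage (after starting a recording with `toak start`):
#   append_dictation "$HOME/notes.txt"
```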

🤖 Skills System

Toak includes a data-driven skills system triggered by saying a hotword (such as "System") at the start of your dictation. Skills are defined as simple JSON files and can be configured without modifying the C# codebase.

Default Skills

Skill        | Hotwords                                  | Description                                                                          | Type
-------------+-------------------------------------------+--------------------------------------------------------------------------------------+-------
Terminal     | "System terminal", "System run"           | Translates request into a bash command and executes it via terminal_action.sh.       | script
Translate    | "System translate to [language]"          | Translates your dictation into the target language.                                  | type
Professional | "System professional", "System formalize" | Rewrites your text to be articulate and formal.                                      | type
Summary      | "System summary", "System concise"        | Strips fluff and provides a direct, crisp summary.                                   | type

Customizing & Adding Skills

Run toak skill add to create a custom skill interactively. All skills live in ~/.config/toak/skills/.


⚙️ Configuration

Key settings in ToakConfig.cs (managed via toak onboard or toak config):

  • LlmProvider: Choice of groq (default) or together.
  • LlmModel: The refinement model (e.g., llama-3.3-70b-versatile).
  • ReasoningEffort: Set to none (default) or low for O1-style reasoning.
  • WhisperModel: The STT model (default: whisper-large-v3-turbo).
  • WhisperLanguage: Set spoken language (e.g., en, es, fr).
  • TypingBackend: Choose between wtype, xdotool, or ydotool.
  • AudioBackend: Choose between pw-record (PipeWire) or ffmpeg.
  • MinRecordingDuration: Set the minimum recording duration in ms (default: 500).
  • ModulePunctuation: Toggle automatic grammar and punctuation fixing.
  • ModuleTechnicalSanitization: Ensures technical terms are formatted correctly.
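
Several settings can be applied in one go by looping over toak config. This is a sketch under assumptions: apply_settings and TOAK_BIN are hypothetical names, and the key/value pairs in the usage comment (lang en, backend wtype) are guesses at how the config keys map to the settings above.

```shell
#!/bin/sh
# TOAK_BIN is a hypothetical override so the sketch can be tried with a stub.
TOAK_BIN="${TOAK_BIN:-toak}"

# Reads "key value" pairs from stdin, one per line, and applies each one
# with `toak config <key> <value>`. Stops at the first failure.
apply_settings() {
    while read -r key value; do
        [ -n "$key" ] || continue
        # </dev/null keeps the command from consuming the remaining input lines.
        "$TOAK_BIN" config "$key" "$value" </dev/null || return 1
    done
}

# Usage (key/value pairs are assumptions, check `toak show` afterwards):
#   apply_settings <<'EOF'
#   lang en
#   backend wtype
#   EOF
```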