Toak: High-speed Linux Dictation

Toak is a fast dictation tool for Linux. It combines Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT) to turn speech into clean, well-punctuated text, with configurable models, output backends, and refinement modules.

Built with .NET 10 and compiled to Native AOT, Toak runs as a standalone binary: no separate .NET runtime to install and no JIT warm-up at startup.


🚀 Key Features

  • Blazing Fast: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
  • Native AOT: Compiled to a native Linux binary for instant startup and minimal footprint.
  • Intelligent Refinement: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
  • Modular Skills: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
  • Multiple Backends: Types directly into your active window (wtype or xdotool), copies to clipboard, or pipes to stdout.
  • Beautiful CLI: Interactive onboarding and configuration powered by Spectre.Console.

🛠 Prerequisites

  • .NET 10 SDK (for building from source)
  • ffmpeg (for audio capture and processing)
  • Typing Backend: wtype (Wayland) or xdotool (X11)
  • Groq API Key: Get one at console.groq.com
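
Before installing, it can help to confirm the tools above are on your PATH. The following is a minimal sketch, not part of Toak itself: the function names `missing_tools` and `check_toak_prereqs` are hypothetical, and the check assumes the .NET SDK is exposed as the `dotnet` command.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Toak.

# Print each command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the prerequisites listed above. Either typing backend is enough.
# dotnet is only needed when building from source.
check_toak_prereqs() {
  status=0
  for tool in $(missing_tools dotnet ffmpeg); do
    echo "missing: $tool" >&2
    status=1
  done
  if [ -n "$(missing_tools wtype)" ] && [ -n "$(missing_tools xdotool)" ]; then
    echo "missing: a typing backend (wtype or xdotool)" >&2
    status=1
  fi
  return "$status"
}
```

After sourcing the snippet, `check_toak_prereqs && echo "ready to build"` prints the confirmation only when nothing is missing.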

📦 Installation

Toak includes a self-contained installation script that handles the native compilation and setup:

    git clone https://github.com/your-repo/toak.git
    cd toak
    ./install.sh

The script will:

  1. Publish the project as a Native AOT Release binary.
  2. Install the executable to /usr/bin/toak.
  3. Install Zsh completions to /usr/share/zsh/site-functions/.
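
For readers who want to see roughly what those three steps involve, here is a hedged sketch. It is not the contents of install.sh: the runtime identifier `linux-x64`, the `./publish` output directory, the binary name `toak`, and the completion file `completions/_toak` are all assumptions made for illustration. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch of the three install steps; NOT the actual install.sh.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_toak() {
  rid="${1:-linux-x64}"          # runtime identifier (assumed default)
  publish_dir="${2:-./publish}"  # publish output directory (assumed)

  # 1. Publish the project as a Native AOT Release binary.
  run dotnet publish -c Release -r "$rid" -p:PublishAot=true -o "$publish_dir" || return 1
  # 2. Install the executable (binary name "toak" is assumed).
  run sudo install -Dm755 "$publish_dir/toak" /usr/bin/toak || return 1
  # 3. Install Zsh completions (source path "completions/_toak" is assumed).
  run sudo install -Dm644 completions/_toak /usr/share/zsh/site-functions/_toak
}
```

Running `DRY_RUN=1 install_toak` lists the three commands without touching the system, which is a safe way to review them before a real install.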

🎮 Usage

Core Commands

  • toak toggle: The primary command. Run it to start recording; run it again to stop, transcribe, and type/copy the result.
  • toak discard: Instantly aborts the current recording without performing any transcription.
  • toak onboard: Launches the interactive configuration wizard.
  • toak latency-test: Benchmarks your network and API latency to ensure optimal performance.
  • toak show: Displays your current configuration in a clean table.
  • toak config <key> <value>: Quickly update a specific setting (e.g., toak config whisper whisper-large-v3-turbo).

Flags

  • -p, --pipe: Output the finalized text to stdout instead of typing it.
  • --copy: Copy the result to the system clipboard.
  • -v, --verbose: Enable detailed debug logging.

🤖 Skills System

Toak includes a modular skills system triggered by saying the "System" hotword at the start of your dictation.

| Skill | Hotwords | Description |
|---|---|---|
| Terminal | "System terminal", "System run" | Executes the spoken command in your shell. |
| Translate | "System translate to [language]" | Translates your dictation into the target language. |
| Professional | "System professional", "System formalize" | Rewrites your text to be articulate and formal. |
| Summary | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. |
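
The hotword idea can be pictured as a prefix match on the transcript. The following is an illustrative sketch only, not how Toak routes skills internally: `route_skill` is a hypothetical function, and the fallback name `dictation` is an assumption for plain dictation with no hotword.

```shell
#!/bin/sh
# Illustrative only: map a transcript to a skill name by its leading hotword.
route_skill() {
  # Lowercase the transcript so matching is case-insensitive.
  lowered="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$lowered" in
    "system terminal"*|"system run"*)           echo terminal ;;
    "system translate to "*)                    echo translate ;;
    "system professional"*|"system formalize"*) echo professional ;;
    "system summary"*|"system concise"*)        echo summary ;;
    *)                                          echo dictation ;;
  esac
}
```

For example, `route_skill "System run list files"` prints `terminal`, while a sentence that merely mentions the word "system" later on falls through to `dictation`. A real matcher would also need to tolerate punctuation the transcription may insert after the hotword.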

⚙️ Configuration

Toak's behavior is defined in ToakConfig.cs and can be managed via the CLI. Key settings include:

  • WhisperModel: The STT model (default: whisper-large-v3-turbo).
  • LlmModel: The refinement model (default: openai/gpt-oss-20b).
  • TypingBackend: Choose between wtype (Wayland) or xdotool (X11).
  • ModulePunctuation: Toggle automatic grammar and punctuation fixing.
  • ModuleTechnicalSanitization: Ensures technical terms like C#, SQL, or API are formatted correctly.
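
If you are unsure which TypingBackend value fits your session, the display server can usually be detected from the environment. This is a small sketch under that assumption; `suggest_backend` is a hypothetical helper, and it only prints a suggestion rather than changing Toak's configuration.

```shell
#!/bin/sh
# Hypothetical helper: suggest a typing backend from the session type.
# Uses the first argument if given, otherwise XDG_SESSION_TYPE.
suggest_backend() {
  session="${1:-${XDG_SESSION_TYPE:-}}"
  case "$session" in
    wayland) echo wtype ;;
    x11)     echo xdotool ;;
    *)
      # Unknown session type: fall back on WAYLAND_DISPLAY,
      # which Wayland compositors set for their clients.
      if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wtype
      else
        echo xdotool
      fi
      ;;
  esac
}
```

The suggested value can then be applied through `toak onboard` or `toak config <key> <value>`; the exact key name for the backend is not listed here, so check `toak show` for it.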