115 lines
5.7 KiB
Markdown
115 lines
5.7 KiB
Markdown
# Toak: High-speed Linux Dictation
|
|
|
|
Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT via Groq or Together AI) to provide a seamless, articulate, and highly configurable dictation experience.
|
|
|
|
Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-fast standalone binary with zero runtime overhead, using a client-daemon architecture for near-zero latency.
|
|
|
|
---
|
|
|
|
## 🚀 Key Features
|
|
|
|
- **Blazing Fast**: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
|
|
- **Native AOT**: Compiled to a native Linux binary for instant startup and minimal footprint.
|
|
- **Intelligent Refinement**: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
|
|
- **Multi-Provider LLM**: Supports both **Groq** and **Together AI** for text refinement, including Llama 3 models.
|
|
- **Reasoning Capabilities**: Optional reasoning effort settings for complex text processing.
|
|
- **Modular Skills**: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
|
|
- **Multiple Backends**: Types directly into your active window (`wtype`, `xdotool`, or `ydotool`), copies to clipboard, or pipes to stdout.
|
|
- **High-Quality Audio**: Native support for **PipeWire** (`pw-record`) and **FFmpeg** for universal compatibility.
|
|
- **Beautiful CLI**: Interactive onboarding and configuration powered by `Spectre.Console`.
|
|
|
|
---
|
|
|
|
## 🛠 Prerequisites
|
|
|
|
- **.NET 10 SDK** (for building from source)
|
|
- **Audio Capture**: `pipewire` / `pw-record` (recommended) or `ffmpeg`
|
|
- **Typing Backend**: `wtype` (Wayland), `xdotool` (X11), or `ydotool` (Virtual Input)
|
|
- **Clipboard**: `wl-copy` (Wayland) or `xclip` (X11)
|
|
- **API Keys**: Groq (required for Whisper) and optionally Together AI.
|
|
|
|
---
|
|
|
|
## 📦 Installation
|
|
|
|
Toak includes a self-contained installation script that handles the native compilation and setup:
|
|
|
|
```bash
|
|
git clone https://github.com/your-repo/toak.git
|
|
cd toak
|
|
./install.sh
|
|
```
|
|
|
|
The script will:
|
|
1. Publish the project as a **Native AOT** Release binary.
|
|
2. Install the executable to `/usr/bin/toak`.
|
|
3. Install **Zsh completions** to `/usr/share/zsh/site-functions/`.
|
|
4. Install and enable the `toak.service` systemd user service so the background daemon runs automatically.
|
|
|
|
To remove Toak from your system, simply run:
|
|
```bash
|
|
./uninstall.sh
|
|
```
|
|
|
|
---
|
|
|
|
## 🎮 Usage
|
|
|
|
Toak operates as a persistent **daemon** that manages state and API calls, while the **CLI** sends commands to it via Unix sockets.
|
|
|
|
### Core Commands
|
|
|
|
- **`toak toggle`**: The primary command. Start recording; run again to stop, transcribe, and type/copy/stdout the result.
|
|
- **`toak start`**: Explicitly start recording.
|
|
- **`toak stop`**: Explicitly stop recording.
|
|
- **`toak status`**: Output the daemon's current state (`Recording`, `Idle`, or JSON).
|
|
- **`toak daemon`**: Runs the background service manually (usually managed by systemd).
|
|
- **`toak onboard`**: Launches the interactive configuration wizard for providers, models, and backends.
|
|
- **`toak discard`**: Instantly aborts the current recording without performing any transcription.
|
|
- **`toak latency-test`**: Benchmarks your network and API latency to ensure optimal performance.
|
|
- **`toak show`**: Displays your current configuration in a clean table.
|
|
- **`toak config <key> <value>`**: Update settings like `llm`, `whisper`, `lang`, `backend`, `punctuation`, or `tech`.
|
|
- **`toak skill`**: Manage dynamic JSON skills via `list`, `add`, or `remove`.
|
|
- **`toak history`**: Query transcription history (`-n <count>`, `--grep <pattern>`, `--export <file>`, `--shred`).
|
|
- **`toak stats`**: Display usage statistics like total recordings, latency, and top words.
|
|
|
|
### Global Flags
|
|
|
|
- `-p, --pipe`: Stop recording and output the finalized text to `stdout` instead of typing it.
|
|
- `--copy`: Stop recording and copy the result to the system clipboard.
|
|
- `-v, --verbose`: Enable detailed debug logging for troubleshooting.
|
|
|
|
---
|
|
|
|
## 🤖 Skills System
|
|
|
|
Toak includes a robust, data-driven skills system triggered by saying hotwords (like **"System"**) at the start of your dictation. Skills are defined as simple JSON files entirely configurable without modifying the C# codebase.
|
|
|
|
### Default Skills
|
|
| Skill | Hotwords | Description | Type |
|
|
| :--- | :--- | :--- | :--- |
|
|
| **Terminal** | "System terminal", "System run" | Translates request into a bash command and executes it via `terminal_action.sh`. | `script` |
|
|
| **Translate** | "System translate to [language]" | Translates your dictation into the target language. | `type` |
|
|
| **Professional**| "System professional", "System formalize" | Rewrites your text to be articulate and formal. | `type` |
|
|
| **Summary** | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. | `type` |
|
|
|
|
### Customizing & Adding Skills
|
|
Run `toak skill add` to build interactive custom capabilities. All skills live in `~/.config/toak/skills/`.
|
|
|
|
---
|
|
|
|
## ⚙️ Configuration
|
|
|
|
Key settings in `ToakConfig.cs` (managed via `toak onboard` or `toak config`):
|
|
|
|
- `LlmProvider`: Choice of `groq` (default) or `together`.
|
|
- `LlmModel`: The refinement model (e.g., `llama-3.3-70b-versatile`).
|
|
- `ReasoningEffort`: Set to `none` (default) or `low` for O1-style reasoning.
|
|
- `WhisperModel`: The STT model (default: `whisper-large-v3-turbo`).
|
|
- `WhisperLanguage`: Set spoken language (e.g., `en`, `es`, `fr`).
|
|
- `TypingBackend`: Choose between `wtype`, `xdotool`, or `ydotool`.
|
|
- `AudioBackend`: Choose between `pw-record` (PipeWire) or `ffmpeg`.
|
|
- `MinRecordingDuration`: Set the minimum recording duration in ms (default: `500`).
|
|
- `ModulePunctuation`: Toggle automatic grammar and punctuation fixing.
|
|
- `ModuleTechnicalSanitization`: Ensures technical terms are formatted correctly.
|