
docs: Add initial README.md detailing project overview, features, prerequisites, installation, usage, skills, and configuration.

2026-02-27 01:53:11 +01:00
parent 91a99910f5
commit 77bc1d4ee5

README.md Normal file

@@ -0,0 +1,86 @@
# Toak: High-speed Linux Dictation
Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT) to provide a seamless, articulate, and highly configurable dictation experience.
Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a fast standalone binary with no JIT warm-up and no dependency on an installed .NET runtime.

---
## 🚀 Key Features
- **Blazing Fast**: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
- **Native AOT**: Compiled to a native Linux binary for instant startup and minimal footprint.
- **Intelligent Refinement**: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
- **Modular Skills**: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
- **Multiple Backends**: Types directly into your active window (`wtype` or `xdotool`), copies to clipboard, or pipes to stdout.
- **Beautiful CLI**: Interactive onboarding and configuration powered by `Spectre.Console`.
---
## 🛠 Prerequisites
- **.NET 10 SDK** (for building from source)
- **ffmpeg** (for audio capture and processing)
- **Typing Backend**: `wtype` (Wayland) or `xdotool` (X11)
- **Groq API Key**: Get one at [console.groq.com](https://console.groq.com/)
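
Before installing, it can help to confirm that the tools listed above are on your `PATH`. The following is a minimal sketch and is not shipped with Toak; the helper names `have` and `report` are made up for illustration. It does not check the Groq API key, since how Toak stores the key is not described here.

```bash
#!/usr/bin/env bash
# Prerequisite check (a sketch; not part of Toak itself).

# have CMD: succeeds if CMD is found on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# report CMD: print one status line for CMD.
report() {
  if have "$1"; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

report dotnet   # only needed when building from source
report ffmpeg

# Only one typing backend is required: wtype (Wayland) or xdotool (X11).
if have wtype || have xdotool; then
  echo "ok: typing backend"
else
  echo "missing: typing backend (install wtype or xdotool)"
fi
```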
---
## 📦 Installation
Toak includes a self-contained installation script that handles the native compilation and setup:
```bash
git clone https://github.com/your-repo/toak.git
cd toak
./install.sh
```
The script will:
1. Publish the project as a **Native AOT** Release binary.
2. Install the executable to `/usr/bin/toak`.
3. Install **Zsh completions** to `/usr/share/zsh/site-functions/`.
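
For readers who want to see roughly what these three steps involve, here is a hedged sketch. It is not the contents of the real `install.sh`: the runtime identifier (`linux-x64`), the `./publish` output directory, the binary name `toak`, and the completion file name `_toak` are all assumptions. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
# Sketch of the three install steps. NOT the real install.sh: the runtime
# identifier, output directory, binary name, and completion file name are
# assumptions made for illustration.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# run CMD ARGS...: print the command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# install_toak: perform the steps in order, stopping at the first failure.
install_toak() {
  # 1. Publish a Native AOT Release binary (linux-x64 assumed).
  run dotnet publish -c Release -r linux-x64 -p:PublishAot=true -o ./publish &&
  # 2. Install the executable (assumes the published binary is named "toak").
  run sudo install -Dm755 ./publish/toak /usr/bin/toak &&
  # 3. Install Zsh completions (assumes a completion file named "_toak").
  run sudo install -Dm644 ./_toak /usr/share/zsh/site-functions/_toak
}

install_toak
```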
---
## 🎮 Usage
### Core Commands
- **`toak toggle`**: The primary command. Run it to start recording; run it again to stop, transcribe, and type/copy the result.
- **`toak discard`**: Instantly aborts the current recording without performing any transcription.
- **`toak onboard`**: Launches the interactive configuration wizard.
- **`toak latency-test`**: Benchmarks your network and API latency to ensure optimal performance.
- **`toak show`**: Displays your current configuration in a clean table.
- **`toak config <key> <value>`**: Quickly update a specific setting (e.g., `toak config whisper whisper-large-v3-turbo`).
### Flags
- `-p, --pipe`: Output the finalized text to `stdout` instead of typing it.
- `--copy`: Copy the result to the system clipboard.
- `-v, --verbose`: Enable detailed debug logging.
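
Because `--pipe` writes to `stdout`, the result can be fed to other tools. The sketch below appends finished dictation to a file. It assumes the flag is passed to the `toak toggle` call that stops the recording, which this README does not confirm; the function name `append_dictation` and the `TOAK` override variable are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: append finished dictation to a file via --pipe.
# Assumption: flags are passed to the `toak toggle` call that stops recording.

# append_dictation FILE: stop the current recording and append the text to FILE.
# TOAK may point at a different binary; it defaults to "toak".
append_dictation() {
  local file="$1" text
  text="$("${TOAK:-toak}" toggle --pipe)" || return 1
  if [ -n "$text" ]; then
    printf '%s\n' "$text" >> "$file"
  fi
}

# Usage (after starting a recording with `toak toggle`):
#   append_dictation ~/notes.md
```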
---
## 🤖 Skills System
Toak includes a modular skills system triggered by saying the **"System"** hotword at the start of your dictation.
| Skill | Hotwords | Description |
| :--- | :--- | :--- |
| **Terminal** | "System terminal", "System run" | Executes the spoken command in your shell. |
| **Translate** | "System translate to [language]" | Translates your dictation into the target language. |
| **Professional** | "System professional", "System formalize" | Rewrites your text to be articulate and formal. |
| **Summary** | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. |
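
To make the hotword routing in the table concrete, here is an illustrative sketch of prefix matching on a transcript. It is not Toak's actual implementation (Toak is written in C#), and the function name `route_skill` is hypothetical. It shows only the idea that a transcript starting with "System" plus a hotword selects a skill, while anything else is treated as plain dictation.

```bash
#!/usr/bin/env bash
# Illustrative hotword routing (a sketch; not Toak's actual implementation).

# route_skill TRANSCRIPT: print the skill selected by the leading hotword,
# or "dictation" when the transcript does not start with a known hotword.
route_skill() {
  local text
  # Lowercase the transcript so matching is case-insensitive.
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$text" in
    "system terminal"*|"system run"*)            echo "terminal" ;;
    "system translate to "*)                     echo "translate" ;;
    "system professional"*|"system formalize"*)  echo "professional" ;;
    "system summary"*|"system concise"*)         echo "summary" ;;
    *)                                           echo "dictation" ;;
  esac
}

route_skill "System translate to German good morning"   # prints: translate
route_skill "Please send the report by Friday"          # prints: dictation
```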
---
## ⚙️ Configuration
Toak's behavior is defined in `ToakConfig.cs` and can be managed via the CLI. Key settings include:
- `WhisperModel`: The STT model (default: `whisper-large-v3-turbo`).
- `LlmModel`: The refinement model (default: `openai/gpt-oss-20b`).
- `TypingBackend`: Choose between `wtype` (Wayland) or `xdotool` (X11).
- `ModulePunctuation`: Toggle automatic grammar and punctuation fixing.
- `ModuleTechnicalSanitization`: Ensures technical terms like `C#`, `SQL`, or `API` are formatted correctly.