diff --git a/README.md b/README.md new file mode 100644 index 0000000..c8bcdf5 --- /dev/null +++ b/README.md @@ -0,0 +1,86 @@ +# Toak: High-speed Linux Dictation + +Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT) to provide a seamless, articulate, and highly configurable dictation experience. + +Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-fast standalone binary with zero runtime overhead. + +--- + +## 🚀 Key Features + +- **Blazing Fast**: Uses Groq's API for sub-second Whisper transcription and LLM refinement. +- **Native AOT**: Compiled to a native Linux binary for instant startup and minimal footprint. +- **Intelligent Refinement**: Automatically fixes grammar, punctuation, and technical terms while preserving your voice. +- **Modular Skills**: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization. +- **Multiple Backends**: Types directly into your active window (`wtype` or `xdotool`), copies to clipboard, or pipes to stdout. +- **Beautiful CLI**: Interactive onboarding and configuration powered by `Spectre.Console`. + +--- + +## 🛠 Prerequisites + +- **.NET 10 SDK** (for building from source) +- **ffmpeg** (for audio capture processing) +- **Typing Backend**: `wtype` (Wayland) or `xdotool` (X11) +- **Groq API Key**: Get one at [console.groq.com](https://console.groq.com/) + +--- + +## 📦 Installation + +Toak includes a self-contained installation script that handles the native compilation and setup: + +```bash +git clone https://github.com/your-repo/toak.git +cd toak +./install.sh +``` + +The script will: +1. Publish the project as a **Native AOT** Release binary. +2. Install the executable to `/usr/bin/toak`. +3. Install **Zsh completions** to `/usr/share/zsh/site-functions/`. + +--- + +## 🎮 Usage + +### Core Commands + +- **`toak toggle`**: The primary command. Run it to start recording; run it again to stop, transcribe, and type/copy the result. +- **`toak discard`**: Instantly aborts the current recording without performing any transcription. +- **`toak onboard`**: Launches the interactive configuration wizard. +- **`toak latency-test`**: Benchmarks your network and API latency to ensure optimal performance. +- **`toak show`**: Displays your current configuration in a clean table. +- **`toak config `**: Quickly update a specific setting (e.g., `toak config whisper whisper-large-v3-turbo`). + +### Flags + +- `-p, --pipe`: Output the finalized text to `stdout` instead of typing it. +- `--copy`: Copy the result to the system clipboard. +- `-v, --verbose`: Enable detailed debug logging. + +--- + +## 🤖 Skills System + +Toak includes a modular skills system triggered by saying the **"System"** hotword at the start of your dictation. + +| Skill | Hotwords | Description | +| :--- | :--- | :--- | +| **Terminal** | "System terminal", "System run" | Executes the spoken command in your shell. | +| **Translate** | "System translate to [language]" | Translates your dictation into the target language. | +| **Professional**| "System professional", "System formalize" | Rewrites your text to be articulate and formal. | +| **Summary** | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. | + +--- + +## ⚙️ Configuration + +Toak's behavior is defined in `ToakConfig.cs` and can be managed via the CLI. Key settings include: + +- `WhisperModel`: The STT model (default: `whisper-large-v3-turbo`). +- `LlmModel`: The refinement model (default: `openai/gpt-oss-20b`). +- `TypingBackend`: Choose between `wtype` (Wayland) or `xdotool` (X11). +- `ModulePunctuation`: Toggle automatic grammar and punctuation fixing. +- `ModuleTechnicalSanitization`: Ensures technical terms like `C#`, `SQL`, or `API` are formatted correctly.