feat: Add Together AI and FFmpeg support, introduce core orchestration, and update documentation and install scripts.

2026-02-28 16:23:37 +01:00
parent 4e04cc6042
commit 9f611269b1
7 changed files with 99 additions and 98 deletions
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 # Toak: High-speed Linux Dictation

-Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT) to provide a seamless, articulate, and highly configurable dictation experience.
+Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT via Groq or Together AI) to provide a seamless, articulate, and highly configurable dictation experience.

-Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-fast standalone binary with zero runtime overhead.
+Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-fast standalone binary with zero runtime overhead, using a client-daemon architecture for near-zero latency.

 ---

@@ -11,8 +11,11 @@ Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-
 - **Blazing Fast**: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
 - **Native AOT**: Compiled to a native Linux binary for instant startup and minimal footprint.
 - **Intelligent Refinement**: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
+- **Multi-Provider LLM**: Supports both **Groq** and **Together AI** for text refinement, including Llama 3 models.
+- **Reasoning Capabilities**: Optional reasoning effort settings for complex text processing.
 - **Modular Skills**: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
-- **Multiple Backends**: Types directly into your active window (`wtype` or `xdotool`), copies to clipboard, or pipes to stdout.
+- **Multiple Backends**: Types directly into your active window (`wtype`, `xdotool`, or `ydotool`), copies to clipboard, or pipes to stdout.
+- **High-Quality Audio**: Native support for **PipeWire** (`pw-record`) and **FFmpeg** for universal compatibility.
 - **Beautiful CLI**: Interactive onboarding and configuration powered by `Spectre.Console`.

 ---
@@ -20,9 +23,10 @@ Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-
 ## 🛠 Prerequisites

 - **.NET 10 SDK** (for building from source)
-- **pipewire** / **pw-record** (for native Wayland/Linux audio capture)
-- **Typing Backend**: `wtype` (Wayland) or `xdotool` (X11)
-- **Groq API Key**: Get one at [console.groq.com](https://console.groq.com/)
+- **Audio Capture**: `pipewire` / `pw-record` (recommended) or `ffmpeg`
+- **Typing Backend**: `wtype` (Wayland), `xdotool` (X11), or `ydotool` (Virtual Input)
+- **Clipboard**: `wl-copy` (Wayland) or `xclip` (X11)
+- **API Keys**: Groq (required for Whisper) and optionally Together AI.

 ---

@@ -51,23 +55,26 @@ To remove Toak from your system, simply run:

 ## 🎮 Usage

+Toak operates as a persistent **daemon** that manages state and API calls, while the **CLI** sends commands to it via Unix sockets.
+
 ### Core Commands

-- **`toak toggle`**: The primary command. Run it to start recording; run it again to stop, transcribe, and type/copy the result.
+- **`toak toggle`**: The primary command. Starts recording; run it again to stop, transcribe, and type, copy, or print the result to stdout.
+- **`toak daemon`**: Runs the background service manually (usually managed by systemd).
+- **`toak onboard`**: Launches the interactive configuration wizard for providers, models, and backends.
 - **`toak discard`**: Instantly aborts the current recording without performing any transcription.
-- **`toak onboard`**: Launches the interactive configuration wizard.
 - **`toak latency-test`**: Benchmarks your network and API latency to ensure optimal performance.
 - **`toak show`**: Displays your current configuration in a clean table.
-- **`toak config <key> <value>`**: Quickly update a specific setting (e.g., `toak config whisper whisper-large-v3-turbo`).
-- **`toak skill`**: Manage dynamic JSON skills via `list`, `add`, or `remove` subcommands.
-- **`toak history`**: Display your recent dictation history (`-n <count>`, `--grep <pattern>`, `--export <file>`, `--shred`).
-- **`toak stats`**: Display usage statistics and analytics like most active day and top words.
+- **`toak config <key> <value>`**: Update settings like `llm`, `whisper`, `lang`, `backend`, `punctuation`, or `tech`.
+- **`toak skill`**: Manage dynamic JSON skills via `list`, `add`, or `remove`.
+- **`toak history`**: Query transcription history (`-n <count>`, `--grep <pattern>`, `--export <file>`, `--shred`).
+- **`toak stats`**: Display usage statistics like total recordings, latency, and top words.

-### Flags
+### Global Flags

-- `-p, --pipe`: Output the finalized text to `stdout` instead of typing it.
-- `--copy`: Copy the result to the system clipboard.
-- `-v, --verbose`: Enable detailed debug logging.
+- `-p, --pipe`: Stop recording and output the finalized text to `stdout` instead of typing it.
+- `--copy`: Stop recording and copy the result to the system clipboard.
+- `-v, --verbose`: Enable detailed debug logging for troubleshooting.

 ---

@@ -75,34 +82,29 @@ To remove Toak from your system, simply run:

 Toak includes a robust, data-driven skills system triggered by saying hotwords (like **"System"**) at the start of your dictation. Skills are defined as simple JSON files entirely configurable without modifying the C# codebase.

-At runtime, skills are loaded from `~/.config/toak/skills/`.
-
 ### Default Skills
 | Skill | Hotwords | Description | Type |
 | :--- | :--- | :--- | :--- |
-| **Terminal** | "System terminal", "System run" | Passes the command to `terminal_action.sh` to execute the spoken shell command. | `script` |
+| **Terminal** | "System terminal", "System run" | Translates the spoken request into a bash command and executes it via `terminal_action.sh`. | `script` |
 | **Translate** | "System translate to [language]" | Translates your dictation into the target language. | `type` |
 | **Professional**| "System professional", "System formalize" | Rewrites your text to be articulate and formal. | `type` |
 | **Summary** | "System summary", "System concise" | Strips fluff and provides a direct, crisp summary. | `type` |

 ### Customizing & Adding Skills
-You can build infinite capabilities using the CLI wizard:
-```bash
-toak skill add
-```
-
-This generates a JSON file in `~/.config/toak/skills/`. Skills support two actions:
-1. **`type` Action**: Transforms the dictated text via a custom System Prompt and types it into the active window (or pipes/copies if flags are requested).
-2. **`script` Action**: Transforms the text and passes the result as `$1` to a local shell script, allowing Toak to control system functions, APIs, or smart home devices completely autonomously.
+Run `toak skill add` to create custom skills with the interactive wizard. All skills live in `~/.config/toak/skills/`.

 ---

 ## ⚙️ Configuration

-Toak's behavior is defined in `ToakConfig.cs` and can be managed via the CLI. Key settings include:
+Key settings in `ToakConfig.cs` (managed via `toak onboard` or `toak config`):

+- `LlmProvider`: Choice of `groq` (default) or `together`.
+- `LlmModel`: The refinement model (e.g., `llama-3.3-70b-versatile`).
+- `ReasoningEffort`: Set to `none` (default) or `low` for o1-style reasoning.
 - `WhisperModel`: The STT model (default: `whisper-large-v3-turbo`).
-- `LlmModel`: The refinement model (default: `openai/gpt-oss-20b`).
-- `TypingBackend`: Choose between `wtype` (Wayland) or `xdotool` (X11).
+- `WhisperLanguage`: Set spoken language (e.g., `en`, `es`, `fr`).
+- `TypingBackend`: Choose between `wtype`, `xdotool`, or `ydotool`.
+- `AudioBackend`: Choose between `pw-record` (PipeWire) or `ffmpeg`.
 - `ModulePunctuation`: Toggle automatic grammar and punctuation fixing.
-- `ModuleTechnicalSanitization`: Ensures technical terms like `C#`, `SQL`, or `API` are formatted correctly.
+- `ModuleTechnicalSanitization`: Ensures technical terms are formatted correctly.