feat: Expand LLM and Whisper STT provider support to include Fireworks AI and Cerebras, and update related documentation.
README.md
@@ -1,6 +1,6 @@
 # Toak: High-speed Linux Dictation
 
-Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq) with LLM refinement (Llama/GPT via Groq or Together AI) to provide a seamless, articulate, and highly configurable dictation experience.
+Toak is a high-speed, professional-grade dictation tool for Linux. It combines state-of-the-art Speech-to-Text (Whisper via Groq or Fireworks AI) with LLM refinement (Llama/DeepSeek/GPT via Groq, Together AI, Cerebras, or Fireworks AI) to provide a seamless, articulate, and highly configurable dictation experience.
 
 Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-fast standalone binary with zero runtime overhead, using a client-daemon architecture for near-zero latency.
 
@@ -11,7 +11,8 @@ Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-
 - **Blazing Fast**: Uses Groq's API for sub-second Whisper transcription and LLM refinement.
 - **Native AOT**: Compiled to a native Linux binary for instant startup and minimal footprint.
 - **Intelligent Refinement**: Automatically fixes grammar, punctuation, and technical terms while preserving your voice.
-- **Multi-Provider LLM**: Supports both **Groq** and **Together AI** for text refinement, including Llama 3 models.
+- **Multi-Provider LLM**: Supports **Groq**, **Together AI**, **Cerebras**, and **Fireworks AI** for text refinement, including Llama 3 and DeepSeek models.
+- **Multi-Provider Whisper**: Easily swap between Groq and Fireworks AI for state-of-the-art STT APIs.
 - **Reasoning Capabilities**: Optional reasoning effort settings for complex text processing.
 - **Modular Skills**: Actionable "System" commands for translation, terminal execution, professional rewriting, and summarization.
 - **Multiple Backends**: Types directly into your active window (`wtype`, `xdotool`, or `ydotool`), copies to clipboard, or pipes to stdout.
@@ -26,7 +27,7 @@ Built with **.NET 10** and compiled to **Native AOT**, Toak runs as a lightning-
 - **Audio Capture**: `pipewire` / `pw-record` (recommended) or `ffmpeg`
 - **Typing Backend**: `wtype` (Wayland), `xdotool` (X11), or `ydotool` (Virtual Input)
 - **Clipboard**: `wl-copy` (Wayland) or `xclip` (X11)
-- **API Keys**: Groq (required for Whisper) and optionally Together AI.
+- **API Keys**: API Keys for your chosen providers (Groq, Together AI, Cerebras, or Fireworks).
 
 ---
 
@@ -102,9 +103,10 @@ Run `toak skill add` to build interactive custom capabilities. All skills live i
 
 Key settings in `ToakConfig.cs` (managed via `toak onboard` or `toak config`):
 
-- `LlmProvider`: Choice of `groq` (default) or `together`.
-- `LlmModel`: The refinement model (e.g., `llama-3.3-70b-versatile`).
+- `LlmProvider`: Choice of `groq` (default), `together`, `cerebras`, or `fireworks`.
+- `LlmModel`: The refinement model (e.g., `llama-3.3-70b-versatile` or `accounts/fireworks/models/deepseek-v3p1`).
 - `ReasoningEffort`: Set to `none` (default) or `low` for O1-style reasoning.
+- `WhisperProvider`: Choice of `groq` (default) or `fireworks`.
 - `WhisperModel`: The STT model (default: `whisper-large-v3-turbo`).
 - `WhisperLanguage`: Set spoken language (e.g., `en`, `es`, `fr`).
 - `TypingBackend`: Choose between `wtype`, `xdotool`, or `ydotool`.
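The updated settings enumerate which provider values are now accepted, and they differ between refinement and transcription (Cerebras and Together AI appear only as LLM providers). The following is a minimal Python sketch of checking a setup against those documented choices. It assumes settings are plain strings; the `ToakSettings` class, its snake_case fields, and the `validate()` and `required_api_keys()` methods are hypothetical illustrations, not Toak's C# `ToakConfig.cs`. The defaults for `LlmModel`, `WhisperLanguage`, and `TypingBackend` are also assumptions, since the README only gives examples for them.

```python
from dataclasses import dataclass

# Hypothetical Python mirror of the settings listed in the README diff.
# The real ToakConfig.cs is C#; its field names, defaults, and validation may differ.
LLM_PROVIDERS = ("groq", "together", "cerebras", "fireworks")
WHISPER_PROVIDERS = ("groq", "fireworks")  # per the README, Whisper runs only on these two
REASONING_EFFORTS = ("none", "low")
TYPING_BACKENDS = ("wtype", "xdotool", "ydotool")


@dataclass
class ToakSettings:
    llm_provider: str = "groq"                     # documented default
    llm_model: str = "llama-3.3-70b-versatile"     # README example, assumed default
    reasoning_effort: str = "none"                 # documented default
    whisper_provider: str = "groq"                 # documented default
    whisper_model: str = "whisper-large-v3-turbo"  # documented default
    whisper_language: str = "en"                   # assumed default
    typing_backend: str = "wtype"                  # assumed default

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means every choice is recognized."""
        checks = [
            ("LlmProvider", self.llm_provider, LLM_PROVIDERS),
            ("ReasoningEffort", self.reasoning_effort, REASONING_EFFORTS),
            ("WhisperProvider", self.whisper_provider, WHISPER_PROVIDERS),
            ("TypingBackend", self.typing_backend, TYPING_BACKENDS),
        ]
        return [
            f"unknown {name}: {value!r}"
            for name, value, allowed in checks
            if value not in allowed
        ]

    def required_api_keys(self) -> list[str]:
        """Providers needing an API key, assuming one key per provider."""
        return sorted({self.llm_provider, self.whisper_provider})


if __name__ == "__main__":
    # Refinement and transcription both on Fireworks AI: a single key suffices.
    fireworks_only = ToakSettings(
        llm_provider="fireworks",
        llm_model="accounts/fireworks/models/deepseek-v3p1",
        whisper_provider="fireworks",
    )
    print(fireworks_only.validate())           # []
    print(fireworks_only.required_api_keys())  # ['fireworks']

    # Cerebras is listed only as an LLM provider, so this is rejected.
    print(ToakSettings(whisper_provider="cerebras").validate())
    # ["unknown WhisperProvider: 'cerebras'"]
```

Deriving the key list from the two provider fields mirrors the Requirements change in this commit: with Fireworks AI available for Whisper, a Groq key is no longer mandatory when neither stage uses Groq.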