
feat: Add Together AI and FFmpeg support, introduce core orchestration, and update documentation and install scripts.

2026-02-28 16:23:37 +01:00
parent 4e04cc6042
commit 9f611269b1
7 changed files with 99 additions and 98 deletions


@@ -1,10 +1,10 @@
# History and Stats Implementation Plan
# History and Stats
This document outlines the design and implementation of the `history` and `stats` features in Toak.
## Data Storage
All transcriptions will be stored in a JSON Lines (`.jsonl`) file located at `~/.local/share/toak/history.jsonl`.
Since Toak uses Native AOT and JSON serialization needs source generation, we'll keep the model simple.
All transcriptions are stored in a JSON Lines (`.jsonl`) file.
- **Location**: `~/.local/share/toak/history.jsonl`
**Entry Model:**
```json
@@ -12,32 +12,29 @@ Since Toak uses Native AOT and JSON serialization needs source generation, we'll
"Timestamp": "2025-01-15T09:23:00Z",
"RawTranscript": "hello world",
"RefinedText": "Hello world.",
"SkillName": "Professional", // null if default type/script
"DurationMs": 1500 // time taken for STT + LLM
"SkillName": "Professional", // null if default
"DurationMs": 1500 // total processing time
}
```
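As an illustration of the storage format above, the following is a minimal Python sketch of appending and reading JSON Lines entries. It is not Toak's actual `HistoryManager` (which is written in C#); the function names `append_entry` and `read_entries` are hypothetical, and the sketch does not attempt the thread safety the real implementation is described as providing.

```python
import json
from pathlib import Path


def append_entry(path: Path, entry: dict) -> None:
    """Append one entry as a single JSON line (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_entries(path: Path) -> list[dict]:
    """Read all entries; a missing file is treated as empty history."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not break parsing.
        return [json.loads(line) for line in f if line.strip()]
```

Because each entry occupies exactly one line, appending never requires rewriting earlier records.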
## `toak history` Command
Provides access to past dictations.
- `toak history` - Shows the last 10 entries.
- `toak history -n <count>` - Shows the last `<count>` entries.
- `toak history --grep <pattern>` - Filters the history entries matching the given keyword in the RefinedText (case-insensitive).
- `toak history --export <file>` - Writes the output as a Markdown file.
- `toak history --shred` - Deletes the `history.jsonl` file entirely.
The CLI provides access to past dictations:
- `toak history`: Shows the last 10 entries.
- `-n <count>`: Shows the last `<count>` entries.
- `--grep <pattern>`: Case-insensitive search through refined text.
- `--export <file>`: Export history as a Markdown file.
- `--shred`: Securely delete the entire `history.jsonl` file.
## `toak stats` Command
Reads the `history.jsonl` file and outputs usage analytics using `Spectre.Console`.
Aggregates usage metrics from the history file:
- **Total recordings**: Count of all entries.
- **Total duration**: Cumulative time spent transcribing (in minutes).
- **Average latency**: Mean processing time per request (in seconds).
- **Most active day**: Date with the highest number of recordings.
- **Top spoken words**: The 5 most frequent words (>3 characters).
**Metrics:**
- Total recording count
- Total processing duration (sum of `DurationMs`)
- Average processing duration
- Most active day
- Most frequently used skill (if any)
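The metrics listed above can be derived from the history entries in a single pass. The Python sketch below is illustrative only: `compute_stats` is a hypothetical name, the day is taken from the first ten characters of the ISO `Timestamp` (that is, the UTC date), and words are split with a simple regular expression. The actual `StatsCommand` may tokenize and bucket differently.

```python
import re
from collections import Counter


def compute_stats(entries: list[dict]) -> dict:
    """Aggregate usage metrics from a list of history entries."""
    total = len(entries)
    total_ms = sum(e.get("DurationMs", 0) for e in entries)
    days = Counter(e["Timestamp"][:10] for e in entries)
    skills = Counter(e["SkillName"] for e in entries if e.get("SkillName"))
    words = Counter(
        w
        for e in entries
        for w in re.findall(r"[a-z']+", (e.get("RefinedText") or "").lower())
        if len(w) > 3  # only words longer than 3 characters
    )
    return {
        "total_recordings": total,
        "total_minutes": total_ms / 60_000,
        "average_seconds": (total_ms / total / 1000) if total else 0.0,
        "most_active_day": days.most_common(1)[0][0] if days else None,
        "top_skill": skills.most_common(1)[0][0] if skills else None,
        "top_words": words.most_common(5),
    }
```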
## Architecture Changes
1. **`HistoryManager.cs`**: Handles thread-safe appending `HistoryEntry` to the `.jsonl` file, reading, and clearing.
2. **`DaemonService.cs`**: Calls `HistoryManager.SaveEntry` during the `ProcessStopRecordingAsync` method after text is finalized.
3. **`HistoryCommand.cs` & `StatsCommand.cs`**: CLI command definitions.
4. **`AppJsonSerializerContext.cs`**: Needs `[JsonSerializable(typeof(HistoryEntry))]`.
## Architecture
1. **`HistoryManager.cs`**: Handles thread-safe appending and reading of the `.jsonl` file.
2. **`TranscriptionOrchestrator.cs`**: Calls `HistoryManager.SaveEntry` after text is finalized.
3. **`HistoryCommand.cs` & `StatsCommand.cs`**: CLI command implementations.
4. **`AppJsonSerializerContext.cs`**: Provides AOT-compatible serialization for `HistoryEntry`.


@@ -9,48 +9,48 @@ The UNIX domain socket is typically located at:
## Message Format
Clients send small byte arrays to issue commands to the server. Depending on the command, the structure ranges from a single byte to a 3-byte payload containing the command ID and configuration flags for standard output handling.
Clients send small byte arrays (1 to 3 bytes) to issue commands to the server.
### Command Bytes
| Command | Byte | Description |
|---|---|---|
| **START** | `1` | Forces the daemon to start recording. Ignored if already recording. |
| **STOP** | `2` | Forces the daemon to stop recording and begin processing the audio. Takes flags for response handling. |
| **STOP** | `2` | Forces the daemon to stop recording and begin processing. Takes flags. |
| **ABORT** | `3` | Stops audio recording and discards the buffer without making API calls. |
| **TOGGLE** | `4` | Stops recording if currently recording; starts recording if currently inactive. Takes flags for response handling. |
| **TOGGLE** | `4` | Stops recording if currently recording; starts recording if inactive. Takes flags. |
## Payload Formats
### 1-Byte Payloads (`START`, `ABORT`)
When the client only needs to trigger state changes without receiving processing results back, it sends a single byte.
Used for state changes that don't return streaming text.
```text
[ Command Byte ]
```
Example (`ABORT`): `[ 0x03 ]`
### 3-Byte Payloads (`STOP`, `TOGGLE`)
When asking the daemon to process audio, the client can specify how it wants to receive the finalized text: typed via hotkeys (default), piped to standard output (`--pipe`), or copied to the clipboard (`--copy`).
When asking the daemon to process audio, the client can specify how it wants to receive the result. The client sends exactly 3 bytes:
The client sends exactly 3 bytes:
```text
[ Command Byte ] [ Pipe Flag ] [ Copy Flag ]
```
- **Byte 0:** The command (`0x02` or `0x04`)
- **Byte 1:** Pipe to Stdout: `0x01` if enabled, `0x00` if disabled.
- **Byte 2:** Copy to Clipboard: `0x01` if enabled, `0x00` if disabled.
- **Byte 1:** **Pipe to Stdout**: `0x01` if enabled (client waits for stream), `0x00` if disabled.
- **Byte 2:** **Copy to Clipboard**: `0x01` if enabled, `0x00` if disabled.
Example (`TOGGLE` with stdout piping enabled): `[ 0x04, 0x01, 0x00 ]`
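To make the two payload shapes concrete, here is a small Python sketch that builds them from the command bytes in the table above. The helper name `build_payload` and the constant names are assumptions for illustration and are not taken from the Toak codebase.

```python
# Command bytes as listed in the Command Bytes table.
START, STOP, ABORT, TOGGLE = 1, 2, 3, 4


def build_payload(command: int, pipe: bool = False, copy: bool = False) -> bytes:
    """Build a 1-byte or 3-byte client message (hypothetical helper)."""
    if command in (START, ABORT):
        # State-change commands carry no flags.
        return bytes([command])
    if command in (STOP, TOGGLE):
        # Byte 1 is the pipe flag, byte 2 is the copy flag.
        return bytes([command, int(pipe), int(copy)])
    raise ValueError(f"unknown command byte: {command}")
```

For example, `build_payload(TOGGLE, pipe=True)` yields the same three bytes as the `TOGGLE` example above.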
## Server Responses
Depending on the flags provided in a 3-Byte Payload:
Depending on the flags:
1. **Default (No flags set):**
The server will process the audio, handle LLM modifications, and inject the text into the user's active window using Wayland (`wtype`) or X11 (`xdotool`). The socket is closed by the server.
The server processes the audio, handles LLM refinement, and injects the text into the user's active window using the configured backend (`wtype`, `xdotool`, or `ydotool`). The socket is closed by the server.
2. **Pipe or Copy Flag Set:**
The client will keep the connection open and wait to read the incoming text from the server.
The server will stream UTF-8 encoded text chunks back to the client as the LLM generates them. The client reads these chunks and pushes them to `stdout`. Once sending is complete, the server closes the socket.
2. **Pipe Flag Set:**
The client stays connected. The server streams UTF-8 encoded text chunks (tokens) back to the client as they are generated by the LLM. The client writes these to `stdout`. The server closes the socket when finished.
3. **Copy Flag Set:**
The server handles copying to the system clipboard internally via its `ClipboardManager`. If the Pipe flag is also set, it will stream to stdout simultaneously.
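The pipe-flag behaviour described above (the client stays connected and reads UTF-8 chunks until the server closes the socket) could look like this from the client side. This is a Python sketch, not Toak's actual client: `stream_text` and `toggle_and_print` are hypothetical names, and the socket path is left as a parameter because it is not given in this section.

```python
import codecs
import socket
import sys
from typing import Iterator


def stream_text(sock: socket.socket, chunk_size: int = 4096) -> Iterator[str]:
    """Yield decoded text until the server closes the connection."""
    # An incremental decoder copes with multi-byte characters that are
    # split across two recv() calls.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        data = sock.recv(chunk_size)
        if not data:  # empty read means the server closed the socket
            break
        text = decoder.decode(data)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def toggle_and_print(socket_path: str) -> None:
    """Send TOGGLE with the pipe flag set and echo the stream to stdout."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(bytes([0x04, 0x01, 0x00]))
        for text in stream_text(sock):
            sys.stdout.write(text)
            sys.stdout.flush()
```

Reading until end-of-stream works because, per the description above, the server closes the socket when it has finished sending.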


@@ -12,43 +12,45 @@ Toak is designed as a fast, Linux-native dictation application utilizing C# AOT
Toak/
├── Api/
│ ├── GroqApiClient.cs # Client for external transcription and LLM API calls (Groq/Whisper)
│ ├── OpenAiCompatibleClient.cs # Generic OpenAI-compatible client for Groq and Together AI
│ └── Models/ # API payload representations
├── Assets/ # Sound files or other static resources
├── Audio/
│ └── AudioRecorder.cs # Handles audio capture via system utilities (e.g., pw-record from PipeWire)
│ ├── AudioRecorder.cs # Handles audio capture via PipeWire (pw-record)
│ └── FfmpegAudioRecorder.cs # Universal audio capture via ffmpeg
├── Commands/
│ ├── ToggleCommand.cs # Start/stop recording and pass pipe/copy flags
│ ├── DiscardCommand.cs # Abort the current recording
│ ├── OnboardCommand.cs # Initial interactive configuration setup
│ ├── ToggleCommand.cs # Client command to start/stop recording via socket
│ ├── DiscardCommand.cs # Client command to abort current recording
│ ├── OnboardCommand.cs # Interactive configuration setup wizard
│ ├── ConfigUpdaterCommand.cs # Direct configuration modifications
│ ├── ShowCommand.cs # Display current configuration
│ ├── SkillCommand.cs # CLI controller for discovering and adding Dynamic JSON Skills
│ ├── LatencyTestCommand.cs # Benchmark tool for API calls
│ ├── HistoryCommand.cs # CLI interface to query, export, or shred past transcripts
│ └── StatsCommand.cs # CLI interface to calculate analytics from history
│ ├── SkillCommand.cs # CLI controller for managing JSON Skills
│ ├── LatencyTestCommand.cs # Pipeline benchmark tool
│ ├── HistoryCommand.cs # Interface to query past transcriptions
│ └── StatsCommand.cs # Aggregated usage analytics
├── Configuration/
│ ├── ConfigManager.cs # Loads and saves JSON configuration from the user's home folder
│ ├── ConfigManager.cs # Loads/saves JSON configuration
│ └── ToakConfig.cs # Data model for user preferences
├── Core/
│ ├── DaemonService.cs # The background daemon maintaining the socket server and handling states
│ ├── Logger.cs # Logging utility (verbose logging)
│ ├── HistoryManager.cs # Manages appending and reading the local history.jsonl
│ ├── HistoryEntry.cs # The data model for transcription history
│ ├── PromptBuilder.cs # Constructs the system prompts for the LLM based on user settings
│ ├── StateTracker.cs # Tracks the current application state (e.g. is recording active?)
│ ├── DaemonService.cs # Background daemon maintaining the socket server
│ ├── TranscriptionOrchestrator.cs # Coordinates audio recording, STT, LLM, and output
│ ├── Logger.cs # Logging utility
│ ├── HistoryManager.cs # Thread-safe history management (.jsonl)
│ ├── HistoryEntry.cs # Data model for transcription history
│ ├── PromptBuilder.cs # Constructs LLM system prompts
│ ├── StateTracker.cs # Tracks application state and recording PIDs
│ ├── Interfaces/ # Core abstractions (ILlmClient, IAudioRecorder, etc.)
│ └── Skills/ # Data-driven JSON skill integrations
│ ├── SkillDefinition.cs # JSON Model
│ ├── DynamicSkill.cs # Runtime implementation mapping LLM context to actions
│ └── SkillRegistry.cs # Loads and detects skills from ~/.config/toak/skills/
├── IO/
│ ├── ClipboardManager.cs # Cross-session (Wayland/X11) clipboard manipulation (`wl-copy`, `xclip`)
│ ├── TextInjector.cs # Native keyboard injection handling (`wtype`, `xdotool`)
│ └── Notifications.cs # System notifications (`notify-send`) and sound playback (`paplay`)
│ ├── ClipboardManager.cs # Cross-session clipboard manipulation (wl-copy, xclip)
│ ├── TextInjector.cs # Native keyboard injection (wtype, xdotool, ydotool)
│ └── Notifications.cs # System notifications and sound playback
├── Serialization/
│ └── AppJsonSerializerContext.cs # System.Text.Json source generation context for AOT support
│ └── AppJsonSerializerContext.cs # System.Text.Json source generation for AOT
├── bin/ # Compiler output
├── docs/ # Documentation
├── toak.service # systemd user service file to run the daemon automatically
├── uninstall.sh # Script to completely remove daemon, service, and binaries
├── install.sh # Native AOT build and installation script
├── toak.service # systemd user service definition
└── Program.cs # Application entry point using System.CommandLine
```