
feat: Implement daemon support for piping LLM output to stdout or clipboard via an extended socket protocol and update project documentation.

2026-02-28 12:42:41 +01:00
parent 75a6d20e0d
commit 27d7d11b63
5 changed files with 160 additions and 66 deletions

docs/PROTOCOL.md Normal file

@@ -0,0 +1,56 @@
# Daemon Socket Protocol
Toak uses a lightweight, custom protocol over a Unix domain socket for IPC (inter-process communication). This lets short-lived front-end CLI commands (like `toak toggle`) return almost instantly while persistent state and API operations live in the background daemon (`toak daemon`).
## Connection
The Unix domain socket is typically located at:
`$XDG_RUNTIME_DIR/toak.sock` (falls back to `/tmp/toak.sock` if `$XDG_RUNTIME_DIR` is not set).
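The lookup described above can be sketched in a few lines. This is an illustrative Python sketch, not the project's C# code; the function name `resolve_socket_path` and its `env` parameter are hypothetical, and treating an empty `$XDG_RUNTIME_DIR` like an unset one is an assumption.

```python
import os
from pathlib import Path


def resolve_socket_path(env=None) -> Path:
    """Return the socket path, preferring $XDG_RUNTIME_DIR over /tmp.

    `env` is a hypothetical parameter (defaults to os.environ) so the
    lookup can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    # Assumption of this sketch: an empty value is treated like an unset one.
    base = Path(runtime_dir) if runtime_dir else Path("/tmp")
    return base / "toak.sock"
```

With `XDG_RUNTIME_DIR=/run/user/1000` this yields `/run/user/1000/toak.sock`; with the variable unset it yields `/tmp/toak.sock`.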
## Message Format
Clients send small byte arrays to issue commands to the server. Depending on the command, the message is either a single command byte or a 3-byte payload: the command byte followed by two flags that control how the resulting text is delivered (standard output or clipboard).
### Command Bytes
| Command | Byte | Description |
|---|---|---|
| **START** | `1` | Forces the daemon to start recording. Ignored if already recording. |
| **STOP** | `2` | Forces the daemon to stop recording and begin processing the audio. Takes flags for response handling. |
| **ABORT** | `3` | Stops audio recording and discards the buffer without making API calls. |
| **TOGGLE** | `4` | Stops recording if currently recording; starts recording if currently inactive. Takes flags for response handling. |
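The effect of each command on the recording state, as described in the table, can be modeled as a small state function. This is a hedged Python sketch: the byte values come from the table, while `Command` and `next_recording_state` are hypothetical names, and treating `STOP` and `ABORT` as harmless when nothing is being recorded is an assumption.

```python
from enum import IntEnum


class Command(IntEnum):
    """Command bytes from the table above."""
    START = 1
    STOP = 2
    ABORT = 3
    TOGGLE = 4


def next_recording_state(recording: bool, command: int) -> bool:
    """Return whether the daemon is recording after `command` is applied."""
    command = Command(command)  # raises ValueError for unknown bytes
    if command is Command.START:
        return True  # ignored (stays True) if already recording
    if command in (Command.STOP, Command.ABORT):
        return False  # STOP processes the audio, ABORT discards it
    return not recording  # TOGGLE flips the current state
```

The function only tracks the recording flag; whether the audio is processed (`STOP`, `TOGGLE`) or discarded (`ABORT`) is a side effect outside this sketch.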
## Payload Formats
### 1-Byte Payloads (`START`, `ABORT`)
When the client only needs to trigger state changes without receiving processing results back, it sends a single byte.
```text
[ Command Byte ]
```
Example (`ABORT`): `[ 0x03 ]`
### 3-Byte Payloads (`STOP`, `TOGGLE`)
When asking the daemon to process audio, the client can specify how it wants to receive the finalized text: typed into the active window (default), piped to standard output (`--pipe`), or copied to the clipboard (`--copy`).
The client sends exactly 3 bytes:
```text
[ Command Byte ] [ Pipe Flag ] [ Copy Flag ]
```
- **Byte 0:** The command (`0x02` or `0x04`)
- **Byte 1:** Pipe to Stdout: `0x01` if enabled, `0x00` if disabled.
- **Byte 2:** Copy to Clipboard: `0x01` if enabled, `0x00` if disabled.
Example (`TOGGLE` with stdout piping enabled): `[ 0x04, 0x01, 0x00 ]`
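Both payload shapes can be produced and parsed by one pair of helpers. The following is a minimal Python sketch under the byte layout documented above; `encode_request` and `decode_request` are hypothetical names, and rejecting payloads of the wrong length is an assumption rather than documented daemon behavior.

```python
START, STOP, ABORT, TOGGLE = 0x01, 0x02, 0x03, 0x04
FLAGGED_COMMANDS = {STOP, TOGGLE}  # commands that carry pipe/copy flags


def encode_request(command: int, pipe: bool = False, copy: bool = False) -> bytes:
    """Build the 1-byte or 3-byte payload for `command`."""
    if command in FLAGGED_COMMANDS:
        return bytes([command, int(pipe), int(copy)])
    if command in (START, ABORT):
        return bytes([command])
    raise ValueError(f"unknown command byte: {command!r}")


def decode_request(payload: bytes) -> tuple[int, bool, bool]:
    """Parse a payload into (command, pipe, copy)."""
    if not payload:
        raise ValueError("empty payload")
    command = payload[0]
    if command in FLAGGED_COMMANDS:
        if len(payload) != 3:
            raise ValueError("STOP/TOGGLE require exactly 3 bytes")
        return command, payload[1] == 0x01, payload[2] == 0x01
    if command in (START, ABORT):
        if len(payload) != 1:
            raise ValueError("START/ABORT require exactly 1 byte")
        return command, False, False
    raise ValueError(f"unknown command byte: {command!r}")
```

For instance, `encode_request(TOGGLE, pipe=True)` produces the bytes `04 01 00`, matching the example above.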
## Server Responses
Depending on the flags provided in a 3-Byte Payload:
1. **Default (No flags set):**
The server will process the audio, handle LLM modifications, and inject the text into the user's active window using Wayland (`wtype`) or X11 (`xdotool`). The socket is closed by the server.
2. **Pipe or Copy Flag Set:**
The client keeps the connection open and waits for text from the server.
The server streams UTF-8 encoded text chunks back to the client as the LLM generates them. The client reads these chunks and writes them to `stdout`. Once sending is complete, the server closes the socket.
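On the client side, reading the streamed response amounts to receiving until the server closes the socket. The Python sketch below is illustrative only; `read_response` is a hypothetical name. It uses an incremental decoder because a multi-byte UTF-8 character may be split across two chunks, which is an assumption about how chunk boundaries can fall, not something the protocol text specifies.

```python
import codecs
import socket
import sys


def read_response(conn: socket.socket, out=None, chunk_size: int = 4096) -> str:
    """Read UTF-8 chunks until EOF, echo them to `out`, return the full text."""
    out = sys.stdout if out is None else out
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    while True:
        chunk = conn.recv(chunk_size)
        if not chunk:  # empty read: the server closed the socket
            break
        text = decoder.decode(chunk)  # buffers incomplete characters
        if text:
            parts.append(text)
            out.write(text)
            out.flush()
    # Raises UnicodeDecodeError if the stream ended mid-character.
    decoder.decode(b"", final=True)
    return "".join(parts)
```

Flushing after every chunk keeps the output responsive when the text arrives gradually.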

docs/STRUCTURE.md Normal file

@@ -0,0 +1,54 @@
# Toak Project Structure
This document outlines the high-level architecture and directory structure of the Toak project to help contributors navigate the codebase.
## Overview
Toak is a fast, Linux-native dictation application built with C# and Native AOT (ahead-of-time) compilation for minimal startup latency. It uses a client-daemon architecture: a daemon process manages background application state, while short-lived CLI commands send it control messages over a Unix domain socket.
## Directory Structure
```text
Toak/
├── Api/
│ ├── GroqApiClient.cs # Client for external transcription and LLM API calls (Groq/Whisper)
│ └── Models/ # API payload representations
├── Assets/ # Sound files or other static resources
├── Audio/
│ └── AudioRecorder.cs # Handles audio capture via system utilities (e.g., ffmpeg/arecord)
├── Commands/
│ ├── ToggleCommand.cs # Start/stop recording and pass pipe/copy flags
│ ├── DiscardCommand.cs # Abort the current recording
│ ├── OnboardCommand.cs # Initial interactive configuration setup
│ ├── ConfigUpdaterCommand.cs # Direct configuration modifications
│ ├── ShowCommand.cs # Display current configuration
│ └── LatencyTestCommand.cs # Benchmark tool for API calls
├── Configuration/
│ ├── ConfigManager.cs # Loads and saves JSON configuration from the user's home folder
│ └── ToakConfig.cs # Data model for user preferences
├── Core/
│ ├── DaemonService.cs # The background daemon maintaining the socket server and handling states
│ ├── Logger.cs # Logging utility (verbose logging)
│ ├── PromptBuilder.cs # Constructs the system prompts for the LLM based on user settings
│ ├── StateTracker.cs # Tracks the current application state (e.g. is recording active?)
│ └── Skills/ # Modular capabilities (e.g., Terminal mode, Language Translation)
├── IO/
│ ├── ClipboardManager.cs # Cross-session (Wayland/X11) clipboard manipulation (`wl-copy`, `xclip`)
│ ├── TextInjector.cs # Native keyboard injection handling (`wtype`, `xdotool`)
│ └── Notifications.cs # System notifications (`notify-send`) and sound playback (`paplay`)
├── Serialization/
│ └── AppJsonSerializerContext.cs # System.Text.Json source generation context for AOT support
├── docs/ # Documentation
└── Program.cs # Application entry point using System.CommandLine
```
## Key Architectural Concepts
### The Daemon Process
The `DaemonService` (`toak daemon`) is the heart of Toak. It listens on a Unix domain socket for IPC messages. This allows `toak toggle` to return almost instantly, delegating all heavy lifting and state management to an already-running background process.
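The division of labor between the long-lived listener and the short-lived client can be illustrated with a toy example. This Python sketch is not the `DaemonService` implementation: `serve_once`, `send_command`, the `on_command` callback, and the `ready` event are all hypothetical, and the listener handles a single connection and reads only the command byte.

```python
import os
import socket
import threading


def serve_once(path: str, on_command, ready: threading.Event) -> None:
    """Bind a Unix socket at `path`, accept one client, dispatch its command byte."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen(1)
        ready.set()  # signal that clients may now connect
        conn, _ = server.accept()
        with conn:
            data = conn.recv(1)
            if data:
                on_command(data[0])
    os.unlink(path)  # remove the socket file once the listener is done


def send_command(path: str, command: int) -> None:
    """Short-lived client: connect, send one command byte, disconnect."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(bytes([command]))
```

The client does nothing beyond connecting and writing a byte, which is why it can exit quickly; all state stays with the listening process.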
### Unix Sockets IPC
Client commands communicate with the daemon via Unix sockets. For details on the byte payloads used for communication, please refer to [PROTOCOL.md](./PROTOCOL.md).
### AOT Compilation
The project relies on Native AOT compilation (`dotnet publish -c Release -r linux-x64 -p:PublishAot=true`) to avoid JIT startup time on each CLI invocation, making `toak toggle` fast enough to bind seamlessly to hotkeys.