Toak: Client-Server & PipeWire Architecture Specification
This document outlines the transition of Toak from a monolithic, ephemeral CLI application to a persistent, low-latency background daemon utilizing Linux Inter-Process Communication (IPC) and PipeWire.
1. System Architecture Overview
The system is divided into two distinct binaries to separate the heavy runtime environment from the instant-trigger mechanism.
- Toak Daemon (toakd): A persistent C# background service. It holds the API connections, memory buffers, and audio routing open.
- Toak Client (toak): A lightweight, ephemeral trigger executed by the window manager that simply sends signals to the daemon.
2. The Toak Daemon (Server)
Built as a C# .NET Hosted Service, this component runs continuously in the background and manages three primary responsibilities:
A. Unix Domain Socket Listener
- Listens on a secure, user-space socket (e.g., /run/user/1000/toak.sock).
- Awaits single-byte instructions from the client (e.g., START_RECORDING, STOP_RECORDING, ABORT).
- Ensures single-instance execution and rejects unauthorized cross-user connections.
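The listener behaviour above can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than the daemon's C#; the byte values in COMMANDS and the helper names are assumptions, since the spec names the commands but not their encoding. The peer check relies on the Linux-specific SO_PEERCRED socket option.

```python
import os
import socket
import struct
from typing import Optional

# Hypothetical encoding: the spec names the commands but not their byte values.
COMMANDS = {0x01: "START_RECORDING", 0x02: "STOP_RECORDING", 0x03: "ABORT"}


def bind_listener(path: str) -> socket.socket:
    """Bind the daemon socket; bind() fails if the path already exists,
    which gives a crude single-instance guard. A stale socket file left by
    a crashed daemon also triggers this, and cleanup is not shown here."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    os.chmod(path, 0o600)  # only the owning user may connect
    server.listen(8)
    return server


def peer_uid(conn: socket.socket) -> int:
    """Return the UID of the connecting process (Linux SO_PEERCRED)."""
    fmt = "iII"  # struct ucred: pid, uid, gid
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    _pid, uid, _gid = struct.unpack(fmt, raw)
    return uid


def handle_connection(conn: socket.socket) -> Optional[str]:
    """Read one control byte; return the command name, or None if rejected."""
    if peer_uid(conn) != os.getuid():
        return None  # connection from another user
    data = conn.recv(1)
    if len(data) != 1:
        return None  # client closed without sending anything
    return COMMANDS.get(data[0])  # unknown bytes also yield None
```

Unknown bytes and empty connections are dropped rather than treated as errors, so a misbehaving client cannot push the daemon into an undefined state.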
B. PipeWire Audio Node
- Connects to the PipeWire graph as a native audio sink.
- Dynamically links to the default system microphone only upon receiving the START_RECORDING signal.
- Reads the audio stream directly into a pre-allocated C# MemoryStream via memory-mapped buffers (zero-copy), requesting the exact format required by the Groq Whisper API (e.g., 16 kHz, mono).
- Unlinks from the microphone instantly upon receiving the STOP_RECORDING signal, freeing the hardware device.
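To give a feel for the buffer sizes involved, the sketch below works out how many bytes a pre-allocated buffer needs for a given recording length and wraps raw PCM in a WAV container in memory. It is written in Python for illustration only. The 16 kHz mono figures come from the text above; the 16-bit sample width and the choice of WAV as the upload container are assumptions, not something the spec states.

```python
import io
import wave

SAMPLE_RATE_HZ = 16_000   # from the spec
CHANNELS = 1              # mono, from the spec
SAMPLE_WIDTH_BYTES = 2    # assumption: signed 16-bit PCM


def buffer_bytes(seconds: float) -> int:
    """Bytes needed to hold `seconds` of audio in the format above."""
    return int(seconds * SAMPLE_RATE_HZ) * CHANNELS * SAMPLE_WIDTH_BYTES


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM frames in a WAV header without touching the disk."""
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(CHANNELS)
        writer.setsampwidth(SAMPLE_WIDTH_BYTES)
        writer.setframerate(SAMPLE_RATE_HZ)
        writer.writeframes(pcm)
    return out.getvalue()
```

Under these assumptions one second of audio is 32,000 bytes, so a full minute of dictation fits in under 2 MB, which makes pre-allocation cheap.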
C. State & API Management
- Maintains a persistent HttpClient connection pool to Groq, so that each dictation can reuse an established connection instead of paying for a new TLS handshake.
- Triggers the Wayland (wtype) or X11 (xdotool) typing backend as a child process once the refined transcription is returned.
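Backend selection could look roughly like the sketch below, shown in Python for illustration. Choosing by the WAYLAND_DISPLAY and DISPLAY environment variables is an assumption about how the daemon detects the session type, and the function names are hypothetical. The command lines use the basic documented forms of each tool (wtype TEXT and xdotool type TEXT).

```python
import os
import subprocess
from typing import List, Mapping, Optional


def build_typing_command(text: str, env: Mapping[str, str]) -> List[str]:
    """Pick the typing backend from the session's display variables.

    Note: text that starts with "-" may be parsed as an option by either
    tool; that case is not handled in this sketch.
    """
    if env.get("WAYLAND_DISPLAY"):
        return ["wtype", text]
    if env.get("DISPLAY"):
        return ["xdotool", "type", text]
    raise RuntimeError("no Wayland or X11 display found in the environment")


def type_text(text: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Run the chosen backend as a child process and wait for it to finish."""
    env = os.environ if env is None else env
    subprocess.run(build_typing_command(text, env), check=True, env=dict(env))
```

Passing the text as a separate argument, rather than through a shell string, keeps transcribed speech from being interpreted as shell syntax.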
3. The Toak Client (Trigger)
A minimal executable designed to be fired by global window manager hotkeys (e.g., Sway, Hyprland, KDE).
- Stateless: Contains no audio logic, API keys, or large library dependencies.
- Execution: Connects to the daemon's Unix socket, writes a specific control byte, and exits immediately.
- Latency: Execution time is measured in microseconds, preventing any blocking of the desktop compositor's input thread.
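A client along these lines fits in a few dozen lines. The sketch below uses Python for illustration only; a real trigger aiming for the latency described above would more likely be a small native binary. The byte values, the --socket flag, and the XDG_RUNTIME_DIR-based default path are assumptions; only --start and --stop appear in the spec.

```python
import argparse
import os
import socket
import sys

# Hypothetical encoding: the spec does not define the control byte values.
START, STOP = b"\x01", b"\x02"


def default_socket_path() -> str:
    """Assumed location: toak.sock inside the user's runtime directory."""
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR", f"/run/user/{os.getuid()}")
    return os.path.join(runtime_dir, "toak.sock")


def send_command(socket_path: str, command: bytes) -> None:
    """Connect, write one control byte, and disconnect."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(socket_path)
        client.sendall(command)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="toak")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--start", dest="command", action="store_const", const=START)
    group.add_argument("--stop", dest="command", action="store_const", const=STOP)
    # --socket is a hypothetical override, mainly useful for testing.
    parser.add_argument("--socket", default=default_socket_path())
    args = parser.parse_args(argv)
    try:
        send_command(args.socket, args.command)
    except OSError as exc:
        print(f"toak: cannot reach daemon: {exc}", file=sys.stderr)
        return 1
    return 0
```

Calling sys.exit(main()) at the bottom of the file would turn this into a script that a hotkey binding can invoke as toak --start or toak --stop.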
4. Deployment & Lifecycle Management
The daemon is managed by the host's native init system to ensure uptime and clean restarts.
- Systemd User Service: Installed as ~/.config/systemd/user/toak.service.
- Lifecycle: Starts automatically on user login (default.target), restarts automatically on failure, and writes its logs to the systemd journal, where they can be read with journalctl.
- Environment: Inherits the active Wayland/X11 display variables necessary for the typing backends to inject keystrokes into the active window.
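A unit file matching the points above might look like the following. This is a sketch, not the project's shipped unit: the description text and the ExecStart path are assumptions, and %h is systemd's specifier for the user's home directory.

```ini
[Unit]
Description=Toak dictation daemon

[Service]
# Hypothetical install location for the daemon binary.
ExecStart=%h/.local/bin/toakd
Restart=on-failure

[Install]
WantedBy=default.target
```

With this in place, systemctl --user enable --now toak.service starts the daemon and enables it for future logins, and journalctl --user -u toak.service shows its logs. User services do not always see the compositor's display variables on their own; one common approach is to have the compositor's startup configuration run systemctl --user import-environment WAYLAND_DISPLAY DISPLAY.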
5. Execution Flow (The PTT Lifecycle)
- Init: User logs in. Systemd starts toakd. It allocates memory, opens API connections, and begins listening on the Unix socket.
- KeyDown: User holds the Push-to-Talk hotkey. Window manager executes toak --start.
- Link: toakd receives the signal over the socket and tells PipeWire to link the microphone to its internal buffer.
- Dictation: User speaks. Audio fills the C# MemoryStream.
- KeyUp: User releases the hotkey. Window manager executes toak --stop.
- Unlink & Send: toakd unlinks the microphone, flushes the memory buffer directly to the Groq API, receives the transcription, and executes the typing backend.
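The lifecycle above can be read as a small state machine. The sketch below, in Python for illustration, is one possible model: the state names and the internal DONE event are assumptions, while START_RECORDING, STOP_RECORDING, and ABORT are the commands named earlier in the spec.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()          # listening on the socket, microphone unlinked
    RECORDING = auto()     # microphone linked, buffer filling
    TRANSCRIBING = auto()  # buffer sent, waiting for the transcription


# (current state, event) -> next state. DONE is a hypothetical internal
# event raised once the typing backend has finished.
TRANSITIONS = {
    (State.IDLE, "START_RECORDING"): State.RECORDING,
    (State.RECORDING, "STOP_RECORDING"): State.TRANSCRIBING,
    (State.RECORDING, "ABORT"): State.IDLE,
    (State.TRANSCRIBING, "DONE"): State.IDLE,
}


def next_state(state: State, event: str) -> State:
    """Apply an event; events that make no sense in the current state are ignored."""
    return TRANSITIONS.get((state, event), state)
```

Ignoring out-of-order events means a repeated KeyDown, for example from keyboard auto-repeat, leaves an in-progress recording untouched.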