# Toak: Client-Server & PipeWire Architecture Specification
This document outlines the transition of Toak from a monolithic, ephemeral CLI application to a persistent, low-latency background daemon utilizing Linux Inter-Process Communication (IPC) and PipeWire.
## 1. System Architecture Overview
The system is divided into two distinct binaries to separate the heavy runtime environment from the instant-trigger mechanism.
* **Toak Daemon (`toakd`):** A persistent C# background service. It holds the API connections, memory buffers, and audio routing open.
* **Toak Client (`toak`):** A lightweight, ephemeral trigger executed by the window manager that simply sends signals to the daemon.
## 2. The Toak Daemon (Server)
Built as a .NET hosted service in C# (for example, a `BackgroundService`), this component runs continuously in the background and has three primary responsibilities:
### A. Unix Domain Socket Listener
* Listens on a secure, user-space socket (e.g., `/run/user/1000/toak.sock`).
* Awaits single-byte control instructions from the client (e.g., `START_RECORDING`, `STOP_RECORDING`, `ABORT`).
* Ensures single-instance execution and rejects unauthorized cross-user connections.
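
The listener behaviour described above can be sketched as follows. This is a minimal illustration in Python for brevity, not the daemon's actual C# code. The opcode values and the function names `bind_single_instance`, `peer_uid`, and `handle_one` are assumptions made for this sketch; the spec names the commands but not their byte values. The peer check relies on the Linux-specific `SO_PEERCRED` socket option.

```python
import os
import socket
import struct
from typing import Optional

# Hypothetical one-byte opcodes; the spec does not define the actual values.
COMMANDS = {0x01: "START_RECORDING", 0x02: "STOP_RECORDING", 0x03: "ABORT"}


def bind_single_instance(path: str) -> socket.socket:
    """Bind the control socket, refusing to start if a live daemon owns it."""
    if os.path.exists(path):
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(path)
        except OSError:
            os.unlink(path)  # stale socket file left by a crashed instance
        else:
            raise RuntimeError("another instance is already listening")
        finally:
            probe.close()
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    os.chmod(path, 0o600)  # owner-only access to the socket file
    server.listen(8)
    return server


def peer_uid(conn: socket.socket) -> int:
    """Return the connecting process's UID via SO_PEERCRED (pid, uid, gid)."""
    size = struct.calcsize("iII")
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    _pid, uid, _gid = struct.unpack("iII", creds)
    return uid


def handle_one(server: socket.socket) -> Optional[str]:
    """Accept one client and decode its command; None if rejected or unknown."""
    conn, _ = server.accept()
    with conn:
        if peer_uid(conn) != os.getuid():
            return None  # reject connections from other users
        data = conn.recv(1)
        return COMMANDS.get(data[0]) if data else None
```

Probing the existing socket before binding is one way to get the single-instance guarantee: a successful connection means a daemon is already running, while a refused connection means the file is stale and can be removed.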
### B. PipeWire Audio Node
* Connects to the PipeWire graph as a native capture stream, i.e., a node that consumes audio through input ports rather than an output device.
* Dynamically links to the default system microphone *only* upon receiving the `START_RECORDING` signal.
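* Requests the exact format required by the Groq Whisper API (e.g., 16 kHz, mono) and reads the stream from PipeWire's memory-mapped buffers into a pre-allocated C# `MemoryStream`. The zero-copy path applies to the buffers shared between PipeWire and the daemon; the write into the `MemoryStream` is itself a copy.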
* Unlinks from the microphone instantly upon receiving the `STOP_RECORDING` signal, freeing the hardware device.
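
To make the pre-allocation concrete, the sketch below works out how many bytes a dictation needs at the example format and wraps raw PCM in a WAV container in memory. It is written in Python for illustration only. The 16-bit sample width, the need for a WAV container on upload, and the names `buffer_bytes` and `pcm_to_wav` are assumptions; the spec gives only the sample rate and channel count.

```python
import io
import wave

SAMPLE_RATE_HZ = 16_000  # from the spec's example format
CHANNELS = 1             # mono, from the spec's example format
SAMPLE_WIDTH_BYTES = 2   # assumption: signed 16-bit PCM


def buffer_bytes(max_seconds: float) -> int:
    """Bytes needed to hold max_seconds of raw PCM in the format above."""
    return int(max_seconds * SAMPLE_RATE_HZ) * CHANNELS * SAMPLE_WIDTH_BYTES


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM frames in a WAV container without touching the disk."""
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return out.getvalue()
```

Under these assumptions, one second of audio occupies 32,000 bytes, so a 60-second ceiling on a single dictation would need a buffer of roughly 1.9 MB.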
### C. State & API Management
* Maintains a persistent `HttpClient` connection pool to Groq, so each dictation can reuse an open connection instead of paying for a new TLS handshake.
* Triggers the Wayland (`wtype`) or X11 (`xdotool`) typing backend as a child process once the refined transcription is returned.
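
Choosing between the two typing backends could look like the sketch below, shown in Python for illustration. The detection rule (prefer Wayland when `WAYLAND_DISPLAY` is set, otherwise fall back to X11 when `DISPLAY` is set) and the function name `typing_command` are assumptions; the spec does not say how the backend is selected.

```python
import os
from typing import List, Mapping


def typing_command(text: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Build the argv for the typing backend that matches the session type."""
    if env.get("WAYLAND_DISPLAY"):
        return ["wtype", text]
    if env.get("DISPLAY"):
        # "--" ends option parsing so the text is never read as a flag.
        return ["xdotool", "type", "--", text]
    raise RuntimeError("no Wayland or X11 display found in the environment")
```

The returned list can be handed to a child-process launcher as an argument vector. Passing the text as a single argument, rather than building a shell string, keeps the transcription from being interpreted by a shell.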
## 3. The Toak Client (Trigger)
A minimal executable designed to be fired by global window manager hotkeys (e.g., Sway, Hyprland, KDE).
* **Stateless:** Contains no audio logic, API keys, or large library dependencies.
* **Execution:** Connects to the daemon's Unix socket, writes a specific control byte, and exits immediately.
* **Latency:** Execution time is measured in microseconds, preventing any blocking of the desktop compositor's input thread.
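
The client's entire job fits in a few lines. The Python sketch below shows the shape of it; a production client would more likely be a small native binary to keep startup time low. The opcode values, the `--abort` flag, and the function name `send_command` are assumptions for illustration; the spec mentions only `toak --start` and `toak --stop`.

```python
import socket

# Hypothetical opcodes; the spec does not fix the byte values.
OPCODES = {"--start": b"\x01", "--stop": b"\x02", "--abort": b"\x03"}


def send_command(socket_path: str, flag: str) -> None:
    """Connect, write one control byte, and close without awaiting a reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(OPCODES[flag])
```

Because the client never waits for a response, its run time is bounded by process startup plus one connect and one write, regardless of how long the daemon takes to transcribe.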
## 4. Deployment & Lifecycle Management
The daemon is managed by the host's native init system to ensure uptime and clean restarts.
* **Systemd User Service:** Installed as `~/.config/systemd/user/toak.service`.
* **Lifecycle:** Starts automatically on user login (`default.target`), restarts automatically on failure, and logs to the systemd journal, readable with `journalctl --user -u toak.service`.
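* **Environment:** Needs the active Wayland/X11 display variables (e.g., `WAYLAND_DISPLAY`, `DISPLAY`) so the typing backends can inject keystrokes into the active window. A systemd user service does not pick these up from the compositor automatically; they typically have to be imported into the user manager (e.g., with `systemctl --user import-environment`).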
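
A unit file matching the points above might look like the following sketch. The `Description` text and the `ExecStart` path are hypothetical placeholders; the spec fixes only the unit's location, the `default.target` hook, and restart-on-failure behaviour.

```ini
# ~/.config/systemd/user/toak.service
[Unit]
Description=Toak dictation daemon

[Service]
# Hypothetical install path; point this at wherever toakd is installed.
ExecStart=%h/.local/bin/toakd
Restart=on-failure

[Install]
WantedBy=default.target
```

With the file in place, `systemctl --user enable --now toak.service` enables the unit for future logins and starts it immediately.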
## 5. Execution Flow (The PTT Lifecycle)
1. **Init:** User logs in. Systemd starts `toakd`. It allocates memory, opens API connections, and begins listening on the Unix socket.
2. **KeyDown:** User holds the Push-to-Talk hotkey. Window manager executes `toak --start`.
3. **Link:** `toakd` receives the signal over the socket and tells PipeWire to link the microphone to its internal buffer.
4. **Dictation:** User speaks. Audio fills the C# `MemoryStream`.
5. **KeyUp:** User releases the hotkey. Window manager executes `toak --stop`.
6. **Unlink & Send:** `toakd` unlinks the microphone, flushes the memory buffer directly to the Groq API, receives the transcription, and executes the typing backend.
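
The lifecycle above amounts to a small state machine inside the daemon, which the Python sketch below models for illustration. The state names, the internal `TYPED` event, and the rule that out-of-order commands are ignored are assumptions; the spec describes the happy path only.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()          # daemon running, microphone unlinked
    RECORDING = auto()     # microphone linked, buffer filling
    TRANSCRIBING = auto()  # buffer sent, waiting on the API and typing backend


# (current state, event) -> next state; any pair not listed is ignored.
TRANSITIONS = {
    (State.IDLE, "START_RECORDING"): State.RECORDING,
    (State.RECORDING, "STOP_RECORDING"): State.TRANSCRIBING,
    (State.RECORDING, "ABORT"): State.IDLE,
    (State.TRANSCRIBING, "TYPED"): State.IDLE,  # hypothetical internal event
}


def step(state: State, event: str) -> State:
    """Return the next state, leaving the state unchanged for invalid events."""
    return TRANSITIONS.get((state, event), state)
```

Ignoring invalid events means a repeated `toak --start`, for example from key auto-repeat, leaves an in-progress recording untouched.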