Toak: Client-Server & PipeWire Architecture Specification

This document outlines the transition of Toak from a monolithic, ephemeral CLI application to a persistent, low-latency background daemon utilizing Linux Inter-Process Communication (IPC) and PipeWire.

1. System Architecture Overview

The system is divided into two distinct binaries to separate the heavy runtime environment from the instant-trigger mechanism.

  • Toak Daemon (toakd): A persistent C# background service. It holds the API connections, memory buffers, and audio routing open.
  • Toak Client (toak): A lightweight, ephemeral trigger executed by the window manager that simply sends signals to the daemon.

2. The Toak Daemon (Server)

Built as a C# .NET Hosted Service, this component runs continuously in the background and manages three primary responsibilities:

A. Unix Domain Socket Listener

  • Listens on a secure, user-space socket (e.g., /run/user/1000/toak.sock).
  • Awaits single-byte commands from the client (e.g., START_RECORDING, STOP_RECORDING, ABORT).
  • Ensures single-instance execution and rejects unauthorized cross-user connections.
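
The listener described above could look roughly like the following. This is a minimal sketch, not the project's implementation: the ToakCommand byte values, the ControlSocket class, and the toak.sock file name under XDG_RUNTIME_DIR are assumptions made for illustration. It leans on the fact that the per-user runtime directory is normally mode 0700, which is what keeps other users from connecting; single-instance locking is not shown.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical one-byte wire protocol; the numeric values are assumptions.
public enum ToakCommand : byte
{
    StartRecording = 1,
    StopRecording = 2,
    Abort = 3,
}

public static class ControlSocket
{
    // Build the socket path inside the per-user runtime directory
    // (typically /run/user/<uid>, taken from XDG_RUNTIME_DIR).
    public static string ResolvePath(string? runtimeDir)
    {
        if (string.IsNullOrEmpty(runtimeDir))
            throw new InvalidOperationException("XDG_RUNTIME_DIR is not set");
        return Path.Combine(runtimeDir, "toak.sock");
    }

    // Map a received byte to a command, or null if the byte is unknown.
    public static ToakCommand? Parse(byte b) =>
        Enum.IsDefined(typeof(ToakCommand), b) ? (ToakCommand)b : (ToakCommand?)null;

    // Bind the socket, then read one command byte per client connection.
    // Runs until the process exits; unknown bytes are ignored.
    public static void Listen(string path, Action<ToakCommand> handle)
    {
        // Removes a stale socket file left by a crashed instance.
        // A real daemon would first check that no live instance owns it.
        File.Delete(path);

        using var listener = new Socket(
            AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
        listener.Bind(new UnixDomainSocketEndPoint(path));
        listener.Listen(8);

        var buffer = new byte[1];
        while (true)
        {
            using Socket client = listener.Accept();
            if (client.Receive(buffer) == 1 && Parse(buffer[0]) is ToakCommand cmd)
                handle(cmd);
        }
    }
}
```

A hosted service would typically run Listen on a background task and pass a handler that forwards each command to the audio and API components.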

B. PipeWire Audio Node

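  • Connects to the PipeWire graph as a native capture stream, i.e., a node with input ports that consumes audio (not a sink device that other applications play into).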
  • Dynamically links to the default system microphone only upon receiving the START_RECORDING signal.
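  • Reads the audio stream from PipeWire's memory-mapped buffers into a pre-allocated C# MemoryStream, requesting the exact format required by the Groq Whisper API (e.g., 16 kHz, mono). The transfer from PipeWire to the daemon is zero-copy; the append into the MemoryStream is the only copy, and pre-allocation keeps it free of reallocations.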
  • Unlinks from the microphone instantly upon receiving the STOP_RECORDING signal, freeing the hardware device.
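
To make the pre-allocation concrete, the buffer size follows directly from the capture format. The sketch below uses the 16 kHz mono figures from the specification; the 16-bit sample width, the CaptureFormat class, and the maximum dictation length are assumptions chosen for illustration.

```csharp
using System;
using System.IO;

public static class CaptureFormat
{
    public const int SampleRateHz = 16_000; // from the specification
    public const int Channels = 1;          // mono, from the specification
    public const int BytesPerSample = 2;    // assumption: signed 16-bit PCM

    // 16,000 samples/s * 1 channel * 2 bytes = 32,000 bytes per second.
    public static int BytesPerSecond => SampleRateHz * Channels * BytesPerSample;

    // Bytes needed to hold a dictation of at most maxSeconds seconds.
    public static int BufferBytes(int maxSeconds) =>
        checked(BytesPerSecond * maxSeconds);

    // Allocate once at daemon start. Calling SetLength(0) between dictations
    // empties the stream while keeping the capacity it already has.
    public static MemoryStream Allocate(int maxSeconds) =>
        new MemoryStream(BufferBytes(maxSeconds));
}
```

Under these assumptions a 60-second cap needs 1,920,000 bytes, a little under 2 MiB, so holding the buffer for the daemon's whole lifetime is cheap.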

C. State & API Management

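  • Maintains a persistent HttpClient connection pool to Groq, so a dictation that reuses a pooled connection skips the TCP and TLS handshakes. Connections that sit idle past the pool's idle timeout are closed and re-established on the next request.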
  • Triggers the Wayland (wtype) or X11 (xdotool) typing backend as a child process once the refined transcription is returned.
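
A sketch of these two responsibilities, under stated assumptions: the Backends class and the timeout values are illustrative, and choosing the backend by testing WAYLAND_DISPLAY is one plausible selection rule rather than something the specification prescribes. No Groq endpoint is configured here because the specification does not give one.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;

public static class Backends
{
    // One shared client for the daemon's lifetime, so pooled connections
    // can be reused across dictations. Timeout values are assumptions.
    public static HttpClient CreateGroqClient() =>
        new HttpClient(new SocketsHttpHandler
        {
            PooledConnectionIdleTimeout = TimeSpan.FromMinutes(10),
            PooledConnectionLifetime = TimeSpan.FromMinutes(30),
        });

    // Build the child-process invocation for the typing backend.
    // waylandDisplay is the value of WAYLAND_DISPLAY (null or empty on X11).
    public static ProcessStartInfo BuildTypingCommand(string text, string? waylandDisplay)
    {
        var psi = new ProcessStartInfo { UseShellExecute = false };
        if (!string.IsNullOrEmpty(waylandDisplay))
        {
            // wtype takes the text to type as a positional argument.
            // Limitation of this sketch: text starting with '-' is not handled.
            psi.FileName = "wtype";
        }
        else
        {
            psi.FileName = "xdotool";
            psi.ArgumentList.Add("type");
            psi.ArgumentList.Add("--"); // end of options for xdotool
        }
        // ArgumentList passes the text as one argument with no shell quoting.
        psi.ArgumentList.Add(text);
        return psi;
    }
}
```

The daemon would then call Process.Start on the returned ProcessStartInfo and wait for the child to exit. Passing the text through ArgumentList rather than a command string means the transcription is never interpreted by a shell.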

3. The Toak Client (Trigger)

A minimal executable designed to be fired by global window manager hotkeys (e.g., Sway, Hyprland, KDE).

  • Stateless: Contains no audio logic, API keys, or large library dependencies.
  • Execution: Connects to the daemon's Unix socket, writes a specific control byte, and exits immediately.
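  • Latency: The client performs only a connect, a one-byte write, and an exit, so the socket round trip takes microseconds and total run time is dominated by process startup. The hotkey handler therefore returns almost immediately and does not stall the desktop compositor.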
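
The whole client can fit in a few dozen lines. The sketch below is written in C# to match the daemon, which is an assumption: the specification does not name the client's language. The ToakClient class, the --abort flag, and the control-byte values are also hypothetical, and the bytes must match whatever the daemon expects.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

public static class ToakClient
{
    // Map a command-line flag to its control byte, or null if unknown.
    public static byte? ControlByteFor(string flag) => flag switch
    {
        "--start" => (byte?)1,
        "--stop" => (byte?)2,
        "--abort" => (byte?)3, // hypothetical flag
        _ => (byte?)null,
    };

    // Connect, write one byte, and return the number of bytes sent.
    public static int Send(string socketPath, byte control)
    {
        using var socket = new Socket(
            AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
        socket.Connect(new UnixDomainSocketEndPoint(socketPath));
        return socket.Send(new[] { control });
    }

    // Entry-point logic: 0 on success, 1 if the daemon is unreachable,
    // 2 on a usage error.
    public static int Run(string[] args)
    {
        byte? control = args.Length == 1 ? ControlByteFor(args[0]) : null;
        if (control is null)
        {
            Console.Error.WriteLine("usage: toak --start | --stop | --abort");
            return 2;
        }

        string? runtimeDir = Environment.GetEnvironmentVariable("XDG_RUNTIME_DIR");
        if (string.IsNullOrEmpty(runtimeDir)) return 1;

        try
        {
            Send(Path.Combine(runtimeDir, "toak.sock"), control.Value);
            return 0;
        }
        catch (SocketException)
        {
            return 1; // daemon not running or socket missing
        }
    }
}
```

A Main method would simply return ToakClient.Run(args). If the client is built on .NET, runtime startup is likely to dominate its run time, so ahead-of-time compilation (or a small native program that does the same three steps) would be worth measuring.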

4. Deployment & Lifecycle Management

The daemon is managed by the host's native init system to ensure uptime and clean restarts.

  • Systemd User Service: Installed as ~/.config/systemd/user/toak.service.
  • Lifecycle: Starts automatically on user login (default.target) and restarts automatically on failure. Output written to stdout/stderr is captured by the systemd journal and can be read with journalctl --user -u toak.service.
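  • Environment: Requires the active Wayland/X11 display variables (WAYLAND_DISPLAY, DISPLAY) so the typing backends can inject keystrokes into the active window. A systemd user service does not inherit these from the graphical session automatically; the session must export them to the user manager before the daemon starts (e.g., systemctl --user import-environment WAYLAND_DISPLAY DISPLAY).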
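
A unit file matching the bullets above might look like this. It is a sketch: the ExecStart path and the Description text are assumptions, and only the unit name, the restart behavior, and the default.target hook come from the specification.

```ini
# ~/.config/systemd/user/toak.service
[Unit]
Description=Toak dictation daemon

[Service]
# Hypothetical install location; %h expands to the user's home directory.
ExecStart=%h/.local/bin/toakd
Restart=on-failure

[Install]
WantedBy=default.target
```

It would be enabled with systemctl --user enable --now toak.service. If the display variables turn out not to be available at login, tying the unit to graphical-session.target instead of default.target is a possible alternative, provided the compositor activates that target.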

5. Execution Flow (The PTT Lifecycle)

  1. Init: User logs in. Systemd starts toakd. It allocates memory, opens API connections, and begins listening on the Unix socket.
  2. KeyDown: User holds the Push-to-Talk hotkey. Window manager executes toak --start.
  3. Link: toakd receives the signal over the socket and tells PipeWire to link the microphone to its internal buffer.
  4. Dictation: User speaks. Audio fills the C# MemoryStream.
  5. KeyUp: User releases the hotkey. Window manager executes toak --stop.
  6. Unlink & Send: toakd unlinks the microphone, flushes the memory buffer directly to the Groq API, receives the transcription, and executes the typing backend.
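
The six steps reduce to a small state machine inside the daemon. The following sketch shows one way to model it; the PttState and PttSession names are hypothetical, and the PipeWire link/unlink calls and the Groq upload are left out so that only the transitions and buffer handling remain.

```csharp
using System;
using System.IO;

public enum PttState { Idle, Recording, Transcribing }

public sealed class PttSession
{
    private readonly MemoryStream _audio = new MemoryStream();

    public PttState State { get; private set; } = PttState.Idle;

    // Step 3 (Link): begin a dictation. Returns false if one is already in
    // progress, which makes a repeated start signal harmless.
    public bool Start()
    {
        if (State != PttState.Idle) return false;
        _audio.SetLength(0); // empty the buffer, keep its capacity
        State = PttState.Recording;
        return true;
    }

    // Step 4 (Dictation): append captured PCM while recording.
    public void Append(ReadOnlySpan<byte> pcm)
    {
        if (State == PttState.Recording) _audio.Write(pcm);
    }

    // Step 6 (Unlink & Send): stop and hand back the audio for upload.
    // Returns null if no recording was in progress.
    public byte[]? Stop()
    {
        if (State != PttState.Recording) return null;
        State = PttState.Transcribing;
        return _audio.ToArray();
    }

    // Called after the transcription has been typed into the active window.
    public void Complete()
    {
        if (State == PttState.Transcribing) State = PttState.Idle;
    }

    // ABORT: discard any audio and return to Idle from any state.
    public void Abort()
    {
        _audio.SetLength(0);
        State = PttState.Idle;
    }
}
```

Rejecting a start signal while one is active guards against the hotkey firing more than once during a hold, and ignoring a stop signal with no active recording covers a stray key-up.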