Daemon Socket Protocol
Toak uses a lightweight, custom protocol over a Unix domain socket for inter-process communication (IPC). This lets short-lived front-end CLI commands (like toak toggle) return almost instantly, while persistent state and API operations live inside the background daemon (toak daemon).
Connection
The UNIX domain socket is typically located at:
$XDG_RUNTIME_DIR/toak.sock (falls back to /tmp/toak.sock if $XDG_RUNTIME_DIR is not set).
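The lookup rule above can be sketched in a few lines. This is only an illustration, not Toak's actual code: the helper name socket_path is hypothetical, and treating an empty $XDG_RUNTIME_DIR the same as an unset one is an assumption.

```python
import os


def socket_path(env=None):
    """Return the expected daemon socket path for an environment mapping."""
    env = os.environ if env is None else env
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    # Assumption: an empty value is handled like an unset variable.
    if runtime_dir:
        return os.path.join(runtime_dir, "toak.sock")
    # Documented fallback when $XDG_RUNTIME_DIR is not set.
    return "/tmp/toak.sock"
```

Calling socket_path() with no argument reads the real process environment; passing a dict makes the rule easy to check in isolation.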
Message Format
Clients send small byte arrays to issue commands to the server. Depending on the command, the payload is either a single command byte or 3 bytes: the command byte followed by two flags that control how the resulting text is delivered (piped to standard output or copied to the clipboard).
Command Bytes
| Command | Byte | Description |
|---|---|---|
| START | 1 | Forces the daemon to start recording. Ignored if already recording. |
| STOP | 2 | Forces the daemon to stop recording and begin processing the audio. Takes flags for response handling. |
| ABORT | 3 | Stops audio recording and discards the buffer without making API calls. |
| TOGGLE | 4 | Stops recording if currently recording; starts recording if currently inactive. Takes flags for response handling. |
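For client code, the command table could be mirrored as a small enum. The sketch below is one possible representation; the names Command and COMMANDS_WITH_FLAGS are hypothetical and not taken from the Toak source.

```python
from enum import IntEnum


class Command(IntEnum):
    """Command bytes as listed in the table above."""
    START = 1
    STOP = 2
    ABORT = 3
    TOGGLE = 4


# Commands that are followed by the two response-handling flag bytes.
COMMANDS_WITH_FLAGS = frozenset({Command.STOP, Command.TOGGLE})
```

Because IntEnum members behave like integers, bytes([Command.ABORT]) yields the single byte 0x03 without any extra conversion.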
Payload Formats
1-Byte Payloads (START, ABORT)
When the client only needs to trigger state changes without receiving processing results back, it sends a single byte.
[ Command Byte ]
Example (ABORT): [ 0x03 ]
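A minimal client sketch for the 1-byte case follows. It assumes the daemon listens on a stream-type (SOCK_STREAM) Unix socket, which the text implies but does not state; the function names encode_simple and send_simple are hypothetical.

```python
import socket

START = 0x01
ABORT = 0x03


def encode_simple(command):
    """Build the 1-byte payload used by START and ABORT."""
    if command not in (START, ABORT):
        raise ValueError("only START and ABORT use the 1-byte format")
    return bytes([command])


def send_simple(path, command):
    """Connect to the daemon socket at `path` and send one command byte."""
    # Assumption: the daemon uses a stream socket rather than a datagram one.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(encode_simple(command))
```

For example, send_simple("/tmp/toak.sock", ABORT) would write the single byte 0x03 and then close the connection, since no result is expected back.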
3-Byte Payloads (STOP, TOGGLE)
When asking the daemon to process audio, the client can specify how it wants to receive the finalized text: typed into the active window as simulated keystrokes (default), piped to standard output (--pipe), or copied to the clipboard (--copy).
The client sends exactly 3 bytes:
[ Command Byte ] [ Pipe Flag ] [ Copy Flag ]
- Byte 0: The command (0x02 or 0x04)
- Byte 1: Pipe to Stdout: 0x01 if enabled, 0x00 if disabled.
- Byte 2: Copy to Clipboard: 0x01 if enabled, 0x00 if disabled.
Example (TOGGLE with stdout piping enabled): [ 0x04, 0x01, 0x00 ]
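The 3-byte layout can be encoded as shown in this sketch. The function name encode_processing_request and its keyword arguments are hypothetical; only the byte values come from the description above.

```python
STOP = 0x02
TOGGLE = 0x04


def encode_processing_request(command, pipe=False, copy=False):
    """Build the 3-byte payload: command byte, pipe flag, copy flag."""
    if command not in (STOP, TOGGLE):
        raise ValueError("only STOP and TOGGLE use the 3-byte format")
    return bytes([
        command,
        0x01 if pipe else 0x00,  # Byte 1: pipe to stdout
        0x01 if copy else 0x00,  # Byte 2: copy to clipboard
    ])
```

With this helper, encode_processing_request(TOGGLE, pipe=True) produces the same bytes as the TOGGLE example above: 0x04, 0x01, 0x00.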
Server Responses
Depending on the flags provided in a 3-Byte Payload:
- Default (No flags set): The server will process the audio, handle LLM modifications, and inject the text into the user's active window using Wayland (wtype) or X11 (xdotool). The socket is closed by the server.
- Pipe or Copy Flag Set: The client will keep the connection open and wait to read the incoming text from the server. The server will stream UTF-8 encoded text chunks back to the client as the LLM generates them. The client reads these chunks and pushes them to stdout. Once sending is complete, the server closes the socket.
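The client-side read loop for the pipe or copy case might look like the sketch below. It assumes a stream socket and that chunk boundaries may fall in the middle of a multi-byte UTF-8 character, which is why it uses an incremental decoder; the function name read_response and the chunk_size parameter are hypothetical.

```python
import codecs


def read_response(sock, out, chunk_size=4096):
    """Copy streamed UTF-8 text from `sock` to `out` until the server closes."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:
            # An empty read means the server closed the socket: end of text.
            break
        # The decoder buffers partial characters until their remaining bytes arrive.
        out.write(decoder.decode(chunk))
    # Flush the decoder; raises UnicodeDecodeError if the stream ended mid-character.
    out.write(decoder.decode(b"", final=True))
```

A caller would typically pass sys.stdout as out after sending a 3-byte payload with the pipe flag set, then flush it once read_response returns.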