# Daemon Socket Protocol

Toak uses a lightweight, custom Unix Domain Socket protocol for IPC (Inter-Process Communication). This allows front-end short-lived CLI tools (like `toak toggle`) to execute instantly while the persistent state and API operations happen inside the background daemon (`toak daemon`).

## Connection

The UNIX domain socket is typically located at:
`$XDG_RUNTIME_DIR/toak.sock` (falls back to `/tmp/toak.sock` if `$XDG_RUNTIME_DIR` is not set).

## Message Format

Clients send small byte arrays (1 to 3 bytes) to issue commands to the server.

### Command Bytes

| Command | Byte | Description |
|---|---|---|
| **START** | `1` | Forces the daemon to start recording. Ignored if already recording. |
| **STOP** | `2` | Forces the daemon to stop recording and begin processing. Takes flags. |
| **ABORT** | `3` | Stops audio recording and discards the buffer without making API calls. |
| **TOGGLE** | `4` | Stops recording if currently recording; starts recording if inactive. Takes flags. |
| **STATUS** | `5` | Queries the daemon for its current recording status (idle or recording). Takes a JSON flag. |

## Payload Formats

### 1-Byte Payloads (`START`, `ABORT`)

Used for state changes that don't return streaming text.

```text
[ Command Byte ]
```

Example (`ABORT`): `[ 0x03 ]`

### 2-Byte Payloads (`STATUS`)

When asking the daemon for its status, the client specifies whether it wants a JSON response.

```text
[ Command Byte ] [ JSON Flag ]
```

Example (`STATUS` with JSON): `[ 0x05, 0x01 ]`

### 3-Byte Payloads (`STOP`, `TOGGLE`)

When asking the daemon to process audio, the client can specify how it wants to receive the result. The client sends exactly 3 bytes:

```text
[ Command Byte ] [ Pipe Flag ] [ Copy Flag ]
```

- **Byte 0:** The command (`0x02` or `0x04`)
- **Byte 1:** **Pipe to Stdout**: `0x01` if enabled (client waits for stream), `0x00` if disabled.
- **Byte 2:** **Copy to Clipboard**: `0x01` if enabled, `0x00` if disabled.

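
The payload layouts described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Toak's actual client code: the names `socket_path` and `encode_command` and the keyword arguments are hypothetical, and treating an empty `$XDG_RUNTIME_DIR` the same as an unset one is an assumption.

```python
import os

# Command bytes, taken from the Command Bytes table.
START, STOP, ABORT, TOGGLE, STATUS = 1, 2, 3, 4, 5


def socket_path() -> str:
    """Return the socket path, falling back to /tmp when XDG_RUNTIME_DIR is unset.

    Assumption: an empty XDG_RUNTIME_DIR is treated like an unset one.
    """
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR")
    return os.path.join(runtime_dir or "/tmp", "toak.sock")


def encode_command(command: int, *, pipe: bool = False,
                   copy: bool = False, json: bool = False) -> bytes:
    """Build the 1-, 2-, or 3-byte payload for a command (hypothetical helper)."""
    if command in (START, ABORT):
        return bytes([command])                        # [ Command Byte ]
    if command == STATUS:
        return bytes([command, int(json)])             # [ Command Byte ] [ JSON Flag ]
    if command in (STOP, TOGGLE):
        return bytes([command, int(pipe), int(copy)])  # [ Command ] [ Pipe ] [ Copy ]
    raise ValueError(f"unknown command byte: {command}")
```

For instance, `encode_command(STATUS, json=True)` yields the two bytes `0x05, 0x01`, matching the `STATUS` example.
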
Example (`TOGGLE` with stdout piping enabled): `[ 0x04, 0x01, 0x00 ]`

## Server Responses

Depending on the flags:

1. **Default (No flags set):** The server processes the audio, handles LLM refinement, and injects the text into the user's active window using the configured backend (`wtype`, `xdotool`, or `ydotool`). The socket is closed by the server.
2. **Pipe Flag Set:** The client stays connected. The server streams UTF-8 encoded text chunks (tokens) back to the client as they are generated by the LLM. The client writes these to `stdout`. The server closes the socket when finished.
3. **Copy Flag Set:** The server handles copying to the system clipboard internally via its `ClipboardManager`. If the Pipe flag is also set, the server streams the text to the client at the same time.
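
The pipe-flag behavior above (send the payload, then read until the server closes the socket) might look like the following from the client side. This is a minimal sketch under stated assumptions, not the real `toak` client: the function name `stream_result`, the 4096-byte read size, and the choice to replace invalid UTF-8 rather than fail are all illustrative.

```python
import codecs
import socket
import sys


def stream_result(path: str, payload: bytes, out=sys.stdout) -> None:
    """Send a command payload and copy the daemon's UTF-8 stream to `out` until EOF."""
    # Incremental decoder: a multi-byte character may be split across two recv() calls.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(payload)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read means the server closed the socket
                break
            out.write(decoder.decode(chunk))
            out.flush()  # show tokens as they arrive
        # Flush any bytes still buffered in the decoder.
        out.write(decoder.decode(b"", final=True))
```

A call such as `stream_result("/tmp/toak.sock", bytes([0x04, 0x01, 0x00]))` would send `TOGGLE` with piping enabled and print the streamed text. Decoding incrementally matters here because the chunk boundaries chosen by the server need not align with character boundaries.
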