Daemon Socket Protocol

Toak uses a lightweight, custom protocol over a Unix domain socket for IPC (inter-process communication). This lets short-lived front-end CLI commands (such as toak toggle) return instantly, while persistent state and API operations live inside the background daemon (toak daemon).

Connection

The Unix domain socket is typically located at $XDG_RUNTIME_DIR/toak.sock, falling back to /tmp/toak.sock if $XDG_RUNTIME_DIR is not set.
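The lookup described above can be sketched in a few lines of Python. This is only an illustration: the helper name `toak_socket_path` is hypothetical, and treating an empty `XDG_RUNTIME_DIR` the same as an unset one is an assumption, not something this document specifies.

```python
import os
from pathlib import Path


def toak_socket_path(env=None):
    """Return the expected daemon socket path for an environment mapping.

    `env` defaults to the real process environment; passing a dict makes
    the function easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    # Assumption: an empty value is treated like an unset variable.
    base = Path(runtime_dir) if runtime_dir else Path("/tmp")
    return base / "toak.sock"
```

For example, `toak_socket_path({"XDG_RUNTIME_DIR": "/run/user/1000"})` yields `/run/user/1000/toak.sock`, while an empty mapping yields `/tmp/toak.sock`.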

Message Format

Clients send small byte arrays (1 to 3 bytes) to issue commands to the server.

Command Bytes

| Command | Byte | Description |
|---------|------|-------------|
| START | 1 | Forces the daemon to start recording. Ignored if already recording. |
| STOP | 2 | Forces the daemon to stop recording and begin processing. Takes flags. |
| ABORT | 3 | Stops audio recording and discards the buffer without making API calls. |
| TOGGLE | 4 | Stops recording if currently recording; starts recording if inactive. Takes flags. |
| STATUS | 5 | Queries the daemon for its current recording status (idle or recording). Takes a JSON flag. |
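A client might mirror these command bytes as an enumeration. The sketch below is an assumption about how one could model them in Python, not Toak's own code; the names `Command` and `PAYLOAD_LENGTH` are hypothetical, and the lengths are taken from the Payload Formats section of this document.

```python
from enum import IntEnum


class Command(IntEnum):
    """Command bytes as listed in the table above."""
    START = 1
    STOP = 2
    ABORT = 3
    TOGGLE = 4
    STATUS = 5


# Total payload size in bytes for each command, including the command byte.
PAYLOAD_LENGTH = {
    Command.START: 1,
    Command.ABORT: 1,
    Command.STATUS: 2,   # command + JSON flag
    Command.STOP: 3,     # command + pipe flag + copy flag
    Command.TOGGLE: 3,   # command + pipe flag + copy flag
}
```

Because `IntEnum` members are integers, `bytes([Command.ABORT])` produces the 1-byte ABORT payload directly.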

Payload Formats

1-Byte Payloads (START, ABORT)

Used for state changes that don't return streaming text.

[ Command Byte ]

Example (ABORT): [ 0x03 ]

2-Byte Payloads (STATUS)

When asking the daemon for its status, the client specifies whether it wants a JSON response.

[ Command Byte ] [ JSON Flag ]

Example (STATUS with JSON): [ 0x05, 0x01 ]
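As a rough illustration, the STATUS payload could be built like this in Python. The function name `encode_status` is hypothetical, and sending 0x00 when JSON is not requested is an assumption that mirrors how the other flags in this protocol are encoded.

```python
STATUS = 0x05  # command byte from the Command Bytes table


def encode_status(want_json: bool) -> bytes:
    """Build the 2-byte STATUS payload: [ Command Byte ] [ JSON Flag ]."""
    # Assumption: 0x00 means "no JSON", by analogy with the other flags.
    json_flag = 0x01 if want_json else 0x00
    return bytes([STATUS, json_flag])
```

`encode_status(True)` produces the same bytes as the example above, `[ 0x05, 0x01 ]`.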

3-Byte Payloads (STOP, TOGGLE)

When asking the daemon to process audio, the client can specify how it wants to receive the result. The client sends exactly 3 bytes:

[ Command Byte ] [ Pipe Flag ] [ Copy Flag ]
  • Byte 0: The command (0x02 or 0x04)
  • Byte 1: Pipe to Stdout: 0x01 if enabled (client waits for stream), 0x00 if disabled.
  • Byte 2: Copy to Clipboard: 0x01 if enabled, 0x00 if disabled.

Example (TOGGLE with stdout piping enabled): [ 0x04, 0x01, 0x00 ]
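The 3-byte layout can be encoded and decoded with a pair of small helpers. This is a minimal sketch, not the daemon's actual parser; the function names are hypothetical, and rejecting flag values other than 0x00 and 0x01 is an assumption about how strictly a server would validate input.

```python
STOP = 0x02
TOGGLE = 0x04


def encode_process_request(command: int, pipe: bool = False, copy: bool = False) -> bytes:
    """Build [ Command Byte ] [ Pipe Flag ] [ Copy Flag ] for STOP or TOGGLE."""
    if command not in (STOP, TOGGLE):
        raise ValueError(f"command 0x{command:02x} does not take pipe/copy flags")
    return bytes([command, int(pipe), int(copy)])


def decode_process_request(payload: bytes):
    """Parse a 3-byte STOP/TOGGLE payload into (command, pipe, copy)."""
    if len(payload) != 3:
        raise ValueError(f"expected exactly 3 bytes, got {len(payload)}")
    command, pipe_flag, copy_flag = payload  # iterating bytes yields ints
    if command not in (STOP, TOGGLE):
        raise ValueError(f"unexpected command byte 0x{command:02x}")
    # Assumption: any flag value other than 0x00 or 0x01 is rejected.
    if pipe_flag not in (0, 1) or copy_flag not in (0, 1):
        raise ValueError("flags must be 0x00 or 0x01")
    return command, bool(pipe_flag), bool(copy_flag)
```

`encode_process_request(TOGGLE, pipe=True)` reproduces the example above, and decoding that result returns `(0x04, True, False)`.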

Server Responses

The server's response to STOP and TOGGLE depends on the flags:

  1. Default (No flags set): The server processes the audio, handles LLM refinement, and injects the text into the user's active window using the configured backend (wtype, xdotool, or ydotool). The socket is closed by the server.

  2. Pipe Flag Set: The client stays connected. The server streams UTF-8 encoded text chunks (tokens) back to the client as they are generated by the LLM. The client writes these to stdout. The server closes the socket when finished.

  3. Copy Flag Set: The server copies the text to the system clipboard internally via its ClipboardManager. If the Pipe flag is also set, the server streams the text to the client at the same time.
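On the client side, the Pipe case comes down to reading until the server closes the socket. The sketch below shows one way to do that in Python, under assumptions: the function names are hypothetical, and an incremental decoder is used because a multi-byte UTF-8 character may be split across two reads.

```python
import codecs
import socket
import sys


def read_stream(sock, out=sys.stdout, chunk_size=4096):
    """Copy UTF-8 text from `sock` to `out` until the peer closes the socket."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # empty read: the server closed the connection
            break
        # The decoder buffers incomplete multi-byte sequences between calls.
        out.write(decoder.decode(chunk))
        out.flush()
    # Flush any remaining state; raises if the stream ended mid-character.
    out.write(decoder.decode(b"", final=True))
    out.flush()


def toggle_and_stream(socket_path):
    """Send TOGGLE with the Pipe flag set and print the streamed text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(str(socket_path))
        sock.sendall(bytes([0x04, 0x01, 0x00]))  # TOGGLE, pipe on, copy off
        read_stream(sock)
```

Calling `toggle_and_stream("/tmp/toak.sock")` assumes a daemon is listening at that path; `read_stream` can be tried on its own with any connected stream socket.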