feat: Add Together AI and FFmpeg support, introduce core orchestration, and update documentation and install scripts.
@@ -9,48 +9,48 @@ The UNIX domain socket is typically located at:
|
||||
|
||||
## Message Format
|
||||
|
||||
Clients send small byte arrays to issue commands to the server. Depending on the command, the structure ranges from a single byte to a 3-byte payload containing the command ID and configuration flags for standard output handling.
|
||||
Clients send small byte arrays (1 to 3 bytes) to issue commands to the server.
|
||||
|
||||
### Command Bytes
|
||||
|
||||
| Command | Byte | Description |
|---|---|---|
| **START** | `1` | Forces the daemon to start recording. Ignored if already recording. |
| **STOP** | `2` | Forces the daemon to stop recording and begin processing the audio. Takes flags for response handling. |
| **STOP** | `2` | Forces the daemon to stop recording and begin processing. Takes flags. |
| **ABORT** | `3` | Stops audio recording and discards the buffer without making API calls. |
| **TOGGLE** | `4` | Stops recording if currently recording; starts recording if currently inactive. Takes flags for response handling. |
| **TOGGLE** | `4` | Stops recording if currently recording; starts recording if inactive. Takes flags. |
|
||||
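
The command bytes in the table can be captured as constants. The following is a minimal sketch, not the project's actual code: the names `Command`, `TAKES_FLAGS`, and `payload_length` are hypothetical, and only the byte values and the "takes flags" property come from the table.

```python
from enum import IntEnum


class Command(IntEnum):
    """Command bytes as listed in the table (names are illustrative)."""
    START = 1
    STOP = 2
    ABORT = 3
    TOGGLE = 4


# Per the table, only STOP and TOGGLE take flag bytes.
TAKES_FLAGS = frozenset({Command.STOP, Command.TOGGLE})


def payload_length(command: Command) -> int:
    """Return the payload size in bytes: 3 for flagged commands, else 1."""
    return 3 if command in TAKES_FLAGS else 1
```

Using an `IntEnum` keeps the values usable directly in `bytes([...])` while rejecting unknown bytes such as `Command(9)`.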
|
||||
## Payload Formats
|
||||
|
||||
### 1-Byte Payloads (`START`, `ABORT`)
|
||||
When the client only needs to trigger state changes without receiving processing results back, it sends a single byte.
|
||||
|
||||
Used for state changes that don't return streaming text.
|
||||
```text
[ Command Byte ]
```
|
||||
Example (`ABORT`): `[ 0x03 ]`
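
As an illustration of the 1-byte case, here is a small sketch that writes a single command byte to an already-connected socket. The function name `send_single_byte` is hypothetical, and the socket path is not shown in this section, so the caller is assumed to have connected a `socket.AF_UNIX` stream socket beforehand.

```python
import socket

START = 0x01  # command bytes from the Command Bytes table
ABORT = 0x03


def send_single_byte(sock: socket.socket, command: int) -> None:
    """Send a 1-byte payload (START or ABORT) on a connected socket."""
    if command not in (START, ABORT):
        raise ValueError("only START and ABORT use 1-byte payloads")
    # sendall retries until the whole payload has been written.
    sock.sendall(bytes([command]))
```

Since these commands do not return streaming text, the client can close the socket right after sending.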
|
||||
|
||||
### 3-Byte Payloads (`STOP`, `TOGGLE`)
|
||||
When asking the daemon to process audio, the client can specify how it wants to receive the finalized text: typed via hotkeys (default), piped to standard output (`--pipe`), or copied to the clipboard (`--copy`).
|
||||
When asking the daemon to process audio, the client can specify how it wants to receive the result. The client sends exactly 3 bytes:
|
||||
|
||||
The client sends exactly 3 bytes:
|
||||
```text
[ Command Byte ] [ Pipe Flag ] [ Copy Flag ]
```
|
||||
|
||||
- **Byte 0:** The command (`0x02` or `0x04`)
|
||||
- **Byte 1:** Pipe to Stdout: `0x01` if enabled, `0x00` if disabled.
|
||||
- **Byte 2:** Copy to Clipboard: `0x01` if enabled, `0x00` if disabled.
|
||||
- **Byte 1:** **Pipe to Stdout**: `0x01` if enabled (client waits for stream), `0x00` if disabled.
|
||||
- **Byte 2:** **Copy to Clipboard**: `0x01` if enabled, `0x00` if disabled.
|
||||
|
||||
Example (`TOGGLE` with stdout piping enabled): `[ 0x04, 0x01, 0x00 ]`
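
The byte layout above can be sketched as a small encoder. This is an illustrative helper, not the client's actual implementation; `encode_payload` and its parameter names are assumptions, while the byte positions and values follow the list above.

```python
STOP = 0x02    # command bytes that accept flags
TOGGLE = 0x04


def encode_payload(command: int, pipe: bool = False, copy: bool = False) -> bytes:
    """Build the 3-byte payload: [command, pipe flag, copy flag]."""
    if command not in (STOP, TOGGLE):
        raise ValueError("only STOP and TOGGLE use 3-byte payloads")
    # Each flag is encoded as 0x01 when enabled and 0x00 when disabled.
    return bytes([command, int(pipe), int(copy)])
```

For instance, `encode_payload(TOGGLE, pipe=True)` yields the same three bytes as the `TOGGLE` example above.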
|
||||
|
||||
## Server Responses
|
||||
|
||||
Depending on the flags provided in a 3-Byte Payload:
|
||||
Depending on the flags:
|
||||
|
||||
1. **Default (No flags set):**
|
||||
The server will process the audio, handle LLM modifications, and inject the text into the user's active window using Wayland (`wtype`) or X11 (`xdotool`). The socket is closed by the server.
|
||||
The server processes the audio, handles LLM refinement, and injects the text into the user's active window using the configured backend (`wtype`, `xdotool`, or `ydotool`). The socket is closed by the server.
|
||||
|
||||
2. **Pipe or Copy Flag Set:**
|
||||
The client will keep the connection open and wait to read the incoming text from the server.
|
||||
The server will stream UTF-8 encoded text chunks back to the client as the LLM generates them. The client reads these chunks and pushes them to `stdout`. Once sending is complete, the server closes the socket.
|
||||
2. **Pipe Flag Set:**
|
||||
The client stays connected. The server streams UTF-8 encoded text chunks (tokens) back to the client as they are generated by the LLM. The client writes these to `stdout`. The server closes the socket when finished.
|
||||
|
||||
3. **Copy Flag Set:**
|
||||
The server handles copying to the system clipboard internally via its `ClipboardManager`. If the Pipe flag is also set, it will stream to stdout simultaneously.
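
For the pipe case, the client-side read loop might look like the sketch below. It assumes the server signals completion only by closing the socket, as described above; `read_stream` is a hypothetical name. An incremental decoder is used because a multi-byte UTF-8 character may be split across two chunks.

```python
import codecs
import socket
import sys


def read_stream(sock: socket.socket, out=sys.stdout, bufsize: int = 4096) -> str:
    """Read UTF-8 chunks until the server closes the socket; echo to `out`."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # empty read means the server closed the connection
            break
        text = decoder.decode(chunk)  # buffers incomplete trailing bytes
        out.write(text)
        out.flush()
        parts.append(text)
    # Flush anything still buffered in the decoder.
    tail = decoder.decode(b"", final=True)
    if tail:
        out.write(tail)
        out.flush()
        parts.append(tail)
    return "".join(parts)
```

Flushing after every chunk lets downstream consumers of `stdout` see text as it arrives rather than only at the end.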
|
||||
|
||||