feat: Introduce explicit start, stop, and status commands, including minimum recording duration.
This commit is contained in:
288
IDEAS.md
288
IDEAS.md
@@ -4,282 +4,12 @@ A curated list of CLI-native features to enhance the dictation workflow.
|
||||
|
||||
---
|
||||
|
||||
## Core Workflow Additions
|
||||
|
||||
### `toak history [-n N]`
|
||||
Display recent transcriptions with timestamps. Use `-n 1` to replay the last result.
|
||||
|
||||
**Use case:**
|
||||
- `toak history` - Show last 10 transcriptions
|
||||
- `toak history -n 5` - Show last 5
|
||||
- `toak history -n 1` - Show most recent (equivalent to a "last" command)
|
||||
|
||||
**Storage:** Append to `~/.local/share/toak/history.jsonl` on each successful transcription:
|
||||
```json
|
||||
{"timestamp":"2025-01-15T09:23:00Z","raw":"hello world","refined":"Hello world."}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## Configuration Profiles
|
||||
|
||||
### `toak profile <name>` / `toak profile`
|
||||
Switch between prompt presets instantly.
|
||||
|
||||
**Built-in profiles:**
|
||||
- `default` - Current behavior
|
||||
- `code` - Technical mode: preserves indentation, brackets, camelCase
|
||||
- `email` - Professional mode with formal tone
|
||||
- `notes` - Concise mode, bullet points enabled
|
||||
- `social` - Casual mode, emoji allowed
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
toak profile code # Switch to code preset
|
||||
toak profile # Show current profile
|
||||
toak profiles # List available profiles
|
||||
```
|
||||
|
||||
**Storage:** `~/.config/toak/profiles/<name>.json` - Each file is a complete `ToakConfig` override.
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## History Management
|
||||
|
||||
### `toak stats`
|
||||
Display usage statistics and analytics.
|
||||
|
||||
```bash
|
||||
$ toak stats
|
||||
Total recordings: 342
|
||||
Total duration: 4h 23m
|
||||
Average length: 45s
|
||||
Most active day: 2025-01-10 (23 recordings)
|
||||
Top words: "implementation", "refactor", "meeting"
|
||||
```
|
||||
|
||||
**Metrics tracked:**
|
||||
- Total recordings count
|
||||
- Total/average/min/max duration
|
||||
- Daily/weekly activity
|
||||
- Most common words (from refined text)
|
||||
- API usage estimates
|
||||
|
||||
---
|
||||
|
||||
### `toak history --export <file>`
|
||||
Export transcription history to various formats.
|
||||
|
||||
```bash
|
||||
toak history --export notes.md # Markdown format
|
||||
toak history --export log.txt # Plain text
|
||||
toak history --export data.json # Full JSON dump
|
||||
```
|
||||
|
||||
**Markdown format example:**
|
||||
```markdown
|
||||
# Toak Transcriptions - 2025-01-15
|
||||
|
||||
## 09:23:00
|
||||
We need to fix the API endpoint.
|
||||
|
||||
## 09:45:12
|
||||
- Review the pull request
|
||||
- Update documentation
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### `toak history --grep <pattern>`
|
||||
Search through transcription history.
|
||||
|
||||
```bash
|
||||
toak history --grep "API" # Find all mentions of API
|
||||
toak history --grep "TODO" -n 5 # Last 5 occurrences of "TODO"
|
||||
toak history --grep "refactor" --raw # Search raw transcripts instead
|
||||
```
|
||||
|
||||
**Output format:**
|
||||
```
|
||||
2025-01-15 09:23:00 We need to fix the API endpoint.
|
||||
2025-01-15 14:12:33 The API response time is too slow.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### `toak history --shred`
|
||||
Securely delete transcription history.
|
||||
|
||||
```bash
|
||||
toak history --shred # Delete entire history file
|
||||
toak history --shred -n 5 # Delete last 5 entries only
|
||||
toak history --shred --raw # Also delete archived raw audio files
|
||||
```
|
||||
|
||||
**Security:** Overwrites data before deletion (optional), removes from disk.
|
||||
|
||||
---
|
||||
|
||||
## Advanced Architecture
|
||||
|
||||
### `toak daemon` / `toak stop-daemon`
|
||||
Background service mode for reduced latency. The CLI interface stays identical, but work is offloaded to a persistent process.
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
┌─────────────┐ Unix Socket ┌─────────────────────────────┐
|
||||
│ toak CLI │ ───────────────────► │ toakd │
|
||||
│ (client) │ │ (background daemon) │
|
||||
│ Exits │ ◄──── Ack + Exit ──── │ - Long-running process │
|
||||
│ Instantly │ │ - Hot HttpClient pool │
|
||||
└─────────────┘ │ - Config cached in memory │
|
||||
│ - Manages ffmpeg lifecycle │
|
||||
└─────────────────────────────┘
|
||||
```
|
||||
|
||||
**CLI stays the same:**
|
||||
```bash
|
||||
toak toggle # Client sends "start" to daemon, exits (~10ms)
|
||||
# ... recording happens ...
|
||||
toak toggle # Client sends "stop" to daemon, exits (~10ms)
|
||||
# Daemon continues: upload → transcribe → refine → type
|
||||
```
|
||||
|
||||
**Why it's faster (without AOT):**
|
||||
|
||||
| Operation | Current | Daemon | Savings |
|
||||
|-----------|---------|--------|---------|
|
||||
| JIT compilation | 150ms | 0ms | 150ms |
|
||||
| Assembly loading | 50ms | 0ms | 50ms |
|
||||
| DNS lookup | 40ms | 0ms | 40ms |
|
||||
| TLS handshake | 80ms | 0ms | 80ms |
|
||||
| Config read | 10ms | 0ms | 10ms |
|
||||
| **Total** | **~330ms** | **~10ms** | **~320ms** |
|
||||
|
||||
**Why it's still faster (with AOT):**
|
||||
|
||||
AOT eliminates JIT/assembly overhead, but not everything:
|
||||
|
||||
| Operation | AOT Binary | AOT Daemon | Savings |
|
||||
|-----------|------------|------------|---------|
|
||||
| Process startup | 20ms | 0ms | 20ms |
|
||||
| DNS lookup | 40ms | 0ms | 40ms |
|
||||
| TLS handshake | 80ms | 0ms | 80ms |
|
||||
| Config read | 5ms | 0ms | 5ms |
|
||||
| **Total** | **~145ms** | **~10ms** | **~135ms** |
|
||||
|
||||
**Verdict with AOT:**
|
||||
- Without daemon: Each toggle takes ~145ms before network call starts
|
||||
- With daemon: Each toggle takes ~10ms (just socket IPC)
|
||||
- The daemon still saves ~135ms, but it's less critical than without AOT
|
||||
|
||||
**Trade-offs:**
|
||||
- **Pro:** Faster hotkey response, persistent connections, shared state
|
||||
- **Con:** Added complexity (process management, crash recovery, socket IPC)
|
||||
- **Con:** Debugging harder when logic lives in daemon
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
toak daemon # Start background service
|
||||
toak stop-daemon # Shutdown background service
|
||||
toak status # Check if daemon is running
|
||||
```
|
||||
|
||||
**Implementation notes:**
|
||||
- Socket path: `/tmp/toakd.sock` or `$XDG_RUNTIME_DIR/toakd.sock`
|
||||
- Protocol: Simple line-based or JSON messages
|
||||
- Daemon writes PID to `/tmp/toakd.pid` for status checks
|
||||
- Client binary checks for daemon on startup; can auto-start or error
|
||||
|
||||
---
|
||||
|
||||
## Future Innovations
|
||||
|
||||
### Hotword Commands (LLM Routing)
|
||||
Instruct the LLM in `PromptBuilder` to output a specific JSON structure if given a command phrase. If a specific hotword like "System command" or "Computer dictate" is detected at the start of the audio, Toak parses the JSON, skips typng out via `xdotool`/`wtype`, and instead executes a pre-defined background action.
|
||||
|
||||
If it doesn't hear a command phrase, it simply returns the text normally and types it.
|
||||
|
||||
**How it works (Under the Hood):**
|
||||
The LLM is prompted to always return JSON in the background when a command is directed at the assistant.
|
||||
```json
|
||||
{
|
||||
"is_command": true,
|
||||
"action": "append_to_notes",
|
||||
"content": "Buy milk and eggs",
|
||||
"meta": {}
|
||||
}
|
||||
```
|
||||
|
||||
**Alternative Hotword Ideas:**
|
||||
Since "Toak" is not a real English word, Whisper might transcribe it as "talk", "toke", or "oak." It is highly recommended to use distinct, phonetically clear hotwords such as:
|
||||
- **"System..."** (e.g. "System note:")
|
||||
- **"Computer..."** (e.g. "Computer search:")
|
||||
- **"Action..."** (e.g. "Action commit:")
|
||||
- **"Dictate..."** (e.g. "Dictate terminal:")
|
||||
- **"Listen up..."** (e.g. "Listen up translate...")
|
||||
|
||||
**Prompt Ideas & Use Cases:**
|
||||
|
||||
1. **Quick Notes / Brainstorming:**
|
||||
- *Hotword:* `"System note:"` or `"Drop this in my notes:"`
|
||||
- *Action:* Appends the spoken text to a configured `~/notes.md` file in the background without interrupting your current window.
|
||||
- *Example:* "System note: I need to remember to check the database migrations later today."
|
||||
|
||||
2. **Terminal / CLI Execution:**
|
||||
- *Hotword:* `"Computer terminal:"` or `"Command:"`
|
||||
- *Action:* Takes the natural language command, asks the LLM to translate it into a bash command, and types it into a new tmux window or background process.
|
||||
- *Example:* "Computer terminal: find all python files modified in the last 2 days."
|
||||
|
||||
3. **Git Commit Messages:**
|
||||
- *Hotword:* `"Action commit:"`
|
||||
- *Action:* Automatically formats the dictated text into a standard conventional commit message, stages all files, and commits them.
|
||||
- *Example:* "Action commit: I refactored the audio recorder to use native processes instead of the old library." -> LLM outputs `refactor(audio): migrate to native processes` and runs `git commit -am "..."`.
|
||||
|
||||
4. **Web Search / Lookup:**
|
||||
- *Hotword:* `"System search:"` or `"Look up:"`
|
||||
- *Action:* Opens your default browser and performs a search for the spoken phrase.
|
||||
- *Example:* "System search: MDN documentation for grid layout."
|
||||
|
||||
5. **Translating on the fly:**
|
||||
- *Hotword:* `"Translate to Spanish:"`
|
||||
- *Action:* Instead of typing English, it types the translated version of the rest of the sentence.
|
||||
- *Example:* "Translate to Spanish: Hello, how are you today?" -> Types out `Hola, ¿cómo estás hoy?`.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Priority
|
||||
|
||||
### Tier 1: High Impact, Low Effort
|
||||
*(All Tier 1 items have been implemented!)*
|
||||
|
||||
### Tier 2: Medium Effort (Requires History Storage)
|
||||
4. `toak history` with `--export`, `--grep`, `--shred` flags
|
||||
5. `toak stats` - Analytics aggregation
|
||||
6. `toak copy` - Clipboard integration
|
||||
|
||||
### Tier 3: Higher Complexity
|
||||
7. `toak profile` - Config presets
|
||||
8. `toak daemon` - Background service architecture
|
||||
|
||||
---
|
||||
|
||||
## Technical Notes
|
||||
|
||||
**History Storage:**
|
||||
- Use JSON Lines format (`.jsonl`) for append-only log
|
||||
- Rotate at 5000 entries or 30 days
|
||||
- Store both raw and refined text for debugging
|
||||
|
||||
|
||||
**Pipe Detection in C#:**
|
||||
```csharp
|
||||
if (Console.IsOutputRedirected || args.Contains("--pipe"))
|
||||
{
|
||||
Console.WriteLine(refinedText);
|
||||
}
|
||||
```
|
||||
## 2. Bring Your Own Script (BYOS) Window Context
|
||||
Allow the LLM's system prompt to dynamically adapt based on the currently focused application (e.g., formatting as code for `alacritty`, or conversational for `discord`).
|
||||
* **Implementation:** Add a `ContextCommand` string to `ToakConfig.cs` (e.g., `~/.config/toak/get-window.sh`). Toak executes this script via `Process.Start` and appends the `stdout` result to the LLM prompt.
|
||||
* **Rationale:** Avoids bloating the tool with integrations for dozens of display managers and window managers. Users write their own one-liners (`hyprctl activewindow`, `xprop`, etc.), keeping Toak lean.
|
||||
|
||||
## 3. In-Place Editing via Primary Selection
|
||||
Transform Toak from a pure dictation tool into an inline text-editing copilot by reading the Linux "Primary" clipboard (currently highlighted text).
|
||||
* **Implementation:** Before sending the prompt to the LLM, attempt to read the primary selection via `wl-paste -p` (Wayland) or `xclip -o -selection primary` (X11). If text exists, pass it to the LLM as context alongside the dictated audio.
|
||||
* **Rationale:** Allows users to highlight a paragraph, hit the dictation key, and say "Translate this to Spanish" or "Make this sound more professional," replacing the highlighted text via the typing backend.
|
||||
|
||||
Reference in New Issue
Block a user