Feature Ideas for Toak
A curated list of CLI-native features to enhance the dictation workflow.
Core Workflow Additions
toak history [-n N]
Display recent transcriptions with timestamps. Use -n 1 to replay the last result.
Use case:
- toak history - Show last 10 transcriptions
- toak history -n 5 - Show last 5
- toak history -n 1 - Show most recent (equivalent to a "last" command)
Storage: Append to ~/.local/share/toak/history.jsonl on each successful transcription:
{"timestamp":"2025-01-15T09:23:00Z","raw":"hello world","refined":"Hello world."}
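The append step could look roughly like the sketch below. HistoryEntry and HistoryLog are hypothetical names chosen for illustration, not types from Toak's code base; the property names simply mirror the JSON keys in the example above.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical record mirroring the keys of the example line above.
public record HistoryEntry(DateTime Timestamp, string Raw, string Refined);

public static class HistoryLog
{
    // camelCase policy turns Timestamp/Raw/Refined into timestamp/raw/refined.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    // Appends one entry as a single JSON line, creating the directory if needed.
    public static void Append(string path, HistoryEntry entry)
    {
        var dir = Path.GetDirectoryName(path);
        if (!string.IsNullOrEmpty(dir)) Directory.CreateDirectory(dir);
        File.AppendAllText(path, JsonSerializer.Serialize(entry, Options) + "\n");
    }
}
```

Because each call writes exactly one line, readers can process the file line by line without loading the whole history.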
Configuration Profiles
toak profile <name> / toak profile
Switch between prompt presets instantly.
Built-in profiles:
- default - Current behavior
- code - Technical mode: preserves indentation, brackets, camelCase
- email - Professional mode with formal tone
- notes - Concise mode, bullet points enabled
- social - Casual mode, emoji allowed
Usage:
toak profile code # Switch to code preset
toak profile # Show current profile
toak profiles # List available profiles
Storage: ~/.config/toak/profiles/<name>.json - Each file is a complete ToakConfig override.
History Management
toak stats
Display usage statistics and analytics.
$ toak stats
Total recordings: 342
Total duration: 4h 23m
Average length: 45s
Most active day: 2025-01-10 (23 recordings)
Top words: "implementation", "refactor", "meeting"
Metrics tracked:
- Total recordings count
- Total/average/min/max duration
- Daily/weekly activity
- Most common words (from refined text)
- API usage estimates
toak history --export <file>
Export transcription history to various formats.
toak history --export notes.md # Markdown format
toak history --export log.txt # Plain text
toak history --export data.json # Full JSON dump
Markdown format example:
# Toak Transcriptions - 2025-01-15
## 09:23:00
We need to fix the API endpoint.
## 09:45:12
- Review the pull request
- Update documentation
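As a rough illustration of the Markdown layout above, the sketch below renders entries into one section per day. MarkdownExport is a hypothetical helper, and grouping by UTC day is an assumption; the source does not say which time zone the headings use.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class MarkdownExport
{
    // Renders one "# Toak Transcriptions - <date>" section per UTC day,
    // with a "## HH:mm:ss" heading above each refined text.
    public static string Render(IEnumerable<(DateTimeOffset Timestamp, string Refined)> entries)
    {
        var sb = new StringBuilder();
        var inv = CultureInfo.InvariantCulture;
        var days = entries.OrderBy(e => e.Timestamp)
                          .GroupBy(e => e.Timestamp.UtcDateTime.Date);
        foreach (var day in days)
        {
            sb.Append("# Toak Transcriptions - ")
              .Append(day.Key.ToString("yyyy-MM-dd", inv)).Append('\n');
            foreach (var e in day)
            {
                sb.Append("## ")
                  .Append(e.Timestamp.UtcDateTime.ToString("HH:mm:ss", inv)).Append('\n');
                sb.Append(e.Refined).Append('\n');
            }
        }
        return sb.ToString();
    }
}
```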
toak history --grep <pattern>
Search through transcription history.
toak history --grep "API" # Find all mentions of API
toak history --grep "TODO" -n 5 # Last 5 occurrences of "TODO"
toak history --grep "refactor" --raw # Search raw transcripts instead
Output format:
2025-01-15 09:23:00 We need to fix the API endpoint.
2025-01-15 14:12:33 The API response time is too slow.
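A minimal sketch of how the search might scan history.jsonl and produce the output format above. HistorySearch is a hypothetical name, and the sketch assumes the pattern is a case-insensitive substring (not a regular expression) and that timestamps are printed in UTC; the source specifies neither.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.Json;

public static class HistorySearch
{
    // Returns formatted matches in file order; `last` keeps only the final N matches.
    // raw: true searches the "raw" field instead of "refined".
    public static List<string> Grep(string path, string pattern, bool raw = false, int? last = null)
    {
        var hits = new List<string>();
        foreach (var line in File.ReadLines(path))
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            using var doc = JsonDocument.Parse(line);
            var root = doc.RootElement;
            var text = root.GetProperty(raw ? "raw" : "refined").GetString() ?? "";
            if (!text.Contains(pattern, StringComparison.OrdinalIgnoreCase)) continue;
            var ts = root.GetProperty("timestamp").GetDateTimeOffset().UtcDateTime;
            hits.Add(ts.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture) + " " + text);
        }
        return last is int n ? hits.TakeLast(n).ToList() : hits;
    }
}
```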
toak history --shred
Securely delete transcription history.
toak history --shred # Delete entire history file
toak history --shred -n 5 # Delete last 5 entries only
toak history --shred --raw # Also delete archived raw audio files
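Security: Optionally overwrites the data before deleting the file from disk. Overwriting is best-effort: SSD wear leveling and journaling or copy-on-write filesystems can leave old copies behind.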
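A sketch of the whole-file case, assuming a single pass of random bytes is considered sufficient. HistoryShredder is a hypothetical name, and the -n variant (removing only some entries) is not covered here.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HistoryShredder
{
    // Overwrites the file once with random bytes, flushes to disk, then deletes it.
    public static void Shred(string path)
    {
        if (!File.Exists(path)) return;
        var length = new FileInfo(path).Length;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.None))
        {
            var buffer = new byte[64 * 1024];
            for (long written = 0; written < length; )
            {
                var chunk = (int)Math.Min(buffer.Length, length - written);
                RandomNumberGenerator.Fill(buffer.AsSpan(0, chunk));
                fs.Write(buffer, 0, chunk);
                written += chunk;
            }
            fs.Flush(flushToDisk: true); // ask the OS to push the overwrite to the device
        }
        File.Delete(path);
    }
}
```

The overwrite only reduces the chance of recovery; it cannot guarantee that the storage device keeps no older copy of the data.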
Advanced Architecture
toak daemon / toak stop-daemon
Background service mode for reduced latency. The CLI interface stays identical, but work is offloaded to a persistent process.
Architecture:
┌─────────────┐      Unix Socket       ┌─────────────────────────────┐
│  toak CLI   │ ─────────────────────► │  toakd                      │
│  (client)   │                        │  (background daemon)        │
│  Exits      │ ◄──── Ack + Exit ───── │  - Long-running process     │
│  Instantly  │                        │  - Hot HttpClient pool      │
└─────────────┘                        │  - Config cached in memory  │
                                       │  - Manages ffmpeg lifecycle │
                                       └─────────────────────────────┘
CLI stays the same:
toak toggle # Client sends "start" to daemon, exits (~10ms)
# ... recording happens ...
toak toggle # Client sends "stop" to daemon, exits (~10ms)
# Daemon continues: upload → transcribe → refine → type
Why it's faster (without AOT):
| Operation | Current | Daemon | Savings |
|---|---|---|---|
| JIT compilation | 150ms | 0ms | 150ms |
| Assembly loading | 50ms | 0ms | 50ms |
| DNS lookup | 40ms | 0ms | 40ms |
| TLS handshake | 80ms | 0ms | 80ms |
| Config read | 10ms | 0ms | 10ms |
| Total | ~330ms | ~10ms | ~320ms |
Why it's still faster (with AOT):
AOT eliminates JIT/assembly overhead, but not everything:
| Operation | AOT Binary | AOT Daemon | Savings |
|---|---|---|---|
| Process startup | 20ms | 0ms | 20ms |
| DNS lookup | 40ms | 0ms | 40ms |
| TLS handshake | 80ms | 0ms | 80ms |
| Config read | 5ms | 0ms | 5ms |
| Total | ~145ms | ~10ms | ~135ms |
Verdict with AOT:
- Without daemon: Each toggle takes ~145ms before network call starts
- With daemon: Each toggle takes ~10ms (just socket IPC)
- The daemon still saves ~135ms, but it's less critical than without AOT
Trade-offs:
- Pro: Faster hotkey response, persistent connections, shared state
- Con: Added complexity (process management, crash recovery, socket IPC)
- Con: Debugging harder when logic lives in daemon
Usage:
toak daemon # Start background service
toak stop-daemon # Shutdown background service
toak status # Check if daemon is running
Implementation notes:
- Socket path: /tmp/toakd.sock or $XDG_RUNTIME_DIR/toakd.sock
- Protocol: Simple line-based or JSON messages
- Daemon writes PID to /tmp/toakd.pid for status checks
- Client binary checks for daemon on startup; can auto-start or error
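The client side of the notes above might be sketched as follows, assuming the line-based protocol variant (one command line in, one reply line back). DaemonClient is a hypothetical name, and preferring $XDG_RUNTIME_DIR over /tmp is an assumption; the notes list both paths without stating a precedence.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

public static class DaemonClient
{
    // Assumed precedence: $XDG_RUNTIME_DIR/toakd.sock if set, else /tmp/toakd.sock.
    public static string SocketPath()
    {
        var runtimeDir = Environment.GetEnvironmentVariable("XDG_RUNTIME_DIR");
        return string.IsNullOrEmpty(runtimeDir)
            ? "/tmp/toakd.sock"
            : Path.Combine(runtimeDir, "toakd.sock");
    }

    // Sends one newline-terminated command and returns the daemon's one-line reply.
    // Throws SocketException when no daemon is listening on the socket.
    public static string Send(string socketPath, string command)
    {
        using var socket = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
        socket.Connect(new UnixDomainSocketEndPoint(socketPath));
        using var stream = new NetworkStream(socket, ownsSocket: false);
        var payload = Encoding.UTF8.GetBytes(command + "\n");
        stream.Write(payload, 0, payload.Length);
        using var reader = new StreamReader(stream, Encoding.UTF8);
        return reader.ReadLine() ?? "";
    }
}
```

A caller could catch the SocketException from Send to decide whether to auto-start the daemon or report an error.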
Future Innovations
Hotword Commands (LLM Routing)
Instruct the LLM in PromptBuilder to output a specific JSON structure when given a command phrase. If a hotword such as "System command" or "Computer dictate" is detected at the start of the audio, Toak parses the JSON, skips typing the text via xdotool/wtype, and executes a predefined background action instead.
If it doesn't hear a command phrase, it simply returns the text normally and types it.
How it works (Under the Hood): The LLM is prompted to always return JSON in the background when a command is directed at the assistant.
{
"is_command": true,
"action": "append_to_notes",
"content": "Buy milk and eggs",
"meta": {}
}
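One way to tell a command reply from ordinary dictation is sketched below. CommandRouter and TryParseCommand are hypothetical names; the sketch assumes any reply that is not a JSON object with "is_command": true and a string "action" should be typed out as normal text.

```csharp
using System.Text.Json;

public static class CommandRouter
{
    // Returns true and fills action/content when the LLM reply is a command object.
    // Returns false for plain text or JSON that lacks "is_command": true.
    public static bool TryParseCommand(string llmOutput, out string action, out string content)
    {
        action = "";
        content = "";
        try
        {
            using var doc = JsonDocument.Parse(llmOutput);
            var root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object) return false;
            if (!root.TryGetProperty("is_command", out var flag) ||
                flag.ValueKind != JsonValueKind.True) return false;
            if (!root.TryGetProperty("action", out var a) ||
                a.ValueKind != JsonValueKind.String) return false;
            action = a.GetString() ?? "";
            if (root.TryGetProperty("content", out var c) && c.ValueKind == JsonValueKind.String)
                content = c.GetString() ?? "";
            return action.Length > 0;
        }
        catch (JsonException)
        {
            return false; // Not JSON: treat as ordinary dictation to be typed.
        }
    }
}
```

Falling back to typing on any parse failure keeps a malformed LLM reply from silently swallowing the user's dictation.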
Alternative Hotword Ideas: Since "Toak" is not a real English word, Whisper might transcribe it as "talk", "toke", or "oak." It is highly recommended to use distinct, phonetically clear hotwords such as:
- "System..." (e.g. "System note:")
- "Computer..." (e.g. "Computer search:")
- "Action..." (e.g. "Action commit:")
- "Dictate..." (e.g. "Dictate terminal:")
- "Listen up..." (e.g. "Listen up translate...")
Prompt Ideas & Use Cases:
- Quick Notes / Brainstorming:
  - Hotword: "System note:" or "Drop this in my notes:"
  - Action: Appends the spoken text to a configured ~/notes.md file in the background without interrupting your current window.
  - Example: "System note: I need to remember to check the database migrations later today."
- Terminal / CLI Execution:
  - Hotword: "Computer terminal:" or "Command:"
  - Action: Takes the natural language command, asks the LLM to translate it into a bash command, and types it into a new tmux window or background process.
  - Example: "Computer terminal: find all python files modified in the last 2 days."
- Git Commit Messages:
  - Hotword: "Action commit:"
  - Action: Automatically formats the dictated text into a standard conventional commit message, stages all files, and commits them.
  - Example: "Action commit: I refactored the audio recorder to use native processes instead of the old library." -> LLM outputs refactor(audio): migrate to native processes and runs git commit -am "...".
- Web Search / Lookup:
  - Hotword: "System search:" or "Look up:"
  - Action: Opens your default browser and performs a search for the spoken phrase.
  - Example: "System search: MDN documentation for grid layout."
- Translating on the fly:
  - Hotword: "Translate to Spanish:"
  - Action: Instead of typing English, it types the translated version of the rest of the sentence.
  - Example: "Translate to Spanish: Hello, how are you today?" -> Types out Hola, ¿cómo estás hoy?.
Implementation Priority
Tier 1: High Impact, Low Effort
(All Tier 1 items have been implemented!)
Tier 2: Medium Effort (Requires History Storage)
- toak history with --export, --grep, --shred flags
- toak stats - Analytics aggregation
- toak copy - Clipboard integration
Tier 3: Higher Complexity
- toak profile - Config presets
- toak daemon - Background service architecture
Technical Notes
History Storage:
- Use JSON Lines format (.jsonl) for append-only log
- Rotate at 5000 entries or 30 days
- Store both raw and refined text for debugging
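The rotation rule could be checked along these lines. HistoryRotation is a hypothetical name, and the sketch reads "rotate" as moving the current log aside under a timestamped name so the next append starts a fresh file; the notes do not say whether old entries are archived or discarded.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.Json;

public static class HistoryRotation
{
    public const int MaxEntries = 5000;
    public static readonly TimeSpan MaxAge = TimeSpan.FromDays(30);

    // True when the log holds MaxEntries or more lines,
    // or its first (oldest) entry is older than MaxAge.
    public static bool ShouldRotate(string path, DateTimeOffset now)
    {
        if (!File.Exists(path)) return false;
        var lines = File.ReadLines(path).Where(l => !string.IsNullOrWhiteSpace(l)).ToList();
        if (lines.Count == 0) return false;
        if (lines.Count >= MaxEntries) return true;
        using var doc = JsonDocument.Parse(lines[0]);
        var oldest = doc.RootElement.GetProperty("timestamp").GetDateTimeOffset();
        return now - oldest > MaxAge;
    }

    // Renames history.jsonl to history.<yyyyMMddHHmmss>.jsonl and returns the new path.
    public static string Rotate(string path, DateTimeOffset now)
    {
        var stamp = now.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        var archive = Path.ChangeExtension(path, "." + stamp + ".jsonl");
        File.Move(path, archive);
        return archive;
    }
}
```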
Pipe Detection in C#:
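using System;
using System.Linq; // provides args.Contains

// Print the refined text to stdout when output is piped or --pipe is passed.
if (Console.IsOutputRedirected || args.Contains("--pipe"))
{
    Console.WriteLine(refinedText);
}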