# Feature Ideas for Toak A curated list of CLI-native features to enhance the dictation workflow. --- ## Core Workflow Additions ### `toak history [-n N]` Display recent transcriptions with timestamps. Use `-n 1` to replay the last result. **Use case:** - `toak history` - Show last 10 transcriptions - `toak history -n 5` - Show last 5 - `toak history -n 1` - Show most recent (equivalent to a "last" command) **Storage:** Append to `~/.local/share/toak/history.jsonl` on each successful transcription: ```json {"timestamp":"2025-01-15T09:23:00Z","raw":"hello world","refined":"Hello world."} ``` --- ## Configuration Profiles ### `toak profile ` / `toak profile` Switch between prompt presets instantly. **Built-in profiles:** - `default` - Current behavior - `code` - Technical mode: preserves indentation, brackets, camelCase - `email` - Professional mode with formal tone - `notes` - Concise mode, bullet points enabled - `social` - Casual mode, emoji allowed **Usage:** ```bash toak profile code # Switch to code preset toak profile # Show current profile toak profiles # List available profiles ``` **Storage:** `~/.config/toak/profiles/.json` - Each file is a complete `ToakConfig` override. --- ## History Management ### `toak stats` Display usage statistics and analytics. ```bash $ toak stats Total recordings: 342 Total duration: 4h 23m Average length: 45s Most active day: 2025-01-10 (23 recordings) Top words: "implementation", "refactor", "meeting" ``` **Metrics tracked:** - Total recordings count - Total/average/min/max duration - Daily/weekly activity - Most common words (from refined text) - API usage estimates --- ### `toak history --export ` Export transcription history to various formats. ```bash toak history --export notes.md # Markdown format toak history --export log.txt # Plain text toak history --export data.json # Full JSON dump ``` **Markdown format example:** ```markdown # Toak Transcriptions - 2025-01-15 ## 09:23:00 We need to fix the API endpoint. ## 09:45:12 - Review the pull request - Update documentation ``` --- ### `toak history --grep ` Search through transcription history. ```bash toak history --grep "API" # Find all mentions of API toak history --grep "TODO" -n 5 # Last 5 occurrences of "TODO" toak history --grep "refactor" --raw # Search raw transcripts instead ``` **Output format:** ``` 2025-01-15 09:23:00 We need to fix the API endpoint. 2025-01-15 14:12:33 The API response time is too slow. ``` --- ### `toak history --shred` Securely delete transcription history. ```bash toak history --shred # Delete entire history file toak history --shred -n 5 # Delete last 5 entries only toak history --shred --raw # Also delete archived raw audio files ``` **Security:** Overwrites data before deletion (optional), removes from disk. --- ## Advanced Architecture ### `toak daemon` / `toak stop-daemon` Background service mode for reduced latency. The CLI interface stays identical, but work is offloaded to a persistent process. **Architecture:** ``` ┌─────────────┐ Unix Socket ┌─────────────────────────────┐ │ toak CLI │ ───────────────────► │ toakd │ │ (client) │ │ (background daemon) │ │ Exits │ ◄──── Ack + Exit ──── │ - Long-running process │ │ Instantly │ │ - Hot HttpClient pool │ └─────────────┘ │ - Config cached in memory │ │ - Manages ffmpeg lifecycle │ └─────────────────────────────┘ ``` **CLI stays the same:** ```bash toak toggle # Client sends "start" to daemon, exits (~10ms) # ... recording happens ... toak toggle # Client sends "stop" to daemon, exits (~10ms) # Daemon continues: upload → transcribe → refine → type ``` **Why it's faster (without AOT):** | Operation | Current | Daemon | Savings | |-----------|---------|--------|---------| | JIT compilation | 150ms | 0ms | 150ms | | Assembly loading | 50ms | 0ms | 50ms | | DNS lookup | 40ms | 0ms | 40ms | | TLS handshake | 80ms | 0ms | 80ms | | Config read | 10ms | 0ms | 10ms | | **Total** | **~330ms** | **~10ms** | **~320ms** | **Why it's still faster (with AOT):** AOT eliminates JIT/assembly overhead, but not everything: | Operation | AOT Binary | AOT Daemon | Savings | |-----------|------------|------------|---------| | Process startup | 20ms | 0ms | 20ms | | DNS lookup | 40ms | 0ms | 40ms | | TLS handshake | 80ms | 0ms | 80ms | | Config read | 5ms | 0ms | 5ms | | **Total** | **~145ms** | **~10ms** | **~135ms** | **Verdict with AOT:** - Without daemon: Each toggle takes ~145ms before network call starts - With daemon: Each toggle takes ~10ms (just socket IPC) - The daemon still saves ~135ms, but it's less critical than without AOT **Trade-offs:** - **Pro:** Faster hotkey response, persistent connections, shared state - **Con:** Added complexity (process management, crash recovery, socket IPC) - **Con:** Debugging harder when logic lives in daemon **Usage:** ```bash toak daemon # Start background service toak stop-daemon # Shutdown background service toak status # Check if daemon is running ``` **Implementation notes:** - Socket path: `/tmp/toakd.sock` or `$XDG_RUNTIME_DIR/toakd.sock` - Protocol: Simple line-based or JSON messages - Daemon writes PID to `/tmp/toakd.pid` for status checks - Client binary checks for daemon on startup; can auto-start or error --- ## Future Innovations ### Hotword Commands (LLM Routing) Instruct the LLM in `PromptBuilder` to output a specific JSON structure if given a command phrase. If a specific hotword like "System command" or "Computer dictate" is detected at the start of the audio, Toak parses the JSON, skips typng out via `xdotool`/`wtype`, and instead executes a pre-defined background action. If it doesn't hear a command phrase, it simply returns the text normally and types it. **How it works (Under the Hood):** The LLM is prompted to always return JSON in the background when a command is directed at the assistant. ```json { "is_command": true, "action": "append_to_notes", "content": "Buy milk and eggs", "meta": {} } ``` **Alternative Hotword Ideas:** Since "Toak" is not a real English word, Whisper might transcribe it as "talk", "toke", or "oak." It is highly recommended to use distinct, phonetically clear hotwords such as: - **"System..."** (e.g. "System note:") - **"Computer..."** (e.g. "Computer search:") - **"Action..."** (e.g. "Action commit:") - **"Dictate..."** (e.g. "Dictate terminal:") - **"Listen up..."** (e.g. "Listen up translate...") **Prompt Ideas & Use Cases:** 1. **Quick Notes / Brainstorming:** - *Hotword:* `"System note:"` or `"Drop this in my notes:"` - *Action:* Appends the spoken text to a configured `~/notes.md` file in the background without interrupting your current window. - *Example:* "System note: I need to remember to check the database migrations later today." 2. **Terminal / CLI Execution:** - *Hotword:* `"Computer terminal:"` or `"Command:"` - *Action:* Takes the natural language command, asks the LLM to translate it into a bash command, and types it into a new tmux window or background process. - *Example:* "Computer terminal: find all python files modified in the last 2 days." 3. **Git Commit Messages:** - *Hotword:* `"Action commit:"` - *Action:* Automatically formats the dictated text into a standard conventional commit message, stages all files, and commits them. - *Example:* "Action commit: I refactored the audio recorder to use native processes instead of the old library." -> LLM outputs `refactor(audio): migrate to native processes` and runs `git commit -am "..."`. 4. **Web Search / Lookup:** - *Hotword:* `"System search:"` or `"Look up:"` - *Action:* Opens your default browser and performs a search for the spoken phrase. - *Example:* "System search: MDN documentation for grid layout." 5. **Translating on the fly:** - *Hotword:* `"Translate to Spanish:"` - *Action:* Instead of typing English, it types the translated version of the rest of the sentence. - *Example:* "Translate to Spanish: Hello, how are you today?" -> Types out `Hola, ¿cómo estás hoy?`. --- ## Implementation Priority ### Tier 1: High Impact, Low Effort *(All Tier 1 items have been implemented!)* ### Tier 2: Medium Effort (Requires History Storage) 4. `toak history` with `--export`, `--grep`, `--shred` flags 5. `toak stats` - Analytics aggregation 6. `toak copy` - Clipboard integration ### Tier 3: Higher Complexity 7. `toak profile` - Config presets 8. `toak daemon` - Background service architecture --- ## Technical Notes **History Storage:** - Use JSON Lines format (`.jsonl`) for append-only log - Rotate at 5000 entries or 30 days - Store both raw and refined text for debugging **Pipe Detection in C#:** ```csharp if (Console.IsOutputRedirected || args.Contains("--pipe")) { Console.WriteLine(refinedText); } ```