Feature Ideas for Toak

A curated list of CLI-native features to enhance the dictation workflow.


Core Workflow Additions

toak history [-n N]

Display recent transcriptions with timestamps. Use -n 1 to show only the most recent result.

Use cases:

  • toak history - Show last 10 transcriptions
  • toak history -n 5 - Show last 5
  • toak history -n 1 - Show most recent (equivalent to a "last" command)

Storage: Append to ~/.local/share/toak/history.jsonl on each successful transcription:

{"timestamp":"2025-01-15T09:23:00Z","raw":"hello world","refined":"Hello world."}
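A minimal sketch of the append step, assuming System.Text.Json is available. The HistoryLog class and its Append method are hypothetical names used only for illustration; the field names mirror the JSON line above.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.Json;

// Hypothetical helper: not part of Toak's actual code base.
public static class HistoryLog
{
    // Serializes one transcription as a single JSON line, appends it to
    // `path` (creating the directory if needed), and returns the line.
    public static string Append(string path, DateTimeOffset timestamp, string raw, string refined)
    {
        var entry = new
        {
            // Always store UTC in ISO 8601 form, e.g. 2025-01-15T09:23:00Z.
            timestamp = timestamp.UtcDateTime.ToString(
                "yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture),
            raw,
            refined
        };
        string line = JsonSerializer.Serialize(entry);

        string? dir = Path.GetDirectoryName(path);
        if (!string.IsNullOrEmpty(dir)) Directory.CreateDirectory(dir);

        File.AppendAllText(path, line + "\n");
        return line;
    }
}
```

Serializing an anonymous type relies on reflection, so an AOT build would likely need a source-generated serializer context instead.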

Configuration Profiles

toak profile <name> / toak profile

Switch between prompt presets instantly.

Built-in profiles:

  • default - Current behavior
  • code - Technical mode: preserves indentation, brackets, camelCase
  • email - Professional mode with formal tone
  • notes - Concise mode, bullet points enabled
  • social - Casual mode, emoji allowed

Usage:

toak profile code     # Switch to code preset
toak profile          # Show current profile
toak profiles         # List available profiles

Storage: ~/.config/toak/profiles/<name>.json - Each file is a complete ToakConfig override.
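One way to map profile names to files, as a rough sketch. ProfileStore, PathFor, and List are hypothetical names; the caller is assumed to pass the profiles directory (for example ~/.config/toak/profiles). Parsing the file into ToakConfig is left out because its fields are not specified here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: not part of Toak's actual code base.
public static class ProfileStore
{
    // Maps a profile name to <profilesDir>/<name>.json, rejecting names
    // that contain path separators or start with a dot.
    public static string PathFor(string profilesDir, string name)
    {
        if (string.IsNullOrWhiteSpace(name)
            || name != Path.GetFileName(name)
            || name.StartsWith('.'))
        {
            throw new ArgumentException($"Invalid profile name: '{name}'", nameof(name));
        }
        return Path.Combine(profilesDir, name + ".json");
    }

    // Lists available profile names (file names without the .json suffix).
    public static List<string> List(string profilesDir)
    {
        if (!Directory.Exists(profilesDir)) return new List<string>();
        return Directory.EnumerateFiles(profilesDir, "*.json")
            .Select(f => Path.GetFileNameWithoutExtension(f))
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }
}
```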


History Management

toak stats

Display usage statistics and analytics.

$ toak stats
Total recordings: 342
Total duration: 4h 23m
Average length: 45s
Most active day: 2025-01-10 (23 recordings)
Top words: "implementation", "refactor", "meeting"

Metrics tracked:

  • Total recordings count
  • Total/average/min/max duration
  • Daily/weekly activity
  • Most common words (from refined text)
  • API usage estimates
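The "most common words" metric could be computed roughly as follows. This is a sketch under assumptions: HistoryStats and TopWords are hypothetical names, and the minimum word length is a crude stand-in for a real stop-word list. Duration metrics are not sketched because the history entry shown earlier does not record a duration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: not part of Toak's actual code base.
public static class HistoryStats
{
    // Returns the `count` most frequent words across the refined texts,
    // compared case-insensitively. Words shorter than `minLength` are
    // skipped; ties are broken alphabetically.
    public static List<(string Word, int Count)> TopWords(
        IEnumerable<string> refinedTexts, int count, int minLength = 4)
    {
        return refinedTexts
            .SelectMany(t => Regex.Matches(t.ToLowerInvariant(), @"[\p{L}\p{Nd}']+")
                                  .Select(m => m.Value))
            .Where(w => w.Length >= minLength)
            .GroupBy(w => w)
            .Select(g => (Word: g.Key, Count: g.Count()))
            .OrderByDescending(x => x.Count)
            .ThenBy(x => x.Word, StringComparer.Ordinal)
            .Take(count)
            .ToList();
    }
}
```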

toak history --export <file>

Export transcription history to various formats.

toak history --export notes.md     # Markdown format
toak history --export log.txt      # Plain text
toak history --export data.json    # Full JSON dump

Markdown format example:

# Toak Transcriptions - 2025-01-15

## 09:23:00
We need to fix the API endpoint.

## 09:45:12
- Review the pull request
- Update documentation
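The Markdown layout above could be produced along these lines. MarkdownExport and Render are hypothetical names, and grouping entries under one heading per calendar day is an assumption based on the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper: not part of Toak's actual code base.
public static class MarkdownExport
{
    // Renders entries as one "# Toak Transcriptions - <date>" section per
    // day, with a "## HH:mm:ss" heading followed by the refined text.
    public static string Render(IEnumerable<(DateTime Timestamp, string Refined)> entries)
    {
        var sb = new StringBuilder();
        var days = entries.OrderBy(e => e.Timestamp).GroupBy(e => e.Timestamp.Date);
        foreach (var day in days)
        {
            if (sb.Length > 0) sb.Append('\n'); // blank line between days
            sb.Append("# Toak Transcriptions - ")
              .Append(day.Key.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
              .Append('\n');
            foreach (var e in day)
            {
                sb.Append("\n## ")
                  .Append(e.Timestamp.ToString("HH:mm:ss", CultureInfo.InvariantCulture))
                  .Append('\n')
                  .Append(e.Refined.Trim())
                  .Append('\n');
            }
        }
        return sb.ToString();
    }
}
```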

toak history --grep <pattern>

Search through transcription history.

toak history --grep "API"              # Find all mentions of API
toak history --grep "TODO" -n 5        # Last 5 occurrences of "TODO"
toak history --grep "refactor" --raw   # Search raw transcripts instead

Output format:

2025-01-15 09:23:00  We need to fix the API endpoint.
2025-01-15 14:12:33  The API response time is too slow.
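A sketch of how the search might run over the JSONL history. HistoryGrep and Search are hypothetical names; treating the pattern as a case-insensitive regular expression and printing timestamps in UTC are both assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper: not part of Toak's actual code base.
public static class HistoryGrep
{
    // Scans JSONL history lines and returns "<timestamp>  <text>" for each
    // entry whose refined (or raw) text matches `pattern`. When `last` is
    // set, only the most recent `last` matches are returned.
    public static List<string> Search(
        IEnumerable<string> jsonLines, string pattern, bool searchRaw = false, int? last = null)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        var hits = new List<string>();
        foreach (var line in jsonLines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            using var doc = JsonDocument.Parse(line);
            var root = doc.RootElement;
            string text = root.GetProperty(searchRaw ? "raw" : "refined").GetString() ?? "";
            if (!regex.IsMatch(text)) continue;

            var ts = root.GetProperty("timestamp").GetDateTimeOffset();
            string stamp = ts.UtcDateTime.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
            hits.Add($"{stamp}  {text}");
        }
        return last is int n ? hits.TakeLast(n).ToList() : hits;
    }
}
```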

toak history --shred

Securely delete transcription history.

toak history --shred           # Delete entire history file
toak history --shred -n 5      # Delete last 5 entries only
toak history --shred --raw     # Also delete archived raw audio files

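Security: Optionally overwrites the file contents before deleting it, then removes the file from disk. Overwriting is best effort: on SSDs and on journaling or copy-on-write filesystems, writing over a file in place does not guarantee the old data is unrecoverable.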
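The overwrite-then-delete step could look roughly like this. HistoryShred and Shred are hypothetical names, and a single pass of random bytes is an assumption rather than a stated requirement.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: not part of Toak's actual code base.
public static class HistoryShred
{
    // Overwrites the file once with random bytes, flushes to disk, then
    // deletes it. Does nothing if the file does not exist.
    public static void Shred(string path)
    {
        if (!File.Exists(path)) return;

        long remaining = new FileInfo(path).Length;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.None))
        {
            var buffer = new byte[8192];
            while (remaining > 0)
            {
                RandomNumberGenerator.Fill(buffer);
                int chunk = (int)Math.Min(buffer.Length, remaining);
                fs.Write(buffer, 0, chunk);
                remaining -= chunk;
            }
            fs.Flush(flushToDisk: true); // ask the OS to push the bytes to the device
        }
        File.Delete(path);
    }
}
```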


Advanced Architecture

toak daemon / toak stop-daemon

Background service mode for reduced latency. The CLI interface stays identical, but work is offloaded to a persistent process.

Architecture:

┌─────────────┐      Unix Socket      ┌─────────────────────────────┐
│  toak CLI   │  ───────────────────► │         toakd               │
│   (client)  │                       │  (background daemon)        │
│   Exits     │ ◄──── Ack + Exit ──── │  - Long-running process     │
│   Instantly │                       │  - Hot HttpClient pool      │
└─────────────┘                       │  - Config cached in memory  │
                                      │  - Manages ffmpeg lifecycle │
                                      └─────────────────────────────┘

CLI stays the same:

toak toggle      # Client sends "start" to daemon, exits (~10ms)
# ... recording happens ...
toak toggle      # Client sends "stop" to daemon, exits (~10ms)
                 # Daemon continues: upload → transcribe → refine → type

Why it's faster (without AOT):

Operation           Current   Daemon   Savings
JIT compilation     150ms     0ms      150ms
Assembly loading    50ms      0ms      50ms
DNS lookup          40ms      0ms      40ms
TLS handshake       80ms      0ms      80ms
Config read         10ms      0ms      10ms
Total               ~330ms    ~10ms    ~320ms

Why it's still faster (with AOT):

AOT eliminates JIT/assembly overhead, but not everything:

Operation          AOT Binary   AOT Daemon   Savings
Process startup    20ms         0ms          20ms
DNS lookup         40ms         0ms          40ms
TLS handshake      80ms         0ms          80ms
Config read        5ms          0ms          5ms
Total              ~145ms       ~10ms        ~135ms

Verdict with AOT:

  • Without daemon: Each toggle incurs ~145ms of startup and connection overhead before the API request is sent
  • With daemon: Each toggle takes ~10ms (just socket IPC)
  • The daemon still saves ~135ms, but it's less critical than without AOT

Trade-offs:

  • Pro: Faster hotkey response, persistent connections, shared state
  • Con: Added complexity (process management, crash recovery, socket IPC)
  • Con: Debugging harder when logic lives in daemon

Usage:

toak daemon                    # Start background service
toak stop-daemon               # Shutdown background service
toak status                    # Check if daemon is running

Implementation notes:

  • Socket path: /tmp/toakd.sock or $XDG_RUNTIME_DIR/toakd.sock
  • Protocol: Simple line-based or JSON messages
  • Daemon writes PID to /tmp/toakd.pid for status checks
  • Client binary checks for daemon on startup; can auto-start or error
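A minimal client sketch for the socket notes above, assuming the line-based protocol option. DaemonClient, SocketPath, and Send are hypothetical names, and preferring $XDG_RUNTIME_DIR over /tmp when it is set is an assumption.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

// Hypothetical helper: not part of Toak's actual code base.
public static class DaemonClient
{
    // Prefers $XDG_RUNTIME_DIR/toakd.sock and falls back to /tmp/toakd.sock.
    public static string SocketPath(string? xdgRuntimeDir) =>
        string.IsNullOrEmpty(xdgRuntimeDir)
            ? "/tmp/toakd.sock"
            : Path.Combine(xdgRuntimeDir, "toakd.sock");

    // Sends one command line and returns the daemon's one-line reply,
    // or null when no daemon is listening on the socket.
    public static string? Send(string socketPath, string command)
    {
        using var socket = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
        try
        {
            socket.Connect(new UnixDomainSocketEndPoint(socketPath));
        }
        catch (SocketException)
        {
            return null; // daemon not running: caller can auto-start it or report an error
        }

        using var stream = new NetworkStream(socket, ownsSocket: false);
        using var writer = new StreamWriter(stream, new UTF8Encoding(false)) { AutoFlush = true, NewLine = "\n" };
        using var reader = new StreamReader(stream, Encoding.UTF8);
        writer.WriteLine(command);
        return reader.ReadLine();
    }
}
```

With this shape, toak toggle would reduce to resolving the socket path from Environment.GetEnvironmentVariable("XDG_RUNTIME_DIR"), calling Send with a command such as "toggle", and exiting once the ack line arrives.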

Future Innovations

Hotword Commands (LLM Routing)

Instruct the LLM in PromptBuilder to output a specific JSON structure when it is given a command phrase. If a hotword such as "System command" or "Computer dictate" is detected at the start of the audio, Toak parses the JSON, skips typing via xdotool/wtype, and instead executes a predefined background action.

If it doesn't hear a command phrase, it simply returns the text normally and types it.

How it works (Under the Hood): The LLM is prompted to return JSON, which is never typed out, whenever the utterance is a command directed at the assistant.

{
  "is_command": true,
  "action": "append_to_notes",
  "content": "Buy milk and eggs",
  "meta": {}
}
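Deciding between "type the text" and "run an action" could be sketched as follows. LlmCommand, CommandRouter, and TryParse are hypothetical names; treating any output that is not a JSON object with "is_command": true as ordinary dictation is an assumption.

```csharp
using System.Text.Json;

// Hypothetical types: not part of Toak's actual code base.
public sealed record LlmCommand(string Action, string Content);

public static class CommandRouter
{
    // Returns the parsed command when `llmOutput` is a JSON object with
    // "is_command": true and a string "action"; returns null otherwise,
    // in which case the caller types the text as usual.
    public static LlmCommand? TryParse(string llmOutput)
    {
        string trimmed = llmOutput.Trim();
        if (!trimmed.StartsWith('{')) return null;
        try
        {
            using var doc = JsonDocument.Parse(trimmed);
            var root = doc.RootElement;
            if (!root.TryGetProperty("is_command", out var flag)
                || flag.ValueKind != JsonValueKind.True) return null;
            if (!root.TryGetProperty("action", out var action)
                || action.ValueKind != JsonValueKind.String) return null;

            string content =
                root.TryGetProperty("content", out var c) && c.ValueKind == JsonValueKind.String
                    ? c.GetString()!
                    : "";
            return new LlmCommand(action.GetString()!, content);
        }
        catch (JsonException)
        {
            return null; // looked like JSON but was not valid: fall back to typing
        }
    }
}
```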

Alternative Hotword Ideas: Since "Toak" is not a real English word, Whisper might transcribe it as "talk", "toke", or "oak". Prefer distinct, phonetically clear hotwords such as:

  • "System..." (e.g. "System note:")
  • "Computer..." (e.g. "Computer search:")
  • "Action..." (e.g. "Action commit:")
  • "Dictate..." (e.g. "Dictate terminal:")
  • "Listen up..." (e.g. "Listen up translate...")

Prompt Ideas & Use Cases:

  1. Quick Notes / Brainstorming:

    • Hotword: "System note:" or "Drop this in my notes:"
    • Action: Appends the spoken text to a configured ~/notes.md file in the background without interrupting your current window.
    • Example: "System note: I need to remember to check the database migrations later today."
  2. Terminal / CLI Execution:

    • Hotword: "Computer terminal:" or "Command:"
    • Action: Takes the natural language command, asks the LLM to translate it into a bash command, and types it into a new tmux window or background process.
    • Example: "Computer terminal: find all python files modified in the last 2 days."
  3. Git Commit Messages:

    • Hotword: "Action commit:"
    • Action: Formats the dictated text into a Conventional Commits message, stages all tracked changes, and commits them.
    • Example: "Action commit: I refactored the audio recorder to use native processes instead of the old library." -> LLM outputs refactor(audio): migrate to native processes and runs git commit -am "...".
  4. Web Search / Lookup:

    • Hotword: "System search:" or "Look up:"
    • Action: Opens your default browser and performs a search for the spoken phrase.
    • Example: "System search: MDN documentation for grid layout."
  5. Translating on the fly:

    • Hotword: "Translate to Spanish:"
    • Action: Instead of typing English, it types the translated version of the rest of the sentence.
    • Example: "Translate to Spanish: Hello, how are you today?" -> Types out Hola, ¿cómo estás hoy?.

Implementation Priority

Tier 1: High Impact, Low Effort

(All Tier 1 items have been implemented!)

Tier 2: Medium Effort (Requires History Storage)

  1. toak history with --export, --grep, --shred flags
  2. toak stats - Analytics aggregation
  3. toak copy - Clipboard integration

Tier 3: Higher Complexity

  1. toak profile - Config presets
  2. toak daemon - Background service architecture

Technical Notes

History Storage:

  • Use JSON Lines format (.jsonl) for append-only log
  • Rotate at 5000 entries or 30 days
  • Store both raw and refined text for debugging
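The rotation rule could be expressed as a small check. HistoryRotation and ShouldRotate are hypothetical names, and measuring the 30 days from the oldest entry in the file (rather than from the file's creation time) is an assumption.

```csharp
using System;

// Hypothetical helper: not part of Toak's actual code base.
public static class HistoryRotation
{
    public const int MaxEntries = 5000;
    public static readonly TimeSpan MaxAge = TimeSpan.FromDays(30);

    // True when the history file has reached the entry limit or its
    // oldest entry is at least 30 days old.
    public static bool ShouldRotate(int entryCount, DateTimeOffset oldestEntry, DateTimeOffset now) =>
        entryCount >= MaxEntries || now - oldestEntry >= MaxAge;
}
```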

Pipe Detection in C#:

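using System;
using System.Linq; // Enumerable.Contains on args

// Print the final text (refinedText) to stdout instead of typing it when piped.
if (Console.IsOutputRedirected || args.Contains("--pipe"))
{
    Console.WriteLine(refinedText);
}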