feat: Implement a modular skill system with hotword detection, streaming text output, and enhanced logging.
This commit is contained in:
55
IDEAS.md
55
IDEAS.md
@@ -197,6 +197,61 @@ toak status # Check if daemon is running
|
||||
|
||||
---
|
||||
|
||||
## Future Innovations
|
||||
|
||||
### Hotword Commands (LLM Routing)
|
||||
Instruct the LLM in `PromptBuilder` to output a specific JSON structure if given a command phrase. If a specific hotword like "System command" or "Computer dictate" is detected at the start of the audio, Toak parses the JSON, skips typng out via `xdotool`/`wtype`, and instead executes a pre-defined background action.
|
||||
|
||||
If it doesn't hear a command phrase, it simply returns the text normally and types it.
|
||||
|
||||
**How it works (Under the Hood):**
|
||||
The LLM is prompted to always return JSON in the background when a command is directed at the assistant.
|
||||
```json
|
||||
{
|
||||
"is_command": true,
|
||||
"action": "append_to_notes",
|
||||
"content": "Buy milk and eggs",
|
||||
"meta": {}
|
||||
}
|
||||
```
|
||||
|
||||
**Alternative Hotword Ideas:**
|
||||
Since "Toak" is not a real English word, Whisper might transcribe it as "talk", "toke", or "oak." It is highly recommended to use distinct, phonetically clear hotwords such as:
|
||||
- **"System..."** (e.g. "System note:")
|
||||
- **"Computer..."** (e.g. "Computer search:")
|
||||
- **"Action..."** (e.g. "Action commit:")
|
||||
- **"Dictate..."** (e.g. "Dictate terminal:")
|
||||
- **"Listen up..."** (e.g. "Listen up translate...")
|
||||
|
||||
**Prompt Ideas & Use Cases:**
|
||||
|
||||
1. **Quick Notes / Brainstorming:**
|
||||
- *Hotword:* `"System note:"` or `"Drop this in my notes:"`
|
||||
- *Action:* Appends the spoken text to a configured `~/notes.md` file in the background without interrupting your current window.
|
||||
- *Example:* "System note: I need to remember to check the database migrations later today."
|
||||
|
||||
2. **Terminal / CLI Execution:**
|
||||
- *Hotword:* `"Computer terminal:"` or `"Command:"`
|
||||
- *Action:* Takes the natural language command, asks the LLM to translate it into a bash command, and types it into a new tmux window or background process.
|
||||
- *Example:* "Computer terminal: find all python files modified in the last 2 days."
|
||||
|
||||
3. **Git Commit Messages:**
|
||||
- *Hotword:* `"Action commit:"`
|
||||
- *Action:* Automatically formats the dictated text into a standard conventional commit message, stages all files, and commits them.
|
||||
- *Example:* "Action commit: I refactored the audio recorder to use native processes instead of the old library." -> LLM outputs `refactor(audio): migrate to native processes` and runs `git commit -am "..."`.
|
||||
|
||||
4. **Web Search / Lookup:**
|
||||
- *Hotword:* `"System search:"` or `"Look up:"`
|
||||
- *Action:* Opens your default browser and performs a search for the spoken phrase.
|
||||
- *Example:* "System search: MDN documentation for grid layout."
|
||||
|
||||
5. **Translating on the fly:**
|
||||
- *Hotword:* `"Translate to Spanish:"`
|
||||
- *Action:* Instead of typing English, it types the translated version of the rest of the sentence.
|
||||
- *Example:* "Translate to Spanish: Hello, how are you today?" -> Types out `Hola, ¿cómo estás hoy?`.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Priority
|
||||
|
||||
### Tier 1: High Impact, Low Effort
|
||||
|
||||
Reference in New Issue
Block a user