feat: Add file editing and manipulation tools with Hashline anchor validation and integrate them into the tool registry.
@@ -1,134 +0,0 @@
# Advanced AI Agent CLI System Design

This document outlines the architecture for a completely new, built-from-scratch AI Agent Command Line Interface system, inspired by the lessons learned from the `Anchor CLI` refactoring.

## 1. Core Principles
* **Event-Driven UI & Decoupled State:** The UI and display layers communicate exclusively through an asynchronous Event Bus.
* **Explicit Control Flow:** Core agent execution uses a Mediator pattern (Request/Response) for predictable, traceable control flow rather than pure event spaghetti.
* **Dependency Injection:** A robust IoC container manages lifecycles and dependencies.
* **Pluggable Architecture:** Everything, from the LLM provider to the UI renderer and memory storage, is an injectable plugin.
* **Stateless Components:** Services maintain minimal internal state. State is managed centrally in a session or context store with immutable snapshots.
* **Test-First Design:** Avoiding static delegates and global mutable state keeps every component unit-testable in isolation.
* **Pervasive Cancellation:** Every asynchronous operation accepts a `CancellationToken` for graceful termination.

## 2. High-Level Architecture & Project Structure (AOT-Ready)

The system is structurally divided into three distinct C# projects to enforce decoupling, testability, and future-proof design, while maintaining strict compatibility with **.NET Native AOT** compilation for single-file, zero-dependency distribution on Linux/Windows.

### 2.1 Project: `Anchor.AgentFramework` (Class Library)
The core logic and abstractions. It has **no knowledge** of the console, the file system, or specific LLM SDKs.
* **Contains:** Interfaces (`IEventBus`, `IMediator`, `IAgentAvatar`), Memory Management (`ISessionManager`), Execution Loop (`ChatCoordinator`), and the `ToolRunner`.
* **Responsibilities:** Orchestrating the agent's thought process, managing state, and firing events.

### 2.2 Project: `Anchor.Providers` (Class Library)
The vendor-specific implementations for Language Models.
* **Contains:** `OpenAIAvatar`, `AnthropicAvatar`.
* **Responsibilities:** Translating the framework's semantic requests into vendor-specific API calls (e.g., mapping `ToolResult` to OpenAI's tool response format) via SDKs like `Azure.AI.OpenAI`.

### 2.3 Project: `Anchor.Cli` (Console Application)
The "Hosting Shell" and the physical "Senses/Hands" of the application.
* **Contains:** `Program.cs` (Composition Root), `RichConsoleRenderer`, `ConsoleInputDispatcher`, and concrete Tool implementations (e.g., `FileSystemTool`, `CmdTool`).
* **Responsibilities:** Wiring up Dependency Injection, reading from stdin, rendering UI/spinners to stdout, and executing side-effects on the host OS.

### 2.4 Logical Layers

Across these projects, the system operates in five primary layers:

1. **Hosting & Lifecycle (The Host)**
2. **Event & Messaging Backbone (The Bus)**
3. **State & Memory Management (The Brain)**
4. **I/O & User Interface (The Senses & Voice)**
5. **Execution & Tooling (The Hands)**

### 2.5 Dependency Injection Graph

```text
Anchor.Cli (Composition Root - Program.cs)
│
├── IEventBus → AsyncEventBus
│
├── IMemoryStore → VectorMemoryStore / SQLiteMemoryStore
├── ISessionManager → ContextAwareSessionManager
│   └── ICompactionStrategy → SemanticCompactionStrategy
│
├── IUserInputDispatcher → ConsoleInputDispatcher
├── ICommandRegistry → DynamicCommandRegistry
│
├── IAgentAvatar (LLM Interface) → AnthropicAvatar / OpenAIAvatar
├── IResponseStreamer → TokenAwareResponseStreamer
│
├── IUiRenderer → RichConsoleRenderer
│   ├── ISpinnerManager → AsyncSpinnerManager
│   └── IStreamingRenderer → ConsoleStreamingRenderer
│
└── IToolRegistry → DynamicToolRegistry
    └── (Injected Tools: FileSystemTool, CmdTool, WebSearchTool)
```
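A small part of the graph above could be wired in the composition root roughly as follows. This is a sketch, not the project's actual `Program.cs`: it assumes the `Microsoft.Extensions.DependencyInjection` NuGet package, the `ITool` interface and the `Names` property are hypothetical, and every class is an empty stand-in that only shows how the registrations resolve.

```csharp
// Requires the Microsoft.Extensions.DependencyInjection NuGet package.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Messaging backbone: one bus instance shared by every consumer.
services.AddSingleton<IEventBus, AsyncEventBus>();

// Tools are registered one by one; the container hands all of them
// to the registry as IEnumerable<ITool>, in registration order.
services.AddSingleton<ITool, FileSystemTool>();
services.AddSingleton<ITool, CmdTool>();
services.AddSingleton<IToolRegistry, DynamicToolRegistry>();

using var provider = services.BuildServiceProvider();
var registry = provider.GetRequiredService<IToolRegistry>();
Console.WriteLine(string.Join(", ", registry.Names)); // FileSystemTool, CmdTool

// --- Stand-in types (hypothetical shapes, for illustration only) ---
public interface IEventBus { }
public sealed class AsyncEventBus : IEventBus { }

public interface ITool { string Name { get; } }
public sealed class FileSystemTool : ITool { public string Name => "FileSystemTool"; }
public sealed class CmdTool : ITool { public string Name => "CmdTool"; }

public interface IToolRegistry { IReadOnlyList<string> Names { get; } }
public sealed class DynamicToolRegistry : IToolRegistry
{
    // The bus is injected only to show a nested dependency being resolved.
    public DynamicToolRegistry(IEventBus bus, IEnumerable<ITool> tools) =>
        Names = tools.Select(t => t.Name).ToList();

    public IReadOnlyList<string> Names { get; }
}
```

Because every edge in the graph is an interface-to-implementation registration, swapping `AnthropicAvatar` for `OpenAIAvatar` (or one renderer for another) would be a one-line change in this file.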

## 3. Component Details

### 3.1 The Messaging Backbone: `IEventBus` and `IMediator` (AOT Safe)
The system utilizes a dual-messaging approach to prevent "event spaghetti":
* **Publish-Subscribe (Events):** Used for things that *happened* and might have multiple or zero listeners (e.g., UI updates, diagnostics).
  * `EventBus.PublishAsync(EventBase @event)`
* **Request-Response (Commands):** Used for linear, required actions with a return value.
  * `Mediator.Send(IRequest<TResponse> request)`

> [!WARNING]
> Standard `MediatR` relies heavily on runtime reflection for handler discovery, making it **incompatible with Native AOT**. We must use an AOT-safe source-generated alternative, such as the [Mediator](https://github.com/martinothamar/Mediator) library, or implement a simple, source-generated Event/Command bus internally.

**Key Events (Pub/Sub):**
* `UserInputReceived`: Triggered when the user hits Enter.
* `LLMStreamDeltaReceived`: Emitted for token-by-token streaming to the UI.
* `ToolExecutionStarted` / `ToolExecutionCompleted`: Emitted for UI spinners and logging.
* `ContextLimitWarning`: High token usage indicator.

**Key Commands (Request/Response):**
* `ExecuteToolCommand`: Sent from the Avatar to the Tool Runner, returns a `ToolResult`.
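The difference between the two channels is easier to see side by side. The following is a minimal hand-rolled sketch, not the source-generated bus the warning above calls for: handlers are registered explicitly in dictionaries keyed by type, so no assembly scanning is involved. `SimpleMediator`, `Subscribe`, `Register`, and the record shapes are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var bus = new AsyncEventBus();
var mediator = new SimpleMediator();

// Pub/sub: zero or more listeners; the publisher gets nothing back.
bus.Subscribe<ToolExecutionStarted>((e, ct) =>
{
    Console.WriteLine($"[spinner] {e.ToolName} started");
    return Task.CompletedTask;
});

// Request/response: exactly one handler, and the caller awaits its result.
mediator.Register<ExecuteToolCommand, ToolResult>((cmd, ct) =>
    Task.FromResult(new ToolResult($"ran {cmd.ToolName}")));

await bus.PublishAsync(new ToolExecutionStarted("FileSystemTool"));
var result = await mediator.Send(new ExecuteToolCommand("FileSystemTool"));
Console.WriteLine(result.Output);

public abstract record EventBase;
public sealed record ToolExecutionStarted(string ToolName) : EventBase;
public interface IRequest<TResponse> { }
public sealed record ToolResult(string Output);
public sealed record ExecuteToolCommand(string ToolName) : IRequest<ToolResult>;

public sealed class AsyncEventBus
{
    private readonly Dictionary<Type, List<Func<EventBase, CancellationToken, Task>>> _handlers = new();

    public void Subscribe<TEvent>(Func<TEvent, CancellationToken, Task> handler) where TEvent : EventBase
    {
        if (!_handlers.TryGetValue(typeof(TEvent), out var list))
            _handlers[typeof(TEvent)] = list = new();
        list.Add((e, ct) => handler((TEvent)e, ct));
    }

    public async Task PublishAsync(EventBase @event, CancellationToken ct = default)
    {
        // Having no listeners is a valid outcome for an event.
        if (!_handlers.TryGetValue(@event.GetType(), out var list)) return;
        foreach (var handler in list) await handler(@event, ct);
    }
}

public sealed class SimpleMediator
{
    private readonly Dictionary<Type, Func<object, CancellationToken, Task<object>>> _handlers = new();

    public void Register<TRequest, TResponse>(Func<TRequest, CancellationToken, Task<TResponse>> handler)
        where TRequest : IRequest<TResponse>
        where TResponse : notnull
        => _handlers[typeof(TRequest)] = async (r, ct) => await handler((TRequest)r, ct);

    public async Task<TResponse> Send<TResponse>(IRequest<TResponse> request, CancellationToken ct = default)
        where TResponse : notnull
    {
        // A command without a handler is a wiring bug, so fail loudly.
        if (!_handlers.TryGetValue(request.GetType(), out var handler))
            throw new InvalidOperationException($"No handler for {request.GetType().Name}");
        object boxed = await handler(request, ct);
        return (TResponse)boxed;
    }
}
```

The asymmetry is deliberate: publishing to nobody is silent, while sending a command nobody handles throws.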

### 3.2 The Brain: `ISessionManager` & Memory
Instead of just a simple list of messages, the new system uses a multi-tiered memory architecture with thread-safe access.

* **Short-Term Memory (Context Window):** The active conversation. Must yield **Immutable Context Snapshots** to prevent collection modification exceptions when tools/LLM run concurrently with background tasks.
* **Long-Term Memory (Vector DB):** Indexed facts, summaries, and user preferences.
* **ICompactionStrategy:**
  Instead of implicitly using an LLM on the critical path, the system uses tiered, deterministic strategies:
  1. **Sliding Window:** Automatically drop the oldest user/assistant message pairs.
  2. **Tool Output Truncation:** Remove large file reads from old turns.
  3. **LLM Summarization (Optional):** As a last resort, explicitly lock state and summarize old context into a "Context Digest".
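The snapshot and the first two compaction tiers can be sketched with `System.Collections.Immutable`. This is an illustration under simplifying assumptions: the `Message` record, the class names, and the thresholds are hypothetical; the sliding window drops single messages rather than user/assistant pairs; and the optional LLM summarization tier is left out.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

var session = new SessionManager();
session.Append(new Message("user", "read main.py"));
session.Append(new Message("tool", new string('x', 5000)));
session.Append(new Message("assistant", "done"));
session.Append(new Message("user", "now fix it"));

var before = session.Snapshot(); // stays valid even while the session changes
session.Compact(new SlidingWindow(maxMessages: 3), new ToolOutputTruncation(maxChars: 200));
var after = session.Snapshot();

Console.WriteLine($"{before.Count} -> {after.Count} messages"); // 4 -> 3 messages

public sealed record Message(string Role, string Content);

public interface ICompactionStrategy
{
    ImmutableList<Message> Apply(ImmutableList<Message> context);
}

// Tier 1: keep only the newest N messages.
public sealed class SlidingWindow : ICompactionStrategy
{
    private readonly int _max;
    public SlidingWindow(int maxMessages) => _max = maxMessages;

    public ImmutableList<Message> Apply(ImmutableList<Message> context) =>
        context.Count <= _max ? context : context.RemoveRange(0, context.Count - _max);
}

// Tier 2: shorten large tool outputs, except in the most recent message.
public sealed class ToolOutputTruncation : ICompactionStrategy
{
    private readonly int _maxChars;
    public ToolOutputTruncation(int maxChars) => _maxChars = maxChars;

    public ImmutableList<Message> Apply(ImmutableList<Message> context) =>
        context.Select((m, i) =>
            m.Role == "tool" && i < context.Count - 1 && m.Content.Length > _maxChars
                ? m with { Content = m.Content[.._maxChars] + " [truncated]" }
                : m).ToImmutableList();
}

public sealed class SessionManager
{
    private readonly object _gate = new();
    private ImmutableList<Message> _context = ImmutableList<Message>.Empty;

    public void Append(Message m) { lock (_gate) _context = _context.Add(m); }

    // Returning the immutable list itself is the snapshot: no copy is needed.
    public ImmutableList<Message> Snapshot() { lock (_gate) return _context; }

    public void Compact(params ICompactionStrategy[] tiers)
    {
        lock (_gate)
            foreach (var tier in tiers) _context = tier.Apply(_context);
    }
}
```

A reader holding `before` can keep enumerating it during compaction, which is exactly the collection-modification hazard the immutable snapshots are meant to remove.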

### 3.3 The Senses & Voice: Event-Driven CLI UI
The UI is strictly separated from business logic, which suits a dedicated CLI tool. The `RichConsoleRenderer` only listens to the `IEventBus`.

* **Input Loop:** `IUserInputDispatcher` sits in a loop reading stdin. When input is received, it fires `UserInputReceived`. It captures `Ctrl+C` to trigger a global `CancellationToken`.
* **Output Loop:** `IUiRenderer` subscribes to `LLMStreamDeltaReceived` and renders tokens. It also subscribes to `ToolExecutionStarted` and starts a dedicated UI spinner, so that asynchronous console output does not overwrite the active prompt.
* **Headless CLI Mode:** For CI/CD environments or scripting, the system can run non-interactively by swapping the `RichConsoleRenderer` for a `BasicLoggingRenderer`; the core agent logic remains untouched.
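Two of the points above, `Ctrl+C` capture and the headless renderer swap, fit in a short sketch. Event-bus subscription is omitted here: the `OnDelta` and `OnTurnCompleted` methods stand in for event handlers, and both they and the `--headless` flag are hypothetical names chosen for the example.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;

var cts = new CancellationTokenSource();

// Ctrl+C cancels the shared token instead of killing the process outright.
Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true; // keep the process alive so components can wind down
    cts.Cancel();
};

// Headless mode swaps the renderer only; "--headless" is a hypothetical flag.
bool headless = args.Contains("--headless");
IUiRenderer renderer = headless
    ? new BasicLoggingRenderer(Console.Out)
    : new RichConsoleRenderer(Console.Out);

foreach (var token in new[] { "Hello", ", ", "world" })
{
    if (cts.IsCancellationRequested) break; // stop rendering once cancelled
    renderer.OnDelta(token);
}
renderer.OnTurnCompleted();

public interface IUiRenderer
{
    void OnDelta(string token);
    void OnTurnCompleted();
}

// Interactive: write each token as soon as it arrives.
public sealed class RichConsoleRenderer : IUiRenderer
{
    private readonly TextWriter _out;
    public RichConsoleRenderer(TextWriter output) => _out = output;
    public void OnDelta(string token) => _out.Write(token);
    public void OnTurnCompleted() => _out.WriteLine();
}

// Headless: buffer the tokens and emit one log line per turn.
public sealed class BasicLoggingRenderer : IUiRenderer
{
    private readonly TextWriter _out;
    private readonly StringBuilder _buffer = new();
    public BasicLoggingRenderer(TextWriter output) => _out = output;
    public void OnDelta(string token) => _buffer.Append(token);
    public void OnTurnCompleted()
    {
        _out.WriteLine($"[assistant] {_buffer}");
        _buffer.Clear();
    }
}
```

Taking a `TextWriter` rather than writing to `Console` directly also keeps both renderers testable with a `StringWriter`.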

### 3.4 The Hands: Plugins and Tooling
Tools are no longer hardcoded.

* **IToolRegistry:** Collects tools at startup. The obvious mechanisms, reflection or assembly scanning, conflict with the Native AOT constraints in Section 5, so tools are registered explicitly in the composition root or through a source generator.
* **Tool Execution:** When the LLM API stops with a tool-call reason (`tool_calls` in OpenAI's API, `tool_use` in Anthropic's), the `IAgentAvatar` sends one `ExecuteToolCommand` per call via the Mediator, either sequentially or concurrently. It awaits the results directly, appends them to the context, and resumes the LLM generation. This provides explicit, traceable control flow.
* **Cancellation:** Every async method across the entire system accepts a `CancellationToken` to allow graceful termination of infinite loops or runaway processes.
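Explicit registration and cancellation can be combined in one small sketch. The `ITool` shape, the `Find` method, the static `ToolRunner.RunAsync` helper, and the two sample tools are assumptions for illustration; the document does not define these signatures.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Explicit registration: no reflection, so the trimmer can see every tool.
var registry = new ToolRegistry();
registry.Register(new EchoTool());
registry.Register(new SlowTool());

// The token fires after 100 ms, standing in for Ctrl+C.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));

string echoed = await ToolRunner.RunAsync(registry, "echo", "hi", cts.Token);
string slow = await ToolRunner.RunAsync(registry, "slow", "", cts.Token);
string missing = await ToolRunner.RunAsync(registry, "nope", "", CancellationToken.None);

Console.WriteLine(echoed);  // hi
Console.WriteLine(slow);    // cancelled: slow
Console.WriteLine(missing); // error: unknown tool 'nope'

public interface ITool
{
    string Name { get; }
    Task<string> ExecuteAsync(string input, CancellationToken ct);
}

public sealed class EchoTool : ITool
{
    public string Name => "echo";
    public Task<string> ExecuteAsync(string input, CancellationToken ct) => Task.FromResult(input);
}

public sealed class SlowTool : ITool
{
    public string Name => "slow";
    public async Task<string> ExecuteAsync(string input, CancellationToken ct)
    {
        await Task.Delay(TimeSpan.FromSeconds(30), ct); // stands in for a runaway process
        return "finished";
    }
}

public sealed class ToolRegistry
{
    private readonly Dictionary<string, ITool> _tools = new(StringComparer.OrdinalIgnoreCase);
    public void Register(ITool tool) => _tools[tool.Name] = tool;
    public ITool? Find(string name) => _tools.TryGetValue(name, out var tool) ? tool : null;
}

public static class ToolRunner
{
    public static async Task<string> RunAsync(
        ToolRegistry registry, string name, string input, CancellationToken ct)
    {
        var tool = registry.Find(name);
        if (tool is null) return $"error: unknown tool '{name}'";
        try
        {
            return await tool.ExecuteAsync(input, ct);
        }
        catch (OperationCanceledException)
        {
            // Report cancellation as a result so the turn can end cleanly.
            return $"cancelled: {name}";
        }
    }
}
```

Returning errors and cancellations as ordinary results, rather than letting exceptions escape, means the Avatar always has something to feed back to the LLM.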

## 4. Execution Flow (Anatomy of a User Turn)

1. **Input:** User types "Find the bug in main.py".
2. **Dispatch:** `ConsoleInputDispatcher` reads it and publishes `UserInputReceived`.
3. **Routing:** A built-in command handler checks whether the input is a structural command (`/clear`, `/exit`). If it is not, the `SessionManager` adds it to the active context.
4. **Inference:** A `ChatCoordinator` service reacts to the updated context and asks the `IAgentAvatar` for a response.
5. **Streaming:** The Avatar calls the Anthropic/OpenAI API. As tokens arrive, it publishes `LLMStreamDeltaReceived`.
6. **Rendering:** `RichConsoleRenderer` receives the deltas and prints them to the terminal.
7. **Tool Request:** The LLM API returns a tool call. The Avatar dispatches an `ExecuteToolCommand` via the Mediator. A `ToolExecutionStarted` event is also published on the Event Bus for the UI spinner.
8. **Execution & Feedback:** `ToolRunner` handles the command, runs it safely with the `CancellationToken`, and returns the result to the Avatar. The Avatar feeds this back to the LLM API automatically.
9. **Completion:** The turn ends. The `SessionManager` checks token bounds and runs compaction if necessary.
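The loop at the centre of this flow (steps 4 to 9) can be simulated without a real LLM. In the sketch below, `ScriptedAvatar` and `FakeTools` are hypothetical stand-ins: the avatar first asks for a tool, then answers once a tool result is in the context. Events, streaming, and compaction are left out to keep the control flow visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var context = new List<string> { "user: Find the bug in main.py" };  // steps 1-3
var avatar = new ScriptedAvatar();
using var cts = new CancellationTokenSource();

while (true)
{
    AvatarReply reply = await avatar.NextAsync(context, cts.Token);   // steps 4-5
    if (reply.ToolName is null)
    {
        Console.WriteLine(reply.Text);                                // step 6
        context.Add($"assistant: {reply.Text}");
        break;                                                        // step 9
    }

    // Steps 7-8: run the tool, feed the result back, and ask the model again.
    string result = await FakeTools.RunAsync(reply.ToolName, reply.Argument, cts.Token);
    context.Add($"tool({reply.ToolName}): {result}");
}

public sealed record AvatarReply(string? Text, string? ToolName = null, string Argument = "");

// Stand-in for a real LLM client: the replies are scripted, not generated.
public sealed class ScriptedAvatar
{
    public Task<AvatarReply> NextAsync(IReadOnlyList<string> context, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        bool sawToolResult = context.Any(line => line.StartsWith("tool("));
        return Task.FromResult(sawToolResult
            ? new AvatarReply("(scripted final answer)")
            : new AvatarReply(null, "read_file", "main.py"));
    }
}

// Stand-in for the ToolRunner: returns a placeholder instead of reading a file.
public static class FakeTools
{
    public static Task<string> RunAsync(string tool, string argument, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        return Task.FromResult($"<contents of {argument}>");
    }
}
```

After the loop, `context` holds three entries (user, tool result, assistant), which is the point at which step 9 would check token bounds.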

## 5. Conclusion (Native AOT Focus)
While `ARCHITECTURE_REFACTOR.md` focuses on migrating a legacy "God Class", this new design assumes a green-field, **AOT-first** approach.
To achieve true Native AOT, we must strictly avoid runtime reflection. This means:
1. Using a minimal host builder instead of `Host.CreateDefaultBuilder()`. `CreateSlimBuilder()` belongs to ASP.NET Core's `WebApplication`; for a console host, the closest equivalent in `Microsoft.Extensions.Hosting` is `Host.CreateEmptyApplicationBuilder()` (.NET 8+).
2. Using Source Generators for Dependency Injection setup.
3. Using Source Generators for JSON Serialization (`System.Text.Json.Serialization.JsonSerializableAttribute`).
4. Replacing reflection-heavy libraries like `MediatR` and `Scrutor` with AOT-friendly source-generated alternatives.

By adhering to these constraints, the resulting single-binary Linux executable should start much faster and use noticeably less memory than a standard JIT-compiled .NET application.
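As an illustration of point 3, this is roughly what source-generated JSON serialization looks like with `System.Text.Json`. The `ToolResult` fields and the `AgentJsonContext` name are hypothetical; only `JsonSerializableAttribute`, `JsonSerializerContext`, and the `JsonSerializer` overloads that take generated type info come from the library.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

var original = new ToolResult("FileSystemTool", true, "42 lines read");

// Passing the generated type info avoids the reflection-based serializer path.
string json = JsonSerializer.Serialize(original, AgentJsonContext.Default.ToolResult);
Console.WriteLine(json);

ToolResult? roundTripped = JsonSerializer.Deserialize(json, AgentJsonContext.Default.ToolResult);
Console.WriteLine(roundTripped == original); // records compare by value

// Hypothetical payload shape; the document does not define ToolResult's fields.
public sealed record ToolResult(string ToolName, bool Success, string Output);

// The source generator fills in this partial class at compile time.
[JsonSerializable(typeof(ToolResult))]
internal partial class AgentJsonContext : JsonSerializerContext { }
```

Every type that crosses a JSON boundary (tool arguments, API payloads, persisted sessions) would need its own `[JsonSerializable]` entry on the context.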