Advanced AI Agent CLI System Design
This document outlines the architecture for a new, built-from-scratch AI Agent Command Line Interface (CLI) system, informed by lessons learned from the Anchor CLI refactoring.
1. Core Principles
- Event-Driven UI & Decoupled State: The UI and display layers communicate exclusively through an asynchronous Event Bus.
- Explicit Control Flow: Core agent execution utilizes a Mediator pattern (Request/Response) for predictable, traceable control flow rather than pure event spaghetti.
- Dependency Injection: A robust IoC container manages lifecycles and dependencies.
- Pluggable Architecture: Everything—from the LLM provider to the UI renderer and memory storage—is an injectable plugin.
- Stateless Components: Services maintain minimal internal state. State is managed centrally in a session or context store with immutable snapshots.
- Test-First Design: Complete absence of static delegates and global mutable state ensures every component is unit-testable in isolation.
- Pervasive Cancellation: Every asynchronous operation accepts a `CancellationToken` for graceful termination.
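The pervasive-cancellation principle can be sketched as follows. The `ToolResult` shape and the method body are illustrative assumptions; only the principle (every async signature carries a `CancellationToken`) comes from this document:

```csharp
// Sketch: every async operation takes a CancellationToken and forwards it.
// ToolCall/ToolResult are hypothetical payload types for illustration.
public sealed record ToolCall(string Id, string Name, string Arguments);
public sealed record ToolResult(string CallId, string Output);

public sealed class CmdTool
{
    public async Task<ToolResult> ExecuteAsync(ToolCall call, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();  // bail out early if already cancelled
        await Task.Delay(100, ct);          // every awaited call forwards the token
        return new ToolResult(call.Id, "done");
    }
}
```

Because the token flows through every await, a single `Ctrl+C` can unwind an in-flight LLM call, a running tool, and the render loop together.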
2. High-Level Architecture & Project Structure (AOT-Ready)
The system is structurally divided into three distinct C# projects to enforce decoupling, testability, and future-proof design, while maintaining strict compatibility with .NET Native AOT compilation for single-file, zero-dependency distribution on Linux/Windows.
2.1 Project: Anchor.AgentFramework (Class Library)
The core logic and abstractions. It has no knowledge of the console, the file system, or specific LLM SDKs.
- Contains: Interfaces (`IEventBus`, `IMediator`, `IAgentAvatar`), Memory Management (`ISessionManager`), Execution Loop (`ChatCoordinator`), and the `ToolRunner`.
- Responsibilities: Orchestrating the agent's thought process, managing state, and firing events.
2.2 Project: Anchor.Providers (Class Library)
The vendor-specific implementations for Language Models.
- Contains: `OpenAIAvatar`, `AnthropicAvatar`.
- Responsibilities: Translating the framework's semantic requests into vendor-specific API calls (e.g., mapping `ToolResult` to OpenAI's tool response format) via SDKs such as `Azure.AI.OpenAI`.
2.3 Project: Anchor.Cli (Console Application)
The "Hosting Shell" and the physical "Senses/Hands" of the application.
- Contains: `Program.cs` (Composition Root), `RichConsoleRenderer`, `ConsoleInputDispatcher`, and concrete Tool implementations (e.g., `FileSystemTool`, `CmdTool`).
- Responsibilities: Wiring up Dependency Injection, reading from stdin, rendering UI/spinners to stdout, and executing side effects on the host OS.
2.4 Logical Layers
Across these projects, the system operates in five primary layers:
- Hosting & Lifecycle (The Host)
- Event & Messaging Backbone (The Bus)
- State & Memory Management (The Brain)
- I/O & User Interface (The Senses & Voice)
- Execution & Tooling (The Hands)
2.5 Dependency Injection Graph
Anchor.Cli (Composition Root - Program.cs)
│
├── IEventBus → AsyncEventBus
│
├── IMemoryStore → VectorMemoryStore / SQLiteMemoryStore
├── ISessionManager → ContextAwareSessionManager
│ └── ICompactionStrategy → SemanticCompactionStrategy
│
├── IUserInputDispatcher → ConsoleInputDispatcher
├── ICommandRegistry → DynamicCommandRegistry
│
├── IAgentAvatar (LLM Interface) → AnthropicAvatar / OpenAIAvatar
├── IResponseStreamer → TokenAwareResponseStreamer
│
├── IUiRenderer → RichConsoleRenderer
│ ├── ISpinnerManager → AsyncSpinnerManager
│ └── IStreamingRenderer → ConsoleStreamingRenderer
│
└── IToolRegistry → DynamicToolRegistry
└── (Injected Tools: FileSystemTool, CmdTool, WebSearchTool)
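The graph above could be wired at the composition root roughly as follows, assuming the `Microsoft.Extensions.DependencyInjection` container. The registrations mirror the type names in the graph; an `ITool` marker interface is an assumption added so the registry can collect tools:

```csharp
// Sketch of the composition root in Program.cs. Concrete types are those
// named in the dependency graph; this is illustrative wiring, not final code.
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

services.AddSingleton<IEventBus, AsyncEventBus>();
services.AddSingleton<IMemoryStore, SQLiteMemoryStore>();
services.AddSingleton<ISessionManager, ContextAwareSessionManager>();
services.AddSingleton<ICompactionStrategy, SemanticCompactionStrategy>();
services.AddSingleton<IUserInputDispatcher, ConsoleInputDispatcher>();
services.AddSingleton<IAgentAvatar, AnthropicAvatar>();
services.AddSingleton<IUiRenderer, RichConsoleRenderer>();
services.AddSingleton<IToolRegistry, DynamicToolRegistry>();

// Tools are registered individually; the registry resolves all ITool instances.
services.AddSingleton<ITool, FileSystemTool>();
services.AddSingleton<ITool, CmdTool>();

await using var provider = services.BuildServiceProvider();
```

Keeping every registration in one place makes the headless-mode swap (Section 3.3) a one-line change: replace `RichConsoleRenderer` with `BasicLoggingRenderer`.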
3. Component Details
3.1 The Messaging Backbone: IEventBus and IMediator (AOT Safe)
The system utilizes a dual-messaging approach to prevent "event spaghetti":
- Publish-Subscribe (Events): Used for things that have happened and may have multiple or zero listeners (e.g., UI updates, diagnostics): `EventBus.PublishAsync(EventBase @event)`
- Request-Response (Commands): Used for linear, required actions with a return value: `Mediator.Send(IRequest<TResponse> request)`
Warning: Standard `MediatR` relies heavily on runtime reflection for handler discovery, making it incompatible with Native AOT. We must use an AOT-safe, source-generated alternative, such as the `Mediator` library, or implement a simple source-generated event/command bus internally.
Key Events (Pub/Sub):
- `UserInputReceived`: Triggered when the user hits Enter.
- `LLMStreamDeltaReceived`: Emitted for token-by-token streaming to the UI.
- `ToolExecutionStarted` / `ToolExecutionCompleted`: Emitted for UI spinners and logging.
- `ContextLimitWarning`: Emitted when token usage approaches the context limit.
Key Commands (Request/Response):
- `ExecuteToolCommand`: Sent from the Avatar to the Tool Runner; returns a `ToolResult`.
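A minimal sketch of the two messaging abstractions, consistent with the signatures quoted above. The exact interface shapes (the `Subscribe` method, the generic constraints) are assumptions, not a finalized API:

```csharp
// Sketch of the dual messaging backbone: pub/sub for notifications,
// request/response for required actions with a result.
public abstract record EventBase;
public sealed record ToolExecutionStarted(string ToolName) : EventBase;

public interface IEventBus
{
    Task PublishAsync(EventBase @event, CancellationToken ct = default);
    IDisposable Subscribe<TEvent>(Func<TEvent, Task> handler) where TEvent : EventBase;
}

// Marker interface tying a request to its response type.
public interface IRequest<TResponse> { }

public interface IMediator
{
    Task<TResponse> Send<TResponse>(IRequest<TResponse> request, CancellationToken ct = default);
}

public sealed record ToolResult(string CallId, string Output);
public sealed record ExecuteToolCommand(string ToolName, string Arguments) : IRequest<ToolResult>;
```

The split keeps the control flow legible: anything that must return a value goes through `IMediator.Send`; anything that is merely observed (spinners, logs) goes through `IEventBus.PublishAsync`.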
3.2 The Brain: ISessionManager & Memory
Instead of just a simple list of messages, the new system uses a multi-tiered memory architecture with thread-safe access.
- Short-Term Memory (Context Window): The active conversation. Must yield Immutable Context Snapshots to prevent collection modification exceptions when tools/LLM run concurrently with background tasks.
- Long-Term Memory (Vector DB): Indexed facts, summaries, and user preferences.
- `ICompactionStrategy`:
Instead of implicitly using an LLM on the critical path, the system uses tiered, deterministic strategies:
- Sliding Window: Automatically drop the oldest user/assistant message pairs.
- Tool Output Truncation: Remove large file reads from old turns.
- LLM Summarization (Optional): As a last resort, explicitly lock state and summarize old context into a "Context Digest".
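The deterministic tiers above can be sketched as an `ICompactionStrategy` implementation. The interface shape and the `ChatMessage` type are assumptions for illustration; only the sliding-window behavior is described by this document:

```csharp
// Sketch of the sliding-window compaction tier: drop the oldest
// user/assistant pair until the context fits the token budget, always
// preserving the system prompt at index 0.
using System.Linq;

public sealed record ChatMessage(string Role, string Content, int TokenCount);

public interface ICompactionStrategy
{
    IReadOnlyList<ChatMessage> Compact(IReadOnlyList<ChatMessage> context, int tokenBudget);
}

public sealed class SlidingWindowStrategy : ICompactionStrategy
{
    public IReadOnlyList<ChatMessage> Compact(IReadOnlyList<ChatMessage> context, int tokenBudget)
    {
        var kept = new List<ChatMessage>(context);
        // Remove pairs starting at index 1 (index 0 = system prompt), keeping
        // at least the system prompt plus the most recent exchange.
        while (kept.Sum(m => m.TokenCount) > tokenBudget && kept.Count > 3)
            kept.RemoveRange(1, 2);
        return kept;
    }
}
```

Because this tier is pure and deterministic, it can run synchronously on the critical path; only the optional LLM-summarization tier needs the explicit state lock described above.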
3.3 The Senses & Voice: Event-Driven CLI UI
The UI is strictly separated from business logic, which is an ideal architecture for a dedicated CLI tool. The `RichConsoleRenderer` only listens to the `IEventBus`.
- Input Loop: `IUserInputDispatcher` sits in a loop reading stdin. When input is received, it fires `UserInputReceived`. It captures `Ctrl+C` to trigger a global `CancellationToken`.
- Output Loop: `IUiRenderer` subscribes to `LLMStreamDeltaReceived` and renders tokens. It subscribes to `ToolExecutionStarted` and spins up a dedicated UI spinner, preventing async console output from overwriting the active prompt.
- Headless CLI Mode: For CI/CD environments or scripting, the system can run non-interactively by swapping the `RichConsoleRenderer` for a `BasicLoggingRenderer`; the core agent logic remains untouched.
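The renderer's event-only coupling might look like this. The `Subscribe` signature and event members are assumptions consistent with the events listed in Section 3.1:

```csharp
// Sketch: the renderer never calls into agent logic; it only reacts to
// bus events. Event payload members (Delta, ToolName) are illustrative.
public sealed record LLMStreamDeltaReceived(string Delta);
public sealed record ToolExecutionStarted(string ToolName);

public interface IEventBus
{
    IDisposable Subscribe<TEvent>(Func<TEvent, Task> handler);
}

public sealed class RichConsoleRenderer
{
    public RichConsoleRenderer(IEventBus bus)
    {
        bus.Subscribe<LLMStreamDeltaReceived>(e =>
        {
            Console.Write(e.Delta);                        // token-by-token stream
            return Task.CompletedTask;
        });
        bus.Subscribe<ToolExecutionStarted>(e =>
        {
            Console.WriteLine($"… running {e.ToolName}");  // drive a spinner here
            return Task.CompletedTask;
        });
    }
}
```

A `BasicLoggingRenderer` for headless mode would subscribe to the same events and write plain log lines, with zero changes anywhere else in the graph.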
3.4 The Hands: Plugins and Tooling
Tools are no longer hardcoded.
- `IToolRegistry`: Discovers tools at startup. Note that runtime Reflection or Assembly Scanning would conflict with the Native AOT goal (Section 5), so discovery should happen via explicit DI registration or source generation.
- Tool Execution: When the LLM API returns a `tool_calls` stop reason, the `IAgentAvatar` iteratively or concurrently sends an `ExecuteToolCommand` via the Mediator. It directly awaits the results, appends them to the context snapshot, and resumes LLM generation. This provides explicit, traceable control flow.
- Cancellation: Every async method across the entire system accepts a `CancellationToken` to allow graceful termination of infinite loops or runaway processes.
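The explicit tool loop described above can be sketched as a single avatar method. The `LlmResponse`/`ToolCall` shapes and helper members (`CallLlmAsync`, `_context.Append`) are hypothetical; only `ExecuteToolCommand`, `ToolResult`, and the `tool_calls` stop reason come from this document:

```csharp
// Sketch of the avatar's tool loop: call the model, execute any requested
// tools via the Mediator, append results to an immutable snapshot, repeat.
private async Task RunTurnAsync(CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        LlmResponse response = await CallLlmAsync(_context, ct);
        if (response.StopReason != "tool_calls")
            break;                                     // model produced its final answer

        foreach (ToolCall call in response.ToolCalls)
        {
            // Request/response via the Mediator: explicit, traceable control flow.
            ToolResult result = await _mediator.Send(
                new ExecuteToolCommand(call.Name, call.Arguments), ct);
            _context = _context.Append(call, result);  // new immutable snapshot
        }
    }
}
```

Because the loop awaits the Mediator directly instead of round-tripping through the event bus, the tool-calling path stays linear and debuggable; the bus carries only the observational `ToolExecutionStarted`/`ToolExecutionCompleted` events.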
4. Execution Flow (Anatomy of a User Turn)
1. Input: The user types "Find the bug in main.py".
2. Dispatch: `ConsoleInputDispatcher` reads it and publishes `UserInputReceived`.
3. Routing: A built-in command handler checks whether the input is a structural command (`/clear`, `/exit`). Otherwise, the `SessionManager` adds it to the active context.
4. Inference: The `ChatCoordinator` service reacts to the updated context and asks the `IAgentAvatar` for a response.
5. Streaming: The Avatar calls the Anthropic/OpenAI API. As tokens arrive, it publishes `LLMStreamDeltaReceived`.
6. Rendering: `RichConsoleRenderer` receives the deltas and prints them to the terminal.
7. Tool Request: The LLM API returns a tool call. The Avatar dispatches an `ExecuteToolCommand` via the Mediator; the EventBus also publishes a `ToolExecutionStarted` event for the UI spinner.
8. Execution & Feedback: `ToolRunner` handles the command, runs it safely with the `CancellationToken`, and returns the result to the Avatar, which feeds it back to the LLM API automatically.
9. Completion: The turn ends. The `SessionManager` checks token bounds and runs compaction if necessary.
5. Conclusion (Native AOT Focus)
While ARCHITECTURE_REFACTOR.md focuses on migrating a legacy "God Class", this new design assumes a green-field, AOT-first approach.
To achieve true Native AOT, we must strictly avoid runtime reflection. This means:
- Using `CreateSlimBuilder()` instead of `CreateDefaultBuilder()` in `Microsoft.Extensions.Hosting`.
- Using Source Generators for Dependency Injection setup.
- Using Source Generators for JSON serialization (`System.Text.Json.Serialization.JsonSerializableAttribute`).
- Replacing reflection-heavy libraries like `MediatR` and `Scrutor` with AOT-friendly, source-generated alternatives.
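The JSON constraint is worth illustrating, since it is the one most teams hit first. `JsonSerializableAttribute` and `JsonSerializerContext` are real `System.Text.Json` APIs; the `ToolResult` payload type is illustrative:

```csharp
// Sketch: AOT-safe JSON via the System.Text.Json source generator. The
// generator emits serialization code at compile time, so no runtime
// reflection is needed.
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record ToolResult(string CallId, string Output);

// The source generator fills in this partial class at build time.
[JsonSerializable(typeof(ToolResult))]
public partial class AgentJsonContext : JsonSerializerContext { }

public static class JsonDemo
{
    public static string Serialize(ToolResult result) =>
        // Passing the generated type info avoids the reflection-based path.
        JsonSerializer.Serialize(result, AgentJsonContext.Default.ToolResult);
}
```

Every serialized type in the system (events, tool arguments, session snapshots) must be listed in a `JsonSerializable` attribute, which doubles as an explicit inventory of the wire-format surface.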
By adhering to these constraints, the resulting single-binary Linux executable will have near-instant startup time and a dramatically reduced memory footprint compared to a standard JIT-compiled .NET application.