The A.L.I.C.E. Stack: Rust-Python-SQLite Hybrid Local Intelligence

local-ai · systems-architecture · rust · rag · data-sovereignty · performance

Cloud-based AI assistants require a persistent outbound connection, meaning every query, every note, and every code snippet a user processes is routed through a third-party server. For a tech entrepreneur whose competitive edge lives in their personal knowledge base — proprietary architecture decisions, unreleased product plans, and private code — this is an unacceptable threat surface. A local-first AI engine must simultaneously be resource-light enough to run invisibly in the background, fast enough to be genuinely useful, and intelligent enough to reason over years of accumulated personal context.

Implement a hybrid local architecture using a Rust-based daemon for OS-level orchestration and file-system watching, Python for local LLM inference management, and an embedded SQLite instance augmented with vector extensions as the persistent knowledge graph powering all RAG operations.

Pure Python Daemon with ChromaDB

Pros
  • Single language across the entire stack — lower architectural complexity
  • ChromaDB provides a mature, well-documented vector store out of the box
Cons
  • Python's GIL and runtime overhead make it unsuitable for a continuously running background process
  • ChromaDB runs as a standalone service, adding a persistent memory footprint that competes directly with the host IDE and LLM inference processes

Electron/Node.js Background Agent with In-Memory JSON Index

Pros
  • Cross-platform packaging is trivial
  • Large ecosystem for building the companion UI
Cons
  • The V8 engine and Electron shell introduce hundreds of megabytes of baseline RAM usage before a single line of application logic runs
  • In-memory JSON indexing does not scale across years of notes and code — retrieval degrades catastrophically as the knowledge base grows

Performance and privacy constraints cannot be solved by a single-language stack. Rust is the only viable choice for the always-on orchestration layer: it compiles to a lean native binary, provides deterministic memory management with zero garbage collection pauses, and handles concurrent file-system events across thousands of directories without a measurable CPU spike. Python is then scoped exclusively to what it does best — managing ML inference runtimes and the model abstraction layer for local Llama and Mistral variants. SQLite was selected over any standalone vector database because the entire user knowledge graph lives in a single portable file on disk, requires no separate process, and with the correct vector extension delivers sub-50ms retrieval. The architecture is explicitly decoupled at the FFI boundary: Rust owns the file-system, the event loop, and the orchestration logic; Python owns the model; SQLite owns the memory. No component bleeds into the responsibility of another.
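In code, that separation can be expressed as a pair of narrow interfaces owned by the orchestration layer. The sketch below is illustrative only: the trait and type names (KnowledgeStore, InferenceEngine, Orchestrator) are not taken from the actual codebase, and the prompt assembly is simplified. The point is that each boundary is a single, minimal surface.

```rust
// Hypothetical trait boundaries illustrating the separation of concerns.
// Names are invented for this sketch, not lifted from A.L.I.C.E. itself.

/// SQLite owns the memory: persistence and vector retrieval, nothing else.
trait KnowledgeStore {
    fn upsert_chunk(&mut self, source_path: &str, text: &str, embedding: &[f32]);
    fn nearest_chunks(&self, query_embedding: &[f32], k: usize) -> Vec<String>;
}

/// Python owns the model: prompt in, completion out. No file-system access,
/// no state, no knowledge of what triggered the query.
trait InferenceEngine {
    fn complete(&self, prompt_with_context: &str) -> String;
}

/// Rust owns orchestration: it is the only component that talks to both.
struct Orchestrator<S: KnowledgeStore, E: InferenceEngine> {
    store: S,
    engine: E,
}

impl<S: KnowledgeStore, E: InferenceEngine> Orchestrator<S, E> {
    fn answer(&self, question: &str, query_embedding: &[f32]) -> String {
        // Retrieve context from SQLite, inject it into the prompt, and hand
        // the assembled prompt across the FFI boundary to the model.
        let context = self.store.nearest_chunks(query_embedding, 8).join("\n---\n");
        let prompt = format!("Context:\n{context}\n\nQuestion: {question}");
        self.engine.complete(&prompt)
    }
}
```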

The Context-Switching Problem

Every time a power user alt-tabs to query a cloud AI, they break their flow state, expose proprietary context, and wait on network latency. A.L.I.C.E. was designed around a single axiom: an assistant slower than thought is useless, and one that makes itself visible is a liability. The architecture had to solve both.

The threat isn’t just privacy — it’s friction. A tool that consumes 2GB of RAM or spikes the CPU every time a file is saved is immediately uninstalled. The constraints were treated as hard limits, not targets:

  • Memory ceiling: The entire A.L.I.C.E. stack — daemon, index, and idle inference runtime — must stay under 400MB. The user’s IDE and browser will always be the priority processes.
  • Latency ceiling: RAG retrieval must return in under 50ms. Anything slower breaks the psychological illusion of a ‘second brain’ and reduces the assistant to just another search tool.
  • Privacy floor: Zero bytes leave the machine. There is no fallback cloud endpoint, no telemetry ping, no model API call. The constraint is architectural, not configurable.

Architectural Pillars

1. The Rust Orchestration Core (The Nervous System)

The Rust daemon is the always-on heartbeat of A.L.I.C.E. It hooks directly into OS-level inotify events on Linux to watch the user’s entire file ecosystem — notes, codebases, project specs — in real time. When a .md file is saved or a new function is committed, the daemon triggers an immediate chunking and re-indexing pipeline before the user has lifted their hand from the keyboard.

Rust was non-negotiable here. A Python file-watcher under equivalent load introduces polling delays and GIL contention. The Rust core runs as a lean background binary consuming single-digit megabytes of RAM, leaving headroom for the processes that actually matter.
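A minimal sketch of what that watch loop can look like, assuming the notify crate (which wraps inotify on Linux); the watched path, extension filter, and reindex hook are placeholders, not the production pipeline:

```rust
// Sketch of the daemon's watch loop using the `notify` crate.
// The watched directory and the `reindex` hook are placeholders.
use notify::{recommended_watcher, Event, RecursiveMode, Watcher};
use std::{path::Path, sync::mpsc::channel};

fn main() -> notify::Result<()> {
    let (tx, rx) = channel::<notify::Result<Event>>();
    let mut watcher = recommended_watcher(tx)?;

    // Watch the user's notes and code trees recursively.
    watcher.watch(Path::new("/home/user/notes"), RecursiveMode::Recursive)?;

    for event in rx {
        let event = event?;
        // Re-index only the file types we care about, as soon as they change.
        for path in event.paths {
            if path.extension().is_some_and(|ext| ext == "md" || ext == "rs") {
                reindex(&path); // placeholder: chunk + embed + upsert into SQLite
            }
        }
    }
    Ok(())
}

fn reindex(path: &Path) {
    println!("re-indexing {}", path.display());
}
```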

2. Embedded SQLite Vector Knowledge Graph (The Long-Term Memory)

There is no separate database process. The user’s entire personal knowledge base — every chunked note, every indexed code comment, every archived decision — lives in a single SQLite file on disk, augmented with a vector search extension. This means:

  • Portability: The knowledge graph is a file. It can be backed up with cp.
  • Speed: No inter-process communication overhead. Queries hit the disk directly from the orchestration layer.
  • Scale: SQLite handles multi-gigabyte databases without configuration. Years of accumulated context do not degrade retrieval performance the way in-memory JSON indexing would.
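To make that shape concrete, the sketch below opens the single database file through rusqlite and runs a deliberately naive exhaustive cosine scan in place of the vector extension's ANN index, so the example stays self-contained. The schema, table names, and embedding encoding are illustrative assumptions, not the production layout.

```rust
// Simplified sketch of the single-file knowledge store accessed via rusqlite.
// A real build would query through a SQLite vector extension; the exhaustive
// cosine scan below only stands in for that index for illustration.
use rusqlite::{params, Connection, Result};

fn open_store(path: &str) -> Result<Connection> {
    let conn = Connection::open(path)?; // the entire knowledge graph is this one file
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS chunks (
             id        INTEGER PRIMARY KEY,
             source    TEXT NOT NULL,
             body      TEXT NOT NULL,
             embedding BLOB NOT NULL      -- f32 vector, little-endian bytes
         );",
    )?;
    Ok(conn)
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb).max(f32::EPSILON)
}

/// Return the `k` chunk bodies most similar to the query embedding.
fn nearest_chunks(conn: &Connection, query: &[f32], k: usize) -> Result<Vec<String>> {
    let mut stmt = conn.prepare("SELECT body, embedding FROM chunks")?;
    let mut scored: Vec<(f32, String)> = stmt
        .query_map(params![], |row| {
            let body: String = row.get(0)?;
            let blob: Vec<u8> = row.get(1)?;
            let vec: Vec<f32> = blob
                .chunks_exact(4)
                .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
                .collect();
            Ok((cosine(query, &vec), body))
        })?
        .collect::<Result<_>>()?;
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    Ok(scored.into_iter().take(k).map(|(_, body)| body).collect())
}
```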

The chunking strategy proved to be the most critical tuning surface in the entire system. Semantic chunking — splitting on conceptual boundaries rather than fixed token counts — dramatically outperformed naive approaches. A 7B model working from well-chunked context retrieves and reasons over more relevant material than a 70B model working from poorly-chunked context.
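One way to approximate that strategy for markdown notes is to split on heading boundaries and fold undersized fragments into their neighbours. The sketch below is illustrative rather than the production chunker, and the 200-character merge threshold is arbitrary:

```rust
// Illustrative semantic chunker for markdown: split on heading boundaries
// (conceptual units) instead of fixed token counts, then merge sections that
// are too small to stand alone as retrieval units.
fn chunk_markdown(doc: &str) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();

    for line in doc.lines() {
        let is_heading = line.trim_start().starts_with('#');
        if is_heading && !current.trim().is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.trim().is_empty() {
        chunks.push(current);
    }

    // Fold fragments under ~200 characters into the previous chunk so a lone
    // heading or one-liner never becomes its own retrieval unit.
    let mut merged: Vec<String> = Vec::new();
    for chunk in chunks {
        if chunk.len() < 200 && !merged.is_empty() {
            merged.last_mut().unwrap().push_str(&chunk);
        } else {
            merged.push(chunk);
        }
    }
    merged
}
```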

3. Isolated Python Inference Layer (The Reasoning Engine)

Python is intentionally sandboxed to a single responsibility: managing the local LLM runtime. It exposes a minimal interface over the Rust FFI boundary — receive a prompt with injected RAG context, return a completion. It does not touch the file system. It does not manage state. It does not know what triggered the query.

This isolation is what keeps the architecture stable. The Rust-Python FFI boundary is the most operationally sensitive seam in the system. Strict serialization protocols — all data crossing the boundary is serialized to a defined schema — prevent the subtle type mismatches and encoding errors that can silently corrupt retrieved context before it ever reaches the model.
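The schema crossing that boundary might look something like the following. The field names and the JSON encoding are illustrative assumptions, not the actual wire format; the point is that both sides serialize and validate against one explicit, versioned shape instead of passing loose dictionaries:

```rust
// Illustrative schema for data crossing the Rust-Python boundary. Field names
// are invented for this example; only the idea of a fixed, versioned shape is
// taken from the architecture described above.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct InferenceRequest {
    schema_version: u32,
    prompt: String,
    /// RAG chunks already retrieved and ranked by the Rust side.
    context_chunks: Vec<String>,
    max_tokens: u32,
}

#[derive(Serialize, Deserialize)]
struct InferenceResponse {
    schema_version: u32,
    completion: String,
    tokens_generated: u32,
}

fn main() -> Result<(), serde_json::Error> {
    let request = InferenceRequest {
        schema_version: 1,
        prompt: "Summarise the decision on the storage layer.".to_string(),
        context_chunks: vec!["SQLite was selected because ...".to_string()],
        max_tokens: 512,
    };

    // Serialize to UTF-8 JSON before handing the bytes across the FFI
    // boundary; the Python side deserializes against the same schema.
    let payload = serde_json::to_string(&request)?;
    println!("{payload}");

    // The reply is validated the same way on the way back.
    let reply: InferenceResponse = serde_json::from_str(
        r#"{"schema_version":1,"completion":"...","tokens_generated":42}"#,
    )?;
    println!("{} tokens", reply.tokens_generated);
    Ok(())
}
```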


Results & Impact (Ongoing)

  • Background Footprint: Full stack at idle holds comfortably under 400MB RAM, including the dormant inference runtime. The user’s machine does not notice A.L.I.C.E. is running.
  • RAG Retrieval Latency: Consistently under 50ms from query to ranked context chunks returned — fast enough to inject into a prompt without the user perceiving a delay.
  • Data Sovereignty: 100% offline. No outbound connections. The knowledge base is the user’s property, stored on their hardware, queryable without a network interface.

The Road Ahead

The next phase is Proactive Context Surfacing. Rather than waiting for a user query, A.L.I.C.E. will monitor the active window context — the file open in the IDE, the document being edited — and silently pre-fetch the most semantically relevant notes and past decisions into a warm retrieval buffer. By the time the user formulates a question, the answer is already staged. The goal is an assistant that anticipates rather than reacts — invisible, instantaneous, and entirely sovereign.
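Purely as a conceptual sketch of that direction: the warm buffer could be little more than a map from the active file to its pre-fetched chunks, refreshed on every focus change. The embed and nearest_chunks placeholders below stand in for the embedding and retrieval layers described earlier; nothing here is a committed design.

```rust
// Conceptual sketch only: a warm buffer keyed by the active file, refreshed
// whenever the focused window changes. `embed` and `nearest_chunks` are
// placeholders for the real embedding and retrieval calls.
use std::collections::HashMap;
use std::path::{Path, PathBuf};

struct WarmBuffer {
    staged: HashMap<PathBuf, Vec<String>>, // active file -> pre-fetched chunks
}

impl WarmBuffer {
    fn on_focus_change(&mut self, active_file: PathBuf, file_text: &str) {
        // Pre-fetch while the user is still reading, so the relevant context
        // is already staged before any question is asked.
        let chunks = nearest_chunks(&embed(file_text), 8);
        self.staged.insert(active_file, chunks);
    }

    fn staged_context(&self, active_file: &Path) -> Option<&[String]> {
        self.staged.get(active_file).map(Vec::as_slice)
    }
}

// Placeholders standing in for the embedding and retrieval layers above.
fn embed(_text: &str) -> Vec<f32> { vec![0.0; 384] }
fn nearest_chunks(_query: &[f32], _k: usize) -> Vec<String> { Vec::new() }
```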