Files
mempalace/website/guide/local-models.md
T
Igor Lins e Silva dfb22f5345 docs: add VitePress documentation site
- 22 content pages across Guide, Concepts, and Reference sections
- Custom indigo/cyan theme with Lucide icons and Mermaid diagrams
- GitHub Actions workflow for GitHub Pages deployment
- Live preview: https://mempalace-docs.netlify.app/
2026-04-09 19:41:08 -03:00

2.0 KiB

Local Models

MemPalace works with any local LLM — Llama, Mistral, or any offline model. Since local models generally don't speak MCP yet, there are two approaches.

Wake-Up Command

Load your world into the model's context:

mempalace wake-up > context.txt
# Paste context.txt into your local model's system prompt

This gives your local model a bounded wake-up context, typically around ~600-900 tokens in the current implementation. It includes:

  • Layer 0: Your identity — who you are, what you work on
  • Layer 1: Top moments from the palace — key decisions, recent work

For project-specific context:

mempalace wake-up --wing driftwood > context.txt

Query on demand, feed results into your prompt:

mempalace search "auth decisions" > results.txt
# Include results.txt in your prompt

Python API

For programmatic integration with your local model pipeline:

from mempalace.searcher import search_memories

results = search_memories(
    "auth decisions",
    palace_path="~/.mempalace/palace",
)

# Format results for your model's context
context = "\n".join(
    f"[{r['wing']}/{r['room']}] {r['text']}"
    for r in results["results"]
)

# Inject into your local model's prompt
prompt = f"Context from memory:\n{context}\n\nUser: What did we decide about auth?"

AAAK for Compression

Use AAAK dialect to compress wake-up context further:

mempalace compress --wing myapp --dry-run

AAAK is readable by any LLM that reads text — Claude, GPT, Gemini, Llama, Mistral — without a decoder.

Full Offline Stack

The core memory stack can run offline:

  • ChromaDB on your machine — vector storage and search
  • Local model on your machine — reasoning and responses
  • AAAK for compression — optional, no cloud dependency
  • Optional reranking or external model integrations may introduce cloud calls, depending on how you configure the system