# CLAUDE.md

## The Mission

Memory is identity. When an AI forgets everything between conversations, it cannot build real understanding — of you, your work, your people, your life.

MemPalace exists to solve this. It is a memory system — not a search engine, not a RAG pipeline, not a vector database wrapper. It treats every word you have shared as sacred, stores it verbatim, and makes it instantly available. Your data never leaves your machine. We never summarize. We never paraphrase. We return your exact words.

100% recall is the design requirement — the target every search path is measured against. Anything less means forgetting, and forgetting means starting over.

The name comes from the ancient "method of loci" — the memory palace technique used for thousands of years to organize and recall vast amounts of information by placing it in imagined rooms of an imagined building. We were also inspired by the Zettelkasten method (created by German sociologist Niklas Luhmann) — small cross-referenced index cards that point to each other. We apply both ideas to AI memory:

- **Wings** for broad categories (people, projects, topics)
- **Rooms** for time-based groupings (days, sessions)
- **Drawers** for full verbatim content (your exact words)
- **AAAK compression** for the index layer — a compact symbolic format (via `dialect.py`) that lets an LLM scan thousands of entries instantly and know exactly which drawer to open
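
The hierarchy above can be pictured as nested containers. The following is a minimal sketch, not MemPalace's actual data model: the class names, the `file` method, and the `index_entries` helper are hypothetical, and the pointer strings are a plain-text stand-in for the index layer rather than the real AAAK format produced by `dialect.py`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Drawer:
    """One verbatim chunk; the text is kept exactly as the user wrote it."""
    drawer_id: str
    text: str


@dataclass
class Room:
    """A time-based grouping (a day or a session) inside a wing."""
    name: str
    drawers: list[Drawer] = field(default_factory=list)


@dataclass
class Wing:
    """A broad category: a person, a project, or a topic."""
    name: str
    rooms: dict[str, Room] = field(default_factory=dict)

    def file(self, room_name: str, drawer: Drawer) -> None:
        """Append a drawer to a room, creating the room on first use."""
        self.rooms.setdefault(room_name, Room(room_name)).drawers.append(drawer)


def index_entries(wing: Wing) -> list[str]:
    """Build compact pointer lines (a stand-in for AAAK, not the real dialect)."""
    return [
        f"{wing.name}/{room.name}/{d.drawer_id}"
        for room in wing.rooms.values()
        for d in room.drawers
    ]


wing = Wing("project-apollo")
wing.file("2026-04-12", Drawer("d1", "Ship the parser on Friday."))
wing.file("2026-04-12", Drawer("d2", "Actually, make it Monday."))
print(index_entries(wing))
# ['project-apollo/2026-04-12/d1', 'project-apollo/2026-04-12/d2']
```

The point of the split is that a reader of the index only sees short pointers, while the drawer text itself is never rewritten.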

## Design Principles

These are non-negotiable. Every PR, every feature, every refactor must honor them.

- **Verbatim always** — Never summarize, paraphrase, or lossy-compress user data. The system searches the index and returns the original words. If a user said it, we store exactly what they said. This is the foundational promise.
- **Incremental only** — Append-only ingest after initial build. Never destroy existing data to rebuild. A crash mid-operation must leave the existing palace untouched.
- **Entity-first** — Everything is keyed by real names with disambiguation by DOB, ID, or context. People matter more than topics.
- **Local-first, zero external API by default** — All extraction, chunking, embedding, and LLM-assisted refinement happen on the user's machine by default, using locally-hosted runtimes (Ollama, LM Studio, llama.cpp, vLLM, Unsloth Studio, etc.). External providers (Anthropic, OpenAI, Google) are supported via BYOK but are never required and never enabled silently. The system never sends user content to a service the user has not explicitly configured. "Local LLM" is not an external API: Ollama and equivalents running on localhost are part of the user's machine. External BYOK is always a deliberate user choice, never a default and never a silent fallback.
- **Performance budgets** — Hooks under 500ms. Startup injection under 100ms. Memory should feel instant.
- **Privacy by architecture** — In the default configuration, user content has no path off your machine. No telemetry, no phone-home, no external service dependencies for core operations.
- **Background everything** — Filing, indexing, timestamps, and pipeline work happen via hooks in the background. Nothing interrupts the user's conversation. Zero tokens spent on bookkeeping in the chat window.
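
As an illustration of how "Verbatim always" and "Incremental only" can fit together, here is a minimal sketch of a crash-safe, append-only write. It assumes one file per drawer under a plain directory, which is not how MemPalace stores data (the default backend is ChromaDB); `append_drawer`, `read_drawer`, and the content-hash naming are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def append_drawer(palace_dir: Path, text: str) -> Path:
    """Write one verbatim drawer file without touching any existing file."""
    palace_dir.mkdir(parents=True, exist_ok=True)
    data = text.encode("utf-8")
    # Content-addressed name: re-ingesting identical text is a no-op.
    target = palace_dir / (hashlib.sha256(data).hexdigest() + ".txt")
    if target.exists():
        return target
    # Write to a temp file in the same directory, then rename atomically.
    # A crash before os.replace leaves existing drawers untouched.
    fd, tmp_name = tempfile.mkstemp(dir=palace_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_drawer(path: Path) -> str:
    """Return the stored text byte-for-byte as it was ingested."""
    return path.read_bytes().decode("utf-8")


with tempfile.TemporaryDirectory() as d:
    p = append_drawer(Path(d), "Call Dana at 3pm,  not 4.\n")
    assert read_drawer(p) == "Call Dana at 3pm,  not 4.\n"
```

Reading and writing bytes rather than text avoids newline translation, so whitespace and line endings survive unchanged.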

## Contributing

We welcome bug fixes, performance improvements, new language support, better entity disambiguation, documentation, and test coverage.

We do not accept summarization of user content, cloud storage/sync features, telemetry or analytics, features requiring API keys for core memory, or shortcuts that bypass verbatim storage.

## Setup

```bash
uv sync --extra dev   # recommended; or: pip install -e ".[dev]"
```

## Commands

```bash
# Run tests
uv run pytest tests/ -v --ignore=tests/benchmarks

# Run tests with coverage
uv run pytest tests/ -v --ignore=tests/benchmarks --cov=mempalace --cov-report=term-missing

# Lint
uv run ruff check .

# Format
uv run ruff format .

# Format check (CI mode)
uv run ruff format --check .
```

## Project Structure

```
mempalace/
├── mcp_server.py        # MCP server — all read/write tools
├── ingest_server.py     # HTTP transcript-ingest endpoint (server mode only)
├── cli.py               # CLI dispatcher
├── config.py            # Configuration + input validation
├── miner.py             # Project file miner
├── convo_miner.py       # Conversation transcript miner
├── searcher.py          # Semantic search (hybrid BM25 + vector)
├── knowledge_graph.py   # Temporal entity-relationship graph (SQLite)
├── palace.py            # Shared palace operations
├── palace_graph.py      # Room traversal + cross-wing tunnels
├── backends/            # Pluggable storage backends (ChromaDB default)
│   ├── base.py          # Abstract interface — implement this for new backends
│   └── chroma.py        # ChromaDB implementation
├── dialect.py           # AAAK compression dialect
├── normalize.py         # Transcript format detection + normalization
├── entity_detector.py   # Auto-detect people/projects from content
├── entity_registry.py   # Entity storage and disambiguation
├── layers.py            # L0-L3 memory wake-up stack
├── onboarding.py        # Interactive first-run setup
├── repair.py            # Palace repair and consistency checks
├── dedup.py             # Deduplication
├── migrate.py           # ChromaDB version migration
├── spellcheck.py        # Auto-correct user messages
├── exporter.py          # Palace data export
├── hooks_cli.py         # Hook management CLI
├── query_sanitizer.py   # Prompt contamination prevention
├── split_mega_files.py  # Split concatenated transcript files
└── version.py           # Single source of truth for version

hooks/                              # Hook scripts for Claude Code / Codex CLI
├── mempal_save_hook_remote.sh      # Stop: HTTP POST to remote ingest endpoint
└── mempal_precompact_hook_remote.sh  # PreCompact: HTTP POST to remote ingest

deploy/unraid/                      # Containerized server-mode deployment
├── docker-compose.yml              # mempalace + caddy sidecar (auth + TLS)
├── Caddyfile                       # bearer-token auth, SSE-aware reverse proxy
├── mempalace-server.xml            # dockerMan template (no-auth, LAN-trust path)
└── README.md                       # Full install/usage/troubleshooting guide

Dockerfile                          # Builds the server-mode image
.dockerignore                       # Trims build context
```

## Conventions

- **Python style**: snake_case for functions/variables, PascalCase for classes
- **Linter**: ruff with E/F/W rules
- **Formatter**: ruff format, double quotes
- **Commits**: conventional commits (`fix:`, `feat:`, `test:`, `docs:`, `ci:`)
- **Tests**: `tests/test_*.py`, fixtures in `tests/conftest.py`
- **Coverage**: 85% threshold (80% on Windows due to ChromaDB file lock cleanup)

## Architecture

```
User → CLI / MCP Server → Storage Backend (ChromaDB default, pluggable)
                        → SQLite (knowledge graph)

Palace structure:
  WING (person/project)
    └── ROOM (day/topic)
          └── DRAWER (verbatim text chunk)

Index layer (AAAK):
  Compressed pointers → DRAWER locations
  Scanned by LLM to find relevant drawers without reading all content

Knowledge Graph:
  ENTITY → PREDICATE → ENTITY (with valid_from / valid_to dates)
```
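
The knowledge-graph line in the diagram can be made concrete with a small SQLite sketch. The table name, column layout, and `facts_as_of` helper below are assumptions for illustration; only the `valid_from` / `valid_to` idea comes from the diagram, and the real schema in `knowledge_graph.py` may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE triples (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO date, inclusive
        valid_to   TEXT             -- ISO date, exclusive; NULL = still valid
    )
    """
)
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?, ?)",
    [
        ("Alice", "works_at", "Acme", "2023-01-01", "2025-06-01"),
        ("Alice", "works_at", "Globex", "2025-06-01", None),
    ],
)


def facts_as_of(subject: str, predicate: str, day: str) -> list[str]:
    """Return objects whose validity window contains the given ISO date."""
    rows = conn.execute(
        """
        SELECT object FROM triples
        WHERE subject = ? AND predicate = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR ? < valid_to)
        ORDER BY valid_from
        """,
        (subject, predicate, day, day),
    )
    return [r[0] for r in rows]


print(facts_as_of("Alice", "works_at", "2024-03-15"))  # ['Acme']
print(facts_as_of("Alice", "works_at", "2026-01-01"))  # ['Globex']
```

Closing a fact by setting `valid_to` instead of deleting the row keeps history queryable, which matches the append-only principle.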

## Key Files for Common Tasks

- **Adding an MCP tool**: `mempalace/mcp_server.py` — add handler function + TOOLS dict entry
- **Changing search**: `mempalace/searcher.py`
- **Modifying mining**: `mempalace/miner.py` (project files) or `mempalace/convo_miner.py` (transcripts)
- **Adding a storage backend**: subclass `mempalace/backends/base.py`, register in `backends/__init__.py`
- **Input validation**: `mempalace/config.py` — `sanitize_name()` / `sanitize_content()`
- **Server-mode deployment**: `deploy/unraid/` — see [`deploy/unraid/README.md`](deploy/unraid/README.md). Image is built from the repo-root `Dockerfile`. The HTTP transcript-ingest endpoint in `mempalace/ingest_server.py` runs as a daemon thread inside `mempalace-mcp` (single Chroma writer per palace) and is opt-in via `MEMPALACE_INGEST_PORT`.
- **Tests**: mirror source structure in `tests/test_<module>.py`
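
For the first item, "handler function + TOOLS dict entry" follows a common registry pattern. The sketch below shows that pattern in isolation. The entry keys (`description`, `input_schema`, `handler`), the `dispatch` function, and the example tool are all hypothetical; check `mempalace/mcp_server.py` for the shape the real `TOOLS` dict expects.

```python
from typing import Any, Callable

# Hypothetical registry shape; the real TOOLS dict in mcp_server.py may
# use different keys.
TOOLS: dict[str, dict[str, Any]] = {}


def tool_count_words(text: str) -> dict[str, Any]:
    """Hypothetical handler: count whitespace-separated words."""
    return {"words": len(text.split())}


# Registering a tool means adding one entry that points at the handler.
TOOLS["example_count_words"] = {
    "description": "Count words in a piece of text.",
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
    "handler": tool_count_words,
}


def dispatch(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Look up a tool by name and call its handler with keyword arguments."""
    entry = TOOLS.get(name)
    if entry is None:
        return {"error": f"unknown tool: {name}"}
    handler: Callable[..., dict[str, Any]] = entry["handler"]
    return handler(**arguments)


print(dispatch("example_count_words", {"text": "store it verbatim"}))
# {'words': 3}
```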

## Architectural Notes

- **Server mode is opt-in.** The default install path (local CLI + stdio MCP server + local hooks) is unchanged. Server mode adds three things: a `Dockerfile`, an HTTP ingest thread that starts only when `MEMPALACE_INGEST_PORT` is set, and `*_remote.sh` hook variants that POST to that endpoint. Nothing in the local path imports the ingest server.
- **One ChromaDB writer per palace.** ChromaDB's HNSW index isn't safe across processes. The ingest endpoint is a thread inside the existing MCP server process — not a sibling container — so all writes serialize through one Python process and one Chroma client. Anyone adding a second writer (e.g. a sidecar that mines on a schedule) must do it in-process or via `mine_lock`.
- **"Local-first" boundary in server mode.** The mission states that data never leaves the user's machine. A user-controlled Unraid box on the user's LAN still counts as "the user's machine", but the moment it accepts inbound HTTP, that property weakens to "user's machine + anyone with the bearer token + anyone who can MITM the LAN segment." Caddy's `tls internal` plus bearer auth is the floor. Tailscale, mTLS, or a real CA cert are stronger options the user can layer on top.