4631d6a7db
The init step's output was a dead file. miner.py has always read
`~/.mempalace/known_entities.json` to tag drawer metadata with
recognized names, but nothing ever wrote it — so init's careful
manifest + git + LLM detection work stopped at `<project>/entities.json`
and never reached the path that actually uses it.
Measured delta on a representative prose snippet (eight sentences
mentioning six real people and four real projects):
- Empty registry: 0 entities recognized (multi-word names fail the
frequency threshold; lowercase/hyphenated project names don't match
the CamelCase regex).
- Registry populated by init: 12 entities recognized (all correct, zero
false positives).
Every recognized name becomes a semicolon-separated metadata tag on the
drawer, which ChromaDB uses for entity-filtered search.
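The tagging step can be sketched as follows. This is a minimal illustration, not the miner's actual `_extract_entities_for_metadata`: the function name `entity_tag`, the whole-word matching rule, and the sample project name `mempalace-core` are all assumptions made for the example.

```python
import re


def entity_tag(text, known_names):
    """Return a semicolon-separated tag of the known names found in text.

    Hypothetical helper: the real miner may match names differently.
    """
    found = []
    for name in known_names:
        # Whole-word, case-insensitive match. Escaping the name lets
        # multi-word and hyphenated names match literally.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return ";".join(found)


if __name__ == "__main__":
    registry_names = ["Arturo Domínguez", "mempalace-core", "Bob"]
    snippet = "Arturo Domínguez pushed a fix to mempalace-core."
    print(entity_tag(snippet, registry_names))
```

Because the lookup is driven by the registry rather than by frequency or CamelCase heuristics, multi-word and lowercase/hyphenated names are matched as long as they are registered.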
Implementation:
- `miner.add_to_known_entities({category: [names]})` reads the existing
registry, unions each category (case-insensitively, preserving first-
seen casing), and writes back. The function is tolerant of the two
on-disk shapes miner already supports: list of names, or dict mapping
name → code (dialect-style). In the dict case new names are added as
keys with `None` values so existing codes aren't overwritten.
- Invalidates the in-process mtime cache so same-process callers
(`cmd_init` → `cmd_mine` in one run) see the write immediately.
- Writes with `ensure_ascii=False` so non-ASCII names (Gergő Móricz,
Arturo Domínguez, etc.) stay readable on disk.
- Chmods the file to 0o600, because the registry mirrors PII from the
confirm step (the user's git authors and local paths).
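The behaviour listed above can be sketched roughly as below. This is an illustrative reconstruction from the description, not the code in `miner.py`: the `path` parameter, the `_cache` dictionary standing in for the mtime cache, and the JSON indentation are assumptions.

```python
import json
import os
from pathlib import Path

# Location taken from the description above.
REGISTRY_PATH = Path.home() / ".mempalace" / "known_entities.json"

# Hypothetical stand-in for the miner's in-process mtime cache.
_cache = {"mtime": None, "data": None}


def add_to_known_entities(new_entities, path=REGISTRY_PATH):
    """Union {category: [names]} into the registry file and return the result."""
    path = Path(path)
    try:
        registry = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        registry = {}  # missing or unparseable file: start fresh
    if not isinstance(registry, dict):
        registry = {}  # recover from a non-dict top level

    for category, names in new_entities.items():
        existing = registry.get(category)
        if isinstance(existing, dict):
            # Dict shape (name -> code): add new names with a None code,
            # never overwriting a code that is already there.
            seen = {k.lower() for k in existing}
            for name in names:
                if name.lower() not in seen:
                    existing[name] = None
                    seen.add(name.lower())
        else:
            # List shape: case-insensitive union, first-seen casing wins.
            merged = list(existing) if isinstance(existing, list) else []
            seen = {n.lower() for n in merged if isinstance(n, str)}
            for name in names:
                if name.lower() not in seen:
                    merged.append(name)
                    seen.add(name.lower())
            registry[category] = merged

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(registry, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    os.chmod(path, 0o600)  # owner read/write only; the file holds PII

    # Invalidate the cache so same-process readers reload from disk.
    _cache["mtime"] = None
    _cache["data"] = None
    return registry
```

Categories that are not mentioned in `new_entities` are written back unchanged, which matches the "preservation of untouched categories" case in the tests.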
cmd_init now calls this at the end of the confirm-entities step, after
the per-project `entities.json` is written (which is kept as an audit
trail the user can inspect or hand-edit). The per-project file is still
excluded from mining via `SKIP_FILENAMES` from the earlier fix.
17 new tests cover: fresh-file creation, list-category union, case-
insensitive dedup, preservation of untouched categories, dict-format
registries, malformed/non-dict file recovery, cache invalidation,
unicode round-trip, and an end-to-end verification that the miner's
`_extract_entities_for_metadata` picks up every registered name.
mempalace/ — Core Package
The Python package that powers MemPalace. All modules, all logic.
Modules
| Module | What it does |
|---|---|
| `cli.py` | CLI entry point — routes to mine, search, init, compress, wake-up |
| `config.py` | Configuration loading — ~/.mempalace/config.json, env vars, defaults |
| `normalize.py` | Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format |
| `miner.py` | Project file ingest — scans directories, chunks by paragraph, stores to ChromaDB |
| `convo_miner.py` | Conversation ingest — chunks by exchange pair (Q+A), detects rooms from content |
| `searcher.py` | Semantic search via ChromaDB vectors — filters by wing/room, returns verbatim + scores |
| `layers.py` | 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search) |
| `dialect.py` | AAAK compression — entity codes, emotion markers, 30x lossless ratio |
| `knowledge_graph.py` | Temporal entity-relationship graph — SQLite, time-filtered queries, fact invalidation |
| `palace_graph.py` | Room-based navigation graph — BFS traversal, tunnel detection across wings |
| `mcp_server.py` | MCP server — 19 tools, AAAK auto-teach, Palace Protocol, agent diary |
| `onboarding.py` | Guided first-run setup — asks about people/projects, generates AAAK bootstrap + wing config |
| `entity_registry.py` | Entity code registry — maps names to AAAK codes, handles ambiguous names |
| `entity_detector.py` | Auto-detect people and projects from file content |
| `general_extractor.py` | Classifies text into 5 memory types (decision, preference, milestone, problem, emotional) |
| `room_detector_local.py` | Maps folders to room names using 70+ patterns — no API |
| `spellcheck.py` | Name-aware spellcheck — won't "correct" proper nouns in your entity registry |
| `split_mega_files.py` | Splits concatenated transcript files into per-session files |
Architecture
    User → CLI → miner/convo_miner → ChromaDB (palace)
                                         ↕
                              knowledge_graph (SQLite)
                                         ↕
    User → MCP Server → searcher → results
                      → kg_query → entity facts
                      → diary → agent journal
The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool.
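To make the "structured relationships" side concrete, here is a minimal sketch of a temporal fact store in SQLite with time-filtered queries and fact invalidation. The schema, the function names (`add_fact`, `invalidate_fact`, `facts_as_of`), and the sample data are hypothetical; the actual `knowledge_graph.py` may be organized quite differently.

```python
import sqlite3


def open_graph(db_path=":memory:"):
    """Create a connection with a hypothetical temporal facts table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS facts (
               subject    TEXT NOT NULL,
               predicate  TEXT NOT NULL,
               object     TEXT NOT NULL,
               valid_from TEXT NOT NULL,  -- ISO date, sorts lexicographically
               valid_to   TEXT            -- NULL while the fact still holds
           )"""
    )
    return conn


def add_fact(conn, subject, predicate, obj, valid_from):
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, valid_from),
    )


def invalidate_fact(conn, subject, predicate, obj, ended):
    """Close an open fact instead of deleting it, so history is kept."""
    conn.execute(
        "UPDATE facts SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, subject, predicate, obj),
    )


def facts_as_of(conn, subject, as_of):
    """Return (predicate, object) pairs that held for subject on a given date."""
    rows = conn.execute(
        "SELECT predicate, object FROM facts "
        "WHERE subject = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY valid_from",
        (subject, as_of, as_of),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_graph()
    add_fact(conn, "Ada", "works_on", "project-a", "2024-01-01")
    invalidate_fact(conn, "Ada", "works_on", "project-a", "2024-06-01")
    add_fact(conn, "Ada", "works_on", "project-b", "2024-06-01")
    print(facts_as_of(conn, "Ada", "2024-03-01"))
    print(facts_as_of(conn, "Ada", "2024-07-01"))
```

Marking a fact as ended rather than deleting it is what allows a query for an earlier date to still return what was true at that time.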