Files
mempalace/mempalace
Igor Lins e Silva 6aebf458ff fix(entity): reduce noise in regex-based detection
The pattern-matching detector had several systematic false positives that
crowded the init review with nonsense. Concrete fixes:

- CamelCase extraction: add `[A-Z][a-z]+(?:[A-Z][a-z]+|[A-Z]{2,})+` to
  candidate patterns so `MemPalace`, `ChromaDB`, `OpenAI`, `ChatGPT` are
  visible. Previously `MemPalace` fragmented into `Mem` + `Palace`.
- Dialogue `^NAME:\s` requires >=2 matches to count. A single metadata
  line like `Created: 2026-04-21` was scoring as dialogue and classifying
  `Created` as a person.
- Versioned/hyphenated pattern tightened to `\b{name}[-_]v?\d+(?:\.\d+)*\b`
  (version-only). The previous `\b{name}[-v]\w+` matched `context-manager`,
  `multi-word`, etc. - every hyphenated compound.
- Skip LICENSE/COPYING/NOTICE/AUTHORS/PATENTS files during scan. They
  produce pure-English-prose noise (`Contributor`, `Software`, `Covered`,
  `Before`).
- Extra SKIP_DIRS: `.terraform`, `vendor`, `target`.
- Expand stopword list with capitalized participles/descriptors that
  commonly appear at sentence start: `created`, `updated`, `extracted`,
  `processed`, `total`, `summary`, `auto`, `multi`, `hybrid`, `context`,
  `bridge`, `batch`, `local`, `native`, `never`, `before`, `after`, etc.
- classify_entity: high-pronoun single-category signal now classifies as
  person. A diary's main character gets referenced with pronouns, not
  dialogue markers - requiring two signal categories demoted `Lu` (16
  pronoun hits across 30 mentions) to uncertain. Gate on
  `pronoun_hits >= 5 AND pronoun_hits / frequency >= 0.2` so common
  sentence-start words (`Never`, `Before`) with incidental proximity
  stay uncertain.
2026-04-24 00:20:32 -03:00
..
2026-04-13 18:25:01 -07:00
2026-04-13 18:25:01 -07:00
2026-04-13 18:25:01 -07:00
2026-04-16 10:38:38 +05:00
2026-04-23 16:44:22 -07:00

mempalace/ — Core Package

The Python package that powers MemPalace. All modules, all logic.

Modules

Module What it does
cli.py CLI entry point — routes to mine, search, init, compress, wake-up
config.py Configuration loading — ~/.mempalace/config.json, env vars, defaults
normalize.py Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format
miner.py Project file ingest — scans directories, chunks by paragraph, stores to ChromaDB
convo_miner.py Conversation ingest — chunks by exchange pair (Q+A), detects rooms from content
searcher.py Semantic search via ChromaDB vectors — filters by wing/room, returns verbatim + scores
layers.py 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search)
dialect.py AAAK compression — entity codes, emotion markers, 30x lossless ratio
knowledge_graph.py Temporal entity-relationship graph — SQLite, time-filtered queries, fact invalidation
palace_graph.py Room-based navigation graph — BFS traversal, tunnel detection across wings
mcp_server.py MCP server — 19 tools, AAAK auto-teach, Palace Protocol, agent diary
onboarding.py Guided first-run setup — asks about people/projects, generates AAAK bootstrap + wing config
entity_registry.py Entity code registry — maps names to AAAK codes, handles ambiguous names
entity_detector.py Auto-detect people and projects from file content
general_extractor.py Classifies text into 5 memory types (decision, preference, milestone, problem, emotional)
room_detector_local.py Maps folders to room names using 70+ patterns — no API
spellcheck.py Name-aware spellcheck — won't "correct" proper nouns in your entity registry
split_mega_files.py Splits concatenated transcript files into per-session files

Architecture

User → CLI → miner/convo_miner → ChromaDB (palace)
                                     ↕
                              knowledge_graph (SQLite)
                                     ↕
User → MCP Server → searcher → results
                  → kg_query → entity facts
                  → diary    → agent journal

The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool.