Files

Igor Lins e Silva f895bc58e6 fix(entity_detector): script-aware word boundaries for combining-mark scripts

Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras)
like ा ी ु are Unicode combining marks (ा and ी are category Mc, Mark,
Spacing Combining; ु is Mn, Mark, Nonspacing), and \w matches neither.
This means \b splits mid-word on every matra: names like अनीता (Anita)
truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b
never match because \b fails after the final matra of कहा.

Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script
whose words contain combining marks.
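
The failure is easy to reproduce with the standard re module. The snippet below is a minimal sketch: the name pattern \b([\u0900-\u097F]+)\b is an illustrative stand-in and is not taken from the project's locale files.

```python
import re
import unicodedata

name = "अनीता"  # Anita

# Base letters are category Lo and match \w; the vowel signs are marks and do not.
report = [(unicodedata.category(ch), re.match(r"\w", ch) is not None) for ch in name]

# Illustrative name pattern (an assumption, not one of the project's patterns).
naive_name = re.compile(r"\b([\u0900-\u097F]+)\b")
# The trailing \b cannot match after the final matra, so the regex backtracks
# to the last letter/matra transition and drops the final vowel sign.
truncated = naive_name.search(name).group(1)

# The person-verb pattern quoted above: the closing \b fails after कहा.
naive_verb = re.compile(r"\bराज\s+ने\s+कहा\b")
verb_match = naive_verb.search("राज ने कहा")

print(report)      # category and \w result for each code point
print(truncated)   # अनीत
print(verb_match)  # None
```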

Fix: locales with combining-mark scripts declare a boundary_chars field
in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n
loader replaces every \b in that locale's patterns with a script-aware
lookaround that treats the declared characters as "inside-word", and
pre-wraps candidate/multi_word patterns with the same boundary.

Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru,
it are unchanged.
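
The replacement described above might be sketched as follows. The helper names script_boundary and expand_b only mirror the commit's _script_boundary and _expand_b; their bodies here are assumptions, not the code in mempalace/i18n/__init__.py, and the plain string replace would also rewrite an escaped backslash followed by b or a backspace inside a character class.

```python
import re

def script_boundary(boundary_chars: str) -> str:
    """Build a zero-width boundary that treats boundary_chars as inside-word."""
    inside = f"[{boundary_chars}]"
    # Either a word starts here (nothing inside-word before, inside-word after)
    # or a word ends here (inside-word before, nothing inside-word after).
    return f"(?:(?<!{inside})(?={inside})|(?<={inside})(?!{inside}))"

def expand_b(pattern: str, boundary_chars: str) -> str:
    """Swap every \\b in pattern for the script-aware boundary (naive replace)."""
    return pattern.replace(r"\b", script_boundary(boundary_chars))

# Decoded form of the JSON value "\\w\\u0900-\\u097F" given for Hindi.
HINDI = r"\w\u0900-\u097F"

# Illustrative name pattern (hypothetical, not from the locale files).
name_re = re.compile(expand_b(r"\b([\u0900-\u097F]+)\b", HINDI))
full_name = name_re.search("अनीता").group(1)

verb_re = re.compile(expand_b(r"\bराज\s+ने\s+कहा\b", HINDI))
verb_match = verb_re.search("राज ने कहा")

# With only \w declared, the lookaround behaves like a plain \b on English text.
english_re = re.compile(expand_b(r"\b(\w+)\b", r"\w"))
first_word = english_re.search("Hello world").group(1)

print(full_name)               # अनीता
print(verb_match is not None)  # True
print(first_word)              # Hello
```

Using fixed-width lookarounds over a single character class keeps the boundary zero-width, so it can stand in for \b without changing what the surrounding pattern captures.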

Changes:
- mempalace/i18n/__init__.py: add _script_boundary, _expand_b,
  _wrap_candidate, _collect_entity_section; candidate_patterns are now
  returned fully-wrapped (boundary + capture group applied)
- mempalace/entity_detector.py: extract_candidates compiles pre-wrapped
  candidate patterns directly instead of re-wrapping with \b
- tests/test_entity_detector.py: 5 new tests for Devanagari boundaries
  (name extraction with/without boundary_chars, person-verb firing,
  English regression)

2026-04-15 22:18:52 -03:00

backends

refactor: route all chromadb access through ChromaBackend

2026-04-14 00:31:16 -03:00

i18n

fix(entity_detector): script-aware word boundaries for combining-mark scripts

2026-04-15 22:18:52 -03:00

instructions

fix: add --yes flag to init instructions for non-interactive use (#534) (#682)

2026-04-12 14:23:29 -07:00

__init__.py

release: v3.3.0 (#839)

2026-04-13 18:25:01 -07:00

__main__.py

MemPalace: palace architecture, AAAK compression, knowledge graph

2026-04-04 18:16:04 -07:00

cli.py

refactor(entity_detector): make multi-language extensible via i18n JSON

2026-04-15 08:52:42 -03:00

closet_llm.py

release: v3.3.0 (#839)

2026-04-13 18:25:01 -07:00

config.py

refactor(entity_detector): make multi-language extensible via i18n JSON

2026-04-15 08:52:42 -03:00

convo_miner.py

release: v3.3.0 (#839)

2026-04-13 18:25:01 -07:00

dedup.py

refactor: route all chromadb access through ChromaBackend

2026-04-14 00:31:16 -03:00

dialect.py

fix: address i18n review issues from PR #718

2026-04-15 11:03:28 +05:00

diary_ingest.py

release: v3.3.0 (#839)

2026-04-13 18:25:01 -07:00

entity_detector.py

fix(entity_detector): script-aware word boundaries for combining-mark scripts

2026-04-15 22:18:52 -03:00

entity_registry.py

refactor(entity_detector): make multi-language extensible via i18n JSON

2026-04-15 08:52:42 -03:00

exporter.py

fix: restrict file permissions on sensitive palace data (#814)

2026-04-15 00:27:03 -07:00

fact_checker.py

release: v3.3.0 (#839)

2026-04-13 18:25:01 -07:00

general_extractor.py

MemPalace: palace architecture, AAAK compression, knowledge graph

2026-04-04 18:16:04 -07:00

hooks_cli.py

fix: restrict file permissions on sensitive palace data (#814)

2026-04-15 00:27:03 -07:00

instructions_cli.py

feat: add MemPalace Claude Code plugin with hooks and instructions

2026-04-08 14:55:46 +03:00

knowledge_graph.py

fix: restrict file permissions on sensitive palace data (#814)

2026-04-15 00:27:03 -07:00

layers.py

fix(searcher): guard against empty ChromaDB query results (#195) (#865)

2026-04-15 00:26:38 -07:00

mcp_server.py

fix: restrict file permissions on sensitive palace data (#814)

2026-04-15 00:27:03 -07:00

migrate.py

refactor: route all chromadb access through ChromaBackend

2026-04-14 00:31:16 -03:00

miner.py

fix: send missing-yaml warning to stderr and flag basename collisions

2026-04-14 13:53:07 -03:00

normalize.py

fix: add provenance header and speaker IDs to Slack transcript imports (#815)

2026-04-15 00:27:01 -07:00

onboarding.py

test: add comprehensive test coverage (35% → 58%, threshold 50%)

2026-04-08 20:54:56 +03:00

palace_graph.py

release: v3.3.0 (#839)

2026-04-13 18:25:01 -07:00

palace.py

release: v3.3.0 (#839)

2026-04-13 18:25:01 -07:00

py.typed

chore: tighten chromadb version range and add py.typed marker

2026-04-07 18:51:42 -03:00

query_sanitizer.py

fix: address Copilot review comments on PR #739

2026-04-12 23:07:46 -03:00

README.md

MemPalace: palace architecture, AAAK compression, knowledge graph

2026-04-04 18:16:04 -07:00

repair.py

refactor: route all chromadb access through ChromaBackend

2026-04-14 00:31:16 -03:00

room_detector_local.py

fix: skip unreachable reparse points in detect_rooms_from_folders (#558)

2026-04-11 16:16:06 -07:00

searcher.py

feat: include created_at timestamp in search results (#846)

2026-04-15 00:26:57 -07:00

spellcheck.py

MemPalace: palace architecture, AAAK compression, knowledge graph

2026-04-04 18:16:04 -07:00

split_mega_files.py

fix: expand ~ in split command directory argument (#361)

2026-04-11 23:14:28 -07:00

version.py

release: v3.3.0 (#839)

2026-04-13 18:25:01 -07:00

README.md

mempalace/ — Core Package

The Python package that powers MemPalace. All modules, all logic.

Modules

| Module | What it does |
|--------|--------------|
| `cli.py` | CLI entry point: routes to mine, search, init, compress, wake-up |
| `config.py` | Configuration loading: `~/.mempalace/config.json`, env vars, defaults |
| `normalize.py` | Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format |
| `miner.py` | Project file ingest: scans directories, chunks by paragraph, stores to ChromaDB |
| `convo_miner.py` | Conversation ingest: chunks by exchange pair (Q+A), detects rooms from content |
| `searcher.py` | Semantic search via ChromaDB vectors: filters by wing/room, returns verbatim + scores |
| `layers.py` | 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search) |
| `dialect.py` | AAAK compression: entity codes, emotion markers, 30x lossless ratio |
| `knowledge_graph.py` | Temporal entity-relationship graph: SQLite, time-filtered queries, fact invalidation |
| `palace_graph.py` | Room-based navigation graph: BFS traversal, tunnel detection across wings |
| `mcp_server.py` | MCP server: 19 tools, AAAK auto-teach, Palace Protocol, agent diary |
| `onboarding.py` | Guided first-run setup: asks about people/projects, generates AAAK bootstrap + wing config |
| `entity_registry.py` | Entity code registry: maps names to AAAK codes, handles ambiguous names |
| `entity_detector.py` | Auto-detect people and projects from file content |
| `general_extractor.py` | Classifies text into 5 memory types (decision, preference, milestone, problem, emotional) |
| `room_detector_local.py` | Maps folders to room names using 70+ patterns, no API |
| `spellcheck.py` | Name-aware spellcheck: won't "correct" proper nouns in your entity registry |
| `split_mega_files.py` | Splits concatenated transcript files into per-session files |

Architecture

User → CLI → miner/convo_miner → ChromaDB (palace)
                                     ↕
                              knowledge_graph (SQLite)
                                     ↕
User → MCP Server → searcher → results
                  → kg_query → entity facts
                  → diary    → agent journal

The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool.
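
To make the "structured relationships" side concrete, here is a minimal sketch of a temporal fact table with time-filtered queries and fact invalidation, using only the standard `sqlite3` module. The `facts` table, its columns, and the helpers `add_fact`, `invalidate`, and `facts_as_of` are hypothetical names chosen for illustration; they are not the schema or API of `knowledge_graph.py`.

```python
import sqlite3

# Hypothetical schema: one row per (subject, predicate, object) fact with a validity window.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
           subject TEXT, predicate TEXT, object TEXT,
           valid_from TEXT NOT NULL,  -- ISO date, inclusive
           valid_to TEXT              -- ISO date, exclusive; NULL means still valid
       )"""
)

def add_fact(conn, subject, predicate, obj, valid_from):
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, valid_from),
    )

def invalidate(conn, subject, predicate, obj, ended):
    # Close the open validity window instead of deleting the row, so history survives.
    conn.execute(
        "UPDATE facts SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, subject, predicate, obj),
    )

def facts_as_of(conn, subject, as_of):
    # ISO-8601 date strings sort lexicographically, so plain string comparison works.
    rows = conn.execute(
        "SELECT predicate, object FROM facts "
        "WHERE subject = ? AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY valid_from",
        (subject, as_of, as_of),
    )
    return rows.fetchall()

add_fact(conn, "Anita", "works_on", "ProjectA", "2026-01-01")
invalidate(conn, "Anita", "works_on", "ProjectA", "2026-03-01")
add_fact(conn, "Anita", "works_on", "ProjectB", "2026-03-01")

print(facts_as_of(conn, "Anita", "2026-02-01"))  # [('works_on', 'ProjectA')]
print(facts_as_of(conn, "Anita", "2026-04-01"))  # [('works_on', 'ProjectB')]
```

Closing a validity window rather than deleting the row is one common way to keep past facts queryable; the real module may handle invalidation differently.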