Adds mempalace/closet_llm.py as an OPTIONAL path for richer closet
generation. Regex closets remain the default and cover the local-first
promise; users who want LLM-quality topics can bring their own endpoint.
Configuration (env or CLI flag):
LLM_ENDPOINT — OpenAI-compatible base URL (required)
LLM_KEY — bearer token (optional; local inference skips this)
LLM_MODEL — model name (required)
Works with Ollama, vLLM, llama.cpp servers, OpenAI, OpenRouter, and any
other provider that speaks OpenAI-compatible /chat/completions. Zero new
dependencies — uses stdlib urllib.
Replaces the original Anthropic-SDK-hardcoded version of this module
from Milla's branch (commit 935f657). Same prompt, same parsing, same
regenerate_closets flow; only the transport was generalised so the
feature doesn't lock users into a specific vendor or require API keys
for core memory operations (CLAUDE.md, "Local-first, zero API").
Includes 13 unit tests covering config resolution, request shape,
auth-header omission when no key is set, code-fence stripping, and
missing-config error path. All mocked — zero network calls in tests.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
fact_checker.py verifies text for contradictions against locally stored
entities and KG facts. Catches similar-name confusion (Bob vs Bobby),
relationship mismatches (KG says husband, text says brother), and
stale facts (KG valid_from/valid_to).
No hardcoded facts. No network calls. Reads:
- ~/.mempalace/known_entities.json
- KnowledgeGraph SQLite
Usage:
from mempalace.fact_checker import check_text
issues = check_text("Bob is Alice's brother", palace_path)
# CLI
python -m mempalace.fact_checker "text" --palace ~/.mempalace/palace
Extracted from Milla's commit 935f657 which bundled this with
closet_llm (deferred) and drawer-grep (PR #791). Ported only
fact_checker.py — verified no network / API imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a closet hit leads to a source file with many drawers, grep each
chunk for query terms and return the BEST-MATCHING chunk + 1 neighbor
on each side, instead of dumping the whole file truncated at
MAX_HYDRATION_CHARS. Result now includes drawer_index and
total_drawers so callers can request adjacent drawers explicitly.
Extracted from Milla's commit 935f657 which bundled drawer-grep with
closet_llm (deferred pending LLM_ENDPOINT refactor) and fact_checker
(separate PR). Ported only the searcher.py change.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Appended from Milla's omnibus test_closets.py — covers create,
list, delete, dedup, and follow_tunnels behavior. 21/21 pass.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
Adds active tunnel creation alongside passive tunnel discovery.
Passive tunnels (existing): rooms with the same name across wings.
Explicit tunnels (new): agent-created links between specific
locations. "This API design in project_api relates to the database
schema in project_database."
New functions in palace_graph.py:
- create_tunnel() — link two wing/room pairs with a label
- list_tunnels() — list all explicit tunnels, filter by wing
- delete_tunnel() — remove a tunnel by ID
- follow_tunnels() — from a room, find all connected rooms in
other wings with drawer content previews
New MCP tools:
- mempalace_create_tunnel
- mempalace_list_tunnels
- mempalace_delete_tunnel
- mempalace_follow_tunnels
Tunnels stored in ~/.mempalace/tunnels.json (persists across
palace rebuilds). Deduplicated by endpoint pair.
689/689 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trimmed version of Milla's omnibus test_closets.py to only cover
features present in this PR stack (#784 lock, #788 closets, this
PR's entity/BM25/diary). Strip-noise tests will land with #785;
tunnel tests will land with the tunnels PR.
16/16 pass.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
Three features that close the gap between the architecture docs
and the actual codebase:
1. Entity metadata on drawers and closets
- _extract_entities_for_metadata() pulls names from known_entities.json
+ proper nouns appearing 2+ times
- Stamped as "entities" field in ChromaDB metadata
- Enables filterable search by person/project name
2. Day-based diary ingest (diary_ingest.py)
- ONE drawer per day, upserted as the day grows
- Closets pack topics atomically, never split mid-topic
- Tracks entry count in state file, only processes new entries
- Usage: python -m mempalace.diary_ingest --dir ~/summaries
3. BM25 hybrid search in searcher.py
- _bm25_score() keyword matching complements vector similarity
- _hybrid_rank() combines both signals (60% vector, 40% BM25)
- Catches exact name/term matches that embeddings miss
- Applied to both closet-first and direct drawer search paths
689/689 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cherry-picked the docs portion of 67e4ac6 to accompany the closet
feature. Test coverage for closets is omnibus with tests for entity
metadata and BM25 (see PR targeting those features) and will land
together in a follow-up.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
- upsert_closet replaced by upsert_closet_lines: checks each topic
line individually against CLOSET_CHAR_LIMIT. If adding one line
WHOLE would exceed the limit, starts a new closet. Never splits
mid-topic.
- build_closet_lines returns a list of atomic lines (not joined text)
- Richer extraction: section headers, more action verbs, up to 3
quotes, up to 12 topics per file
- Each line is complete: topic|entities|→drawer_refs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The closet architecture was always part of MemPalace's design but
never shipped in the public codebase. This adds it.
Palace now has TWO collections:
- mempalace_drawers — full verbatim content (unchanged)
- mempalace_closets — compact AAAK-style index entries
How it works:
- When mining, each file gets a closet alongside its drawers
- Closet contains extracted topics, entities, quotes as pointers
- Closets pack up to 1500 chars, topics never split mid-entry
- Search hits closets first (fast, small), then hydrates the
full drawer content for matching files
- Falls back to direct drawer search if no closets exist yet
Files changed:
- palace.py: get_closets_collection(), build_closet_text(),
upsert_closet(), CLOSET_CHAR_LIMIT
- miner.py: process_file() now creates closets after drawers
- searcher.py: search_memories() tries closet-first search,
hydrates drawers, falls back to direct search
Backwards compatible — existing palaces without closets continue
to work via the fallback path. Closets are created on next mine.
689/689 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: when multiple agents mine simultaneously, both pass
file_already_mined() check, both delete+insert the same file's
drawers, creating duplicates or losing data.
Fix: mine_lock() in palace.py — cross-platform file lock (fcntl on
Unix, msvcrt on Windows). Both miner.py and convo_miner.py now lock
per-file during the delete+insert cycle and re-check after acquiring
the lock.
Tested:
- Lock acquires and releases correctly
- Second agent blocks until first releases (0.25s wait)
- 33/33 existing tests pass
- Cross-platform: fcntl (macOS/Linux), msvcrt (Windows)
Based on v3.2.0 tag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: parse Claude.ai privacy export with messages key and sender field (#677)
The privacy-export branch in _try_claude_ai_json only checked for the
"chat_messages" key, missing exports that use "messages" instead. It
also only read the "role" field while real privacy exports use "sender".
Both gaps caused the file to fall through to plain-text, producing a
single giant drawer.
Changes:
- Accept "messages" alongside "chat_messages" in the conversation-object
guard and inner extraction.
- Accept "sender" alongside "role" as the author field.
- Fall back to a top-level "text" key when content blocks are empty.
- Produce one transcript per conversation instead of concatenating all
conversations into a single blob.
- Extract shared logic into _collect_claude_messages helper.
- Add 6 regression tests covering each variant.
* style: apply ruff format to normalize.py
* fix: guard against null text field in Claude.ai export parsing
item.get("text", "").strip() crashes when "text" is explicitly null
in the JSON (legal and observed in some exports). Use
(item.get("text") or "").strip() and add a regression test.
---------
Co-authored-by: Igor Lins e Silva <4753812+igorls@users.noreply.github.com>
When external tools write to the palace database (CLI mining, scripts), the MCP server's cached ChromaDB collection becomes stale — its HNSW index doesn't know about new vectors. Develop already invalidates on inode changes (catches rebuilds) but not on mtime changes (misses in-place writes).
This PR:
- Adds st_mtime tracking alongside st_ino in _get_client; invalidates the cached client on either change.
- Adds the mempalace_reconnect MCP tool for explicit cache flush.
Original author: @jphein (#663). Original approval: @Ari4ka.
Skips test_missing_db_invalidates_cache on Windows (ChromaDB holds chroma.sqlite3 open).
'(r)roject' had a duplicate 'r', making it read as '(r)roject'
instead of the intended '(r)project'.
Small UX fix — no behavior change.
Co-authored-by: Arnold Wender <arnold.wender@gmail.com>
Full changelog from git history and merged PRs:
- v3.0.0 (2026-04-06): initial public release
- v3.1.0 (2026-04-09): 80+ commits, security hardening, Windows compat, tests 20→92
- Unreleased/v3.2.0: 50+ commits, i18n, backend seam, migrate command, more security
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- query_sanitizer: require matching quote pair in _strip_wrapping_quotes
- query_sanitizer: re-check MIN_QUERY_LENGTH after trim in tail_sentence path
- migrate: neutral confirmation message accurate for both migrate and repair
- cli: os.path.normpath instead of rstrip to handle '/' root edge case
* docs: add CLAUDE.md + mission/principles to AGENTS.md
Add project mission, design principles, and contribution guidelines
to CLAUDE.md (new file) and prepend them to AGENTS.md.
Six non-negotiable principles: verbatim always, incremental only,
entity-first, local-first zero API, performance budgets, privacy
by architecture.
* docs: update CLAUDE.md with Milla's edits, add MISSION.md, symlink AGENTS.md
CLAUDE.md:
- Add Zettelkasten + AAAK explanation to mission (Milla's edit)
- Add 7th principle: background everything
- Update project structure to match current develop (all modules)
- Fix hooks listing to match actual public repo
- Add backends/ to structure and key files
MISSION.md:
- Milla's personal narrative on why MemPalace exists
- Origin story, AAAK inside joke, v4 goals
AGENTS.md:
- Now symlinks to CLAUDE.md (single source of truth)
* docs: trim MISSION.md — remove v4 notes and workflow (moved to private)
* fix: register 0-chunk files to prevent re-processing on every mine (#654)
mine_convos() has three early-exit paths (OSError, content too short,
zero chunks) that skip writing anything to ChromaDB. Since
file_already_mined() checks for the presence of a document with a
matching source_file, these files are re-read and re-processed on
every subsequent run.
Add _register_file() that upserts a lightweight sentinel document
(room="_registry", ingest_mode="registry") so file_already_mined()
returns True on future runs.
Note: Bug 2 from the issue (drawers_added counter always 0) was
already resolved upstream via the switch from collection.add() to
collection.upsert().
* fix: resolve macOS path symlink in test + remove unused variable
The _chunk_by_exchange() function was silently truncating AI responses
to 8 lines via ai_lines[:8]. Any content beyond line 8 was discarded,
violating the project's verbatim storage principle.
Now the full AI response is preserved. When a combined exchange exceeds
CHUNK_SIZE (800 chars, aligned with miner.py), it is split across
consecutive drawers instead of being truncated.
* fix: return "general" room from process_file error paths (#586)
process_file() returned (0, None) for already-mined, unreadable, and
too-short files. In --dry-run mode the caller always enters the
room_counts branch, so None ended up as a dict key and crashed the
summary printer with "unsupported format string passed to
NoneType.__format__".
Returning "general" instead of None makes the function contract
explicit: it always yields (int, str). This matches the consensus
fix discussed in the issue thread.
* style: apply ruff format to test_miner.py
* fix: skip arg whitelist for handlers accepting **kwargs (#572)
The schema-based argument filter (from #647) strips all kwargs not
declared in input_schema. This breaks handlers that accept **kwargs
for pass-through to ChromaDB or other backends.
Add inspect.Parameter.VAR_KEYWORD check before filtering — handlers
with **kwargs receive all arguments unfiltered.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: guard inspect.signature failure, default to filtering
Wrap inspect.signature() in try/except — on failure, default to
filtering (safe fallback). Addresses Copilot feedback on fragility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: allow Unicode in sanitize_name() — Latvian, CJK, Cyrillic names (#637)
_SAFE_NAME_RE was ASCII-only ([a-zA-Z0-9]), rejecting valid Unicode
names like "Jānis" or "太郎". Changed to \w which matches Unicode
word characters (letters, digits, underscore) in Python 3.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: tighten Unicode regex, add sanitize_name tests
Use [^\W_] for first/last char to allow Unicode letters/digits but
reject leading/trailing underscores (Copilot feedback). Add 7 tests
covering Latvian, CJK, Cyrillic, path traversal, and edge cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
AI agents calling `mempalace init` hang on the interactive confirmation
prompt. The --yes flag skips it, enabling automated setup.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused `json` and `current_lang` imports from
mempalace/i18n/test_i18n.py (F401)
- Reformat Dialect.__init__ signature in mempalace/dialect.py
(ruff format collapses multi-line signature, adds blank line
after lazy import)
Both auto-fixes from `ruff check --fix` / `ruff format`. No behavioral
changes.
Add language dictionaries: English, French, Korean, Japanese, Spanish,
German, Simplified Chinese, Traditional Chinese.
Each language is a single JSON file with:
- Localized terms (palace, wing, closet, drawer, etc.)
- CLI output strings with {var} interpolation
- AAAK compression instructions in that language
- Regex patterns for offline topic/quote/action extraction
Usage: Dialect(lang="ko") or set "language": "ko" in config.
Contributors can add new languages by copying en.json and translating.
Dialect class now accepts lang param and loads AAAK instruction +
regex patterns from the i18n dictionary automatically.
Tests: mempalace/i18n/test_i18n.py — all 8 languages pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The repo moved to the MemPalace org but several docs still point at the
old milla-jovovich URLs. Also, CONTRIBUTING.md tells people to PR
against main while the actual workflow (per ROADMAP.md) targets develop.
Files touched:
- CONTRIBUTING.md: clone URL, issues URL, PR target branch
- examples/gemini_cli_setup.md: clone URL
- integrations/openclaw/SKILL.md: homepage and license URLs
The `_load_api_key()` function in longmemeval_bench.py and locomo_bench.py
searched for API keys in a fixed path (`~/.config/lu/keys.json`) using
personal key names (`anthropic_milla`, `anthropic_claude_code_main`).
This leaks internal infrastructure details into the public codebase and
trains contributors to store credentials in a non-standard location
rather than using the standard ANTHROPIC_API_KEY env var.
Simplified to: CLI flag > env var > empty string. Updated help text
and HYBRID_MODE.md docs to match.
Co-authored-by: Tadao <tadao@travisfixes.com>
Path("~/foo") does not expand tilde on its own, causing
`mempalace split ~/some/dir` to silently find no files.
Fix by calling .expanduser().resolve() in both places the
path is constructed: cmd_split in cli.py (defensive, at the
CLI boundary) and main() in split_mega_files.py (the root cause).
Co-authored-by: Brooke Whatnall <brookewhatnall@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The module-level `ssl._create_default_https_context = ssl._create_unverified_context`
disables certificate verification for ALL urllib requests in the process,
not just the benchmark's HuggingFace downloads. This silently exposes
the benchmark runner to MITM attacks.
If a specific environment needs to skip verification (e.g. corporate proxy),
users can set `PYTHONHTTPSVERIFY=0` or pass a custom ssl context per-request
rather than globally patching the ssl module.
Co-authored-by: Tadao <tadao@travisfixes.com>