4aa7e1eebde36d44e86f284259c79346d2d4905e
10 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4aa7e1eebd |
release: v3.3.0 (#839)
* fix: add file-level locking to prevent multi-agent duplicate drawers
Root cause: when multiple agents mine simultaneously, both pass
file_already_mined() check, both delete+insert the same file's
drawers, creating duplicates or losing data.
Fix: mine_lock() in palace.py — cross-platform file lock (fcntl on
Unix, msvcrt on Windows). Both miner.py and convo_miner.py now lock
per-file during the delete+insert cycle and re-check after acquiring
the lock.
Tested:
- Lock acquires and releases correctly
- Second agent blocks until first releases (0.25s wait)
- 33/33 existing tests pass
- Cross-platform: fcntl (macOS/Linux), msvcrt (Windows)
Based on v3.2.0 tag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: strip system tags, hook output, and Claude UI chrome from drawers
normalize.py now strips before filing:
- <system-reminder>, <command-message>, <command-name> tags
- <task-notification>, <user-prompt-submit-hook>, <hook_output> tags
- Hook status messages (CURRENT TIME, Checking verified facts, etc.)
- Claude Code UI chrome (ctrl+o to expand, progress bars, etc.)
- Collapsed runs of blank lines
This noise was going straight into drawers, wasting storage space
and polluting search results. strip_noise() runs on all normalized
output regardless of input format (JSONL, JSON, plain text).
689/689 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add closet layer — searchable index pointing to drawers
The closet architecture was always part of MemPalace's design but
never shipped in the public codebase. This adds it.
Palace now has TWO collections:
- mempalace_drawers — full verbatim content (unchanged)
- mempalace_closets — compact AAAK-style index entries
How it works:
- When mining, each file gets a closet alongside its drawers
- Closet contains extracted topics, entities, quotes as pointers
- Closets pack up to 1500 chars, topics never split mid-entry
- Search hits closets first (fast, small), then hydrates the
full drawer content for matching files
- Falls back to direct drawer search if no closets exist yet
Files changed:
- palace.py: get_closets_collection(), build_closet_text(),
upsert_closet(), CLOSET_CHAR_LIMIT
- miner.py: process_file() now creates closets after drawers
- searcher.py: search_memories() tries closet-first search,
hydrates drawers, falls back to direct search
Backwards compatible — existing palaces without closets continue
to work via the fallback path. Closets are created on next mine.
689/689 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: enforce atomic topics in closets, extract richer pointers
- upsert_closet replaced by upsert_closet_lines: checks each topic
line individually against CLOSET_CHAR_LIMIT. If adding one line
WHOLE would exceed the limit, starts a new closet. Never splits
mid-topic.
- build_closet_lines returns a list of atomic lines (not joined text)
- Richer extraction: section headers, more action verbs, up to 3
quotes, up to 12 topics per file
- Each line is complete: topic|entities|→drawer_refs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add CLOSETS.md — closet layer overview
Cherry-picked the docs portion of 67e4ac6 to accompany the closet
feature. Test coverage for closets is omnibus with tests for entity
metadata and BM25 (see PR targeting those features) and will land
together in a follow-up.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
* feat: entity metadata + diary ingest + BM25 hybrid search
Three features that close the gap between the architecture docs
and the actual codebase:
1. Entity metadata on drawers and closets
- _extract_entities_for_metadata() pulls names from known_entities.json
+ proper nouns appearing 2+ times
- Stamped as "entities" field in ChromaDB metadata
- Enables filterable search by person/project name
2. Day-based diary ingest (diary_ingest.py)
- ONE drawer per day, upserted as the day grows
- Closets pack topics atomically, never split mid-topic
- Tracks entry count in state file, only processes new entries
- Usage: python -m mempalace.diary_ingest --dir ~/summaries
3. BM25 hybrid search in searcher.py
- _bm25_score() keyword matching complements vector similarity
- _hybrid_rank() combines both signals (60% vector, 40% BM25)
- Catches exact name/term matches that embeddings miss
- Applied to both closet-first and direct drawer search paths
689/689 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add tests for mine_lock, closets, entity metadata, BM25, diary
Trimmed version of Milla's omnibus test_closets.py to only cover
features present in this PR stack (#784 lock, #788 closets, this
PR's entity/BM25/diary). Strip-noise tests will land with #785;
tunnel tests will land with the tunnels PR.
16/16 pass.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
* feat: explicit cross-wing tunnels for multi-project agents
Adds active tunnel creation alongside passive tunnel discovery.
Passive tunnels (existing): rooms with the same name across wings.
Explicit tunnels (new): agent-created links between specific
locations. "This API design in project_api relates to the database
schema in project_database."
New functions in palace_graph.py:
- create_tunnel() — link two wing/room pairs with a label
- list_tunnels() — list all explicit tunnels, filter by wing
- delete_tunnel() — remove a tunnel by ID
- follow_tunnels() — from a room, find all connected rooms in
other wings with drawer content previews
New MCP tools:
- mempalace_create_tunnel
- mempalace_list_tunnels
- mempalace_delete_tunnel
- mempalace_follow_tunnels
Tunnels stored in ~/.mempalace/tunnels.json (persists across
palace rebuilds). Deduplicated by endpoint pair.
689/689 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add TestTunnels for cross-wing tunnel operations
Appended from Milla's omnibus test_closets.py — covers create,
list, delete, dedup, and follow_tunnels behavior. 21/21 pass.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
* feat(search): drawer-grep returns best-matching chunk + neighbors
When a closet hit leads to a source file with many drawers, grep each
chunk for query terms and return the BEST-MATCHING chunk + 1 neighbor
on each side, instead of dumping the whole file truncated at
MAX_HYDRATION_CHARS. Result now includes drawer_index and
total_drawers so callers can request adjacent drawers explicitly.
Extracted from Milla's commit 935f657 which bundled drawer-grep with
closet_llm (deferred pending LLM_ENDPOINT refactor) and fact_checker
(separate PR). Ported only the searcher.py change.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: offline fact checker against entity registry + knowledge graph
fact_checker.py verifies text for contradictions against locally stored
entities and KG facts. Catches similar-name confusion (Bob vs Bobby),
relationship mismatches (KG says husband, text says brother), and
stale facts (KG valid_from/valid_to).
No hardcoded facts. No network calls. Reads:
- ~/.mempalace/known_entities.json
- KnowledgeGraph SQLite
Usage:
from mempalace.fact_checker import check_text
issues = check_text("Bob is Alice's brother", palace_path)
# CLI
python -m mempalace.fact_checker "text" --palace ~/.mempalace/palace
Extracted from Milla's commit 935f657 which bundled this with
closet_llm (deferred) and drawer-grep (PR #791). Ported only
fact_checker.py — verified no network / API imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: optional LLM-based closet regeneration — bring-your-own endpoint
Adds mempalace/closet_llm.py as an OPTIONAL path for richer closet
generation. Regex closets remain the default and cover the local-first
promise; users who want LLM-quality topics can bring their own endpoint.
Configuration (env or CLI flag):
LLM_ENDPOINT — OpenAI-compatible base URL (required)
LLM_KEY — bearer token (optional; local inference skips this)
LLM_MODEL — model name (required)
Works with Ollama, vLLM, llama.cpp servers, OpenAI, OpenRouter, and any
other provider that speaks OpenAI-compatible /chat/completions. Zero new
dependencies — uses stdlib urllib.
Replaces the original Anthropic-SDK-hardcoded version of this module
from Milla's branch (commit 935f657). Same prompt, same parsing, same
regenerate_closets flow; only the transport was generalised so the
feature doesn't lock users into a specific vendor or require API keys
for core memory operations (CLAUDE.md, "Local-first, zero API").
Includes 13 unit tests covering config resolution, request shape,
auth-header omission when no key is set, code-fence stripping, and
missing-config error path. All mocked — zero network calls in tests.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
* fix(search): hybrid closet+drawer retrieval — closets boost, never gate (#795)
* Fix: set cosine distance metadata on all collection creation sites
ChromaDB defaults HNSW index to L2 (Euclidean) distance, but
MemPalace scoring uses 1-distance which requires cosine (range 0-2).
Add metadata={"hnsw:space": "cosine"} to the 4 production and 3 test
call sites that were missing it.
Closes #218
* fix: sync version.py to 3.2.0
Commit
|
||
|
|
091c2fe1c6 |
fix: mine --dry-run TypeError on files with room=None (#586) (#687)
* fix: return "general" room from process_file error paths (#586) process_file() returned (0, None) for already-mined, unreadable, and too-short files. In --dry-run mode the caller always enters the room_counts branch, so None ended up as a dict key and crashed the summary printer with "unsupported format string passed to NoneType.__format__". Returning "general" instead of None makes the function contract explicit: it always yields (int, str). This matches the consensus fix discussed in the issue thread. * style: apply ruff format to test_miner.py |
||
|
|
ae5196bc8d |
Мempalace backend seam (#413)
* refactor: add stage-1 backend abstraction seam Introduce the first upstreamable storage seam for MemPalace without bringing in the PostgreSQL spike or any benchmark artifacts. This change adds a small backend package with: - BaseCollection as the minimal collection contract - ChromaBackend/ChromaCollection as the default implementation It then routes the main runtime collection consumers through that seam: - palace.py - searcher.py - layers.py - palace_graph.py - mcp_server.py - miner.status() Behavioral constraints kept for stage 1: - ChromaDB remains the only backend and the default path - no config/env backend selection yet - no PostgreSQL code - no benchmark or research files - existing tests stay unchanged Important compatibility details: - read paths now call the seam with create=False so they still surface the existing 'no palace found' behavior instead of silently creating empty collections - write paths keep create=True semantics through palace.get_collection() - layers/searcher retain a chromadb module attribute so the existing mock-based tests can keep patching PersistentClient unchanged - ChromaBackend only creates palace directories on create=True, which preserves mocked read-path tests that use fake read-only paths Verification: - python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py - pytest -q # 529 passed, 106 deselected * refactor: clean up stage-1 seam compatibility shims Tighten the stage-1 backend abstraction branch after review. This follow-up does three small things: - keep the chromadb compatibility hook in searcher.py and layers.py, but express it through the backends.chroma module so it no longer reads like an accidental unused import - fix the palace_graph.py helper alias to avoid the local name collision flagged by ruff (imported helper vs local _get_collection wrapper) - preserve the existing mock-based test patch points unchanged while keeping the new backend seam intact Why this matters: - the direct form looked like a dead import in review, even though it was intentionally preserving the existing test seam ( and ) - palace_graph.py had a real lint issue ( redefinition) that was small but worth fixing before a public PR Verification: - /opt/homebrew/bin/ruff check mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py - pytest -q tests/test_layers.py tests/test_searcher.py - pytest -q # 529 passed, 106 deselected * docs: explain backend shim imports in search paths Add short code comments in searcher.py and layers.py explaining why the module-level `chromadb` alias remains after the stage-1 backend seam refactor. The alias is intentional: it preserves the existing mock patch points used by the current test suite (`mempalace.searcher.chromadb.PersistentClient` and `mempalace.layers.chromadb.PersistentClient`) while the runtime logic now flows through the backend abstraction. This keeps the public PR easier to review because the apparent "unused import" now has an explicit reason next to it. Verification: - /opt/homebrew/bin/ruff check mempalace/searcher.py mempalace/layers.py - pytest -q tests/test_layers.py tests/test_searcher.py * refactor: reuse a default backend instance in palace helper Tighten the stage-1 backend seam by promoting the default Chroma backend adapter to a module-level singleton in `mempalace/palace.py`. This keeps the stage-1 scope unchanged — Chroma is still the only backend wired in this branch — but avoids constructing a fresh `ChromaBackend()` object on every `get_collection()` call. The backend is stateless today, so this is a readability/cleanup change rather than a behavioral one. Why this helps: - makes `palace.get_collection()` read like a real default factory instead of an inline constructor call - keeps the stage-1 branch a little cleaner before opening the public PR - does not widen the backend surface or change any config/runtime behavior Verification: - python3 -m py_compile mempalace/palace.py - pytest -q tests/test_miner.py tests/test_layers.py tests/test_searcher.py - pytest -q # 529 passed, 106 deselected * fix: harden read-only seam behavior and update seam tests Preserve the stage-1 backend abstraction while closing the real read-path regression surfaced in PR review. What changed: - make ChromaBackend.get_collection(create=False) fail fast when the palace directory does not exist instead of letting PersistentClient create it as a side effect - update miner.status() to call get_collection(..., create=False) so status keeps the historical 'No palace found' behavior - remove the temporary chromadb shim aliases from layers.py and searcher.py now that the tests patch the seam directly - add focused tests for the new backends package, including ChromaCollection delegation and ChromaBackend create=True/create=False behavior - retarget layer/searcher tests to patch the backend seam instead of patching chromadb.PersistentClient inside production modules - add a regression test that status() does not create an empty palace when the target path is missing Verification: - ruff check . - uv run pytest -q - uv run pytest -q tests/test_backends.py tests/test_cli.py tests/test_mcp_server.py tests/test_layers.py tests/test_searcher.py tests/test_miner.py Notes: - the separate benchmark/slow/stress layer was started as a soak but not used as the merge gate for this PR branch * refactor: drop duplicate mcp collection cache declaration Remove a redundant `_collection_cache = None` assignment in `mempalace/mcp_server.py` left over after the stage-1 backend seam refactor. This does not change behavior; it only trims review noise in the MCP server module after the read-path hardening pass. Verification: - ruff check mempalace/mcp_server.py - uv run pytest -q tests/test_mcp_server.py --------- Co-authored-by: Sergey Kuznetsov <sergey@iterudit.com> |
||
|
|
58b8d5b198 | fix: release ChromaDB handles before rmtree on Windows | ||
|
|
1c48f4d2c3 | fix: use os.utime in mtime test for Windows compatibility | ||
|
|
2448ac0026 |
test: add coverage for file_already_mined mtime check
Covers the check_mtime=True path in palace.py to meet 85% coverage threshold. |
||
|
|
abd52534bb |
test: bring coverage to 85%, set threshold to 85, reset version to 3.0.11
- Add tests for config, convo_miner, spellcheck, knowledge_graph - Fix Windows PermissionError in test cleanup (chromadb file locks) - Add UTF-8 encoding to split_mega_files, entity_registry, hooks_cli - Fix mcp_server parse_known_args logging for unknown args - Set coverage threshold to 85 in pyproject.toml and CI - Reset all version files to 3.0.11 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c8c220d789 | fix: support nested .gitignore rules during mining | ||
|
|
9b9daa9b4b | fix: respect .gitignore during project mining | ||
|
|
0f8fa8c7d5 |
bench: add benchmark runners, results docs, and test suite
Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with methodology docs and hybrid retrieval analysis. Tests: config, miner, convo_miner, normalize — 9 tests, all passing. |