release: v3.3.0 (#839)

* fix: add file-level locking to prevent multi-agent duplicate drawers Root cause: when multiple agents mine simultaneously, both pass file_already_mined() check, both delete+insert the same file's drawers, creating duplicates or losing data. Fix: mine_lock() in palace.py — cross-platform file lock (fcntl on Unix, msvcrt on Windows). Both miner.py and convo_miner.py now lock per-file during the delete+insert cycle and re-check after acquiring the lock. Tested: - Lock acquires and releases correctly - Second agent blocks until first releases (0.25s wait) - 33/33 existing tests pass - Cross-platform: fcntl (macOS/Linux), msvcrt (Windows) Based on v3.2.0 tag. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: strip system tags, hook output, and Claude UI chrome from drawers normalize.py now strips before filing: - <system-reminder>, <command-message>, <command-name> tags - <task-notification>, <user-prompt-submit-hook>, <hook_output> tags - Hook status messages (CURRENT TIME, Checking verified facts, etc.) - Claude Code UI chrome (ctrl+o to expand, progress bars, etc.) - Collapsed runs of blank lines This noise was going straight into drawers, wasting storage space and polluting search results. strip_noise() runs on all normalized output regardless of input format (JSONL, JSON, plain text). 689/689 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add closet layer — searchable index pointing to drawers The closet architecture was always part of MemPalace's design but never shipped in the public codebase. This adds it. Palace now has TWO collections: - mempalace_drawers — full verbatim content (unchanged) - mempalace_closets — compact AAAK-style index entries How it works: - When mining, each file gets a closet alongside its drawers - Closet contains extracted topics, entities, quotes as pointers - Closets pack up to 1500 chars, topics never split mid-entry - Search hits closets first (fast, small), then hydrates the full drawer content for matching files - Falls back to direct drawer search if no closets exist yet Files changed: - palace.py: get_closets_collection(), build_closet_text(), upsert_closet(), CLOSET_CHAR_LIMIT - miner.py: process_file() now creates closets after drawers - searcher.py: search_memories() tries closet-first search, hydrates drawers, falls back to direct search Backwards compatible — existing palaces without closets continue to work via the fallback path. Closets are created on next mine. 689/689 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: enforce atomic topics in closets, extract richer pointers - upsert_closet replaced by upsert_closet_lines: checks each topic line individually against CLOSET_CHAR_LIMIT. If adding one line WHOLE would exceed the limit, starts a new closet. Never splits mid-topic. - build_closet_lines returns a list of atomic lines (not joined text) - Richer extraction: section headers, more action verbs, up to 3 quotes, up to 12 topics per file - Each line is complete: topic|entities|→drawer_refs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add CLOSETS.md — closet layer overview Cherry-picked the docs portion of 67e4ac6 to accompany the closet feature. Test coverage for closets is omnibus with tests for entity metadata and BM25 (see PR targeting those features) and will land together in a follow-up. Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com> * feat: entity metadata + diary ingest + BM25 hybrid search Three features that close the gap between the architecture docs and the actual codebase: 1. Entity metadata on drawers and closets - _extract_entities_for_metadata() pulls names from known_entities.json + proper nouns appearing 2+ times - Stamped as "entities" field in ChromaDB metadata - Enables filterable search by person/project name 2. Day-based diary ingest (diary_ingest.py) - ONE drawer per day, upserted as the day grows - Closets pack topics atomically, never split mid-topic - Tracks entry count in state file, only processes new entries - Usage: python -m mempalace.diary_ingest --dir ~/summaries 3. BM25 hybrid search in searcher.py - _bm25_score() keyword matching complements vector similarity - _hybrid_rank() combines both signals (60% vector, 40% BM25) - Catches exact name/term matches that embeddings miss - Applied to both closet-first and direct drawer search paths 689/689 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add tests for mine_lock, closets, entity metadata, BM25, diary Trimmed version of Milla's omnibus test_closets.py to only cover features present in this PR stack (#784 lock, #788 closets, this PR's entity/BM25/diary). Strip-noise tests will land with #785; tunnel tests will land with the tunnels PR. 16/16 pass. Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com> * feat: explicit cross-wing tunnels for multi-project agents Adds active tunnel creation alongside passive tunnel discovery. Passive tunnels (existing): rooms with the same name across wings. Explicit tunnels (new): agent-created links between specific locations. "This API design in project_api relates to the database schema in project_database." New functions in palace_graph.py: - create_tunnel() — link two wing/room pairs with a label - list_tunnels() — list all explicit tunnels, filter by wing - delete_tunnel() — remove a tunnel by ID - follow_tunnels() — from a room, find all connected rooms in other wings with drawer content previews New MCP tools: - mempalace_create_tunnel - mempalace_list_tunnels - mempalace_delete_tunnel - mempalace_follow_tunnels Tunnels stored in ~/.mempalace/tunnels.json (persists across palace rebuilds). Deduplicated by endpoint pair. 689/689 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add TestTunnels for cross-wing tunnel operations Appended from Milla's omnibus test_closets.py — covers create, list, delete, dedup, and follow_tunnels behavior. 21/21 pass. Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com> * feat(search): drawer-grep returns best-matching chunk + neighbors When a closet hit leads to a source file with many drawers, grep each chunk for query terms and return the BEST-MATCHING chunk + 1 neighbor on each side, instead of dumping the whole file truncated at MAX_HYDRATION_CHARS. Result now includes drawer_index and total_drawers so callers can request adjacent drawers explicitly. Extracted from Milla's commit 935f657 which bundled drawer-grep with closet_llm (deferred pending LLM_ENDPOINT refactor) and fact_checker (separate PR). Ported only the searcher.py change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: offline fact checker against entity registry + knowledge graph fact_checker.py verifies text for contradictions against locally stored entities and KG facts. Catches similar-name confusion (Bob vs Bobby), relationship mismatches (KG says husband, text says brother), and stale facts (KG valid_from/valid_to). No hardcoded facts. No network calls. Reads: - ~/.mempalace/known_entities.json - KnowledgeGraph SQLite Usage: from mempalace.fact_checker import check_text issues = check_text("Bob is Alice's brother", palace_path) # CLI python -m mempalace.fact_checker "text" --palace ~/.mempalace/palace Extracted from Milla's commit 935f657 which bundled this with closet_llm (deferred) and drawer-grep (PR #791). Ported only fact_checker.py — verified no network / API imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: optional LLM-based closet regeneration — bring-your-own endpoint Adds mempalace/closet_llm.py as an OPTIONAL path for richer closet generation. Regex closets remain the default and cover the local-first promise; users who want LLM-quality topics can bring their own endpoint. Configuration (env or CLI flag): LLM_ENDPOINT — OpenAI-compatible base URL (required) LLM_KEY — bearer token (optional; local inference skips this) LLM_MODEL — model name (required) Works with Ollama, vLLM, llama.cpp servers, OpenAI, OpenRouter, and any other provider that speaks OpenAI-compatible /chat/completions. Zero new dependencies — uses stdlib urllib. Replaces the original Anthropic-SDK-hardcoded version of this module from Milla's branch (commit 935f657). Same prompt, same parsing, same regenerate_closets flow; only the transport was generalised so the feature doesn't lock users into a specific vendor or require API keys for core memory operations (CLAUDE.md, "Local-first, zero API"). Includes 13 unit tests covering config resolution, request shape, auth-header omission when no key is set, code-fence stripping, and missing-config error path. All mocked — zero network calls in tests. Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com> * fix(search): hybrid closet+drawer retrieval — closets boost, never gate (#795) * Fix: set cosine distance metadata on all collection creation sites ChromaDB defaults HNSW index to L2 (Euclidean) distance, but MemPalace scoring uses 1-distance which requires cosine (range 0-2). Add metadata={"hnsw:space": "cosine"} to the 4 production and 3 test call sites that were missing it. Closes #218 * fix: sync version.py to 3.2.0 Commit 6614b9b bumped pyproject.toml to 3.2.0 but missed mempalace/version.py, breaking test_version_consistency on every PR's CI. This syncs them. * refactor: extract locked filing block to keep mine_convos under C901 Adding the per-file lock + double-checked file_already_mined() in the previous commit pushed mine_convos cyclomatic complexity from 25 to 26, just over ruff's max-complexity threshold. Hoist the locked critical section into _file_chunks_locked() so the outer loop stays within budget. No behavior change. * style: ruff format mempalace/palace.py Add blank lines after inline imports in mine_lock. Pure formatting. * fix(normalize): make strip_noise verbatim-safe and scope it to Claude Code JSONL The initial strip_noise() regressed on three fronts when audited against adversarial user content — each verified with executable repros against the cherry-picked code: 1. `<tag>.*?</tag>` with re.DOTALL span-ate across messages: one stray unclosed <system-reminder> anywhere in a session merged with the next closing tag, silently deleting everything between them (including full assistant replies). 2. `.*$ctrl\+o to expand$.*\n?` nuked entire lines of user prose whenever a user happened to document the TUI shortcut. 3. `Ran \d+ (?:stop|pre|post)\s*hook.*` with IGNORECASE ate the second sentence from "our CI has a stop hook ... Ran 2 stop hooks last week" — legitimate user commentary. These are unambiguous violations of the project's "Verbatim always" design principle. Fixes: - All tag patterns are now line-anchored (`(?m)^(?:> )?<tag>`) and their body forbids crossing a blank line (`(?:(?!\n\s*\n)[\s\S])*?`), so a dangling open tag cannot eat neighboring messages. - `_NOISE_LINE_PREFIXES` are line-anchored and case-sensitive — user prose mentioning "CURRENT TIME:" mid-sentence is preserved. - Hook-run chrome requires `(?m)^`, explicit hook names (Stop, PreCompact, PreToolUse, etc.), and no IGNORECASE. - "… +N lines" is line-anchored. - "(ctrl+o to expand)" only matches Claude Code's actual collapsed- output chrome shape `[N tokens] (ctrl+o to expand)`; a bare parenthetical in user prose stays intact. Scope: - `strip_noise()` is no longer called on every normalization path. Only `_try_claude_code_jsonl` invokes it, per-extracted-message — so Claude.ai exports, ChatGPT exports, Slack JSON, Codex JSONL, and plain text with `>` markers pass through fully verbatim. Per-message application also makes span-eating structurally impossible. Tests: - 15 new tests in test_normalize.py pin the boundary: 6 guard user content that must survive (each of the adversarial repros), 9 assert real system chrome is still stripped. All pass; full suite 702 pass (2 failures are the unrelated pre-existing version.py bug, cleared by #820). Known limitation (not fixed here): convo_miner.py does not delete drawers on re-mine, so transcripts mined before this PR keep noise- filled drawers until the user manually erases + re-mines. Proper fix needs a schema-version field on drawer metadata + re-mine trigger — out of scope for this PR. * feat(normalize): auto-rebuild stale drawers via NORMALIZE_VERSION schema gate Without this, the strip_noise improvement only helps new mines. Every user who had already mined Claude Code JSONL sessions would keep their noise-polluted drawers forever, because convo_miner's file_already_mined skip short-circuits before re-processing. Adds a versioned schema gate so upgrades propagate silently: - palace.NORMALIZE_VERSION=2 — bumped when the normalization pipeline changes shape (this PR's strip_noise is the v1→v2 bump). - file_already_mined now returns False if the stored normalize_version is missing or less than current, triggering a rebuild on next mine. - Both miners stamp drawers with the current normalize_version. - convo_miner now purges stale drawers before inserting fresh chunks (mirrors miner.py's existing delete+insert), extracted into _file_convo_chunks helper to keep mine_convos under ruff's C901 limit. User experience: upgrade mempalace, run `mempalace mine` as usual, old noisy drawers get silently replaced with clean ones. No erase needed, no "you need to rebuild" changelog footgun. Tests: - test_file_already_mined_returns_false_for_stale_normalize_version — pins the version gate contract for missing/v1/current. - test_add_drawer_stamps_normalize_version — fresh project-miner drawers carry the field. - test_mine_convos_rebuilds_stale_drawers_after_schema_bump — end-to-end proof that a pre-v2 palace gets silently cleaned on next mine, with orphan drawers purged and NOT skipped. Existing test_file_already_mined_check_mtime updated to include the new field; all other tests unaffected. * fix: stop hooks from making agents write in chat — save tokens The save hook and precompact hook were telling the agent to write diary entries, add drawers, and add KG triples IN THE CHAT WINDOW. Every line written stays in conversation history and retransmits on every subsequent turn — ~$1/session in wasted tokens. Fix: hooks now say "saved in background, no action needed" and use decision: allow instead of block. The agent continues working without interruption. All filing happens via the background pipeline. Also updated hooks README with: - Known limitation: hooks require session restart after install - Updated cost section: zero tokens, background-only Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use microsecond timestamp and full content hash in diary entry ID (#819) * fix: remove unused import 'main' from mempalace/__init__.py Removed the 'main' import from `mempalace/__init__.py` and updated `pyproject.toml` to point the script entry point directly to `mempalace.cli:main`. This ensures the CLI remains functional while improving code hygiene. Co-authored-by: igorls <4753812+igorls@users.noreply.github.com> * merge: full hardened stack + rewrite fact_checker around actual KG API Merges the full hardened stack (up through #791 drawer-grep) and turns fact_checker from "dead code hidden behind bare except" into an actually-working offline contradiction detector with tests. ## Dead paths the PR body advertised but the code never executed Both buried by a single outer ``except Exception: pass``: * ``kg.query(subject)`` — ``KnowledgeGraph`` has no ``query()`` method; it has ``query_entity()``. The attribute error was silently swallowed and the entire KG branch always returned ``[]``. Now using ``kg.query_entity(subject, direction="outgoing")`` with proper handling of the ``predicate``/``object``/``current``/``valid_to`` fields the real API returns. * ``KnowledgeGraph(palace_path=palace_path)`` — the constructor's only kwarg is ``db_path``. Passing ``palace_path`` raised TypeError, silently swallowed. Now computing the db_path correctly from ``<palace>/knowledge_graph.sqlite3``, matching the convention the MCP server already uses. ## Contradiction logic rewritten The previous ``if kg_pred in claim and fact.object not in claim`` only fired when text used the SAME predicate word as the KG fact — the exact opposite of the stated use case ("Bob is Alice's brother" when KG says husband" would NOT have fired). Replaced with a proper parse → lookup → compare pipeline: * ``_extract_claims`` parses two surface forms ("X is Y's Z" and "X's Z is Y") into ``(subject, predicate, object)`` triples. * ``_check_kg_contradictions`` pulls the subject's outgoing facts and flags two classes: - ``relationship_mismatch`` when a current KG fact matches the same ``(subject, object)`` pair but with a different predicate. - ``stale_fact`` when the exact triple exists but is ``valid_to``-closed in the past. * Stale-fact detection is now implemented (the PR body claimed it; the old code silently didn't implement it). ## Performance fix — O(n²) → O(mentioned × n) ``_check_entity_confusion`` previously computed Levenshtein for every pair of registered names on every ``check_text`` call. For 1,000 registered names that's ~500K edit-distance calls per hook invocation. Now we first identify which registry names actually appear in the text (single regex scan), then only compute edit distance between mentioned and unmentioned names. Pinned by a test that asserts <200ms on a 500- name registry with zero mentions. Also: when *both* similar names are mentioned in the text, we no longer flag them — the user clearly knows they're different people. ## Shared entity-registry loader ``mempalace/miner.py`` already had an mtime-cached loader for ``~/.mempalace/known_entities.json``. fact_checker had a duplicate implementation that leaked file handles and ignored caching. Extended miner's cache to expose both the flat set (``_load_known_entities``) and the raw category dict (``_load_known_entities_raw``); fact_checker now imports the latter. No more double disk reads, no more handle leak. ## Tests — 24 cases in tests/test_fact_checker.py All three detection paths + both dead-code regressions: * ``test_kg_init_uses_db_path_not_palace_path_kwarg`` — pins the correct KG constructor signature so the ``palace_path=`` bug can't come back. * ``test_relationship_mismatch_detected`` — the headline example from the PR body now actually fires. * ``test_stale_fact_detected`` — valid_to-closed triple is flagged. * ``test_current_fact_same_triple_is_not_flagged`` — no false positive on a still-valid match. * ``test_performance_bounded_by_mentioned_names`` — 500-name registry, zero mentions, <200ms. Regression for the O(n²) blowup. * ``test_no_false_positive_when_both_names_mentioned`` — Mila and Milla in the same text is fine. * Plus claim extraction, flatten_names shapes, CLI exit code, empty text handling, missing-palace graceful fallback, registry-dict shape support. 785/785 suite pass. ruff + format clean on CI-pinned 0.4.x. * Optimize entity detection with regex caching and pre-compilation - Use functools.lru_cache to cache compiled patterns for entity names. - Pre-compile static pronoun patterns into a single regex. - Remove redundant .lower() calls in score_entity loop. Co-authored-by: igorls <4753812+igorls@users.noreply.github.com> * docs: fix stale milla-jovovich org URLs in website and plugin manifests (#787) Follow-up to #766 which covers version.py, pyproject.toml, README, CHANGELOG, and CONTRIBUTING. These 11 files still had the old org name in URLs: - website/ (VitePress config + 6 docs pages) - .claude-plugin/ (plugin.json repository, README marketplace command) - .codex-plugin/ (plugin.json URLs, README links) Author name fields are intentionally unchanged. * test: make diary state path assertion platform-neutral The Windows CI job failed on: assert '/.mempalace/state/' in str(state_path) because Windows uses ``\`` as the path separator, so the substring never matches. The behavior under test (state file lives outside the diary dir, under ``~/.mempalace/state/``) is already correct on both platforms — only the assertion was Unix-only. Switch to ``state_path.parent`` comparisons that work on any OS. * test: serialize mine_lock concurrency test with multiprocessing The macOS CI job failed ``test_lock_blocks_concurrent_access`` because ``fcntl.flock`` on BSD/macOS is per-*process*, not per-FD: two threads in the same process both acquire even when they open their own file descriptors. The test passed on Linux (per-FD flock) and Windows (per-FD ``msvcrt.locking``) but was never actually exercising the lock's real contract. ``mine_lock`` is designed to serialize multi-*agent* access — i.e., separate processes, not threads. Switch the test to ``multiprocessing.get_context('spawn')`` with a module-level worker (so the spawn pickles cleanly) so it: 1. reflects the actual use case (one lock per mining process); 2. passes on all three OSes without flock-semantics branching; 3. catches real regressions (a broken lock would now let both processes through, exactly what we care about). Hold time bumped to 0.3s and the "wait until p1 acquires" delay to 0.2s to tolerate spawn's higher startup latency on macOS/Windows. * test: verify mine_lock via disjoint critical-section intervals The previous revision used multiprocessing but still relied on timing ("second process waited at least N seconds") which flakes on CI where spawn overhead eats into the hold window. Linux CI observed the second process report a 0.088s wait — below the 0.1s threshold — even though the lock behavior was correct; spawn was just slow enough that the first process had nearly finished holding when the second got past its own spawn. Switch to effect-based verification: each worker logs its [enter_time, exit_time] inside the critical section, and the test asserts the two intervals are disjoint after sorting. A broken lock would produce overlapping intervals regardless of spawn latency; a working lock cannot. Also removed the mp.Queue since we no longer pass timing data back. * Fix: ruff format with CI-pinned version (0.4.x) * fix: README audit — 42 TDD tests + hall detection + 7 claim fixes (#835) * fix: README audit — match every claim to shipped code + add hall detection TDD audit: wrote 42 tests verifying README claims against codebase. Fixed all 7 failures: 1. Tool count: 19 → 29 (10 tools were undocumented) 2. Added tool table rows for tunnels, drawer management, system tools 3. Version badge: 3.1.0 → 3.2.0 4. dialect.py file reference: "30x lossless" → "AAAK index format for closet pointers" 5. Wake-up token cost: "~170 tokens" → "~600-900 tokens" (matches layers.py) 6. pyproject.toml version in project structure: v3.0.0 → v3.2.0 7. Hall detection: added detect_hall() to miner.py — drawers now tagged with hall metadata so palace_graph.py can build hall connections New code: - miner.py: detect_hall() — keyword scoring against config hall_keywords, writes hall field to every drawer's metadata - tests/test_hall_detection.py — 12 TDD tests (written before code) - tests/test_readme_claims.py — 42 TDD tests verifying README accuracy 859/859 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve ruff lint — unused imports and variables Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: ruff format with CI-pinned 0.4.x Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use conftest fixtures in hall tests for Windows compat Windows CI fails with NotADirectoryError when ChromaDB tries to write HNSW files in short-lived TemporaryDirectory. Use conftest palace_path and tmp_dir fixtures instead — same pattern as all other tests that touch ChromaDB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address Igor's review — convo_miner halls, cached config, markdown typo TDD: wrote tests for convo_miner hall metadata and config caching BEFORE verifying the code changes. 1. README markdown typo: extra ** in wake-up token row (line 195) 2. convo_miner.py: added _detect_hall_cached() — conversation drawers now get hall metadata (was missing, Igor caught it) 3. miner.py + convo_miner.py: cached hall_keywords at module level so config.json isn't re-read per drawer during bulk mine 4. New tests: TestConvoMinerWritesHalls, TestDetectHallCaching 861/861 tests pass. ruff clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(website): update vitepress base url for custom domain * chore(release): bump version strings to 3.3.0 and curate CHANGELOG Prepare develop for the 3.3.0 release cycle. Version bumps: - mempalace/version.py: 3.2.0 -> 3.3.0 - pyproject.toml: 3.2.0 -> 3.3.0 - README.md: pyproject.toml label and shields.io badge - uv.lock: mempalace 3.0.0 -> 3.3.0 (also fills in resolved dev/extras) CHANGELOG.md: - Close out the stale [Unreleased] section as [3.2.0] - 2026-04-12 (v3.2.0 was tagged on that date but the release flip was never made) - Add a fresh [Unreleased] - v3.3.0 section covering the 49 commits since v3.2.0: closet layer, BM25 hybrid search, entity metadata, diary ingest, cross-wing tunnels, drawer-grep, offline fact checker, LLM-based closet regen, hall detection, cosine-distance fix, multi-agent locking, README audit, etc. - Adopt Keep a Changelog + SemVer framing - Add version compare reference links at the bottom - Fix stale milla-jovovich/mempalace preamble URL to MemPalace/mempalace --------- Co-authored-by: MSL <232237854+milla-jovovich@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: eblander <eblander@foundrydigital.com> Co-authored-by: shafdev <96260000+shafdev@users.noreply.github.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: mvalentsev <michael@valentsev.ru> Co-authored-by: Dominique Deschatre <43499065+domiscd@users.noreply.github.com>
2026-04-13 22:25:01 -03:00
parent 122a5fdf35
commit 4aa7e1eebd
49 changed files with 5847 additions and 172 deletions
@@ -0,0 +1,974 @@
+"""
+test_closets.py — Tests for the closet (searchable index) layer and the
+features that ride on top of it: mine_lock serialization, entity metadata,
+hybrid BM25+vector search, and diary ingest.
+
+Coverage map:
+  * mine_lock — acquire/release, blocks concurrent acquisition.
+  * build_closet_lines — pointer-line shape, header pickup, entity stoplist
+    (regression for "When/After/The"), real-name survival, fallback line.
+  * upsert_closet_lines — pure overwrite (regression for the append bug),
+    char-limit packing without splitting a line.
+  * purge_file_closets — scoped to source_file.
+  * Project-miner end-to-end rebuild — re-mining with fewer topics fully
+    purges leftover numbered closets from a larger prior run.
+  * _extract_drawer_ids_from_closet — pointer parsing + dedup.
+  * search_memories hybrid path — drawer query always the floor,
+    closets boost matching source_file, matched_via reflects both signals,
+    no whole-file glue, max_distance enforcement.
+  * Entity metadata — extracted, stoplist applied, registry cached by mtime.
+  * Real BM25 — real IDF over candidate corpus, hybrid rerank.
+  * Diary ingest — drawers + closets created, incremental skips, state
+    file lives outside the diary dir, wing-prefixed drawer IDs prevent
+    cross-diary collisions, force=True purges leftover closets.
+"""
+
+import json
+import multiprocessing
+import os
+import tempfile
+import threading
+import time
+
+import yaml
+
+from mempalace.miner import (
+    _extract_entities_for_metadata,
+    _load_known_entities,
+    mine,
+)
+from mempalace.palace import (
+    CLOSET_CHAR_LIMIT,
+    build_closet_lines,
+    get_closets_collection,
+    get_collection,
+    mine_lock,
+    purge_file_closets,
+    upsert_closet_lines,
+)
+from mempalace.palace_graph import (
+    create_tunnel,
+    delete_tunnel,
+    follow_tunnels,
+    list_tunnels,
+)
+from mempalace.searcher import (
+    _bm25_scores,
+    _expand_with_neighbors,
+    _extract_drawer_ids_from_closet,
+    _hybrid_rank,
+    search_memories,
+)
+
+
+# ── mine_lock ────────────────────────────────────────────────────────────
+
+
+def _lock_worker(target: str, name: str, hold_seconds: float, log_path: str) -> None:
+    """Worker for multiprocessing-spawn concurrency test. Writes its
+    critical-section enter/exit timestamps to ``log_path`` so the test
+    can verify the sections did not overlap in time."""
+    import time as _time
+
+    from mempalace.palace import mine_lock as _mine_lock
+
+    with _mine_lock(target):
+        t_enter = _time.time()
+        _time.sleep(hold_seconds)
+        t_exit = _time.time()
+        # Append atomically so concurrent writers don't stomp each other.
+        with open(log_path, "a") as f:
+            f.write(f"{name} {t_enter} {t_exit}\n")
+            f.flush()
+
+
+class TestMineLock:
+    def test_lock_acquires_and_releases(self, tmp_path):
+        target = str(tmp_path / "lock_target.txt")
+        with mine_lock(target):
+            lock_dir = os.path.expanduser("~/.mempalace/locks")
+            assert os.path.isdir(lock_dir)
+        # Re-acquire after release should succeed instantly.
+        start = time.time()
+        with mine_lock(target):
+            pass
+        assert time.time() - start < 1.0
+
+    def test_lock_blocks_concurrent_access(self, tmp_path):
+        """The lock's contract is inter-*process* (multi-agent), not
+        inter-thread. Use multiprocessing so the test reflects the real
+        use case and is portable: on macOS/BSD ``fcntl.flock`` is
+        per-process, so two threads would both acquire — a thread-based
+        test would flake there even when the lock is correct.
+
+        Verify mutual exclusion by the effect the critical section
+        actually has — each worker records its enter/exit timestamps
+        under the lock, and the test asserts the two intervals do not
+        overlap. This is robust to spawn-overhead timing, unlike
+        "second worker waited at least N seconds" which flakes when CI
+        spawn latency eats into the hold window.
+        """
+        target = str(tmp_path / "concurrent_lock.txt")
+        log_path = str(tmp_path / "critical_section.log")
+        # Spawn so the same code path runs on every OS (macOS 3.8+ and
+        # Windows already default to spawn; Linux is fork by default).
+        ctx = multiprocessing.get_context("spawn")
+
+        # Each worker holds the lock for HOLD seconds. With real mutual
+        # exclusion, the two [enter, exit] intervals must be disjoint.
+        HOLD = 0.3
+        p1 = ctx.Process(target=_lock_worker, args=(target, "a", HOLD, log_path))
+        p2 = ctx.Process(target=_lock_worker, args=(target, "b", HOLD, log_path))
+        p1.start()
+        p2.start()
+        p1.join(timeout=30)
+        p2.join(timeout=30)
+
+        assert p1.exitcode == 0, f"p1 exited non-zero: {p1.exitcode}"
+        assert p2.exitcode == 0, f"p2 exited non-zero: {p2.exitcode}"
+
+        # Parse the log: "<name> <enter_ts> <exit_ts>".
+        intervals = []
+        with open(log_path) as f:
+            for line in f:
+                parts = line.strip().split()
+                if len(parts) == 3:
+                    intervals.append((parts[0], float(parts[1]), float(parts[2])))
+        assert len(intervals) == 2, f"expected two critical sections, got {intervals}"
+
+        # Sort by entry time and verify the second entry is after the first exit.
+        intervals.sort(key=lambda iv: iv[1])
+        (_, enter_a, exit_a), (_, enter_b, exit_b) = intervals
+        assert (
+            enter_a < exit_a <= enter_b < exit_b
+        ), f"critical sections overlapped — lock failed to serialize: {intervals}"
+
+
+# ── build_closet_lines ─────────────────────────────────────────────────
+
+
+class TestBuildClosetLines:
+    def test_emits_pointer_line_shape(self):
+        content = (
+            "# Auth rewrite\n\n"
+            "Decided we need to migrate to passkeys. "
+            "Built the prototype with WebAuthn. "
+            "Reviewed the API surface."
+        )
+        lines = build_closet_lines(
+            "/proj/auth.md",
+            ["drawer_proj_backend_aaa", "drawer_proj_backend_bbb"],
+            content,
+            wing="proj",
+            room="backend",
+        )
+        assert lines, "should always emit at least one line"
+        for line in lines:
+            assert "→" in line, f"line missing pointer arrow: {line!r}"
+            parts = line.split("|")
+            assert len(parts) == 3, f"expected topic|entities|→refs, got {line!r}"
+            assert parts[2].startswith("→")
+
+    def test_extracts_section_headers_as_topics(self):
+        content = "# First Header\nbody\n## Second Header\nmore body"
+        lines = build_closet_lines("/x.md", ["d1"], content, "w", "r")
+        joined = "\n".join(lines).lower()
+        assert "first header" in joined
+        assert "second header" in joined
+
+    def test_entity_stoplist_filters_sentence_starters(self):
+        # "When", "After", "The" repeat 3+ times — old code would index them
+        # as entities. Stoplist drops them.
+        content = (
+            "When the pipeline ran, the result was good. "
+            "When the user logged in, the token was issued. "
+            "After the migration, the latency dropped. "
+            "After the rollback, the latency rose. "
+            "The new flow is stable. The audit cleared."
+        )
+        lines = build_closet_lines("/x.md", ["d1"], content, "w", "r")
+        entity_segments = [line.split("|")[1] for line in lines]
+        for seg in entity_segments:
+            tokens = set(seg.split(";")) if seg else set()
+            assert "When" not in tokens
+            assert "After" not in tokens
+            assert "The" not in tokens
+
+    def test_real_proper_nouns_survive_stoplist(self):
+        content = (
+            "Igor reviewed the diff. Milla wrote the spec. "
+            "Igor pushed the fix. Milla approved the PR. "
+            "Igor and Milla shipped together."
+        )
+        lines = build_closet_lines("/x.md", ["d1"], content, "w", "r")
+        joined_entities = ";".join(line.split("|")[1] for line in lines)
+        assert "Igor" in joined_entities
+        assert "Milla" in joined_entities
+
+    def test_emits_fallback_line_when_nothing_extractable(self):
+        content = "lorem ipsum dolor sit amet consectetur adipiscing elit"
+        lines = build_closet_lines("/x/notes.txt", ["d1"], content, "wing", "room")
+        assert len(lines) == 1
+        assert "wing/room/notes" in lines[0]
+        assert "→d1" in lines[0]
+
+    def test_pointer_references_first_three_drawers(self):
+        ids = [f"drawer_{i}" for i in range(10)]
+        lines = build_closet_lines("/x.md", ids, "# A\n# B", "w", "r")
+        assert all("→drawer_0,drawer_1,drawer_2" in line for line in lines)
+
+
+# ── upsert_closet_lines ───────────────────────────────────────────────
+
+
+class TestUpsertClosetLines:
+    def test_overwrites_existing_closet_does_not_append(self, palace_path):
+        col = get_closets_collection(palace_path)
+        base = "closet_test_room_abc"
+        meta = {"wing": "test", "room": "room", "source_file": "/x.md"}
+
+        upsert_closet_lines(col, base, ["alpha|;|→d1", "beta|;|→d2", "gamma|;|→d3"], meta)
+        first = col.get(ids=[f"{base}_01"])
+        assert "alpha" in first["documents"][0]
+
+        # Second mine — entirely different lines. Must replace, not append.
+        upsert_closet_lines(col, base, ["delta|;|→d4", "epsilon|;|→d5"], meta)
+        second = col.get(ids=[f"{base}_01"])
+        doc = second["documents"][0]
+        assert "delta" in doc
+        assert "epsilon" in doc
+        assert "alpha" not in doc, "old closet line leaked into rebuild"
+        assert "beta" not in doc
+
+    def test_packs_into_multiple_closets_without_splitting_lines(self, palace_path):
+        col = get_closets_collection(palace_path)
+        base = "closet_pack_room_def"
+        meta = {"wing": "test", "room": "room", "source_file": "/y.md"}
+
+        line = "x" * 600  # well under CLOSET_CHAR_LIMIT
+        n_written = upsert_closet_lines(col, base, [line, line, line, line], meta)
+        # 4 lines @ 601 chars each = 2404 — should pack into 2 closets
+        assert n_written == 2
+
+        for i in range(1, n_written + 1):
+            doc = col.get(ids=[f"{base}_{i:02d}"])["documents"][0]
+            for chunk in doc.split("\n"):
+                assert len(chunk) == 600, f"line was truncated in closet {i}"
+            assert len(doc) <= CLOSET_CHAR_LIMIT
+
+
+# ── purge_file_closets ────────────────────────────────────────────────
+
+
+class TestPurgeFileClosets:
+    def test_deletes_only_the_targeted_source(self, palace_path):
+        col = get_closets_collection(palace_path)
+        col.upsert(
+            ids=["closet_a_01", "closet_b_01"],
+            documents=["a|;|→d1", "b|;|→d2"],
+            metadatas=[
+                {"source_file": "/keep.md", "wing": "w", "room": "r"},
+                {"source_file": "/drop.md", "wing": "w", "room": "r"},
+            ],
+        )
+        purge_file_closets(col, "/drop.md")
+        remaining_ids = set(col.get()["ids"])
+        assert "closet_a_01" in remaining_ids
+        assert "closet_b_01" not in remaining_ids
+
+
+# ── project miner: closet rebuild end-to-end ──────────────────────────
+
+
+class TestMinerClosetRebuild:
+    def test_remine_replaces_closets_completely(self, tmp_path):
+        project = tmp_path / "proj"
+        project.mkdir()
+        (project / "mempalace.yaml").write_text(
+            yaml.dump({"wing": "proj", "rooms": [{"name": "general", "description": "x"}]})
+        )
+        target = project / "doc.md"
+
+        # First mine — long content produces multiple numbered closets.
+        first_topics = "\n\n".join(f"# Topic {i}\n" + ("filler text " * 30) for i in range(15))
+        target.write_text(first_topics)
+        palace = tmp_path / "palace"
+        mine(str(project), str(palace), wing_override="proj", agent="test")
+
+        col = get_closets_collection(str(palace))
+        first_pass = col.get(where={"source_file": str(target)})
+        assert first_pass["ids"], "first mine should have written closets"
+        first_ids = set(first_pass["ids"])
+        assert any("topic 0" in (d or "").lower() for d in first_pass["documents"])
+
+        # Touch mtime + shrink content so the rebuild produces fewer closets.
+        target.write_text("# Only Topic Now\n" + ("short body " * 5))
+        new_mtime = os.path.getmtime(target) + 60
+        os.utime(target, (new_mtime, new_mtime))
+        time.sleep(0.01)
+
+        mine(str(project), str(palace), wing_override="proj", agent="test")
+
+        col = get_closets_collection(str(palace))
+        second_pass = col.get(where={"source_file": str(target)})
+        second_docs = "\n".join(second_pass["documents"]).lower()
+        assert "only topic now" in second_docs
+        for i in range(15):
+            assert (
+                f"topic {i}\n" not in second_docs
+            ), f"stale 'Topic {i}' from first mine survived the rebuild"
+        # Numbered closets that existed only in the larger first run must be gone.
+        leftover = first_ids - set(second_pass["ids"])
+        for stale_id in leftover:
+            assert not col.get(ids=[stale_id])[
+                "ids"
+            ], f"orphan closet {stale_id} from larger first run survived purge"
+
+
+# ── _extract_drawer_ids_from_closet ───────────────────────────────────
+
+
+class TestExtractDrawerIds:
+    def test_parses_single_pointer(self):
+        assert _extract_drawer_ids_from_closet("topic|;|→drawer_x") == ["drawer_x"]
+
+    def test_parses_multiple_pointers_per_line(self):
+        line = "topic|ent|→drawer_a,drawer_b,drawer_c"
+        assert _extract_drawer_ids_from_closet(line) == ["drawer_a", "drawer_b", "drawer_c"]
+
+    def test_dedupes_across_lines(self):
+        doc = "one|;|→drawer_a,drawer_b\ntwo|;|→drawer_b,drawer_c"
+        assert _extract_drawer_ids_from_closet(doc) == ["drawer_a", "drawer_b", "drawer_c"]
+
+    def test_empty_doc_returns_empty(self):
+        assert _extract_drawer_ids_from_closet("") == []
+        assert _extract_drawer_ids_from_closet("no arrows here") == []
+
+
+# ── search_memories closet-first path ────────────────────────────────
+
+
+class TestSearchMemoriesHybrid:
+    def test_pure_drawer_when_no_closets(self, palace_path, seeded_collection):
+        """Palaces without closets return results via direct drawer search —
+        every hit must advertise that the closet signal was absent."""
+        result = search_memories("JWT authentication", palace_path)
+        assert result["results"], "should still find drawer hits"
+        for hit in result["results"]:
+            assert hit.get("matched_via") == "drawer"
+            assert hit.get("closet_boost") == 0.0
+            assert "closet_preview" not in hit
+
+    def test_closet_boost_marks_hit_as_drawer_plus_closet(self, palace_path, seeded_collection):
+        """When a closet agrees with direct search on source_file, the
+        matching drawer's ``matched_via`` switches to ``drawer+closet`` and
+        ``closet_preview`` exposes the hydrated index line."""
+        closets = get_closets_collection(palace_path)
+        # Seed the closet against the same source_file the drawer uses so
+        # the boost lookup keys align.
+        closets.upsert(
+            ids=["closet_proj_backend_aaa_01"],
+            documents=["JWT auth tokens|;|→drawer_proj_backend_aaa"],
+            metadatas=[{"wing": "project", "room": "backend", "source_file": "auth.py"}],
+        )
+
+        result = search_memories("JWT authentication", palace_path)
+        assert result["results"], "hybrid search should still return results"
+        # The JWT-bearing drawer should surface with closet agreement.
+        boosted = [h for h in result["results"] if h["matched_via"] == "drawer+closet"]
+        assert boosted, "closet agreement should promote the matching source"
+        top = boosted[0]
+        assert "JWT" in top["text"]
+        assert top["closet_boost"] > 0
+        assert "→drawer_proj_backend_aaa" in top["closet_preview"]
+
+    def test_max_distance_filters_hybrid_hits(self, palace_path, seeded_collection):
+        closets = get_closets_collection(palace_path)
+        closets.upsert(
+            ids=["closet_proj_backend_aaa_01"],
+            documents=["JWT auth tokens|;|→drawer_proj_backend_aaa"],
+            metadatas=[{"wing": "project", "room": "backend", "source_file": "auth.py"}],
+        )
+        result = search_memories(
+            "completely unrelated query about quantum gardening",
+            palace_path,
+            max_distance=0.001,
+        )
+        for hit in result["results"]:
+            assert hit["distance"] <= 0.001
+
+
+# ── entity metadata ──────────────────────────────────────────────────
+
+
+class TestEntityMetadata:
+    def test_extracts_capitalized_names(self):
+        text = "Ben reviewed the code. Ben approved it. Igor flagged two issues. Igor fixed them."
+        entities = _extract_entities_for_metadata(text)
+        assert "Ben" in entities
+        assert "Igor" in entities
+
+    def test_empty_for_no_entities(self):
+        text = "this is all lowercase with no proper nouns at all"
+        assert _extract_entities_for_metadata(text) == ""
+
+    def test_semicolon_separated(self):
+        text = "Alice and Bob met Charlie. Alice said hello. Bob agreed. Charlie laughed."
+        entities = _extract_entities_for_metadata(text)
+        assert ";" in entities
+
+    def test_stoplist_filters_sentence_starters(self):
+        # Same regression as the closet entity test — "When/After/The" must
+        # not become entities just because they're capitalized 2+ times.
+        text = (
+            "When the build broke, the team paged. "
+            "When the fix landed, the alarm cleared. "
+            "After the rollback, the queue drained. "
+            "After the deploy, the latency normalized."
+        )
+        entities = _extract_entities_for_metadata(text)
+        tokens = set(entities.split(";")) if entities else set()
+        assert "When" not in tokens
+        assert "After" not in tokens
+        assert "The" not in tokens
+
+    def test_capped_list_never_truncates_a_name(self):
+        # 30 distinct repeated proper nouns — extraction should cap the list
+        # before joining so a name never gets cut in half.
+        # Use morphologically distinct stems so the [A-Z][a-z]+ regex sees
+        # each as its own token.
+        names = [
+            "Anna",
+            "Brian",
+            "Carol",
+            "David",
+            "Elena",
+            "Frank",
+            "Grace",
+            "Harold",
+            "Iris",
+            "Julian",
+            "Kira",
+            "Liam",
+            "Maya",
+            "Noah",
+            "Oscar",
+            "Penny",
+            "Quinn",
+            "Rosa",
+            "Sergei",
+            "Tara",
+            "Umar",
+            "Vera",
+            "Walter",
+            "Xander",
+            "Yvonne",
+            "Zachary",
+            "Amelia",
+            "Boris",
+            "Clara",
+            "Dmitri",
+        ]
+        text = " ".join(f"{n} met {n}." for n in names)
+        entities = _extract_entities_for_metadata(text)
+        extracted = [n for n in entities.split(";") if n]
+        assert extracted, "should have extracted some entities"
+        for name in extracted:
+            assert name in names, f"truncation produced a partial token: {name!r}"
+
+    def test_known_registry_is_cached_by_mtime(self, monkeypatch, tmp_path):
+        # Point the registry at a temp file we control, exercise the cache.
+        registry = tmp_path / "known_entities.json"
+        registry.write_text(json.dumps({"people": ["Zelda"]}))
+        from mempalace import miner
+
+        monkeypatch.setattr(miner, "_ENTITY_REGISTRY_PATH", str(registry))
+        miner._ENTITY_REGISTRY_CACHE["mtime"] = None
+        miner._ENTITY_REGISTRY_CACHE["names"] = frozenset()
+
+        first = _load_known_entities()
+        assert "Zelda" in first
+
+        # Second call without changing mtime: must reuse cache, not re-read.
+        read_count = {"n": 0}
+        original_open = open
+
+        def counting_open(path, *a, **kw):
+            if str(path) == str(registry):
+                read_count["n"] += 1
+            return original_open(path, *a, **kw)
+
+        monkeypatch.setattr("builtins.open", counting_open)
+        _load_known_entities()
+        assert read_count["n"] == 0, "registry should not be re-read when mtime unchanged"
+
+        # Bump mtime → cache must invalidate.
+        new_mtime = os.path.getmtime(registry) + 5
+        os.utime(registry, (new_mtime, new_mtime))
+        registry.write_text(json.dumps({"people": ["Zelda", "Link"]}))
+        os.utime(registry, (new_mtime, new_mtime))
+        names = _load_known_entities()
+        assert "Link" in names
+
+
+# ── BM25 hybrid search (real IDF over candidate corpus) ──────────────
+
+
+class TestBM25:
+    def test_scores_positive_for_matching_doc(self):
+        scores = _bm25_scores(
+            "database migration",
+            ["We migrated the database to Postgres.", "unrelated cookery tips"],
+        )
+        assert scores[0] > 0
+        assert scores[1] == 0.0
+
+    def test_scores_zero_when_no_overlap(self):
+        scores = _bm25_scores("quantum physics", ["We built a web app in React"])
+        assert scores == [0.0]
+
+    def test_idf_downweights_terms_present_in_every_doc(self):
+        # "database" appears in every candidate → low IDF → low contribution.
+        # "vacuum" is unique to one → high IDF → that doc dominates.
+        scores = _bm25_scores(
+            "database vacuum",
+            [
+                "database backup nightly schedule",
+                "database vacuum scheduled weekly",
+                "database failover plan",
+            ],
+        )
+        assert scores[1] == max(scores), "doc with the rare query term should win on IDF"
+
+    def test_empty_inputs_return_zeros(self):
+        assert _bm25_scores("", ["hello world"]) == [0.0]
+        assert _bm25_scores("query here", []) == []
+        assert _bm25_scores("query", [""]) == [0.0]
+
+    def test_hybrid_rank_promotes_keyword_match(self):
+        results = [
+            {"text": "database schema design for Postgres", "distance": 0.5},
+            {"text": "unrelated topic about cooking", "distance": 0.3},
+        ]
+        ranked = _hybrid_rank(results, "database Postgres schema")
+        # The keyword-rich result outranks the closer-vector but irrelevant one.
+        assert "database" in ranked[0]["text"]
+        # bm25_score field is exposed for debugging.
+        assert "bm25_score" in ranked[0]
+        # No internal scoring leak.
+        assert "_hybrid_score" not in ranked[0]
+
+    def test_hybrid_rank_absolute_normalization(self):
+        # Adding a much-worse result to the candidate set must NOT reshuffle
+        # the top two — proves we're using absolute (1 - dist) and not
+        # dist / max_dist normalization.
+        base = [
+            {"text": "alpha alpha alpha", "distance": 0.1},
+            {"text": "beta beta beta", "distance": 0.4},
+        ]
+        ranked_short = _hybrid_rank([dict(r) for r in base], "alpha")
+        with_outlier = base + [{"text": "gamma gamma gamma", "distance": 1.9}]
+        ranked_long = _hybrid_rank([dict(r) for r in with_outlier], "alpha")
+        assert ranked_short[0]["text"] == ranked_long[0]["text"]
+        assert ranked_short[1]["text"] == ranked_long[1]["text"]
+
+
+# ── diary ingest ─────────────────────────────────────────────────────
+
+
+class TestDiaryIngest:
+    def test_ingest_creates_drawers_and_closets(self, tmp_path):
+        diary_dir = tmp_path / "diaries"
+        diary_dir.mkdir()
+        (diary_dir / "2026-04-13.md").write_text(
+            "# 2026-04-13\n\n## 10:00 PDT — Test\n\nBuilt the auth system.\n"
+        )
+        palace_dir = tmp_path / "palace"
+
+        from mempalace.diary_ingest import ingest_diaries
+
+        result = ingest_diaries(str(diary_dir), str(palace_dir), force=True)
+        assert result["days_updated"] >= 1
+        assert get_collection(str(palace_dir)).count() >= 1
+
+    def test_ingest_skips_unchanged_on_second_run(self, tmp_path):
+        diary_dir = tmp_path / "diaries"
+        diary_dir.mkdir()
+        (diary_dir / "2026-04-13.md").write_text(
+            "# 2026-04-13\n\n## 10:00 — Test\n\nContent here that's long enough.\n"
+        )
+        palace_dir = tmp_path / "palace"
+
+        from mempalace.diary_ingest import ingest_diaries
+
+        ingest_diaries(str(diary_dir), str(palace_dir), force=True)
+        result = ingest_diaries(str(diary_dir), str(palace_dir))
+        assert result["days_updated"] == 0
+
+    def test_state_file_lives_outside_diary_dir(self, tmp_path):
+        # Regression: the original implementation wrote
+        # ``.diary_ingest_state.json`` *inside* the user's diary directory,
+        # polluting their content folder. State must live under
+        # ``~/.mempalace/state/`` instead.
+        diary_dir = tmp_path / "diaries"
+        diary_dir.mkdir()
+        (diary_dir / "2026-04-13.md").write_text(
+            "# 2026-04-13\n\n## 10:00 — Test\n\nBody content here long enough.\n"
+        )
+        palace_dir = tmp_path / "palace"
+
+        from mempalace.diary_ingest import _state_file_for, ingest_diaries
+
+        ingest_diaries(str(diary_dir), str(palace_dir), force=True)
+
+        # No state file inside the user's diary dir.
+        for entry in diary_dir.iterdir():
+            assert (
+                "diary_ingest" not in entry.name
+            ), f"state file leaked into user diary dir: {entry}"
+
+        # State file does exist under ~/.mempalace/state/.
+        state_path = _state_file_for(str(palace_dir), diary_dir.resolve())
+        assert state_path.exists()
+        # Platform-neutral path check: compare parents rather than a hardcoded
+        # separator string that would fail on Windows (``\.mempalace\state\``).
+        assert state_path.parent.name == "state"
+        assert state_path.parent.parent.name == ".mempalace"
+
+    def test_wing_prefixed_drawer_id_prevents_cross_diary_collision(self, tmp_path):
+        # Regression: the original implementation used
+        # ``drawer_diary_{date_str}`` regardless of wing — two diaries with
+        # the same date in different wings would clobber each other.
+        date_md = "# 2026-04-13\n\n## 10:00 — entry\n\nThis is the day's content.\n"
+
+        # Two separate diary dirs, ingested into the same palace under
+        # different wings. Each must produce a distinct drawer.
+        personal_dir = tmp_path / "personal"
+        personal_dir.mkdir()
+        (personal_dir / "2026-04-13.md").write_text(date_md + "Personal-only marker.\n")
+
+        work_dir = tmp_path / "work"
+        work_dir.mkdir()
+        (work_dir / "2026-04-13.md").write_text(date_md + "Work-only marker.\n")
+
+        palace_dir = tmp_path / "palace"
+
+        from mempalace.diary_ingest import _diary_drawer_id, ingest_diaries
+
+        ingest_diaries(str(personal_dir), str(palace_dir), wing="personal", force=True)
+        ingest_diaries(str(work_dir), str(palace_dir), wing="work", force=True)
+
+        col = get_collection(str(palace_dir))
+        personal_id = _diary_drawer_id("personal", "2026-04-13")
+        work_id = _diary_drawer_id("work", "2026-04-13")
+        assert personal_id != work_id
+
+        personal = col.get(ids=[personal_id])
+        work = col.get(ids=[work_id])
+        assert personal["ids"] == [personal_id]
+        assert work["ids"] == [work_id]
+        assert "Personal-only marker." in personal["documents"][0]
+        assert "Work-only marker." in work["documents"][0]
+
+
+# ── cross-wing tunnels ───────────────────────────────────────────────
+
+
+class TestTunnels:
+    """Tunnels are explicit cross-wing connections stored in
+    ``~/.mempalace/tunnels.json``. Each test points the module-level
+    ``_TUNNEL_FILE`` at a fresh tmp file so tests don't cross-contaminate
+    or touch the user's real tunnels."""
+
+    def setup_method(self):
+        import mempalace.palace_graph as pg
+
+        self._orig = pg._TUNNEL_FILE
+        self._tmpdir = tempfile.mkdtemp()
+        pg._TUNNEL_FILE = os.path.join(self._tmpdir, "tunnels.json")
+
+    def teardown_method(self):
+        import mempalace.palace_graph as pg
+
+        pg._TUNNEL_FILE = self._orig
+        import shutil
+
+        shutil.rmtree(self._tmpdir, ignore_errors=True)
+
+    def test_create_tunnel(self):
+        t = create_tunnel("wing_api", "auth", "wing_db", "users", label="auth uses users table")
+        assert t["id"]
+        assert t["source"]["wing"] == "wing_api"
+        assert t["source"]["room"] == "auth"
+        assert t["target"]["wing"] == "wing_db"
+        assert t["target"]["room"] == "users"
+        assert t["label"] == "auth uses users table"
+
+    def test_list_tunnels_with_and_without_filter(self):
+        create_tunnel("wing_a", "room1", "wing_b", "room2")
+        create_tunnel("wing_a", "room3", "wing_c", "room4")
+        assert len(list_tunnels()) == 2
+        # Filtering by a wing that appears on either endpoint.
+        assert len(list_tunnels("wing_a")) == 2
+        assert len(list_tunnels("wing_c")) == 1
+        assert len(list_tunnels("wing_nonexistent")) == 0
+
+    def test_delete_tunnel(self):
+        t = create_tunnel("wing_x", "r1", "wing_y", "r2")
+        delete_tunnel(t["id"])
+        assert list_tunnels() == []
+
+    def test_dedup_same_endpoints_updates_label(self):
+        create_tunnel("wing_a", "r1", "wing_b", "r2", label="first")
+        create_tunnel("wing_a", "r1", "wing_b", "r2", label="updated")
+        tunnels = list_tunnels()
+        assert len(tunnels) == 1
+        assert tunnels[0]["label"] == "updated"
+
+    def test_follow_tunnels_returns_connected_endpoints(self):
+        create_tunnel("wing_api", "auth", "wing_db", "users")
+        create_tunnel("wing_api", "auth", "wing_frontend", "login")
+        # Unrelated tunnel that must not surface.
+        create_tunnel("wing_other", "notes", "wing_misc", "scratch")
+
+        connections = follow_tunnels("wing_api", "auth")
+        assert len(connections) == 2
+        wings = {c["connected_wing"] for c in connections}
+        assert wings == {"wing_db", "wing_frontend"}
+
+    # ── regression: symmetry, durability, validation, concurrency ─────
+
+    def test_tunnel_is_symmetric(self):
+        """Regression: tunnels are undirected. create(A, B) and create(B, A)
+        must resolve to the same canonical ID and dedupe into one record —
+        the second call updates the label instead of creating a dupe."""
+        first = create_tunnel("wing_a", "r1", "wing_b", "r2", label="forward")
+        second = create_tunnel("wing_b", "r2", "wing_a", "r1", label="reversed")
+        assert first["id"] == second["id"]
+        assert len(list_tunnels()) == 1
+        assert list_tunnels()[0]["label"] == "reversed"
+
+    def test_follow_tunnels_works_from_either_endpoint(self):
+        """Symmetric: you can follow_tunnels from either end of the link."""
+        create_tunnel("wing_api", "auth", "wing_db", "users", label="auth uses users")
+        from_source = follow_tunnels("wing_api", "auth")
+        from_target = follow_tunnels("wing_db", "users")
+        assert len(from_source) == 1
+        assert len(from_target) == 1
+        assert from_source[0]["connected_wing"] == "wing_db"
+        assert from_target[0]["connected_wing"] == "wing_api"
+        # Both surfaces should carry the same label.
+        assert from_source[0]["label"] == "auth uses users"
+        assert from_target[0]["label"] == "auth uses users"
+
+    def test_empty_endpoint_fields_rejected(self):
+        """Regression: create_tunnel must reject empty strings on any
+        endpoint field so the JSON store can't grow phantom tunnels."""
+        import pytest
+
+        for args in [
+            ("", "r1", "wing", "r2"),
+            ("wing", "", "wing", "r2"),
+            ("wing", "r1", "", "r2"),
+            ("wing", "r1", "wing", ""),
+            ("   ", "r1", "wing", "r2"),  # whitespace-only also rejected
+        ]:
+            with pytest.raises(ValueError):
+                create_tunnel(*args)
+
+    def test_corrupt_tunnel_file_does_not_lose_new_writes(self):
+        """A truncated/corrupt tunnels.json (crash mid-write on a system
+        without atomic rename) must not leak into subsequent reads — the
+        file should be treated as empty and a fresh create_tunnel should
+        persist cleanly."""
+        import mempalace.palace_graph as pg
+
+        # Simulate a crash that left a truncated file behind.
+        with open(pg._TUNNEL_FILE, "w") as f:
+            f.write("{not valid json")
+
+        # Load should return [] rather than raising.
+        assert list_tunnels() == []
+
+        # A subsequent create must persist (atomic write replaces the corrupt file).
+        t = create_tunnel("wing_a", "r1", "wing_b", "r2")
+        assert list_tunnels() == [t]
+
+    def test_atomic_write_leaves_no_stray_tmp_file(self):
+        """Regression: _save_tunnels uses write-then-os.replace. After a
+        successful create, there must be no leftover ``tunnels.json.tmp``."""
+        import mempalace.palace_graph as pg
+
+        create_tunnel("wing_a", "r1", "wing_b", "r2")
+        assert os.path.exists(pg._TUNNEL_FILE)
+        assert not os.path.exists(pg._TUNNEL_FILE + ".tmp")
+
+    def test_concurrent_creates_preserve_all_tunnels(self):
+        """Regression: two concurrent create_tunnel calls must not clobber
+        each other. Without the mine_lock around load+save, the later
+        writer's snapshot would overwrite the earlier writer's tunnel."""
+        barrier = threading.Barrier(5)
+        errors: list = []
+
+        def worker(i):
+            try:
+                barrier.wait(timeout=2)
+                create_tunnel(f"wing_{i}", "r", "wing_shared", "hub")
+            except Exception as e:
+                errors.append(e)
+
+        threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
+        for t in threads:
+            t.start()
+        for t in threads:
+            t.join()
+
+        assert not errors, f"worker raised: {errors}"
+        tunnels = list_tunnels()
+        assert len(tunnels) == 5, (
+            f"expected 5 concurrent tunnels, got {len(tunnels)} — " "write race dropped some"
+        )
+
+    def test_created_at_is_timezone_aware(self):
+        """Regression: created_at must be tz-aware UTC, not naive."""
+        t = create_tunnel("wing_a", "r1", "wing_b", "r2")
+        # ISO format with tz offset contains '+' or 'Z'.
+        assert t["created_at"].endswith("+00:00") or t["created_at"].endswith("Z")
+
+
+# ── drawer-grep neighbor expansion ────────────────────────────────────
+#
+# When a closet hit lands on a drawer whose chunk boundary clips a thought
+# (matched chunk says "here's a breakdown:" and the breakdown lives in the
+# next chunk), the closet path now expands to ±1 neighbor chunks from the
+# same source file. These tests pin that behavior end-to-end and at the
+# helper level.
+
+
+class TestDrawerGrepExpansion:
+    def _seed_source_file(self, palace_path, source: str, n_chunks: int):
+        """Helper: put N sequential drawers for a single source file into
+        the palace and return the drawer IDs keyed by chunk_index."""
+        col = get_collection(palace_path)
+        ids = [f"drawer_test_room_{source.replace('/', '_')}_{i:03d}" for i in range(n_chunks)]
+        docs = [f"chunk_{i} content about topic alpha" for i in range(n_chunks)]
+        metas = [
+            {
+                "wing": "test",
+                "room": "room",
+                "source_file": source,
+                "chunk_index": i,
+                "filed_at": "2026-04-13T00:00:00",
+            }
+            for i in range(n_chunks)
+        ]
+        col.upsert(ids=ids, documents=docs, metadatas=metas)
+        return col, {i: ids[i] for i in range(n_chunks)}
+
+    def test_expand_returns_matched_plus_neighbors(self, palace_path):
+        col, by_idx = self._seed_source_file(palace_path, "/proj/doc.md", n_chunks=5)
+        matched_meta = {"source_file": "/proj/doc.md", "chunk_index": 2}
+        matched_doc = "chunk_2 content about topic alpha"
+
+        out = _expand_with_neighbors(col, matched_doc, matched_meta, radius=1)
+        assert out["drawer_index"] == 2
+        assert out["total_drawers"] == 5
+        # Expect chunks 1, 2, 3 joined in chunk_index order.
+        text = out["text"]
+        assert "chunk_1" in text
+        assert "chunk_2" in text
+        assert "chunk_3" in text
+        # No leakage of non-neighbors.
+        assert "chunk_0" not in text
+        assert "chunk_4" not in text
+        # Ordering preserved — chunk_1 before chunk_2 before chunk_3.
+        assert text.index("chunk_1") < text.index("chunk_2") < text.index("chunk_3")
+
+    def test_expand_at_start_of_file_only_has_next_neighbor(self, palace_path):
+        col, _ = self._seed_source_file(palace_path, "/proj/edge_start.md", n_chunks=3)
+        out = _expand_with_neighbors(
+            col,
+            "chunk_0 content",
+            {"source_file": "/proj/edge_start.md", "chunk_index": 0},
+        )
+        assert out["drawer_index"] == 0
+        assert out["total_drawers"] == 3
+        assert "chunk_0" in out["text"]
+        assert "chunk_1" in out["text"]
+        # No chunk_-1 could exist; the expansion must not invent one.
+        assert "chunk_-1" not in out["text"]
+
+    def test_expand_at_end_of_file_only_has_prev_neighbor(self, palace_path):
+        col, _ = self._seed_source_file(palace_path, "/proj/edge_end.md", n_chunks=3)
+        out = _expand_with_neighbors(
+            col,
+            "chunk_2 content",
+            {"source_file": "/proj/edge_end.md", "chunk_index": 2},
+        )
+        assert out["drawer_index"] == 2
+        assert out["total_drawers"] == 3
+        assert "chunk_1" in out["text"]
+        assert "chunk_2" in out["text"]
+        # No chunk_3 exists.
+        assert "chunk_3" not in out["text"]
+
+    def test_expand_single_drawer_file_returns_just_matched(self, palace_path):
+        col, _ = self._seed_source_file(palace_path, "/proj/lone.md", n_chunks=1)
+        out = _expand_with_neighbors(
+            col,
+            "chunk_0 content",
+            {"source_file": "/proj/lone.md", "chunk_index": 0},
+        )
+        assert out["drawer_index"] == 0
+        assert out["total_drawers"] == 1
+        assert out["text"] == "chunk_0 content about topic alpha"
+
+    def test_expand_falls_back_when_metadata_missing(self, palace_path):
+        col = get_collection(palace_path)
+        # No source_file / chunk_index in meta — degrade gracefully.
+        out = _expand_with_neighbors(col, "matched doc", {})
+        assert out["text"] == "matched doc"
+        assert out["drawer_index"] is None
+        assert out["total_drawers"] is None
+
+    def test_hybrid_search_enrichment_populates_drawer_index_and_total(self, palace_path):
+        """End-to-end: when a closet boosts a source with many drawers, the
+        enrichment step runs drawer-grep across all chunks of that source
+        and exposes drawer_index + total_drawers on the hit (so the client
+        knows which chunk was expanded around)."""
+        col = get_collection(palace_path)
+        source = "/proj/indexed.md"
+        # Seed 5 drawers for one source file.
+        for i in range(5):
+            col.upsert(
+                ids=[f"drawer_proj_backend_indexed_{i:03d}"],
+                documents=[f"chunk_{i} talks about JWT authentication flow"],
+                metadatas=[
+                    {
+                        "wing": "project",
+                        "room": "backend",
+                        "source_file": source,
+                        "chunk_index": i,
+                        "filed_at": "2026-04-13T00:00:00",
+                    }
+                ],
+            )
+        # Closet pointing at chunk_2 for this source.
+        closets = get_closets_collection(palace_path)
+        closets.upsert(
+            ids=["closet_proj_backend_indexed_01"],
+            documents=["JWT auth|;|→drawer_proj_backend_indexed_002"],
+            metadatas=[{"wing": "project", "room": "backend", "source_file": source}],
+        )
+
+        result = search_memories("JWT authentication", palace_path)
+        assert result["results"]
+        # The hybrid path promotes the closet-agreeing source to drawer+closet.
+        boosted = [h for h in result["results"] if h["matched_via"] == "drawer+closet"]
+        assert boosted, "hybrid search should mark the closet-agreeing source"
+        top = boosted[0]
+        assert top["total_drawers"] == 5
+        assert isinstance(top["drawer_index"], int)
+        # Enriched text must include the grep-best chunk plus one neighbor
+        # on each side (chunk boundary may clip).
+        assert "chunk_" in top["text"]