7192552624a089e73162a94e1c68d6c7c0da3d93
8 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7192552624 |
test: make diary state path assertion platform-neutral
The Windows CI job failed on:
assert '/.mempalace/state/' in str(state_path)
because Windows uses ``\`` as the path separator, so the substring
never matches. The behavior under test (state file lives outside the
diary dir, under ``~/.mempalace/state/``) is already correct on both
platforms — only the assertion was Unix-only.
Switch to ``state_path.parent`` comparisons that work on any OS.
|
||
|
|
6b7dcc53d4 |
merge: pr/closet-llm-generic + harden LLM regen path for production
Brings in PR #793 (optional LLM-based closet regeneration via user-configured OpenAI-compatible endpoint) and PR #795 (hybrid closet+drawer search — closets boost, never gate). Stack: #784 → #788 → #789 → #790 → #791 → #792 → #793 (+ #795). Findings hardened on our side ───────────────────────────── 1) closet_llm.regenerate_closets didn't use the blessed palace helpers. Before: * manual closets_col.get(where=...) + .delete(ids=...) with a silent ``except Exception: pass`` around both — if the purge failed, pre-existing regex closets survived alongside fresh LLM closets, giving the searcher double hits for the same source. * ``source.split('/')[-1][:30]`` to build the closet_id — quietly wrong on Windows paths (``C:\\proj\\a.md`` has no ``/``, so the whole string ends up in the ID). * no mine_lock around purge+upsert — a concurrent regex rebuild of the same source could interleave with our purge and leave a mix of regex and LLM pointers. * no ``normalize_version`` stamp on the LLM closets — the miner's stale-version gate would treat them as leftovers from an older schema and rebuild over them on the next mine. After: routes through ``purge_file_closets`` + ``mine_lock`` + ``os.path.basename`` + ``NORMALIZE_VERSION`` stamp. Regression tests cover each. 2) searcher.search_memories was still closet-first. PR #795 merged into #793's head to fix the recall regression documented in that PR (R@1 0.25 on narrative content vs. 0.42 baseline). The hybrid design makes closets a ranking boost rather than a gate: drawers are always queried at the floor, and matching closet hits (rank 0-4 within CLOSET_DISTANCE_CAP=1.5) add a boost of 0.40/0.25/0.15/0.08/0.04 to the effective distance. Merged to take the incoming hybrid design, with two cleanups: * kept the ``_expand_with_neighbors`` / ``_extract_drawer_ids_from_closet`` helpers as separately-tested utilities (still imported by tests and future callers); * replaced the fragile ``source_file.endswith(basename)`` reverse- lookup in the enrichment step with internal ``_source_file_full`` / ``_chunk_index`` fields stripped before return, so enrichment doesn't silently pick the wrong path when two sources share a basename across directories; * drawer-grep enrichment now sorts by ``chunk_index`` before neighbor expansion, so ``best_idx ± 1`` corresponds to actual document order rather than whatever order Chroma returned. 3) Closet-first tests in test_closets.py (``TestSearchMemoriesClosetFirst``, end-to-end ``test_closet_first_search_includes_drawer_index_and_total``) pinned contracts that the hybrid path now violates (``matched_via`` went from ``"closet"`` to ``"drawer+closet"``). Rewrote them around the new invariant: direct drawers are always the floor, closet agreement flips the hit's matched_via and exposes closet_preview. Verification ──────────── * 805/805 pass under ``uv run pytest tests/ -v --ignore=tests/benchmarks`` (13 new tests from PR #793 + 5 from PR #795 + 2 new regressions for the closet_llm hardening + the rewritten hybrid assertions in test_closets.py). * CI-pinned ruff 0.4.x clean on ``mempalace/`` + ``tests/`` (check + format both pass). * No new deps — closet_llm.py still uses stdlib ``urllib.request`` per the PR's "zero new dependencies" promise. Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com> |
||
|
|
e9201fb617 |
merge: pr/cross-wing-tunnels + rebuild drawer-grep on hardened closet path
Merges the full hardened stack (#788 closets, #789 entity/BM25/diary, #790 tunnels) and reimplements the drawer-grep feature in a way that composes with the chunk-level closet-first search instead of fighting it. ## Background The original PR added "drawer-grep" on top of the pre-hardening closet code that returned whole-file blobs. My #788 hardening changed that path to return *chunk-level* hits by parsing each closet's ``→drawer_id`` pointers and hydrating exactly those drawers. That made the original drawer-grep grep-over-all-drawers logic redundant — the closet already points at the relevant chunk. What remained valuable from the original PR was the *context expansion* idea: a chunk boundary can clip a thought mid-stride (matched chunk says "here's a breakdown:" and the breakdown lives in the next chunk), so callers want ±1 neighbor chunks for free rather than a follow-up get_drawer call. ## Change New ``_expand_with_neighbors(drawers_col, doc, meta, radius=1)`` helper in searcher.py: * Reads ``source_file`` + ``chunk_index`` from the matched drawer's metadata. * Fetches the ±radius sibling chunks in a SINGLE ChromaDB query using ``$and + $in`` — no "fetch all drawers for source" blowup. * Sorts retrieved chunks by chunk_index, joins with ``\n\n``. * Does a cheap metadata-only second query to compute ``total_drawers`` so callers know where in the file they landed. * Graceful fallback to the matched doc alone on any ChromaDB failure or missing metadata — search never breaks because expansion failed. ``_closet_first_hits`` now calls this helper and tags each hit with ``drawer_index`` + ``total_drawers``. Hit shape stays consistent with the direct-search path (both still carry ``matched_via``) so callers can't tell which path produced a given hit except via that field. ## Tests 6 new cases in TestDrawerGrepExpansion: * neighbors returned in chunk_index order (not hash order) * edge case: matched chunk at index 0 — only next neighbor surfaces * edge case: matched chunk at last index — only prev neighbor surfaces * edge case: 1-drawer file — returns just the matched doc * missing/non-int chunk_index metadata — graceful fallback * end-to-end via ``search_memories`` — closet-first hit carries drawer_index, total_drawers, and includes ±1 neighbors 761/761 suite pass; ruff + format clean on CI-pinned 0.4.x. Merge resolutions: miner.py kept develop's purge+NORMALIZE_VERSION; searcher.py dropped the old whole-file-blob block entirely in favor of rebuilding context expansion on top of ``_closet_first_hits``; test_closets.py took develop's 47-test baseline and appended TestDrawerGrepExpansion. |
||
|
|
20255b05be |
merge: develop + harden cross-wing tunnels for production
Merges the hardened closet/entity/BM25/diary stack from #789 and fixes five correctness/durability issues in the tunnels module plus the directional/symmetric design question. ## Design: tunnels are now symmetric Per review discussion: a tunnel represents "these two things relate", not "A causes B". The canonical ID now hashes the *sorted* endpoint pair, so ``create_tunnel(A, B)`` and ``create_tunnel(B, A)`` resolve to the same record and the second call updates the label rather than creating a duplicate. ``follow_tunnels`` can be called from either endpoint and surfaces the other side consistently. The returned dict still preserves ``source``/``target`` in the order the caller supplied, so UIs that want to render the connection directionally can do so. ## Correctness fixes * **Atomic write** — ``_save_tunnels`` writes to ``tunnels.json.tmp`` and ``os.replace``s it into place. A crash mid-write can no longer leave a truncated file that silently reads back as ``[]`` and wipes every tunnel. Includes ``f.flush() + os.fsync`` before replace on platforms that support it. * **Concurrent-write lock** — ``create_tunnel`` and ``delete_tunnel`` wrap the load→mutate→save cycle in ``mine_lock(_TUNNEL_FILE)``. Without this, two agents creating tunnels simultaneously would both read the same snapshot and the later writer would drop the earlier writer's tunnel. * **Corrupt-file tolerance** — ``_load_tunnels`` now uses a context manager, validates that the loaded JSON is a list, and returns ``[]`` for any read failure. Subsequent ``create_tunnel`` then overwrites the corrupt file via atomic write — no manual recovery needed. * **Input validation** — new ``_require_name`` helper rejects empty or whitespace-only wing/room names with a clear ``ValueError``. Prevents phantom tunnels with blank endpoints from ever reaching the JSON store. * **Timezone-aware timestamps** — ``created_at`` / ``updated_at`` now use ``datetime.now(timezone.utc).isoformat()``, matching diary ingest and other recent modules. ## Tests (12 in TestTunnels) 5 original + 7 regression cases: * ``test_tunnel_is_symmetric`` — A↔B and B↔A dedupe to one record. * ``test_follow_tunnels_works_from_either_endpoint`` — symmetric surface. * ``test_empty_endpoint_fields_rejected`` — validation guard. * ``test_corrupt_tunnel_file_does_not_lose_new_writes`` — truncated JSON treated as empty; next create persists cleanly. * ``test_atomic_write_leaves_no_stray_tmp_file`` — no leftover ``.tmp``. * ``test_concurrent_creates_preserve_all_tunnels`` — 5 threads each create a distinct tunnel; all 5 persisted (regression for the read-modify-write race). * ``test_created_at_is_timezone_aware`` — ISO8601 has tz suffix. Merge resolutions: tests/test_closets.py combined develop's hardened closet/entity/BM25/diary tests with this PR's TestTunnels class. 755/755 tests pass. ruff + format clean under CI-pinned 0.4.x. |
||
|
|
32d7f4376b |
merge: develop + harden entity metadata, BM25, and diary ingest for production
Merges develop (closet hardening #826, strip_noise #785, lock #784) and replaces every sub-feature in this PR with a correct, tested implementation. Shippable now. ## 1. Real Okapi-BM25 (searcher.py) The prior `_bm25_score()` hardcoded `idf = log(2.0)` for every term — it was really a scaled TF, not BM25, and couldn't tell a discriminative term from a generic one. Replaced with `_bm25_scores(query, documents)` that computes proper IDF over the provided candidate corpus using the Lucene smoothed formula `log((N - df + 0.5) / (df + 0.5) + 1)`. Well- defined for re-ranking vector-retrieval candidates — IDF there measures how discriminative each term is *within the candidate set*, exactly the signal we want. `_hybrid_rank` also fixed: - Vector normalization is now absolute `max(0, 1 - dist)`, not `1 - dist/max_dist` — adding/removing a candidate no longer reshuffles the others. - BM25 is min-max normalized within candidates (bounded [0, 1]). - Closet path now re-ranks too (was previously returning closet-order hits without hybrid scoring). - `_hybrid_score` internal field stripped from output; `bm25_score` exposed for debugging. ## 2. Entity metadata (miner.py) - Reuses `_ENTITY_STOPLIST` from palace.py so sentence-starters like "When", "After", "The" no longer land as entities (regression test covers this). - Known-entity registry is cached at module level, keyed by the registry file's mtime — no more disk read per drawer. - File handle now uses a context manager. - Truncates the entity LIST (to 25) before joining — never splits a name in the middle. ## 3. Diary ingest (diary_ingest.py) - State file now lives at `~/.mempalace/state/diary_ingest_<hash>.json`, keyed by (palace_path, diary_dir). No more pollution of the user's content directory. - Drawer IDs now hash `(wing, date_str)` — a user with personal + work diaries on the same day no longer silently clobbers. - Each day's upsert runs inside `mine_lock(source_file)` so concurrent ingest from two terminals can't race. - `force=True` now calls `purge_file_closets` before rebuild so leftover numbered closets from a longer prior day don't orphan. ## 4. Tests (tests/test_closets.py) Merged this PR's MineLock/Entity/BM25/Diary tests with develop's hardened Build/Upsert/Purge/Rebuild/SearchClosetFirst tests. Added specific regression tests for every fix above: - entity stoplist applies (no "When/After/The") - entity list capped before join (no partial tokens) - registry cached by mtime (mock-verified zero re-reads) - BM25 IDF downweights terms present in every doc (real BM25 evidence) - hybrid rank absolute normalization stable against outliers - diary state file outside user's diary dir - diary wing-prefixed IDs prevent cross-wing date collisions 35/35 closet tests pass; full suite 743/743. ruff + format clean under CI-pinned 0.4.x. |
||
|
|
21d4a23430 |
merge: develop + harden closet layer for production
Merges develop (#820 version sync, #785 strip_noise + NORMALIZE_VERSION, #784 file locking) and addresses six concerns surfaced during PR review of the closet feature: 1. Closet append-on-rebuild bug — upsert_closet_lines used to APPEND to existing closets (mismatched the doc's "fully replaced" promise). With NORMALIZE_VERSION rebuilds on develop, this would have stacked stale v1 topics on top of fresh v2 content forever. Fix: - Drop the read-and-append branch from upsert_closet_lines (now a pure numbered-id overwrite). - Add purge_file_closets(closets_col, source_file) helper that wipes every closet for a source file by where-filter. - process_file calls purge_file_closets before upsert on every mine, mirroring the existing drawer purge. 2. Searcher returned whole-file blobs from the closet path while the direct path returned chunk-level drawers. Refactored: - _extract_drawer_ids_from_closet parses the `→drawer_a,drawer_b` pointers out of closet documents. - _closet_first_hits hydrates exactly those drawer IDs (chunk-level), not collection.get(where=source_file) (which returned everything). - Same hit shape as direct-search path; both now carry matched_via. 3. max_distance was bypassed on the closet path. Now applied per-hit; when every closet candidate gets filtered, _closet_first_hits returns None and the caller falls through to direct drawer search. 4. Entity extraction caught sentence-starters like "When", "The", "After" as proper nouns. Added _ENTITY_STOPLIST (~40 common false positives + day/month names + role words). Real names like Igor / Milla still survive — covered by tests. 5. CLOSETS.md drifted from the code (claimed "replaced via upsert" but code appended; claimed BM25 hybrid that doesn't exist; claimed a 10K char hydration cap that wasn't enforced). Rewritten to describe what actually ships, with explicit notes on the BM25 / convo-closet follow-ups. 6. Zero tests for ~250 lines. Added tests/test_closets.py with 17 cases: - build_closet_lines: pointer shape, header extraction, stoplist filtering (with regression case for "When/After/The"), real-name survival, fallback-line guarantee, drawer-ref slicing. - upsert_closet_lines: pure overwrite semantics (regression for the append bug), char-limit packing without splitting lines. - purge_file_closets: scoped to source_file, doesn't touch others. - End-to-end miner rebuild: re-mining a file with fewer topics fully purges leftover numbered closets from the larger first run. - _extract_drawer_ids_from_closet: parsing + dedup edge cases. - search_memories closet-first: fallback when empty, chunk-level hits with matched_via, no whole-file glue, max_distance enforced. Merge resolutions: miner.py imports combined NORMALIZE_VERSION/mine_lock from develop with the closet helpers from this branch. process_file auto-merged cleanly (closet block sits inside develop's lock body). 724/724 tests pass. ruff + format clean under CI-pinned 0.4.x. |
||
|
|
e2a9bb05d3 |
test: add TestTunnels for cross-wing tunnel operations
Appended from Milla's omnibus test_closets.py — covers create, list, delete, dedup, and follow_tunnels behavior. 21/21 pass. Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com> |
||
|
|
f72ffbbcb2 |
test: add tests for mine_lock, closets, entity metadata, BM25, diary
Trimmed version of Milla's omnibus test_closets.py to only cover features present in this PR stack (#784 lock, #788 closets, this PR's entity/BM25/diary). Strip-noise tests will land with #785; tunnel tests will land with the tunnels PR. 16/16 pass. Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com> |