Symptom on the canonical disks daemon: drift quarantines firing every
10–30 minutes throughout the day under steady write load. Logs show
.drift-* directories accumulating despite the daemon being the only
writer (no Syncthing replication of palace data).
Root cause is a false-positive thrash in the quarantine heuristic:
- chroma.sqlite3 mtime bumps on every write (millisecond cadence).
- HNSW segment files (data_level0.bin) only flush to disk on
chromadb's internal cadence, which can lag minutes behind sqlite
under continuous write load.
Once the gap exceeds the 300s threshold, quarantine_stale_hnsw renames
a perfectly valid HNSW segment, chromadb rebuilds it from scratch, and
the cycle repeats as soon as the next batch of writes lands. The 300s
threshold (lowered from 3600s in PR #1173 after a 0.96h-drift production
segfault) is correct for the cross-machine-replication failure mode it
was designed for, but wrong for a daemon-strict deployment whose only
"drift" source is its own benign flush lag.
Fix: gate the proactive quarantine check to the first ``make_client()``
invocation per palace per process (``ChromaBackend._quarantined_paths``
set). Real cold-start drift (replication, partial restore, crashed-mid-
write) still gets caught — that's exactly when a fresh daemon process
opens the palace. Real runtime drift on observed HNSW errors still gets
caught via palace-daemon's ``_auto_repair`` which calls
``quarantine_stale_hnsw`` directly, bypassing this gate.
Two new tests in test_backends.py verify single-fire-per-palace and
per-palace independence. Conftest clears the gate between tests.
Suite 1362/1362 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChromaDB defaults HNSW index to L2 (Euclidean) distance, but
MemPalace scoring uses 1-distance which requires cosine (range 0-2).
Add metadata={"hnsw:space": "cosine"} to the 4 production and 3 test
call sites that were missing it.
Closes#218
The MCP server previously created a new PersistentClient on every tool
call via _get_collection(). This incurs HNSW index loading overhead
on each request.
Cache the client and collection at module level. The cache resets
naturally on process restart (MCP runs as a subprocess).
Also adds a _reset_mcp_cache fixture to conftest.py for test isolation.
Includes test infrastructure from PR #131.
92 tests pass.
- Switch CI install step from `pip install -r requirements.txt` to
`pip install -e ".[dev]"` since requirements.txt was removed
- Add noqa: E402 to intentionally-late imports in conftest.py
(HOME must be isolated before mempalace imports)
- Remove unused KnowledgeGraph import in test_knowledge_graph.py
- Apply ruff formatting to test files