Addresses CI lint feedback on PR #1391. No behavior change.
- Split `import json, sys` into separate lines (E401)
- Split chained `print(...); sys.exit(1)` into two lines (E702, two occurrences)
- Split inline `if ts: stamps.append(ts)` into two lines (E701)
Verified: `ruff check tools/render_jsonl.py` reports "All checks passed!"
Tool still renders correctly (3 turns from a real JSONL test, identical output to pre-fix).
Adds a brief [!IMPORTANT] callout at the top of the README pointing
users to the urgent announcement at #1388. Claude Code auto-deletes
local JSONL transcripts after 30 days; users without the auto-save
hooks wired are losing transcript data off the rolling window.
Ships 4 small standalone tools at tools/:
- backup_claude_jsonls.sh — rsync ~/.claude/projects/ to a safe folder
- render_jsonl.py — convert JSONL transcripts to readable text
- find_orphan_claude_jsonls.sh — scan backup locations for orphan
Claude Code transcripts (multi-line shape detection + topic preview)
- save.md — Claude Code slash command for manual /save into MemPalace
Tools verified by independent agent against v3.3.4 source.
Read-only on user data. POSIX bash + Python stdlib only.
A transient chromadb exception inside `_get_collection` was swallowed by
the bare `except Exception: return None`, leaving every subsequent tool
call hitting the same poisoned cache silently. The fix wraps the body
in a `for attempt in range(2)` loop: on attempt 0 failure, log via
`logger.exception(...)` and clear `_client_cache` / `_collection_cache`
/ `_metadata_cache` so the next iteration forces `_get_client()` to
rebuild from scratch — that path now re-runs `quarantine_stale_hnsw`
(per #1322), so the second attempt heals the common stale-handle case
automatically. If both attempts fail, return `None` (matches the prior
contract for permanent failures).
Two new tests in `tests/test_mcp_server.py::TestCacheInvalidation`:
- `test_get_collection_retries_once_on_exception` — first attempt raises
via a monkeypatched `_get_client`, second attempt succeeds; assert the
caller gets the collection back, not None.
- `test_get_collection_returns_none_after_two_failures` — both attempts
fail, assert we exhaust the loop and return None (no infinite retry).
Surgical extraction from PR #1286, which carried the same fix idea
(plus a fork-sync bundle that couldn't be merged); credit to the
original author below.
Co-authored-by: Jeffrey Hein <jp@jphein.com>
Five small hardening fixes for the from-sqlite rebuild path, all from
mjc's review on #1310:
- repair.py: drawers collection name now resolves from
MempalaceConfig().collection_name via _drawers_collection_name() (closets
stays fixed by design — AAAK index references drawer IDs by string).
Lines up with the broader configured-collection work in #1312 so that
PR can rebase cleanly on top.
- repair.py: create_collection() moved inside the try block in
_rebuild_one_collection so a Chroma "Collection already exists" failure
surfaces as RebuildPartialError with archive_path, not an unstructured
exception that strands the user without recovery instructions.
- repair.py: rebuild_from_sqlite wraps backend lifetime in try/finally
with backend.close() so PersistentClient handles to dest_palace are
released on every exit path. The from-sqlite path post-dates #1285's
lifecycle hardening of the legacy rebuild, so this needed its own
cleanup.
- cli.py: cmd_repair (from-sqlite mode) now exits non-zero when
rebuild_from_sqlite returns {} (validation refusal sentinel), so
unattended scripts/CI distinguish "invalid inputs" from a successful
rebuild that legitimately found zero rows.
- tests/test_repair.py: test_extract_via_sqlite_returns_all_rows_with_metadata
now asserts every backing segment is scope='METADATA', locking in the
segment-layout assumption against future regressions that point the
JOIN at the VECTOR segment.
New test coverage:
- test_rebuild_from_sqlite_honors_configured_drawer_collection_name
- test_cmd_repair_from_sqlite_validation_refusal_exits_nonzero
- test_cmd_repair_from_sqlite_success_does_not_exit
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both `--mode legacy` and the inline `cli.cmd_repair` rebuild path
call `Collection.count()` as their first read — the same call that
raises `chromadb.errors.InternalError: Failed to apply logs to the
hnsw segment writer` on the corruption class reported in #1308.
Repair would print "Cannot recover — palace may need to be re-mined
from source files" even though the underlying SQLite tables were
fully intact.
The new `--mode from-sqlite` reads `(id, document, metadata)` rows
directly from `chroma.sqlite3` via `segments` → `embeddings` →
`embedding_metadata` joins, never opens a chromadb client against
the corrupt palace, and re-upserts everything into a fresh palace.
- `--source PATH` extracts from a corrupt palace already moved aside
- `--archive-existing` handles the in-place case by renaming the
existing palace to `<palace>.pre-rebuild-<timestamp>` first
- Partial-rebuild failures raise `RebuildPartialError` with the
archive path so users can recover; CLI exits non-zero
- In-place mode calls `SharedSystemClient.clear_system_cache()` to
drop chromadb's process-wide System registry (cross-palace use
does not, to limit blast radius for library callers)
- Source validation runs before any destructive moves
Verified end-to-end recovering a 52,300-row real-world corrupt
palace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- B904: chain OSError/collection errors with "raise ... from e" in
normalize.py and searcher.py so the original traceback is preserved.
- B007: rename unused loop variables to _name in dedup, dialect, layers,
and room_detector_local.
- S110/S112: replace bare "try/except/pass" and "try/except/continue"
with logger.debug(..., exc_info=True) in mcp_server, searcher,
palace, palace_graph, miner, convo_miner, and fact_checker so
background failures are observable without changing behaviour.
A module-level logger ("mempalace_mcp", matching mcp_server/searcher)
is added to the five files that didn't already have one. Configured
ruff checks (E/F/W/C901) and ruff --select B, S110, S112 all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLMConfig accepted any URL scheme from LLM_ENDPOINT / --endpoint,
so a misconfigured endpoint such as file:///etc/passwd would be
passed straight to urllib.request.urlopen. Validate the scheme at
construction time and raise ValueError on anything other than
http/https, preserving the "privacy by architecture" guarantee.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without ensure_ascii=False, non-ASCII characters (e.g. Chinese) in tool
results and JSON-RPC responses are escaped as \uXXXX, which causes
downstream MCP clients to receive escaped text instead of the original
characters. This affects all platforms, not just Windows.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the MCP client sends a malformed or null top-level request, prevent the AttributeError on request.get() by explicitly validating that the request is a dictionary. Returns standard JSON-RPC Error -32600 (Invalid Request) instead of crashing the server.
``search_memories`` computes ``effective_dist = dist - boost`` where
``boost`` can be as large as ``CLOSET_RANK_BOOSTS[0] == 0.40`` for a
rank-0 closet hit. When the raw drawer distance is small — any
near-exact match — the subtraction goes negative.
Two downstream effects:
1. Line 418 returns ``round(max(0.0, 1 - effective_dist), 3)`` as
``similarity``. With ``effective_dist = -0.30`` that yields
``similarity = 1.30``, outside the documented ``[0, 1]`` range.
The ``max(0.0, ...)`` only prevents negative similarities; it does
not cap above 1.
2. Line 427 stores ``_sort_key: effective_dist`` and line 435 sorts
``scored`` ascending by that key. A negative key drops *below* the
rest, so the strongest hybrid matches end up sorting after weaker
ones — ranking inversion under the exact conditions hybrid retrieval
is supposed to serve best.
Clamp ``effective_dist`` to the valid cosine-distance range ``[0, 2]``.
The boost still wins (closet-backed hit still ranks first), it just no
longer flips the order.
Test added: mock drawer_col (base dist 0.08 / 0.35 for two sources) +
closet_col (rank-0 closet for the 0.08 source) → assert all hits have
``0 <= similarity <= 1`` and ``0 <= effective_distance <= 2``, and that
the closet-boosted source still ranks first.
Relationship to other PRs:
* **#988** clamps the output ``similarity`` alone. That does not fix
the sort-key inversion or the invalid ``effective_distance`` in the
returned dict. This PR clamps at the arithmetic source so both
downstream users of the value stay in range.
* Orthogonal to **#979** (``tool_check_duplicate`` negative similarity).
ChromaDB can return None entries in metadatas/documents lists under
partial-flush, mid-delete, upgrade-boundary, and interrupted-mine
states. Add `meta = meta or {}` and `doc = doc or ""` guards in the
three result loops (search display, closet hybrid, drawer scored) so
.get() and .strip() calls never crash on None.
Fixes#1007, #1011
The alias was placed below an explanatory comment block introduced by
#1305, which trips ruff E402 (module-level import not at top of file).
Moved next to the existing 'from mempalace.hooks_cli import (...)' line.
CI lint went red on develop after #1305 merged with the failing check;
this re-greens it so subsequent PRs do not inherit the failure.
Bundled CHANGELOG entries for the seven Tier-1 PRs merged today, including
the behavior-change call-out for #1167 (KG date validators now reject
non-ISO inputs that previously produced silent empty results).
Without this, on ext4 (and similar) filesystems the rename ack does not
guarantee durability across power loss — a crash can revert to a state
where the temp file is present and the target is at the old version.
Suggested by @jphein on #1215.
EntityRegistry.save() called Path.write_text() directly, which truncates
the target file and then writes — so a crash mid-write (power loss, OOM,
filesystem-full mid-flush) leaves an empty or half-written
entity_registry.json. The whole people/projects map is lost; the system
falls back to an empty registry on next load.
Switch to the standard atomic-write pattern: serialize to a sibling
.tmp file in the same directory (so os.replace stays on one filesystem),
fsync, chmod 0o600, then os.replace over the target. The replace is
atomic on POSIX and Windows, so any crash leaves the previous registry
intact instead of a truncated file.
Tests cover: no leftover .tmp on success, and previous content preserved
when os.replace itself raises mid-save.