Commit Graph

783 Commits

Author SHA1 Message Date
Igor Lins e Silva 670aba974f test(repair): close ChromaBackend in _seed_palace to release Windows file locks
The helper opened a chromadb PersistentClient via ChromaBackend and never
closed it, leaving rust-side SQLite/HNSW file locks alive after the
helper returned. On Windows that blocks the in-place archive rename
inside rebuild_from_sqlite with WinError 32 on data_level0.bin,
causing test_rebuild_from_sqlite_in_place_archives_when_opted_in and
test_rebuild_from_sqlite_raises_on_upsert_failure to fail in the
test-windows CI job. No test consumes the returned collection, so the
helper now closes the backend in a try/finally and no longer returns it.
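A minimal sketch of the try/finally shape. It uses a hypothetical `FakeBackend` stand-in (not the real `ChromaBackend` API) whose `close()` plays the role of releasing native file locks. The helper closes the backend on every exit path, including when seeding raises, and returns nothing:

```python
class FakeBackend:
    """Hypothetical stand-in for a backend that holds file locks while open."""

    def __init__(self):
        self.closed = False
        self.rows = []

    def add(self, row):
        if row is None:
            raise ValueError("bad row")
        self.rows.append(row)

    def close(self):
        # A real backend would release its native file handles here.
        self.closed = True


def seed_palace(backend, rows):
    """Seed rows and always close the backend; nothing is returned."""
    try:
        for row in rows:
            backend.add(row)
    finally:
        backend.close()


ok = FakeBackend()
seed_palace(ok, ["a", "b"])

failing = FakeBackend()
try:
    seed_palace(failing, ["a", None])
except ValueError:
    pass  # the error still propagates, but close() has already run
```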
2026-05-07 07:37:25 -03:00
Igor Lins e Silva 8d8f54a807 Merge remote-tracking branch 'origin/develop' into fix/1308-rebuild-from-sqlite 2026-05-07 07:30:56 -03:00
Igor Lins e Silva 435f0ad348 Merge pull request #1391 from MemPalace/docs/auto-save-tools-on-develop
docs: add 30-day expiry callout + ship 4 auto-save tools
2026-05-06 20:18:44 -03:00
MillaJ 7c679ba625 fix(tools/render_jsonl): apply ruff format
Earlier commit fixed ruff lint but missed the formatter check.
This applies `ruff format`: it adds the standard PEP 8 blank lines
between functions and splits one inline list. No behavior change.

Verified: both `ruff format --check` and `ruff check` pass cleanly.
Tool still renders correctly.
2026-05-06 16:12:34 -07:00
MillaJ 921ff5a6fa fix(tools/render_jsonl): split chained statements per ruff 0.4.x
Addresses CI lint feedback on PR #1391. No behavior change.
- Split `import json, sys` into separate lines (E401)
- Split chained `print(...); sys.exit(1)` into two lines (E702, two occurrences)
- Split inline `if ts: stamps.append(ts)` into two lines (E701)

Verified: `ruff check tools/render_jsonl.py` reports "All checks passed!"
Tool still renders correctly (3 turns from a real JSONL test, identical output to pre-fix).
2026-05-06 15:39:08 -07:00
MillaJ bddba59ae3 docs: add 30-day expiry callout + ship 4 auto-save tools
Adds a brief [!IMPORTANT] callout at the top of the README pointing
users to the urgent announcement at #1388. Claude Code auto-deletes
local JSONL transcripts after 30 days; users who have not wired up the
auto-save hooks are losing transcripts as they age out of that window.

Ships 4 small standalone tools at tools/:
- backup_claude_jsonls.sh — rsync ~/.claude/projects/ to a safe folder
- render_jsonl.py — convert JSONL transcripts to readable text
- find_orphan_claude_jsonls.sh — scan backup locations for orphan
  Claude Code transcripts (multi-line shape detection + topic preview)
- save.md — Claude Code slash command for manual /save into MemPalace

Tools verified by an independent agent against the v3.3.4 source.
Read-only on user data. POSIX bash + Python stdlib only.
2026-05-06 13:10:16 -07:00
Igor Lins e Silva f0d236019a Merge pull request #1377 from MemPalace/fix/get-collection-retry-on-exception
fix(mcp): retry _get_collection once on transient failure (#1286)
2026-05-06 05:18:04 -03:00
Igor Lins e Silva e334e257bf fix(mcp): retry _get_collection once on transient failure (#1286)
A transient chromadb exception inside `_get_collection` was swallowed by
the bare `except Exception: return None`, leaving every subsequent tool
call hitting the same poisoned cache silently. The fix wraps the body
in a `for attempt in range(2)` loop: on attempt 0 failure, log via
`logger.exception(...)` and clear `_client_cache` / `_collection_cache`
/ `_metadata_cache` so the next iteration forces `_get_client()` to
rebuild from scratch — that path now re-runs `quarantine_stale_hnsw`
(per #1322), so the second attempt heals the common stale-handle case
automatically. If both attempts fail, return `None` (matches the prior
contract for permanent failures).
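A minimal sketch of the two-attempt loop. It assumes a hypothetical `get_client` callable and uses plain dicts in place of the real module caches; the cache names mirror the message, everything else is illustrative. On the first failure it logs and clears every cache so the second attempt rebuilds from scratch; after two failures it returns `None`:

```python
import logging

logger = logging.getLogger("retry_sketch")

# Plain dicts standing in for the module-level caches named above.
_client_cache = {}
_collection_cache = {}
_metadata_cache = {}


def get_collection(get_client, name="drawers"):
    """Return the cached collection, retrying once after clearing all caches."""
    for attempt in range(2):
        try:
            if "client" not in _client_cache:
                _client_cache["client"] = get_client()
            if name not in _collection_cache:
                _collection_cache[name] = _client_cache["client"].get_collection(name)
            return _collection_cache[name]
        except Exception:
            if attempt == 0:
                logger.exception("get_collection failed; clearing caches and retrying")
                _client_cache.clear()
                _collection_cache.clear()
                _metadata_cache.clear()
    return None  # both attempts failed: same contract as before


class StubClient:
    def get_collection(self, name):
        return f"collection:{name}"


calls = {"n": 0}


def flaky_get_client():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return StubClient()


def broken_get_client():
    raise RuntimeError("permanent failure")


first = get_collection(flaky_get_client)  # "collection:drawers" on the second attempt
_client_cache.clear()
_collection_cache.clear()
second = get_collection(broken_get_client)  # None after two failures, no infinite retry
```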

Two new tests in `tests/test_mcp_server.py::TestCacheInvalidation`:
- `test_get_collection_retries_once_on_exception` — first attempt raises
  via a monkeypatched `_get_client`, second attempt succeeds; assert the
  caller gets the collection back, not None.
- `test_get_collection_returns_none_after_two_failures` — both attempts
  fail, assert we exhaust the loop and return None (no infinite retry).

Surgical extraction from PR #1286, which carried the same fix idea
(plus a fork-sync bundle that couldn't be merged); credit to the
original author below.

Co-authored-by: Jeffrey Hein <jp@jphein.com>
2026-05-06 04:52:18 -03:00
Brian potter d92c741084 fix(repair): address PR #1310 review feedback
Five small hardening fixes for the from-sqlite rebuild path, all from
mjc's review on #1310:

- repair.py: drawers collection name now resolves from
  MempalaceConfig().collection_name via _drawers_collection_name() (closets
  stays fixed by design — AAAK index references drawer IDs by string).
  Lines up with the broader configured-collection work in #1312 so that
  PR can rebase cleanly on top.
- repair.py: create_collection() moved inside the try block in
  _rebuild_one_collection so a Chroma "Collection already exists" failure
  surfaces as RebuildPartialError with archive_path, not an unstructured
  exception that strands the user without recovery instructions.
- repair.py: rebuild_from_sqlite wraps backend lifetime in try/finally
  with backend.close() so PersistentClient handles to dest_palace are
  released on every exit path. The from-sqlite path post-dates #1285's
  lifecycle hardening of the legacy rebuild, so this needed its own
  cleanup.
- cli.py: cmd_repair (from-sqlite mode) now exits non-zero when
  rebuild_from_sqlite returns {} (validation refusal sentinel), so
  unattended scripts/CI distinguish "invalid inputs" from a successful
  rebuild that legitimately found zero rows.
- tests/test_repair.py: test_extract_via_sqlite_returns_all_rows_with_metadata
  now asserts every backing segment is scope='METADATA', locking in the
  segment-layout assumption against future regressions that point the
  JOIN at the VECTOR segment.
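A sketch of two of these contracts under stated assumptions. `rebuild_one_collection`, `repair_exit_code` and the `{"drawers": 0}` success shape are hypothetical; only `RebuildPartialError` carrying an `archive_path` and the `{}` refusal sentinel come from the message above. Keeping `create_collection()` inside the `try` means an "already exists" failure is reported together with the archive location:

```python
class RebuildPartialError(Exception):
    """Raised mid-rebuild; carries where the original palace was archived."""

    def __init__(self, message, archive_path):
        super().__init__(message)
        self.archive_path = archive_path


def rebuild_one_collection(create_collection, rows, archive_path):
    # create_collection() is inside the try, so an "already exists" failure
    # surfaces with recovery information instead of as a bare exception.
    try:
        col = create_collection()
        for row in rows:
            col.append(row)
        return len(col)
    except Exception as e:
        raise RebuildPartialError(f"rebuild failed: {e}", archive_path) from e


def repair_exit_code(result):
    # {} is the validation-refusal sentinel; a dict of zero counts is a real
    # (if empty) rebuild and still exits 0.
    return 1 if result == {} else 0


def already_exists():
    raise ValueError("Collection already exists")


caught = None
try:
    rebuild_one_collection(already_exists, [], "/tmp/palace.pre-rebuild-example")
except RebuildPartialError as err:
    caught = err
```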

New test coverage:
- test_rebuild_from_sqlite_honors_configured_drawer_collection_name
- test_cmd_repair_from_sqlite_validation_refusal_exits_nonzero
- test_cmd_repair_from_sqlite_success_does_not_exit

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:37:22 -03:00
Brian potter cb6bfd5231 chore: gitignore .envrc for direnv users
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:36:39 -03:00
Brian potter a7c4ed24d7 fix(repair): add --mode from-sqlite to recover palaces with corrupt HNSW (#1308)
Both `--mode legacy` and the inline `cli.cmd_repair` rebuild path
call `Collection.count()` as their first read — the same call that
raises `chromadb.errors.InternalError: Failed to apply logs to the
hnsw segment writer` on the corruption class reported in #1308.
Repair would print "Cannot recover — palace may need to be re-mined
from source files" even though the underlying SQLite tables were
fully intact.

The new `--mode from-sqlite` reads `(id, document, metadata)` rows
directly from `chroma.sqlite3` via `segments` → `embeddings` →
`embedding_metadata` joins, never opens a chromadb client against
the corrupt palace, and re-upserts everything into a fresh palace.

  - `--source PATH` extracts from a corrupt palace already moved aside
  - `--archive-existing` handles the in-place case by renaming the
    existing palace to `<palace>.pre-rebuild-<timestamp>` first
  - Partial-rebuild failures raise `RebuildPartialError` with the
    archive path so users can recover; CLI exits non-zero
  - In-place mode calls `SharedSystemClient.clear_system_cache()` to
    drop chromadb's process-wide System registry (cross-palace use
    does not, to limit blast radius for library callers)
  - Source validation runs before any destructive moves
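The extraction idea can be sketched with the standard `sqlite3` module against a simplified, assumed schema. Only the table names `segments`, `embeddings` and `embedding_metadata` and the `scope='METADATA'` filter come from these commits; the column subset and the `chroma:document` key for the document text are assumptions about chromadb's on-disk layout. No chromadb client or HNSW index is opened:

```python
import sqlite3

# Simplified, assumed schema: the real chroma.sqlite3 tables have more columns.
SCHEMA = """
CREATE TABLE segments (id TEXT PRIMARY KEY, scope TEXT, collection TEXT);
CREATE TABLE embeddings (id INTEGER PRIMARY KEY, segment_id TEXT, embedding_id TEXT);
CREATE TABLE embedding_metadata (id INTEGER, key TEXT, string_value TEXT);
"""

QUERY = """
SELECT e.embedding_id, m.key, m.string_value
FROM segments AS s
JOIN embeddings AS e ON e.segment_id = s.id
JOIN embedding_metadata AS m ON m.id = e.id
WHERE s.collection = ? AND s.scope = 'METADATA'
ORDER BY e.id
"""


def extract_rows(conn, collection):
    """Rebuild (id, document, metadata) tuples straight from SQLite."""
    rows = {}
    for emb_id, key, value in conn.execute(QUERY, (collection,)):
        entry = rows.setdefault(emb_id, {"document": None, "metadata": {}})
        if key == "chroma:document":  # assumed key holding the document text
            entry["document"] = value
        else:
            entry["metadata"][key] = value
    return [(emb_id, e["document"], e["metadata"]) for emb_id, e in rows.items()]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO segments VALUES (?, ?, ?)",
    [("seg-meta", "METADATA", "drawers"), ("seg-vec", "VECTOR", "drawers")],
)
conn.executemany(
    "INSERT INTO embeddings VALUES (?, ?, ?)",
    [(1, "seg-meta", "drawer-1"), (2, "seg-vec", "drawer-x")],
)
conn.executemany(
    "INSERT INTO embedding_metadata VALUES (?, ?, ?)",
    [(1, "chroma:document", "hello"), (1, "wing", "projects"), (2, "chroma:document", "ignored")],
)
rows = extract_rows(conn, "drawers")
# [('drawer-1', 'hello', {'wing': 'projects'})]: the VECTOR-segment row is skipped
```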

Verified end-to-end by recovering a 52,300-row real-world corrupt
palace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:36:39 -03:00
Igor Lins e Silva 6741b6908e Merge pull request #1138 from anthonyonazure/fix/bugbear-cleanup-and-endpoint-scheme
fix: reject non-http(s) LLM endpoints + clear ruff bugbear/silent-except findings
2026-05-06 04:28:12 -03:00
Anthony Clendenen ca5899e361 refactor: fix ruff bugbear and silent-except findings
- B904: chain OSError/collection errors with "raise ... from e" in
  normalize.py and searcher.py so the original traceback is preserved.
- B007: rename unused loop variables to _name in dedup, dialect, layers,
  and room_detector_local.
- S110/S112: replace bare "try/except/pass" and "try/except/continue"
  with logger.debug(..., exc_info=True) in mcp_server, searcher,
  palace, palace_graph, miner, convo_miner, and fact_checker so
  background failures are observable without changing behaviour.

A module-level logger ("mempalace_mcp", matching mcp_server/searcher)
is added to the five files that didn't already have one. The configured
ruff checks (E/F/W/C901) and `ruff check --select B,S110,S112` all pass.
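A small sketch of the two patterns, with hypothetical function names (`read_config`, `best_effort_cleanup`); only the logger name comes from the message. `raise ... from e` keeps the original exception as `__cause__`, and the former silent `except: pass` becomes a debug log with the traceback attached, leaving control flow unchanged:

```python
import logging

logger = logging.getLogger("mempalace_mcp")


def read_config(path):
    # B904: chain the OSError so its traceback survives as __cause__.
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as e:
        raise RuntimeError(f"could not read {path}") from e


def best_effort_cleanup(action):
    # S110: log the failure at DEBUG with its traceback instead of silently
    # passing; callers still never see the exception.
    try:
        action()
    except Exception:
        logger.debug("background cleanup failed", exc_info=True)


cause = None
try:
    read_config("/nonexistent/dir/config.json")
except RuntimeError as err:
    cause = err.__cause__

best_effort_cleanup(lambda: 1 / 0)  # no exception escapes
```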

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:12:09 -03:00
Anthony Clendenen b68485dfd4 fix(closet_llm): reject non-http(s) endpoints
LLMConfig accepted any URL scheme from LLM_ENDPOINT / --endpoint,
so a misconfigured endpoint such as file:///etc/passwd would be
passed straight to urllib.request.urlopen. Validate the scheme at
construction time and raise ValueError on anything other than
http/https, preserving the "privacy by architecture" guarantee.
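A minimal sketch of construction-time scheme validation with `urllib.parse.urlparse`; `LLMConfigSketch` is a hypothetical stand-in, not the real `LLMConfig` signature:

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


class LLMConfigSketch:
    """Hypothetical stand-in: validates the endpoint scheme at construction."""

    def __init__(self, endpoint):
        scheme = urlparse(endpoint).scheme.lower()
        if scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"LLM endpoint must use http or https, got scheme {scheme!r}")
        self.endpoint = endpoint


cfg = LLMConfigSketch("http://localhost:11434/v1")

message = ""
try:
    LLMConfigSketch("file:///etc/passwd")
except ValueError as err:
    message = str(err)
# message == "LLM endpoint must use http or https, got scheme 'file'"
```

Validating in the constructor means no code path can reach `urlopen` with an unchecked scheme.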

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:10:05 -03:00
Igor Lins e Silva 01880f674d Merge pull request #1019 from cantenesse/swe/session-1-bug-mempalace-search-crashes-with-attrib
fix(searcher): guard against None metadata/doc in search result loops
2026-05-06 03:35:07 -03:00
Igor Lins e Silva 8a9b2bed63 Merge pull request #988 from bobo-xxx/clawoss/fix/978-negative-similarity
fix: clamp similarity scores to [0,1] to prevent negative values
2026-05-06 03:34:56 -03:00
Igor Lins e Silva f3d9801c73 Merge pull request #1293 from hzx945627450-eng/fix/mcp-ensure-ascii
fix: MCP server JSON output ensure_ascii=False for non-ASCII support
2026-05-06 03:34:32 -03:00
Igor Lins e Silva 9b24cfc93b Merge pull request #987 from alpiua/fix-mcp-null-payload
fix(mcp): handle null JSON-RPC request payloads safely
2026-05-06 03:33:45 -03:00
Igor Lins e Silva f4617b3d83 Merge pull request #1029 from eldar702/fix/searcher-effective-distance-clamp
fix(searcher): clamp effective_distance to valid cosine range [0, 2]
2026-05-06 03:20:59 -03:00
bobo-xxx f2bed9284f fix(layers): clamp similarity to [0,1] to avoid negative values 2026-05-06 02:20:47 -03:00
bobo-xxx eef053d750 fix(mcp_server): clamp similarity to [0,1] to avoid negative values 2026-05-06 02:20:47 -03:00
Igor Lins e Silva 74288f1cdd style: ruff format mcp_server.py (CI lint) 2026-05-06 02:20:00 -03:00
黄祖鑫(940219) 7b49478ef7 fix: MCP server JSON output ensure_ascii=False for non-ASCII support
Without ensure_ascii=False, non-ASCII characters (e.g. Chinese) in tool
results and JSON-RPC responses are escaped as \uXXXX, which causes
downstream MCP clients to receive escaped text instead of the original
characters. This affects all platforms, not just Windows.
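The difference is visible with the standard library alone (the payload is illustrative). Both strings decode to the same object; only what the client receives on the wire changes:

```python
import json

payload = {"text": "中文"}

escaped = json.dumps(payload)  # default ensure_ascii=True: '{"text": "\\u4e2d\\u6587"}'
readable = json.dumps(payload, ensure_ascii=False)  # '{"text": "中文"}'
```

Writing `readable` to stdout only works if the stream can encode it, which is what the UTF-8 stdio changes in this batch address.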

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 02:20:00 -03:00
Igor Lins e Silva 869ab38095 style: ruff format mcp_server.py + test_mcp_server.py (CI lint) 2026-05-06 02:19:57 -03:00
Oleksii Pylypchuk a85d432b54 feat: add validation for missing name parameter in tools/call requests 2026-05-06 02:19:57 -03:00
Oleksii Pylypchuk 55d79dc8cd fix: include null id in JSON-RPC invalid request error responses and add validation tests 2026-05-06 02:19:57 -03:00
Oleksii Pylypchuk 0fdb480e12 fix(mcp): handle null JSON-RPC request payloads safely
When the MCP client sends a malformed or null top-level request, prevent the AttributeError on request.get() by explicitly validating that the request is a dictionary. Returns standard JSON-RPC Error -32600 (Invalid Request) instead of crashing the server.
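A minimal sketch of the guard, with a hypothetical `handle_request` whose success branch is a placeholder. Per JSON-RPC 2.0, when no request id can be recovered the error response carries `"id": null`:

```python
import json


def handle_request(request):
    """Reject non-object requests with -32600 instead of raising AttributeError."""
    if not isinstance(request, dict):
        # No id can be recovered from a non-object, so JSON-RPC 2.0 requires null.
        return {
            "jsonrpc": "2.0",
            "id": None,
            "error": {"code": -32600, "message": "Invalid Request"},
        }
    # Placeholder for real dispatch; request.get() is now safe to call.
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "result": {"method": request.get("method")},
    }


response = handle_request(json.loads("null"))
print(json.dumps(response))
# {"jsonrpc": "2.0", "id": null, "error": {"code": -32600, "message": "Invalid Request"}}
```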
2026-05-06 02:19:57 -03:00
Igor Lins e Silva aac8437979 style: ruff format tests/test_searcher.py (CI lint) 2026-05-06 02:19:54 -03:00
eldar702 5347c2c71c fix(searcher): clamp effective_distance to valid cosine range [0, 2]
``search_memories`` computes ``effective_dist = dist - boost`` where
``boost`` can be as large as ``CLOSET_RANK_BOOSTS[0] == 0.40`` for a
rank-0 closet hit. When the raw drawer distance is small — any
near-exact match — the subtraction goes negative.

Two downstream effects:

1. Line 418 returns ``round(max(0.0, 1 - effective_dist), 3)`` as
   ``similarity``. With ``effective_dist = -0.30`` that yields
   ``similarity = 1.30``, outside the documented ``[0, 1]`` range.
   The ``max(0.0, ...)`` only prevents negative similarities; it does
   not cap above 1.
2. Line 427 stores ``_sort_key: effective_dist`` and line 435 sorts
   ``scored`` ascending by that key. A negative key drops *below* the
   rest, so the strongest hybrid matches end up sorting after weaker
   ones — ranking inversion under the exact conditions hybrid retrieval
   is supposed to serve best.

Clamp ``effective_dist`` to the valid cosine-distance range ``[0, 2]``.
The boost still wins (closet-backed hit still ranks first), it just no
longer flips the order.
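A worked sketch of the clamp using the distances from this commit's test (0.08 with a rank-0 boost of 0.40, and 0.35 unboosted); `score` is a hypothetical helper, not the code in `search_memories`:

```python
def score(dist, boost=0.0):
    """Clamp the boosted distance to the cosine range [0, 2] before deriving similarity."""
    effective_dist = min(2.0, max(0.0, dist - boost))
    similarity = round(max(0.0, 1 - effective_dist), 3)
    return effective_dist, similarity


# Without the clamp, 0.08 - 0.40 is negative and similarity overshoots 1.
unclamped = round(max(0.0, 1 - (0.08 - 0.40)), 3)  # 1.32

hits = {"closet_backed": score(0.08, boost=0.40), "plain": score(0.35)}
ranked = sorted(hits, key=lambda name: hits[name][0])  # ascending effective distance
# hits["closet_backed"] == (0.0, 1.0); ranked[0] == "closet_backed"
```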

Test added: mock drawer_col (base dist 0.08 / 0.35 for two sources) +
closet_col (rank-0 closet for the 0.08 source) → assert all hits have
``0 <= similarity <= 1`` and ``0 <= effective_distance <= 2``, and that
the closet-boosted source still ranks first.

Relationship to other PRs:

* **#988** clamps the output ``similarity`` alone. That does not fix
  the sort-key inversion or the invalid ``effective_distance`` in the
  returned dict. This PR clamps at the arithmetic source so both
  downstream users of the value stay in range.
* Orthogonal to **#979** (``tool_check_duplicate`` negative similarity).
2026-05-06 02:19:54 -03:00
Chris Antenesse 733e435332 fix(searcher): guard against None metadata/doc in search result loops
ChromaDB can return None entries in metadatas/documents lists under
partial-flush, mid-delete, upgrade-boundary, and interrupted-mine
states. Add `meta = meta or {}` and `doc = doc or ""` guards in the
three result loops (search display, closet hybrid, drawer scored) so
.get() and .strip() calls never crash on None.
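The guard pattern in isolation, with a hypothetical `render_results` and an illustrative `wing` metadata key:

```python
def render_results(documents, metadatas):
    """Format hits, tolerating None entries in either result list."""
    lines = []
    for doc, meta in zip(documents, metadatas):
        meta = meta or {}  # None metadata becomes an empty dict
        doc = doc or ""  # None document becomes an empty string
        lines.append(f"[{meta.get('wing', '?')}] {doc.strip()}")
    return lines


print(render_results(["  first hit ", None], [{"wing": "work"}, None]))
# ['[work] first hit', '[?] ']
```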

Fixes #1007, #1011
2026-05-06 01:59:24 -03:00
Igor Lins e Silva 46d9eb5df0 Merge pull request #1375 from MemPalace/fix/lint-e402-test-hooks-cli
fix(lint): hoist hooks_cli_mod import to top of test_hooks_cli (E402)
2026-05-06 01:58:29 -03:00
Igor Lins e Silva f854da779f fix(lint): hoist hooks_cli_mod import to top of test_hooks_cli (E402)
The alias was placed below an explanatory comment block introduced by
#1305, which trips ruff E402 (module-level import not at top of file).
Moved next to the existing 'from mempalace.hooks_cli import (...)' line.

CI lint went red on develop after #1305 merged with the failing check;
this re-greens it so subsequent PRs do not inherit the failure.
2026-05-06 01:57:44 -03:00
Igor Lins e Silva 67cda9d455 Merge pull request #1030 from eldar702/fix/none-metadata-residual-guards
fix: guard None metadata/doc in tool_check_duplicate and Layer1/Layer2
2026-05-06 01:51:24 -03:00
Igor Lins e Silva 0c8314f919 Merge pull request #1060 from alonehobo/fix/stdio-utf8
fix(mcp): force UTF-8 on stdio to fix -32000 on non-ASCII payloads (Windows)
2026-05-06 01:49:56 -03:00
Igor Lins e Silva 642a073305 Merge pull request #1114 from Sathvik-1007/fix/list-drawers-pagination-total
fix: add total count to tool_list_drawers pagination response
2026-05-06 01:49:37 -03:00
Igor Lins e Silva d9ab5b7fd3 Merge pull request #1305 from lcatlett/upstream/respect-absent-palace-dir
fix(hooks): treat absent ~/.mempalace as auto-save off
2026-05-06 01:49:22 -03:00
Igor Lins e Silva ea6f2c0c4c Merge pull request #1162 from imtylervo/fix/palace-write-lock-queue-pattern
fix: serialize ChromaCollection writes through palace lock
2026-05-06 01:48:51 -03:00
Igor Lins e Silva d1e27b8c42 style: ruff format new test files (CI lint) 2026-05-06 01:47:46 -03:00
Igor Lins e Silva 5ae83d8ec3 Merge pull request #1370 from MemPalace/docs/changelog-v3.3.5-batch1
docs(changelog): batch entries for 7 v3.3.5 fixes
2026-05-06 01:40:21 -03:00
Igor Lins e Silva 2c0ef2c04e docs(changelog): document v3.3.5 fixes from #1214 #1105 #1215 #1107 #1282 #1167 #1160
Bundled CHANGELOG entries for the seven Tier-1 PRs merged today, including
the behavior-change call-out for #1167 (KG date validators now reject
non-ISO inputs that previously produced silent empty results).
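A sketch of boundary validation in the spirit of #1167 and #1214, assuming date-only `YYYY-MM-DD` inputs parsed with `datetime.date.fromisoformat`; `validate_interval` is hypothetical and the real validators may accept other ISO-8601 forms:

```python
from datetime import date


def validate_interval(valid_from=None, valid_to=None):
    """Parse optional ISO dates; reject non-ISO input and inverted intervals."""
    parsed = []
    for name, value in (("valid_from", valid_from), ("valid_to", valid_to)):
        if value is None:
            parsed.append(None)
            continue
        try:
            parsed.append(date.fromisoformat(value))
        except ValueError as e:
            raise ValueError(f"{name} must be an ISO-8601 date, got {value!r}") from e
    start, end = parsed
    if start is not None and end is not None and end < start:
        raise ValueError("valid_to is earlier than valid_from")
    return start, end


def rejects(*args):
    """True if validate_interval refuses the input instead of returning silently."""
    try:
        validate_interval(*args)
    except ValueError:
        return True
    return False


print(validate_interval("2026-01-01", "2026-05-06"))
# (datetime.date(2026, 1, 1), datetime.date(2026, 5, 6))
print(rejects("05/06/2026"))  # True: not ISO-8601
print(rejects("2026-05-06", "2026-01-01"))  # True: inverted interval
```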
2026-05-06 01:38:57 -03:00
Igor Lins e Silva 53675dd194 Merge pull request #1160 from mvalentsev/fix/mcp-kg-lazy-per-path-cache
fix(mcp): lazy per-path KnowledgeGraph cache (#1136)
2026-05-06 01:33:47 -03:00
Igor Lins e Silva 7ede231da9 Merge pull request #1167 from arnoldwender/fix/kg-date-validation
fix(kg): validate ISO-8601 date formats at MCP boundary
2026-05-06 01:33:27 -03:00
Igor Lins e Silva 3824ea610c Merge pull request #1282 from mvalentsev/fix/fact-checker-stdio-utf8
fix(cli, fact-checker): reconfigure stdio to UTF-8 on Windows
2026-05-06 01:33:15 -03:00
Igor Lins e Silva 778f830cd0 Merge pull request #1107 from sha2fiddy/fix/1073-closet-llm-paginate
fix: paginate closet_llm col.get (#1073)
2026-05-06 01:33:04 -03:00
Igor Lins e Silva e18981a527 Merge pull request #1215 from arnoldwender/fix/entity-registry-atomic-write
fix(entity_registry): atomic write to prevent partial corruption on crash
2026-05-06 01:32:46 -03:00
Igor Lins e Silva ef0e45ad92 Merge pull request #1105 from mvalentsev/fix/chroma-backend-close-releases-lock
fix(backends/chroma): release SQLite file lock on close_palace/close (#1067)
2026-05-06 01:32:30 -03:00
Igor Lins e Silva 0cfb4b3ef1 Merge pull request #1214 from arnoldwender/fix/kg-temporal-inversion-guard
fix(kg): reject inverted intervals in add_triple (valid_to < valid_from)
2026-05-06 01:32:16 -03:00
Arnold Wender 2e441d17a2 fix(entity_registry): fsync parent dir after rename for ext4 durability
Without this, on ext4 (and similar) filesystems the rename ack does not
guarantee durability across power loss — a crash can revert to a state
where the temp file is present and the target is at the old version.

Suggested by @jphein on #1215.
2026-05-04 11:08:14 +02:00
Arnold Wender 4f36145c2e fix(entity_registry): atomic write to prevent partial corruption on crash
EntityRegistry.save() called Path.write_text() directly, which truncates
the target file and then writes — so a crash mid-write (power loss, OOM,
filesystem-full mid-flush) leaves an empty or half-written
entity_registry.json. The whole people/projects map is lost; the system
falls back to an empty registry on next load.

Switch to the standard atomic-write pattern: serialize to a sibling
.tmp file in the same directory (so os.replace stays on one filesystem),
fsync, chmod 0o600, then os.replace over the target. The replace is
atomic on POSIX and Windows, so any crash leaves the previous registry
intact instead of a truncated file.
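A minimal sketch of this pattern with a hypothetical `atomic_write_json`. It also includes the POSIX parent-directory fsync from 2e441d17a2 and removes the temp file if any step fails:

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(target, data):
    """Replace target atomically: a crash leaves the old file or the new one."""
    target = Path(target)
    tmp = target.with_name(target.name + ".tmp")  # sibling, so os.replace stays on one filesystem
    try:
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # contents are on disk before the rename
        os.chmod(tmp, 0o600)
        os.replace(tmp, target)  # atomic on POSIX and Windows
    except BaseException:
        tmp.unlink(missing_ok=True)  # never leave a stray .tmp behind
        raise
    if os.name == "posix":
        # Persist the directory entry too, so the rename survives power loss.
        dir_fd = os.open(target.parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "entity_registry.json"
    atomic_write_json(path, {"people": ["Ada"]})
    atomic_write_json(path, {"people": ["Ada", "Grace"]})
    saved = json.loads(path.read_text(encoding="utf-8"))
    leftovers = sorted(p.name for p in Path(d).iterdir())
```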

Tests cover: no leftover .tmp on success, and previous content preserved
when os.replace itself raises mid-save.
2026-05-04 11:08:14 +02:00
mvalentsev 285b3b4f2e refactor(stdio): extract Windows UTF-8 reconfigure into shared helper
Both cli.py and fact_checker.py carried identical 28-line Windows stdio
reconfigure helpers; pull the loop into mempalace/_stdio.py so the same
code drives the CLI, the fact_checker --stdin entry point, and the
MCP server. The thin per-call-site wrappers stay so existing tests keep
importing _reconfigure_stdio_utf8_on_windows from the same module they
always have.

CLI / fact_checker policy unchanged: stdin=surrogateescape (don't crash
on a malformed redirected file), stdout/stderr=replace (don't crash
mid-print on a surrogate half round-tripped from a filename).
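A sketch of that policy using `io.TextIOWrapper.reconfigure` (Python 3.7+). `reconfigure_stdio_utf8` and its explicit stream and platform parameters are hypothetical, chosen so the behavior can be exercised on fake streams:

```python
import io
import sys


def reconfigure_stdio_utf8(stdin, stdout, stderr, platform=sys.platform):
    """Apply the CLI policy on Windows only; returns True if streams were changed."""
    if platform != "win32":
        return False
    # Redirected input: keep undecodable bytes as lone surrogates instead of raising.
    stdin.reconfigure(encoding="utf-8", errors="surrogateescape")
    # Output: replace unencodable characters instead of crashing mid-print.
    for stream in (stdout, stderr):
        stream.reconfigure(encoding="utf-8", errors="replace")
    return True


def fake_stream(data=b""):
    return io.TextIOWrapper(io.BytesIO(data), encoding="cp1252", newline="\n")


fake_in, fake_out, fake_err = fake_stream(b"caf\xe9\n"), fake_stream(), fake_stream()
changed = reconfigure_stdio_utf8(fake_in, fake_out, fake_err, platform="win32")
line = fake_in.readline()  # 'caf\udce9\n': the stray byte survives as a surrogate
fake_out.write(line)
fake_out.flush()
written = fake_out.buffer.getvalue()  # b'caf?\n': the surrogate is replaced, not fatal
```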
2026-05-03 22:25:31 +05:00