fix(repair): run SQLite integrity preflight before chromadb open

#1364 added the SQLite quick_check preflight to rebuild_index, but placed it AFTER backend.get_collection(...). On a SQLite-corrupt palace, chromadb's rust binding raises pyo3_runtime.PanicException (which is not a regular Exception subclass), so it propagates past the existing `except Exception` handlers and the user sees a 30-line stack trace instead of the friendly abort message #1364 was designed to deliver.

Reproduced with `mempalace repair --yes` against a palace whose chroma.sqlite3 has 4 mangled pages: pre-fix, panic; post-fix, the clean abort message and exit code 1.

Two changes:

- mempalace/cli.py cmd_repair: run sqlite_integrity_errors() right after the basic palace-existence check, BEFORE the max_seq_id preflight (which itself opens sqlite3) and BEFORE backend = ChromaBackend(). Exit non-zero so unattended scripts and CI gates see the failure.
- mempalace/repair.py rebuild_index: same move at the function level for direct callers (tests, MCP) that bypass cmd_repair.

The new test test_rebuild_index_runs_sqlite_preflight_before_chromadb_open uses a real chromadb-built palace (no ChromaBackend mock) plus a real corrupt SQLite (16 KB of mangled pages) so the ordering is exercised end-to-end. The previously-shipping test for the abort path mocked both the backend and sqlite_integrity_errors, which is why the ordering bug shipped CI-green.

Six existing test_cli.py cmd_repair tests used `(palace_dir / "chroma.sqlite3").write_text("db")` to fake the SQLite file. The new preflight correctly fails quick_check on those 2-byte stubs, so the tests now create empty real SQLite DBs the same way the test_repair.py fixtures already do.
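To illustrate why the panic slipped past the handlers, here is a minimal sketch. It assumes the panic type derives from BaseException rather than Exception, as PyO3 documents for PanicException. `FakePanic`, `open_collection`, `guarded_open`, and `run` are hypothetical stand-ins, not mempalace or chromadb code.

```python
class FakePanic(BaseException):
    """Hypothetical stand-in for an exception outside the Exception tree."""


def open_collection():
    # Simulates a backend open that panics on a malformed SQLite page.
    raise FakePanic("database disk image is malformed")


def guarded_open():
    # Mirrors the shape of the pre-fix code: only Exception is handled.
    try:
        open_collection()
    except Exception as exc:
        return f"friendly abort: {exc}"
    return "opened"


def run():
    # The outer handler shows that the inner `except Exception` never fired.
    try:
        return guarded_open()
    except BaseException as exc:
        return f"escaped handler: {type(exc).__name__}"
```

Widening the handler to `except BaseException` would also swallow KeyboardInterrupt and SystemExit, which is one reason to check the file before the backend opens it rather than trying to catch the panic afterwards.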
+11
-5
@@ -633,6 +633,17 @@ def rebuild_index(
     print(f"{'=' * 55}\n")
     print(f" Palace: {palace_path}")
 
+    # Run the SQLite integrity preflight before any chromadb client open.
+    # ChromaDB's rust binding raises pyo3_runtime.PanicException (which is
+    # not a regular Exception subclass) on a malformed page, propagating
+    # past the try/except around get_collection below. Catching the
+    # corruption here lets us surface the clear recovery instructions and
+    # exit cleanly before chromadb's compactor touches the disk.
+    sqlite_errors = sqlite_integrity_errors(palace_path)
+    if sqlite_errors:
+        print_sqlite_integrity_abort(palace_path, sqlite_errors)
+        return
+
     preflight = maybe_repair_poisoned_max_seq_id_before_rebuild(
         palace_path,
         assume_yes=True,
@@ -676,11 +687,6 @@ def rebuild_index(
         print(e.message)
         return
 
-    sqlite_errors = sqlite_integrity_errors(palace_path)
-    if sqlite_errors:
-        print_sqlite_integrity_abort(palace_path, sqlite_errors)
-        return
-
     # Back up ONLY the SQLite database, not the bloated HNSW files
     sqlite_path = os.path.join(palace_path, "chroma.sqlite3")
     backup_path = sqlite_path + ".backup"
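The ordering this diff establishes can be sketched with the standard-library sqlite3 module. This is an illustrative approximation only: `quick_check_errors` and `rebuild` are hypothetical names, and the real sqlite_integrity_errors() may be implemented differently. The sketch opens the file read-only, runs `PRAGMA quick_check`, and calls the backend opener only when the check is clean.

```python
import sqlite3
from pathlib import Path


def quick_check_errors(db_path):
    """Return problems reported by PRAGMA quick_check (empty list if healthy)."""
    # Read-only URI so the check never creates or modifies the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA quick_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Covers "file is not a database", unreadable files, and similar.
        return [str(exc)]
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages


def rebuild(palace_dir, open_backend):
    """Hypothetical entry point: preflight first, backend open second."""
    errors = quick_check_errors(Path(palace_dir) / "chroma.sqlite3")
    if errors:
        print("SQLite integrity check failed:", "; ".join(errors))
        return 1  # non-zero so scripts and CI gates see the failure
    open_backend(palace_dir)
    return 0
```

Passing the opener in as a callable makes the ordering easy to assert in a test: with a corrupt file, the callable should never be invoked.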