test(repair): page-align corruption offset in preflight regression test

Address Copilot review on #1403: the test seeked unconditionally to
offset 40960 with only `pre_size > 16384` as a guard. If pre_size was
below 40960 + 16384 = 57344 (e.g., on a chromadb version that
allocated fewer pages on init, or after a future schema change), the
16 KB write would run past EOF. At pre_size <= 40960 the seek itself
would land at or beyond EOF, so the write would only extend the file
(zero-padding any gap) and leave the original pages intact.
quick_check would still pass on the untouched real data, and the
regression guard would silently fail to detect a preflight-ordering
regression.
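
The failure mode above can be reproduced without SQLite. The sketch below is illustrative only and is not part of the test suite; `write_past_eof` is a hypothetical helper, and the 16-byte payload stands in for real database pages. It shows that a seek past EOF in "r+b" mode raises nothing and that the later write leaves the existing bytes untouched.

```python
import os
import tempfile


def write_past_eof(original: bytes, offset: int, garbage: bytes) -> bytes:
    """Write `garbage` at `offset` in r+b mode and return the file's final bytes.

    Hypothetical helper for illustration; not taken from the repository.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(original)
        with open(path, "r+b") as f:
            f.seek(offset)  # seeking beyond EOF is allowed and raises nothing
            f.write(garbage)  # the gap between old EOF and offset reads back as zeros
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


original = b"\x01" * 16  # stands in for the real database pages
result = write_past_eof(original, offset=32, garbage=b"\xde\xad\xbe\xef")
```

Here the first 16 bytes of `result` are still the original data, which is why an integrity check over the real pages would have nothing to complain about.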

Compute the offset from pre_size, page-aligned, with explicit asserts
that the file is large enough to mangle 4 pages without truncating
the header or extending past EOF.
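
As a rough standalone sketch of the arithmetic (the function name `pick_corrupt_offset` and the constant `PREFERRED_OFFSET` are made up for this illustration; the test itself inlines the computation and uses asserts rather than an exception):

```python
PAGE = 4096                # SQLite's default page size
CORRUPT_BYTES = 4 * PAGE   # 16384 bytes, i.e. 4 pages of garbage
HEADER_GUARD = 2 * PAGE    # never touch the first two pages
PREFERRED_OFFSET = 40960   # hypothetical name for the cap used in the test


def pick_corrupt_offset(file_size: int) -> int:
    """Return a page-aligned offset such that offset + CORRUPT_BYTES <= file_size."""
    if file_size < HEADER_GUARD + CORRUPT_BYTES:
        raise ValueError(f"sqlite db too small to mangle: {file_size} bytes")
    # Clear the low 12 bits to round down to a 4096-byte page boundary.
    max_offset = (file_size - CORRUPT_BYTES) & ~(PAGE - 1)
    return min(PREFERRED_OFFSET, max_offset)


smallest = pick_corrupt_offset(24576)    # exactly HEADER_GUARD + CORRUPT_BYTES
mid_size = pick_corrupt_offset(50000)    # smaller than the old 57344 requirement
large = pick_corrupt_offset(1_000_000)   # large file: the cap applies
```

For a 50000-byte file the old fixed offset would have written past EOF; the clamped offset keeps all 16384 bytes inside the existing file.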
Igor Lins e Silva
2026-05-07 12:07:54 -03:00
parent 5134a635ed
commit 7b151039c9
+23 -3
@@ -1186,11 +1186,31 @@ def test_rebuild_index_runs_sqlite_preflight_before_chromadb_open(tmp_path, caps
     sqlite_path = palace / "chroma.sqlite3"
     pre_size = sqlite_path.stat().st_size
-    assert pre_size > 16384, "need a multi-page sqlite db to mangle"
+    # Compute a page-aligned corruption offset that's always inside the
+    # existing file. SQLite uses 4 KB pages by default; we mangle 4 pages
+    # somewhere in the middle, skipping at least the first 2 pages
+    # (header + root) so the file still opens. Without clamping to the
+    # actual file size, a seek past EOF on r+b mode would silently
+    # extend the file with zero-padding and leave the original pages
+    # intact — quick_check would still pass, and the regression guard
+    # would skip the bug.
+    PAGE = 4096
+    CORRUPT_BYTES = 16384  # 4 pages
+    HEADER_GUARD = PAGE * 2  # leave header + root pages intact
+    assert (
+        pre_size >= HEADER_GUARD + CORRUPT_BYTES
+    ), f"sqlite db too small to mangle without truncating: {pre_size} bytes"
+    # Round (pre_size - CORRUPT_BYTES) down to a page boundary so we
+    # mangle whole pages. Cap at offset 40960 (page 10) for stable
+    # diagnostics across SQLite versions that may grow the file.
+    max_offset = (pre_size - CORRUPT_BYTES) & ~(PAGE - 1)
+    corrupt_offset = min(40960, max_offset)
+    assert corrupt_offset >= HEADER_GUARD, f"corruption offset {corrupt_offset} too close to header"
     with open(sqlite_path, "r+b") as f:
-        f.seek(40960)  # page 10
-        f.write(b"\xde\xad\xbe\xef" * 4096)  # 16 KB of garbage
+        f.seek(corrupt_offset)
+        f.write(b"\xde\xad\xbe\xef" * (CORRUPT_BYTES // 4))
     # No chromadb mocks: rebuild_index must reach sqlite_integrity_errors
     # before any code path that opens a chromadb client. If the preflight