fix(repair): decode BLOB embeddings.seq_id in max-seq-id heuristic (#1254)

`_compute_heuristic_seq_id` ran `int(row[0])` directly on the result
of `MAX(e.seq_id)`. On palaces where chromadb 1.5.x has been writing
seq_ids natively (8-byte big-endian uint64 BLOB), that raises
`ValueError: invalid literal for int() with base 10: b'...'` before
the dry-run can print, leaving users with no path through the
recovery feature added in #1135 — the only documented un-poison
route for palaces hit by the original PR #664 shim bug.
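
The failure can be reproduced outside the repair tool. This is a minimal
sketch, not the project's code: `naive_decode` is a hypothetical stand-in
for the old `int(row[0])` call, and the value 42 is made up for
illustration. It assumes only the encoding described above (8-byte
big-endian uint64).

```python
import struct

# Hypothetical seq_id 42, packed as an 8-byte big-endian uint64 to match
# the BLOB encoding described in the commit message.
blob = struct.pack(">Q", 42)


def naive_decode(val):
    # Stand-in for the pre-fix `int(row[0])` call: int() treats bytes as
    # an ASCII numeric literal, not as a binary integer.
    return int(val)


failed = False
message = ""
try:
    naive_decode(blob)
except ValueError as exc:
    failed = True
    message = str(exc)

print(failed, message)
```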

Decode BLOB return values via `int.from_bytes(val, "big")` and
keep the existing `int(val)` path for INTEGER rows. Regression
test seeds a BLOB row in `embeddings.seq_id` and asserts the
heuristic surfaces the correct integer.
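
A rough, self-contained sketch of the decode path and of how a test might
seed a BLOB row follows. The one-column `embeddings` table, the helper
name `decode_seq_id`, and the values 1234 and 7 are assumptions for
illustration; they are not the real chromadb schema or the project's
actual regression test.

```python
import sqlite3


def decode_seq_id(val):
    # Hypothetical helper mirroring the fix: None -> 0, BLOB -> big-endian
    # integer, anything else goes through the existing int() path.
    if val is None:
        return 0
    if isinstance(val, (bytes, bytearray)):
        return int.from_bytes(val, "big")
    return int(val)


conn = sqlite3.connect(":memory:")
# Simplified stand-in table; the column has no declared type so values
# keep the storage class they were inserted with.
conn.execute("CREATE TABLE embeddings (seq_id)")
conn.execute(
    "INSERT INTO embeddings VALUES (?)", ((1234).to_bytes(8, "big"),)
)
conn.execute("INSERT INTO embeddings VALUES (?)", (7,))

# SQLite orders BLOB values after INTEGER values, so MAX() hands back the
# BLOB row as Python bytes when both storage classes are present.
max_val = conn.execute("SELECT MAX(seq_id) FROM embeddings").fetchone()[0]
seq_id = decode_seq_id(max_val)
print(type(max_val).__name__, seq_id)
conn.close()
```

Keeping the `int(val)` branch means INTEGER-only tables behave as before;
only the BLOB case takes the new path.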
Igor Lins e Silva
2026-04-30 22:04:41 -03:00
parent fdfaf017ab
commit 3b5ebcc9fc
2 changed files with 35 additions and 1 deletions
+9 -1
@@ -559,6 +559,11 @@ def _compute_heuristic_seq_id(cur: sqlite3.Cursor, segment_id: str) -> int:
     already-indexed embeddings on next subscribe. That is an acceptable
     loss vs. resetting to 0 (which would re-process the entire queue and
     risk HNSW bloat from issue #1046).
+
+    ``embeddings.seq_id`` rows can be BLOB-typed on palaces where
+    chromadb 1.5.x has been writing seq_ids natively (8-byte big-endian
+    uint64). When SQLite's ``MAX`` returns such a row, decode it back to
+    an integer rather than crashing on ``int(bytes)``.
     """
     row = cur.execute(
         """
@@ -573,7 +578,10 @@ def _compute_heuristic_seq_id(cur: sqlite3.Cursor, segment_id: str) -> int:
     ).fetchone()
     if row is None or row[0] is None:
         return 0
-    return int(row[0])
+    val = row[0]
+    if isinstance(val, (bytes, bytearray)):
+        return int.from_bytes(val, "big")
+    return int(val)
 
 
 def _read_sidecar_seq_ids(sidecar_path: str) -> dict[str, int]: