fix(repair): decode BLOB embeddings.seq_id in max-seq-id heuristic (#1254)

`_compute_heuristic_seq_id` ran `int(row[0])` directly on the result of `MAX(e.seq_id)`. On palaces where chromadb 1.5.x has been writing seq_ids natively (8-byte big-endian uint64 BLOB), that raises `ValueError: invalid literal for int() with base 10: b'...'` before the dry-run can print. Users are then left with no path through the recovery feature added in #1135, which is the only documented un-poison route for palaces hit by the original PR #664 shim bug.

Decode BLOB return values via `int.from_bytes(val, "big")` and keep the existing `int(val)` path for INTEGER rows.

Regression test seeds a BLOB row in `embeddings.seq_id` and asserts the heuristic surfaces the correct integer.
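The failure mode and the decode step described in this message can be reproduced without a database. The following is a minimal sketch: `decode_seq_id` is a hypothetical helper name used only for illustration (the commit inlines this logic in `_compute_heuristic_seq_id`), and the value 1254 is arbitrary.

```python
def decode_seq_id(val):
    """Hypothetical helper: accept either a BLOB (bytes) or an INTEGER value."""
    if isinstance(val, (bytes, bytearray)):
        # 8-byte big-endian unsigned integer, as described in the commit message
        return int.from_bytes(val, "big")
    return int(val)


# Build a value shaped like a natively written seq_id (arbitrary number).
blob = (1254).to_bytes(8, "big")

# The old code path: int() tries to parse the bytes as ASCII decimal text.
error_message = None
try:
    int(blob)
except ValueError as exc:
    error_message = str(exc)

decoded_blob = decode_seq_id(blob)  # BLOB branch
decoded_int = decode_seq_id(7)      # INTEGER branch is unchanged
```

Passing `bytes` to `int()` does not interpret them as a binary number, which is why the explicit `int.from_bytes(..., "big")` branch is needed.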
+9 -1
@@ -559,6 +559,11 @@ def _compute_heuristic_seq_id(cur: sqlite3.Cursor, segment_id: str) -> int:
     already-indexed embeddings on next subscribe. That is an acceptable
     loss vs. resetting to 0 (which would re-process the entire queue and
     risk HNSW bloat from issue #1046).
+
+    ``embeddings.seq_id`` rows can be BLOB-typed on palaces where
+    chromadb 1.5.x has been writing seq_ids natively (8-byte big-endian
+    uint64). When SQLite's ``MAX`` returns such a row, decode it back to
+    an integer rather than crashing on ``int(bytes)``.
     """
     row = cur.execute(
         """
@@ -573,7 +578,10 @@ def _compute_heuristic_seq_id(cur: sqlite3.Cursor, segment_id: str) -> int:
     ).fetchone()
     if row is None or row[0] is None:
         return 0
-    return int(row[0])
+    val = row[0]
+    if isinstance(val, (bytes, bytearray)):
+        return int.from_bytes(val, "big")
+    return int(val)
 
 
 def _read_sidecar_seq_ids(sidecar_path: str) -> dict[str, int]:
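A regression check along the lines the commit message describes might look like the sketch below. It is not the test shipped with this commit: the single-column `embeddings` table and the `max_seq_id` function are simplified stand-ins (assumptions), and the real query, which is scoped by segment, is not reproduced here.

```python
import sqlite3


def max_seq_id(cur: sqlite3.Cursor) -> int:
    """Simplified stand-in for the heuristic: MAX(seq_id) with BLOB decoding."""
    row = cur.execute("SELECT MAX(seq_id) FROM embeddings").fetchone()
    if row is None or row[0] is None:
        return 0
    val = row[0]
    if isinstance(val, (bytes, bytearray)):
        return int.from_bytes(val, "big")
    return int(val)


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical minimal schema: one untyped column, so values keep their storage class.
cur.execute("CREATE TABLE embeddings (seq_id)")

# Empty table: MAX yields a single NULL row, so the heuristic falls back to 0.
empty_result = max_seq_id(cur)

# Seed one INTEGER row and two BLOB rows (8-byte big-endian).
cur.execute("INSERT INTO embeddings VALUES (?)", (5,))
cur.execute("INSERT INTO embeddings VALUES (?)", ((300).to_bytes(8, "big"),))
cur.execute("INSERT INTO embeddings VALUES (?)", ((2).to_bytes(8, "big"),))

mixed_result = max_seq_id(cur)
conn.close()
```

In SQLite's ordering a BLOB compares greater than any INTEGER, and equal-length BLOBs compare bytewise, so `MAX` returns the largest BLOB whenever at least one BLOB row exists. That is why the sketch seeds the INTEGER row with a smaller value than the BLOB rows.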