- Replace 'while offset < count/total' with 'while True' + break on short batch
- Fixes tool_list_rooms iterating over unfiltered col.count() when wing filter active
- Fixes all 4 paginated functions (tool_status, tool_list_wings, tool_list_rooms,
tool_get_taxonomy) missing early-exit when batch smaller than batch_size
- Remove unused 'total' variable in tool_list_wings, tool_list_rooms, tool_get_taxonomy
(replaced col.count() with accessibility check only)
Per bensig review comments on PR #371
Delete existing drawers for a file before re-inserting fresh chunks.
Converts re-mines from upsert (hnswlib updatePoint path, thread-unsafe
on macOS ARM + chromadb 0.6.3) into delete+insert (safe addPoint path).
Credit: @StefanKremen (#523)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reads documents and metadata directly from ChromaDB's SQLite (bypassing
the API that fails on version-mismatched databases), then reimports into
a fresh palace using the currently installed ChromaDB.
Fixes the 3.0.0 → 3.1.0 upgrade path where chromadb was downgraded from
1.5.x to 0.6.x, breaking the on-disk storage format.
- Detects chromadb version from SQLite schema (0.6.x vs 1.x)
- Extracts all drawers with full metadata via raw SQL
- Builds fresh palace in temp dir, swaps atomically
- Backs up original palace before any changes
- Supports --dry-run to preview without modifying
Fixes#457
Addresses community feedback:
- Add --wing flag to scope dedup to a single wing (catches cross-wing
duplicates when same source mined into multiple wings)
- Document that threshold is cosine distance (not similarity) with
guidance on values: 0.15 for near-identical, 0.3-0.4 for paraphrased
- Confirmed shutil import is present in repair.py (line 32)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: convo_miner.py used collection.add() instead of upsert(),
so repeated mine runs pushed duplicate entries into the HNSW graph.
At scale (50K+ drawers) this causes link_lists.bin to grow to terabytes
and eventually segfault.
Changes:
- convo_miner.py: add() → upsert() (the one-line root cause fix)
- repair.py: new module — scan for corrupt IDs, prune them, or rebuild
the HNSW index from scratch. Backs up only chroma.sqlite3 (not the
bloated HNSW files). Recreates collection with hnsw:space=cosine.
- dedup.py: new module — detect and remove near-duplicate drawers from
the same source file using cosine similarity. No API calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reads documents and metadata directly from ChromaDB's SQLite (bypassing
the API that fails on version-mismatched databases), then reimports into
a fresh palace using the currently installed ChromaDB.
Fixes the 3.0.0 → 3.1.0 upgrade path where chromadb was downgraded from
1.5.x to 0.6.x, breaking the on-disk storage format.
- Detects chromadb version from SQLite schema (0.6.x vs 1.x)
- Extracts all drawers with full metadata via raw SQL
- Builds fresh palace in temp dir, swaps atomically
- Backs up original palace before any changes
- Supports --dry-run to preview without modifying
Fixes#457
bump-plugin-version.yml has been failing on every merge to main since
today's security + plugin-packaging work, because it tries to push
directly to main and branch protection blocks it. It also conflicts
with the manual version-management pattern we're currently using
(manual bumps in PRs like #409 for 3.1.0).
Renaming to .yml.disabled so GitHub Actions skips it. If we want
auto-bumps later, the workflow needs to open a PR instead of pushing
directly, and coordinate with manual version bumps.
Co-authored-by: milla-jovovich <noreply@github.com>
Three critical bugfixes:
1. MCP server hangs on null arguments (#394) — `params.get("arguments", {})`
returns None when JSON has `"arguments": null`. Changed to `or {}`.
2. cmd_repair infinite recursion (#395) — trailing slash on palace_path
caused backup_path to be inside the source dir. Strip trailing sep.
3. OOM on large transcript files (#396) — split_mega_files.py and
normalize.py load entire files into memory. Added 500MB safety limit
with clear skip/error messages.
Closes#394, #395, #396.
Review fixes (from Sage's review):
- Restore mtime check in file_already_mined (check_mtime=True for miner)
- Restore limit=10000 on MCP metadata fetches to prevent OOM on large palaces
- Apply _SAFE_NAME_RE regex in sanitize_name (was dead code)
- Drop raw_aaak metadata duplication in diary_write
- chmod 0o700 on WAL dir, 0o600 on WAL file
- Add check_same_thread=False on KnowledgeGraph SQLite connection
- Remove __del__ (unreliable) and dead PRAGMA foreign_keys=ON