Commit Graph

4 Commits

Author SHA1 Message Date
Igor Lins e Silva 7e5eeda9a5 feat(normalize): auto-rebuild stale drawers via NORMALIZE_VERSION schema gate
Without this, the strip_noise improvement only helps new mines. Every
user who had already mined Claude Code JSONL sessions would keep their
noise-polluted drawers forever, because convo_miner's file_already_mined
skip short-circuits before re-processing.

Adds a versioned schema gate so upgrades propagate silently:

- palace.NORMALIZE_VERSION=2 — bumped when the normalization pipeline
  changes shape (this PR's strip_noise is the v1→v2 bump).
- file_already_mined now returns False if the stored normalize_version
  is missing or less than current, triggering a rebuild on next mine.
- Both miners stamp drawers with the current normalize_version.
- convo_miner now purges stale drawers before inserting fresh chunks
  (mirrors miner.py's existing delete+insert), extracted into
  _file_convo_chunks helper to keep mine_convos under ruff's C901 limit.

User experience: upgrade mempalace, run `mempalace mine` as usual, old
noisy drawers get silently replaced with clean ones. No erase needed,
no "you need to rebuild" changelog footgun.

Tests:
- test_file_already_mined_returns_false_for_stale_normalize_version —
  pins the version gate contract for missing/v1/current.
- test_add_drawer_stamps_normalize_version — fresh project-miner drawers
  carry the field.
- test_mine_convos_rebuilds_stale_drawers_after_schema_bump — end-to-end
  proof that a pre-v2 palace gets silently cleaned on next mine, with
  orphan drawers purged and NOT skipped.

Existing test_file_already_mined_check_mtime updated to include the
new field; all other tests unaffected.
2026-04-13 16:20:55 -03:00
Mikhail Valentsev 87e8bafad8 fix: prevent convo_miner from re-processing 0-chunk files on every run (#654) (#732)
* fix: register 0-chunk files to prevent re-processing on every mine (#654)

mine_convos() has three early-exit paths (OSError, content too short,
zero chunks) that skip writing anything to ChromaDB. Since
file_already_mined() checks for the presence of a document with a
matching source_file, these files are re-read and re-processed on
every subsequent run.

Add _register_file() that upserts a lightweight sentinel document
(room="_registry", ingest_mode="registry") so file_already_mined()
returns True on future runs.

Note: Bug 2 from the issue (drawers_added counter always 0) was
already resolved upstream via the switch from collection.add() to
collection.upsert().

* fix: resolve macOS path symlink in test + remove unused variable
2026-04-12 14:25:34 -07:00
Tal Muskal abd52534bb test: bring coverage to 85%, set threshold to 85, reset version to 3.0.11
- Add tests for config, convo_miner, spellcheck, knowledge_graph
- Fix Windows PermissionError in test cleanup (chromadb file locks)
- Add UTF-8 encoding to split_mega_files, entity_registry, hooks_cli
- Fix mcp_server parse_known_args logging for unknown args
- Set coverage threshold to 85 in pyproject.toml and CI
- Reset all version files to 3.0.11

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 21:38:12 +03:00
bensig 0f8fa8c7d5 bench: add benchmark runners, results docs, and test suite
Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with
methodology docs and hybrid retrieval analysis.

Tests: config, miner, convo_miner, normalize — 9 tests, all passing.
2026-04-04 18:33:42 -07:00