fed69935d3
The primary miners (miner.py, convo_miner.py) operate at file
granularity and can drop data for several reasons: size caps, silent
OSError on read, dedup false positives, extensions the project miner
does not recognize. Even with tonight's hotfixes, any future bug in
the file-level path risks silent data loss.
The sweeper is a second, cooperating miner that works at MESSAGE
granularity:
- Parses Claude Code .jsonl line by line, yielding only
user/assistant records (filters progress, file-history-snapshot,
etc. noise).
- For each session_id, queries the palace for max(timestamp) and
treats that as the cursor.
- Ingests only messages newer than the cursor, as one small drawer
per exchange (never hits a size cap — each drawer is 1-5 KB).
- Deterministic drawer IDs from session_id + message UUID make
reruns idempotent; crash mid-sweep is safe.
Tandem coordination is free: if the primary miner committed up to
timestamp T, the sweeper resumes from T. If the primary miner missed
everything, the sweeper catches it all. Neither duplicates the other.
Smoke test on a real Claude Code transcript:
1st run: +39 drawers, 0 already present
2nd run: +0 drawers, 39 already present (perfect idempotence)
Opt-in via:
mempalace sweep <file.jsonl>
mempalace sweep <transcript-dir>
No changes to existing miners. No schema migration. Purely additive.
Tests: tests/test_sweeper.py (7 tests covering parsing, tandem
coordination, idempotency, resume-from-cursor, metadata correctness).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>