Four changes on top of the proposal's initial sweeper draft, driven by
the CLAUDE.md design principles:
1. Drop the 500-char truncation on tool_use / tool_result content in
_flatten_content. The "verbatim always" principle forbids lossy
compression of user-adjacent data; a long code-edit diff handed to
the assistant must round-trip intact. Unknown block types now also
serialize their full payload instead of just a type marker. New test
test_parse_preserves_tool_blocks_verbatim covers a 5000-char input.
2. Use the full session_id in drawer IDs (not session_id[:12]). Rules
out cross-session collisions if a transcript source ever uses
non-UUID session identifiers or shared prefixes.
3. Replace silent `except Exception: return None` in get_palace_cursor
with a logger.warning — the exact anti-pattern this PR otherwise
criticizes in miner.py. The fallback behavior is still safe
(deterministic IDs make a missed cursor recover on the next run),
but the failure is now discoverable.
4. sweep_directory now collects per-file failures into the result dict
and the CLI exits non-zero when any file failed, so a partial-sweep
outcome is visible rather than swallowed.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
Long Claude Code sessions routinely produce transcripts larger than 10
MB. The previous cap at miner.py:65 silently dropped them at line 732
with `if filepath.stat().st_size > MAX_FILE_SIZE: continue` — same
silent-failure pattern as the .jsonl extension bug.
The cap exists as a safety rail against pathological binaries, not as
a limit on legitimate text. Downstream chunking at 800 chars per drawer
means source file size does not affect storage or embedding cost.
500 MB leaves headroom for year-long continuous transcripts while still
catching accidental multi-GB binary mines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mempalace/miner.py:READABLE_EXTENSIONS contained `.json` but not
`.jsonl`. Every jsonl file encountered in a mined directory was
silently skipped at miner.py:722:
if filepath.suffix.lower() not in READABLE_EXTENSIONS:
continue
Claude Code transcripts, ChatGPT exports, and every other tool writing
line-delimited JSON ship as `.jsonl`. Users running `mempalace mine`
against a directory of transcripts saw the command complete with no
error and no log line — and their conversations never reached the
palace. Silent data loss.
Adding `.jsonl` to the whitelist alongside `.json`. jsonl is text
line-by-line; the existing chunking pipeline handles it the same way
it handles any other text file.
Tests: tests/test_miner_jsonl_visibility.py
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>