Harden sweeper for production: verbatim tool blocks, full session_id, logged failures

Four changes on top of the proposal's initial sweeper draft, driven by
the CLAUDE.md design principles:

1. Drop the 500-char truncation on tool_use / tool_result content in
   _flatten_content. The "verbatim always" principle forbids lossy
   compression of user-adjacent data; a long code-edit diff handed to
   the assistant must round-trip intact. Unknown block types now also
   serialize their full payload instead of just a type marker. New test
   test_parse_preserves_tool_blocks_verbatim covers a 5000-char input.

2. Use the full session_id in drawer IDs (not session_id[:12]). Rules
   out cross-session collisions if a transcript source ever uses
   non-UUID session identifiers or shared prefixes.

3. Replace the silent `except Exception: return None` in
   get_palace_cursor with a logger.warning. A silent except is the
   exact anti-pattern this PR otherwise criticizes in miner.py. The
   fallback behavior is still safe (deterministic IDs make a missed
   cursor recover on the next run), but the failure is now
   discoverable.

4. sweep_directory now collects per-file failures into the result dict
   and the CLI exits non-zero when any file failed, so a partial-sweep
   outcome is visible rather than swallowed.
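
A minimal sketch of changes 1, 2 and 4, not the code in this commit.
The names flatten_content, drawer_id and exit_code, the "sweep_" ID
format, and the result-dict keys are all assumptions made for
illustration; only the behaviors (no truncation, full session_id,
collected failures, non-zero exit) come from the description above.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("sweeper_sketch")  # hypothetical logger name


def _as_text(value):
    # Strings pass through untouched; anything else is serialized in full.
    return value if isinstance(value, str) else json.dumps(value, sort_keys=True)


def flatten_content(content):
    """Flatten message content to text without truncating any block."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        kind = block.get("type")
        if kind == "text":
            parts.append(block.get("text", ""))
        elif kind == "tool_use":
            parts.append(f"[tool_use] {_as_text(block.get('input'))}")
        elif kind == "tool_result":
            parts.append(f"[tool_result] {_as_text(block.get('content'))}")
        else:
            # Unknown block type: keep the whole payload, not just a marker.
            parts.append(f"[{kind}] {_as_text(block)}")
    return "\n".join(parts)


def drawer_id(session_id, index):
    # Full session_id, no [:12] slice, so shared prefixes cannot collide.
    # The "sweep_" prefix is an assumed format.
    return f"sweep_{session_id}_{index}"


def sweep_directory(directory, process_file):
    """Process every *.jsonl file and record failures instead of hiding them."""
    result = {"files_ok": 0, "failures": []}
    for path in sorted(Path(directory).glob("*.jsonl")):
        try:
            process_file(path)
            result["files_ok"] += 1
        except Exception as exc:
            logger.warning("sweep failed for %s: %s", path, exc)
            result["failures"].append({"file": str(path), "error": str(exc)})
    return result


def exit_code(result):
    # Non-zero when any file failed, so a partial sweep is visible to callers.
    return 1 if result["failures"] else 0
```

A CLI entry point could then end with
sys.exit(exit_code(sweep_directory(...))), so a partial sweep fails the
calling script or cron job.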

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

Igor Lins e Silva

2026-04-18 12:58:33 -03:00

parent fed69935d3

commit 29ce7c7135

4 changed files with 174 additions and 70 deletions

									
tests/test_miner_jsonl_visibility.py  +2
@@ -108,9 +108,11 @@ class TestJsonlNotSilentlySkipped:
            def fake_stat(self, *args, **kwargs):
                result = real_stat(self, *args, **kwargs)
                if self.name == "big_transcript.jsonl":
                    class _FakeStat:
                        st_size = fake_size
                        st_mode = result.st_mode
                    return _FakeStat()
                return result
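
The hunk above fakes a file's size by wrapping a stat call. A
self-contained sketch of that technique follows. It assumes real_stat
is the original pathlib.Path.stat and that the patch is applied with
unittest.mock; the helper name stat_with_fake_size is hypothetical, and
the hunk does not show how the test actually installs fake_stat.

```python
from pathlib import Path
from unittest import mock


def stat_with_fake_size(target_name, fake_size):
    """Return a patcher that makes Path.stat report fake_size for one file name."""
    real_stat = Path.stat  # captured before patching, so the wrapper can delegate

    def fake_stat(self, *args, **kwargs):
        result = real_stat(self, *args, **kwargs)
        if self.name == target_name:
            # Only the fields the caller reads are faked; st_mode is copied
            # so file-type checks still behave as for the real file.
            class _FakeStat:
                st_size = fake_size
                st_mode = result.st_mode

            return _FakeStat()
        return result

    return mock.patch.object(Path, "stat", fake_stat)
```

Used as a context manager, the patch is limited to the with block, so
other files and later tests see the real stat results.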