mempalace

Author	SHA1	Message	Date
Igor Lins e Silva	ca2598a9f6	fix(normalize): make strip_noise verbatim-safe and scope it to Claude Code JSONL The initial strip_noise() regressed on three fronts when audited against adversarial user content — each verified with executable repros against the cherry-picked code: 1. `<tag>.*?</tag>` with re.DOTALL span-ate across messages: one stray unclosed <system-reminder> anywhere in a session merged with the next closing tag, silently deleting everything between them (including full assistant replies). 2. `.*\(ctrl\+o to expand\).*\n?` nuked entire lines of user prose whenever a user happened to document the TUI shortcut. 3. `Ran \d+ (?:stop|pre|post)\s*hook.*` with IGNORECASE ate the second sentence from "our CI has a stop hook ... Ran 2 stop hooks last week" — legitimate user commentary. These are unambiguous violations of the project's "Verbatim always" design principle. Fixes: - All tag patterns are now line-anchored (`(?m)^(?:> )?<tag>`) and their body forbids crossing a blank line (`(?:(?!\n\s*\n)[\s\S])*?`), so a dangling open tag cannot eat neighboring messages. - `_NOISE_LINE_PREFIXES` are line-anchored and case-sensitive — user prose mentioning "CURRENT TIME:" mid-sentence is preserved. - Hook-run chrome requires `(?m)^`, explicit hook names (Stop, PreCompact, PreToolUse, etc.), and no IGNORECASE. - "… +N lines" is line-anchored. - "(ctrl+o to expand)" only matches Claude Code's actual collapsed-output chrome shape `[N tokens] (ctrl+o to expand)`; a bare parenthetical in user prose stays intact. Scope: - `strip_noise()` is no longer called on every normalization path. Only `_try_claude_code_jsonl` invokes it, per-extracted-message — so Claude.ai exports, ChatGPT exports, Slack JSON, Codex JSONL, and plain text with `>` markers pass through fully verbatim. Per-message application also makes span-eating structurally impossible.
Tests: - 15 new tests in test_normalize.py pin the boundary: 6 guard user content that must survive (each of the adversarial repros), 9 assert real system chrome is still stripped. All pass; full suite 702 pass (2 failures are the unrelated pre-existing version.py bug, cleared by #820). Known limitation (not fixed here): convo_miner.py does not delete drawers on re-mine, so transcripts mined before this PR keep noise-filled drawers until the user manually erases + re-mines. Proper fix needs a schema-version field on drawer metadata + re-mine trigger — out of scope for this PR.	2026-04-13 16:11:03 -03:00
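The span-eating failure and the line-anchored fix described in ca2598a9f6 can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: both patterns are reconstructed from the commit message for a single tag name, and `strip_unanchored` and `strip_anchored` are hypothetical helper names.

```python
import re

# Assumed reconstruction of the original, unanchored pattern (one tag only).
UNANCHORED_TAG = re.compile(
    r"<system-reminder>.*?</system-reminder>", re.DOTALL
)

# Assumed reconstruction of the fixed pattern: the tag must start a line
# (optionally after a "> " quote marker) and the body may not cross a
# blank line.
ANCHORED_TAG = re.compile(
    r"(?m)^(?:> )?<system-reminder>(?:(?!\n\s*\n)[\s\S])*?</system-reminder>\n?"
)


def strip_unanchored(text: str) -> str:
    """Hypothetical helper: strip with the unanchored DOTALL pattern."""
    return UNANCHORED_TAG.sub("", text)


def strip_anchored(text: str) -> str:
    """Hypothetical helper: strip with the line-anchored pattern."""
    return ANCHORED_TAG.sub("", text)


transcript = (
    "> I saw a stray <system-reminder> in the log\n"
    "\n"
    "Here is the full assistant reply.\n"
    "\n"
    "<system-reminder>internal note</system-reminder>\n"
    "> thanks\n"
)

# The unanchored pattern pairs the stray mid-sentence tag with the next
# closing tag and deletes the assistant reply in between.
damaged = strip_unanchored(transcript)

# The anchored pattern removes only the real chrome line.
cleaned = strip_anchored(transcript)
```

Here `damaged` no longer contains the assistant reply, while `cleaned` keeps the user's sentence and the reply and drops only the line that starts with the tag.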
Mikhail Valentsev	a2432a3245	fix: parse Claude.ai privacy export with messages key and sender field (#677) (#685) * fix: parse Claude.ai privacy export with messages key and sender field (#677) The privacy-export branch in _try_claude_ai_json only checked for the "chat_messages" key, missing exports that use "messages" instead. It also only read the "role" field while real privacy exports use "sender". Both gaps caused the file to fall through to plain-text, producing a single giant drawer. Changes: - Accept "messages" alongside "chat_messages" in the conversation-object guard and inner extraction. - Accept "sender" alongside "role" as the author field. - Fall back to a top-level "text" key when content blocks are empty. - Produce one transcript per conversation instead of concatenating all conversations into a single blob. - Extract shared logic into _collect_claude_messages helper. - Add 6 regression tests covering each variant. * style: apply ruff format to normalize.py * fix: guard against null text field in Claude.ai export parsing item.get("text", "").strip() crashes when "text" is explicitly null in the JSON (legal and observed in some exports). Use (item.get("text") or "").strip() and add a regression test. --------- Co-authored-by: Igor Lins e Silva <4753812+igorls@users.noreply.github.com>	2026-04-13 02:11:03 -03:00
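The null-text guard from a2432a3245 comes down to how `dict.get` treats a key that is present but set to `None`: the default is only used when the key is missing. A minimal sketch, where `text_naive` and `text_guarded` are hypothetical names used only for illustration:

```python
def text_naive(item: dict) -> str:
    # The default "" applies only when the key is absent, so an explicit
    # JSON null (None) reaches .strip() and raises AttributeError.
    return item.get("text", "").strip()


def text_guarded(item: dict) -> str:
    # "or" maps both a missing key and an explicit None to "".
    return (item.get("text") or "").strip()


try:
    text_naive({"text": None})
    naive_crashed = False
except AttributeError:
    naive_crashed = True

guarded_values = [
    text_guarded({"text": None}),
    text_guarded({}),
    text_guarded({"text": "  hello  "}),
]
```

The guarded form also maps other falsy values to an empty string, which is harmless when the field is expected to be a string or null.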
Ben Sigman	4621f85d7c	style: ruff format all Python files (#675)	2026-04-11 22:59:34 -07:00
Ben Sigman	20c8f8e57b	feat: new MCP tools — get/list/update drawer, hook settings, export (resolves #635) (#667) * feat: MCP reliability — inode detection, WAL rotation, metadata cache, search limits Infrastructure hardening for the MCP server: - Detect palace DB replacement via inode tracking (repair command support) - WAL rotation to prevent unbounded WAL growth - _fetch_all_metadata() + _get_cached_metadata() with 60s TTL for taxonomy/status - _MAX_RESULTS cap (100) with limit clamping [1, _MAX_RESULTS] - max_distance parameter for similarity threshold in search - Handle all notifications/* methods, null arguments, method=None - Remove duplicate _client_cache = None declarations - searcher.py max_distance parameter passthrough Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: new MCP tools (get/list/update drawer, hook settings, memories filed), export, normalize New MCP tools: - mempalace_get_drawer: fetch single drawer by ID with full content - mempalace_list_drawers: paginated listing with wing/room filter - mempalace_update_drawer: update content/wing/room on existing drawers - mempalace_hook_settings: get/set hook behavior (silent_save, desktop_toast) - mempalace_memories_filed_away: check latest checkpoint status Also includes: - exporter.py: export palace as browsable markdown files - normalize.py: tool_use/tool_result capture for richer transcript mining - layers.py: updated for new tool integration - config.py: hook settings properties (hook_silent_save, hook_desktop_toast) Depends on PR 3 (reliability) for _MAX_RESULTS, _metadata_cache, WAL logging.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: normalize.py handles string messages and Read offset type mismatch Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: params null guard, L2→cosine docs, empty tool_use_map key guard - Handle explicit null in MCP params (request.get("params") or {}) - Fix search tool description: L2 → cosine distance (collection uses hnsw:space=cosine) - Guard against empty string key in tool_use_map from malformed JSONL entries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: rename ambiguous var 'l' to 'line' (E741 lint) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address code review findings (5 issues) 1. min_similarity backwards-compat: convert similarity to distance scale (1.0 - similarity) instead of passing raw value as max_distance 2. Restore structured error reporting (error + partial fields) in tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy — reverts silent except:pass that dropped #647 security hardening 3. inode cache: remove falsy-zero short-circuit so missing DB file triggers reconnect instead of reusing stale client 4. _fetch_all_metadata: check for empty batch before extending/advancing offset to prevent infinite loop on concurrent deletion 5. KG initialization: only override path when --palace is explicit; default runs use KnowledgeGraph's built-in default path Co-authored-by: jphein <jphein@users.noreply.github.com> --------- Co-authored-by: jp <jp@jphein.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: jphein <jphein@users.noreply.github.com>	2026-04-11 21:25:04 -07:00
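Two of the numeric rules in 20c8f8e57b, limit clamping to [1, _MAX_RESULTS] and converting a legacy `min_similarity` into a `max_distance`, can be sketched as below. Only the constant value 100 and the `1.0 - similarity` conversion come from the commit message; the function names `clamp_limit` and `similarity_to_max_distance` are hypothetical.

```python
_MAX_RESULTS = 100  # cap stated in the commit message


def clamp_limit(limit: int, max_results: int = _MAX_RESULTS) -> int:
    """Hypothetical helper: force a requested limit into [1, max_results]."""
    return max(1, min(int(limit), max_results))


def similarity_to_max_distance(min_similarity: float) -> float:
    """Hypothetical helper: map a similarity threshold onto the cosine
    distance scale, as the review fix describes (1.0 - similarity)."""
    return 1.0 - min_similarity


clamped = [clamp_limit(0), clamp_limit(25), clamp_limit(500)]
max_distance = similarity_to_max_distance(0.75)
```

Passing the raw similarity through as a distance would invert the filter: a strict threshold such as 0.75 would become a loose distance bound of 0.75 instead of 0.25.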
bensig	b1adc047e6	fix: address Octocode review — move size check, add tests for all 3 fixes - Move file size check before try block so IOError propagates cleanly (not caught by the except OSError handler below it) - Wrap os.path.getsize in its own try/except to preserve existing test_normalize_io_error behavior on missing files - Add test_normalize_rejects_large_file (mocked getsize) - Add test_null_arguments_does_not_hang (#394) - Add test_cmd_repair_trailing_slash_does_not_recurse (#395) 532 tests pass locally, 0 regressions.	2026-04-09 10:40:53 -07:00
Tal Muskal	9ca70264f3	style: format test files with ruff Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 21:08:49 +03:00
Tal Muskal	e24d8ca733	test: expand coverage to 70%, fix mcp_server CI crash (threshold 60%) Add/expand tests for normalize (39%→97%), searcher (39%→100%), layers (28%→97%), split_mega_files (34%→72%). Fix mcp_server.py parse_args→parse_known_args to prevent SystemExit when imported during pytest (CI was crashing on all test jobs). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 21:07:03 +03:00
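The `parse_args` to `parse_known_args` change in e24d8ca733 matters because `parse_args` exits the process on unrecognized arguments, which is what happens when a module parses `sys.argv` at import time under pytest. A minimal sketch, assuming a parser with a single `--palace` option; the parser layout and the function names are illustrative, not taken from mcp_server.py:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser; the real server's options may differ.
    parser = argparse.ArgumentParser(prog="mcp_server_sketch")
    parser.add_argument("--palace", default=None)
    return parser


def parse_strict(argv):
    # parse_args() calls parser.error() on unknown arguments,
    # which raises SystemExit(2).
    return build_parser().parse_args(argv)


def parse_tolerant(argv):
    # parse_known_args() returns unknown arguments instead of exiting.
    return build_parser().parse_known_args(argv)


# Arguments as a test runner might pass them, mixed with our own flag.
argv = ["--palace", "/tmp/palace", "-q", "tests/"]

try:
    parse_strict(argv)
    strict_exited = False
except SystemExit:
    strict_exited = True

args, unknown = parse_tolerant(argv)
```

With the tolerant form, `args.palace` is still populated and the runner's own arguments are returned in `unknown` rather than terminating the import.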
bensig	0f8fa8c7d5	bench: add benchmark runners, results docs, and test suite Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with methodology docs and hybrid retrieval analysis. Tests: config, miner, convo_miner, normalize — 9 tests, all passing.	2026-04-04 18:33:42 -07:00

8 Commits