`col.query(...)` can return `None` entries in the inner ``metadatas`` list
for drawers whose metadata was never set (older palaces, rows written
outside the normal mining path). The CLI `search()` function would render
earlier results successfully and then crash mid-loop with:
AttributeError: 'NoneType' object has no attribute 'get'
at ``searcher.py:286`` — ``meta.get("source_file", "?")``. The user sees
partial output followed by a traceback, with no indication of which
drawers rendered OK and which were skipped.
Guard with ``meta = meta or {}`` inside the loop so entries with missing
metadata fall back to the existing ``"?"`` defaults instead of crashing,
matching the hit dict assembly in ``search_memories()`` which already
uses ``meta.get("wing", "unknown")`` etc. against the same data.
Adds a regression test that mocks a ChromaDB result with a ``None``
metadata entry in the middle of the inner list and asserts both result
blocks render to stdout.
Six items from the automated review on PR #998:
1. **Cursor tie-break bug (correctness).** The skip condition was
`rec.timestamp <= cursor`; if multiple messages share the max
timestamp and only some were ingested before a crash, the rest
would be lost forever. Changed to `< cursor`, relying on
deterministic drawer IDs for safe re-attempt at the boundary.
Regression test
`test_sweep_recovers_untaken_message_at_cursor_timestamp`.
2. **`drawers_added` counted upserts, not adds.** Added a pre-flight
`collection.get(ids=batch)` to distinguish new rows from already-
present ones. Return value now carries `drawers_added`,
`drawers_already_present`, `drawers_upserted`, and `drawers_skipped`
separately. Dict-compatible access (`existing.get("ids")`) keeps it
working on both the raw Chroma return and the typed `GetResult`.
3. **`sweep_directory` hid failures in the summary.** `files_processed`
used to exclude failed files. Replaced with `files_attempted` (all
discovered) + `files_succeeded` (subset that completed); CLI output
shows `succeeded/attempted`.
4. **Coordination claim was overstated.** The primary miners don't
stamp `session_id`/`timestamp` metadata, so the sweeper coordinates
only with its own prior runs. Softened docstrings on module and CLI
command. Uniform cross-miner metadata is flagged as a follow-up.
5. **MAX_FILE_SIZE comments were misleading.** Said source size "does
not affect storage or embedding cost" — true per-drawer, but source
size still scales drawer count, embedding work, and memory usage
(files are read in full, not streamed). Corrected in both
`miner.py` and `convo_miner.py`.
6. Added the tie-break regression test that reproduces the correctness
bug from (1).
Tests: 970 passed (was 969), ruff + pre-commit clean.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
Four changes on top of the proposal's initial sweeper draft, driven by
the CLAUDE.md design principles:
1. Drop the 500-char truncation on tool_use / tool_result content in
_flatten_content. The "verbatim always" principle forbids lossy
compression of user-adjacent data; a long code-edit diff handed to
the assistant must round-trip intact. Unknown block types now also
serialize their full payload instead of just a type marker. New test
test_parse_preserves_tool_blocks_verbatim covers a 5000-char input.
2. Use the full session_id in drawer IDs (not session_id[:12]). Rules
out cross-session collisions if a transcript source ever uses
non-UUID session identifiers or shared prefixes.
3. Replace silent `except Exception: return None` in get_palace_cursor
with a logger.warning — the exact anti-pattern this PR otherwise
criticizes in miner.py. The fallback behavior is still safe
(deterministic IDs make a missed cursor recover on the next run),
but the failure is now discoverable.
4. sweep_directory now collects per-file failures into the result dict
and the CLI exits non-zero when any file failed, so a partial-sweep
outcome is visible rather than swallowed.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
The primary miners (miner.py, convo_miner.py) operate at file
granularity and can drop data for several reasons: size caps, silent
OSError on read, dedup false positives, extensions the project miner
does not recognize. Even with tonight's hotfixes, any future bug in
the file-level path risks silent data loss.
The sweeper is a second, cooperating miner that works at MESSAGE
granularity:
- Parses Claude Code .jsonl line by line, yielding only
user/assistant records (filters progress, file-history-snapshot,
etc. noise).
- For each session_id, queries the palace for max(timestamp) and
treats that as the cursor.
- Ingests only messages newer than the cursor, as one small drawer
per exchange (never hits a size cap — each drawer is 1-5 KB).
- Deterministic drawer IDs from session_id + message UUID make
reruns idempotent; crash mid-sweep is safe.
Tandem coordination is free: if the primary miner committed up to
timestamp T, the sweeper resumes from T. If the primary miner missed
everything, the sweeper catches it all. Neither duplicates the other.
Smoke test on a real Claude Code transcript:
1st run: +39 drawers, 0 already present
2nd run: +0 drawers, 39 already present (perfect idempotence)
Opt-in via:
mempalace sweep <file.jsonl>
mempalace sweep <transcript-dir>
No changes to existing miners. No schema migration. Purely additive.
Tests: tests/test_sweeper.py (7 tests covering parsing, tandem
coordination, idempotency, resume-from-cursor, metadata correctness).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the miner.py fix in this same branch. convo_miner.py had the
exact same 10 MB cap at line 58 that silently dropped long transcripts
via continue. Long Claude Code sessions, multi-year ChatGPT exports,
and lifetime Slack dumps all exceed 10 MB. Same silent-drop pattern,
different file.
Raised to 500 MB to match miner.py for consistency; downstream chunking
means source file size does not affect storage or embedding cost.
Tests: tests/test_convo_miner_size_cap.py (1 test)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Long Claude Code sessions routinely produce transcripts larger than 10
MB. The previous cap at miner.py:65 silently dropped them at line 732
with `if filepath.stat().st_size > MAX_FILE_SIZE: continue` — same
silent-failure pattern as the .jsonl extension bug.
The cap exists as a safety rail against pathological binaries, not as
a limit on legitimate text. Downstream chunking at 800 chars per drawer
means source file size does not affect storage or embedding cost.
500 MB leaves headroom for year-long continuous transcripts while still
catching accidental multi-GB binary mines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mempalace/miner.py:READABLE_EXTENSIONS contained `.json` but not
`.jsonl`. Every jsonl file encountered in a mined directory was
silently skipped at miner.py:722:
if filepath.suffix.lower() not in READABLE_EXTENSIONS:
continue
Claude Code transcripts, ChatGPT exports, and every other tool writing
line-delimited JSON ship as `.jsonl`. Users running `mempalace mine`
against a directory of transcripts saw the command complete with no
error and no log line — and their conversations never reached the
palace. Silent data loss.
Adding `.jsonl` to the whitelist alongside `.json`. jsonl is text
line-by-line; the existing chunking pipeline handles it the same way
it handles any other text file.
Tests: tests/test_miner_jsonl_visibility.py
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On Windows with non-UTF-8 locale (e.g. GBK), Path.read_text() defaults
to platform encoding, breaking onboarding tests and any source code that
reads JSON/markdown with non-ASCII content.
5 files, 8 call sites fixed.
Introduces the Indonesian (id) locale, providing translations for CLI commands, status messages, and core terminology.
Includes language-specific regex patterns for stop words and action detection to support text processing and indexing in Indonesian. The test suite is updated with a sample case to verify correct dialect handling and compression.
entity_detector.py was refactored in #911 to load candidate patterns
from i18n locale JSON files, supporting non-Latin scripts (Cyrillic,
accented Latin, etc.). But three other code paths still hardcoded the
ASCII-only regex [A-Z][a-z]{2,}, silently missing non-Latin entity
names in metadata tagging, closet indexing, and registry lookups.
Replace the hardcoded regex with a shared _candidate_entity_words()
helper that reuses the same i18n candidate_patterns as entity_detector.
Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras)
like ा ी ु are Unicode category Mc (Mark, Spacing Combining) — not \w.
This means \b splits mid-word on every matra: names like अनीता (Anita)
truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b
never match because \b fails after the final matra of कहा.
Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script
whose words contain combining marks.
Fix: locales with combining-mark scripts declare a boundary_chars field
in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n
loader replaces every \b in that locale's patterns with a script-aware
lookaround that treats the declared characters as "inside-word", and
pre-wraps candidate/multi_word patterns with the same boundary.
Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru,
it are unchanged.
Changes:
- mempalace/i18n/__init__.py: add _script_boundary, _expand_b,
_wrap_candidate, _collect_entity_section; candidate_patterns are now
returned fully-wrapped (boundary + capture group applied)
- mempalace/entity_detector.py: extract_candidates compiles pre-wrapped
candidate patterns directly instead of re-wrapping with \b
- tests/test_entity_detector.py: 5 new tests for Devanagari boundaries
(name extraction with/without boundary_chars, person-verb firing,
English regression)
BCP 47 language tags are case-insensitive (RFC 5646 §2.1.1) but the
locale files mix conventions (pt-br.json vs zh-CN.json). On
case-sensitive filesystems, '--lang PT-BR' or '--lang zh-cn' silently
missed the file, _load_entity_section returned {}, and entity
detection ran in English with no warning.
The cache key in get_entity_patterns was built from raw input, so
('PT-BR',) and ('pt-br',) produced two distinct entries, both wrong.
Add _canonical_lang(lang) that resolves any casing to the on-disk
filename stem via lowercase comparison, and route load_lang,
_load_entity_section, and the cache key through it.
Closes#927
Move all entity-detection lexical patterns (person verbs, pronouns,
dialogue markers, project verbs, stopwords, candidate character class)
out of hardcoded module-level constants and into the entity section of
each locale's JSON in mempalace/i18n/. Adds a languages parameter to
every public function so callers union patterns across the desired
locales. The default stays ("en",), so all existing callers and tests
behave unchanged.
Also adds:
- get_entity_patterns(langs) helper in mempalace/i18n/ that merges
patterns across requested languages, dedupes lists, unions stopwords,
and falls back to English for unknown locales
- MempalaceConfig.entity_languages property + setter, with env var
override (MEMPALACE_ENTITY_LANGUAGES, comma-separated)
- mempalace init --lang en,pt-br flag (persists to config.json)
- Per-language candidate_pattern so non-Latin scripts (Cyrillic,
Devanagari, CJK) can register their own character classes instead of
being silently dropped by the ASCII-only [A-Z][a-z]+ default
- _build_patterns LRU cache keyed by (name, languages) so multi-language
callers don't poison each other's cache slots
Why now: the open language PRs (#760 ru, #773 hi, #778 id, #907 it) only
add CLI strings via mempalace/i18n/. PR #156 (pt-br) is the first that
needed entity_detector changes and inlined a _PTBR variant of every
constant. That doesn't scale past 2-3 languages — every text gets
checked against every language's patterns regardless of relevance, and
candidate extraction still drops accented and non-Latin names.
This PR sets the standard so future locale contributors only edit one
JSON file (no Python changes), and entity detection scales linearly
with how many languages a user actually enabled, not how many ship.
* fix: add provenance header and speaker IDs to Slack transcript imports
Slack exports are multi-party chats where no speaker is inherently
the "user" or "assistant". The parser previously assigned these roles
purely by position, allowing a crafted export to place attacker text
in the "user" role — making it appear as the memory owner's words
in all future retrieval (data poisoning via stored memory).
Changes:
- Add provenance header marking Slack transcripts as multi-party
with positional (unverified) role assignment
- Prefix each message with the original speaker ID ([U1], [U2], etc.)
so downstream consumers can distinguish authors
- Keep user/assistant role alternation for exchange-pair chunking
compatibility with convo_miner.py
Tests:
- Provenance header presence and content
- Speaker ID preservation in output
- Attacker-first-message attribution verification
Refs: MemPalace/mempalace#809
* fix: move Slack provenance to footer, sanitize speaker IDs, extract constant
- Move provenance notice from header to footer to prevent it becoming
a standalone ChromaDB drawer via paragraph chunking on exports
with fewer than 3 exchange pairs (violates verbatim-always principle)
- Sanitize speaker user_id/username: strip brackets, newlines, and
control characters to prevent chunk-boundary injection via crafted
Slack exports
- Extract header string to _SLACK_PROVENANCE_FOOTER module constant,
consistent with _TOOL_RESULT_* constants pattern; tests import it
instead of duplicating the literal
Refs: MemPalace/mempalace#809
* feat: include created_at timestamp in search results (closes#465)
Surface the existing filed_at metadata as created_at in search result
objects returned by search_memories(). Enables temporal reasoning over
search hits without additional queries.
* Feat: add fallback for missing filed_at metadata
* fix(hooks): stop precompact hook from blocking compaction
The precompact hook unconditionally returned {"decision": "block"},
which in Claude Code means "cancel compaction" with no retry mechanism.
This made /compact permanently broken for all plugin users.
Changed hook_precompact() to mine the transcript synchronously (so data
lands before compaction) and return {"decision": "allow"}. This matches
the standalone bash hook in hooks/ which already uses allow.
Also extracted _get_mine_dir() and _mine_sync() helpers so precompact
can mine from the transcript directory, not just MEMPAL_DIR.
Stop hook behavior is unchanged -- left for #673 which implements the
full silent save path.
Closes#856, closes#858.
* fix: use empty JSON instead of invalid \"allow\" decision value
Claude Code only recognizes \"block\" as a top-level decision value.
\"allow\" is a permissionDecision value for PreToolUse hooks, not a
valid top-level decision. The correct way to not block is to return
empty JSON. Caught by #872.
* fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225)
Fixes#225.
Several transitive dependencies (chromadb, onnxruntime, posthog) print
banners and warnings to stdout — sometimes at the C level — during the
mcp_server import chain. Because the MCP protocol multiplexes JSON-RPC
over stdio, any non-JSON output on stdout corrupted the message stream
and broke Claude Desktop's parser with errors like:
MCP mempalace: Unexpected token '*', "**********"... is not valid JSON
MCP mempalace: Unexpected token 'E', "EP Error D"... is not valid JSON
MCP mempalace: Unexpected token 'F', "Falling ba"... is not valid JSON
Reproduced on Windows 11 with mempalace 3.0.0 / Python 3.10 / Claude
Desktop 1.1062.0.
Fix: at module load, redirect stdout to stderr at both the Python level
(sys.stdout = sys.stderr) and the file-descriptor level (os.dup2(2, 1))
to catch C-level prints, while preserving the real stdout for later
restore. main() calls _restore_stdout() right before entering the
protocol loop so JSON-RPC responses still go to the real stdout.
Adds tests/test_mcp_stdio_protection.py with three regression tests:
- module-level redirect is in place after import
- _restore_stdout() restores the original stdout (idempotent)
- 'python -m mempalace.mcp_server' with empty stdin emits no stdout
* style: reformat with ruff 0.4 (CI version) for #225
Partially addresses #185.
`mempalace init <dir>` writes `mempalace.yaml` and `entities.json` into
the project root. When <dir> is a git repository, those files have no
default protection and risk being committed by accident — the loudest
concern in the original report.
This PR adds `_ensure_mempalace_files_gitignored()` which runs at the
end of cmd_init: if <dir>/.git exists, append the two filenames to
.gitignore (creating it if necessary) under a clearly-marked block.
The helper is conservative:
- only runs when <dir>/.git is present (no-op for non-git projects)
- skips entries already present (no duplicates)
- preserves existing .gitignore content
- handles files without trailing newlines
This does NOT relocate the files to ~/.mempalace/wings/<wing>/ as the
issue's 'Expected' section proposes — that's a behavioral change with
miner/config implications and warrants a separate design discussion.
The gitignore safeguard removes the immediate risk without breaking any
existing flow.
Tests: 5 cases in tests/test_init_gitignore_protection.py covering
no-op, fresh creation, partial append, idempotency, and missing-newline
edge case.
Fixes#195.
When ChromaDB returns no documents (empty palace, or wing/room filter
that excludes everything), it returns the shape:
{"documents": [], "metadatas": [], "distances": []}
Indexing `results["documents"][0]` blindly raises IndexError instead of
the expected 'no results' response. Affected: searcher.search(),
searcher.search_memories() (drawer + closet branches plus the
total_before_filter aggregate), and Layer3.search() / Layer3.search_raw().
Adds a tiny private helper `searcher._first_or_empty(results, key)` that
safely extracts the inner list, returning [] for any of: missing key,
empty outer list, [None], or [[]]. layers.py imports the same helper to
avoid duplicating the guard.
Tests: tests/test_empty_chromadb_results.py covers all observed shapes
plus a documentation-style test that pins the original IndexError so
future readers understand why the helper exists.
tool_status() called _get_collection() with the default create=False,
which throws when the ChromaDB collection does not exist yet (valid
palace, zero drawers). The exception was swallowed and status returned
"No palace found" even though init had completed successfully.
Switching to create=True bootstraps an empty collection on first
status call, matching what the write path already does.
Fix suggested by @hkevinchu in the issue.
* fix: make entity_registry.research() local-only by default
research() previously called _wikipedia_lookup() unconditionally,
sending entity names to en.wikipedia.org on every uncached lookup.
This violates the project's local-first and privacy-by-architecture
principles documented in CLAUDE.md.
Changes:
- research() now returns "unknown" for uncached words by default
- New allow_network=True parameter required for Wikipedia lookups
- Wikipedia 404 now returns "unknown" instead of asserting "person"
with 0.70 confidence, preventing entity registry poisoning
- Added privacy warning docstring to _wikipedia_lookup()
- Added tests for local-only default, opt-in network, 404 handling,
and cache-not-persisted-on-local-only behaviour
Refs: MemPalace/mempalace#809
* fix: improve research() cache read path and deduplicate test mocks
- Use .get() instead of .setdefault() for cache reads in research()
so the local-only path never mutates _data unnecessarily
- Move .setdefault() to the network-write path only
- Use result.setdefault() for word/confirmed keys to ensure
consistent return shape across all _wikipedia_lookup error paths
- Extract duplicated mock_result dict into _MOCK_SAOIRSE_PERSON
constant shared by 3 test functions
Three issues flagged by bensig on the i18n PR before merge:
1. ko.json: status_drawers used {drawers} instead of {count}, causing
the Korean UI to show the raw template string instead of the actual
drawer count. All other 7 languages use {count}.
2. Test file was shipped inside the package at mempalace/i18n/test_i18n.py
with a sys.path.insert hack. Moved to tests/test_i18n.py per the
project convention in AGENTS.md.
3. Dialect.from_config() passed lang=config.get("lang") which defaults
to None, causing __init__ to inherit whatever language was loaded
earlier via module-level state. Now defaults to "en" explicitly so
from_config is deterministic regardless of prior load_lang() calls.
Added two regression tests for the ko.json fix and the state leak.
The regression-guard tests added in #835 were pinned to the old
README shape (tool table + file-reference table). When #897 slimmed
the README and moved that content to the website, three tests
started failing:
TestReadmeToolsExistInCode.test_every_readme_tool_exists_in_tools_dict
TestNoUnlistedTools.test_no_undocumented_tools
TestReadmeDialectNotLossless.test_readme_dialect_line_not_lossless
Changes in this commit:
1. Update the 3 tests to track the new canonical docs surfaces
- Tool list -> website/reference/mcp-tools.md
(tests parse `### \`mempalace_xxx\`` headings instead of
markdown table rows).
- dialect.py lossless disclaimer -> website/reference/modules.md
(any line mentioning dialect.py must not also say "lossless").
2. Fix the website to make "no undocumented tools" true
Add the 10 tools that existed in TOOLS but were missing from
website/reference/mcp-tools.md (create_tunnel, delete_tunnel,
follow_tunnels, list_tunnels, get_drawer, list_drawers,
update_drawer, hook_settings, memories_filed_away, reconnect).
Page header now correctly says "all 29 MCP tools".
3. Align pre-commit ruff pin to match CI (0.4.x)
.pre-commit-config.yaml was pinning ruff v0.9.0, while
.github/workflows/ci.yml installs ruff>=0.4.0,<0.5. The two
formatters produce incompatible output (e.g. v0.9.0 reformats
`assert (x), msg` -> `assert x, (msg)` in a way v0.4.x rejects),
which would cause the pre-commit hook to modify files that CI
then flags as unformatted. Pinning the hook to v0.4.10 keeps
the dev loop and CI in lock-step.
Full suite: 887 passed, 0 failed.
TDD: test first, failed, fixed, passed.
Igor fixed query_relationship/timeline/stats in an earlier commit.
close() was the last method touching self._connection without
holding the lock.
Closes#883.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
export MEMPAL_VERBOSE=true → hook blocks, agent writes diary in chat
export MEMPAL_VERBOSE=false → silent background save (default)
Developers need to see code and diaries being written.
Regular users want zero chat clutter. Now both work.
TDD: tests written first, failed, code fixed, tests pass.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move regular expression compilation to the module level in `dialect.py` to prevent repeated parsing during loop execution.
Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>
When no mempalace.yaml or mempal.yaml exists in the source directory,
return a default config (wing = directory name, room = general) instead
of calling sys.exit(1). This lets users mine any directory into their
palace without requiring init first.
Closes#14.
sanitize_name rejects commas, colons, parentheses, and slashes — characters
that commonly appear in knowledge graph subject/object values. Adds
sanitize_kg_value for KG entity fields (subject, object, entity) while
keeping sanitize_name for predicates and wing/room names.
- _count_human_messages() now logs a WARNING via _log() when a
non-empty transcript_path is rejected by the validator, making
silent auto-save failures diagnosable via hook.log
- Add test for platform-native paths (backslashes on Windows) to
verify _validate_transcript_path works cross-platform
- Add test verifying the warning log is emitted on rejection
Refs: MemPalace/mempalace#809
Prerequisite for RFC 001 (plugin spec, #743). Removes every direct
`import chromadb` outside the ChromaDB backend itself so the core
modules depend only on the backend abstraction layer.
Extends ChromaBackend with make_client, get_or_create_collection,
delete_collection, create_collection, and backend_version. Adds
update() to the BaseCollection contract. Non-backend callers
(mcp_server, dedup, repair, migrate, cli) now go through the
abstraction; tests patch ChromaBackend instead of chromadb.
With this landed, the RFC 001 spec can be enforced and PalaceStore
(#643) can ship as a plugin without touching core modules.
TDD: test written first, failed, then fixed.
Problem: save hook says "saved in background" but MEMPAL_DIR defaults
to empty, so nothing actually mines. Users get no auto-save despite
the hook firing every 15 messages.
Fix: use TRANSCRIPT_PATH (received from Claude Code in the hook's
JSON input) to discover the session directory. Mine that directory
automatically. MEMPAL_DIR is still supported as override but no
longer required.
Also fixed: bare python3 → $(command -v python3) for nohup safety.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: README audit — match every claim to shipped code + add hall detection
TDD audit: wrote 42 tests verifying README claims against codebase.
Fixed all 7 failures:
1. Tool count: 19 → 29 (10 tools were undocumented)
2. Added tool table rows for tunnels, drawer management, system tools
3. Version badge: 3.1.0 → 3.2.0
4. dialect.py file reference: "30x lossless" → "AAAK index format for closet pointers"
5. Wake-up token cost: "~170 tokens" → "~600-900 tokens" (matches layers.py)
6. pyproject.toml version in project structure: v3.0.0 → v3.2.0
7. Hall detection: added detect_hall() to miner.py — drawers now tagged
with hall metadata so palace_graph.py can build hall connections
New code:
- miner.py: detect_hall() — keyword scoring against config hall_keywords,
writes hall field to every drawer's metadata
- tests/test_hall_detection.py — 12 TDD tests (written before code)
- tests/test_readme_claims.py — 42 TDD tests verifying README accuracy
859/859 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve ruff lint — unused imports and variables
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: ruff format with CI-pinned 0.4.x
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use conftest fixtures in hall tests for Windows compat
Windows CI fails with NotADirectoryError when ChromaDB tries to
write HNSW files in short-lived TemporaryDirectory. Use conftest
palace_path and tmp_dir fixtures instead — same pattern as all
other tests that touch ChromaDB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address Igor's review — convo_miner halls, cached config, markdown typo
TDD: wrote tests for convo_miner hall metadata and config caching
BEFORE verifying the code changes.
1. README markdown typo: extra ** in wake-up token row (line 195)
2. convo_miner.py: added _detect_hall_cached() — conversation
drawers now get hall metadata (was missing, Igor caught it)
3. miner.py + convo_miner.py: cached hall_keywords at module level
so config.json isn't re-read per drawer during bulk mine
4. New tests: TestConvoMinerWritesHalls, TestDetectHallCaching
861/861 tests pass. ruff clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous revision used multiprocessing but still relied on timing
("second process waited at least N seconds") which flakes on CI where
spawn overhead eats into the hold window. Linux CI observed the second
process report a 0.088s wait — below the 0.1s threshold — even though
the lock behavior was correct; spawn was just slow enough that the
first process had nearly finished holding when the second got past
its own spawn.
Switch to effect-based verification: each worker logs its
[enter_time, exit_time] inside the critical section, and the test
asserts the two intervals are disjoint after sorting. A broken lock
would produce overlapping intervals regardless of spawn latency; a
working lock cannot.
Also removed the mp.Queue since we no longer pass timing data back.
The macOS CI job failed ``test_lock_blocks_concurrent_access`` because
``fcntl.flock`` on BSD/macOS is per-*process*, not per-FD: two threads
in the same process both acquire even when they open their own file
descriptors. The test passed on Linux (per-FD flock) and Windows
(per-FD ``msvcrt.locking``) but was never actually exercising the
lock's real contract.
``mine_lock`` is designed to serialize multi-*agent* access — i.e.,
separate processes, not threads. Switch the test to
``multiprocessing.get_context('spawn')`` with a module-level worker
(so the spawn pickles cleanly) so it:
1. reflects the actual use case (one lock per mining process);
2. passes on all three OSes without flock-semantics branching;
3. catches real regressions (a broken lock would now let both
processes through, exactly what we care about).
Hold time bumped to 0.3s and the "wait until p1 acquires" delay to
0.2s to tolerate spawn's higher startup latency on macOS/Windows.
The Windows CI job failed on:
assert '/.mempalace/state/' in str(state_path)
because Windows uses ``\`` as the path separator, so the substring
never matches. The behavior under test (state file lives outside the
diary dir, under ``~/.mempalace/state/``) is already correct on both
platforms — only the assertion was Unix-only.
Switch to ``state_path.parent`` comparisons that work on any OS.
Brings in PR #793 (optional LLM-based closet regeneration via
user-configured OpenAI-compatible endpoint) and PR #795 (hybrid
closet+drawer search — closets boost, never gate). Stack: #784 → #788
→ #789 → #790 → #791 → #792 → #793 (+ #795).
Findings hardened on our side
─────────────────────────────
1) closet_llm.regenerate_closets didn't use the blessed palace helpers.
Before:
* manual closets_col.get(where=...) + .delete(ids=...) with a
silent ``except Exception: pass`` around both — if the purge
failed, pre-existing regex closets survived alongside fresh LLM
closets, giving the searcher double hits for the same source.
* ``source.split('/')[-1][:30]`` to build the closet_id — quietly
wrong on Windows paths (``C:\\proj\\a.md`` has no ``/``, so the
whole string ends up in the ID).
* no mine_lock around purge+upsert — a concurrent regex rebuild of
the same source could interleave with our purge and leave a mix
of regex and LLM pointers.
* no ``normalize_version`` stamp on the LLM closets — the miner's
stale-version gate would treat them as leftovers from an older
schema and rebuild over them on the next mine.
After: routes through ``purge_file_closets`` + ``mine_lock`` +
``os.path.basename`` + ``NORMALIZE_VERSION`` stamp. Regression tests
cover each.
2) searcher.search_memories was still closet-first.
PR #795 merged into #793's head to fix the recall regression
documented in that PR (R@1 0.25 on narrative content vs. 0.42
baseline). The hybrid design makes closets a ranking boost rather
than a gate: drawers are always queried at the floor, and matching
closet hits (rank 0-4 within CLOSET_DISTANCE_CAP=1.5) add a boost
of 0.40/0.25/0.15/0.08/0.04 to the effective distance.
Merged to take the incoming hybrid design, with two cleanups:
* kept the ``_expand_with_neighbors`` / ``_extract_drawer_ids_from_closet``
helpers as separately-tested utilities (still imported by tests
and future callers);
* replaced the fragile ``source_file.endswith(basename)`` reverse-
lookup in the enrichment step with internal ``_source_file_full``
/ ``_chunk_index`` fields stripped before return, so enrichment
doesn't silently pick the wrong path when two sources share a
basename across directories;
* drawer-grep enrichment now sorts by ``chunk_index`` before
neighbor expansion, so ``best_idx ± 1`` corresponds to actual
document order rather than whatever order Chroma returned.
3) Closet-first tests in test_closets.py (``TestSearchMemoriesClosetFirst``,
end-to-end ``test_closet_first_search_includes_drawer_index_and_total``)
pinned contracts that the hybrid path now violates (``matched_via``
went from ``"closet"`` to ``"drawer+closet"``). Rewrote them around
the new invariant: direct drawers are always the floor, closet
agreement flips the hit's matched_via and exposes closet_preview.
Verification
────────────
* 805/805 pass under ``uv run pytest tests/ -v --ignore=tests/benchmarks``
(13 new tests from PR #793 + 5 from PR #795 + 2 new regressions for
the closet_llm hardening + the rewritten hybrid assertions in
test_closets.py).
* CI-pinned ruff 0.4.x clean on ``mempalace/`` + ``tests/`` (check +
format both pass).
* No new deps — closet_llm.py still uses stdlib ``urllib.request`` per
the PR's "zero new dependencies" promise.
Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
Merges the full hardened stack (up through #791 drawer-grep) and turns
fact_checker from "dead code hidden behind bare except" into an
actually-working offline contradiction detector with tests.
## Dead paths the PR body advertised but the code never executed
Both buried by a single outer ``except Exception: pass``:
* ``kg.query(subject)`` — ``KnowledgeGraph`` has no ``query()`` method;
it has ``query_entity()``. The attribute error was silently swallowed
and the entire KG branch always returned ``[]``. Now using
``kg.query_entity(subject, direction="outgoing")`` with proper
handling of the ``predicate``/``object``/``current``/``valid_to``
fields the real API returns.
* ``KnowledgeGraph(palace_path=palace_path)`` — the constructor's only
kwarg is ``db_path``. Passing ``palace_path`` raised TypeError,
silently swallowed. Now computing the db_path correctly from
``<palace>/knowledge_graph.sqlite3``, matching the convention the
MCP server already uses.
## Contradiction logic rewritten
The previous ``if kg_pred in claim and fact.object not in claim`` only
fired when text used the SAME predicate word as the KG fact — the exact
opposite of the stated use case ("Bob is Alice's brother" when KG says
husband" would NOT have fired). Replaced with a proper parse → lookup
→ compare pipeline:
* ``_extract_claims`` parses two surface forms ("X is Y's Z" and
"X's Z is Y") into ``(subject, predicate, object)`` triples.
* ``_check_kg_contradictions`` pulls the subject's outgoing facts
and flags two classes:
- ``relationship_mismatch`` when a current KG fact matches the
same ``(subject, object)`` pair but with a different predicate.
- ``stale_fact`` when the exact triple exists but is
``valid_to``-closed in the past.
* Stale-fact detection is now implemented (the PR body claimed it;
the old code silently didn't implement it).
## Performance fix — O(n²) → O(mentioned × n)
``_check_entity_confusion`` previously computed Levenshtein for every
pair of registered names on every ``check_text`` call. For 1,000
registered names that's ~500K edit-distance calls per hook invocation.
Now we first identify which registry names actually appear in the text
(single regex scan), then only compute edit distance between mentioned
and unmentioned names. Pinned by a test that asserts <200ms on a 500-
name registry with zero mentions.
Also: when *both* similar names are mentioned in the text, we no
longer flag them — the user clearly knows they're different people.
## Shared entity-registry loader
``mempalace/miner.py`` already had an mtime-cached loader for
``~/.mempalace/known_entities.json``. fact_checker had a duplicate
implementation that leaked file handles and ignored caching. Extended
miner's cache to expose both the flat set (``_load_known_entities``)
and the raw category dict (``_load_known_entities_raw``); fact_checker
now imports the latter. No more double disk reads, no more handle leak.
## Tests — 24 cases in tests/test_fact_checker.py
All three detection paths + both dead-code regressions:
* ``test_kg_init_uses_db_path_not_palace_path_kwarg`` — pins the
correct KG constructor signature so the ``palace_path=`` bug can't
come back.
* ``test_relationship_mismatch_detected`` — the headline example from
the PR body now actually fires.
* ``test_stale_fact_detected`` — valid_to-closed triple is flagged.
* ``test_current_fact_same_triple_is_not_flagged`` — no false positive
on a still-valid match.
* ``test_performance_bounded_by_mentioned_names`` — 500-name registry,
zero mentions, <200ms. Regression for the O(n²) blowup.
* ``test_no_false_positive_when_both_names_mentioned`` — Mila and
Milla in the same text is fine.
* Plus claim extraction, flatten_names shapes, CLI exit code, empty
text handling, missing-palace graceful fallback, registry-dict
shape support.
785/785 suite pass. ruff + format clean on CI-pinned 0.4.x.