mempalace/tests/test_readme_claims.py
Igor Lins e Silva 4aa7e1eebd release: v3.3.0 (#839)
* fix: add file-level locking to prevent multi-agent duplicate drawers

Root cause: when multiple agents mine simultaneously, both pass
file_already_mined() check, both delete+insert the same file's
drawers, creating duplicates or losing data.

Fix: mine_lock() in palace.py — cross-platform file lock (fcntl on
Unix, msvcrt on Windows). Both miner.py and convo_miner.py now lock
per-file during the delete+insert cycle and re-check after acquiring
the lock.
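
A minimal sketch of the lock-then-re-check pattern described above. The names
file_lock_sketch and mine_once are hypothetical; the actual mine_lock() in
palace.py is not shown here and may differ in its details.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def file_lock_sketch(lock_path):
    """Hold an exclusive lock on lock_path for the duration of the block."""
    with open(lock_path, "a+") as handle:
        if os.name == "nt":
            import msvcrt

            handle.seek(0)
            # LK_LOCK retries for roughly ten seconds, then raises OSError.
            msvcrt.locking(handle.fileno(), msvcrt.LK_LOCK, 1)
            try:
                yield
            finally:
                handle.seek(0)
                msvcrt.locking(handle.fileno(), msvcrt.LK_UNLCK, 1)
        else:
            import fcntl

            # Blocks until no other process holds the lock.
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


def mine_once(lock_path, already_mined, do_mine):
    """Check, lock, then check again before doing the delete+insert work."""
    if already_mined():
        return False
    with file_lock_sketch(lock_path):
        if already_mined():  # another agent finished while we waited
            return False
        do_mine()
        return True
```

The second already_mined() call is what closes the race: both agents can pass
the first check, but only the one that wins the lock does the work.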

Tested:
- Lock acquires and releases correctly
- Second agent blocks until first releases (0.25s wait)
- 33/33 existing tests pass
- Cross-platform: fcntl (macOS/Linux), msvcrt (Windows)

Based on v3.2.0 tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: strip system tags, hook output, and Claude UI chrome from drawers

normalize.py now strips before filing:
- <system-reminder>, <command-message>, <command-name> tags
- <task-notification>, <user-prompt-submit-hook>, <hook_output> tags
- Hook status messages (CURRENT TIME, Checking verified facts, etc.)
- Claude Code UI chrome (ctrl+o to expand, progress bars, etc.)
- Collapsed runs of blank lines

This noise was going straight into drawers, wasting storage space
and polluting search results. strip_noise() runs on all normalized
output regardless of input format (JSONL, JSON, plain text).

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add closet layer — searchable index pointing to drawers

The closet architecture was always part of MemPalace's design but
never shipped in the public codebase. This adds it.

Palace now has TWO collections:
- mempalace_drawers — full verbatim content (unchanged)
- mempalace_closets — compact AAAK-style index entries

How it works:
- When mining, each file gets a closet alongside its drawers
- Closet contains extracted topics, entities, quotes as pointers
- Closets pack up to 1500 chars, topics never split mid-entry
- Search hits closets first (fast, small), then hydrates the
  full drawer content for matching files
- Falls back to direct drawer search if no closets exist yet

Files changed:
- palace.py: get_closets_collection(), build_closet_text(),
  upsert_closet(), CLOSET_CHAR_LIMIT
- miner.py: process_file() now creates closets after drawers
- searcher.py: search_memories() tries closet-first search,
  hydrates drawers, falls back to direct search

Backwards compatible — existing palaces without closets continue
to work via the fallback path. Closets are created on next mine.

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enforce atomic topics in closets, extract richer pointers

- upsert_closet replaced by upsert_closet_lines: checks each topic
  line individually against CLOSET_CHAR_LIMIT. If adding one line
  WHOLE would exceed the limit, starts a new closet. Never splits
  mid-topic.
- build_closet_lines returns a list of atomic lines (not joined text)
- Richer extraction: section headers, more action verbs, up to 3
  quotes, up to 12 topics per file
- Each line is complete: topic|entities|→drawer_refs
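
The atomic packing rule above can be sketched as follows. pack_closet_lines
is a hypothetical stand-in for upsert_closet_lines (which also writes to the
collection); only the 1500-character limit comes from this PR.

```python
CLOSET_CHAR_LIMIT = 1500  # limit stated in the closet-layer commit


def pack_closet_lines(lines, limit=CLOSET_CHAR_LIMIT):
    """Group whole lines into closets; a line is never split across closets."""
    closets, current, size = [], [], 0
    for line in lines:
        added = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + added > limit:
            # Adding this line whole would overflow: start a new closet.
            closets.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        closets.append("\n".join(current))
    return closets
```

In this sketch a single line longer than the limit still gets a closet of its
own rather than being cut.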

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add CLOSETS.md — closet layer overview

Cherry-picked the docs portion of 67e4ac6 to accompany the closet
feature. Test coverage for closets is omnibus with tests for entity
metadata and BM25 (see PR targeting those features) and will land
together in a follow-up.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

* feat: entity metadata + diary ingest + BM25 hybrid search

Three features that close the gap between the architecture docs
and the actual codebase:

1. Entity metadata on drawers and closets
   - _extract_entities_for_metadata() pulls names from known_entities.json
     + proper nouns appearing 2+ times
   - Stamped as "entities" field in ChromaDB metadata
   - Enables filterable search by person/project name

2. Day-based diary ingest (diary_ingest.py)
   - ONE drawer per day, upserted as the day grows
   - Closets pack topics atomically, never split mid-topic
   - Tracks entry count in state file, only processes new entries
   - Usage: python -m mempalace.diary_ingest --dir ~/summaries

3. BM25 hybrid search in searcher.py
   - _bm25_score() keyword matching complements vector similarity
   - _hybrid_rank() combines both signals (60% vector, 40% BM25)
   - Catches exact name/term matches that embeddings miss
   - Applied to both closet-first and direct drawer search paths
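
A rough sketch of the 60/40 blend from item 3. The BM25 variant, the
tokenizer, and the max-normalization are assumptions for illustration; the
real _bm25_score() and _hybrid_rank() in searcher.py may differ.

```python
import math
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with a standard BM25 formula."""
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(docs)
    avg_len = sum(len(t) for t in doc_tokens) / max(n_docs, 1)
    scores = []
    for tokens in doc_tokens:
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for t in doc_tokens if term in t)
            if df == 0:
                continue
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            tf = tokens.count(term)
            denom = tf + k1 * (1 - b + b * len(tokens) / (avg_len or 1))
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


def hybrid_rank(query, docs, vector_sims, w_vec=0.6, w_bm25=0.4):
    """Return doc indices ordered by 60% vector similarity + 40% BM25."""
    bm25 = bm25_scores(query, docs)
    top = max(bm25, default=0.0) or 1.0  # scale BM25 into [0, 1]
    combined = [w_vec * v + w_bm25 * (s / top) for v, s in zip(vector_sims, bm25)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
```

With equal vector similarities, the document containing the exact query term
ranks first, which is the "exact name match" case the keyword signal targets.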

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add tests for mine_lock, closets, entity metadata, BM25, diary

Trimmed version of Milla's omnibus test_closets.py to only cover
features present in this PR stack (#784 lock, #788 closets, this
PR's entity/BM25/diary). Strip-noise tests will land with #785;
tunnel tests will land with the tunnels PR.

16/16 pass.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

* feat: explicit cross-wing tunnels for multi-project agents

Adds active tunnel creation alongside passive tunnel discovery.

Passive tunnels (existing): rooms with the same name across wings.
Explicit tunnels (new): agent-created links between specific
locations. "This API design in project_api relates to the database
schema in project_database."

New functions in palace_graph.py:
- create_tunnel() — link two wing/room pairs with a label
- list_tunnels() — list all explicit tunnels, filter by wing
- delete_tunnel() — remove a tunnel by ID
- follow_tunnels() — from a room, find all connected rooms in
  other wings with drawer content previews

New MCP tools:
- mempalace_create_tunnel
- mempalace_list_tunnels
- mempalace_delete_tunnel
- mempalace_follow_tunnels

Tunnels stored in ~/.mempalace/tunnels.json (persists across
palace rebuilds). Deduplicated by endpoint pair.

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add TestTunnels for cross-wing tunnel operations

Appended from Milla's omnibus test_closets.py — covers create,
list, delete, dedup, and follow_tunnels behavior. 21/21 pass.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

* feat(search): drawer-grep returns best-matching chunk + neighbors

When a closet hit leads to a source file with many drawers, grep each
chunk for query terms and return the BEST-MATCHING chunk + 1 neighbor
on each side, instead of dumping the whole file truncated at
MAX_HYDRATION_CHARS. Result now includes drawer_index and
total_drawers so callers can request adjacent drawers explicitly.
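
A minimal sketch of the best-chunk selection, assuming simple word-overlap
counting as the "grep" step. best_chunk_with_neighbors is a hypothetical
name; only drawer_index and total_drawers are field names from this commit.

```python
import re


def best_chunk_with_neighbors(chunks, query, radius=1):
    """Pick the chunk with the most query-term hits, plus neighbors."""
    if not chunks:
        return None
    terms = set(re.findall(r"\w+", query.lower()))

    def hits(chunk):
        return sum(1 for w in re.findall(r"\w+", chunk.lower()) if w in terms)

    best = max(range(len(chunks)), key=lambda i: hits(chunks[i]))
    lo, hi = max(0, best - radius), min(len(chunks), best + radius + 1)
    return {
        "drawer_index": best,
        "total_drawers": len(chunks),
        "text": "\n".join(chunks[lo:hi]),
    }
```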

Extracted from Milla's commit 935f657 which bundled drawer-grep with
closet_llm (deferred pending LLM_ENDPOINT refactor) and fact_checker
(separate PR). Ported only the searcher.py change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: offline fact checker against entity registry + knowledge graph

fact_checker.py verifies text for contradictions against locally stored
entities and KG facts. Catches similar-name confusion (Bob vs Bobby),
relationship mismatches (KG says husband, text says brother), and
stale facts (KG valid_from/valid_to).

No hardcoded facts. No network calls. Reads:
- ~/.mempalace/known_entities.json
- KnowledgeGraph SQLite

Usage:
  from mempalace.fact_checker import check_text
  issues = check_text("Bob is Alice's brother", palace_path)

  # CLI
  python -m mempalace.fact_checker "text" --palace ~/.mempalace/palace

Extracted from Milla's commit 935f657 which bundled this with
closet_llm (deferred) and drawer-grep (PR #791). Ported only
fact_checker.py — verified no network / API imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: optional LLM-based closet regeneration — bring-your-own endpoint

Adds mempalace/closet_llm.py as an OPTIONAL path for richer closet
generation. Regex closets remain the default and cover the local-first
promise; users who want LLM-quality topics can bring their own endpoint.

Configuration (env or CLI flag):
  LLM_ENDPOINT — OpenAI-compatible base URL (required)
  LLM_KEY      — bearer token (optional; local inference skips this)
  LLM_MODEL    — model name (required)

Works with Ollama, vLLM, llama.cpp servers, OpenAI, OpenRouter, and any
other provider that speaks OpenAI-compatible /chat/completions. Zero new
dependencies — uses stdlib urllib.

Replaces the original Anthropic-SDK-hardcoded version of this module
from Milla's branch (commit 935f657). Same prompt, same parsing, same
regenerate_closets flow; only the transport was generalised so the
feature doesn't lock users into a specific vendor or require API keys
for core memory operations (CLAUDE.md, "Local-first, zero API").

Includes 13 unit tests covering config resolution, request shape,
auth-header omission when no key is set, code-fence stripping, and
missing-config error path. All mocked — zero network calls in tests.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

* fix(search): hybrid closet+drawer retrieval — closets boost, never gate (#795)

* Fix: set cosine distance metadata on all collection creation sites

ChromaDB defaults HNSW index to L2 (Euclidean) distance, but
MemPalace scoring uses 1-distance which requires cosine (range 0-2).
Add metadata={"hnsw:space": "cosine"} to the 4 production and 3 test
call sites that were missing it.
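
To illustrate why the metric matters for a 1-distance score, here is a small
pure-Python comparison. It assumes the L2 variant is squared Euclidean
distance; it does not call ChromaDB.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; always falls in the range [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def squared_l2_distance(a, b):
    """Unbounded above, so 1 - distance is not a usable similarity."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two vectors pointing the same way but with different lengths.
a, b = [3.0, 0.0], [6.0, 0.0]
cosine_score = 1.0 - cosine_distance(a, b)   # similarity for identical direction
l2_score = 1.0 - squared_l2_distance(a, b)   # negative despite identical direction
```

Under cosine distance the score stays within [-1, 1]; under L2 the same pair
of vectors produces a large negative "similarity".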

Closes #218

* fix: sync version.py to 3.2.0

Commit 6614b9b bumped pyproject.toml to 3.2.0 but missed
mempalace/version.py, breaking test_version_consistency on
every PR's CI. This syncs them.

* refactor: extract locked filing block to keep mine_convos under C901

Adding the per-file lock + double-checked file_already_mined() in the
previous commit pushed mine_convos cyclomatic complexity from 25 to 26,
just over ruff's max-complexity threshold. Hoist the locked critical
section into _file_chunks_locked() so the outer loop stays within
budget. No behavior change.

* style: ruff format mempalace/palace.py

Add blank lines after inline imports in mine_lock. Pure formatting.

* fix(normalize): make strip_noise verbatim-safe and scope it to Claude Code JSONL

The initial strip_noise() regressed on three fronts when audited against
adversarial user content — each verified with executable repros against
the cherry-picked code:

  1. `<tag>.*?</tag>` with re.DOTALL span-ate across messages: one
     stray unclosed <system-reminder> anywhere in a session merged with
     the next closing tag, silently deleting everything between them
     (including full assistant replies).
  2. `.*\(ctrl\+o to expand\).*\n?` nuked entire lines of user prose
     whenever a user happened to document the TUI shortcut.
  3. `Ran \d+ (?:stop|pre|post)\s*hook.*` with IGNORECASE ate the
     second sentence from "our CI has a stop hook ... Ran 2 stop hooks
     last week" — legitimate user commentary.

These are unambiguous violations of the project's "Verbatim always"
design principle.

Fixes:

- All tag patterns are now line-anchored (`(?m)^(?:> )?<tag>`) and their
  body forbids crossing a blank line (`(?:(?!\n\s*\n)[\s\S])*?`), so a
  dangling open tag cannot eat neighboring messages.
- `_NOISE_LINE_PREFIXES` are line-anchored and case-sensitive — user
  prose mentioning "CURRENT TIME:" mid-sentence is preserved.
- Hook-run chrome requires `(?m)^`, explicit hook names (Stop,
  PreCompact, PreToolUse, etc.), and no IGNORECASE.
- "… +N lines" is line-anchored.
- "(ctrl+o to expand)" only matches Claude Code's actual collapsed-
  output chrome shape `[N tokens] (ctrl+o to expand)`; a bare
  parenthetical in user prose stays intact.
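
The line-anchored tag pattern can be sketched as below, built from the two
fragments quoted above. tag_pattern is a hypothetical helper; how
normalize.py actually assembles its patterns is not shown in this commit.

```python
import re


def tag_pattern(tag):
    """Match <tag>...</tag> only at line start, never across a blank line."""
    return re.compile(
        r"(?m)^(?:> )?<" + re.escape(tag) + r">"
        r"(?:(?!\n\s*\n)[\s\S])*?"  # body: any char, but stop at a blank line
        r"</" + re.escape(tag) + r">\n?"
    )


REMINDER = tag_pattern("system-reminder")

# A well-formed tag on its own line is stripped.
clean = REMINDER.sub("", "hello\n<system-reminder>be nice</system-reminder>\nworld\n")

# A dangling open tag cannot pair with a later closing tag.
dangling = "<system-reminder>oops\n\nassistant reply\n\n<system-reminder>x</system-reminder>\n"
kept = REMINDER.sub("", dangling)
```

In the dangling case only the final, well-formed tag is removed; the stray
opener and the assistant reply after it stay verbatim.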

Scope:

- `strip_noise()` is no longer called on every normalization path.
  Only `_try_claude_code_jsonl` invokes it, per-extracted-message — so
  Claude.ai exports, ChatGPT exports, Slack JSON, Codex JSONL, and
  plain text with `>` markers pass through fully verbatim. Per-message
  application also makes span-eating structurally impossible.

Tests:

- 15 new tests in test_normalize.py pin the boundary: 6 guard user
  content that must survive (each of the adversarial repros), 9 assert
  real system chrome is still stripped. All pass; full suite 702 pass
  (2 failures are the unrelated pre-existing version.py bug, cleared
  by #820).

Known limitation (not fixed here): convo_miner.py does not delete
drawers on re-mine, so transcripts mined before this PR keep noise-
filled drawers until the user manually erases + re-mines. Proper fix
needs a schema-version field on drawer metadata + re-mine trigger —
out of scope for this PR.

* feat(normalize): auto-rebuild stale drawers via NORMALIZE_VERSION schema gate

Without this, the strip_noise improvement only helps new mines. Every
user who had already mined Claude Code JSONL sessions would keep their
noise-polluted drawers forever, because convo_miner's file_already_mined
skip short-circuits before re-processing.

Adds a versioned schema gate so upgrades propagate silently:

- palace.NORMALIZE_VERSION=2 — bumped when the normalization pipeline
  changes shape (this PR's strip_noise is the v1→v2 bump).
- file_already_mined now returns False if the stored normalize_version
  is missing or less than current, triggering a rebuild on next mine.
- Both miners stamp drawers with the current normalize_version.
- convo_miner now purges stale drawers before inserting fresh chunks
  (mirrors miner.py's existing delete+insert), extracted into
  _file_convo_chunks helper to keep mine_convos under ruff's C901 limit.
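
A sketch of the version gate, assuming the stored drawer metadata is a plain
dict with normalize_version and mtime keys. The function name and the exact
signature are hypothetical; the real file_already_mined queries the palace.

```python
NORMALIZE_VERSION = 2  # value stated in this commit


def file_already_mined_sketch(stored_meta, current_mtime):
    """Return True only if the file was mined under the current schema."""
    if stored_meta is None:
        return False  # never mined
    version = stored_meta.get("normalize_version")
    if version is None or version < NORMALIZE_VERSION:
        return False  # stale schema: force a rebuild on next mine
    return stored_meta.get("mtime") == current_mtime
```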

User experience: upgrade mempalace, run `mempalace mine` as usual, old
noisy drawers get silently replaced with clean ones. No erase needed,
no "you need to rebuild" changelog footgun.

Tests:
- test_file_already_mined_returns_false_for_stale_normalize_version —
  pins the version gate contract for missing/v1/current.
- test_add_drawer_stamps_normalize_version — fresh project-miner drawers
  carry the field.
- test_mine_convos_rebuilds_stale_drawers_after_schema_bump — end-to-end
  proof that a pre-v2 palace gets silently cleaned on next mine, with
  orphan drawers purged and NOT skipped.

Existing test_file_already_mined_check_mtime updated to include the
new field; all other tests unaffected.

* fix: stop hooks from making agents write in chat — save tokens

The save hook and precompact hook were telling the agent to write
diary entries, add drawers, and add KG triples IN THE CHAT WINDOW.
Every line written stays in conversation history and retransmits on
every subsequent turn — ~$1/session in wasted tokens.

Fix: hooks now say "saved in background, no action needed" and use
decision: allow instead of block. The agent continues working without
interruption. All filing happens via the background pipeline.

Also updated hooks README with:
- Known limitation: hooks require session restart after install
- Updated cost section: zero tokens, background-only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use microsecond timestamp and full content hash in diary entry ID (#819)

* fix: remove unused import 'main' from mempalace/__init__.py

Removed the 'main' import from `mempalace/__init__.py` and updated
`pyproject.toml` to point the script entry point directly to
`mempalace.cli:main`. This ensures the CLI remains functional while
improving code hygiene.

Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>

* merge: full hardened stack + rewrite fact_checker around actual KG API

Merges the full hardened stack (up through #791 drawer-grep) and turns
fact_checker from "dead code hidden behind bare except" into an
actually-working offline contradiction detector with tests.

## Dead paths the PR body advertised but the code never executed

Both buried by a single outer ``except Exception: pass``:

  * ``kg.query(subject)`` — ``KnowledgeGraph`` has no ``query()`` method;
    it has ``query_entity()``. The attribute error was silently swallowed
    and the entire KG branch always returned ``[]``. Now using
    ``kg.query_entity(subject, direction="outgoing")`` with proper
    handling of the ``predicate``/``object``/``current``/``valid_to``
    fields the real API returns.
  * ``KnowledgeGraph(palace_path=palace_path)`` — the constructor's only
    kwarg is ``db_path``. Passing ``palace_path`` raised TypeError,
    silently swallowed. Now computing the db_path correctly from
    ``<palace>/knowledge_graph.sqlite3``, matching the convention the
    MCP server already uses.

## Contradiction logic rewritten

The previous ``if kg_pred in claim and fact.object not in claim`` only
fired when text used the SAME predicate word as the KG fact — the exact
opposite of the stated use case ("Bob is Alice's brother" when the KG says
"husband" would NOT have fired). Replaced with a proper parse → lookup
→ compare pipeline:

  * ``_extract_claims`` parses two surface forms ("X is Y's Z" and
    "X's Z is Y") into ``(subject, predicate, object)`` triples.
  * ``_check_kg_contradictions`` pulls the subject's outgoing facts
    and flags two classes:
      - ``relationship_mismatch`` when a current KG fact matches the
        same ``(subject, object)`` pair but with a different predicate.
      - ``stale_fact`` when the exact triple exists but is
        ``valid_to``-closed in the past.
  * Stale-fact detection is now implemented (the PR body claimed it;
    the old code silently didn't implement it).

## Performance fix — O(n²) → O(mentioned × n)

``_check_entity_confusion`` previously computed Levenshtein for every
pair of registered names on every ``check_text`` call. For 1,000
registered names that's ~500K edit-distance calls per hook invocation.
Now we first identify which registry names actually appear in the text
(single regex scan), then only compute edit distance between mentioned
and unmentioned names. Pinned by a test that asserts <200ms on a 500-
name registry with zero mentions.

Also: when *both* similar names are mentioned in the text, we no
longer flag them — the user clearly knows they're different people.
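
A sketch of the mentioned-versus-unmentioned comparison. confusable_names,
the edit-distance threshold of 2, and the flat list registry are assumptions;
the real _check_entity_confusion may use different thresholds and shapes.

```python
import re


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def confusable_names(text, registry, max_distance=2):
    """Flag mentioned names that are close to an unmentioned registry name."""
    if not registry:
        return []
    # Single regex scan; longest names first so "Milla" wins over "Mila".
    alternation = "|".join(
        re.escape(n) for n in sorted(registry, key=len, reverse=True)
    )
    mentioned = set(re.findall(r"\b(?:" + alternation + r")\b", text))
    unmentioned = [n for n in registry if n not in mentioned]
    flagged = []
    for name in sorted(mentioned):
        for other in unmentioned:  # edit distance only for mentioned x unmentioned
            if levenshtein(name.lower(), other.lower()) <= max_distance:
                flagged.append((name, other))
    return flagged
```

With zero mentions the inner loops never run, which is what bounds the cost
by the number of mentioned names rather than the registry size squared.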

## Shared entity-registry loader

``mempalace/miner.py`` already had an mtime-cached loader for
``~/.mempalace/known_entities.json``. fact_checker had a duplicate
implementation that leaked file handles and ignored caching. Extended
miner's cache to expose both the flat set (``_load_known_entities``)
and the raw category dict (``_load_known_entities_raw``); fact_checker
now imports the latter. No more double disk reads, no more handle leak.

## Tests — 24 cases in tests/test_fact_checker.py

All three detection paths + both dead-code regressions:
  * ``test_kg_init_uses_db_path_not_palace_path_kwarg`` — pins the
    correct KG constructor signature so the ``palace_path=`` bug can't
    come back.
  * ``test_relationship_mismatch_detected`` — the headline example from
    the PR body now actually fires.
  * ``test_stale_fact_detected`` — valid_to-closed triple is flagged.
  * ``test_current_fact_same_triple_is_not_flagged`` — no false positive
    on a still-valid match.
  * ``test_performance_bounded_by_mentioned_names`` — 500-name registry,
    zero mentions, <200ms. Regression for the O(n²) blowup.
  * ``test_no_false_positive_when_both_names_mentioned`` — Mila and
    Milla in the same text is fine.
  * Plus claim extraction, flatten_names shapes, CLI exit code, empty
    text handling, missing-palace graceful fallback, registry-dict
    shape support.

785/785 suite pass. ruff + format clean on CI-pinned 0.4.x.

* Optimize entity detection with regex caching and pre-compilation

- Use functools.lru_cache to cache compiled patterns for entity names.
- Pre-compile static pronoun patterns into a single regex.
- Remove redundant .lower() calls in score_entity loop.
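
A minimal sketch of the two caching ideas, with hypothetical names
(name_pattern, PRONOUNS, count_mentions) and an illustrative pronoun list;
the actual entity detector's patterns are not shown in this commit.

```python
import functools
import re


@functools.lru_cache(maxsize=None)
def name_pattern(name):
    """Compile the word-boundary pattern for a name once, then reuse it."""
    return re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)


# Static patterns are compiled a single time at import.
PRONOUNS = re.compile(r"\b(?:he|she|they|him|her|them)\b", re.IGNORECASE)


def count_mentions(name, text):
    return len(name_pattern(name).findall(text))
```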

Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>

* docs: fix stale milla-jovovich org URLs in website and plugin manifests (#787)

Follow-up to #766 which covers version.py, pyproject.toml, README,
CHANGELOG, and CONTRIBUTING. These 11 files still had the old org
name in URLs:

- website/ (VitePress config + 6 docs pages)
- .claude-plugin/ (plugin.json repository, README marketplace command)
- .codex-plugin/ (plugin.json URLs, README links)

Author name fields are intentionally unchanged.

* test: make diary state path assertion platform-neutral

The Windows CI job failed on:

    assert '/.mempalace/state/' in str(state_path)

because Windows uses ``\`` as the path separator, so the substring
never matches. The behavior under test (state file lives outside the
diary dir, under ``~/.mempalace/state/``) is already correct on both
platforms — only the assertion was Unix-only.

Switch to ``state_path.parent`` comparisons that work on any OS.
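
One way to express the platform-neutral check, shown with pure paths so both
flavors can be exercised on any OS. lives_under_state_dir and the example
paths are hypothetical; the actual assertion in the test may be phrased
differently.

```python
from pathlib import PurePosixPath, PureWindowsPath


def lives_under_state_dir(state_path):
    """True if the file sits directly in a .mempalace/state directory."""
    parent = state_path.parent
    return parent.name == "state" and parent.parent.name == ".mempalace"


posix = PurePosixPath("/home/me/.mempalace/state/diary.json")
windows = PureWindowsPath(r"C:\Users\me\.mempalace\state\diary.json")
```

Comparing path components avoids any dependence on the separator, whereas the
old substring check never matches the Windows string form.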

* test: serialize mine_lock concurrency test with multiprocessing

The macOS CI job failed ``test_lock_blocks_concurrent_access`` because
``fcntl.flock`` on BSD/macOS is per-*process*, not per-FD: two threads
in the same process both acquire even when they open their own file
descriptors. The test passed on Linux (per-FD flock) and Windows
(per-FD ``msvcrt.locking``) but was never actually exercising the
lock's real contract.

``mine_lock`` is designed to serialize multi-*agent* access — i.e.,
separate processes, not threads. Switch the test to
``multiprocessing.get_context('spawn')`` with a module-level worker
(so the spawn pickles cleanly) so it:

  1. reflects the actual use case (one lock per mining process);
  2. passes on all three OSes without flock-semantics branching;
  3. catches real regressions (a broken lock would now let both
     processes through, exactly what we care about).

Hold time bumped to 0.3s and the "wait until p1 acquires" delay to
0.2s to tolerate spawn's higher startup latency on macOS/Windows.

* test: verify mine_lock via disjoint critical-section intervals

The previous revision used multiprocessing but still relied on timing
("second process waited at least N seconds") which flakes on CI where
spawn overhead eats into the hold window. Linux CI observed the second
process report a 0.088s wait — below the 0.1s threshold — even though
the lock behavior was correct; spawn was just slow enough that the
first process had nearly finished holding when the second got past
its own spawn.

Switch to effect-based verification: each worker logs its
[enter_time, exit_time] inside the critical section, and the test
asserts the two intervals are disjoint after sorting. A broken lock
would produce overlapping intervals regardless of spawn latency; a
working lock cannot.
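
The disjointness check itself is small. A sketch, assuming each worker
reports a (enter_time, exit_time) tuple; intervals_disjoint is a hypothetical
helper name.

```python
def intervals_disjoint(intervals):
    """True if no two (enter, exit) intervals overlap."""
    ordered = sorted(intervals)
    return all(
        prev_exit <= next_enter
        for (_, prev_exit), (next_enter, _) in zip(ordered, ordered[1:])
    )
```

Because the check looks only at ordering, slow process startup shifts the
intervals without ever making a correct lock fail the assertion.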

Also removed the mp.Queue since we no longer pass timing data back.

* Fix: ruff format with CI-pinned version (0.4.x)

* fix: README audit — 42 TDD tests + hall detection + 7 claim fixes (#835)

* fix: README audit — match every claim to shipped code + add hall detection

TDD audit: wrote 42 tests verifying README claims against codebase.
Fixed all 7 failures:

1. Tool count: 19 → 29 (10 tools were undocumented)
2. Added tool table rows for tunnels, drawer management, system tools
3. Version badge: 3.1.0 → 3.2.0
4. dialect.py file reference: "30x lossless" → "AAAK index format for closet pointers"
5. Wake-up token cost: "~170 tokens" → "~600-900 tokens" (matches layers.py)
6. pyproject.toml version in project structure: v3.0.0 → v3.2.0
7. Hall detection: added detect_hall() to miner.py — drawers now tagged
   with hall metadata so palace_graph.py can build hall connections

New code:
- miner.py: detect_hall() — keyword scoring against config hall_keywords,
  writes hall field to every drawer's metadata
- tests/test_hall_detection.py — 12 TDD tests (written before code)
- tests/test_readme_claims.py — 42 TDD tests verifying README accuracy
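
A rough sketch of keyword scoring for hall detection. The hall names,
keywords, and the "general" fallback are invented for illustration; the real
detect_hall() reads hall_keywords from config and may score differently.

```python
import re

# Hypothetical config; the shipped hall_keywords are not shown in this commit.
HALL_KEYWORDS = {
    "decisions": ["decided", "agreed", "chose"],
    "bugs": ["error", "traceback", "crash"],
}


def detect_hall_sketch(text, hall_keywords=HALL_KEYWORDS, default="general"):
    """Return the hall whose keywords occur most often in the text."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        hall: sum(words.count(k) for k in keywords)
        for hall, keywords in hall_keywords.items()
    }
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return default  # no keyword matched
    return best
```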

859/859 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve ruff lint — unused imports and variables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: ruff format with CI-pinned 0.4.x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use conftest fixtures in hall tests for Windows compat

Windows CI fails with NotADirectoryError when ChromaDB tries to
write HNSW files in short-lived TemporaryDirectory. Use conftest
palace_path and tmp_dir fixtures instead — same pattern as all
other tests that touch ChromaDB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Igor's review — convo_miner halls, cached config, markdown typo

TDD: wrote tests for convo_miner hall metadata and config caching
BEFORE verifying the code changes.

1. README markdown typo: extra ** in wake-up token row (line 195)
2. convo_miner.py: added _detect_hall_cached() — conversation
   drawers now get hall metadata (was missing, Igor caught it)
3. miner.py + convo_miner.py: cached hall_keywords at module level
   so config.json isn't re-read per drawer during bulk mine
4. New tests: TestConvoMinerWritesHalls, TestDetectHallCaching

861/861 tests pass. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(website): update vitepress base url for custom domain

* chore(release): bump version strings to 3.3.0 and curate CHANGELOG

Prepare develop for the 3.3.0 release cycle.

Version bumps:
- mempalace/version.py: 3.2.0 -> 3.3.0
- pyproject.toml: 3.2.0 -> 3.3.0
- README.md: pyproject.toml label and shields.io badge
- uv.lock: mempalace 3.0.0 -> 3.3.0 (also fills in resolved dev/extras)

CHANGELOG.md:
- Close out the stale [Unreleased] section as [3.2.0] - 2026-04-12
  (v3.2.0 was tagged on that date but the release flip was never made)
- Add a fresh [Unreleased] - v3.3.0 section covering the 49 commits
  since v3.2.0: closet layer, BM25 hybrid search, entity metadata,
  diary ingest, cross-wing tunnels, drawer-grep, offline fact checker,
  LLM-based closet regen, hall detection, cosine-distance fix,
  multi-agent locking, README audit, etc.
- Adopt Keep a Changelog + SemVer framing
- Add version compare reference links at the bottom
- Fix stale milla-jovovich/mempalace preamble URL to MemPalace/mempalace

---------

Co-authored-by: MSL <232237854+milla-jovovich@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: eblander <eblander@foundrydigital.com>
Co-authored-by: shafdev <96260000+shafdev@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: mvalentsev <michael@valentsev.ru>
Co-authored-by: Dominique Deschatre <43499065+domiscd@users.noreply.github.com>
2026-04-13 18:25:01 -07:00

738 lines
30 KiB
Python

#!/usr/bin/env python3
"""
test_readme_claims.py — TDD verification of every major README claim against actual code.
Each test verifies a specific claim made in README.md. If a test fails, either
the README is wrong or the code hasn't shipped the feature yet. Fix one or the
other until all tests pass — that's when the README matches reality.
Based on the audit at ~/Desktop/readme_audit.md (2026-04-13).
"""
import importlib
import re
from pathlib import Path
import pytest
# ---------------------------------------------------------------------------
# Helpers — locate repo root and parse README / source files
# ---------------------------------------------------------------------------
REPO_ROOT = Path(__file__).resolve().parent.parent
MEMPALACE_PKG = REPO_ROOT / "mempalace"
README_PATH = REPO_ROOT / "README.md"


def _read(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


def _readme() -> str:
    return _read(README_PATH)


def _tools_dict_keys() -> list:
    """Return the list of tool names registered in the TOOLS dict."""
    # Import the module-level TOOLS dict. We can't just import mcp_server
    # because it calls chromadb on import, so we parse the source instead.
    src = _read(MEMPALACE_PKG / "mcp_server.py")
    return re.findall(r'"(mempalace_\w+)":\s*\{', src)


def _readme_tool_table_names() -> list:
    """Return tool names listed in the README's MCP tool table."""
    readme = _readme()
    return re.findall(r"^\| `(mempalace_\w+)`", readme, re.MULTILINE)


# ---------------------------------------------------------------------------
# 1. Tool count — README says 19, verify actual count
# ---------------------------------------------------------------------------
class TestToolCount:
    """README claims '19 tools available through MCP' in multiple places."""

    def test_readme_tool_count_matches_code(self):
        """Claim: README says 19 tools. Actual TOOLS dict may differ.

        This test asserts the REAL tool count so the README can be updated.
        If TOOLS has 25 entries, the README should say 25, not 19.
        """
        actual_count = len(_tools_dict_keys())
        readme = _readme()
        # Find all "19 tools" claims in README
        claimed_counts = re.findall(r"(\d+)\s+tools", readme)
        for claimed in claimed_counts:
            assert int(claimed) == actual_count, (
                f"README claims {claimed} tools but TOOLS dict has {actual_count}. "
                f"Update every occurrence of '{claimed} tools' to '{actual_count} tools'."
            )


# ---------------------------------------------------------------------------
# 2. Every tool listed in README actually exists in TOOLS dict
# ---------------------------------------------------------------------------
class TestReadmeToolsExistInCode:
    """Every tool name in the README tool table must be a key in TOOLS."""

    def test_every_readme_tool_exists_in_tools_dict(self):
        """Claim: README lists tools like mempalace_get_aaak_spec.

        Each one must actually be registered in the TOOLS dict."""
        code_tools = set(_tools_dict_keys())
        readme_tools = _readme_tool_table_names()
        assert len(readme_tools) > 0, "Could not parse any tools from README table"
        missing = [t for t in readme_tools if t not in code_tools]
        assert missing == [], (
            f"README lists tools that don't exist in TOOLS dict: {missing}. "
            f"Either add them to mcp_server.py or remove them from README."
        )


# ---------------------------------------------------------------------------
# 3. No tool in TOOLS dict is missing from README's tool table
# ---------------------------------------------------------------------------
class TestNoUnlistedTools:
    """Every tool in the TOOLS dict should be documented in the README."""

    def test_no_undocumented_tools(self):
        """Claim: README's tool table is complete.

        Any tool in TOOLS but not in README is undocumented."""
        code_tools = set(_tools_dict_keys())
        readme_tools = set(_readme_tool_table_names())
        undocumented = sorted(code_tools - readme_tools)
        assert undocumented == [], (
            f"Tools in TOOLS dict but missing from README: {undocumented}. "
            f"Add rows for these to the tool table in README.md."
        )


# ---------------------------------------------------------------------------
# 4. Closets collection exists — palace.py has get_closets_collection()
# ---------------------------------------------------------------------------
class TestClosetsExist:
    """README describes closets as a core architectural feature."""

    def test_get_closets_collection_exists(self):
        """Claim: closets are a shipped feature.

        palace.py must export get_closets_collection()."""
        src = _read(MEMPALACE_PKG / "palace.py")
        assert "def get_closets_collection(" in src, (
            "palace.py does not define get_closets_collection(). "
            "Closets are described in README but the collection function is missing."
        )

    def test_closets_importable(self):
        """get_closets_collection should be importable from mempalace.palace."""
        from mempalace.palace import get_closets_collection

        assert callable(get_closets_collection)


# ---------------------------------------------------------------------------
# 5. Closet-first search exists in searcher.py
# ---------------------------------------------------------------------------
class TestClosetFirstSearch:
    """README implies search goes through closets, not just direct drawer query."""

    def test_closet_boost_search_exists(self):
        """Claim: search uses closets as a boost signal.

        searcher.py must have CLOSET_RANK_BOOSTS and query closets_col."""
        src = _read(MEMPALACE_PKG / "searcher.py")
        assert "CLOSET_RANK_BOOSTS" in src, (
            "searcher.py has no closet boost logic. "
            "README describes closet-based search but searcher.py has no closet ranking."
        )

    def test_searcher_imports_closets(self):
        """searcher.py must import get_closets_collection to use closets."""
        src = _read(MEMPALACE_PKG / "searcher.py")
        assert "get_closets_collection" in src, (
            "searcher.py does not reference get_closets_collection. "
            "Closet-first search can't work without the closets collection."
        )


# ---------------------------------------------------------------------------
# 6. BM25 hybrid search functions exist
# ---------------------------------------------------------------------------
class TestBM25HybridSearch:
    """README claims 'BM25 hybrid search'. Verify the functions exist."""

    def test_bm25_in_searcher(self):
        """Claim: BM25 hybrid search is shipped.
        searcher.py must have BM25 scoring or hybrid ranking logic."""
        src = _read(MEMPALACE_PKG / "searcher.py")
        # Lowercase once instead of once per candidate term
        src_lower = src.lower()
        has_bm25 = any(
            term in src_lower
            for term in [
                "bm25",
                "_bm25_score",
                "_hybrid_rank",
                "hybrid_search",
                "bm25_score",
                "rank_bm25",
            ]
        )
        assert has_bm25, (
            "searcher.py has no BM25 or hybrid search function. "
            "README claims BM25 hybrid search but it's not in the code."
        )
# ---------------------------------------------------------------------------
# 7. Entity metadata extraction exists in miner.py or palace.py
# ---------------------------------------------------------------------------
class TestEntityMetadataExtraction:
    """README implies entity extraction populates drawer/closet metadata."""

    def test_entity_extraction_in_palace_or_miner(self):
        """Claim: entity extraction is part of the mining pipeline.
        Either miner.py or palace.py must extract entities."""
        miner_src = _read(MEMPALACE_PKG / "miner.py")
        palace_src = _read(MEMPALACE_PKG / "palace.py")
        # Entity extraction can be in either file; palace.py has it for closets
        has_entity_extraction = (
            "entities" in palace_src and "_ENTITY_STOPLIST" in palace_src
        ) or "extract_entities" in miner_src
        assert has_entity_extraction, (
            "No entity extraction found in miner.py or palace.py. "
            "README implies entities are extracted during mining."
        )
# ---------------------------------------------------------------------------
# 8. strip_noise function exists in normalize.py
# ---------------------------------------------------------------------------
class TestStripNoise:
    """normalize.py should have strip_noise() for cleaning input text."""

    def test_strip_noise_exists(self):
        """Claim: normalize.py has noise stripping.
        Function strip_noise must exist."""
        src = _read(MEMPALACE_PKG / "normalize.py")
        assert "def strip_noise(" in src, (
            "normalize.py does not define strip_noise(). "
            "This function is referenced in the normalization pipeline."
        )

    def test_strip_noise_importable(self):
        """strip_noise should be importable from mempalace.normalize."""
        from mempalace.normalize import strip_noise

        assert callable(strip_noise)
# ---------------------------------------------------------------------------
# 9. diary_ingest.py module exists and is importable
# ---------------------------------------------------------------------------
class TestDiaryIngest:
    """README describes diary ingest (day-based). Module must exist."""

    def test_diary_ingest_module_exists(self):
        """Claim: diary_ingest.py is a shipped module.
        File must exist at mempalace/diary_ingest.py."""
        path = MEMPALACE_PKG / "diary_ingest.py"
        assert path.is_file(), (
            "mempalace/diary_ingest.py does not exist. "
            "README describes diary ingest but the module is missing (still in an unmerged PR?)."
        )

    def test_diary_ingest_importable(self):
        """diary_ingest should be importable."""
        try:
            importlib.import_module("mempalace.diary_ingest")
        except ImportError as exc:
            # Surface the underlying error so the failure is diagnosable
            pytest.fail(
                "mempalace.diary_ingest is not importable. "
                f"Module must exist and import cleanly. Original error: {exc}"
            )
# ---------------------------------------------------------------------------
# 10. fact_checker.py module exists and is importable
# ---------------------------------------------------------------------------
class TestFactChecker:
    """README has a 'Contradiction detection' section implying fact_checker.py."""

    def test_fact_checker_module_exists(self):
        """Claim: contradiction detection is shipped.
        fact_checker.py must exist at mempalace/fact_checker.py."""
        path = MEMPALACE_PKG / "fact_checker.py"
        assert path.is_file(), (
            "mempalace/fact_checker.py does not exist. "
            "README describes contradiction detection but the module is missing."
        )

    def test_fact_checker_importable(self):
        """fact_checker should be importable."""
        try:
            importlib.import_module("mempalace.fact_checker")
        except ImportError as exc:
            # Surface the underlying error so the failure is diagnosable
            pytest.fail(
                "mempalace.fact_checker is not importable. "
                f"Module must exist and import cleanly. Original error: {exc}"
            )
# ---------------------------------------------------------------------------
# 11. Tunnel functions exist in palace_graph.py
# ---------------------------------------------------------------------------
class TestTunnelFunctions:
    """README describes tunnels — connections between wings."""

    def test_find_tunnels_exists(self):
        """Claim: tunnels connect rooms across wings.
        palace_graph.py must have find_tunnels()."""
        src = _read(MEMPALACE_PKG / "palace_graph.py")
        assert "def find_tunnels(" in src, (
            "palace_graph.py has no find_tunnels() function. "
            "README describes tunnels but the function is missing."
        )

    def test_traverse_exists(self):
        """Claim: you can walk the palace graph.
        palace_graph.py must have traverse()."""
        src = _read(MEMPALACE_PKG / "palace_graph.py")
        assert "def traverse(" in src, "palace_graph.py has no traverse() function."

    def test_graph_stats_exists(self):
        """palace_graph.py must have graph_stats()."""
        src = _read(MEMPALACE_PKG / "palace_graph.py")
        assert "def graph_stats(" in src, "palace_graph.py has no graph_stats() function."

    def test_tunnel_functions_importable(self):
        """find_tunnels, traverse, graph_stats should be importable."""
        from mempalace.palace_graph import find_tunnels, traverse, graph_stats

        assert callable(find_tunnels)
        assert callable(traverse)
        assert callable(graph_stats)
# ---------------------------------------------------------------------------
# 12. closet_llm.py module exists and is importable
# ---------------------------------------------------------------------------
class TestClosetLLM:
    """README describes LLM-based closet regeneration. Module must exist."""

    def test_closet_llm_module_exists(self):
        """Claim: LLM-based closet regen is shipped.
        closet_llm.py must exist at mempalace/closet_llm.py."""
        path = MEMPALACE_PKG / "closet_llm.py"
        assert path.is_file(), (
            "mempalace/closet_llm.py does not exist. "
            "README describes LLM closet regeneration but the module is missing."
        )

    def test_closet_llm_importable(self):
        """closet_llm should be importable."""
        try:
            importlib.import_module("mempalace.closet_llm")
        except ImportError as exc:
            # Surface the underlying error so the failure is diagnosable
            pytest.fail(
                "mempalace.closet_llm is not importable. "
                f"Module must exist and import cleanly. Original error: {exc}"
            )
# ---------------------------------------------------------------------------
# 13. mine_lock exists in palace.py
# ---------------------------------------------------------------------------
class TestMineLock:
    """Multi-agent file locking must be shipped (PR #784 was merged)."""

    def test_mine_lock_exists(self):
        """Claim: multi-agent file locking is shipped.
        palace.py must define mine_lock."""
        src = _read(MEMPALACE_PKG / "palace.py")
        assert "def mine_lock(" in src, (
            "palace.py does not define mine_lock(). "
            "Multi-agent locking is claimed as shipped but function is missing."
        )

    def test_mine_lock_importable(self):
        """mine_lock should be importable from mempalace.palace."""
        from mempalace.palace import mine_lock

        assert callable(mine_lock)

    def test_mine_lock_is_context_manager(self):
        """mine_lock should be a context manager (used with `with` statement)."""
        src = _read(MEMPALACE_PKG / "palace.py")
        # File-level heuristic: palace.py should use a contextmanager decorator
        # (either import style) or define __enter__. This does not verify that
        # the decorator is attached to mine_lock itself.
        assert (
            "@contextlib.contextmanager" in src
            or "@contextmanager" in src
            or "def __enter__" in src
        ), (
            "mine_lock does not appear to be a context manager. "
            "It should be usable with `with mine_lock(path):` syntax."
        )
# ---------------------------------------------------------------------------
# 14. Version in version.py matches pyproject.toml
# ---------------------------------------------------------------------------
class TestVersionConsistency:
    """version.py and pyproject.toml must agree on the version string."""

    def test_version_py_matches_pyproject(self):
        """Claim: single source of truth for version.
        version.py __version__ must match pyproject.toml version."""
        version_src = _read(MEMPALACE_PKG / "version.py")
        version_match = re.search(r'__version__\s*=\s*"([^"]+)"', version_src)
        assert version_match, "Could not parse __version__ from version.py"
        code_version = version_match.group(1)
        pyproject_src = _read(REPO_ROOT / "pyproject.toml")
        pyproject_match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_src, re.MULTILINE)
        assert pyproject_match, "Could not parse version from pyproject.toml"
        toml_version = pyproject_match.group(1)
        assert code_version == toml_version, (
            f"version.py says {code_version} but pyproject.toml says {toml_version}. "
            f"These must match."
        )
# ---------------------------------------------------------------------------
# 15. Version badge URL in README matches version.py
# ---------------------------------------------------------------------------
class TestVersionBadge:
    """README version badge must show the current version, not a stale one."""

    def test_readme_badge_matches_version_py(self):
        """Claim: README badge shows current version.
        The shields.io badge URL must contain the version from version.py."""
        version_src = _read(MEMPALACE_PKG / "version.py")
        version_match = re.search(r'__version__\s*=\s*"([^"]+)"', version_src)
        assert version_match, "Could not parse __version__ from version.py"
        code_version = version_match.group(1)
        readme = _readme()
        # Find the version badge URL
        badge_match = re.search(r"shields\.io/badge/version-([^-]+)-", readme)
        assert badge_match, "Could not find version badge URL in README"
        badge_version = badge_match.group(1)
        assert badge_version == code_version, (
            f"README badge says {badge_version} but version.py says {code_version}. "
            f"Update the badge URL in README.md."
        )
# ---------------------------------------------------------------------------
# 16. dialect.py docstring does NOT say "lossless"
# ---------------------------------------------------------------------------
class TestDialectNotLossless:
    """The April 7 correction: AAAK is lossy, not lossless."""

    def test_dialect_docstring_says_not_lossless(self):
        """Claim: dialect.py correctly says AAAK is NOT lossless.
        The docstring must contain 'NOT lossless' or 'lossy'."""
        src = _read(MEMPALACE_PKG / "dialect.py")
        # Approximate the module docstring as the first 1000 characters
        docstring_area = src[:1000]
        assert "NOT lossless" in docstring_area or "lossy" in docstring_area.lower(), (
            "dialect.py docstring does not disclaim losslessness. "
            "After the April 7 correction, it must say AAAK is NOT lossless."
        )

    def test_dialect_docstring_does_not_claim_lossless(self):
        """The docstring must not positively claim 'lossless compression'."""
        src = _read(MEMPALACE_PKG / "dialect.py")
        docstring_area = src[:1000]
        # "NOT lossless" is OK; bare "lossless" without negation is not.
        # Remove the "NOT lossless" disclaimer before checking.
        cleaned = docstring_area.replace("NOT lossless", "")
        assert "lossless" not in cleaned.lower(), (
            "dialect.py docstring still claims 'lossless' somewhere. "
            "AAAK is lossy — remove any positive lossless claims."
        )
# ---------------------------------------------------------------------------
# 17. README file reference table for dialect.py does NOT say "lossless"
# ---------------------------------------------------------------------------
class TestReadmeDialectNotLossless:
    """README's file reference table must not say dialect.py is lossless."""

    def test_readme_dialect_line_not_lossless(self):
        """Claim: April 7 correction applied to README file table.
        The dialect.py row must not say 'lossless'."""
        readme = _readme()
        # Find the line with dialect.py in the file reference table
        dialect_lines = [
            line for line in readme.splitlines() if "dialect.py" in line and "|" in line
        ]
        assert len(dialect_lines) > 0, "Could not find dialect.py in README file table"
        for line in dialect_lines:
            assert "lossless" not in line.lower(), (
                f"README file table still says dialect.py is lossless: {line.strip()!r}. "
                f"After the April 7 correction, this must say 'lossy' or remove the lossless claim."
            )
# ---------------------------------------------------------------------------
# 18. Hall keywords in config.py — verify miners actually WRITE hall metadata
# ---------------------------------------------------------------------------
class TestHallMetadata:
    """README describes 5 hall types. Miners must actually write hall metadata."""

    def test_hall_keywords_defined_in_config(self):
        """Prerequisite: DEFAULT_HALL_KEYWORDS must exist in config.py."""
        src = _read(MEMPALACE_PKG / "config.py")
        assert "DEFAULT_HALL_KEYWORDS" in src, (
            "config.py does not define DEFAULT_HALL_KEYWORDS. "
            "Hall types are described in README but not defined in config."
        )

    def test_miners_write_hall_metadata(self):
        """Claim: halls are populated. At least one miner must write a 'hall'
        field into drawer metadata.
        If no miner writes hall metadata, the halls described in README are
        a schema ghost — defined but never populated."""
        miner_src = _read(MEMPALACE_PKG / "miner.py")
        convo_miner_src = _read(MEMPALACE_PKG / "convo_miner.py")
        # Check if either miner references 'hall' in the metadata it writes
        writes_hall = (
            '"hall"' in miner_src
            or "'hall'" in miner_src
            or '"hall"' in convo_miner_src
            or "'hall'" in convo_miner_src
        )
        assert writes_hall, (
            "Neither miner.py nor convo_miner.py writes a 'hall' field to drawer metadata. "
            "README describes 5 hall types (hall_facts, hall_events, hall_discoveries, "
            "hall_preferences, hall_advice) but no mining code populates them. "
            "Halls are a schema ghost — defined in config, read by palace_graph, "
            "but never written by any pipeline."
        )

    def test_readme_hall_types_match_config(self):
        """If README lists specific hall names, they should appear in config."""
        # README mentions these 5 halls
        readme_halls = [
            "hall_facts",
            "hall_events",
            "hall_discoveries",
            "hall_preferences",
            "hall_advice",
        ]
        for hall in readme_halls:
            # These should either be in config or README should not list them.
            # The hall_ prefix is a README convention; config uses keyword groups
            # like "emotions", "consciousness" etc. Check if they're consistent.
            # Placeholder: nothing is asserted here yet. The real check is
            # test_miners_write_hall_metadata above.
            pass
# ---------------------------------------------------------------------------
# 19. Backend abstraction exists
# ---------------------------------------------------------------------------
class TestBackendAbstraction:
    """Backend seam for pluggable storage backends."""

    def test_backends_base_exists(self):
        """Claim: pluggable backends.
        backends/base.py must define an abstract base class."""
        path = MEMPALACE_PKG / "backends" / "base.py"
        assert (
            path.is_file()
        ), "mempalace/backends/base.py does not exist. Backend abstraction layer is missing."
        src = _read(path)
        assert (
            "ABC" in src or "abstractmethod" in src
        ), "backends/base.py does not define an abstract base class."

    def test_backends_chroma_exists(self):
        """Claim: ChromaDB backend implementation.
        backends/chroma.py must exist and reference the base class."""
        path = MEMPALACE_PKG / "backends" / "chroma.py"
        assert path.is_file(), "mempalace/backends/chroma.py does not exist."
        src = _read(path)
        assert (
            "BaseCollection" in src or "base" in src
        ), "backends/chroma.py does not reference the base class."

    def test_backends_importable(self):
        """Both backend modules should be importable."""
        from mempalace.backends.base import BaseCollection
        from mempalace.backends.chroma import ChromaBackend

        assert BaseCollection is not None
        assert ChromaBackend is not None
# ---------------------------------------------------------------------------
# 20. i18n module exists with at least 8 language files
# ---------------------------------------------------------------------------
class TestI18n:
    """i18n support — 8 languages."""

    def test_i18n_directory_exists(self):
        """i18n directory must exist."""
        path = MEMPALACE_PKG / "i18n"
        assert path.is_dir(), "mempalace/i18n/ directory does not exist."

    def test_at_least_8_language_files(self):
        """Claim: 8 languages supported.
        i18n/ must contain at least 8 .json language files."""
        path = MEMPALACE_PKG / "i18n"
        json_files = list(path.glob("*.json"))
        assert len(json_files) >= 8, (
            f"i18n/ has only {len(json_files)} language files, expected >= 8. "
            f"Files found: {[f.name for f in json_files]}"
        )

    def test_english_baseline_exists(self):
        """en.json must exist as the baseline language file."""
        path = MEMPALACE_PKG / "i18n" / "en.json"
        assert (
            path.is_file()
        ), "mempalace/i18n/en.json does not exist. English baseline is required."
# ---------------------------------------------------------------------------
# 21. Wake-up token cost — check layers.py vs README's "~170 tokens"
# ---------------------------------------------------------------------------
class TestWakeUpTokenCost:
    """README claims '~170 tokens' for wake-up. layers.py says otherwise."""

    def test_readme_wakeup_cost_matches_layers(self):
        """Claim: README says ~170 tokens for wake-up.
        layers.py docstring says L0 ~100 tokens, L1 ~500-800 tokens.
        Total = 600-900, not 170.
        If the README means '170 tokens of critical facts' (just the AAAK
        portion), it should say so clearly. If it means total wake-up cost,
        it must match layers.py."""
        readme = _readme()
        layers_src = _read(MEMPALACE_PKG / "layers.py")
        # What layers.py says ("600-900" also matches "~600-900 tokens")
        assert "600-900" in layers_src, (
            "layers.py docstring does not mention 600-900 tokens. "
            "Check if the wake-up cost documentation has changed."
        )
        # What README says
        readme_170_claims = re.findall(r"~?170 tokens", readme)
        if readme_170_claims:
            # README claims 170 tokens but layers.py says 600-900.
            # This test enforces that README must match the code.
            # Either README should say 600-900 or layers.py should say 170.
            # Since we trust code over docs, the README is wrong.
            pytest.fail(
                f"README claims '~170 tokens' for wake-up ({len(readme_170_claims)} occurrences) "
                f"but layers.py says L0+L1 = ~600-900 tokens. "
                f"Either update README to match layers.py, or clarify that '170 tokens' "
                f"refers to a specific subset (e.g., AAAK-compressed facts only)."
            )
# ---------------------------------------------------------------------------
# Bonus: pyproject.toml version in README project structure
# ---------------------------------------------------------------------------
class TestReadmeProjectStructureVersion:
    """README's project structure section says pyproject.toml version."""

    def test_readme_pyproject_version_claim(self):
        """Claim: README says 'pyproject.toml — package config (v3.0.0)' or similar.
        Must match actual pyproject.toml version."""
        readme = _readme()
        pyproject_src = _read(REPO_ROOT / "pyproject.toml")
        pyproject_match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_src, re.MULTILINE)
        assert pyproject_match, "Could not parse version from pyproject.toml"
        actual_version = pyproject_match.group(1)
        # Find any version claim near pyproject.toml in README
        version_in_readme = re.search(r"pyproject\.toml.*?v?([\d]+\.[\d]+\.[\d]+)", readme)
        if version_in_readme:
            readme_version = version_in_readme.group(1)
            assert readme_version == actual_version, (
                f"README says pyproject.toml is v{readme_version} "
                f"but actual version is {actual_version}."
            )
# ---------------------------------------------------------------------------
# Bonus: README tool count consistency (all mentions must agree)
# ---------------------------------------------------------------------------
class TestReadmeToolCountConsistency:
    """README mentions tool count in multiple places — they must all agree."""

    def test_all_tool_count_mentions_consistent(self):
        """Every place README says 'N tools' must use the same number."""
        readme = _readme()
        counts = re.findall(r"(\d+)\s+tools", readme)
        if len(counts) > 1:
            unique = set(counts)
            assert (
                len(unique) == 1
            ), f"README mentions different tool counts: {counts}. All occurrences must agree."
# ---------------------------------------------------------------------------
# Bonus: get_aaak_spec tool handler exists
# ---------------------------------------------------------------------------
class TestAAAKSpecToolHandler:
    """If mempalace_get_aaak_spec is in TOOLS, its handler must exist."""

    def test_aaak_spec_handler_exists(self):
        """The handler function for get_aaak_spec must be defined."""
        src = _read(MEMPALACE_PKG / "mcp_server.py")
        tools = _tools_dict_keys()
        if "mempalace_get_aaak_spec" in tools:
            assert "def tool_get_aaak_spec(" in src, (
                "mempalace_get_aaak_spec is in TOOLS dict but "
                "tool_get_aaak_spec() handler function is not defined."
            )