* fix: return "general" room from process_file error paths (#586)
process_file() returned (0, None) for already-mined, unreadable, and
too-short files. In --dry-run mode the caller always enters the
room_counts branch, so None ended up as a dict key and crashed the
summary printer with "unsupported format string passed to
NoneType.__format__".
Returning "general" instead of None makes the function contract
explicit: it always yields (int, str). This matches the consensus
fix discussed in the issue thread.
* style: apply ruff format to test_miner.py
* fix: skip arg whitelist for handlers accepting **kwargs (#572)
The schema-based argument filter (from #647) strips all kwargs not
declared in input_schema. This breaks handlers that accept **kwargs
for pass-through to ChromaDB or other backends.
Add inspect.Parameter.VAR_KEYWORD check before filtering — handlers
with **kwargs receive all arguments unfiltered.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: guard inspect.signature failure, default to filtering
Wrap inspect.signature() in try/except — on failure, default to
filtering (safe fallback). Addresses Copilot feedback on fragility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: allow Unicode in sanitize_name() — Latvian, CJK, Cyrillic names (#637)
_SAFE_NAME_RE was ASCII-only ([a-zA-Z0-9]), rejecting valid Unicode
names like "Jānis" or "太郎". Changed to \w which matches Unicode
word characters (letters, digits, underscore) in Python 3.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: tighten Unicode regex, add sanitize_name tests
Use [^\W_] for first/last char to allow Unicode letters/digits but
reject leading/trailing underscores (Copilot feedback). Add 7 tests
covering Latvian, CJK, Cyrillic, path traversal, and edge cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
AI agents calling `mempalace init` hang on the interactive confirmation
prompt. The --yes flag skips it, enabling automated setup.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused `json` and `current_lang` imports from
mempalace/i18n/test_i18n.py (F401)
- Reformat Dialect.__init__ signature in mempalace/dialect.py
(ruff format collapses multi-line signature, adds blank line
after lazy import)
Both auto-fixes from `ruff check --fix` / `ruff format`. No behavioral
changes.
Add language dictionaries: English, French, Korean, Japanese, Spanish,
German, Simplified Chinese, Traditional Chinese.
Each language is a single JSON file with:
- Localized terms (palace, wing, closet, drawer, etc.)
- CLI output strings with {var} interpolation
- AAAK compression instructions in that language
- Regex patterns for offline topic/quote/action extraction
Usage: Dialect(lang="ko") or set "language": "ko" in config.
Contributors can add new languages by copying en.json and translating.
Dialect class now accepts lang param and loads AAAK instruction +
regex patterns from the i18n dictionary automatically.
Tests: mempalace/i18n/test_i18n.py — all 8 languages pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The current constraint `chromadb>=0.5.0,<0.7` forces pip to install
chromadb 0.6.x, but palaces created with chromadb 1.x (which is what
the mempalace dev environment actually uses — 1.5.7 per uv.lock) have
an incompatible SQLite schema. Specifically, chromadb 0.6.x fails with
`KeyError: '_type'` when opening a collection written by 1.x.
This means a fresh `pip install mempalace` gives users a chromadb
version that cannot read palaces created in the maintainer's own
environment. The fix removes the upper bound so pip can resolve to the
current stable chromadb release.
Reproduction:
python3 -m venv .venv && source .venv/bin/activate
pip install mempalace # installs chromadb 0.6.3
# Try opening a palace created with chromadb 1.x:
# -> _get_collection() returns None, tool_status() returns "No palace found"
pip install chromadb==1.5.7 # force upgrade
# -> tool_status() returns real data (26k drawers in our case)
The repo moved to the MemPalace org but several docs still point at the
old milla-jovovich URLs. Also, CONTRIBUTING.md tells people to PR
against main while the actual workflow (per ROADMAP.md) targets develop.
Files touched:
- CONTRIBUTING.md: clone URL, issues URL, PR target branch
- examples/gemini_cli_setup.md: clone URL
- integrations/openclaw/SKILL.md: homepage and license URLs
The `_load_api_key()` function in longmemeval_bench.py and locomo_bench.py
searched for API keys in a fixed path (`~/.config/lu/keys.json`) using
personal key names (`anthropic_milla`, `anthropic_claude_code_main`).
This leaks internal infrastructure details into the public codebase and
trains contributors to store credentials in a non-standard location
rather than using the standard ANTHROPIC_API_KEY env var.
Simplified to: CLI flag > env var > empty string. Updated help text
and HYBRID_MODE.md docs to match.
Co-authored-by: Tadao <tadao@travisfixes.com>
Path("~/foo") does not expand tilde on its own, causing
`mempalace split ~/some/dir` to silently find no files.
Fix by calling .expanduser().resolve() in both places the
path is constructed: cmd_split in cli.py (defensive, at the
CLI boundary) and main() in split_mega_files.py (the root cause).
Co-authored-by: Brooke Whatnall <brookewhatnall@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The module-level `ssl._create_default_https_context = ssl._create_unverified_context`
disables certificate verification for ALL urllib requests in the process,
not just the benchmark's HuggingFace downloads. This silently exposes
the benchmark runner to MITM attacks.
If a specific environment needs to skip verification (e.g. corporate proxy),
users can set `PYTHONHTTPSVERIFY=0` or pass a custom ssl context per-request
rather than globally patching the ssl module.
Co-authored-by: Tadao <tadao@travisfixes.com>
Replace "your memory system" with explicit MemPalace references and
tool names (mempalace_diary_write, mempalace_add_drawer, mempalace_kg_add)
in stop and precompact hook block reasons. This prevents Claude Code from
misinterpreting the hook as a native auto-memory save instruction.
Updated in both Python (hooks_cli.py) and standalone shell scripts.
Also fix CONTRIBUTING.md Getting Started to show the fork-first workflow,
matching the PR Guidelines section.
Note from code review: (1) silent exception swallow on migration failure means caller proceeds with potentially corrupt DB — consider returning a boolean or re-raising in a follow-up. (2) No blob length validation before int.from_bytes — malformed rows could produce wrong seq_id values. Both are edge cases; the fix is still valuable for the common chromadb 0.6→1.5 migration path.
ORT_DISABLE_COREML is not a recognized ONNX Runtime environment
variable. ONNX Runtime does not expose a global env var to disable
individual execution providers -- providers are selected per session
via the providers argument to InferenceSession. Setting it had zero
effect.
The mitigation was added in df33550 (v3.1.0) with the stated goal of
fixing the #74 ARM64 segfault. Two problems: the env var doesn't
work as described above, and #74 is a null-pointer crash in
chromadb_rust_bindings.abi3.so -- not an ONNX issue, so disabling
CoreML would not have fixed it anyway.
#521 has since traced the actual macOS arm64 crashes (both mine and
search) to the 0.x chromadb hnswlib binding. Filtering
CoreMLExecutionProvider at the ONNX layer leaves the hnswlib C++
crash intact, so the real fix is upgrading chromadb to 1.5.4+,
which #581 proposes. This PR only removes the misleading no-op and
leaves a NOTE pointing at #521 / #581.
Closes#397
* refactor: add stage-1 backend abstraction seam
Introduce the first upstreamable storage seam for MemPalace without
bringing in the PostgreSQL spike or any benchmark artifacts.
This change adds a small backend package with:
- BaseCollection as the minimal collection contract
- ChromaBackend/ChromaCollection as the default implementation
It then routes the main runtime collection consumers through that seam:
- palace.py
- searcher.py
- layers.py
- palace_graph.py
- mcp_server.py
- miner.status()
Behavioral constraints kept for stage 1:
- ChromaDB remains the only backend and the default path
- no config/env backend selection yet
- no PostgreSQL code
- no benchmark or research files
- existing tests stay unchanged
Important compatibility details:
- read paths now call the seam with create=False so they still surface
the existing 'no palace found' behavior instead of silently creating
empty collections
- write paths keep create=True semantics through palace.get_collection()
- layers/searcher retain a chromadb module attribute so the existing
mock-based tests can keep patching PersistentClient unchanged
- ChromaBackend only creates palace directories on create=True, which
preserves mocked read-path tests that use fake read-only paths
Verification:
- python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py
- pytest -q # 529 passed, 106 deselected
* refactor: clean up stage-1 seam compatibility shims
Tighten the stage-1 backend abstraction branch after review.
This follow-up does three small things:
- keep the chromadb compatibility hook in searcher.py and layers.py,
but express it through the backends.chroma module so it no longer
reads like an accidental unused import
- fix the palace_graph.py helper alias to avoid the local name collision
flagged by ruff (imported helper vs local _get_collection wrapper)
- preserve the existing mock-based test patch points unchanged while
keeping the new backend seam intact
Why this matters:
- the direct form looked like a
dead import in review, even though it was intentionally preserving the
existing test seam ( and
)
- palace_graph.py had a real lint issue ( redefinition) that was
small but worth fixing before a public PR
Verification:
- /opt/homebrew/bin/ruff check mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py
- pytest -q tests/test_layers.py tests/test_searcher.py
- pytest -q # 529 passed, 106 deselected
* docs: explain backend shim imports in search paths
Add short code comments in searcher.py and layers.py explaining why the
module-level `chromadb` alias remains after the stage-1 backend seam
refactor.
The alias is intentional: it preserves the existing mock patch points used
by the current test suite (`mempalace.searcher.chromadb.PersistentClient`
and `mempalace.layers.chromadb.PersistentClient`) while the runtime logic
now flows through the backend abstraction.
This keeps the public PR easier to review because the apparent "unused
import" now has an explicit reason next to it.
Verification:
- /opt/homebrew/bin/ruff check mempalace/searcher.py mempalace/layers.py
- pytest -q tests/test_layers.py tests/test_searcher.py
* refactor: reuse a default backend instance in palace helper
Tighten the stage-1 backend seam by promoting the default Chroma backend
adapter to a module-level singleton in `mempalace/palace.py`.
This keeps the stage-1 scope unchanged — Chroma is still the only backend
wired in this branch — but avoids constructing a fresh `ChromaBackend()`
object on every `get_collection()` call. The backend is stateless today,
so this is a readability/cleanup change rather than a behavioral one.
Why this helps:
- makes `palace.get_collection()` read like a real default factory instead
of an inline constructor call
- keeps the stage-1 branch a little cleaner before opening the public PR
- does not widen the backend surface or change any config/runtime behavior
Verification:
- python3 -m py_compile mempalace/palace.py
- pytest -q tests/test_miner.py tests/test_layers.py tests/test_searcher.py
- pytest -q # 529 passed, 106 deselected
* fix: harden read-only seam behavior and update seam tests
Preserve the stage-1 backend abstraction while closing the real read-path
regression surfaced in PR review.
What changed:
- make ChromaBackend.get_collection(create=False) fail fast when the palace
directory does not exist instead of letting PersistentClient create it as a
side effect
- update miner.status() to call get_collection(..., create=False) so status
keeps the historical 'No palace found' behavior
- remove the temporary chromadb shim aliases from layers.py and searcher.py
now that the tests patch the seam directly
- add focused tests for the new backends package, including ChromaCollection
delegation and ChromaBackend create=True/create=False behavior
- retarget layer/searcher tests to patch the backend seam instead of patching
chromadb.PersistentClient inside production modules
- add a regression test that status() does not create an empty palace when the
target path is missing
Verification:
- ruff check .
- uv run pytest -q
- uv run pytest -q tests/test_backends.py tests/test_cli.py tests/test_mcp_server.py tests/test_layers.py tests/test_searcher.py tests/test_miner.py
Notes:
- the separate benchmark/slow/stress layer was started as a soak but not used
as the merge gate for this PR branch
* refactor: drop duplicate mcp collection cache declaration
Remove a redundant `_collection_cache = None` assignment in
`mempalace/mcp_server.py` left over after the stage-1 backend seam refactor.
This does not change behavior; it only trims review noise in the MCP server
module after the read-path hardening pass.
Verification:
- ruff check mempalace/mcp_server.py
- uv run pytest -q tests/test_mcp_server.py
---------
Co-authored-by: Sergey Kuznetsov <sergey@iterudit.com>
On Windows, projects containing git-submodule junctions or dev-drive
reparse points cause iterdir() to list the entry successfully but
Path.is_dir() to raise OSError when it calls stat() internally.
Reproducer: any Windows project with a submodule checked out as a
junction (e.g. skills/pr-perfect) crashes mempalace init with:
OSError: [WinError 448] The path cannot be traversed because it
contains an untrusted mount point
Fix: wrap every is_dir() call in detect_rooms_from_folders with
try/except OSError so the scanner skips inaccessible entries and
continues rather than aborting.
Covers both the top-level pass and the one-level-deep nested pass.
Two new tests mock the OSError on specific paths and verify the
function returns correct rooms from the remaining accessible entries.
- Replace 'while offset < count/total' with 'while True' + break on short batch
- Fixes tool_list_rooms iterating over unfiltered col.count() when wing filter active
- Fixes all 4 paginated functions (tool_status, tool_list_wings, tool_list_rooms,
tool_get_taxonomy) missing early-exit when batch smaller than batch_size
- Remove unused 'total' variable in tool_list_wings, tool_list_rooms, tool_get_taxonomy
(replaced col.count() with accessibility check only)
Per bensig review comments on PR #371