- Landing: replace nonexistent `mempalace remember` CLI demo with real
`mempalace mine ./notes`
- Landing: soften unverifiable absolutes ("forever available",
"100% recall by design", "<50 ms", "90%+ compression",
"two-thousand-year-old", "tens of thousands of entries")
- MCP tool count: 19 → 29 across mcp-integration, claude-code, openclaw,
and modules; expand tool overview with Drawers, Tunnels, and System
categories to match mcp_server.py
- Wake-up token range: ~170–900 → ~600–900 in cli/api-reference/python-api
to match cli.py help text and concept docs
- Gemini CLI: move `--scope user` before target name and add `--`
separator so `-m mempalace.mcp_server` isn't parsed as Gemini flags
On Windows with non-UTF-8 locale (e.g. GBK), Path.read_text() defaults
to platform encoding, breaking onboarding tests and any source code that
reads JSON/markdown with non-ASCII content.
5 files, 8 call sites fixed.
Introduces the Indonesian (id) locale, providing translations for CLI commands, status messages, and core terminology.
Includes language-specific regex patterns for stop words and action detection to support text processing and indexing in Indonesian. The test suite is updated with a sample case to verify correct dialect handling and compression.
entity_detector.py was refactored in #911 to load candidate patterns
from i18n locale JSON files, supporting non-Latin scripts (Cyrillic,
accented Latin, etc.). But three other code paths still hardcoded the
ASCII-only regex [A-Z][a-z]{2,}, silently missing non-Latin entity
names in metadata tagging, closet indexing, and registry lookups.
Replace the hardcoded regex with a shared _candidate_entity_words()
helper that reuses the same i18n candidate_patterns as entity_detector.
Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras)
like ा ी ु are Unicode category Mc (Mark, Spacing Combining) — not \w.
This means \b splits mid-word on every matra: names like अनीता (Anita)
truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b
never match because \b fails after the final matra of कहा.
Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script
whose words contain combining marks.
Fix: locales with combining-mark scripts declare a boundary_chars field
in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n
loader replaces every \b in that locale's patterns with a script-aware
lookaround that treats the declared characters as "inside-word", and
pre-wraps candidate/multi_word patterns with the same boundary.
Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru,
it are unchanged.
Changes:
- mempalace/i18n/__init__.py: add _script_boundary, _expand_b,
_wrap_candidate, _collect_entity_section; candidate_patterns are now
returned fully-wrapped (boundary + capture group applied)
- mempalace/entity_detector.py: extract_candidates compiles pre-wrapped
candidate patterns directly instead of re-wrapping with \b
- tests/test_entity_detector.py: 5 new tests for Devanagari boundaries
(name extraction with/without boundary_chars, person-verb firing,
English regression)
BCP 47 language tags are case-insensitive (RFC 5646 §2.1.1) but the
locale files mix conventions (pt-br.json vs zh-CN.json). On
case-sensitive filesystems, '--lang PT-BR' or '--lang zh-cn' silently
missed the file, _load_entity_section returned {}, and entity
detection ran in English with no warning.
The cache key in get_entity_patterns was built from raw input, so
('PT-BR',) and ('pt-br',) produced two distinct entries, both wrong.
Add _canonical_lang(lang) that resolves any casing to the on-disk
filename stem via lowercase comparison, and route load_lang,
_load_entity_section, and the cache key through it.
Closes#927
Introduces a version label to the command-line interface, displaying the current MemPalace version in the help text. Adds a `--version` flag to allow users to easily check the version and exit.
CLI strings, AAAK instruction, regex patterns, and entity section
with person-verb, pronoun, dialogue, and candidate patterns for
Latin+diacritics names (Joao, Ines, Angela).
Follows the i18n entity framework from #911.
Adds 34 prepositions and conjunctions to reduce false positives
in entity detection when these words appear sentence-initial.
Co-Authored-By: almirus <almirus@users.noreply.github.com>
Add ru.json with full Russian translations for CLI strings, palace
terminology, AAAK compression instruction, and regex patterns for
topic/action extraction with Cyrillic character classes.
No code changes needed -- the i18n module auto-discovers language
files via *.json glob in the i18n directory.
Move all entity-detection lexical patterns (person verbs, pronouns,
dialogue markers, project verbs, stopwords, candidate character class)
out of hardcoded module-level constants and into the entity section of
each locale's JSON in mempalace/i18n/. Adds a languages parameter to
every public function so callers union patterns across the desired
locales. The default stays ("en",), so all existing callers and tests
behave unchanged.
Also adds:
- get_entity_patterns(langs) helper in mempalace/i18n/ that merges
patterns across requested languages, dedupes lists, unions stopwords,
and falls back to English for unknown locales
- MempalaceConfig.entity_languages property + setter, with env var
override (MEMPALACE_ENTITY_LANGUAGES, comma-separated)
- mempalace init --lang en,pt-br flag (persists to config.json)
- Per-language candidate_pattern so non-Latin scripts (Cyrillic,
Devanagari, CJK) can register their own character classes instead of
being silently dropped by the ASCII-only [A-Z][a-z]+ default
- _build_patterns LRU cache keyed by (name, languages) so multi-language
callers don't poison each other's cache slots
Why now: the open language PRs (#760 ru, #773 hi, #778 id, #907 it) only
add CLI strings via mempalace/i18n/. PR #156 (pt-br) is the first that
needed entity_detector changes and inlined a _PTBR variant of every
constant. That doesn't scale past 2-3 languages — every text gets
checked against every language's patterns regardless of relevance, and
candidate extraction still drops accented and non-Latin names.
This PR sets the standard so future locale contributors only edit one
JSON file (no Python changes), and entity detection scales linearly
with how many languages a user actually enabled, not how many ship.
* fix: restrict file permissions on sensitive palace data
On Linux with default umask (022), several files and directories
containing personal data were created world-readable. This patch
applies chmod 0o700 to directories and 0o600 to files immediately
after creation, wrapped in try/except for Windows compatibility.
Files hardened:
- hooks_cli.py: hook_state/ directory and hook.log
- entity_registry.py: entity_registry.json (names, relationships)
- knowledge_graph.py: knowledge_graph.sqlite3 parent directory
- exporter.py: export output directory and wing subdirectories
- config.py: people_map.json (name mappings)
- mcp_server.py: WAL file creation uses atomic os.open (TOCTOU fix)
Refs: MemPalace/mempalace#809
* fix: avoid redundant chmod calls on hot paths
- hooks_cli.py: chmod STATE_DIR and hook.log only on first creation,
not on every _log() call (hooks fire on every Stop event)
- exporter.py: track created wing dirs to skip redundant makedirs +
chmod on the same directory across batches
- mcp_server.py: remove redundant _WAL_FILE.chmod after os.open
already set mode=0o600 atomically
Refs: MemPalace/mempalace#809
* fix: add provenance header and speaker IDs to Slack transcript imports
Slack exports are multi-party chats where no speaker is inherently
the "user" or "assistant". The parser previously assigned these roles
purely by position, allowing a crafted export to place attacker text
in the "user" role — making it appear as the memory owner's words
in all future retrieval (data poisoning via stored memory).
Changes:
- Add provenance header marking Slack transcripts as multi-party
with positional (unverified) role assignment
- Prefix each message with the original speaker ID ([U1], [U2], etc.)
so downstream consumers can distinguish authors
- Keep user/assistant role alternation for exchange-pair chunking
compatibility with convo_miner.py
Tests:
- Provenance header presence and content
- Speaker ID preservation in output
- Attacker-first-message attribution verification
Refs: MemPalace/mempalace#809
* fix: move Slack provenance to footer, sanitize speaker IDs, extract constant
- Move provenance notice from header to footer to prevent it becoming
a standalone ChromaDB drawer via paragraph chunking on exports
with fewer than 3 exchange pairs (violates verbatim-always principle)
- Sanitize speaker user_id/username: strip brackets, newlines, and
control characters to prevent chunk-boundary injection via crafted
Slack exports
- Extract header string to _SLACK_PROVENANCE_FOOTER module constant,
consistent with _TOOL_RESULT_* constants pattern; tests import it
instead of duplicating the literal
Refs: MemPalace/mempalace#809
* feat: include created_at timestamp in search results (closes#465)
Surface the existing filed_at metadata as created_at in search result
objects returned by search_memories(). Enables temporal reasoning over
search hits without additional queries.
* Feat: add fallback for missing filed_at metadata
* fix(hooks): stop precompact hook from blocking compaction
The precompact hook unconditionally returned {"decision": "block"},
which in Claude Code means "cancel compaction" with no retry mechanism.
This made /compact permanently broken for all plugin users.
Changed hook_precompact() to mine the transcript synchronously (so data
lands before compaction) and return {"decision": "allow"}. This matches
the standalone bash hook in hooks/ which already uses allow.
Also extracted _get_mine_dir() and _mine_sync() helpers so precompact
can mine from the transcript directory, not just MEMPAL_DIR.
Stop hook behavior is unchanged -- left for #673 which implements the
full silent save path.
Closes#856, closes#858.
* fix: use empty JSON instead of invalid \"allow\" decision value
Claude Code only recognizes \"block\" as a top-level decision value.
\"allow\" is a permissionDecision value for PreToolUse hooks, not a
valid top-level decision. The correct way to not block is to return
empty JSON. Caught by #872.
* fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225)
Fixes#225.
Several transitive dependencies (chromadb, onnxruntime, posthog) print
banners and warnings to stdout — sometimes at the C level — during the
mcp_server import chain. Because the MCP protocol multiplexes JSON-RPC
over stdio, any non-JSON output on stdout corrupted the message stream
and broke Claude Desktop's parser with errors like:
MCP mempalace: Unexpected token '*', "**********"... is not valid JSON
MCP mempalace: Unexpected token 'E', "EP Error D"... is not valid JSON
MCP mempalace: Unexpected token 'F', "Falling ba"... is not valid JSON
Reproduced on Windows 11 with mempalace 3.0.0 / Python 3.10 / Claude
Desktop 1.1062.0.
Fix: at module load, redirect stdout to stderr at both the Python level
(sys.stdout = sys.stderr) and the file-descriptor level (os.dup2(2, 1))
to catch C-level prints, while preserving the real stdout for later
restore. main() calls _restore_stdout() right before entering the
protocol loop so JSON-RPC responses still go to the real stdout.
Adds tests/test_mcp_stdio_protection.py with three regression tests:
- module-level redirect is in place after import
- _restore_stdout() restores the original stdout (idempotent)
- 'python -m mempalace.mcp_server' with empty stdin emits no stdout
* style: reformat with ruff 0.4 (CI version) for #225