mempalace

Author	SHA1	Message	Date
Igor Lins e Silva	51919fef0c	Merge pull request #963 from domiscd/feat/landing-page-update feat(website): update landing page	2026-04-16 22:37:16 -03:00
Dominique Deschatre	c8727b3a2d	chore(website): add Google Analytics	2026-04-16 22:34:37 -03:00
Dominique Deschatre	44c525ddd3	Merge remote-tracking branch 'upstream/develop' into feat/landing-page-update # Conflicts: # website/index.md	2026-04-16 22:31:22 -03:00
Dominique Deschatre	d8ac4c3abb	new landing page pt 2	2026-04-16 22:24:15 -03:00
Dominique Deschatre	9893fa2383	new landing page	2026-04-16 21:46:03 -03:00
Igor Lins e Silva	55a004fe1e	Merge pull request #931 from mvalentsev/fix/i18n-entity-metadata fix: use i18n candidate patterns for entity extraction in miner and palace	2026-04-16 15:54:01 -03:00
Igor Lins e Silva	c5e249bba8	Merge pull request #946 from mvalentsev/fix/utf8-read-text fix: add explicit UTF-8 encoding to read_text() calls (#776)	2026-04-16 15:52:42 -03:00
Igor Lins e Silva	65f99ad7e6	Merge pull request #928 from arnoldwender/fix/i18n-lang-case-insensitive fix(i18n): resolve language codes case-insensitively (#927)	2026-04-16 15:44:36 -03:00
Igor Lins e Silva	29112fab82	Merge pull request #778 from dominosaurs/feat/id-lang feat: add Indonesian language support	2026-04-16 15:44:26 -03:00
Igor Lins e Silva	4215be3926	Merge pull request #773 from tejasashinde/feat/add-i18n-hindi feat: add Hindi language support to i18n module	2026-04-16 15:44:08 -03:00
mvalentsev	09fe2dda3c	fix: add explicit UTF-8 encoding to read_text() calls (#776) On Windows with non-UTF-8 locale (e.g. GBK), Path.read_text() defaults to platform encoding, breaking onboarding tests and any source code that reads JSON/markdown with non-ASCII content. 5 files, 8 call sites fixed.	2026-04-16 16:00:29 +05:00
🍕	939d4c1e74	feat: Update Indonesian translations Refine AAAK instruction and expand entity detection patterns.	2026-04-16 17:43:51 +08:00
🍕	88f5b5fa0e	Add Indonesian language support Introduces the Indonesian (id) locale, providing translations for CLI commands, status messages, and core terminology. Includes language-specific regex patterns for stop words and action detection to support text processing and indexing in Indonesian. The test suite is updated with a sample case to verify correct dialect handling and compression.	2026-04-16 16:15:47 +08:00
mvalentsev	cde0f5b9e7	remove unnecessary comment	2026-04-16 10:38:38 +05:00
mvalentsev	973bd62a9a	fix: use pre-wrapped candidate patterns after #932 refactor	2026-04-16 10:37:18 +05:00
mvalentsev	8bf940f861	fix: use i18n candidate patterns for entity extraction in miner and palace entity_detector.py was refactored in #911 to load candidate patterns from i18n locale JSON files, supporting non-Latin scripts (Cyrillic, accented Latin, etc.). But three other code paths still hardcoded the ASCII-only regex [A-Z][a-z]{2,}, silently missing non-Latin entity names in metadata tagging, closet indexing, and registry lookups. Replace the hardcoded regex with a shared _candidate_entity_words() helper that reuses the same i18n candidate_patterns as entity_detector.	2026-04-16 10:35:40 +05:00
tejasashinde	21da870bd0	fix(i18n/hi): add boundary_chars and update action_pattern for Devanagari-aware matching	2026-04-16 09:21:21 +05:30
Igor Lins e Silva	d4c942417a	Merge pull request #932 from MemPalace/fix/entity-detector-non-latin-boundaries fix(entity_detector): script-aware word boundaries for combining-mark scripts	2026-04-15 22:38:59 -03:00
Igor Lins e Silva	f895bc58e6	fix(entity_detector): script-aware word boundaries for combining-mark scripts Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras) like ा ी ु are Unicode category Mc (Mark, Spacing Combining) — not \w. This means \b splits mid-word on every matra: names like अनीता (Anita) truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b never match because \b fails after the final matra of कहा. Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script whose words contain combining marks. Fix: locales with combining-mark scripts declare a boundary_chars field in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n loader replaces every \b in that locale's patterns with a script-aware lookaround that treats the declared characters as "inside-word", and pre-wraps candidate/multi_word patterns with the same boundary. Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru, it are unchanged. Changes: - mempalace/i18n/__init__.py: add _script_boundary, _expand_b, _wrap_candidate, _collect_entity_section; candidate_patterns are now returned fully-wrapped (boundary + capture group applied) - mempalace/entity_detector.py: extract_candidates compiles pre-wrapped candidate patterns directly instead of re-wrapping with \b - tests/test_entity_detector.py: 5 new tests for Devanagari boundaries (name extraction with/without boundary_chars, person-verb firing, English regression)	2026-04-15 22:18:52 -03:00
Arnold Wender	6caac50138	fix(i18n): use Optional[str] for Python 3.9 compatibility PEP 604 union syntax (str | None) requires Python 3.10+. The project supports 3.9 per CI matrix, so use typing.Optional instead.	2026-04-15 23:37:12 +02:00
Arnold Wender	0174b93d0f	fix(i18n): resolve language codes case-insensitively (#927) BCP 47 language tags are case-insensitive (RFC 5646 §2.1.1) but the locale files mix conventions (pt-br.json vs zh-CN.json). On case-sensitive filesystems, '--lang PT-BR' or '--lang zh-cn' silently missed the file, _load_entity_section returned {}, and entity detection ran in English with no warning. The cache key in get_entity_patterns was built from raw input, so ('PT-BR',) and ('pt-br',) produced two distinct entries, both wrong. Add _canonical_lang(lang) that resolves any casing to the on-disk filename stem via lowercase comparison, and route load_lang, _load_entity_section, and the cache key through it. Closes #927	2026-04-15 23:33:42 +02:00
Igor Lins e Silva	122ce38811	Merge pull request #907 from Archetipo95/feat/italian-i18n-support feat: add Italian language support	2026-04-15 18:05:13 -03:00
Igor Lins e Silva	57b0b14192	Merge pull request #156 from mvalentsev/feat/pt-br-entity-detection feat: add Brazilian Portuguese support to entity_detector (closes #117)	2026-04-15 17:53:30 -03:00
mvalentsev	4221589df2	fix(i18n): address review feedback on pt-br.json - dialogue_patterns[0]: remove stray \" before > (fixes markdown quote matching) - entity stopwords: add 40 prepositions, conjunctions, and common words to reduce false positives - pronoun_patterns: add 2nd-person (você/vocês) and possessives (seu/sua/seus/suas)	2026-04-15 23:32:31 +05:00
mvalentsev	3d13a72ae0	feat(i18n): add Brazilian Portuguese locale with entity detection (closes #117) CLI strings, AAAK instruction, regex patterns, and entity section with person-verb, pronoun, dialogue, and candidate patterns for Latin+diacritics names (Joao, Ines, Angela). Follows the i18n entity framework from #911.	2026-04-15 23:32:31 +05:00
Tejas Shinde	33a98fb9d1	Updated hi.json to support infra for entity, pronoun_patterns, dialogue_patterns, direct_address_pattern, project_verb_patterns, and stopwords	2026-04-15 23:33:24 +05:30
Tejas Shinde	ce3ae0a668	Merge branch 'MemPalace:develop' into feat/add-i18n-hindi	2026-04-15 23:19:57 +05:30
Martin Masevski	69453b2180	feat: add italian entity patterns	2026-04-15 19:18:23 +02:00
Martin Masevski	2e998db0b9	feat: add italian i18n support	2026-04-15 19:15:55 +02:00
Igor Lins e Silva	73a2f82d5b	Merge pull request #760 from mvalentsev/feat/i18n-russian feat: add Russian language support (ru.json)	2026-04-15 13:46:04 -03:00
Igor Lins e Silva	312b3b5f0e	Merge pull request #758 from mvalentsev/fix/i18n-review-issues fix: address i18n review issues from PR #718	2026-04-15 13:45:49 -03:00
mvalentsev	4b998de77a	feat(i18n): expand Russian entity stopwords with prepositions and conjunctions Adds 34 prepositions and conjunctions to reduce false positives in entity detection when these words appear sentence-initial. Co-Authored-By: almirus <almirus@users.noreply.github.com>	2026-04-15 21:14:51 +05:00
mvalentsev	3e49522a42	fix(i18n): apply review feedback on ru.json (#760) - mine_skip: "повторной раскопки" -> "повторной обработки" - quote_pattern: add Russian guillemet quotes «» Co-Authored-By: almirus <almirus@users.noreply.github.com>	2026-04-15 20:17:16 +05:00
mvalentsev	d6bd7de5f6	feat(i18n): add entity detection section to Russian locale Cyrillic candidate/multi-word patterns, person-verb patterns (сказал, спросил, ответил, etc.), pronoun patterns, dialogue markers, direct address, and Russian stopwords. Follows the i18n entity framework from #911.	2026-04-15 18:16:25 +05:00
mvalentsev	b87ada3c96	feat: add Russian language support to i18n module Add ru.json with full Russian translations for CLI strings, palace terminology, AAAK compression instruction, and regex patterns for topic/action extraction with Cyrillic character classes. No code changes needed -- the i18n module auto-discovers language files via *.json glob in the i18n directory.	2026-04-15 18:15:15 +05:00
Igor Lins e Silva	3bac3654c4	Merge pull request #911 from MemPalace/refactor/entity-detector-i18n refactor(entity_detector): make multi-language extensible via i18n JSON	2026-04-15 09:40:36 -03:00
Igor Lins e Silva	c722c91e2a	test: document orphan-locale recovery for _temp_locale helper	2026-04-15 08:54:23 -03:00
Igor Lins e Silva	b214aced90	refactor(entity_detector): make multi-language extensible via i18n JSON Move all entity-detection lexical patterns (person verbs, pronouns, dialogue markers, project verbs, stopwords, candidate character class) out of hardcoded module-level constants and into the entity section of each locale's JSON in mempalace/i18n/. Adds a languages parameter to every public function so callers union patterns across the desired locales. The default stays ("en",), so all existing callers and tests behave unchanged. Also adds: - get_entity_patterns(langs) helper in mempalace/i18n/ that merges patterns across requested languages, dedupes lists, unions stopwords, and falls back to English for unknown locales - MempalaceConfig.entity_languages property + setter, with env var override (MEMPALACE_ENTITY_LANGUAGES, comma-separated) - mempalace init --lang en,pt-br flag (persists to config.json) - Per-language candidate_pattern so non-Latin scripts (Cyrillic, Devanagari, CJK) can register their own character classes instead of being silently dropped by the ASCII-only [A-Z][a-z]+ default - _build_patterns LRU cache keyed by (name, languages) so multi-language callers don't poison each other's cache slots Why now: the open language PRs (#760 ru, #773 hi, #778 id, #907 it) only add CLI strings via mempalace/i18n/. PR #156 (pt-br) is the first that needed entity_detector changes and inlined a _PTBR variant of every constant. That doesn't scale past 2-3 languages — every text gets checked against every language's patterns regardless of relevance, and candidate extraction still drops accented and non-Latin names. This PR sets the standard so future locale contributors only edit one JSON file (no Python changes), and entity detection scales linearly with how many languages a user actually enabled, not how many ship.	2026-04-15 08:52:42 -03:00
Igor Lins e Silva	56b6a6360f	Merge pull request #908 from fatkobra/test/palace-graph-tunnels test: add palace_graph tunnel helper coverage	2026-04-15 08:23:18 -03:00
fatkobra	966937d620	test: add palace_graph tunnel helper coverage Adds focused tests for explicit tunnel helpers in `mempalace/palace_graph.py`. Covered: - `_load_tunnels` - `_save_tunnels` - `create_tunnel` - `list_tunnels` - `delete_tunnel` - `follow_tunnels`	2026-04-15 11:38:18 +02:00
Marcio E. Heiderscheidt	b524b31839	fix: restrict file permissions on sensitive palace data (#814) * fix: restrict file permissions on sensitive palace data On Linux with default umask (022), several files and directories containing personal data were created world-readable. This patch applies chmod 0o700 to directories and 0o600 to files immediately after creation, wrapped in try/except for Windows compatibility. Files hardened: - hooks_cli.py: hook_state/ directory and hook.log - entity_registry.py: entity_registry.json (names, relationships) - knowledge_graph.py: knowledge_graph.sqlite3 parent directory - exporter.py: export output directory and wing subdirectories - config.py: people_map.json (name mappings) - mcp_server.py: WAL file creation uses atomic os.open (TOCTOU fix) Refs: MemPalace/mempalace#809 * fix: avoid redundant chmod calls on hot paths - hooks_cli.py: chmod STATE_DIR and hook.log only on first creation, not on every _log() call (hooks fire on every Stop event) - exporter.py: track created wing dirs to skip redundant makedirs + chmod on the same directory across batches - mcp_server.py: remove redundant _WAL_FILE.chmod after os.open already set mode=0o600 atomically Refs: MemPalace/mempalace#809	2026-04-15 00:27:03 -07:00
Marcio E. Heiderscheidt	e61dc2adf8	fix: add provenance header and speaker IDs to Slack transcript imports (#815 ) * fix: add provenance header and speaker IDs to Slack transcript imports Slack exports are multi-party chats where no speaker is inherently the "user" or "assistant". The parser previously assigned these roles purely by position, allowing a crafted export to place attacker text in the "user" role — making it appear as the memory owner's words in all future retrieval (data poisoning via stored memory). Changes: - Add provenance header marking Slack transcripts as multi-party with positional (unverified) role assignment - Prefix each message with the original speaker ID ([U1], [U2], etc.) so downstream consumers can distinguish authors - Keep user/assistant role alternation for exchange-pair chunking compatibility with convo_miner.py Tests: - Provenance header presence and content - Speaker ID preservation in output - Attacker-first-message attribution verification Refs: MemPalace/mempalace#809 * fix: move Slack provenance to footer, sanitize speaker IDs, extract constant - Move provenance notice from header to footer to prevent it becoming a standalone ChromaDB drawer via paragraph chunking on exports with fewer than 3 exchange pairs (violates verbatim-always principle) - Sanitize speaker user_id/username: strip brackets, newlines, and control characters to prevent chunk-boundary injection via crafted Slack exports - Extract header string to _SLACK_PROVENANCE_FOOTER module constant, consistent with _TOOL_RESULT_* constants pattern; tests import it instead of duplicating the literal Refs: MemPalace/mempalace#809	2026-04-15 00:27:01 -07:00
sha2fiddy	a15094ce60	feat: include created_at timestamp in search results (#846) * feat: include created_at timestamp in search results (closes #465) Surface the existing filed_at metadata as created_at in search result objects returned by search_memories(). Enables temporal reasoning over search hits without additional queries. * Feat: add fallback for missing filed_at metadata	2026-04-15 00:26:57 -07:00
Mikhail Valentsev	ecd44f7cb7	fix(hooks): stop precompact hook from blocking compaction (#856, #858) (#863) * fix(hooks): stop precompact hook from blocking compaction The precompact hook unconditionally returned {"decision": "block"}, which in Claude Code means "cancel compaction" with no retry mechanism. This made /compact permanently broken for all plugin users. Changed hook_precompact() to mine the transcript synchronously (so data lands before compaction) and return {"decision": "allow"}. This matches the standalone bash hook in hooks/ which already uses allow. Also extracted _get_mine_dir() and _mine_sync() helpers so precompact can mine from the transcript directory, not just MEMPAL_DIR. Stop hook behavior is unchanged -- left for #673 which implements the full silent save path. Closes #856, closes #858. * fix: use empty JSON instead of invalid "allow" decision value Claude Code only recognizes "block" as a top-level decision value. "allow" is a permissionDecision value for PreToolUse hooks, not a valid top-level decision. The correct way to not block is to return empty JSON. Caught by #872.	2026-04-15 00:26:54 -07:00
Arnold Wender	b226251ddf	fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225 ) (#864 ) * fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225) Fixes #225. Several transitive dependencies (chromadb, onnxruntime, posthog) print banners and warnings to stdout — sometimes at the C level — during the mcp_server import chain. Because the MCP protocol multiplexes JSON-RPC over stdio, any non-JSON output on stdout corrupted the message stream and broke Claude Desktop's parser with errors like: MCP mempalace: Unexpected token '', "********"... is not valid JSON MCP mempalace: Unexpected token 'E', "EP Error D"... is not valid JSON MCP mempalace: Unexpected token 'F', "Falling ba"... is not valid JSON Reproduced on Windows 11 with mempalace 3.0.0 / Python 3.10 / Claude Desktop 1.1062.0. Fix: at module load, redirect stdout to stderr at both the Python level (sys.stdout = sys.stderr) and the file-descriptor level (os.dup2(2, 1)) to catch C-level prints, while preserving the real stdout for later restore. main() calls _restore_stdout() right before entering the protocol loop so JSON-RPC responses still go to the real stdout. Adds tests/test_mcp_stdio_protection.py with three regression tests: - module-level redirect is in place after import - _restore_stdout() restores the original stdout (idempotent) - 'python -m mempalace.mcp_server' with empty stdin emits no stdout style: reformat with ruff 0.4 (CI version) for #225	2026-04-15 00:26:51 -07:00
Arnold Wender	0aee6f3ed9	fix(init): auto-add per-project files to .gitignore in git repos (#185 ) (#866 ) Partially addresses #185. `mempalace init <dir>` writes `mempalace.yaml` and `entities.json` into the project root. When <dir> is a git repository, those files have no default protection and risk being committed by accident — the loudest concern in the original report. This PR adds `_ensure_mempalace_files_gitignored()` which runs at the end of cmd_init: if <dir>/.git exists, append the two filenames to .gitignore (creating it if necessary) under a clearly-marked block. The helper is conservative: - only runs when <dir>/.git is present (no-op for non-git projects) - skips entries already present (no duplicates) - preserves existing .gitignore content - handles files without trailing newlines This does NOT relocate the files to ~/.mempalace/wings/<wing>/ as the issue's 'Expected' section proposes — that's a behavioral change with miner/config implications and warrants a separate design discussion. The gitignore safeguard removes the immediate risk without breaking any existing flow. Tests: 5 cases in tests/test_init_gitignore_protection.py covering no-op, fresh creation, partial append, idempotency, and missing-newline edge case.	2026-04-15 00:26:41 -07:00
Arnold Wender	6a73eb2e20	fix(searcher): guard against empty ChromaDB query results (#195) (#865) Fixes #195. When ChromaDB returns no documents (empty palace, or wing/room filter that excludes everything), it returns the shape: {"documents": [], "metadatas": [], "distances": []} Indexing `results["documents"][0]` blindly raises IndexError instead of the expected 'no results' response. Affected: searcher.search(), searcher.search_memories() (drawer + closet branches plus the total_before_filter aggregate), and Layer3.search() / Layer3.search_raw(). Adds a tiny private helper `searcher._first_or_empty(results, key)` that safely extracts the inner list, returning [] for any of: missing key, empty outer list, [None], or [[]]. layers.py imports the same helper to avoid duplicating the guard. Tests: tests/test_empty_chromadb_results.py covers all observed shapes plus a documentation-style test that pins the original IndexError so future readers understand why the helper exists.	2026-04-15 00:26:38 -07:00
Mikhail Valentsev	54a386d925	fix: return empty status instead of error on cold-start palace (#830) (#831) tool_status() called _get_collection() with the default create=False, which throws when the ChromaDB collection does not exist yet (valid palace, zero drawers). The exception was swallowed and status returned "No palace found" even though init had completed successfully. Switching to create=True bootstraps an empty collection on first status call, matching what the write path already does. Fix suggested by @hkevinchu in the issue.	2026-04-15 00:26:35 -07:00
Marcio E. Heiderscheidt	f20f45a2da	fix: make entity_registry.research() local-only by default (#811) * fix: make entity_registry.research() local-only by default research() previously called _wikipedia_lookup() unconditionally, sending entity names to en.wikipedia.org on every uncached lookup. This violates the project's local-first and privacy-by-architecture principles documented in CLAUDE.md. Changes: - research() now returns "unknown" for uncached words by default - New allow_network=True parameter required for Wikipedia lookups - Wikipedia 404 now returns "unknown" instead of asserting "person" with 0.70 confidence, preventing entity registry poisoning - Added privacy warning docstring to _wikipedia_lookup() - Added tests for local-only default, opt-in network, 404 handling, and cache-not-persisted-on-local-only behaviour Refs: MemPalace/mempalace#809 * fix: improve research() cache read path and deduplicate test mocks - Use .get() instead of .setdefault() for cache reads in research() so the local-only path never mutates _data unnecessarily - Move .setdefault() to the network-write path only - Use result.setdefault() for word/confirmed keys to ensure consistent return shape across all _wikipedia_lookup error paths - Extract duplicated mock_result dict into _MOCK_SAOIRSE_PERSON constant shared by 3 test functions	2026-04-15 00:26:24 -07:00
Arnold Wender	f36d04e4a4	docs(cli): clarify that 'mempalace init' requires <dir> (#210) (#862) Fixes #210. The CLI requires a positional <dir> argument. Previous docs emphasized that init 'sets up ~/.mempalace/' which misled users into expecting no arguments. Now the docs show <dir> is required, offer '.' as the usage for the current directory, and reword the description so the project-directory scan is listed first.	2026-04-15 00:26:20 -07:00
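
Note on f895bc58e6 (script-aware word boundaries): the `\b` failure described in that commit can be reproduced with the standard `re` module alone. The sketch below is not the project's `_script_boundary` or `_wrap_candidate` code; `script_boundary_wrap` is a hypothetical helper, and the only value taken from the log is the `boundary_chars` example for Hindi. It assumes a negative lookbehind/lookahead pair is an acceptable stand-in for the "script-aware lookaround" the commit mentions.

```python
import re

# Character set treated as "inside-word"; mirrors the Hindi boundary_chars
# example quoted in the commit message (\w plus the Devanagari block).
BOUNDARY_CHARS = r"\w\u0900-\u097F"


def script_boundary_wrap(body: str, boundary_chars: str = BOUNDARY_CHARS) -> str:
    """Hypothetical helper: wrap `body` in lookarounds acting as a script-aware \\b."""
    return rf"(?<![{boundary_chars}]){body}(?![{boundary_chars}])"


text = "राज ने कहा"

# Plain \b: the final matra of कहा is category Mc, not \w, so the closing \b
# never finds a word/non-word transition and the search fails.
plain = re.search(r"\bकहा\b", text)

# Lookaround version: the matra counts as inside-word, so the match succeeds.
aware = re.search(script_boundary_wrap("कहा"), text)

# Candidate extraction over the Devanagari block keeps each word whole.
names = re.findall(script_boundary_wrap(r"([\u0900-\u097F]+)"), "अनीता ने कहा")
```

Here `plain` is `None`, `aware` matches `कहा`, and `names` holds the three whole words, which is the behavior the commit's Devanagari tests are described as checking.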


429 Commits
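
Note on 6a73eb2e20 (empty ChromaDB query results): the guard described in that commit is small enough to sketch. This is an illustration written from the commit message only, not the contents of `searcher._first_or_empty`; the name `first_or_empty` and its type hints are assumptions. It returns `[]` for the four shapes the message lists: a missing key, an empty outer list, `[None]`, and `[[]]`.

```python
from typing import Any, Dict, List


def first_or_empty(results: Dict[str, Any], key: str) -> List[Any]:
    """Return the first inner list stored under `key`, or [] when there is none."""
    outer = results.get(key) or []  # missing key or None value -> []
    if not outer:
        return []  # empty outer list
    inner = outer[0]
    # [None] and [[]] both collapse to an empty list.
    return list(inner) if inner else []


# Shape reported in the commit for an empty palace or an over-narrow filter.
empty = {"documents": [], "metadatas": [], "distances": []}
docs = first_or_empty(empty, "documents")
```

Callers can then iterate over `docs` unconditionally instead of indexing `results["documents"][0]`, which is the expression the commit reports as raising `IndexError`.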