mempalace

Author	SHA1	Message	Date
Igor Lins e Silva	05ad2dc194	release: bump plugin manifests to 3.3.1 version-guard workflow checks five sources must agree: mempalace/version.py, pyproject.toml, .claude-plugin/marketplace.json, .claude-plugin/plugin.json, .codex-plugin/plugin.json. Initial release commit missed the three plugin manifests.	2026-04-16 16:25:00 -03:00
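The version-guard rule in the release commit above comes down to one condition: all five files must report the same version. Below is a minimal Python sketch of that comparison. It assumes each file's version string has already been parsed out, since this log does not show how the real workflow reads each file. `group_by_version` and `versions_agree` are hypothetical names, and the version values are illustrative.

```python
from collections import defaultdict
from typing import Dict, List

# The five files the version-guard workflow compares, per the commit message.
VERSION_SOURCES = (
    "mempalace/version.py",
    "pyproject.toml",
    ".claude-plugin/marketplace.json",
    ".claude-plugin/plugin.json",
    ".codex-plugin/plugin.json",
)


def group_by_version(versions: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each version string seen to the sources that declare it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for source in VERSION_SOURCES:
        groups[versions.get(source, "<missing>")].append(source)
    return dict(groups)


def versions_agree(versions: Dict[str, str]) -> bool:
    """True only when every source reports the same version."""
    return len(group_by_version(versions)) == 1


# Illustrative state before the fix: the three plugin manifests lag behind.
before = {
    "mempalace/version.py": "3.3.1",
    "pyproject.toml": "3.3.1",
    ".claude-plugin/marketplace.json": "3.3.0",
    ".claude-plugin/plugin.json": "3.3.0",
    ".codex-plugin/plugin.json": "3.3.0",
}
after = {source: "3.3.1" for source in VERSION_SOURCES}
```

The sketch groups sources by version instead of picking a "canonical" file, so it makes no assumption about which source is authoritative. The report simply names the files that disagree.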
Igor Lins e Silva	fd89303fe1	docs(changelog): backfill post-v3.3.0 PRs missed by initial boundary Advisor caught: initial boundary (962776c..develop) skipped PRs that landed on develop after v3.3.0 tag but before the sync-back merge. Adds entries for #871 MEMPAL_VERBOSE, #811 research() local-only default, #866 init .gitignore, #864 MCP stdout redirect, #863 precompact hook, #865 searcher empty results, #831 cold-start palace, #862 init help, #815 Slack provenance, #840 save hook auto-mine. Also drops the awkward caveat on #846 created_at — it's post-v3.3.0.	2026-04-16 16:12:37 -03:00
Igor Lins e Silva	2087869752	release: v3.3.1 Bumps version across pyproject.toml, mempalace/version.py, README badge, and uv.lock. Finalizes the 3.3.0 CHANGELOG section (was still labeled 'Unreleased') and adds a 3.3.1 section covering the multi-language entity-detection infra and the five new locales landed since 2026-04-13. Highlights: - Multi-language entity detection infra (#911) + script-aware word boundaries for combining-mark scripts (#932) + BCP 47 case-insensitive locale resolution (#928) + i18n patterns wired into miner/palace/ entity_registry (#931) - Five new fully-supported locales: pt-br (#156), ru (#760), it (#907), hi (#773), id (#778) - UTF-8 encoding fix on read_text() calls for non-UTF-8 Windows locales (#946) - KnowledgeGraph lock correctness (#884, #887) - Various smaller fixes and improvements	2026-04-16 16:09:02 -03:00
Igor Lins e Silva	55a004fe1e	Merge pull request #931 from mvalentsev/fix/i18n-entity-metadata fix: use i18n candidate patterns for entity extraction in miner and palace	2026-04-16 15:54:01 -03:00
Igor Lins e Silva	c5e249bba8	Merge pull request #946 from mvalentsev/fix/utf8-read-text fix: add explicit UTF-8 encoding to read_text() calls (#776)	2026-04-16 15:52:42 -03:00
Igor Lins e Silva	65f99ad7e6	Merge pull request #928 from arnoldwender/fix/i18n-lang-case-insensitive fix(i18n): resolve language codes case-insensitively (#927)	2026-04-16 15:44:36 -03:00
Igor Lins e Silva	29112fab82	Merge pull request #778 from dominosaurs/feat/id-lang feat: add Indonesian language support	2026-04-16 15:44:26 -03:00
Igor Lins e Silva	4215be3926	Merge pull request #773 from tejasashinde/feat/add-i18n-hindi feat: add Hindi language support to i18n module	2026-04-16 15:44:08 -03:00
jp	8adf35a13c	fix: add threading lock to graph cache, expand docstring Address review feedback from @bensig: 1. Wrap cache reads/writes in threading.Lock for thread safety 2. Promote the col-arg caveat from inline comment to docstring Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 09:00:36 -07:00
jp	1657a79649	fix: clarify cache docs, skip caching empty graphs Addresses Copilot review feedback on #661. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 09:00:27 -07:00
jp	84e2aa16e4	perf: graph cache with write-invalidation in build_graph() build_graph() scans every drawer's metadata in 1000-item batches on every call — O(n) per graph build with no caching. At 50K+ drawers this costs several seconds per MCP tool call (traverse, find_tunnels, graph_stats all call build_graph on every invocation). Add a module-level cache (nodes + edges + timestamp) with a 60-second TTL. Cache is invalidated via invalidate_graph_cache(), exported for write operations to call. Tests updated with setup_method cache resets and two new tests verifying cache hit and invalidation behaviour.	2026-04-16 09:00:27 -07:00
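The three graph-cache commits above describe four pieces: a 60-second TTL, explicit invalidation, no caching of empty graphs, and a `threading.Lock` around cache access. The sketch below illustrates them together; it is not the code in `build_graph()`. The real function scans the collection itself, while here a hypothetical `scan` callable stands in so the sketch runs standalone. `build_graph_cached` is also an invented name.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple

# (nodes, edges); the concrete shapes here are illustrative, not mempalace's.
Graph = Tuple[Dict[str, dict], List[Tuple[str, str]]]

CACHE_TTL_SECONDS = 60.0  # TTL stated in commit 84e2aa16e4

_cache_lock = threading.Lock()
_cached_graph: Optional[Graph] = None
_cached_at = 0.0


def invalidate_graph_cache() -> None:
    """Drop the cached graph; write operations call this after changing drawers."""
    global _cached_graph, _cached_at
    with _cache_lock:
        _cached_graph = None
        _cached_at = 0.0


def build_graph_cached(scan: Callable[[], Graph]) -> Graph:
    """Return (nodes, edges), running `scan` only on a miss or after the TTL."""
    global _cached_graph, _cached_at
    with _cache_lock:  # held across the scan: concurrent misses wait instead of rescanning
        now = time.monotonic()
        if _cached_graph is not None and now - _cached_at < CACHE_TTL_SECONDS:
            return _cached_graph
        nodes, edges = scan()  # stands in for the O(n) batched metadata scan
        if nodes:  # empty graphs are not cached (follow-up commit 1657a79649)
            _cached_graph = (nodes, edges)
            _cached_at = now
        return nodes, edges


# Usage: count how often the expensive scan actually runs.
scan_calls: List[int] = []


def demo_scan() -> Graph:
    scan_calls.append(1)
    return {"wing-a": {}}, [("wing-a", "wing-b")]


build_graph_cached(demo_scan)
build_graph_cached(demo_scan)  # cache hit: no second scan
scans_before_write = len(scan_calls)
invalidate_graph_cache()  # what a write path would do
build_graph_cached(demo_scan)  # rescans after invalidation
scans_after_write = len(scan_calls)

empty_calls: List[int] = []


def empty_scan() -> Graph:
    empty_calls.append(1)
    return {}, []


invalidate_graph_cache()
build_graph_cached(empty_scan)
build_graph_cached(empty_scan)  # the empty result was not cached, so this scans again
```

Holding the lock across the scan is a simplifying choice. Only one caller pays for a rescan, but readers block while it runs. The log does not say whether the real implementation makes the same trade-off.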
jp	15ea385554	fix: replace all non-ASCII progress markers for Windows encoding Also fix miner.py checkmark and box-drawing/arrow chars (─, →) in both miner.py and split_mega_files.py that would crash on cp1251/cp1252. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 08:59:58 -07:00
jp	542b53bb0f	fix: replace Unicode checkmark with ASCII + for Windows encoding (#535) Windows terminals using cp1251/cp1252 crash on the Unicode ✓ (U+2713) in progress output. Replace with ASCII + in convo_miner.py and split_mega_files.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 08:59:58 -07:00
mvalentsev	09fe2dda3c	fix: add explicit UTF-8 encoding to read_text() calls (#776) On Windows with non-UTF-8 locale (e.g. GBK), Path.read_text() defaults to platform encoding, breaking onboarding tests and any source code that reads JSON/markdown with non-ASCII content. 5 files, 8 call sites fixed.	2026-04-16 16:00:29 +05:00
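Below is a minimal sketch of the #776 fix pattern: pass `encoding="utf-8"` to `Path.read_text()` rather than relying on the locale-dependent default. The helper name `read_locale_text` and the sample file are hypothetical.

```python
import tempfile
from pathlib import Path


def read_locale_text(path: Path) -> str:
    # Without an explicit encoding, read_text() uses the locale's preferred
    # encoding (e.g. GBK or cp1252 on Windows). Non-ASCII UTF-8 content can then
    # raise UnicodeDecodeError or silently decode as mojibake.
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_text('{"name": "朱宜振", "verb": "сказал"}', encoding="utf-8")
    loaded = read_locale_text(sample)
```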
🍕	939d4c1e74	feat: Update Indonesian translations Refine AAAK instruction and expand entity detection patterns.	2026-04-16 17:43:51 +08:00
Lman Chu	683e940f70	feat(i18n): add Traditional + Simplified Chinese entity detection zh-TW and zh-CN previously had no `entity` section. Calling `detect_entities(..., languages=("zh-TW",))` silently fell back to English patterns (i18n/__init__.py:231-233), so no Chinese names were ever extracted — Chinese-speaking users got zero people or projects detected from their own notes. This adds entity sections for both locales: - `candidate_pattern`: common-surname-prefixed CJK n-grams (~100 surnames covering >95% of Taiwanese / PRC names), length capped at {1,2} trailing chars so greedy matches don't swallow the trailing verb character (e.g. 朱宜振說). - `boundary_chars`: `\u4E00-\u9FFF` so the i18n loader's script-aware wrap (introduced in #932) fires `\b` at CJK↔non-CJK transitions. This is the same mechanism used for Devanagari, applied to the CJK range. - `person_verb_patterns`: Chinese verbs attach directly to the name with no whitespace, so patterns are written as `{name}說`, `{name}問`, `{name}決定` — no `\b` or `\s+` separators. - `dialogue_patterns`: full-width colon `：`, Chinese quotes 「」『』, plus the standard Latin forms. - `pronoun_patterns`: 他 / 她 / 它 / 他們 / 她們 / 您 / 咱. - `stopwords`: ~140 common particles, pronouns, time expressions, question words, conjunctions, UI nouns, and politeness forms. Known limitation (explicitly covered by a test): CJK scripts have no word delimiters, so a name flanked by CJK on both sides with no punctuation or whitespace break is not extracted. This is a fundamental limit of regex-based CJK entity detection — resolving it would require a dictionary tokeniser. Realistic Chinese technical writing contains enough non-CJK neighbours (bullet lines, inline English, full-width punctuation, newlines) that 3+ occurrences normally produce matches. Verified against a realistic zh-TW PKM note: 朱宜振 extracted 11x from 8 sentences with 0.99 person-classification confidence. 
Follow-ups (separate PRs): same pattern for `ja` and `ko`, both of which currently share the silent fallback-to-English bug. Tests: 7 new tests in `tests/test_entity_detector.py`: - `test_zh_tw_candidate_extraction_at_boundaries` - `test_zh_tw_person_classification` - `test_zh_tw_stopwords_filter_common_particles` - `test_zh_tw_falls_back_to_english_for_non_cjk_names` - `test_zh_cn_candidate_extraction` - `test_zh_cn_and_zh_tw_union_covers_both_variants` - `test_zh_tw_known_limitation_inline_name_no_boundary` Full suite: 957 passed, 0 failed.	2026-04-16 17:43:09 +08:00
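A toy pattern can illustrate the surname-prefixed candidate pattern and the `{1,2}` cap described above. It makes two assumptions. First, it uses only six single-character surnames, where the commit cites about 100. Second, it writes the boundary directly as CJK lookarounds instead of having the i18n loader's wrap produce it.

```python
import re

CJK = r"\u4E00-\u9FFF"  # boundary_chars value quoted in the commit
SURNAMES = "朱王李陳張林"  # tiny illustrative subset of single-character surnames

# Surname + 1-2 CJK chars, matched only where it starts and ends at a
# CJK/non-CJK transition (the lookaround form of a script-aware \b).
CAPPED = re.compile(rf"(?<![{CJK}])([{SURNAMES}][{CJK}]{{1,2}})(?![{CJK}])")
# The same pattern with a 3-character tail, to show what the cap guards against.
UNCAPPED = re.compile(rf"(?<![{CJK}])([{SURNAMES}][{CJK}]{{1,3}})(?![{CJK}])")

bullet = CAPPED.findall("- 朱宜振：今天開會")  # full-width colon ends the name
list_line = CAPPED.findall("王小明，李四")  # full-width comma separates names
verb_attached = CAPPED.findall("朱宜振說")
verb_swallowed = UNCAPPED.findall("朱宜振說")
inline = CAPPED.findall("我跟朱宜振開會")  # CJK on both sides: the known limitation
```

The `{1,3}` variant accepts `朱宜振說` as a single candidate, which is exactly the greedy over-match the cap prevents. With `{1,2}`, that string yields no candidate on its own. `我跟朱宜振開會` reproduces the documented limitation for names flanked by CJK.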
fatkobra	1dc55a791d	test: make Claude plugin wrapper tests portable on Windows	2026-04-16 11:41:53 +02:00
fatkobra	be9214a190	Update mempal-precompact-hook.sh	2026-04-16 10:42:20 +02:00
fatkobra	5fe0c1c2ac	Update mempal-stop-hook.sh	2026-04-16 10:33:34 +02:00
fatkobra	e083cd6c84	Create test_claude_plugin_hook_wrappers.py	2026-04-16 10:32:17 +02:00
🍕	88f5b5fa0e	Add Indonesian language support Introduces the Indonesian (id) locale, providing translations for CLI commands, status messages, and core terminology. Includes language-specific regex patterns for stop words and action detection to support text processing and indexing in Indonesian. The test suite is updated with a sample case to verify correct dialect handling and compression.	2026-04-16 16:15:47 +08:00
mvalentsev	cde0f5b9e7	remove unnecessary comment	2026-04-16 10:38:38 +05:00
mvalentsev	973bd62a9a	fix: use pre-wrapped candidate patterns after #932 refactor	2026-04-16 10:37:18 +05:00
mvalentsev	8bf940f861	fix: use i18n candidate patterns for entity extraction in miner and palace entity_detector.py was refactored in #911 to load candidate patterns from i18n locale JSON files, supporting non-Latin scripts (Cyrillic, accented Latin, etc.). But three other code paths still hardcoded the ASCII-only regex [A-Z][a-z]{2,}, silently missing non-Latin entity names in metadata tagging, closet indexing, and registry lookups. Replace the hardcoded regex with a shared _candidate_entity_words() helper that reuses the same i18n candidate_patterns as entity_detector.	2026-04-16 10:35:40 +05:00
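Below is a sketch of what a shared helper in the spirit of `_candidate_entity_words()` might look like. The log does not show its real signature, so `candidate_entity_words` and both patterns are hypothetical stand-ins. The English pattern is the old hardcoded `[A-Z][a-z]{2,}` class quoted in the commit message.

```python
import re
from typing import Iterable, List

# Illustrative per-locale candidate patterns, each with one capture group;
# not copied from the locale JSON files.
EN_CANDIDATE = re.compile(r"\b([A-Z][a-z]{2,})\b")  # the old ASCII-only class
RU_CANDIDATE = re.compile(r"\b([А-ЯЁ][а-яё]{2,})\b")


def candidate_entity_words(text: str, patterns: Iterable[re.Pattern]) -> List[str]:
    """Return words matched by any pattern, deduplicated in first-seen order."""
    seen: dict = {}
    for pattern in patterns:
        for word in pattern.findall(text):
            seen.setdefault(word, None)
    return list(seen)


text = "Anna met Анна and Борис in Moscow"
ascii_only = candidate_entity_words(text, [EN_CANDIDATE])
multi_locale = candidate_entity_words(text, [EN_CANDIDATE, RU_CANDIDATE])
```

The ASCII-only call drops both Cyrillic names, which reproduces the silent miss the commit describes.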
JunghwanNA	fb1cf53919	fix: harden repair backup scope and migrate swap rollback - repair.py: define backup_path before the conditional block so it is always in scope when the except handler references it - migrate.py: restore old palace from .old if both os.rename and shutil.move fail during the swap step	2026-04-16 14:04:26 +09:00
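The two data-loss fixes (fb1cf53919 and 5dfe853154) describe a rename-aside swap with a `.old` rollback, sketched below. `swap_in_rebuilt_palace` is a hypothetical name, and the sketch does not reproduce `migrate.py`. It only shows an ordering that keeps at least one complete copy on disk at every step.

```python
import os
import shutil
import tempfile
from pathlib import Path


def swap_in_rebuilt_palace(palace: Path, rebuilt: Path) -> None:
    """Replace `palace` with `rebuilt`, keeping the old copy until the swap succeeds."""
    aside = palace.with_name(palace.name + ".old")  # assumes no leftover .old exists
    os.rename(palace, aside)  # step 1: move the live palace aside; both copies survive
    try:
        try:
            os.rename(rebuilt, palace)  # fast path: plain rename on the same filesystem
        except OSError:
            shutil.move(str(rebuilt), str(palace))  # fallback, e.g. across filesystems
    except Exception:
        if palace.exists():  # clear any partial copy left by a failed move
            shutil.rmtree(palace, ignore_errors=True)
        os.rename(aside, palace)  # rollback: put the original palace back
        raise
    shutil.rmtree(aside)  # delete the old copy only once the new one is in place


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    palace, rebuilt = root / "palace", root / "palace.new"
    for folder, marker in ((palace, "old"), (rebuilt, "new")):
        folder.mkdir()
        (folder / "marker.txt").write_text(marker, encoding="utf-8")

    swap_in_rebuilt_palace(palace, rebuilt)
    after_swap = (palace / "marker.txt").read_text(encoding="utf-8")

    try:  # a failing swap (missing rebuilt dir) must leave the current palace intact
        swap_in_rebuilt_palace(palace, root / "missing")
    except OSError:
        pass
    after_failed_swap = (palace / "marker.txt").read_text(encoding="utf-8")
    leftovers = sorted(p.name for p in root.iterdir())
```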
tejasashinde	21da870bd0	fix(i18n/hi): add boundary_chars and update action_pattern for Devanagari-aware matching	2026-04-16 09:21:21 +05:30
JunghwanNA	5bf826046c	fix: sanitize topic parameter in tool_diary_write agent_name and entry are validated via sanitize_name/sanitize_content, but topic is stored raw into ChromaDB metadata. Apply the same sanitize_name guard to reject null bytes, path traversal, and oversized payloads.	2026-04-16 12:12:17 +09:00
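Below is a hypothetical stand-in for a `sanitize_name`-style guard. It covers the three cases the commit names: null bytes, path traversal, and oversized payloads. The specific rules and the 128-character limit are assumptions for illustration, because this log does not show the real validator's checks.

```python
MAX_NAME_LENGTH = 128  # assumed cap for illustration; the real limit is not stated


def sanitize_name_sketch(value: str, field: str = "name") -> str:
    """Reject values unsafe to store as metadata (hypothetical rules)."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"{field} must be a non-empty string")
    if "\x00" in value:
        raise ValueError(f"{field} contains a null byte")
    if ".." in value or "/" in value or "\\" in value:
        raise ValueError(f"{field} looks like a path traversal attempt")
    if len(value) > MAX_NAME_LENGTH:
        raise ValueError(f"{field} exceeds {MAX_NAME_LENGTH} characters")
    return value.strip()


# After the fix, topic goes through the same guard as agent_name.
clean_topic = sanitize_name_sketch("architecture", "topic")
```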
JunghwanNA	5dfe853154	fix: guard against data loss in repair, migrate, and CLI rebuild - repair.py: wrap upsert loop in try/except; restore from backup on failure instead of leaving a partially rebuilt collection - migrate.py: replace non-atomic rmtree+move with rename-aside swap so a crash between the two calls does not destroy both copies - cli.py: use offset += len(batch["ids"]) with empty-batch guard instead of fixed offset += batch_size to prevent skipping drawers	2026-04-16 12:11:18 +09:00
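The effect of the CLI rebuild fix shows up clearly with a fake paginated store. Here `get_batch(offset, limit)` stands in for a collection read that returns `{"ids": [...]}`. The 300-row cap per call is a hypothetical way to make a short page observable.

```python
from typing import Callable, Dict, List

# (offset, limit) -> {"ids": [...]}: a stand-in for a paginated collection read.
GetBatch = Callable[[int, int], Dict[str, List[str]]]


def read_all_ids(get_batch: GetBatch, batch_size: int = 1000) -> List[str]:
    ids: List[str] = []
    offset = 0
    while True:
        batch = get_batch(offset, batch_size)
        if not batch["ids"]:  # empty-batch guard: nothing left to read
            break
        ids.extend(batch["ids"])
        offset += len(batch["ids"])  # advance by rows actually returned, not batch_size
    return ids


STORE = [f"drawer-{i}" for i in range(2500)]


def capped_get(offset: int, limit: int) -> Dict[str, List[str]]:
    # Hypothetical backend that never returns more than 300 rows per call.
    return {"ids": STORE[offset : offset + min(limit, 300)]}


recovered = read_all_ids(capped_get)
```

With a fixed `offset += batch_size`, the same store would yield rows 0-299, 1000-1299 and 2000-2299. That silently skips 1,600 of the 2,500 ids.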
Igor Lins e Silva	d4c942417a	Merge pull request #932 from MemPalace/fix/entity-detector-non-latin-boundaries fix(entity_detector): script-aware word boundaries for combining-mark scripts	2026-04-15 22:38:59 -03:00
Igor Lins e Silva	f895bc58e6	fix(entity_detector): script-aware word boundaries for combining-mark scripts Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras) like ा ी ु are Unicode category Mc (Mark, Spacing Combining) — not \w. This means \b splits mid-word on every matra: names like अनीता (Anita) truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b never match because \b fails after the final matra of कहा. Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script whose words contain combining marks. Fix: locales with combining-mark scripts declare a boundary_chars field in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n loader replaces every \b in that locale's patterns with a script-aware lookaround that treats the declared characters as "inside-word", and pre-wraps candidate/multi_word patterns with the same boundary. Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru, it are unchanged. Changes: - mempalace/i18n/__init__.py: add _script_boundary, _expand_b, _wrap_candidate, _collect_entity_section; candidate_patterns are now returned fully-wrapped (boundary + capture group applied) - mempalace/entity_detector.py: extract_candidates compiles pre-wrapped candidate patterns directly instead of re-wrapping with \b - tests/test_entity_detector.py: 5 new tests for Devanagari boundaries (name extraction with/without boundary_chars, person-verb firing, English regression)	2026-04-15 22:18:52 -03:00
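The boundary rewrite described above fits in a few lines. This is a sketch under stated assumptions. `script_boundary`, `expand_b` and `wrap_candidate` mirror the roles of `_script_boundary`, `_expand_b` and `_wrap_candidate` named in the commit, but they are not copies of them. `[\u0900-\u097F]+` is an illustrative candidate class. The Devanagari strings are written as escapes so the exact code points are unambiguous.

```python
import re
from typing import Optional

HINDI_BOUNDARY = r"\w\u0900-\u097F"  # boundary_chars value quoted in the commit
HI_CANDIDATE = r"[\u0900-\u097F]+"  # illustrative Devanagari candidate class

ANITA = "\u0905\u0928\u0940\u0924\u093e"  # अनीता (Anita)
RAJ = "\u0930\u093e\u091c"  # राज
NE = "\u0928\u0947"  # ने
KAHA = "\u0915\u0939\u093e"  # कहा


def script_boundary(boundary_chars: str) -> str:
    """Lookaround equivalent of \\b that treats `boundary_chars` as inside-word."""
    inside = f"[{boundary_chars}]"
    # Word start (outside -> inside) or word end (inside -> outside).
    return f"(?:(?<!{inside})(?={inside})|(?<={inside})(?!{inside}))"


def expand_b(pattern: str, boundary_chars: Optional[str]) -> str:
    """Swap every \\b for the script-aware boundary; without boundary_chars keep \\b."""
    if not boundary_chars:
        return pattern
    # Plain textual replace: enough for these patterns, but a real loader would
    # also have to skip escaped backslashes and \b inside [...] (backspace).
    return pattern.replace(r"\b", script_boundary(boundary_chars))


def wrap_candidate(candidate: str, boundary_chars: Optional[str]) -> str:
    """Add the boundary on both sides plus one capture group."""
    return expand_b(rf"\b({candidate})\b", boundary_chars)


person_verb = rf"\b{RAJ}\s+{NE}\s+{KAHA}\b"  # "Raj said"
sentence = f"{RAJ} {NE} {KAHA}"

plain_names = re.findall(wrap_candidate(HI_CANDIDATE, None), ANITA)
aware_names = re.findall(wrap_candidate(HI_CANDIDATE, HINDI_BOUNDARY), ANITA)
plain_verb_hit = re.search(person_verb, sentence)
aware_verb_hit = re.search(expand_b(person_verb, HINDI_BOUNDARY), sentence)
```

Without `boundary_chars`, the candidate truncates to अनीत and the person-verb pattern finds nothing, as the commit describes. With `\w\u0900-\u097F` declared, both match the full words.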
Arnold Wender	6caac50138	fix(i18n): use Optional[str] for Python 3.9 compatibility PEP 604 union syntax (str | None) requires Python 3.10+. The project supports 3.9 per CI matrix, so use typing.Optional instead.	2026-04-15 23:37:12 +02:00
Arnold Wender	0174b93d0f	fix(i18n): resolve language codes case-insensitively (#927) BCP 47 language tags are case-insensitive (RFC 5646 §2.1.1) but the locale files mix conventions (pt-br.json vs zh-CN.json). On case-sensitive filesystems, '--lang PT-BR' or '--lang zh-cn' silently missed the file, _load_entity_section returned {}, and entity detection ran in English with no warning. The cache key in get_entity_patterns was built from raw input, so ('PT-BR',) and ('pt-br',) produced two distinct entries, both wrong. Add _canonical_lang(lang) that resolves any casing to the on-disk filename stem via lowercase comparison, and route load_lang, _load_entity_section, and the cache key through it. Closes #927	2026-04-15 23:33:42 +02:00
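Below is a minimal sketch of `_canonical_lang`-style resolution. The real function takes only `lang` and looks at the locale directory. Here the filename stems are passed in instead (`LOCALE_STEMS` is hypothetical), and returning `None` for an unknown tag is an assumption.

```python
from typing import Iterable, Optional

# Stand-ins for the locale filename stems on disk (mixed conventions, as the commit notes).
LOCALE_STEMS = ("en", "pt-br", "ru", "zh-CN", "zh-TW")


def canonical_lang(lang: str, available: Iterable[str] = LOCALE_STEMS) -> Optional[str]:
    """Resolve any casing of a language tag to the on-disk stem, or None if unknown."""
    wanted = lang.lower()
    for stem in available:
        if stem.lower() == wanted:
            return stem
    return None


# Routing the cache key through the same function collapses casing variants.
cache_keys = {
    tuple(canonical_lang(code) for code in langs)
    for langs in (("PT-BR",), ("pt-br",), ("Pt-Br",))
}
```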
Igor Lins e Silva	122ce38811	Merge pull request #907 from Archetipo95/feat/italian-i18n-support feat: add Italian language support	2026-04-15 18:05:13 -03:00
Igor Lins e Silva	57b0b14192	Merge pull request #156 from mvalentsev/feat/pt-br-entity-detection feat: add Brazilian Portuguese support to entity_detector (closes #117)	2026-04-15 17:53:30 -03:00
almirus	10cdd93cec	feat(cli): add version display and version flag to CLI Introduces a version label to the command-line interface, displaying the current MemPalace version in the help text. Adds a `--version` flag to allow users to easily check the version and exit.	2026-04-15 21:44:20 +03:00
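This log does not show the parser code. Assuming an argparse-based CLI, a version label in the help text and a `--version` flag map onto argparse's built-in `version` action. The version string below is illustrative.

```python
import argparse
import contextlib
import io

__version__ = "3.3.1"  # illustrative; the real value lives in mempalace/version.py

parser = argparse.ArgumentParser(
    prog="mempalace",
    description=f"MemPalace {__version__}",  # version label shown in --help
)
# argparse's built-in "version" action prints the string and exits with status 0.
parser.add_argument("--version", action="version", version=f"%(prog)s {__version__}")

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    try:
        parser.parse_args(["--version"])
    except SystemExit as exc:
        exit_code = exc.code
```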
mvalentsev	4221589df2	fix(i18n): address review feedback on pt-br.json - dialogue_patterns[0]: remove stray \" before > (fixes markdown quote matching) - entity stopwords: add 40 prepositions, conjunctions, and common words to reduce false positives - pronoun_patterns: add 2nd-person (você/vocês) and possessives (seu/sua/seus/suas)	2026-04-15 23:32:31 +05:00
mvalentsev	3d13a72ae0	feat(i18n): add Brazilian Portuguese locale with entity detection (closes #117) CLI strings, AAAK instruction, regex patterns, and entity section with person-verb, pronoun, dialogue, and candidate patterns for Latin+diacritics names (Joao, Ines, Angela). Follows the i18n entity framework from #911.	2026-04-15 23:32:31 +05:00
Tejas Shinde	33a98fb9d1	Updated hi.json to support infra for entity, pronoun_patterns, dialogue_patterns, direct_address_pattern, project_verb_patterns and stopwords	2026-04-15 23:33:24 +05:30
Tejas Shinde	ce3ae0a668	Merge branch 'MemPalace:develop' into feat/add-i18n-hindi	2026-04-15 23:19:57 +05:30
Martin Masevski	69453b2180	feat: add italian entity patterns	2026-04-15 19:18:23 +02:00
Martin Masevski	2e998db0b9	feat: add italian i18n support	2026-04-15 19:15:55 +02:00
Igor Lins e Silva	73a2f82d5b	Merge pull request #760 from mvalentsev/feat/i18n-russian feat: add Russian language support (ru.json)	2026-04-15 13:46:04 -03:00
Igor Lins e Silva	312b3b5f0e	Merge pull request #758 from mvalentsev/fix/i18n-review-issues fix: address i18n review issues from PR #718	2026-04-15 13:45:49 -03:00
mvalentsev	4b998de77a	feat(i18n): expand Russian entity stopwords with prepositions and conjunctions Adds 34 prepositions and conjunctions to reduce false positives in entity detection when these words appear sentence-initial. Co-Authored-By: almirus <almirus@users.noreply.github.com>	2026-04-15 21:14:51 +05:00
mvalentsev	3e49522a42	fix(i18n): apply review feedback on ru.json (#760) - mine_skip: "повторной раскопки" -> "повторной обработки" - quote_pattern: add Russian guillemet quotes «» Co-Authored-By: almirus <almirus@users.noreply.github.com>	2026-04-15 20:17:16 +05:00
mvalentsev	d6bd7de5f6	feat(i18n): add entity detection section to Russian locale Cyrillic candidate/multi-word patterns, person-verb patterns (сказал, спросил, ответил, etc.), pronoun patterns, dialogue markers, direct address, and Russian stopwords. Follows the i18n entity framework from #911.	2026-04-15 18:16:25 +05:00
mvalentsev	b87ada3c96	feat: add Russian language support to i18n module Add ru.json with full Russian translations for CLI strings, palace terminology, AAAK compression instruction, and regex patterns for topic/action extraction with Cyrillic character classes. No code changes needed -- the i18n module auto-discovers language files via *.json glob in the i18n directory.	2026-04-15 18:15:15 +05:00
Igor Lins e Silva	3bac3654c4	Merge pull request #911 from MemPalace/refactor/entity-detector-i18n refactor(entity_detector): make multi-language extensible via i18n JSON	2026-04-15 09:40:36 -03:00
Igor Lins e Silva	c722c91e2a	test: document orphan-locale recovery for _temp_locale helper	2026-04-15 08:54:23 -03:00
Igor Lins e Silva	b214aced90	refactor(entity_detector): make multi-language extensible via i18n JSON Move all entity-detection lexical patterns (person verbs, pronouns, dialogue markers, project verbs, stopwords, candidate character class) out of hardcoded module-level constants and into the entity section of each locale's JSON in mempalace/i18n/. Adds a languages parameter to every public function so callers union patterns across the desired locales. The default stays ("en",), so all existing callers and tests behave unchanged. Also adds: - get_entity_patterns(langs) helper in mempalace/i18n/ that merges patterns across requested languages, dedupes lists, unions stopwords, and falls back to English for unknown locales - MempalaceConfig.entity_languages property + setter, with env var override (MEMPALACE_ENTITY_LANGUAGES, comma-separated) - mempalace init --lang en,pt-br flag (persists to config.json) - Per-language candidate_pattern so non-Latin scripts (Cyrillic, Devanagari, CJK) can register their own character classes instead of being silently dropped by the ASCII-only [A-Z][a-z]+ default - _build_patterns LRU cache keyed by (name, languages) so multi-language callers don't poison each other's cache slots Why now: the open language PRs (#760 ru, #773 hi, #778 id, #907 it) only add CLI strings via mempalace/i18n/. PR #156 (pt-br) is the first that needed entity_detector changes and inlined a _PTBR variant of every constant. That doesn't scale past 2-3 languages — every text gets checked against every language's patterns regardless of relevance, and candidate extraction still drops accented and non-Latin names. This PR sets the standard so future locale contributors only edit one JSON file (no Python changes), and entity detection scales linearly with how many languages a user actually enabled, not how many ship.	2026-04-15 08:52:42 -03:00
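The commit gives `get_entity_patterns` four merge rules: dedupe lists, union stopwords, fall back to English for unknown locales, and cache per language tuple. They can be sketched as follows. `merged_entity_patterns` and the two tiny locale sections are hypothetical. The real sections live in `mempalace/i18n/*.json` and carry more keys.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Tiny illustrative stand-ins for the "entity" sections of two locale JSON files.
LOCALE_ENTITY_SECTIONS: Dict[str, Dict[str, List[str]]] = {
    "en": {
        "person_verb_patterns": [r"\b{name}\s+said\b"],
        "stopwords": ["the", "and"],
    },
    "pt-br": {
        "person_verb_patterns": [r"\b{name}\s+disse\b"],
        "stopwords": ["o", "e"],
    },
}


# Keyed by the whole language tuple, so ("en",) and ("en", "pt-br") never share a slot.
@lru_cache(maxsize=None)
def merged_entity_patterns(langs: Tuple[str, ...]) -> dict:
    """Union patterns across locales: lists deduped in order, stopwords as a set."""
    lists: Dict[str, List[str]] = {}
    stopwords = set()
    for lang in langs:
        # Unknown locales fall back to English.
        section = LOCALE_ENTITY_SECTIONS.get(lang, LOCALE_ENTITY_SECTIONS["en"])
        for key, values in section.items():
            if key == "stopwords":
                stopwords.update(values)
                continue
            bucket = lists.setdefault(key, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    merged: dict = {key: tuple(values) for key, values in lists.items()}
    merged["stopwords"] = frozenset(stopwords)  # immutable, since the result is cached
    return merged


both = merged_entity_patterns(("en", "pt-br"))
```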

643 Commits