Commit Graph

612 Commits

Author SHA1 Message Date
bensig 41d45d9336 docs: RFC 002 — source adapter plugin specification
Draft plugin specification for source adapters, mirroring RFC 001's
role for storage backends. Formalizes the contract seven community
ingester PRs (#274, #23, #169, #232, #567, #98, #702) plus #981's
metadata-only mode have been reinventing ad-hoc, so adapter authors
can build to a stable surface.

Key decisions:
- Single ingest() method; lazy adapters yield SourceItemMetadata
  ahead of drawers, eager adapters interleave
- Declared-transformation model (§1.4) replaces informal verbatim
  promise with a verifiable one; byte_preserving adapters declare
  the empty set, declared_lossy adapters enumerate. Existing
  miner.py and the convo_miner+normalize pipeline map cleanly
- Palace is the incremental cursor via is_current(item, metadata);
  no sidecar persistence
- Routing is adapter-owned; detect_room/detect_hall move into the
  filesystem adapter
- Flat metadata per ChromaDB (RFC 001 §1.4) — entity hints as
  json_string field, KG triples route to SQLite knowledge graph
- Closets stay core-built as a post-step; adapters may emit flat
  closet_hints. Closes existing gap where convo drawers get no
  closets
- No per-drawer field renames: source_file, filed_at, source_mtime,
  added_by, normalize_version, entities, ingest_mode all preserved.
  Spec adds adapter_name, adapter_version, privacy_class

§9 enumerates the cleanup PR prerequisites (mempalace/sources/
module, PalaceContext facade, KnowledgeGraph.add_triple gaining
backwards-compatible source_drawer_id + adapter_name params).

Tracking issue: #989
2026-04-17 23:42:46 -07:00
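A rough sketch of how the single-`ingest()` contract and the palace-as-cursor decision in 41d45d9336 could fit together. Only `ingest()`, `SourceItemMetadata`, `is_current(item, metadata)`, `adapter_name`, `adapter_version`, `source_file`, and `source_mtime` come from the commit message; `Drawer`, `InMemoryPalace`, `record()`, and `LazyFileAdapter` are hypothetical stand-ins, and the field types are assumptions rather than the RFC's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple, Union


@dataclass(frozen=True)
class SourceItemMetadata:
    source_file: str
    source_mtime: float


@dataclass(frozen=True)
class Drawer:  # hypothetical payload type, not defined in the commit message
    source_file: str
    content: str


class InMemoryPalace:
    """Hypothetical stand-in: the palace itself answers 'is this item current?'."""

    def __init__(self) -> None:
        self._mtimes: Dict[str, float] = {}

    def is_current(self, item: str, metadata: SourceItemMetadata) -> bool:
        return self._mtimes.get(item) == metadata.source_mtime

    def record(self, metadata: SourceItemMetadata) -> None:
        self._mtimes[metadata.source_file] = metadata.source_mtime


class LazyFileAdapter:
    """Lazy adapter: yields metadata ahead of the drawer for each item."""

    adapter_name = "example-files"  # illustrative values only
    adapter_version = "0.1"

    def __init__(self, files: Dict[str, Tuple[float, str]]) -> None:
        self._files = files  # path -> (mtime, text)

    def ingest(self, palace: InMemoryPalace) -> Iterator[Union[SourceItemMetadata, Drawer]]:
        for path, (mtime, text) in self._files.items():
            metadata = SourceItemMetadata(path, mtime)
            yield metadata
            # No sidecar cursor: unchanged items are skipped by asking the palace.
            if palace.is_current(path, metadata):
                continue
            yield Drawer(path, text)
```

In this sketch a second `ingest()` pass over unchanged files yields only metadata, which is the incremental behaviour the "Palace is the incremental cursor" bullet describes.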
Igor Lins e Silva e4a2cd48a2 Merge pull request #984 from domiscd/feat/landing-page-update
feat/landing-page: Improve landing page readability
2026-04-17 19:47:39 -03:00
Dominique Deschatre 2e3e0b979c Update landing.css 2026-04-17 19:40:25 -03:00
Dominique Deschatre 9e8281aab5 (landing) svg icons animations 2026-04-17 19:37:30 -03:00
Dominique Deschatre e5f5009f80 (landing) added Closets section 2026-04-17 19:18:10 -03:00
Dominique Deschatre 89f0eb5cb3 refactor(website): split Landing.vue into section components
Extract 2002-line monolith into landing/ subfolder:
- 8 section components (FolioHeader, HeroSection, ForgettingSection, AnatomySection, DialectSection, MechanicsSection, InstallSection, CatalogFooter)
- useLandingEffects.js composable for all vanilla-JS effects
- landing.css for all styles
- Landing.vue reduced to 28-line orchestrator

Also restores upstream hero lede text ("permanent. Designed for total recall.").
2026-04-17 18:49:41 -03:00
Dominique Deschatre 8c3d1ba86c Merge remote-tracking branch 'upstream/develop' into feat/landing-page-update
Co-authored-by: Copilot <copilot@github.com>
2026-04-17 17:00:47 -03:00
Dominique Deschatre 28d4f67ba2 landing hero container 2026-04-17 15:53:50 -03:00
dependabot[bot] 0e632df85d chore(deps): bump actions/checkout from 4 to 6
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-17 07:56:09 +00:00
dependabot[bot] 04d80eb363 chore(deps): bump actions/upload-pages-artifact from 3 to 5
Bumps [actions/upload-pages-artifact](https://github.com/actions/upload-pages-artifact) from 3 to 5.
- [Release notes](https://github.com/actions/upload-pages-artifact/releases)
- [Commits](https://github.com/actions/upload-pages-artifact/compare/v3...v5)

---
updated-dependencies:
- dependency-name: actions/upload-pages-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-17 07:56:05 +00:00
dependabot[bot] c942f5866c chore(deps): bump actions/deploy-pages from 4 to 5
Bumps [actions/deploy-pages](https://github.com/actions/deploy-pages) from 4 to 5.
- [Release notes](https://github.com/actions/deploy-pages/releases)
- [Commits](https://github.com/actions/deploy-pages/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/deploy-pages
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-17 07:56:02 +00:00
Igor Lins e Silva 6889c6ff33 Merge pull request #957 from MemPalace/release/3.3.1
release: v3.3.1
2026-04-17 00:37:45 -03:00
Igor Lins e Silva 41bff266a4 Merge pull request #918 from almirus/develop
feat(cli): add version display and version flag to CLI
2026-04-17 00:29:55 -03:00
Igor Lins e Silva 596f3d3a8e Merge pull request #964 from MemPalace/fix/website-false-claims
fix(website): correct false claims and stale numbers in live docs
2026-04-16 23:38:08 -03:00
Igor Lins e Silva 0cb9ee5c58 fix(website): correct false claims and stale numbers in live docs
- Landing: replace nonexistent `mempalace remember` CLI demo with real
  `mempalace mine ./notes`
- Landing: soften unverifiable absolutes ("forever available",
  "100% recall by design", "<50 ms", "90%+ compression",
  "two-thousand-year-old", "tens of thousands of entries")
- MCP tool count: 19 → 29 across mcp-integration, claude-code, openclaw,
  and modules; expand tool overview with Drawers, Tunnels, and System
  categories to match mcp_server.py
- Wake-up token range: ~170–900 → ~600–900 in cli/api-reference/python-api
  to match cli.py help text and concept docs
- Gemini CLI: move `--scope user` before target name and add `--`
  separator so `-m mempalace.mcp_server` isn't parsed as Gemini flags
2026-04-16 23:31:35 -03:00
Igor Lins e Silva 51919fef0c Merge pull request #963 from domiscd/feat/landing-page-update
feat(website): update landing page
2026-04-16 22:37:16 -03:00
Dominique Deschatre c8727b3a2d chore(website): add Google Analytics 2026-04-16 22:34:37 -03:00
Dominique Deschatre 44c525ddd3 Merge remote-tracking branch 'upstream/develop' into feat/landing-page-update
# Conflicts:
#	website/index.md
2026-04-16 22:31:22 -03:00
Dominique Deschatre d8ac4c3abb new landing page pt 2 2026-04-16 22:24:15 -03:00
Dominique Deschatre 9893fa2383 new landing page 2026-04-16 21:46:03 -03:00
Lman Chu c88b8a2e17 style: fix ruff format for test_entity_detector.py
Collapse implicit string concatenation to single-line strings
to satisfy ruff format --check in CI.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-17 06:40:41 +08:00
Igor Lins e Silva b552bcf3ea Merge pull request #958 from MemPalace/fix/release-3.3.1-plugin-manifests
release: bump plugin manifests to 3.3.1
2026-04-16 16:26:35 -03:00
Igor Lins e Silva 05ad2dc194 release: bump plugin manifests to 3.3.1
The version-guard workflow checks that five sources agree:
mempalace/version.py, pyproject.toml, .claude-plugin/marketplace.json,
.claude-plugin/plugin.json, .codex-plugin/plugin.json.

Initial release commit missed the three plugin manifests.
2026-04-16 16:25:00 -03:00
Igor Lins e Silva fd89303fe1 docs(changelog): backfill post-v3.3.0 PRs missed by initial boundary
Advisor caught: initial boundary (962776c..develop) skipped PRs that
landed on develop after v3.3.0 tag but before the sync-back merge.
Adds entries for #871 MEMPAL_VERBOSE, #811 research() local-only
default, #866 init .gitignore, #864 MCP stdout redirect, #863
precompact hook, #865 searcher empty results, #831 cold-start palace,
#862 init help, #815 Slack provenance, #840 save hook auto-mine.
Also drops the awkward caveat on #846 created_at — it's post-v3.3.0.
2026-04-16 16:12:37 -03:00
Igor Lins e Silva 2087869752 release: v3.3.1
Bumps version across pyproject.toml, mempalace/version.py, README badge,
and uv.lock. Finalizes the 3.3.0 CHANGELOG section (was still labeled
'Unreleased') and adds a 3.3.1 section covering the multi-language
entity-detection infra and the five new locales landed since 2026-04-13.

Highlights:
- Multi-language entity detection infra (#911) + script-aware word
  boundaries for combining-mark scripts (#932) + BCP 47 case-insensitive
  locale resolution (#928) + i18n patterns wired into miner/palace/
  entity_registry (#931)
- Five new fully-supported locales: pt-br (#156), ru (#760), it (#907),
  hi (#773), id (#778)
- UTF-8 encoding fix on read_text() calls for non-UTF-8 Windows locales
  (#946)
- KnowledgeGraph lock correctness (#884, #887)
- Various smaller fixes and improvements
2026-04-16 16:09:02 -03:00
Igor Lins e Silva 55a004fe1e Merge pull request #931 from mvalentsev/fix/i18n-entity-metadata
fix: use i18n candidate patterns for entity extraction in miner and palace
2026-04-16 15:54:01 -03:00
Igor Lins e Silva c5e249bba8 Merge pull request #946 from mvalentsev/fix/utf8-read-text
fix: add explicit UTF-8 encoding to read_text() calls (#776)
2026-04-16 15:52:42 -03:00
Igor Lins e Silva 65f99ad7e6 Merge pull request #928 from arnoldwender/fix/i18n-lang-case-insensitive
fix(i18n): resolve language codes case-insensitively (#927)
2026-04-16 15:44:36 -03:00
Igor Lins e Silva 29112fab82 Merge pull request #778 from dominosaurs/feat/id-lang
feat: add Indonesian language support
2026-04-16 15:44:26 -03:00
Igor Lins e Silva 4215be3926 Merge pull request #773 from tejasashinde/feat/add-i18n-hindi
feat: add Hindi language support to i18n module
2026-04-16 15:44:08 -03:00
jp 8adf35a13c fix: add threading lock to graph cache, expand docstring
Address review feedback from @bensig:
1. Wrap cache reads/writes in threading.Lock for thread safety
2. Promote the col-arg caveat from inline comment to docstring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 09:00:36 -07:00
jp 1657a79649 fix: clarify cache docs, skip caching empty graphs
Addresses Copilot review feedback on #661.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 09:00:27 -07:00
jp 84e2aa16e4 perf: graph cache with write-invalidation in build_graph()
build_graph() scans every drawer's metadata in 1000-item batches on
every call — O(n) per graph build with no caching. At 50K+ drawers
this costs several seconds per MCP tool call (traverse, find_tunnels,
graph_stats all call build_graph on every invocation).

Add a module-level cache (nodes + edges + timestamp) with a 60-second
TTL. Cache is invalidated via invalidate_graph_cache(), exported for
write operations to call. Tests updated with setup_method cache resets
and two new tests verifying cache hit and invalidation behaviour.
2026-04-16 09:00:27 -07:00
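A minimal sketch of the caching scheme described in 84e2aa16e4, folding in the lock and the skip-empty rule from the two follow-up commits listed above it. The names `build_graph` and `invalidate_graph_cache` and the 60-second TTL come from the commit messages; the `scan` and `now` parameters are assumptions added so the example is self-contained and testable, and the real function signature may differ.

```python
import threading
import time
from typing import Callable, Optional, Tuple

_CACHE_TTL_SECONDS = 60.0
_cache: Optional[Tuple[dict, list, float]] = None  # (nodes, edges, timestamp)
_cache_lock = threading.Lock()


def build_graph(scan: Callable[[], Tuple[dict, list]],
                now: Callable[[], float] = time.monotonic) -> Tuple[dict, list]:
    """Return (nodes, edges), reusing a cached result younger than the TTL."""
    global _cache
    with _cache_lock:
        if _cache is not None and now() - _cache[2] < _CACHE_TTL_SECONDS:
            return _cache[0], _cache[1]
    nodes, edges = scan()  # the expensive O(n) pass over drawer metadata
    if nodes:  # do not cache empty graphs
        with _cache_lock:
            _cache = (nodes, edges, now())
    return nodes, edges


def invalidate_graph_cache() -> None:
    """Called by write operations so the next build rescans."""
    global _cache
    with _cache_lock:
        _cache = None
```

The scan runs outside the lock here so a slow rebuild does not block readers; two concurrent misses may both rescan, which is a trade-off of this sketch, not a statement about the project's code.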
jp 15ea385554 fix: replace all non-ASCII progress markers for Windows encoding
Also fix miner.py checkmark and box-drawing/arrow chars (─, →) in
both miner.py and split_mega_files.py that would crash on cp1251/cp1252.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 08:59:58 -07:00
jp 542b53bb0f fix: replace Unicode checkmark with ASCII + for Windows encoding (#535)
Windows terminals using cp1251/cp1252 crash on the Unicode ✓ (U+2713)
in progress output. Replace with ASCII + in convo_miner.py and
split_mega_files.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 08:59:58 -07:00
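The failure mode behind 542b53bb0f and 15ea385554 can be reproduced without a Windows terminal: U+2713 has no mapping in cp1251 or cp1252, so encoding it raises `UnicodeEncodeError`. The commits replace the marker with ASCII `+` unconditionally; the helper below, `safe_marker`, is a hypothetical variant that falls back only when the target encoding cannot represent the character.

```python
def safe_marker(marker: str, encoding: str, fallback: str = "+") -> str:
    """Return marker if the encoding can represent it, else the ASCII fallback."""
    try:
        marker.encode(encoding)
    except UnicodeEncodeError:
        return fallback
    return marker


CHECKMARK = "\u2713"  # the Unicode checkmark named in the commit message
windows_marker = safe_marker(CHECKMARK, "cp1252")
utf8_marker = safe_marker(CHECKMARK, "utf-8")
```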
mvalentsev 09fe2dda3c fix: add explicit UTF-8 encoding to read_text() calls (#776)
On Windows with non-UTF-8 locale (e.g. GBK), Path.read_text() defaults
to platform encoding, breaking onboarding tests and any source code that
reads JSON/markdown with non-ASCII content.

5 files, 8 call sites fixed.
2026-04-16 16:00:29 +05:00
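The fix in 09fe2dda3c comes down to passing `encoding="utf-8"` at each call site so the result no longer depends on the platform default. A small illustration, with `load_text` as a hypothetical wrapper name and a temporary file standing in for the project's JSON and markdown sources:

```python
import tempfile
from pathlib import Path


def load_text(path: Path) -> str:
    # Explicit encoding: without it, read_text() uses the locale's preferred
    # encoding, which is not UTF-8 on e.g. a GBK-configured Windows machine.
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.md"
    note.write_text("caf\u00e9 \u6731\u5b9c\u632f", encoding="utf-8")
    loaded = load_text(note)
```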
🍕 939d4c1e74 feat: Update Indonesian translations
Refine AAAK instruction and expand entity detection patterns.
2026-04-16 17:43:51 +08:00
Lman Chu 683e940f70 feat(i18n): add Traditional + Simplified Chinese entity detection
zh-TW and zh-CN previously had no `entity` section. Calling
`detect_entities(..., languages=("zh-TW",))` silently fell back to
English patterns (i18n/__init__.py:231-233), so no Chinese names
were ever extracted — Chinese-speaking users got zero people or
projects detected from their own notes.

This adds entity sections for both locales:

- `candidate_pattern`: common-surname-prefixed CJK n-grams (~100
  surnames covering >95% of Taiwanese / PRC names), length capped
  at {1,2} trailing chars so greedy matches don't swallow the
  trailing verb character (e.g. 朱宜振說).
- `boundary_chars`: `\u4E00-\u9FFF` so the i18n loader's
  script-aware wrap (introduced in #932) fires `\b` at CJK↔non-CJK
  transitions. This is the same mechanism used for Devanagari,
  applied to the CJK range.
- `person_verb_patterns`: Chinese verbs attach directly to the
  name with no whitespace, so patterns are written as `{name}說`,
  `{name}問`, `{name}決定` — no `\b` or `\s+` separators.
- `dialogue_patterns`: full-width colon `:`, Chinese quotes
  「」『』, plus the standard Latin forms.
- `pronoun_patterns`: 他 / 她 / 它 / 他們 / 她們 / 您 / 咱.
- `stopwords`: ~140 common particles, pronouns, time expressions,
  question words, conjunctions, UI nouns, and politeness forms.

**Known limitation** (explicitly covered by a test): CJK scripts
have no word delimiters, so a name flanked by CJK on both sides
with no punctuation or whitespace break is not extracted. This
is a fundamental limit of regex-based CJK entity detection —
resolving it would require a dictionary tokeniser. Realistic
Chinese technical writing contains enough non-CJK neighbours
(bullet lines, inline English, full-width punctuation, newlines)
that 3+ occurrences normally produce matches. Verified against a
realistic zh-TW PKM note: 朱宜振 extracted 11x from 8 sentences
with 0.99 person-classification confidence.

**Follow-ups** (separate PRs): same pattern for `ja` and `ko`,
both of which currently share the silent fallback-to-English bug.

Tests: 7 new tests in `tests/test_entity_detector.py`:
- `test_zh_tw_candidate_extraction_at_boundaries`
- `test_zh_tw_person_classification`
- `test_zh_tw_stopwords_filter_common_particles`
- `test_zh_tw_falls_back_to_english_for_non_cjk_names`
- `test_zh_cn_candidate_extraction`
- `test_zh_cn_and_zh_tw_union_covers_both_variants`
- `test_zh_tw_known_limitation_inline_name_no_boundary`

Full suite: 957 passed, 0 failed.
2026-04-16 17:43:09 +08:00
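To illustrate the `person_verb_patterns` point in 683e940f70 (Chinese verbs attach directly to the name, so the templates carry no `\b` or `\s+`), here is a small sketch. The three templates are the ones quoted in the commit message; `count_person_verb_hits` and the sample sentence are made up for the example and are not the project's detector.

```python
import re

# Templates as quoted in the commit message: the verb follows the name directly.
PERSON_VERB_TEMPLATES = ["{name}說", "{name}問", "{name}決定"]


def count_person_verb_hits(name: str, text: str) -> int:
    """Count how often a known candidate name is followed by a person verb."""
    hits = 0
    for template in PERSON_VERB_TEMPLATES:
        pattern = template.format(name=re.escape(name))
        hits += len(re.findall(pattern, text))
    return hits


sample = "朱宜振說明天開會。後來朱宜振決定延期,朱宜振問大家意見。"
hits = count_person_verb_hits("朱宜振", sample)
```

Each of the three templates matches once in the sample, so `hits` is 3. Note that this only scores a name that has already been proposed as a candidate; it does not address the no-boundary limitation described in the commit.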
fatkobra 1dc55a791d test: make Claude plugin wrapper tests portable on Windows 2026-04-16 11:41:53 +02:00
fatkobra be9214a190 Update mempal-precompact-hook.sh 2026-04-16 10:42:20 +02:00
fatkobra 5fe0c1c2ac Update mempal-stop-hook.sh 2026-04-16 10:33:34 +02:00
fatkobra e083cd6c84 Create test_claude_plugin_hook_wrappers.py 2026-04-16 10:32:17 +02:00
🍕 88f5b5fa0e Add Indonesian language support
Introduces the Indonesian (id) locale, providing translations for CLI commands, status messages, and core terminology.

Includes language-specific regex patterns for stop words and action detection to support text processing and indexing in Indonesian. The test suite is updated with a sample case to verify correct dialect handling and compression.
2026-04-16 16:15:47 +08:00
mvalentsev cde0f5b9e7 remove unnecessary comment 2026-04-16 10:38:38 +05:00
mvalentsev 973bd62a9a fix: use pre-wrapped candidate patterns after #932 refactor 2026-04-16 10:37:18 +05:00
mvalentsev 8bf940f861 fix: use i18n candidate patterns for entity extraction in miner and palace
entity_detector.py was refactored in #911 to load candidate patterns
from i18n locale JSON files, supporting non-Latin scripts (Cyrillic,
accented Latin, etc.). But three other code paths still hardcoded the
ASCII-only regex [A-Z][a-z]{2,}, silently missing non-Latin entity
names in metadata tagging, closet indexing, and registry lookups.

Replace the hardcoded regex with a shared _candidate_entity_words()
helper that reuses the same i18n candidate_patterns as entity_detector.
2026-04-16 10:35:40 +05:00
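The gap fixed by 8bf940f861 is easy to see on a mixed-script sentence. The first pattern below is the hardcoded regex quoted in the commit message; the second is an illustrative Cyrillic-aware pattern written for this example, not the contents of any locale file.

```python
import re

# The hardcoded regex the commit removes.
ASCII_ONLY = re.compile(r"[A-Z][a-z]{2,}")

# Hypothetical candidate pattern covering Latin plus basic Cyrillic letters.
LATIN_AND_CYRILLIC = re.compile(
    r"[A-Z\u0410-\u042F\u0401][a-z\u0430-\u044F\u0451]{2,}"
)

text = "Иван met Alice"
ascii_hits = ASCII_ONLY.findall(text)
i18n_hits = LATIN_AND_CYRILLIC.findall(text)
```

Here `ascii_hits` is `["Alice"]` while `i18n_hits` is `["Иван", "Alice"]`, which is the kind of silently missed name the commit describes in metadata tagging, closet indexing, and registry lookups.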
tejasashinde 21da870bd0 fix(i18n/hi): add boundary_chars and update action_pattern for Devanagari-aware matching 2026-04-16 09:21:21 +05:30
Igor Lins e Silva d4c942417a Merge pull request #932 from MemPalace/fix/entity-detector-non-latin-boundaries
fix(entity_detector): script-aware word boundaries for combining-mark scripts
2026-04-15 22:38:59 -03:00
Igor Lins e Silva f895bc58e6 fix(entity_detector): script-aware word boundaries for combining-mark scripts
Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras)
like ा ी ु are Unicode combining marks (ा and ी are category Mc, ु is Mn), not \w.
This means \b splits mid-word on every matra: names like अनीता (Anita)
truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b
never match because \b fails after the final matra of कहा.

Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script
whose words contain combining marks.

Fix: locales with combining-mark scripts declare a boundary_chars field
in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n
loader replaces every \b in that locale's patterns with a script-aware
lookaround that treats the declared characters as "inside-word", and
pre-wraps candidate/multi_word patterns with the same boundary.

Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru,
it are unchanged.

Changes:
- mempalace/i18n/__init__.py: add _script_boundary, _expand_b,
  _wrap_candidate, _collect_entity_section; candidate_patterns are now
  returned fully-wrapped (boundary + capture group applied)
- mempalace/entity_detector.py: extract_candidates compiles pre-wrapped
  candidate patterns directly instead of re-wrapping with \b
- tests/test_entity_detector.py: 5 new tests for Devanagari boundaries
  (name extraction with/without boundary_chars, person-verb firing,
  English regression)
2026-04-15 22:18:52 -03:00
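The truncation described in f895bc58e6 and the lookaround-based fix can be reproduced in a few lines. The `boundary_chars` value is the Hindi one quoted in the commit message; `script_boundary` is a hypothetical reimplementation of the idea and may differ from the project's `_script_boundary` helper.

```python
import re


def script_boundary(boundary_chars: str) -> str:
    """Build a \\b replacement that treats boundary_chars as inside-word."""
    inside = f"[{boundary_chars}]"
    # Boundary = start of a run of inside-word chars, or end of such a run.
    return f"(?:(?<!{inside})(?={inside})|(?<={inside})(?!{inside}))"


HINDI_BOUNDARY_CHARS = r"\w\u0900-\u097F"
NAME = "\u0905\u0928\u0940\u0924\u093e"  # अनीता (Anita), ends in a vowel sign

plain = re.compile(r"\b[\u0900-\u097F]+\b")
b = script_boundary(HINDI_BOUNDARY_CHARS)
aware = re.compile(b + r"[\u0900-\u097F]+" + b)

plain_match = plain.search(NAME).group(0)
aware_match = aware.search(NAME).group(0)
```

With standard `\b` the match stops before the final vowel sign, so `plain_match` is the four-character truncation `NAME[:-1]`; with the script-aware boundary, `aware_match` is the full name.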
Arnold Wender 6caac50138 fix(i18n): use Optional[str] for Python 3.9 compatibility
PEP 604 union syntax (str | None) requires Python 3.10+. The project
supports 3.9 per CI matrix, so use typing.Optional instead.
2026-04-15 23:37:12 +02:00
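For 6caac50138, the portable spelling looks like the sketch below. `resolve_lang` is a hypothetical function used only to show the annotation, and the lowercasing is a simplification of the case-insensitive resolution from #927. On Python 3.9, writing the parameter as `lang: str | None` would raise `TypeError` when the `def` is evaluated, unless annotations are deferred with `from __future__ import annotations`.

```python
from typing import Optional


def resolve_lang(lang: Optional[str] = None) -> str:
    """Return a normalised language code, defaulting to English."""
    return (lang or "en").lower()


default_lang = resolve_lang()
normalised = resolve_lang("PT-BR")
```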