Commit Graph

623 Commits

Author SHA1 Message Date
Igor Lins e Silva 6d42f61e64 Merge pull request #1001 from mvalentsev/feat/i18n-de-es-fr-entity
feat(i18n): add entity detection to German, Spanish, and French locales
2026-04-21 00:55:33 -03:00
Igor Lins e Silva 2a5914b630 Merge pull request #945 from lmanchu/feat/zh-entity-detection
feat(i18n): add Traditional + Simplified Chinese entity detection
2026-04-21 00:55:04 -03:00
Igor Lins e Silva cf0477bfa7 Merge pull request #1052 from MemPalace/merge/main-into-release-3.3.2
release: merge main into release/3.3.2
2026-04-20 15:34:24 -03:00
Igor Lins e Silva 936f1866af release: merge main into release/3.3.2 — keep 3.3.2 version bumps
Conflicts resolved by taking the 3.3.2 side for all version files:
- pyproject.toml, mempalace/version.py (3.3.2)
- .claude-plugin/marketplace.json, .claude-plugin/plugin.json (3.3.2)
- .codex-plugin/plugin.json (3.3.2)
- README.md version badge (3.3.2)
- uv.lock (3.3.2)
- CHANGELOG.md keeps [3.3.2] section on top of main's [3.3.1]

No source-code conflicts; main's 3.3.1 commit footprint is already in
develop's history via the earlier sync boundaries.

1033 tests pass on the merged tree.
2026-04-20 15:21:08 -03:00
Igor Lins e Silva 04e11ae545 Merge pull request #1045 from MemPalace/fix/copilot-review-release-3.3.2
fix: address Copilot review on release/3.3.2
2026-04-20 15:16:20 -03:00
Dzmitry Padabed 54c314d8d9 feat(i18n): add Belarusian 2026-04-20 21:00:39 +03:00
Igor Lins e Silva 65b17a6e0f fix: address Copilot review on release/3.3.2
Non-ASCII glyphs (regression of the #681 class of Windows UnicodeEncodeError):
- mempalace/cli.py: "✗" → "ERROR:", "⚠" → "WARNING:", em dash → "-"
- mempalace/sweeper.py: "⚠" → "WARNING:"

Backend arg validation:
- mempalace/backends/chroma.py: `_normalize_get_collection_args` now
  raises TypeError on unexpected trailing positional args instead of
  silently dropping them — surfaces call-site bugs early.

Docs site:
- website/.vitepress/config.mts: gate Google Analytics scripts behind
  MEMPALACE_DOCS_GA_ID env var (default off). Self-hosters no longer
  get GA injected unconditionally.

Landing page SPA hygiene:
- website/.vitepress/theme/landing/useLandingEffects.js: collect all
  IntersectionObserver disconnects and removeEventListener thunks in a
  shared `cleanups` registry; drain it in `onBeforeUnmount` so observers
  and form/replay listeners don't leak across SPA navigations.
2026-04-19 18:19:28 -03:00
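The Windows `UnicodeEncodeError` class named in 65b17a6e0f can be reproduced on any platform by encoding the glyphs against a legacy code page. A minimal sketch, assuming `cp1252` as the console encoding; `ASCII_FALLBACKS`, `can_encode`, and `ascii_safe` are hypothetical names for illustration, not the code in `mempalace/cli.py`:

```python
# Hypothetical mapping that mirrors the replacements listed in the commit.
ASCII_FALLBACKS = {
    "\u2717": "ERROR:",    # ballot X
    "\u26a0": "WARNING:",  # warning sign
    "\u2014": "-",         # em dash
}


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` survives encoding to `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def ascii_safe(text: str) -> str:
    """Swap each known non-ASCII glyph for its ASCII fallback."""
    for glyph, replacement in ASCII_FALLBACKS.items():
        text = text.replace(glyph, replacement)
    return text


message = "\u2717 palace not found"
print(can_encode(message, "cp1252"))   # the ballot X has no cp1252 mapping
print(ascii_safe(message))
```

Replacing the glyphs at the source keeps the CLI independent of whatever encoding the terminal happens to use.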
Igor Lins e Silva 5e9451407f release: v3.3.2
Version bumps across pyproject.toml, mempalace/version.py, README badge,
uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*).

CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added
covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop
fix + tandem sweeper (#998), None-metadata guards (#999, #1013),
chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681),
HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path
cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
2026-04-19 16:55:25 -03:00
jp d657626736 style: ruff format — collapse AttributeError log call to single line 2026-04-19 08:34:43 -07:00
jp 2629ae5b71 fix(hooks): default silent_guard=True — config-read failure must not suppress saves
Addresses bensig's review on PR #1021.

silent_guard was initialized to False, so when both MempalaceConfig
import and .hook_silent_save attribute access failed, silent_guard
stayed False. Then `if not silent_guard:` fired and returned empty —
silently dropping the save. In silent mode (the default since v3.3.0),
saves should ALWAYS proceed on config-read failure. Changing the
initial value to True makes that the safe default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 08:23:02 -07:00
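The decision described in 2629ae5b71 can be sketched as a small pure function. This is an illustration under assumptions: `should_skip_save` and `read_silent_save` are hypothetical names, and the real hook reads the flag from `MempalaceConfig` rather than from a callable.

```python
from typing import Callable


def should_skip_save(stop_hook_active: bool,
                     read_silent_save: Callable[[], bool]) -> bool:
    """Return True only when block mode is configured and the hook re-fired."""
    silent_guard = True  # safe default: a config-read failure must not drop the save
    try:
        silent_guard = bool(read_silent_save())
    except (ImportError, AttributeError):
        pass  # per 5deb815f0b the real hook logs a warning here
    # stop_hook_active is loop prevention for block mode only.
    return stop_hook_active and not silent_guard


def broken_config() -> bool:
    """Stand-in for a config object whose attribute access fails."""
    raise AttributeError("hook_silent_save")


print(should_skip_save(True, broken_config))   # save proceeds
print(should_skip_save(True, lambda: False))   # block mode: skip
```

With the initial value set to `False`, the first call would have returned `True` and the save would have been dropped.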
fatkobra 0b316d4053 test: normalize wrapper script path for bash on Windows 2026-04-19 10:34:11 +02:00
Vu Nguyen 5d2da04bcd Merge remote-tracking branch 'upstream/develop' into fix/status-paginate-large-palaces
# Conflicts:
#	mempalace/miner.py
2026-04-19 02:02:28 -05:00
Vu Nguyen 3004ac4ee4 fix(miner): port None-metadata guard into paginated status loop
Upstream develop commit feba7e8 (2026-04-18) added `m = m or {}` to the
single-shot `for m in metas:` loop after this branch already rewrote
status() to paginate. Without porting the guard forward, merging this PR
would silently drop jp's fix and crash again on palaces with null-metadata
drawers.

Addresses bensig's review on #851.
2026-04-19 01:35:24 -05:00
Ben Sigman 32ec74d8eb Merge pull request #1023 from jphein/pr/pid-file-guard
fix(hooks): PID file guard prevents stacking mine processes
2026-04-18 23:33:47 -07:00
Ben Sigman caf503f442 Merge pull request #1000 from jphein/fix/quarantine-stale-hnsw
feat(backends): quarantine_stale_hnsw — recover from HNSW/sqlite drift (closes #823)
2026-04-18 23:28:00 -07:00
Ben Sigman 62439e1368 Merge pull request #681 from jphein/fix/unicode-checkmark
fix: replace Unicode checkmark with ASCII for Windows encoding (#535)
2026-04-18 23:27:57 -07:00
jp dfba247454 fix: cross-platform PID check — os.kill(pid, 0) TERMINATES on Windows
Real bug surfaced on CI for this PR. On POSIX, os.kill(pid, 0) is
the canonical no-op existence probe. On Windows, Python's os.kill
maps to TerminateProcess(handle, sig), which *terminates* the target
with exit code sig. os.kill(pid, 0) therefore kills the target with
exit code 0 — silently destroying our mine child (or, as happened
in test_mine_already_running_live_pid, the pytest process itself).

Fix: split into _pid_alive(pid) helper with a Windows branch using
ctypes.windll.kernel32.OpenProcess + GetExitCodeProcess.
PROCESS_QUERY_LIMITED_INFORMATION opens a handle only if the PID
exists; STILL_ACTIVE (259) distinguishes running from exited processes.

No new dependencies — stdlib ctypes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:30:17 -07:00
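The split described in dfba247454 might look roughly like the following. This is a sketch under assumptions, not the project's `_pid_alive`: the function name, the `argtypes` declarations, and the handling of `PermissionError` on POSIX are illustrative choices.

```python
import os
import sys

PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
STILL_ACTIVE = 259


def pid_alive(pid: int) -> bool:
    """Existence probe that never signals or terminates the target."""
    if pid <= 0:
        return False  # 0 and negative values address process groups on POSIX
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.OpenProcess.restype = wintypes.HANDLE
        kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
        kernel32.GetExitCodeProcess.argtypes = [wintypes.HANDLE,
                                                ctypes.POINTER(wintypes.DWORD)]
        kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

        handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
        if not handle:
            return False  # no such PID, or no handle could be opened
        try:
            exit_code = wintypes.DWORD()
            ok = kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code))
            return bool(ok) and exit_code.value == STILL_ACTIVE
        finally:
            kernel32.CloseHandle(handle)
    try:
        os.kill(pid, 0)  # POSIX: signal 0 checks existence, delivers nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


print(pid_alive(os.getpid()))
```

Declaring `restype` as `HANDLE` avoids truncating the handle to a C `int` on 64-bit Windows, which is the ctypes default for undeclared functions.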
jp fe6b8899bc fix: broaden _mine_already_running catch — Windows os.kill raises plain OSError
On Windows, os.kill(bogus_pid, 0) raises OSError[WinError 87]
"The parameter is incorrect" — NOT ProcessLookupError. The old
except tuple missed it, so test_mine_already_running_dead_pid
failed on Windows CI.

Catching OSError covers ProcessLookupError + PermissionError +
FileNotFoundError on POSIX and WinError 87 on Windows. ValueError
still guards the int() parse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:13:37 -07:00
jp a6b6e55247 fix: PID file guard prevents stacking mine processes
Every stop hook fire spawned a new background `mempalace mine` via
subprocess.Popen with no dedup — 4 concurrent mines at ~770% CPU
observed in production. Add `_mine_already_running()` (reads
`hook_state/mine.pid`, uses `os.kill(pid, 0)` as an existence check)
and `_spawn_mine()` (writes the child PID to the lock file after
Popen returns). `_maybe_auto_ingest` bails early when the guard
reports True.

Tests: 4 new unit tests for `_mine_already_running` (no file, dead
PID, live PID using `os.getpid()`, corrupt file), 1 new test
covering the skip-when-running branch of `_maybe_auto_ingest`, and
existing spawn tests patched to redirect `_MINE_PID_FILE` into
tmp_path so they don't touch the real state dir.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 20:27:56 -07:00
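A minimal sketch of the guard introduced in a6b6e55247, covering the four cases its tests name (no file, dead PID, live PID, corrupt file). It is POSIX-only as written: per dfba247454 above, `os.kill(pid, 0)` terminates the target on Windows. The function name and the `pid_file` parameter are illustrative; the broad `OSError` catch follows fe6b8899bc.

```python
import os
import tempfile
from pathlib import Path


def mine_already_running(pid_file: Path) -> bool:
    """Return True if the PID recorded in `pid_file` looks like a live process."""
    try:
        pid = int(pid_file.read_text().strip())  # ValueError on a corrupt file
        if pid <= 0:
            return False
        os.kill(pid, 0)  # POSIX existence probe; do NOT use on Windows
    except (OSError, ValueError):
        # OSError covers a missing file, a dead PID, and a permission error.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    pid_file = Path(tmp) / "mine.pid"
    missing = mine_already_running(pid_file)

    pid_file.write_text(str(os.getpid()))
    live = mine_already_running(pid_file)

    pid_file.write_text("not-a-pid")
    corrupt = mine_already_running(pid_file)

print(missing, live, corrupt)
```

Treating every failure as "not running" favours spawning a mine over wedging the hook behind a stale lock file.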
jp 2183d866f3 style(hooks): ruff format hooks_cli.py and test_hooks_cli.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 18:09:29 -07:00
jp 1531a253be test: add missing import os in test_hooks_cli
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 18:05:18 -07:00
jp 5deb815f0b fix(hooks): address Copilot review feedback on #1021
- _output(): use sys.modules.get() instead of unconditional import to
  avoid triggering mcp_server's stdout redirect as a side effect
- _output(): write-all loop for os.write() to handle partial writes and
  EINTR; fall back to sys.stdout.buffer on OSError
- _output() docstring: remove inaccurate _save_diary_direct reference
- stop_hook_active guard: narrow except to ImportError/AttributeError,
  default silent_guard=False (safe: preserves block-mode loop prevention
  when config load fails) and log a warning instead of silently changing
  behavior
- tests: two new regression tests covering the real-stdout-fd path and
  the fd-1 fallback path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 18:05:18 -07:00
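The write-all loop mentioned in 5deb815f0b exists because `os.write()` may accept fewer bytes than it was given. A minimal sketch, with `write_all` as a hypothetical helper name and a pipe standing in for the saved stdout descriptor; on Python 3.5 and later, `os.write` already retries on EINTR (PEP 475), so only partial writes need the loop:

```python
import json
import os


def write_all(fd: int, data: bytes) -> None:
    """Keep calling os.write until every byte of `data` has been accepted."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)  # may be less than len(view)
        view = view[written:]


# A pipe stands in for the real stdout descriptor in this demonstration.
read_fd, write_fd = os.pipe()
payload = json.dumps({"systemMessage": "3 memories woven"}).encode("utf-8") + b"\n"
write_all(write_fd, payload)
os.close(write_fd)
received = os.read(read_fd, 4096)
os.close(read_fd)

print(received == payload)
```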
jp 6a3a5c7a3d fix(hooks): write hook JSON to real stdout, bypassing mcp_server redirect
mempalace.mcp_server redirects stdout → stderr at module-level import
(both Python-level and fd-level via os.dup2) to protect the MCP stdio
protocol from ChromaDB's C-level noise. Silent-save imports mcp_server
transitively via _save_diary_direct, so by the time _output() calls
print(), sys.stdout is actually stderr.

Claude Code reads hook output from fd 1. With the redirect in effect,
fd 1 points to fd 2, so our {"systemMessage": "✦ N memories woven..."}
JSON lands on stderr and Claude Code never renders it. The save still
happens, the marker still advances — the user just never sees the
beautiful checkpoint notification in their terminal.

Fix: _output() now writes to _REAL_STDOUT_FD (saved by mcp_server before
the redirect) via os.write(), falling back to sys.stdout only when the
saved fd is unavailable (e.g., hooks_cli imported without mcp_server).

Test: bash hook script 2>/dev/null now shows only the JSON;
2>&1 >/dev/null shows only the Diary entry log line — clean separation
restored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:05:18 -07:00
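The redirect problem in 6a3a5c7a3d can be reproduced in isolation by running a child process that saves fd 1, points fd 1 at fd 2, and then writes to the saved descriptor. This is a sketch of the mechanism only: `real_stdout_fd` is a local stand-in for `_REAL_STDOUT_FD`, and the child script is not the project's code.

```python
import subprocess
import sys

CHILD = r'''
import os, sys
real_stdout_fd = os.dup(1)   # save the original stdout before redirecting
os.dup2(2, 1)                # fd 1 now points at stderr
print("noise")               # goes through fd 1, so it lands on stderr
sys.stdout.flush()
os.write(real_stdout_fd, b'{"systemMessage": "saved"}\n')
'''

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,
    text=True,
)

print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())
```

The parent sees only the JSON on stdout and only the noise on stderr, which is the separation the commit's manual `2>/dev/null` check verifies.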
jp 914945637c fix(hooks): honor silent_save when stop_hook_active is set
Claude Code 2.1.114 passes stop_hook_active:true on every Stop fire
after the first in a session (plugin-dispatched hooks in particular).
The legacy guard at line 426 was written for block-mode, where a
re-fire with the flag set meant "you already blocked, don't block
again" — correct loop prevention when the hook returns
{"decision":"block"}.

Silent-save mode (default since #673) never blocks — it saves
directly and returns. The flag is meaningless there, so the old
guard was suppressing every auto-save after the first one in a
Claude Code session. Symptom: terminal never shows the "✦ N
memories woven" notification again, hook.log stays silent, save
marker stuck.

Fix: only skip on stop_hook_active when block mode is configured.
Silent mode runs through as normal — the save is deterministic and
idempotent, no loop risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:05:18 -07:00
jp 0c38deaab5 feat(backends): quarantine_stale_hnsw — recover from HNSW/sqlite drift
Add a helper that renames HNSW segment directories whose
`data_level0.bin` is significantly older than `chroma.sqlite3`. Drift
between the on-disk HNSW graph and the live embeddings table is the
root cause of a segfault class where the Rust graph-walk dereferences
dangling neighbor pointers for entries in the metadata segment that no
longer exist in the HNSW index, crashing in a background thread on
`count()` or `query()`.

Issue #823 describes the same drift as a silent-staleness symptom
(semantic search returns stale results after `add_drawer` because
`data_level0.bin` lags the sqlite metadata under the default
`sync_threshold=1000`). Under heavier load or after an interrupted
write, the same drift can escalate from "silent stale results" to
"SIGSEGV on next open," which is the failure mode observed at
neo-cortex-mcp#2 (chromadb 1.5.5, Python 3.12) and acknowledged at
chroma-core/chroma#2594.

On one 135K-drawer palace where `index_metadata.pickle` claimed 137,813
elements against 135,464 rows in sqlite (2,349-entry drift), fresh
Python processes crashed in `col.count()` 17/20 times; after renaming
the segment dir out of the way and letting ChromaDB rebuild lazily, the
same 20-run check went to 0 crashes.

The recovery path #823 suggests (export / recreate / reimport) is heavy
— it re-embeds every drawer. This helper is lighter: rename the segment
dir so ChromaDB reopens without it, and the indexer rebuilds lazily on
the next write. The original directory is renamed (not deleted) so the
operator can recover if the heuristic misfires.

If `chroma.sqlite3` is more than `stale_seconds` (default 3600) newer
than the segment's `data_level0.bin`, the segment is considered
suspect. One hour is deliberately conservative — normal HNSW flush
cadence is seconds to minutes, so an hour of drift implies a crashed
mid-write, not routine lag.

- Additive: exposes `quarantine_stale_hnsw(palace_path, stale_seconds)`
  as a helper. Not wired into `_client()` / startup on this PR — the
  goal is to land the primitive first so operators and higher layers
  can opt in. A follow-up could call it automatically on palace open
  behind an env var or config flag.
- Closes #823 by giving operators a first-class recovery path without
  having to install `chromadb-ops` or re-mine.

Four new tests in `tests/test_backends.py`:
- renames drifted segment, preserves original files under `.drift-TS` suffix
- leaves fresh segments alone
- no-op on missing palace path / missing `chroma.sqlite3`
- skips already-quarantined (`.drift-` suffixed) directories

`pytest tests/test_backends.py` → 11 passed. `ruff check` / `ruff format
--check` — clean.
2026-04-18 18:04:05 -07:00
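The mtime heuristic described in 0c38deaab5 can be sketched as follows. This is an illustration under assumptions, not the helper in `mempalace/backends`: the function name, the flat segment-directory layout, and the exact timestamp format of the `.drift-` suffix are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path


def quarantine_stale_hnsw_sketch(palace_path, stale_seconds=3600.0):
    """Rename segment dirs whose data_level0.bin lags chroma.sqlite3."""
    palace = Path(palace_path)
    db = palace / "chroma.sqlite3"
    if not db.is_file():
        return []  # missing palace path or missing database: no-op
    db_mtime = db.stat().st_mtime
    moved = []
    for seg in sorted(palace.iterdir()):
        if not seg.is_dir() or ".drift-" in seg.name:
            continue  # skip files and already-quarantined segments
        hnsw = seg / "data_level0.bin"
        if not hnsw.is_file():
            continue
        if db_mtime - hnsw.stat().st_mtime > stale_seconds:
            stamp = time.strftime("%Y%m%d-%H%M%S")
            target = seg.with_name(f"{seg.name}.drift-{stamp}")
            seg.rename(target)  # renamed, not deleted, so it can be restored
            moved.append(target)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    palace = Path(tmp)
    db = palace / "chroma.sqlite3"
    db.write_bytes(b"")
    now = time.time()
    for name, age in (("seg-stale", 7200), ("seg-fresh", 10)):
        seg = palace / name
        seg.mkdir()
        hnsw = seg / "data_level0.bin"
        hnsw.write_bytes(b"")
        os.utime(hnsw, (now - age, now - age))
    os.utime(db, (now, now))
    moved = [p.name for p in quarantine_stale_hnsw_sketch(palace)]
    remaining = sorted(p.name for p in palace.iterdir() if p.is_dir())

print(moved, remaining)
```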
Igor Lins e Silva 109d7f267c Merge pull request #990 from MemPalace/docs/rfc-source-adapter-plugin-spec
docs: RFC 002 — Source adapter plugin specification
2026-04-18 19:12:20 -03:00
Igor Lins e Silva 66090b2bcb Merge pull request #1014 from MemPalace/refactor/rfc-002-sources-scaffolding
refactor(sources): RFC 002 §9 scaffolding — BaseSourceAdapter, registry, PalaceContext
2026-04-18 18:44:52 -03:00
Igor Lins e Silva 8a130fc509 Merge pull request #1012 from MemPalace/docs/use-real-claude-projects-path-996
docs: use real ~/.claude/projects/ path in first-run help and README (#996)
2026-04-18 18:43:53 -03:00
Igor Lins e Silva 7af3bfae8f Merge pull request #1010 from MemPalace/fix/chromadb-1-5-4-py-3-13-compat-via-581
fix: upgrade chromadb to >=1.5.4 for Python 3.13/3.14 compatibility + fix 1.5.x queue-stall (closes #1006)
2026-04-18 18:43:34 -03:00
Ben Sigman 64266695b5 Merge pull request #1013 from MemPalace/fix/layer3-search-raw-none-guard-1011
fix: guard Layer3.search_raw against None doc/meta from ChromaDB (#1011)
2026-04-18 13:41:55 -07:00
Ben Sigman 1b89b49b78 Merge pull request #999 from jphein/fix/searcher-none-metadata
fix(searcher): guard against None metadata in CLI print path
2026-04-18 13:41:52 -07:00
bensig 49e9e04a12 fix: guard Layer3.search_raw against None doc/meta from ChromaDB (#1011)
Same class of bug as #1007: ChromaDB's query() can return None in the
documents and metadatas arrays when a drawer's HNSW vector entry exists
but its metadata/document rows haven't been materialized. The code in
Layer3.search_raw (mempalace/layers.py) calls meta.get("wing", ...),
meta.get("room", ...), meta.get("source_file", ...) directly without
null safety, so it raises:

  AttributeError: 'NoneType' object has no attribute 'get'

Two-line defensive coercion matching the pattern in #1009 /
PR #999 for searcher.py: meta = meta or {}, doc = doc or "".
The hit still appears with its real distance; source/wing/room
fall back to their default values where the metadata row is missing.

Frequently hit on chromadb 1.5.x (root cause #1006). Even after the
chromadb floor lands (#1010), partial-state results remain possible
during interrupted mines and schema upgrade boundaries, so the guard
is worth having on its own.

Fixes #1011.
2026-04-18 13:30:57 -07:00
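The two-line coercion from 49e9e04a12 can be seen in a small hit-assembly loop. This is a sketch only: `build_hits` is a hypothetical function, and the `"unknown"` and `"?"` defaults are placeholders, since the commit does not state the fallback values used in `Layer3.search_raw`.

```python
def build_hits(documents, metadatas, distances):
    """Assemble hit dicts while tolerating None documents and metadata."""
    hits = []
    for doc, meta, dist in zip(documents, metadatas, distances):
        meta = meta or {}  # ChromaDB may return None for a missing metadata row
        doc = doc or ""    # and None for a missing document row
        hits.append({
            "text": doc,
            "wing": meta.get("wing", "unknown"),
            "room": meta.get("room", "unknown"),
            "source_file": meta.get("source_file", "?"),
            "distance": dist,  # the real distance is kept even for partial rows
        })
    return hits


hits = build_hits(
    documents=["notes on hooks", None],
    metadatas=[{"wing": "code", "room": "hooks", "source_file": "a.md"}, None],
    distances=[0.12, 0.34],
)
print(hits[1])
```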
bensig a2da0d6ef4 docs: use real ~/.claude/projects/ path in first-run help and README (#996)
The CLI help text and README told first-time users to mine from ~/chats/,
a path that doesn't exist on any machine. Real location where Claude
Code writes session JSONL is ~/.claude/projects/<escaped-project-path>/.

Updates three user-visible strings:
- mempalace/cli.py line 7 ("Two ways to ingest" block)
- mempalace/cli.py line 25 (Examples block)
- README.md line 58 (Quickstart)

Website guides (website/guide/mining.md, getting-started.md) still
reference ~/chats/ for ChatGPT/Slack export scenarios where that remains
a valid placeholder. Those can be a separate PR if the maintainers want
to tilt the website examples toward Claude Code specifically.

Fixes #996.
2026-04-18 13:30:50 -07:00
Igor Lins e Silva 89904ed03f fix(sources): address Copilot review on #1014
Five findings from the automated review, fixed with targeted tests where
behavior changed:

1. Transformation Protocol (transforms.py). The registry mixed a bytes-to-str
   transform (utf8_replace_invalid) with str-to-str transforms under a single
   Callable[..., str] type, misleading static type checkers and adapter
   authors. Introduced a Transformation Protocol with __call__(data: bytes|str)
   -> str and retyped the registry + get_transformation return.

2. Drawer-id collision risk (context.py). Switched _build_drawer_id from
   sha1[:16]=64 bits to sha256[:24]=96 bits. 64 bits sits uncomfortably
   close to the birthday bound for palace-sized corpora; 96 bits keeps the
   collision probability negligible while preserving the existing
   <prefix>_<chunk> layout adapters rely on.

3. Fresh-schema KG columns (knowledge_graph.py). source_drawer_id and
   adapter_name now live in the canonical CREATE TABLE so new palaces don't
   take an ALTER round-trip on first open. _migrate_schema stays for legacy
   palaces (SQLite has no ADD COLUMN IF NOT EXISTS, so PRAGMA introspection
   is still needed there).

4. Identity-shim comment (transforms.py). Comment said the adapter-specific
   transforms "raise if invoked without adapter context" but they return
   the input unchanged. Updated the comment to match the actual identity-
   shim behavior Copilot suggested.

5. Test docstring (test_sources.py). Comment mentioned default_factory=list
   but SourceRef.options uses default_factory=dict. Corrected.

Tests: 1020 passed (up from 1018), +2 new tests for the sha256 id shape
and the fresh-schema column presence on new palaces.
2026-04-18 17:17:50 -03:00
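Item 2 of 89904ed03f trades a 64-bit truncated digest for a 96-bit one. The sketch below shows the truncation and the standard birthday approximation behind that choice. The names `build_drawer_id` and `birthday_collision_probability`, the hashed input, and the exact id layout are assumptions for illustration; the commit only states that the `<prefix>_<chunk>` layout is preserved.

```python
import hashlib


def build_drawer_id(prefix: str, source_key: str, chunk_index: int) -> str:
    """Hypothetical id builder: 24 hex characters = 96 bits of SHA-256."""
    material = f"{source_key}:{chunk_index}".encode("utf-8")
    digest = hashlib.sha256(material).hexdigest()[:24]
    return f"{prefix}_{digest}"


def birthday_collision_probability(n_items: int, bits: int) -> float:
    """Approximate P(any collision) as n^2 / 2^(bits+1); valid while small."""
    return n_items * n_items / 2 ** (bits + 1)


drawer_id = build_drawer_id("drawer", "notes/todo.md", 3)
p64 = birthday_collision_probability(10**6, 64)
p96 = birthday_collision_probability(10**6, 96)

print(drawer_id)
print(p64 / p96)  # each extra 32 bits lowers the estimate by a factor of 2**32
```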
jp 3f0cfd5ed4 fix(mcp): guard tool_status/list_wings/list_rooms/get_taxonomy against None metadata
Four more MCP handlers iterate a metadata list and call m.get(...)
unconditionally. When the cache contains a None entry (drawers with no
metadata, common on older mining paths), the try block catches the
AttributeError and marks the response "partial: true" with an
error message — visible as {"error": "'NoneType' object has no
attribute 'get'", "partial": true} returned from mempalace_status even
though the palace data is otherwise fetchable.

Same m = m or {} guard we applied to searcher.py (d3a2d22, a51c3c2)
and miner.status() (66f08a1). None-metadata drawers now roll up under
the existing "unknown" fallback bucket instead of poisoning the
response with a misleading partial flag.

Regression test: mock the metadata cache with a None in the middle,
assert tool_status returns clean counts and no error/partial fields.
Verified the test fails without the guard.

998 tests pass.
2026-04-18 12:38:23 -07:00
Legion345 fa7fe1d51f chromadb at <2 to guard against breaking changes in future major versions 2026-04-18 12:05:46 -07:00
Legion345 d0c8ecd847 fix: upgrade chromadb to >=1.5.4 for python 3.13/3.14 compatibility 2026-04-18 12:05:46 -07:00
Igor Lins e Silva 552e9927b7 refactor(sources): RFC 002 §9 scaffolding — BaseSourceAdapter, registry, PalaceContext
Lands the read-side contract so third-party adapter authors (@Perseusxrltd,
@JakobSachs, @adv3nt3, @zendesk-thittesdorf, @mfhens, @roip, @MrDys) have a
stable target matching what RFC 001 §10 landed on the write side in #995.

Scope (this PR):

- mempalace/sources/base.py: BaseSourceAdapter ABC with kwargs-only
  ingest() / describe_schema() and default is_current() / source_summary()
  / close() (§1.1–1.2). Typed records: SourceRef, SourceItemMetadata,
  DrawerRecord, RouteHint, SourceSummary, AdapterSchema, FieldSpec (§1.3,
  §5.2). Error classes: SourceNotFoundError, AuthRequiredError,
  AdapterClosedError, TransformationViolationError, SchemaConformanceError
  (§2.7). Class-level identity contract: name / adapter_version /
  capabilities / supported_modes / declared_transformations /
  default_privacy_class (§2.1, §1.4, §1.5, §6).

- mempalace/sources/transforms.py: reference implementations of the 13
  reserved transformations (§1.4) — utf8_replace_invalid, newline_normalize,
  whitespace_trim, whitespace_collapse_internal, line_trim, line_join_spaces,
  blank_line_drop — as pure functions, plus identity shims for the six
  adapter-specific ones (strip_tool_chrome, tool_result_truncate,
  tool_result_omitted, spellcheck_user, synthesized_marker,
  speaker_role_assignment) that the conversations adapter will override
  when migrated. get_transformation(name) resolves by reserved name.

- mempalace/sources/registry.py: entry-point discovery via
  importlib.metadata.entry_points(group="mempalace.sources") + explicit
  register()/unregister() surface (§3.1–3.2). resolve_adapter_for_source()
  implements the §3.3 priority order; crucially, no auto-detection on the
  read side (§3.3 is explicit about that — user intent never inferred from
  on-disk artifacts).

- mempalace/sources/context.py: PalaceContext facade (§9) bundling the
  drawer/closet collections, knowledge graph, palace path, adapter identity,
  and progress hooks core passes into adapter.ingest(). upsert_drawer()
  applies the spec-mandated adapter_name/adapter_version stamps from §5.1.
  skip_current_item() signals laziness; emit() dispatches to hooks and
  swallows hook exceptions.

- mempalace/knowledge_graph.py: add_triple() gains optional source_drawer_id
  and adapter_name kwargs (§5.5). Backwards-compatible column migration
  auto-adds the new columns on open of a pre-RFC 002 palace (PRAGMA
  table_info then ALTER TABLE ADD COLUMN), matching the pattern used for
  any new palace-side provenance fields.

- pyproject.toml: mempalace.sources entry-point group declared. Empty on
  the first-party side for now — miners migrate in a follow-up; the group
  being present means third-party packages can begin registering today.
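
The registry bullet above combines entry-point discovery with an explicit register/unregister surface. A minimal sketch of that combination follows. `AdapterRegistry`, `discover_entry_points`, and the rule that explicit registrations win over discovered ones are assumptions for illustration; the actual §3.3 priority order is not reproduced here.

```python
from importlib.metadata import entry_points

GROUP = "mempalace.sources"


def discover_entry_points(group: str = GROUP) -> dict:
    """Map entry-point names to EntryPoint objects without importing them."""
    try:
        eps = entry_points(group=group)       # Python 3.10+
    except TypeError:
        eps = entry_points().get(group, [])   # Python 3.8 / 3.9 dict API
    return {ep.name: ep for ep in eps}


class AdapterRegistry:
    """Explicit registrations, with installed entry points as a fallback."""

    def __init__(self):
        self._explicit = {}

    def register(self, name, adapter_cls):
        self._explicit[name] = adapter_cls

    def unregister(self, name):
        self._explicit.pop(name, None)

    def get(self, name):
        if name in self._explicit:
            return self._explicit[name]
        ep = discover_entry_points().get(name)
        if ep is None:
            raise KeyError(name)
        return ep.load()  # import the adapter only when it is requested


class DemoAdapter:  # stand-in for a BaseSourceAdapter subclass
    name = "demo-sketch"


registry = AdapterRegistry()
registry.register(DemoAdapter.name, DemoAdapter)
resolved = registry.get("demo-sketch")
registry.unregister("demo-sketch")
try:
    registry.get("demo-sketch")
    still_registered = True
except KeyError:
    still_registered = False
```

A third-party package would expose its adapter by declaring an entry point in the `mempalace.sources` group in its own packaging metadata.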

Out of scope (explicit follow-ups):

- miner.py → mempalace/sources/filesystem.py. Behavior-preserving rename
  that also moves READABLE_EXTENSIONS, detect_room(), detect_hall() into
  the adapter (§9). Larger refactor; lands separately.
- convo_miner.py + normalize.py → mempalace/sources/conversations.py. The
  format-detection if-chain in normalize.py becomes per-format plugins;
  declared_transformations enumerates what the current pipeline already
  does to source bytes (§1.4 existing-code mapping).
- Closet post-step wired into the conversations adapter (§1.7).
- CLI --source flag + --mode deprecation alias (§3.3).
- MCP mempalace_mine tool source parameter.
- AbstractSourceAdapterContractSuite (§7.1–7.3): byte-preservation round-
  trip and declared-transformation round-trip tests.
- Privacy-class floor enforcement (§6.2); depends on #389 for
  secrets_possible scanning.

Tests: 1018 passed (up from ~990 on develop), +27 targeted tests covering
the ABC instantiation rules, typed records, all reserved transformations,
the registry register/get/unregister surface, PalaceContext upsert + skip +
emit semantics, and both the new KG provenance kwargs and backwards-
compatible legacy-schema migration.

Refs: #989 (RFC 002 tracking), #990 (RFC 002 spec), #995 (RFC 001 §10
cleanup — sibling PR on the write side).
2026-04-18 16:05:32 -03:00
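The backwards-compatible column migration described for `knowledge_graph.py` in 552e9927b7 follows a common SQLite pattern: inspect `PRAGMA table_info`, then `ALTER TABLE ADD COLUMN` only for what is missing. A minimal sketch, assuming a hypothetical `triples` table and `TEXT` column types; the real schema is not shown in the log.

```python
import sqlite3

# Column names come from the commit; the TEXT type is an assumption.
NEW_COLUMNS = {"source_drawer_id": "TEXT", "adapter_name": "TEXT"}


def migrate_schema(conn: sqlite3.Connection, table: str = "triples") -> list:
    """Add any missing provenance columns; safe to call repeatedly."""
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for column, sql_type in NEW_COLUMNS.items():
        if column not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
            added.append(column)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
first_run = migrate_schema(conn)    # legacy schema: both columns are added
second_run = migrate_schema(conn)   # already migrated: nothing to do
columns = [row[1] for row in conn.execute("PRAGMA table_info(triples)")]
conn.close()

print(first_run, second_run, columns)
```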
Igor Lins e Silva 2b9f17c401 Merge pull request #995 from MemPalace/refactor/rfc-001-cleanup
refactor(backends): RFC 001 §10 cleanup — typed results, PalaceRef, registry
2026-04-18 15:56:12 -03:00
jp 7690574dde fix(searcher): guard API path + closet loop against None metadata too
Per Copilot review on the CLI-only PR (#999): search_memories() has the
same vulnerability in two additional spots, since ChromaDB can return
None entries in the inner metadatas list for either the drawer query or
the closets query. Without guards, the API path crashes with:

    AttributeError: 'NoneType' object has no attribute 'get'

at either `cmeta.get("source_file", "")` in the closet boost lookup or
`meta.get("source_file", "") or ""` in the drawer scoring loop.

Applies the matching `meta = meta or {}` / `cmeta = cmeta or {}`
guard at both sites and adds an API-path regression test that mocks a
drawer query result with a None metadata entry and asserts both hits
render, the None-metadata hit carrying the existing `"unknown"` sentinel
values the scoring loop already writes for missing keys.

Verified both the new API test and the existing CLI test fail without
the guards (AttributeError) and pass with them.
2026-04-18 10:37:05 -07:00
jp feba7e8043 fix(miner): same None-metadata guard for status() histogram loop
`status()` walks `col.get(include=["metadatas"])` and buckets each drawer
into a `wing_rooms[wing][room]` histogram. The same ChromaDB return shape
fixed in the search print path — `None` entries in the `metadatas` list
for drawers with no stored metadata — crashes the status command with:

    AttributeError: 'NoneType' object has no attribute 'get'

Applies the matching ``m = m or {}`` guard so None-metadata drawers roll
up under the existing `?/?` fallback bucket instead of killing the
command mid-tally. Reproduced on a 135K-drawer palace where two drawers
had `metadata=None`; both now show under `WING: ? / ROOM: ?` in the
tally while the command prints the full histogram as designed.

Adds a regression test that feeds `status()` a fake collection whose
`get()` returns a `None` in the middle of the metadatas list and asserts
both the fallback bucket and the real wing render.
2026-04-18 10:26:11 -07:00
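The histogram loop fixed in feba7e8043 can be sketched with plain dictionaries. `wing_room_histogram` is a hypothetical stand-in for the tally inside `status()`; the `"?"` fallback keys follow the `?/?` bucket named in the commit.

```python
from collections import defaultdict


def wing_room_histogram(metadatas):
    """Count drawers per wing and room, bucketing None metadata under ?/?."""
    wing_rooms = defaultdict(lambda: defaultdict(int))
    for m in metadatas:
        m = m or {}  # a None entry would otherwise raise AttributeError on .get
        wing_rooms[m.get("wing", "?")][m.get("room", "?")] += 1
    return {wing: dict(rooms) for wing, rooms in wing_rooms.items()}


histogram = wing_room_histogram([
    {"wing": "code", "room": "hooks"},
    None,  # drawer stored without metadata
    {"wing": "code", "room": "hooks"},
])
print(histogram)
```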
jp a3c778210b fix(searcher): guard against None metadata in CLI print path
`col.query(...)` can return `None` entries in the inner ``metadatas`` list
for drawers whose metadata was never set (older palaces, rows written
outside the normal mining path). The CLI `search()` function would render
earlier results successfully and then crash mid-loop with:

    AttributeError: 'NoneType' object has no attribute 'get'

at ``searcher.py:286`` — ``meta.get("source_file", "?")``. The user sees
partial output followed by a traceback, with no indication of which
drawers rendered OK and which were skipped.

Guard with ``meta = meta or {}`` inside the loop so entries with missing
metadata fall back to the existing ``"?"`` defaults instead of crashing,
matching the hit dict assembly in ``search_memories()`` which already
uses ``meta.get("wing", "unknown")`` etc. against the same data.

Adds a regression test that mocks a ChromaDB result with a ``None``
metadata entry in the middle of the inner list and asserts both result
blocks render to stdout.
2026-04-18 10:00:59 -07:00
mvalentsev 5189e0d652 test(i18n): add entity section smoke tests and schema invariants 2026-04-18 21:58:11 +05:00
mvalentsev 118cbe40bd feat(i18n): add entity detection to French locale 2026-04-18 21:56:45 +05:00
mvalentsev e17f219be8 feat(i18n): add entity detection to Spanish locale 2026-04-18 21:54:39 +05:00
Igor Lins e Silva efaa39bea9 test(backends): dedup update-length-validation tests
24bf97b (network-download fix) and my earlier Copilot-review commit both
added tests for the same ValueError. Keep the broader one that covers
both 'documents length' and 'metadatas length' mismatches; drop the
narrower duplicate.
2026-04-18 13:53:46 -03:00
mvalentsev 7006a6b42d feat(i18n): add entity detection to German locale 2026-04-18 21:53:11 +05:00
Igor Lins e Silva 61dd6e7d9c test(backends): fix Windows file-lock in cache-invalidation test
PermissionError [WinError 32] on Windows when Path.unlink() runs while
chromadb.PersistentClient still holds a handle on chroma.sqlite3. Rewrite
test_chroma_cache_invalidates_when_db_file_missing to prime
backend._clients/_freshness with a sentinel object instead of opening a
real PersistentClient, so the unlink runs against an unheld file.

The assertion is also corrected: after invalidation, ChromaBackend's
_client rebuilds a fresh PersistentClient which re-creates chroma.sqlite3
and re-stats it, so freshness ends up at the post-rebuild stat (not
(0, 0.0) as the assertion previously expected). The meaningful invariant
is "freshness advanced past the pre-unlink value AND the sentinel was
replaced", which the test now checks.

Ref: Windows CI failure on #995.
2026-04-18 13:52:56 -03:00
Igor Lins e Silva 74a31b70d3 Merge pull request #998 from MemPalace/fix/silent-transcript-drop
Fix silent transcript drop: .jsonl ingestion + 500 MB cap + tandem sweeper
2026-04-18 13:38:02 -03:00
copilot-swe-agent[bot] 24bf97bb65 fix(tests): avoid ONNX network download in update-length validation tests
test_base_collection_update_default_validates_list_lengths and
test_base_collection_update_default_rejects_mismatched_lengths were
spinning up a real ChromaBackend and calling add(documents=...), which
triggered ChromaDB's default ONNX embedding function and attempted a
network download — failing in offline/sandboxed CI.

BaseCollection.update() validates list lengths before any DB access, so
no items need to be pre-loaded for the length-check to fire. Switch both
tests to use _FakeCollection (same as the rest of the unit tests in this
file) so they are pure in-memory and network-free.

Also fixes a structural bug in test 1: collection._collection.add() was
accidentally placed inside the pytest.raises(ValueError) block, masking
the real assertion.

Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/55fc663e-b256-4b8b-88ce-4271560def8d

Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>
2026-04-18 16:23:58 +00:00
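The point made in 24bf97bb65, that length validation fires before any storage access, can be shown with an in-memory fake. This is a sketch under assumptions: `validate_update_lengths` and this `FakeCollection` are hypothetical, and the error wording only echoes the 'documents length' and 'metadatas length' phrases quoted in efaa39bea9.

```python
def validate_update_lengths(ids, documents=None, metadatas=None):
    """Raise ValueError when an optional list does not match len(ids)."""
    if documents is not None and len(documents) != len(ids):
        raise ValueError(
            f"documents length {len(documents)} does not match ids length {len(ids)}"
        )
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError(
            f"metadatas length {len(metadatas)} does not match ids length {len(ids)}"
        )


class FakeCollection:
    """In-memory stand-in: counts writes, needs no embedding model or network."""

    def __init__(self):
        self.writes = 0

    def update(self, ids, documents=None, metadatas=None):
        validate_update_lengths(ids, documents, metadatas)  # before any storage call
        self.writes += 1


collection = FakeCollection()
try:
    collection.update(ids=["a", "b"], documents=["only one"])
    error = ""
except ValueError as exc:
    error = str(exc)

print(error, collection.writes)
```

Keeping only the call under test inside the expected-exception block avoids the masking problem the commit describes, where a setup call raised inside the same block.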