321 Commits

Author SHA1 Message Date
grtninja 154e8a78ec fix: implement MCP ping health checks (#600) 2026-04-11 16:16:37 -07:00
Arnold Wender c4d8662de8 fix: correct token count estimate in compress summary (#609) 2026-04-11 16:16:34 -07:00
Arnold Wender 89c0a58271 fix: align cmd_compress dict keys with compression_stats() return values (#569)
* fix: align cmd_compress dict keys with compression_stats() return values

* test: align compress test mocks with actual compression_stats() keys

* fix: address review — add Total: assertion, move stats key test to test_dialect.py
2026-04-11 16:16:31 -07:00
Ahmad Othman Ammar Adi. 9c4b7302cc fix: skip unreachable reparse points in detect_rooms_from_folders (#558)
On Windows, projects containing git-submodule junctions or dev-drive
reparse points cause iterdir() to list the entry successfully but
Path.is_dir() to raise OSError when it calls stat() internally.

Reproducer: any Windows project with a submodule checked out as a
junction (e.g. skills/pr-perfect) crashes mempalace init with:
  OSError: [WinError 448] The path cannot be traversed because it
  contains an untrusted mount point

Fix: wrap every is_dir() call in detect_rooms_from_folders with
try/except OSError so the scanner skips inaccessible entries and
continues rather than aborting.

Covers both the top-level pass and the one-level-deep nested pass.
Two new tests mock the OSError on specific paths and verify the
function returns correct rooms from the remaining accessible entries.
2026-04-11 16:16:06 -07:00
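
The guard described in 9c4b7302cc can be sketched roughly as follows. This is a minimal illustration, not the project's code: `safe_is_dir` and `list_subdirs` are hypothetical helper names, and the real `detect_rooms_from_folders` may structure the check differently.

```python
from pathlib import Path


def safe_is_dir(path) -> bool:
    """Return True only if `path` is a directory that can actually be stat'ed."""
    try:
        return path.is_dir()
    except OSError:
        # For example WinError 448 on an untrusted reparse point:
        # treat the entry as inaccessible and let the caller skip it.
        return False


def list_subdirs(root: Path) -> list[str]:
    """Names of the accessible subdirectories directly under `root`."""
    return sorted(entry.name for entry in root.iterdir() if safe_is_dir(entry))
```

The scanner keeps going after an inaccessible entry instead of aborting the whole `init` run, which is the behavior the commit message asks for.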
Ben Sigman ad806cf3f8 Merge branch 'main' into fix/query-sanitizer-prompt-contamination 2026-04-10 22:39:31 -07:00
Ben Sigman 309d9b0095 Merge branch 'main' into fix/issue-339-338-silent-exceptions-pagination 2026-04-10 09:34:46 -07:00
Ben Sigman a9aaa45ccf Merge branch 'main' into fix/issue-347-codex-hook-message-counting 2026-04-10 09:25:58 -07:00
Ben Sigman 0cbbfba8ed Merge branch 'main' into fix/issue-339-338-silent-exceptions-pagination 2026-04-10 09:25:50 -07:00
Ben Sigman 91952044d6 Merge branch 'main' into fix/issue-347-codex-hook-message-counting 2026-04-10 09:23:37 -07:00
Ben Sigman 22454073a6 Merge branch 'main' into fix/issue-339-338-silent-exceptions-pagination 2026-04-10 09:23:01 -07:00
RhettOP 8a6e75eed8 fix: use len(rows) < batch_size early-exit instead of total-count loop bound
- Replace 'while offset < count/total' with 'while True' + break on short batch
- Fixes tool_list_rooms iterating over unfiltered col.count() when wing filter active
- Fixes all 4 paginated functions (tool_status, tool_list_wings, tool_list_rooms,
  tool_get_taxonomy) missing early-exit when batch smaller than batch_size
- Remove unused 'total' variable in tool_list_wings, tool_list_rooms, tool_get_taxonomy
  (replaced col.count() with accessibility check only)

Per bensig review comments on PR #371
2026-04-10 17:15:36 +01:00
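
A sketch of the short-batch early exit from 8a6e75eed8, under the assumption that a page is fetched with an offset and a limit. `fetch_all` and `fetch_page` are hypothetical names; the actual tools read from a ChromaDB collection.

```python
def fetch_all(fetch_page, batch_size=1000):
    """Collect every row by paging until a batch comes back short."""
    collected = []
    offset = 0
    while True:
        rows = fetch_page(offset=offset, limit=batch_size)
        collected.extend(rows)
        # A batch smaller than batch_size means there is nothing left to read,
        # so no separate total count is needed as a loop bound.
        if len(rows) < batch_size:
            break
        offset += len(rows)
    return collected
```

Because the loop never consults a total, a filtered query cannot over-iterate against an unfiltered `col.count()`.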
MSL a868e16eaa fix: purge stale drawers before re-mine to avoid hnswlib segfault (#521)
Delete existing drawers for a file before re-inserting fresh chunks.
Converts re-mines from upsert (hnswlib updatePoint path, thread-unsafe
on macOS ARM + chromadb 0.6.3) into delete+insert (safe addPoint path).

Credit: @StefanKremen (#523)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 09:13:12 -07:00
bensig 60bea83e76 feat: mempalace migrate — recover palaces from different ChromaDB versions
Reads documents and metadata directly from ChromaDB's SQLite (bypassing
the API that fails on version-mismatched databases), then reimports into
a fresh palace using the currently installed ChromaDB.

Fixes the 3.0.0 → 3.1.0 upgrade path where chromadb was downgraded from
1.5.x to 0.6.x, breaking the on-disk storage format.

- Detects chromadb version from SQLite schema (0.6.x vs 1.x)
- Extracts all drawers with full metadata via raw SQL
- Builds fresh palace in temp dir, swaps atomically
- Backs up original palace before any changes
- Supports --dry-run to preview without modifying

Fixes #457
2026-04-10 08:50:40 -07:00
MSL e30c283fd8 style: ruff format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:49:35 -07:00
MSL 8930b45f97 fix: add --wing filter to dedup, document threshold semantics
Addresses community feedback:
- Add --wing flag to scope dedup to a single wing (catches cross-wing
  duplicates when same source mined into multiple wings)
- Document that threshold is cosine distance (not similarity) with
  guidance on values: 0.15 for near-identical, 0.3-0.4 for paraphrased
- Confirmed shutil import is present in repair.py (line 32)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:42:20 -07:00
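
To make the threshold semantics in 8930b45f97 concrete: cosine distance is one minus cosine similarity, so smaller values mean closer matches. The sketch below is illustrative only. `cosine_distance` and `is_duplicate` are hypothetical names, and whether dedup.py compares with `<=` or `<` is an assumption.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm  # assumes neither vector is all zeros


def is_duplicate(a, b, threshold=0.15):
    """Treat two embeddings as duplicates when their distance is within threshold."""
    return cosine_distance(a, b) <= threshold
```

Raising the threshold toward 0.3-0.4 widens the net to paraphrased text, as the commit message notes; lowering it restricts matches to near-identical drawers.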
MSL e641b80448 style: ruff check --fix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:31:56 -07:00
MSL 71e8f2d054 fix: prevent HNSW index bloat from duplicate add() calls (#525)
Root cause: convo_miner.py used collection.add() instead of upsert(),
so repeated mine runs pushed duplicate entries into the HNSW graph.
At scale (50K+ drawers) this causes link_lists.bin to grow to terabytes
and eventually segfault.

Changes:
- convo_miner.py: add() → upsert() (the one-line root cause fix)
- repair.py: new module — scan for corrupt IDs, prune them, or rebuild
  the HNSW index from scratch. Backs up only chroma.sqlite3 (not the
  bloated HNSW files). Recreates collection with hnsw:space=cosine.
- dedup.py: new module — detect and remove near-duplicate drawers from
  the same source file using cosine similarity. No API calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:14:22 -07:00
bensig 2d7d7e080f feat: mempalace migrate — recover palaces from different ChromaDB versions
Reads documents and metadata directly from ChromaDB's SQLite (bypassing
the API that fails on version-mismatched databases), then reimports into
a fresh palace using the currently installed ChromaDB.

Fixes the 3.0.0 → 3.1.0 upgrade path where chromadb was downgraded from
1.5.x to 0.6.x, breaking the on-disk storage format.

- Detects chromadb version from SQLite schema (0.6.x vs 1.x)
- Extracts all drawers with full metadata via raw SQL
- Builds fresh palace in temp dir, swaps atomically
- Backs up original palace before any changes
- Supports --dry-run to preview without modifying

Fixes #457
2026-04-10 00:08:28 -07:00
Ben Sigman a3fec4f565 Merge branch 'main' into fix/issue-339-338-silent-exceptions-pagination 2026-04-09 23:31:45 -07:00
matrix9neonebuchadnezzar2199-sketch f96300bb86 style: fix ruff formatting 2026-04-10 05:02:48 +09:00
Kevin Pulikkottil 2981433535 fix: add mcp command with setup guidance (#315)
* fix: add mcp command with setup guidance

* fix: include --palace guidance in mcp command output

* fix: make mcp guidance commands copy-pastable

---------

Co-authored-by: Milla J <millaj1217@gmail.com>
2026-04-09 11:21:18 -07:00
Milla J 3919f13523 chore: bump version to 3.1.0 (#409)
PyPI release cut covering 39 merged PRs since v3.0.0 on 2026-04-06.
Highlights: Claude/Codex plugin packaging (#270), security hardening (#387),
honest AAAK stats + benchmark corrections (#147), Windows compatibility fixes,
Knowledge Graph WAL mode + batching, 10K limit safety caps, and much more.

See GitHub release notes for full changelog.

Co-authored-by: milla-jovovich <noreply@github.com>
2026-04-09 11:04:24 -07:00
bensig b1adc047e6 fix: address Octocode review — move size check, add tests for all 3 fixes
- Move file size check before try block so IOError propagates cleanly
  (not caught by the except OSError handler below it)
- Wrap os.path.getsize in its own try/except to preserve existing
  test_normalize_io_error behavior on missing files
- Add test_normalize_rejects_large_file (mocked getsize)
- Add test_null_arguments_does_not_hang (#394)
- Add test_cmd_repair_trailing_slash_does_not_recurse (#395)

532 tests pass locally, 0 regressions.
2026-04-09 10:40:53 -07:00
RhettOP df464a991d style: fix ruff formatting in mcp_server.py 2026-04-09 18:26:07 +01:00
bensig 0720fb84f8 fix: MCP null args hang, repair infinite recursion, OOM on large files
Three critical bugfixes:

1. MCP server hangs on null arguments (#394) — `params.get("arguments", {})`
   returns None when JSON has `"arguments": null`. Changed to `or {}`.

2. cmd_repair infinite recursion (#395) — trailing slash on palace_path
   caused backup_path to be inside the source dir. Strip trailing sep.

3. OOM on large transcript files (#396) — split_mega_files.py and
   normalize.py load entire files into memory. Added 500MB safety limit
   with clear skip/error messages.

Closes #394, #395, #396.
2026-04-09 10:05:37 -07:00
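
The first two fixes in 0720fb84f8 are small enough to sketch. The helper names and the `.backup` suffix below are hypothetical, chosen only to show the failure modes; the real code in mcp_server.py and cmd_repair may differ.

```python
import os


def get_arguments(params: dict) -> dict:
    """Return the tool arguments, treating an explicit JSON null as empty."""
    # dict.get's default only applies when the key is missing. With
    # {"arguments": None} it returns None, so fall back with `or {}`.
    return params.get("arguments") or {}


def backup_path_for(palace_path: str) -> str:
    """Build a sibling backup path (hypothetical '.backup' suffix)."""
    # Without stripping, "palace/" + ".backup" would land inside the source dir.
    return palace_path.rstrip(os.sep) + ".backup"
```

For example, `get_arguments({"arguments": None})` yields `{}`, whereas `{"arguments": None}.get("arguments", {})` yields `None`.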
Ben Sigman e293e290d5 Merge branch 'main' into fix/mcp-protocol-version-negotiation 2026-04-09 09:15:06 -07:00
bensig c2308a1e36 fix: address code review — restore mtime check, bound metadata reads, harden security
Review fixes (from Sage's review):
- Restore mtime check in file_already_mined (check_mtime=True for miner)
- Restore limit=10000 on MCP metadata fetches to prevent OOM on large palaces
- Apply _SAFE_NAME_RE regex in sanitize_name (was dead code)
- Drop raw_aaak metadata duplication in diary_write
- chmod 0o700 on WAL dir, 0o600 on WAL file
- Add check_same_thread=False on KnowledgeGraph SQLite connection
- Remove __del__ (unreliable) and dead PRAGMA foreign_keys=ON
2026-04-09 08:52:24 -07:00
bensig 0717caea5c fix: make drawer_id deterministic for idempotent writes
Remove datetime.now() from drawer_id hash so same content + wing + room
always produces the same ID. This enables the idempotency check that
returns "already_exists" on duplicate writes.
2026-04-09 08:26:47 -07:00
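
A minimal sketch of the deterministic ID idea in 0717caea5c, assuming the ID is a SHA-256 digest over wing, room, and content. The function signature, the `drawer_` prefix, the separator, and the 16-character truncation are all illustrative assumptions rather than the project's actual format.

```python
import hashlib


def drawer_id(wing: str, room: str, content: str) -> str:
    """Derive an ID from the inputs alone, with no timestamp mixed in."""
    # A separator that is unlikely to appear in names keeps
    # ("ab", "c") and ("a", "bc") from hashing to the same payload.
    payload = "\x1f".join((wing, room, content)).encode("utf-8")
    return "drawer_" + hashlib.sha256(payload).hexdigest()[:16]
```

Since the same content, wing, and room always map to the same ID, a second write can be detected by looking the ID up before inserting.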
bensig 32297fdae8 fix: remove metadata cache that broke test isolation
The 30s TTL metadata cache returned stale data between test runs and
after write operations. Reverted to direct col.get() reads which match
the original behavior and pass all tests.
2026-04-09 08:22:17 -07:00
bensig 455871a0ef fix: align cache variable names with test fixtures, restore full SKIP_DIRS
- _client → _client_cache to match conftest.py reset fixture
- _get_collection now uses _get_client() return value instead of stale ref
- Restore .pytest_cache and other dirs missing from palace.py SKIP_DIRS
2026-04-09 08:13:32 -07:00
Ben Sigman 725fa2b6f1 Merge branch 'main' into fix/query-sanitizer-prompt-contamination 2026-04-09 08:11:39 -07:00
Ben Sigman 70f2160bd6 Merge branch 'main' into fix/mcp-protocol-version-negotiation 2026-04-09 08:09:57 -07:00
Ben Sigman 0126f750d7 Merge branch 'main' into fix/issue-339-338-silent-exceptions-pagination 2026-04-09 08:09:38 -07:00
bensig 1d19dfc9d5 security: harden inputs, fix shell injection, optimize DB access
- Fix command injection in hook script (pass paths via sys.argv)
- Add sanitize_name/sanitize_content validators in config.py
- Add 10MB file size guard + symlink skip in miners
- Fix SQLite connection leak in knowledge_graph.py (reuse connection)
- Use `with conn:` for proper transaction handling
- Consolidate shared palace operations into palace.py
- Add write-ahead log for audit trail on writes/deletes
- Add metadata cache with 30s TTL for status/taxonomy calls
- Upgrade md5 → sha256 for drawer/triple IDs
- Harden file permissions (0o700/0o600)
- Pin chromadb>=0.5.0,<0.7

Based on PR #252 by @anthonyonazure with lint fixes applied.

Co-Authored-By: anthonyonazure <anthonyonazure@users.noreply.github.com>
2026-04-09 08:06:30 -07:00
matrix9neonebuchadnezzar2199-sketch 7509a72502 fix: mitigate system prompt contamination in search queries (#333)
Addresses Issue #333: AI agents prepending system prompts to search queries
causes embedding retrieval to collapse (89.8% → 1.0% R@10).

Mitigation approach (damage reduction):
- New query_sanitizer.py with 4-stage pipeline:
  Step 1: passthrough for short queries (≤200 chars)
  Step 2: question extraction (finds ? sentences) → ~85-89% recovery
  Step 3: tail sentence extraction → ~80-89% recovery
  Step 4: tail truncation fallback → ~70-80% recovery
  Worst case without sanitizer: 1.0% (catastrophic)
  Worst case with sanitizer: ~70-80% (survivable)

- mcp_server.py: tool_search applies sanitizer before ChromaDB query
- MCP schema: query description warns agents not to include prompts
- New 'context' parameter separates background info from search intent
- Sanitizer metadata included in response when triggered

22 new tests covering all pipeline stages and real-world scenarios.

Made-with: Cursor
2026-04-09 23:28:59 +09:00
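
The four-stage pipeline in 7509a72502 might look roughly like this. It is a simplified sketch under stated assumptions: the function name, the sentence-splitting regex, the stage labels, and the reuse of the 200-character limit for every stage are illustrative, and the real query_sanitizer.py is likely more careful.

```python
import re

MAX_QUERY_CHARS = 200  # passthrough limit taken from the commit message


def sanitize_query(raw: str, max_len: int = MAX_QUERY_CHARS) -> tuple[str, str]:
    """Return (query, stage), where stage names the step that produced it."""
    text = raw.strip()
    # Step 1: short queries pass through untouched.
    if len(text) <= max_len:
        return text, "passthrough"
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Step 2: prefer the last sentence that ends with a question mark.
    questions = [s for s in sentences if s.endswith("?")]
    if questions and len(questions[-1]) <= max_len:
        return questions[-1], "question"
    # Step 3: otherwise fall back to the final sentence.
    if sentences and len(sentences[-1]) <= max_len:
        return sentences[-1], "tail_sentence"
    # Step 4: last resort, keep only the tail of the text.
    return text[-max_len:], "tail_truncation"
```

Returning the stage alongside the query mirrors the commit's note that sanitizer metadata is included in the response when the sanitizer triggers.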
Luna Mira e5440e31af fix: count Codex user_message turns in _count_human_messages (#347)
The _count_human_messages() function previously only handled Claude Code
transcript format: {"message": {"role": "user", "content": "..."}}

Codex CLI transcripts use a different schema:
{"type": "event_msg", "payload": {"type": "user_message", "message": "..."}}

This meant the stop-hook auto-save threshold never triggered for Codex
sessions because the count always returned 0.

Added detection for the Codex format so both Claude Code and Codex CLI
transcripts are counted correctly.
2026-04-09 13:33:45 +01:00
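
The two transcript shapes quoted in e5440e31af can be counted with one pass over the JSONL lines. This is a sketch, not the hook's actual `_count_human_messages`; taking an iterable of lines as input and skipping malformed lines are assumptions.

```python
import json


def count_human_messages(lines) -> int:
    """Count user turns in Claude Code and Codex CLI transcript lines."""
    count = 0
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if not isinstance(entry, dict):
            continue
        # Claude Code: {"message": {"role": "user", "content": "..."}}
        message = entry.get("message")
        if isinstance(message, dict) and message.get("role") == "user":
            count += 1
            continue
        # Codex CLI: {"type": "event_msg", "payload": {"type": "user_message", ...}}
        payload = entry.get("payload")
        if (
            entry.get("type") == "event_msg"
            and isinstance(payload, dict)
            and payload.get("type") == "user_message"
        ):
            count += 1
    return count
```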
Openclaw d20c8ab992 fix: paginate large collection reads and surface errors in MCP tools (#339, #338) 2026-04-09 13:33:45 +01:00
Tal Muskal 667d895fb9 Merge branch 'main' into main 2026-04-09 08:17:39 +03:00
virgil-at-biocompute 950d52baf2 fix: negotiate MCP protocol version instead of hardcoding
The initialize handler hardcoded protocolVersion "2024-11-05", which
causes newer MCP clients (e.g. Claude Code) to reject the connection
when they negotiate "2025-11-25" or later.

Echo the client's requested version if it is in the supported set,
otherwise fall back to the latest supported version. This keeps
backwards compatibility with older clients while allowing newer ones
to connect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:57:32 -04:00
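
The negotiation rule in 950d52baf2 fits in a few lines. The supported set below contains only the two versions named in the commit message and is an assumption; the server's real list may be longer.

```python
# Ordered oldest to newest. Illustrative set, not the server's actual list.
SUPPORTED_VERSIONS = ["2024-11-05", "2025-11-25"]


def negotiate_version(requested):
    """Echo the client's version if supported, else offer the latest we support."""
    if requested in SUPPORTED_VERSIONS:
        return requested
    return SUPPORTED_VERSIONS[-1]
```

An older client asking for "2024-11-05" gets that version echoed back, while a client asking for an unknown version is offered the newest supported one and can decide whether to proceed.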
github-actions[bot] 26835e30ef chore: bump version to 3.0.14 2026-04-08 23:54:30 +00:00
Ben Sigman a5d9baf589 Merge pull request #165 from adv3nt3/fix/miner-room-stats
fix: use actual detected room in mine summary stats
2026-04-08 16:54:22 -07:00
Ben Sigman d26606b2f9 Merge branch 'main' into main 2026-04-08 14:07:33 -07:00
github-actions[bot] b370e86f96 chore: bump version to 3.0.13 2026-04-08 20:56:15 +00:00
Tal Muskal dbf456b73b Merge branch 'main' into main 2026-04-08 22:02:50 +03:00
github-actions[bot] cef5994ea6 chore: bump version to 3.0.12 2026-04-08 18:58:39 +00:00
Tal Muskal 4ce0d8491e fix: pin ruff <0.5 in CI to match local formatting, reset version to 3.0.11
CI was installing latest ruff (0.15.x) which has different formatting
rules than our local 0.4.x. Pin to ruff>=0.4.0,<0.5 for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 21:42:02 +03:00
github-actions[bot] ea32c2e659 chore: bump version to 3.0.12 2026-04-08 18:38:38 +00:00
Tal Muskal abd52534bb test: bring coverage to 85%, set threshold to 85, reset version to 3.0.11
- Add tests for config, convo_miner, spellcheck, knowledge_graph
- Fix Windows PermissionError in test cleanup (chromadb file locks)
- Add UTF-8 encoding to split_mega_files, entity_registry, hooks_cli
- Fix mcp_server parse_known_args logging for unknown args
- Set coverage threshold to 85 in pyproject.toml and CI
- Reset all version files to 3.0.11

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 21:38:12 +03:00
github-actions[bot] f47ca8a122 chore: bump version to 3.0.15 2026-04-08 18:23:32 +00:00
Igor Lins e Silva edf8f36099 fix: use parse_known_args to allow importing mcp_server during pytest collection 2026-04-08 15:18:40 -03:00
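
Why `parse_known_args` helps in edf8f36099: `parse_args` exits on unrecognized flags, so parsing `sys.argv` at import time fails when pytest's own arguments are present. A sketch, assuming a `--palace` option on the server (the flag is mentioned elsewhere in this log, but its presence in mcp_server.py is an assumption, and `parse_cli` is a hypothetical name):

```python
import argparse


def parse_cli(argv):
    """Parse the flags we know and hand back everything else untouched."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--palace", default=None)
    # Unlike parse_args, parse_known_args does not exit on unknown arguments;
    # it returns them as a second value.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown
```

Called with pytest-style arguments such as `["--palace", "/tmp/p", "-q", "tests/"]`, the known flag is parsed and the rest is returned as leftovers instead of raising `SystemExit`.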