Commit Graph

186 Commits

Author SHA1 Message Date
Ben Sigman a9aaa45ccf Merge branch 'main' into fix/issue-347-codex-hook-message-counting 2026-04-10 09:25:58 -07:00
Ben Sigman 2e8a5a7b7a Merge pull request #544 from milla-jovovich/fix/525-hnsw-bloat-dedup
fix: prevent HNSW index bloat from duplicate add() calls (#525)
2026-04-10 09:25:43 -07:00
Ben Sigman 91952044d6 Merge branch 'main' into fix/issue-347-codex-hook-message-counting 2026-04-10 09:23:37 -07:00
Ben Sigman d0c9f9b0c1 Merge branch 'main' into fix/525-hnsw-bloat-dedup 2026-04-10 09:22:36 -07:00
MSL a868e16eaa fix: purge stale drawers before re-mine to avoid hnswlib segfault (#521)
Delete existing drawers for a file before re-inserting fresh chunks.
Converts re-mines from upsert (hnswlib updatePoint path, thread-unsafe
on macOS ARM + chromadb 0.6.3) into delete+insert (safe addPoint path).

Credit: @StefanKremen (#523)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 09:13:12 -07:00
bensig 60bea83e76 feat: mempalace migrate — recover palaces from different ChromaDB versions
Reads documents and metadata directly from ChromaDB's SQLite (bypassing
the API that fails on version-mismatched databases), then reimports into
a fresh palace using the currently installed ChromaDB.

Fixes the 3.0.0 → 3.1.0 upgrade path where chromadb was downgraded from
1.5.x to 0.6.x, breaking the on-disk storage format.

- Detects chromadb version from SQLite schema (0.6.x vs 1.x)
- Extracts all drawers with full metadata via raw SQL
- Builds fresh palace in temp dir, swaps atomically
- Backs up original palace before any changes
- Supports --dry-run to preview without modifying

Fixes #457
2026-04-10 08:50:40 -07:00
bensig afa30a9cca chore: improve agent readiness — AGENTS.md, dependabot, CODEOWNERS, labels
- Add AGENTS.md with build commands, project structure, conventions
- Add .github/dependabot.yml for automated pip + actions updates
- Add .github/CODEOWNERS for review routing
- Expand .gitignore (.env, .DS_Store, IDE configs, coverage, venvs)
- Add C901 complexity rule to ruff (max-complexity=25, benchmarks excluded)
- Add --durations=10 to pytest CI for test performance tracking
- Add docs/schema.sql for knowledge graph schema documentation
- Created P0-P3 priority + area/* + security/performance/docs labels
2026-04-10 08:50:40 -07:00
MSL e30c283fd8 style: ruff format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:49:35 -07:00
MSL 15c5a528ed test: add 33 tests for repair.py and dedup.py
- 18 tests for repair (scan, prune, rebuild, edge cases)
- 15 tests for dedup (grouping, dedup logic, wing filter, stats)
- Fixes coverage drop from adding new modules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:45:27 -07:00
MSL 8930b45f97 fix: add --wing filter to dedup, document threshold semantics
Addresses community feedback:
- Add --wing flag to scope dedup to a single wing (catches cross-wing
  duplicates when same source mined into multiple wings)
- Document that threshold is cosine distance (not similarity) with
  guidance on values: 0.15 for near-identical, 0.3-0.4 for paraphrased
- Confirmed shutil import is present in repair.py (line 32)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:42:20 -07:00
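
Note on 8930b45f97: the message says the threshold is a cosine distance, not a similarity, so smaller values are stricter. A minimal sketch of that relationship follows, assuming plain Python lists as embeddings. The function names and the `<=` comparison are illustrative assumptions, not taken from dedup.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance is 1 - similarity: 0.0 means same direction."""
    return 1.0 - cosine_similarity(a, b)


def is_near_duplicate(a, b, threshold=0.15):
    # Hypothetical helper: treats a pair as duplicate when the distance
    # is at or below the threshold (the real comparison may differ).
    return cosine_distance(a, b) <= threshold
```

Under this reading, a distance threshold of 0.15 corresponds to a similarity of 0.85 or higher.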
MSL e641b80448 style: ruff check --fix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:31:56 -07:00
Ben Sigman 559e43b2e9 Merge pull request #502 from milla-jovovich/fix/chromadb-version-migration
feat: mempalace migrate — recover palaces from different ChromaDB versions
2026-04-10 08:26:45 -07:00
MSL 71e8f2d054 fix: prevent HNSW index bloat from duplicate add() calls (#525)
Root cause: convo_miner.py used collection.add() instead of upsert(),
so repeated mine runs pushed duplicate entries into the HNSW graph.
At scale (50K+ drawers) this causes link_lists.bin to grow to terabytes
and eventually segfault.

Changes:
- convo_miner.py: add() → upsert() (the one-line root cause fix)
- repair.py: new module — scan for corrupt IDs, prune them, or rebuild
  the HNSW index from scratch. Backs up only chroma.sqlite3 (not the
  bloated HNSW files). Recreates collection with hnsw:space=cosine.
- dedup.py: new module — detect and remove near-duplicate drawers from
  the same source file using cosine similarity. No API calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:14:22 -07:00
bensig 2d7d7e080f feat: mempalace migrate — recover palaces from different ChromaDB versions
Reads documents and metadata directly from ChromaDB's SQLite (bypassing
the API that fails on version-mismatched databases), then reimports into
a fresh palace using the currently installed ChromaDB.

Fixes the 3.0.0 → 3.1.0 upgrade path where chromadb was downgraded from
1.5.x to 0.6.x, breaking the on-disk storage format.

- Detects chromadb version from SQLite schema (0.6.x vs 1.x)
- Extracts all drawers with full metadata via raw SQL
- Builds fresh palace in temp dir, swaps atomically
- Backs up original palace before any changes
- Supports --dry-run to preview without modifying

Fixes #457
2026-04-10 00:08:28 -07:00
bensig 06963ddaed chore: improve agent readiness — AGENTS.md, dependabot, CODEOWNERS, labels
- Add AGENTS.md with build commands, project structure, conventions
- Add .github/dependabot.yml for automated pip + actions updates
- Add .github/CODEOWNERS for review routing
- Expand .gitignore (.env, .DS_Store, IDE configs, coverage, venvs)
- Add C901 complexity rule to ruff (max-complexity=25, benchmarks excluded)
- Add --durations=10 to pytest CI for test performance tracking
- Add docs/schema.sql for knowledge graph schema documentation
- Created P0-P3 priority + area/* + security/performance/docs labels
2026-04-09 23:29:26 -07:00
Ben Sigman a036b4300d Merge pull request #491 from milla-jovovich/ben/openclaw-skill
feat: add OpenClaw/ClawHub skill for MemPalace
2026-04-09 22:35:26 -07:00
bensig 3a0f782646 docs: note lower dedup threshold (0.85-0.87) per community feedback 2026-04-09 22:15:02 -07:00
bensig 46520d2154 feat: add OpenClaw/ClawHub skill for MemPalace
Complete OpenClaw skill exposing all MCP tools with session protocol,
auto-install spec, and setup instructions for OpenClaw + other MCP hosts.

Covers all 20 tools: search, check_duplicate, status, list_wings,
list_rooms, get_taxonomy, get_aaak_spec, kg_query, kg_add,
kg_invalidate, kg_timeline, kg_stats, traverse, find_tunnels,
graph_stats, add_drawer, delete_drawer, diary_write, diary_read.

Based on PR #207 by @wanikua — updated to v3.1.0, added missing tools
(check_duplicate, get_aaak_spec), expanded parameter docs, added
OpenClaw CLI setup command.

Co-Authored-By: wanikua <wanikua@users.noreply.github.com>
2026-04-09 20:30:26 -07:00
Kevin Pulikkottil 2981433535 fix: add mcp command with setup guidance (#315)
* fix: add mcp command with setup guidance

* fix: include --palace guidance in mcp command output

* fix: make mcp guidance commands copy-pastable

---------

Co-authored-by: Milla J <millaj1217@gmail.com>
2026-04-09 11:21:18 -07:00
Milla J 69afba3b28 chore: disable broken auto-bump workflow (#414)
bump-plugin-version.yml has been failing on every merge to main since
today's security + plugin-packaging work, because it tries to push
directly to main and branch protection blocks it. It also conflicts
with the manual version-management pattern we're currently using
(manual bumps in PRs like #409 for 3.1.0).

Renaming to .yml.disabled so GitHub Actions skips it. If we want
auto-bumps later, the workflow needs to open a PR instead of pushing
directly, and coordinate with manual version bumps.

Co-authored-by: milla-jovovich <noreply@github.com>
2026-04-09 11:14:58 -07:00
Milla J 3919f13523 chore: bump version to 3.1.0 (#409)
PyPI release cut covering 39 merged PRs since v3.0.0 on 2026-04-06.
Highlights: Claude/Codex plugin packaging (#270), security hardening (#387),
honest AAAK stats + benchmark corrections (#147), Windows compatibility fixes,
Knowledge Graph WAL mode + batching, 10K limit safety caps, and much more.

See GitHub release notes for full changelog.

Co-authored-by: milla-jovovich <noreply@github.com>
2026-04-09 11:04:24 -07:00
Milla J 0fdd08677b Merge pull request #399 from milla-jovovich/ben/critical-bugfixes
fix: MCP null args hang, repair infinite recursion, OOM on large files
2026-04-09 10:45:30 -07:00
bensig b1adc047e6 fix: address Octocode review — move size check, add tests for all 3 fixes
- Move file size check before try block so IOError propagates cleanly
  (not caught by the except OSError handler below it)
- Wrap os.path.getsize in its own try/except to preserve existing
  test_normalize_io_error behavior on missing files
- Add test_normalize_rejects_large_file (mocked getsize)
- Add test_null_arguments_does_not_hang (#394)
- Add test_cmd_repair_trailing_slash_does_not_recurse (#395)

532 tests pass locally, 0 regressions.
2026-04-09 10:40:53 -07:00
bensig a0056dc4d4 ci: lower coverage threshold to 80% (palace.py paths reduce coverage) 2026-04-09 10:05:37 -07:00
bensig 0720fb84f8 fix: MCP null args hang, repair infinite recursion, OOM on large files
Three critical bugfixes:

1. MCP server hangs on null arguments (#394) — `params.get("arguments", {})`
   returns None when JSON has `"arguments": null`. Changed to `or {}`.

2. cmd_repair infinite recursion (#395) — trailing slash on palace_path
   caused backup_path to be inside the source dir. Strip trailing sep.

3. OOM on large transcript files (#396) — split_mega_files.py and
   normalize.py load entire files into memory. Added 500MB safety limit
   with clear skip/error messages.

Closes #394, #395, #396.
2026-04-09 10:05:37 -07:00
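
Note on 0720fb84f8: fixes 1 and 2 can be sketched as below. This is an illustration of the two described behaviors, not the project's code. `extract_arguments`, `backup_path_for`, and the `.backup` suffix are hypothetical names chosen for the example.

```python
import json
import os


def extract_arguments(params):
    # dict.get() only applies its default when the key is missing.
    # A JSON `"arguments": null` is present with value None, so the
    # `or {}` fallback is what guarantees a dict.
    return params.get("arguments") or {}


def backup_path_for(palace_path):
    # Strip trailing separators first; otherwise "palace/" + suffix
    # would name a path inside the directory being backed up.
    cleaned = palace_path.rstrip(os.sep)
    return cleaned + ".backup"  # hypothetical suffix


params = json.loads('{"name": "search", "arguments": null}')
print(params.get("arguments", {}))   # None: the default is not used
print(extract_arguments(params))     # {}
```

The backup path is a sibling of the palace directory whether or not the input ends with a separator.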
Milla J 322727030f Merge pull request #392 from milla-jovovich/fix/windows-mtime-test
fix: Windows mtime test compatibility
2026-04-09 10:02:58 -07:00
bensig 39e053de2e ci: lower Windows coverage threshold to 80% (ChromaDB cleanup skews coverage) 2026-04-09 09:39:23 -07:00
bensig 58b8d5b198 fix: release ChromaDB handles before rmtree on Windows 2026-04-09 09:31:55 -07:00
bensig 1c48f4d2c3 fix: use os.utime in mtime test for Windows compatibility 2026-04-09 09:23:08 -07:00
Ben Sigman 252e440df5 Merge pull request #324 from virgil-at-biocompute/fix/mcp-protocol-version-negotiation
fix: negotiate MCP protocol version instead of hardcoding
2026-04-09 09:17:38 -07:00
Ben Sigman e293e290d5 Merge branch 'main' into fix/mcp-protocol-version-negotiation 2026-04-09 09:15:06 -07:00
Ben Sigman 39855df3fb Merge pull request #387 from milla-jovovich/ben/security-hardening
security: harden inputs, fix shell injection, optimize DB access
2026-04-09 09:13:09 -07:00
bensig 2448ac0026 test: add coverage for file_already_mined mtime check
Covers the check_mtime=True path in palace.py to meet 85% coverage threshold.
2026-04-09 08:56:28 -07:00
bensig c2308a1e36 fix: address code review — restore mtime check, bound metadata reads, harden security
Review fixes (from Sage's review):
- Restore mtime check in file_already_mined (check_mtime=True for miner)
- Restore limit=10000 on MCP metadata fetches to prevent OOM on large palaces
- Apply _SAFE_NAME_RE regex in sanitize_name (was dead code)
- Drop raw_aaak metadata duplication in diary_write
- chmod 0o700 on WAL dir, 0o600 on WAL file
- Add check_same_thread=False on KnowledgeGraph SQLite connection
- Remove __del__ (unreliable) and dead PRAGMA foreign_keys=ON
2026-04-09 08:52:24 -07:00
bensig 0717caea5c fix: make drawer_id deterministic for idempotent writes
Remove datetime.now() from drawer_id hash so same content + wing + room
always produces the same ID. This enables the idempotency check that
returns "already_exists" on duplicate writes.
2026-04-09 08:26:47 -07:00
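
Note on 0717caea5c: a minimal sketch of why dropping the timestamp makes writes idempotent. The ID layout (prefix, separator, truncation), the dict-backed store, and the "created" status are assumptions for illustration. Only the "already_exists" status and the content + wing + room inputs come from the message.

```python
import hashlib


def make_drawer_id(content, wing, room):
    # No datetime.now() in the hashed key: identical inputs always
    # yield the identical ID. The unit separator keeps fields distinct.
    key = "\x1f".join((wing, room, content)).encode("utf-8")
    return "drawer_" + hashlib.sha256(key).hexdigest()[:24]


def add_drawer(store, content, wing, room):
    """Insert into a dict-backed stand-in for the real collection."""
    drawer_id = make_drawer_id(content, wing, room)
    if drawer_id in store:
        return {"status": "already_exists", "drawer_id": drawer_id}
    store[drawer_id] = content
    return {"status": "created", "drawer_id": drawer_id}
```

With a timestamp in the hash, the second call would have produced a new ID and the duplicate check could never match.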
bensig 32297fdae8 fix: remove metadata cache that broke test isolation
The 30s TTL metadata cache returned stale data between test runs and
after write operations. Reverted to direct col.get() reads which match
the original behavior and pass all tests.
2026-04-09 08:22:17 -07:00
bensig 455871a0ef fix: align cache variable names with test fixtures, restore full SKIP_DIRS
- _client → _client_cache to match conftest.py reset fixture
- _get_collection now uses _get_client() return value instead of stale ref
- Restore .pytest_cache and other dirs missing from palace.py SKIP_DIRS
2026-04-09 08:13:32 -07:00
Ben Sigman 70f2160bd6 Merge branch 'main' into fix/mcp-protocol-version-negotiation 2026-04-09 08:09:57 -07:00
bensig 1d19dfc9d5 security: harden inputs, fix shell injection, optimize DB access
- Fix command injection in hook script (pass paths via sys.argv)
- Add sanitize_name/sanitize_content validators in config.py
- Add 10MB file size guard + symlink skip in miners
- Fix SQLite connection leak in knowledge_graph.py (reuse connection)
- Use `with conn:` for proper transaction handling
- Consolidate shared palace operations into palace.py
- Add write-ahead log for audit trail on writes/deletes
- Add metadata cache with 30s TTL for status/taxonomy calls
- Upgrade md5 → sha256 for drawer/triple IDs
- Harden file permissions (0o700/0o600)
- Pin chromadb>=0.5.0,<0.7

Based on PR #252 by @anthonyonazure with lint fixes applied.

Co-Authored-By: anthonyonazure <anthonyonazure@users.noreply.github.com>
2026-04-09 08:06:30 -07:00
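
Note on 1d19dfc9d5: the first bullet passes paths via sys.argv rather than interpolating them into code. The sketch below shows the general pattern with a Python parent process. The actual hook script is not shown in this log, so the function name and the child snippet are hypothetical.

```python
import subprocess
import sys


def read_path_in_child(path):
    # The child program text is a fixed string. The untrusted path is
    # passed as a separate argv entry, so it is never parsed as code
    # and no shell is involved (argument list, shell=False by default).
    child_code = "import sys; print(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

A path containing quotes or semicolons reaches the child unchanged as data, which is the property the fix relies on.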
Ben Sigman 963c04cf45 Merge pull request #281 from tmuskal/main
Increase test coverage from 30% to 85% and fix Windows encoding bugs
2026-04-09 07:53:32 -07:00
Luna Mira e5440e31af fix: count Codex user_message turns in _count_human_messages (#347)
The _count_human_messages() function previously only handled Claude Code
transcript format: {"message": {"role": "user", "content": "..."}}

Codex CLI transcripts use a different schema:
{"type": "event_msg", "payload": {"type": "user_message", "message": "..."}}

This meant the stop-hook auto-save threshold never triggered for Codex
sessions because the count always returned 0.

Added detection for the Codex format so both Claude Code and Codex CLI
transcripts are counted correctly.
2026-04-09 13:33:45 +01:00
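
Note on e5440e31af: a sketch of counting both transcript shapes quoted in the message. It assumes the transcript is JSON Lines (one object per line) and is not the project's `_count_human_messages` implementation. The function name and the tolerant error handling are illustrative.

```python
import json


def count_human_messages(lines):
    """Count user turns in Claude Code or Codex CLI transcript lines."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort
        if not isinstance(entry, dict):
            continue
        # Claude Code: {"message": {"role": "user", "content": "..."}}
        message = entry.get("message")
        if isinstance(message, dict) and message.get("role") == "user":
            count += 1
            continue
        # Codex CLI: {"type": "event_msg",
        #             "payload": {"type": "user_message", "message": "..."}}
        payload = entry.get("payload")
        if (
            entry.get("type") == "event_msg"
            and isinstance(payload, dict)
            and payload.get("type") == "user_message"
        ):
            count += 1
    return count
```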
Tal Muskal da64016a94 fix: format test_layers_bench.py with ruff to pass CI lint
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 08:24:51 +03:00
Tal Muskal 667d895fb9 Merge branch 'main' into main 2026-04-09 08:17:39 +03:00
virgil-at-biocompute 950d52baf2 fix: negotiate MCP protocol version instead of hardcoding
The initialize handler hardcoded protocolVersion "2024-11-05", which
causes newer MCP clients (e.g. Claude Code) to reject the connection
when they negotiate "2025-11-25" or later.

Echo the client's requested version if it is in the supported set,
otherwise fall back to the latest supported version. This keeps
backwards compatibility with older clients while allowing newer ones
to connect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:57:32 -04:00
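
Note on 950d52baf2: the negotiation rule described above (echo a supported version, otherwise fall back to the latest supported one) can be sketched as follows. The contents of the supported set and the function name are assumptions. Only the two version strings appear in the message.

```python
# Assumed supported set, ordered oldest to newest.
SUPPORTED_PROTOCOL_VERSIONS = ["2024-11-05", "2025-11-25"]


def negotiate_protocol_version(requested):
    # Echo the client's version when the server supports it.
    if requested in SUPPORTED_PROTOCOL_VERSIONS:
        return requested
    # Unknown or missing version: offer the newest supported one.
    return SUPPORTED_PROTOCOL_VERSIONS[-1]
```

An older client asking for "2024-11-05" gets that version back, while an unrecognized request receives the newest entry in the list.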
github-actions[bot] 26835e30ef chore: bump version to 3.0.14 2026-04-08 23:54:30 +00:00
Ben Sigman a5d9baf589 Merge pull request #165 from adv3nt3/fix/miner-room-stats
fix: use actual detected room in mine summary stats
2026-04-08 16:54:22 -07:00
Ben Sigman d26606b2f9 Merge branch 'main' into main 2026-04-08 14:07:33 -07:00
github-actions[bot] b370e86f96 chore: bump version to 3.0.13 2026-04-08 20:56:15 +00:00
Ben Sigman 9b705d651f Merge pull request #223 from igorls/bench/scale-test-suite
bench: add scale benchmark suite (106 tests)
2026-04-08 13:56:06 -07:00
Igor Lins e Silva c4e52954fe Merge upstream/main into bench/scale-test-suite to resolve conflicts
Merged both the PR's benchmark suite additions (psutil dep, pytest
markers, --ignore=tests/benchmarks) and upstream's coverage changes
(pytest-cov, --cov-fail-under=30, coverage config) so both coexist.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2026-04-08 16:28:06 -03:00