Commit Graph

688 Commits

Author SHA1 Message Date
Ben Sigman 1c62045b22 Merge pull request #21 from sheetsync/bugfix/mcp-docs-alignment
docs: align MCP setup examples with shipped server
2026-04-07 13:54:00 -07:00
Ben Sigman f1f0a4c966 Merge pull request #129 from minimexat/fix/windows-unicode-encoding
fix: replace Unicode separator character for Windows compatibility
2026-04-07 13:51:58 -07:00
Ben Sigman 30699b85a9 Merge pull request #42 from adv3nt3/fix/entity-registry-dead-code
fix: remove dead code and duplicate set items in entity_registry.py
2026-04-07 13:51:55 -07:00
Ben Sigman 6aa4272b65 Merge pull request #53 from adv3nt3/fix/md5-usedforsecurity-miners
fix: mark MD5 as non-security in miner drawer ID generation
2026-04-07 13:51:52 -07:00
Ben Sigman 3e6fc6ed9f Merge pull request #83 from renatoliveira/main
fix: update input prompt for entity confirmation in entity_detector.py
2026-04-07 13:51:50 -07:00
Ben Sigman c312ff539c Merge pull request #55 from salmanmkc/upgrade-github-actions-node24
Upgrade GitHub Actions for Node 24 compatibility
2026-04-07 13:46:49 -07:00
f-hoedl d214f6a854 fix: replace Unicode separator in convo_miner.py for Windows compatibility
Replace the ─ (U+2500) separator character with - in convo_miner.py.
Windows terminals using cp1252 encoding raise UnicodeEncodeError when
printing this character unless PYTHONUTF8=1 is set explicitly.

Fixes crash on Windows: UnicodeEncodeError: 'charmap' codec can't encode
character '\u2500'
2026-04-07 21:55:34 +02:00
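
The failure described in this commit can be reproduced without a Windows terminal by encoding the separator explicitly. This is a minimal sketch, not code from convo_miner.py; the `encodable` helper is a hypothetical name introduced only for illustration.

```python
# U+2500 (BOX DRAWINGS LIGHT HORIZONTAL) has no mapping in cp1252,
# so encoding it the way a cp1252 console would raises UnicodeEncodeError.
box_separator = "\u2500" * 10
ascii_separator = "-" * 10


def encodable(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(encodable(box_separator, "cp1252"))    # False
print(encodable(ascii_separator, "cp1252"))  # True
print(encodable(box_separator, "utf-8"))     # True
```

Replacing the character with a plain hyphen sidesteps the problem for every console encoding, whereas relying on PYTHONUTF8=1 depends on each user's environment.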
Ben Sigman 45c2c92c4a Merge pull request #123 from milla-jovovich/fix/non-interactive-init
fix: --yes flag skips all interactive prompts in init
2026-04-07 12:18:03 -07:00
bensig caa1169f04 fix: --yes flag now skips room confirmation in init
Pass yes flag through to detect_rooms_local so init --yes
skips both entity detection AND room approval prompts.
Agents and CI can now run init without interactive input.

Fixes #8
2026-04-07 12:16:46 -07:00
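
A rough sketch of the pattern this commit describes: one `--yes` flag threaded into every step that could otherwise prompt. The names `confirm`, `build_parser`, and `run_init` are hypothetical and are not taken from the project; only the idea of passing the flag through to the room-approval step comes from the commit message.

```python
import argparse


def confirm(prompt, yes, ask=input):
    """Return True without prompting when `yes` is set; otherwise ask the user."""
    if yes:
        return True
    return ask(f"{prompt} [y/N] ").strip().lower() in {"y", "yes"}


def build_parser():
    parser = argparse.ArgumentParser(prog="init-sketch")
    parser.add_argument("--yes", action="store_true",
                        help="accept all defaults without prompting")
    return parser


def run_init(argv, ask=input):
    args = build_parser().parse_args(argv)
    # The same flag reaches both steps, so neither one can block on input.
    entities_ok = confirm("Accept detected entities?", args.yes, ask)
    rooms_ok = confirm("Accept detected rooms?", args.yes, ask)
    return entities_ok and rooms_ok
```

Injecting `ask` keeps the sketch testable: a test can pass a function that fails if it is ever called, which checks that `--yes` really is non-interactive.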
Ben Sigman 01a21dd60f Merge pull request #78 from ac-opensource/feature/respect-gitignore-mining
Respect nested .gitignore rules when mining project files
2026-04-07 12:15:23 -07:00
Ben Sigman b49cfbf164 Merge pull request #119 from milla-jovovich/fix/repair-split-rooms
fix: repair command, split args, Claude export, room keywords
2026-04-07 12:03:49 -07:00
bensig 5e8a039e7c fix: repair command, split args, Claude export, room keywords
- Add `mempalace repair` command to rebuild vector index from SQLite
  when HNSW files are corrupted after crash/interrupt (fixes #74, #72, #96)
- Fix split command passing dir as positional instead of --source
  flag to split_mega_files (fixes #63)
- Handle Claude privacy export format (array of conversation objects
  with chat_messages inside each) in normalize.py (fixes #63)
- Persist room keywords in mempalace.yaml so mine can match files
  in docs/ to room "documentation" (fixes #108)
2026-04-07 12:02:34 -07:00
Ben Sigman d1afecc478 Merge pull request #114 from milla-jovovich/fix/security-mining-chromadb
fix: shell injection in hooks, Claude Code mining, chromadb pin
2026-04-07 11:53:12 -07:00
bensig 186bb2e3d1 fix: shell injection in hooks, Claude Code mining, chromadb pin
- hooks/mempal_save_hook.sh: pass $TRANSCRIPT_PATH as sys.argv
  instead of interpolating into python -c string (fixes #110)
- normalize.py: accept type "user" in addition to "human" for
  Claude Code JSONL sessions (fixes #111)
- convo_miner.py: skip tool-results/, memory/ dirs and .meta.json
  files when scanning for conversations (fixes #111)
- pyproject.toml: pin chromadb>=0.4.0,<1 to avoid crashing 1.x
  builds on macOS ARM64 (fixes #100)
2026-04-07 11:45:51 -07:00
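
The first item above is the classic difference between interpolating a value into program text and passing it as an argument. The actual hook is a shell script; the following is only a Python analogue of the same idea, with a made-up hostile value and a hypothetical `run` helper.

```python
import subprocess
import sys

# Hypothetical hostile path, chosen to break out of a quoted string.
transcript_path = "x'); print('INJECTED"

# Unsafe: the value becomes part of the program text passed to `python -c`.
unsafe_code = f"print('{transcript_path}')"

# Safe: the program text is fixed; the value arrives as data in sys.argv[1].
safe_code = "import sys; print(sys.argv[1])"


def run(code, *args):
    """Run `python -c code args...` and return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


unsafe_lines = run(unsafe_code)                 # two statements executed
safe_lines = run(safe_code, transcript_path)    # one line, value untouched
```

In the unsafe variant the injected `print` runs as code; in the safe variant the same characters are printed back verbatim as a single argument.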
Milla Jovovich aa10f8fbf1 README: honest update from Milla & Ben — own the mistakes, fix the claims
The community caught real problems within hours of launch. Addressing them
directly:

- Added prominent "A Note from Milla & Ben" section at top owning the issues
- Fixed AAAK section: removed "lossless" claim, removed bogus token example,
  honest about lossy nature and 84.2% regression on LongMemEval
- Headline benchmark table: clearly labeled as "raw mode" (the 96.6% number)
- Removed misleading "100%" headline (still real but rerank pipeline not
  in public scripts yet — addressing)
- Removed misleading "+34% palace boost" headline (it's metadata filtering,
  real but not a novel mechanism)
- Marked Contradiction Detection as "experimental, not yet wired into KG ops"
- Closet legend now notes plain-text summaries in v3.0.0, AAAK closets coming
- Intro pillars rewritten honestly — raw verbatim is the win, AAAK is
  experimental compression layer

Thank you to @panuhorsmalahti (#43), @lhl (#27), @gizmax (#39) and everyone
who filed issues in the first 48 hours. Brutally honest criticism is exactly
what makes open source work.
2026-04-07 11:08:53 -07:00
JayadityaGit ca2549da5a docs: add Gemini to MCP header in README 2026-04-07 22:12:20 +05:30
JayadityaGit 2df6c1b121 docs: add Gemini CLI setup guide and integration section 2026-04-07 22:08:26 +05:30
Oussama b3c48d0775 docs: add beginner-friendly hooks tutorial for issue #20 2026-04-07 17:13:39 +01:00
ac-opensource c8c220d789 fix: support nested .gitignore rules during mining 2026-04-08 00:02:21 +08:00
Alexey Samosadov 8fbb6178dd fix: coerce MCP integer arguments to native Python int
ChromaDB requires native `int` for `n_results`, but the MCP JSON-RPC
transport can deliver JSON integers as floats or strings depending on
the client implementation. This causes `mempalace_search` (and any
tool with integer params like `max_hops`, `last_n`) to fail with:

  "Expected requested number of results to be a int, got 3 in query."

Fix: auto-coerce tool arguments based on the declared `input_schema`
types before calling handlers. This covers all current and future
tools generically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 17:48:03 +03:00
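
Schema-driven coercion of the kind described above might look roughly like the following. This is a sketch under assumptions: the function name `coerce_arguments` is hypothetical, and the schema is assumed to be a JSON Schema object with a `properties` map, which is not confirmed by the commit message.

```python
def coerce_arguments(arguments: dict, input_schema: dict) -> dict:
    """Return a copy of `arguments` with declared integers converted to int.

    Assumes `input_schema` is a JSON Schema object with a "properties" map.
    Values that cannot be converted safely are left unchanged so that the
    handler's own validation can report them.
    """
    properties = input_schema.get("properties", {})
    coerced = dict(arguments)
    for name, value in arguments.items():
        declared = properties.get(name, {}).get("type")
        if declared != "integer" or isinstance(value, bool):
            continue
        if isinstance(value, float) and value.is_integer():
            coerced[name] = int(value)          # 3.0 -> 3
        elif isinstance(value, str):
            try:
                coerced[name] = int(value.strip())  # "3" -> 3
            except ValueError:
                pass                             # e.g. "three": leave as-is
    return coerced


schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "n_results": {"type": "integer"},
        "max_hops": {"type": "integer"},
    },
}
fixed = coerce_arguments(
    {"query": "auth", "n_results": 3.0, "max_hops": "2"}, schema
)
```

Driving the conversion from the declared types, rather than from a hard-coded list of parameter names, is what lets the same step cover tools added later.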
Renato Oliveira cfe878204e fix: update input prompt for entity confirmation in entity_detector.py
Refine the prompt for distinguishing between person and project entities by adjusting the wording for clarity.
2026-04-07 11:41:15 -03:00
ac-opensource 9b9daa9b4b fix: respect .gitignore during project mining 2026-04-07 22:26:06 +08:00
Maurice Wen 0e77981dec fix: batch ChromaDB reads to avoid SQLite variable limit
col.get() without limit generates SELECT ... WHERE id IN (...) with all
document IDs, which exceeds SQLite's ~999 variable limit when a palace
has more than ~1000 drawers.

This breaks both `mempalace compress` and `mempalace wake-up` on large
palaces. Reproduced on a 13880-file codebase (242K+ drawers).

Fix: paginate reads in batches of 500 using ChromaDB's offset/limit
parameters in both Layer1.generate() and cmd_compress().
2026-04-07 21:40:12 +08:00
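
The pagination described in this commit can be sketched as a loop over `limit`/`offset`. The helper name `get_all_in_batches` and the `FakeCollection` stand-in are hypothetical; the stand-in only mimics the shape of a ChromaDB collection's `get(limit=..., offset=...)` result so the sketch runs without a database.

```python
def get_all_in_batches(col, batch_size=500):
    """Read every record from `col` in pages instead of one unbounded get()."""
    ids, documents = [], []
    offset = 0
    while True:
        batch = col.get(limit=batch_size, offset=offset)
        batch_ids = batch["ids"]
        if not batch_ids:
            break  # past the last page
        ids.extend(batch_ids)
        documents.extend(batch.get("documents") or [])
        offset += len(batch_ids)
    return {"ids": ids, "documents": documents}


class FakeCollection:
    """Hypothetical in-memory stand-in for a collection, for illustration only."""

    def __init__(self, n):
        self._ids = [f"drawer_{i}" for i in range(n)]
        self.calls = 0

    def get(self, limit=None, offset=0):
        self.calls += 1
        end = None if limit is None else offset + limit
        chunk = self._ids[offset:end]
        return {"ids": chunk, "documents": [f"doc for {i}" for i in chunk]}


fake = FakeCollection(1234)
everything = get_all_in_batches(fake, batch_size=500)
```

With 1234 records and a batch size of 500, the loop issues four calls: three that return data (500, 500, 234) and a final empty page that ends the loop.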
adv3nt3 d4e1945f77 feat: add OpenAI Codex CLI JSONL normalizer
Add _try_codex_jsonl parser for Codex CLI session files stored at
~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl.

Uses only event_msg entries (user_message / agent_message) which
represent the canonical conversation turns. response_item entries
are intentionally skipped — they include synthetic context injections
(environment_context) and can duplicate real messages when both
representations are present in the same rollout.

Format based on Codex source tests (codex-rs/rollout/src/recorder_tests.rs).
Requires session_meta header to reduce false positives on other JSONL.

Refs: #59
2026-04-07 14:50:04 +02:00
Salman Muin Kayser Chishti ae7722e541 Upgrade GitHub Actions for Node 24 compatibility
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-04-07 11:57:44 +00:00
adv3nt3 312d380aab fix: narrow bare except Exception to specific types where safe
Replace broad except Exception with specific exception types in 6
sites where the expected failure mode is well-defined:

- normalize.py: OSError for file read, ImportError for optional import
- miner.py: OSError for file read_text
- entity_detector.py: OSError for file read in scan loop
- convo_miner.py: (OSError, ValueError) for normalize which reads
  and parses files
- entity_registry.py: (URLError, OSError, JSONDecodeError, KeyError)
  for Wikipedia lookup fallback

ChromaDB except Exception sites (~30) are left broad for now.
chromadb.errors defines NotFoundError, DuplicateIDError,
InvalidDimensionException etc., but narrowing those sites requires
importing from chromadb.errors and validating across supported
versions (>=0.4.0). MCP server handlers also left broad for
resilience.
2026-04-07 13:51:27 +02:00
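
As an illustration of the narrowing described above, a file read that tolerates only I/O failures could be written as follows. The helper name `read_text_or_none` is hypothetical and the project's actual call sites may differ.

```python
from pathlib import Path


def read_text_or_none(path):
    """Return the file's text, or None if it cannot be opened or read.

    Only OSError (missing file, permission denied, and similar) is treated
    as an expected failure. Anything else, such as a TypeError from a bad
    argument, still propagates instead of being silently swallowed.
    """
    try:
        return Path(path).read_text(encoding="utf-8")
    except OSError:
        return None
```

The benefit over `except Exception` is that programming errors surface immediately rather than being reported as "file unreadable".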
adv3nt3 3a2817505a fix: mark MD5 as non-security in miner drawer ID generation
Add usedforsecurity=False to hashlib.md5() calls in miner.py and
convo_miner.py to document that MD5 is used for deterministic ID
generation, not cryptographic security. Preserves stable drawer IDs
for backward compatibility with existing palaces.

Swapping to SHA-256 would change the ID formula and make existing
drawers unreachable on re-ingestion. PR #34 covers the MD5 sites
in knowledge_graph.py and mcp_server.py.

Verified: the usedforsecurity kwarg has been supported since Python 3.9
(project target per pyproject.toml line 10), confirmed via Context7
CPython docs.
2026-04-07 13:41:00 +02:00
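
A small sketch of the point made in this commit: `usedforsecurity=False` documents intent without changing the digest, so existing IDs stay valid. The `drawer_id` function and its key format are hypothetical; the real ID formula in miner.py is not shown in this log.

```python
import hashlib


def drawer_id(source_file: str, chunk_index: int) -> str:
    """Deterministic, non-cryptographic ID (hypothetical key format)."""
    key = f"{source_file}:{chunk_index}".encode("utf-8")
    # usedforsecurity=False (Python 3.9+) marks this as a non-security use;
    # the resulting digest is identical to a plain hashlib.md5(key) call.
    return hashlib.md5(key, usedforsecurity=False).hexdigest()


same_as_plain = (
    drawer_id("notes/a.md", 0) == hashlib.md5(b"notes/a.md:0").hexdigest()
)
```

Switching the hash function, by contrast, would produce different IDs for the same input, which is why the commit keeps MD5.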
adv3nt3 3c78e2fbb5 fix: remove dead code and duplicate set items in entity_registry.py
Remove discarded `query.lower()` call in `extract_people_from_query` —
strings are immutable so the result was always thrown away. The existing
`re.IGNORECASE` flag already handles case-insensitive matching.

Remove duplicate literals in COMMON_ENGLISH_WORDS set: "hunter" (consecutive
duplicate), "april" and "june" (appeared in both names and months sections).
2026-04-07 13:00:59 +02:00
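
Both cleanups in this commit rest on basic Python semantics, which can be checked directly. The query text and the regular expression below are made up for illustration and are not the ones used in entity_registry.py.

```python
import re

query = "What did ALICE say about the launch?"

# str.lower() returns a new string; calling it without using the result
# leaves `query` untouched, so the call is dead code.
query.lower()
query_unchanged = query == "What did ALICE say about the launch?"

# Case-insensitive matching is already handled by the regex flag.
match = re.search(r"\balice\b", query, re.IGNORECASE)

# Duplicate literals in a set display collapse to a single element,
# so removing them changes nothing at runtime.
words_with_duplicates = {"hunter", "hunter", "april", "june", "april"}
words_deduplicated = {"hunter", "april", "june"}
```

Because the duplicates never affected the set's contents, the change is purely a readability fix.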
James Cane 0808ad96c2 refactor: consolidate split known-names config loading 2026-04-07 09:16:07 +01:00
James Cane 1557eaa2f5 docs: align MCP setup examples with shipped server 2026-04-07 09:15:16 +01:00
James Cane 55152ce476 fix: unify package and MCP version reporting 2026-04-07 08:53:25 +01:00
bensig 1782628b8a docs: add narrative palace walkthrough to README
Add Milla's conversational explanation of the palace architecture
(wings → rooms → closets → drawers → halls → tunnels) as an
introductory section before the technical diagram.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:34:48 -07:00
Milla Jovovich f14f59d61c README: How You Actually Use It — MCP flow + local model flow 2026-04-05 22:46:50 -07:00
Milla Jovovich f54d42f50a README: AAAK works with local models too — Claude, GPT, Gemini, Llama, Mistral 2026-04-05 22:14:15 -07:00
Milla Jovovich 18cca2c97c README: Specialist Agents — agents live in the palace, not CLAUDE.md 2026-04-05 12:23:14 -07:00
bensig 6d8c462219 fix: resolve ruff lint and format errors across codebase
Fix E402 import ordering, F841 unused variable, F541 unnecessary
f-strings, F401 unused import, and auto-format 6 files.
2026-04-04 18:37:17 -07:00
bensig 0f8fa8c7d5 bench: add benchmark runners, results docs, and test suite
Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with
methodology docs and hybrid retrieval analysis.

Tests: config, miner, convo_miner, normalize — 9 tests, all passing.
2026-04-04 18:33:42 -07:00
Milla Jovovich 068dbd9a7b MemPalace: palace architecture, AAAK compression, knowledge graph
The memory system:
- Palace structure: Wings (people/projects) → Rooms (topics) → Closets (AAAK compressed) → Drawers (verbatim transcripts)
- Halls connect related rooms within a wing
- Tunnels cross-reference rooms across wings
- AAAK: 30x lossless compression dialect for AI agents
- Knowledge graph: temporal entity-relationship triples (SQLite)
- Palace graph: room-based navigation with tunnel detection
- MCP server: 19 tools — search, graph traversal, agent diary, AAAK auto-teach
- Onboarding: guided setup generates wing config + AAAK entity registry
- Contradiction detection: catches wrong pronouns, names, ages
- Auto-save hooks for Claude Code

96.6% Recall@5 on LongMemEval — highest zero-API score published.
100% with optional Haiku rerank (500/500).
Local. Free. No API key required.
2026-04-04 18:16:04 -07:00