32 Commits

Author SHA1 Message Date
bensig 39c14be113 fix: honest AAAK stats — word-based token estimator, lossy labels
- Replace len(text)//3 token heuristic with word-based estimate (~1.3 tokens/word)
- Old heuristic inflated compression ratios by ~3-5x
- Update docstrings: "compression" → "lossy summarization"
- Update module docstring to clarify AAAK is NOT lossless
- compression_stats() now returns honest field names and a note
- CLI output labels ratios as lossy

Fixes #43
2026-04-07 14:14:31 -07:00
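
The estimator change in 39c14be113 can be illustrated with a minimal sketch. The function names below are hypothetical, not the project's actual API; only the two heuristics (`len(text)//3` and roughly 1.3 tokens per word) come from the commit message.

```python
def estimate_tokens_chars(text: str) -> int:
    """Old heuristic named in the commit: one token per three characters."""
    return len(text) // 3


def estimate_tokens_words(text: str, tokens_per_word: float = 1.3) -> int:
    """Word-based estimate: whitespace-separated words times ~1.3 tokens per word."""
    words = text.split()
    if not words:
        return 0
    return max(1, round(len(words) * tokens_per_word))


def lossy_ratio(original: str, summary: str) -> float:
    """Ratio of estimated tokens; lossy, because the summary drops information."""
    return estimate_tokens_words(original) / max(1, estimate_tokens_words(summary))
```

Both estimators are rough approximations. A real tokenizer would give different counts, so any ratio computed this way is best reported as an estimate for a lossy summary.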
bensig 71fb66d687 fix: room detection checks keywords against folder paths
detect_room() now matches folder path parts against room keywords,
not just the room name. Fixes docs/ files routing to general instead
of documentation room — "docs" wasn't a substring of "documentation"
but is now matched via the persisted keywords list.

Found during end-to-end testing after merging #108 keyword persistence.
2026-04-07 14:06:56 -07:00
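
A rough sketch of the matching described in 71fb66d687, assuming rooms are given as a mapping from room name to a persisted keyword list. The function name, signature, and data shape are illustrative assumptions, not the real `detect_room()`.

```python
from pathlib import PurePosixPath


def detect_room_sketch(rel_path: str, rooms: dict, default: str = "general") -> str:
    """Return the first room whose name or keywords match a folder in rel_path."""
    # Only folder parts are considered; the file name itself is ignored here.
    folders = [part.lower() for part in PurePosixPath(rel_path).parts[:-1]]
    for room, keywords in rooms.items():
        candidates = [room.lower()] + [k.lower() for k in keywords if k]
        for folder in folders:
            if any(folder in c or c in folder for c in candidates):
                return room
    return default


rooms_with_keywords = {"documentation": ["docs", "guides"], "backend": ["api"]}
rooms_name_only = {"documentation": [], "backend": []}
```

With the name-only mapping, `docs/setup.md` falls through to the default room, because "docs" is not a substring of "documentation". With keywords present, the same path resolves to `documentation`.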
Ben Sigman 3068f75c2d Merge pull request #22 from sheetsync/bugfix/split-known-names-loading
refactor: consolidate split known-names config loading
2026-04-07 13:58:54 -07:00
Ben Sigman a59df81611 Merge pull request #66 from MARUCIE/fix/sqlite-batch-reads
fix: batch ChromaDB reads to avoid SQLite variable limit
2026-04-07 13:58:52 -07:00
Ben Sigman cea34366f5 Merge pull request #84 from AlexeySamosadov/fix/mcp-integer-coercion
fix: coerce MCP integer arguments to native Python int
2026-04-07 13:58:49 -07:00
Ben Sigman 0b0e123f42 Merge pull request #61 from adv3nt3/feat/codex-cli-normalizer
feat: add OpenAI Codex CLI JSONL normalizer
2026-04-07 13:54:08 -07:00
Ben Sigman 6af6fe3dda Merge pull request #54 from adv3nt3/fix/narrow-exception-handling
fix: narrow bare except Exception to specific types where safe
2026-04-07 13:54:05 -07:00
Ben Sigman e8f9b47e31 Merge pull request #16 from sheetsync/bugfix/version-consistency
fix: unify package and MCP version reporting
2026-04-07 13:54:03 -07:00
Ben Sigman 1c62045b22 Merge pull request #21 from sheetsync/bugfix/mcp-docs-alignment
docs: align MCP setup examples with shipped server
2026-04-07 13:54:00 -07:00
Ben Sigman f1f0a4c966 Merge pull request #129 from minimexat/fix/windows-unicode-encoding
fix: replace Unicode separator character for Windows compatibility
2026-04-07 13:51:58 -07:00
Ben Sigman 30699b85a9 Merge pull request #42 from adv3nt3/fix/entity-registry-dead-code
fix: remove dead code and duplicate set items in entity_registry.py
2026-04-07 13:51:55 -07:00
Ben Sigman 6aa4272b65 Merge pull request #53 from adv3nt3/fix/md5-usedforsecurity-miners
fix: mark MD5 as non-security in miner drawer ID generation
2026-04-07 13:51:52 -07:00
Ben Sigman 3e6fc6ed9f Merge pull request #83 from renatoliveira/main
fix: update input prompt for entity confirmation in entity_detector.py
2026-04-07 13:51:50 -07:00
f-hoedl d214f6a854 fix: replace Unicode separator in convo_miner.py for Windows compatibility
Replace the ─ (U+2500) separator character with - in convo_miner.py.
Windows terminals using cp1252 encoding raise UnicodeEncodeError when
printing this character unless PYTHONUTF8=1 is set explicitly.

Fixes crash on Windows: UnicodeEncodeError: 'charmap' codec can't encode
character '\u2500'
2026-04-07 21:55:34 +02:00
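
The failure mode behind d214f6a854 can be reproduced without a Windows terminal by encoding the separator by hand. The helper name below is hypothetical and the separator width is arbitrary.

```python
SEPARATOR_UNICODE = "\u2500" * 40  # box-drawing character, as used before the fix
SEPARATOR_ASCII = "-" * 40         # plain hyphen, as used after the fix


def encodable(text: str, encoding: str) -> bool:
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

cp1252 has no mapping for U+2500, so `print()` on a cp1252 console raises the same `UnicodeEncodeError` that `encode` raises here. The ASCII separator encodes under either codec.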
bensig caa1169f04 fix: --yes flag now skips room confirmation in init
Pass yes flag through to detect_rooms_local so init --yes
skips both entity detection AND room approval prompts.
Agents and CI can now run init without interactive input.

Fixes #8
2026-04-07 12:16:46 -07:00
Ben Sigman 01a21dd60f Merge pull request #78 from ac-opensource/feature/respect-gitignore-mining
Respect nested .gitignore rules when mining project files
2026-04-07 12:15:23 -07:00
bensig 5e8a039e7c fix: repair command, split args, Claude export, room keywords
- Add `mempalace repair` command to rebuild vector index from SQLite
  when HNSW files are corrupted after crash/interrupt (fixes #74, #72, #96)
- Fix split command passing dir as positional instead of --source
  flag to split_mega_files (fixes #63)
- Handle Claude privacy export format (array of conversation objects
  with chat_messages inside each) in normalize.py (fixes #63)
- Persist room keywords in mempalace.yaml so mine can match files
  in docs/ to room "documentation" (fixes #108)
2026-04-07 12:02:34 -07:00
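
The Claude export handling mentioned in 5e8a039e7c might look roughly like the sketch below. The array-of-conversations shape and the `chat_messages` key come from the commit message; the `sender` and `text` field names and the function name are assumptions made for illustration.

```python
def flatten_claude_export(data):
    """Flatten a list of conversation objects into (role, text) turns."""
    if not isinstance(data, list):
        return []
    turns = []
    for convo in data:
        if not isinstance(convo, dict):
            continue
        for msg in convo.get("chat_messages") or []:
            if not isinstance(msg, dict):
                continue
            sender = msg.get("sender")  # assumed field name
            text = msg.get("text")      # assumed field name
            if sender in ("human", "assistant") and isinstance(text, str) and text.strip():
                role = "user" if sender == "human" else "assistant"
                turns.append((role, text.strip()))
    return turns


EXPORT_SAMPLE = [
    {"name": "first", "chat_messages": [
        {"sender": "human", "text": "hello"},
        {"sender": "assistant", "text": "hi there"},
    ]},
    {"name": "empty", "chat_messages": []},
]
```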
bensig 186bb2e3d1 fix: shell injection in hooks, Claude Code mining, chromadb pin
- hooks/mempal_save_hook.sh: pass $TRANSCRIPT_PATH as sys.argv
  instead of interpolating into python -c string (fixes #110)
- normalize.py: accept type "user" in addition to "human" for
  Claude Code JSONL sessions (fixes #111)
- convo_miner.py: skip tool-results/, memory/ dirs and .meta.json
  files when scanning for conversations (fixes #111)
- pyproject.toml: pin chromadb>=0.4.0,<1 to avoid chromadb 1.x builds,
  which crash on macOS ARM64 (fixes #100)
2026-04-07 11:45:51 -07:00
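
The hook fix in 186bb2e3d1 lives in a shell script, but the underlying idea can be sketched in Python: pass the untrusted path as a separate argument instead of pasting it into the code string. All names here are hypothetical, and the unsafe string is only built, never executed.

```python
import subprocess
import sys

# Fixed program text: it reads its input from sys.argv and never changes.
ECHO_ARG = "import sys; sys.stdout.write(sys.argv[1])"


def build_unsafe_snippet(path: str) -> str:
    """Anti-pattern: the path becomes part of the program text itself."""
    return f"print(open('{path}').read())"


def run_with_argv(path: str) -> str:
    """Safer pattern: the path travels as data in argv, not as code."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARG, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


HOSTILE_PATH = "x'); import os; os.system('echo pwned'); ('"
```

With the interpolated form, a crafted path closes the string literal and injects its own statements. With the argv form, the same path is treated as an ordinary string.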
ac-opensource c8c220d789 fix: support nested .gitignore rules during mining
2026-04-08 00:02:21 +08:00
Alexey Samosadov 8fbb6178dd fix: coerce MCP integer arguments to native Python int
ChromaDB requires native `int` for `n_results`, but the MCP JSON-RPC
transport can deliver JSON integers as floats or strings depending on
the client implementation. This causes `mempalace_search` (and any
tool with integer params like `max_hops`, `last_n`) to fail with:

  "Expected requested number of results to be a int, got 3 in query."

Fix: auto-coerce tool arguments based on the declared `input_schema`
types before calling handlers. This covers all current and future
tools generically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 17:48:03 +03:00
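
Schema-driven coercion as described in 8fbb6178dd could be sketched as follows. The function name is hypothetical, and the schema is assumed to be a JSON Schema object with a `properties` map, as is typical for MCP tool definitions.

```python
def coerce_arguments(arguments: dict, input_schema: dict) -> dict:
    """Convert values declared as "integer" in the schema to native int."""
    properties = input_schema.get("properties", {})
    coerced = dict(arguments)
    for name, value in arguments.items():
        declared = properties.get(name, {}).get("type")
        if declared != "integer" or isinstance(value, bool):
            continue
        if isinstance(value, float) and value.is_integer():
            coerced[name] = int(value)
        elif isinstance(value, str):
            try:
                coerced[name] = int(value.strip())
            except ValueError:
                pass  # leave unparseable values for the handler to reject
    return coerced


SEARCH_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "n_results": {"type": "integer"}},
}
```

Non-integral floats such as `2.5` are deliberately left untouched in this sketch, so the handler can report them rather than having them silently truncated.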
Renato Oliveira cfe878204e fix: update input prompt for entity confirmation in entity_detector.py
Refine the prompt for distinguishing between person and project entities by adjusting the wording for clarity.
2026-04-07 11:41:15 -03:00
ac-opensource 9b9daa9b4b fix: respect .gitignore during project mining
2026-04-07 22:26:06 +08:00
Maurice Wen 0e77981dec fix: batch ChromaDB reads to avoid SQLite variable limit
col.get() without limit generates SELECT ... WHERE id IN (...) with all
document IDs, which exceeds SQLite's ~999 variable limit when a palace
has more than ~1000 drawers.

This breaks both `mempalace compress` and `mempalace wake-up` on large
palaces. Reproduced on a 13880-file codebase (242K+ drawers).

Fix: paginate reads in batches of 500 using ChromaDB's offset/limit
parameters in both Layer1.generate() and cmd_compress().
2026-04-07 21:40:12 +08:00
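
The pagination in 0e77981dec can be sketched against any object exposing a ChromaDB-style `get(limit=..., offset=..., include=...)`. The helper name is hypothetical, and `FakeCollection` is a stand-in so the example runs without a database.

```python
def get_all_batched(col, batch_size: int = 500, include=("documents", "metadatas")):
    """Read every record from a collection in fixed-size pages."""
    ids, documents, metadatas = [], [], []
    offset = 0
    while True:
        batch = col.get(limit=batch_size, offset=offset, include=list(include))
        batch_ids = batch["ids"]
        ids.extend(batch_ids)
        documents.extend(batch.get("documents") or [])
        metadatas.extend(batch.get("metadatas") or [])
        if len(batch_ids) < batch_size:
            break  # a short (or empty) page means the collection is exhausted
        offset += len(batch_ids)
    return {"ids": ids, "documents": documents, "metadatas": metadatas}


class FakeCollection:
    """In-memory stand-in that mimics limit/offset paging."""

    def __init__(self, count: int):
        self._ids = [f"drawer_{i}" for i in range(count)]
        self.limits_seen = []

    def get(self, limit=None, offset=None, include=None):
        self.limits_seen.append(limit)
        start = offset or 0
        end = len(self._ids) if limit is None else start + limit
        page = self._ids[start:end]
        return {
            "ids": page,
            "documents": [f"text for {i}" for i in page],
            "metadatas": [{} for _ in page],
        }
```

Each request stays at or below the batch size, which is how the commit keeps reads under the SQLite variable limit it describes.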
adv3nt3 d4e1945f77 feat: add OpenAI Codex CLI JSONL normalizer
Add _try_codex_jsonl parser for Codex CLI session files stored at
~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl.

Uses only event_msg entries (user_message / agent_message) which
represent the canonical conversation turns. response_item entries
are intentionally skipped — they include synthetic context injections
(environment_context) and can duplicate real messages when both
representations are present in the same rollout.

Format based on Codex source tests (codex-rs/rollout/src/recorder_tests.rs).
Requires session_meta header to reduce false positives on other JSONL.

Refs: #59
2026-04-07 14:50:04 +02:00
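
A rough sketch of the filtering rule in d4e1945f77. The entry names (`session_meta`, `event_msg`, `response_item`, `user_message`, `agent_message`) come from the commit message. The exact JSON layout, with a `payload` object carrying `type` and `message`, is an assumption, and the function is not the project's `_try_codex_jsonl`.

```python
import json

ROLE_BY_EVENT = {"user_message": "user", "agent_message": "assistant"}


def parse_codex_rollout(text: str):
    """Return (role, message) turns, or None if no session_meta entry is present."""
    turns = []
    saw_session_meta = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        payload = entry.get("payload")
        if not isinstance(payload, dict):
            payload = {}
        if entry.get("type") == "session_meta":
            saw_session_meta = True
        elif entry.get("type") == "event_msg":
            role = ROLE_BY_EVENT.get(payload.get("type"))
            message = payload.get("message")
            if role and isinstance(message, str) and message.strip():
                turns.append((role, message.strip()))
        # response_item entries are skipped on purpose (see the commit message).
    return turns if saw_session_meta else None


ROLLOUT_SAMPLE = "\n".join(json.dumps(entry) for entry in [
    {"type": "session_meta", "payload": {"id": "example"}},
    {"type": "response_item", "payload": {"type": "message", "content": "env"}},
    {"type": "event_msg", "payload": {"type": "user_message", "message": "fix the bug"}},
    {"type": "event_msg", "payload": {"type": "agent_message", "message": "done"}},
])
```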
adv3nt3 312d380aab fix: narrow bare except Exception to specific types where safe
Replace broad except Exception with specific exception types in 6
sites where the expected failure mode is well-defined:

- normalize.py: OSError for file read, ImportError for optional import
- miner.py: OSError for file read_text
- entity_detector.py: OSError for file read in scan loop
- convo_miner.py: (OSError, ValueError) for normalize which reads
  and parses files
- entity_registry.py: (URLError, OSError, JSONDecodeError, KeyError)
  for Wikipedia lookup fallback

ChromaDB except Exception sites (~30) are left broad for now.
chromadb.errors defines NotFoundError, DuplicateIDError,
InvalidDimensionException etc., but narrowing those sites requires
importing from chromadb.errors and validating across supported
versions (>=0.4.0). MCP server handlers also left broad for
resilience.
2026-04-07 13:51:27 +02:00
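
The narrowing pattern from 312d380aab, sketched for the file-read case. The helper name is hypothetical; the point is that only `OSError` (missing file, permission denied, path is a directory) is treated as an expected failure, so unrelated bugs still surface.

```python
from pathlib import Path
from typing import Optional


def read_text_or_none(path) -> Optional[str]:
    """Return file contents, or None when the file cannot be read."""
    try:
        return Path(path).read_text(encoding="utf-8", errors="replace")
    except OSError:
        # Expected failure mode for file reads; anything else propagates.
        return None
```

A `TypeError` raised by passing a bad argument would have been swallowed by `except Exception`, but here it propagates to the caller.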
adv3nt3 3a2817505a fix: mark MD5 as non-security in miner drawer ID generation
Add usedforsecurity=False to hashlib.md5() calls in miner.py and
convo_miner.py to document that MD5 is used for deterministic ID
generation, not cryptographic security. Preserves stable drawer IDs
for backward compatibility with existing palaces.

Swapping to SHA-256 would change the ID formula and make existing
drawers unreachable on re-ingestion. PR #34 covers the MD5 sites
in knowledge_graph.py and mcp_server.py.

Verified: usedforsecurity kwarg is supported since Python 3.9
(project target per pyproject.toml line 10), confirmed via Context7
CPython docs.
2026-04-07 13:41:00 +02:00
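
A minimal illustration of the `usedforsecurity=False` flag from 3a2817505a. The ID formula and function name below are made up for the example; the project's real drawer ID scheme is not shown in the log.

```python
import hashlib


def drawer_id(source_path: str, chunk_index: int) -> str:
    """Deterministic, non-cryptographic ID derived from path and chunk index."""
    key = f"{source_path}:{chunk_index}".encode("utf-8")
    # usedforsecurity=False (Python 3.9+) documents that the digest is an
    # identifier, not a security primitive; the digest value is unchanged.
    digest = hashlib.md5(key, usedforsecurity=False).hexdigest()
    return f"drawer_{digest[:16]}"
```

Because the flag does not alter the digest, IDs generated before and after the change stay identical, which is the backward-compatibility property the commit relies on.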
adv3nt3 3c78e2fbb5 fix: remove dead code and duplicate set items in entity_registry.py
Remove discarded `query.lower()` call in `extract_people_from_query` —
strings are immutable so the result was always thrown away. The existing
`re.IGNORECASE` flag already handles case-insensitive matching.

Remove duplicate literals in COMMON_ENGLISH_WORDS set: "hunter" (consecutive
duplicate), "april" and "june" (appeared in both names and months sections).
2026-04-07 13:00:59 +02:00
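
The dead-code point in 3c78e2fbb5 comes down to `str.lower()` returning a new string rather than modifying its receiver. A small sketch, with a hypothetical pattern and function name:

```python
import re

NAME_PATTERN = re.compile(r"\b(alice|bob)\b", re.IGNORECASE)


def extract_names(query: str) -> list:
    """Find known names; case-insensitivity comes from the regex flag."""
    query.lower()  # no effect: the lowered copy is discarded immediately
    return NAME_PATTERN.findall(query)


original = "Did ALICE email Bob?"
found = extract_names(original)
```

Removing the discarded call changes nothing observable, since matching already ignores case and the input string is never mutated.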
James Cane 0808ad96c2 refactor: consolidate split known-names config loading
2026-04-07 09:16:07 +01:00
James Cane 1557eaa2f5 docs: align MCP setup examples with shipped server
2026-04-07 09:15:16 +01:00
James Cane 55152ce476 fix: unify package and MCP version reporting
2026-04-07 08:53:25 +01:00
bensig 6d8c462219 fix: resolve ruff lint and format errors across codebase
Fix E402 import ordering, F841 unused variable, F541 unnecessary
f-strings, F401 unused import, and auto-format 6 files.
2026-04-04 18:37:17 -07:00
Milla Jovovich 068dbd9a7b MemPalace: palace architecture, AAAK compression, knowledge graph
The memory system:
- Palace structure: Wings (people/projects) → Rooms (topics) → Closets (AAAK compressed) → Drawers (verbatim transcripts)
- Halls connect related rooms within a wing
- Tunnels cross-reference rooms across wings
- AAAK: 30x lossless compression dialect for AI agents
- Knowledge graph: temporal entity-relationship triples (SQLite)
- Palace graph: room-based navigation with tunnel detection
- MCP server: 19 tools — search, graph traversal, agent diary, AAAK auto-teach
- Onboarding: guided setup generates wing config + AAAK entity registry
- Contradiction detection: catches wrong pronouns, names, ages
- Auto-save hooks for Claude Code

96.6% Recall@5 on LongMemEval — highest zero-API score published.
100% with optional Haiku rerank (500/500).
Local. Free. No API key required.
2026-04-04 18:16:04 -07:00