Files
mempalace/mempalace
Igor Lins e Silva 6b7dcc53d4 merge: pr/closet-llm-generic + harden LLM regen path for production
Brings in PR #793 (optional LLM-based closet regeneration via
user-configured OpenAI-compatible endpoint) and PR #795 (hybrid
closet+drawer search — closets boost, never gate). Stack: #784#788#789#790#791#792#793 (+ #795).

Findings hardened on our side
─────────────────────────────

1) closet_llm.regenerate_closets didn't use the blessed palace helpers.

   Before:
     * manual closets_col.get(where=...) + .delete(ids=...) with a
       silent ``except Exception: pass`` around both — if the purge
       failed, pre-existing regex closets survived alongside fresh LLM
       closets, giving the searcher double hits for the same source.
     * ``source.split('/')[-1][:30]`` to build the closet_id — quietly
       wrong on Windows paths (``C:\\proj\\a.md`` has no ``/``, so the
       whole string ends up in the ID).
     * no mine_lock around purge+upsert — a concurrent regex rebuild of
       the same source could interleave with our purge and leave a mix
       of regex and LLM pointers.
     * no ``normalize_version`` stamp on the LLM closets — the miner's
       stale-version gate would treat them as leftovers from an older
       schema and rebuild over them on the next mine.

   After: routes through ``purge_file_closets`` + ``mine_lock`` +
   ``os.path.basename`` + ``NORMALIZE_VERSION`` stamp. Regression tests
   cover each.

2) searcher.search_memories was still closet-first.

   PR #795 merged into #793's head to fix the recall regression
   documented in that PR (R@1 0.25 on narrative content vs. 0.42
   baseline). The hybrid design makes closets a ranking boost rather
   than a gate: drawers are always queried at the floor, and matching
   closet hits (rank 0-4 within CLOSET_DISTANCE_CAP=1.5) add a boost
   of 0.40/0.25/0.15/0.08/0.04 to the effective distance.

   Merged to take the incoming hybrid design, with two cleanups:
   * kept the ``_expand_with_neighbors`` / ``_extract_drawer_ids_from_closet``
     helpers as separately-tested utilities (still imported by tests
     and future callers);
   * replaced the fragile ``source_file.endswith(basename)`` reverse-
     lookup in the enrichment step with internal ``_source_file_full``
     / ``_chunk_index`` fields stripped before return, so enrichment
     doesn't silently pick the wrong path when two sources share a
     basename across directories;
   * drawer-grep enrichment now sorts by ``chunk_index`` before
     neighbor expansion, so ``best_idx ± 1`` corresponds to actual
     document order rather than whatever order Chroma returned.

3) Closet-first tests in test_closets.py (``TestSearchMemoriesClosetFirst``,
   end-to-end ``test_closet_first_search_includes_drawer_index_and_total``)
   pinned contracts that the hybrid path now violates (``matched_via``
   went from ``"closet"`` to ``"drawer+closet"``). Rewrote them around
   the new invariant: direct drawers are always the floor, closet
   agreement flips the hit's matched_via and exposes closet_preview.

Verification
────────────

* 805/805 pass under ``uv run pytest tests/ -v --ignore=tests/benchmarks``
  (13 new tests from PR #793 + 5 from PR #795 + 2 new regressions for
  the closet_llm hardening + the rewritten hybrid assertions in
  test_closets.py).
* CI-pinned ruff 0.4.x clean on ``mempalace/`` + ``tests/`` (check +
  format both pass).
* No new deps — closet_llm.py still uses stdlib ``urllib.request`` per
  the PR's "zero new dependencies" promise.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>
2026-04-13 18:40:36 -03:00
..
2026-04-10 08:49:35 -07:00
2026-04-10 08:49:35 -07:00
2026-04-13 15:46:27 -03:00

mempalace/ — Core Package

The Python package that powers MemPalace. All modules, all logic.

Modules

Module What it does
cli.py CLI entry point — routes to mine, search, init, compress, wake-up
config.py Configuration loading — ~/.mempalace/config.json, env vars, defaults
normalize.py Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format
miner.py Project file ingest — scans directories, chunks by paragraph, stores to ChromaDB
convo_miner.py Conversation ingest — chunks by exchange pair (Q+A), detects rooms from content
searcher.py Semantic search via ChromaDB vectors — filters by wing/room, returns verbatim + scores
layers.py 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search)
dialect.py AAAK compression — entity codes, emotion markers, 30x lossless ratio
knowledge_graph.py Temporal entity-relationship graph — SQLite, time-filtered queries, fact invalidation
palace_graph.py Room-based navigation graph — BFS traversal, tunnel detection across wings
mcp_server.py MCP server — 19 tools, AAAK auto-teach, Palace Protocol, agent diary
onboarding.py Guided first-run setup — asks about people/projects, generates AAAK bootstrap + wing config
entity_registry.py Entity code registry — maps names to AAAK codes, handles ambiguous names
entity_detector.py Auto-detect people and projects from file content
general_extractor.py Classifies text into 5 memory types (decision, preference, milestone, problem, emotional)
room_detector_local.py Maps folders to room names using 70+ patterns — no API
spellcheck.py Name-aware spellcheck — won't "correct" proper nouns in your entity registry
split_mega_files.py Splits concatenated transcript files into per-session files

Architecture

User → CLI → miner/convo_miner → ChromaDB (palace)
                                     ↕
                              knowledge_graph (SQLite)
                                     ↕
User → MCP Server → searcher → results
                  → kg_query → entity facts
                  → diary    → agent journal

The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool.