Commit Graph

13 Commits

Author SHA1 Message Date
jp 5b07b869b0 fix(palace_graph): skip None metadata in build_graph
ChromaDB can return None for drawers without metadata (legacy data,
partial writes — same root cause as upstream #1020 / our PR #1094).
build_graph at line 95 called meta.get("room", "") unconditionally,
which AttributeErrors on None and takes out every consumer of
build_graph for the whole call path: graph_stats, find_tunnels,
traverse, and (most visibly) the daemon's /stats endpoint.

Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test
against the canonical 151K-drawer palace — /stats was 500-ing on a
single None drawer.

Adds `if meta is None: continue` guard. Closes the same gap upstream's
#999 None-metadata audit closed in searcher.py / mcp_server.py /
miner.status, just in a different file the audit didn't reach. The
graph-build is recoverable: skipping a single None drawer doesn't
distort the graph since build_graph already filters
`room and room != "general" and wing` — a missing-metadata drawer was
never going to participate anyway.

Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None
entry into a 3-drawer fixture and asserts the two real drawers are
processed normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 11:06:32 -07:00
Igor Lins e Silva 91a60263e3 Merge pull request #1168 from arnoldwender/fix/security-tunnels-permissions
fix(security): restrict tunnels.json file permissions
2026-04-25 04:21:44 -03:00
Igor Lins e Silva 865a36bc5c feat(graph): namespace topic-tunnel rooms with "topic:" prefix + kind field
Previously a cross-wing topic tunnel for "Angular" stored the room as
"Angular" — colliding with a wing's literal folder-derived "Angular" room
at follow_tunnels/list_tunnels read time, and exposing raw topic strings
(which may contain characters rejected by sanitize_name) to the MCP
surface.

Topic tunnels now store their room as "topic:<original-casing>" and carry
kind="topic" on the stored dict. Explicit tunnels get kind="explicit"
(default). follow_tunnels("wing", "Angular") on a literal Angular room
no longer surfaces topic connections for the same name, and any LLM
scanning list_tunnels has a visible discriminator.
2026-04-24 23:06:26 -03:00
Igor Lins e Silva fe051adc73 feat(graph): cross-wing tunnels by shared topics (#1180)
When two wings have one or more confirmed TOPIC labels in common, the
miner now drops a symmetric tunnel between them at mine time so the
palace graph reflects shared themes (frameworks, vendors, recurring
concepts).

- llm_refine: TOPIC label routes to a dedicated `topics` bucket so the
  signal survives confirmation instead of getting collapsed into
  `uncertain` and dropped.
- entity_detector / project_scanner: bucket plumbed through the
  detection pipeline; `confirm_entities` returns confirmed topics
  alongside people/projects.
- miner.add_to_known_entities: optional `wing` parameter records the
  confirmed topics under `topics_by_wing` in
  `~/.mempalace/known_entities.json`. Wing names do NOT leak into the
  flat known-name set used by drawer-tagging.
- palace_graph: `compute_topic_tunnels` and `topic_tunnels_for_wing`
  create symmetric tunnels via the existing `create_tunnel` API so they
  share dedup and persistence with explicit tunnels.
- miner.mine: post-file-loop pass calls `topic_tunnels_for_wing` for
  the freshly-mined wing. Failures are logged but never abort the mine.
- config: `topic_tunnel_min_count` knob (env
  `MEMPALACE_TOPIC_TUNNEL_MIN_COUNT` or `~/.mempalace/config.json`),
  default 1.

Tests cover topic persistence through init->mine, tunnel creation when
wings share a topic, no tunnel below threshold, cross-wing tunnel
retrieval via `list_tunnels`, dedup on recompute, case-insensitive
overlap, and the end-to-end mine-time wiring.

Out of scope for this PR (called out in the PR body): manifest-
dependency overlap, per-topic allow/deny lists, search-result surfacing.
2026-04-24 23:06:26 -03:00
Arnold Wender 5fd09d3693 fix(security): restrict tunnels.json file permissions
~/.mempalace/tunnels.json (introduced in #790) was created via plain
open(..., "w") with no chmod, and its parent dir via os.makedirs()
without mode=0o700. On Linux with default umask 022 both end up
world-readable (0o644 / 0o755).

Tunnels reveal cross-wing connections — which projects, people, and
rooms the user has explicitly linked — so they are sensitive metadata
that should not be readable by other local users on shared systems.

Apply the same 0o700 / 0o600 pattern that #814 established for the
other sensitive palace files. Chmod calls are wrapped in try/except
(OSError, NotImplementedError) for Windows / unsupported-filesystem
compatibility.

Closes #1165
2026-04-24 22:57:34 +02:00
jp 8adf35a13c fix: add threading lock to graph cache, expand docstring
Address review feedback from @bensig:
1. Wrap cache reads/writes in threading.Lock for thread safety
2. Promote the col-arg caveat from inline comment to docstring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 09:00:36 -07:00
jp 1657a79649 fix: clarify cache docs, skip caching empty graphs
Addresses Copilot review feedback on #661.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 09:00:27 -07:00
jp 84e2aa16e4 perf: graph cache with write-invalidation in build_graph()
build_graph() scans every drawer's metadata in 1000-item batches on
every call — O(n) per graph build with no caching. At 50K+ drawers
this costs several seconds per MCP tool call (traverse, find_tunnels,
graph_stats all call build_graph on every invocation).

Add a module-level cache (nodes + edges + timestamp) with a 60-second
TTL. Cache is invalidated via invalidate_graph_cache(), exported for
write operations to call. Tests updated with setup_method cache resets
and two new tests verifying cache hit and invalidation behaviour.
2026-04-16 09:00:27 -07:00
Igor Lins e Silva 20255b05be merge: develop + harden cross-wing tunnels for production
Merges the hardened closet/entity/BM25/diary stack from #789 and fixes
five correctness/durability issues in the tunnels module plus the
directional/symmetric design question.

## Design: tunnels are now symmetric

Per review discussion: a tunnel represents "these two things relate",
not "A causes B". The canonical ID now hashes the *sorted* endpoint
pair, so ``create_tunnel(A, B)`` and ``create_tunnel(B, A)`` resolve to
the same record and the second call updates the label rather than
creating a duplicate. ``follow_tunnels`` can be called from either
endpoint and surfaces the other side consistently.

The returned dict still preserves ``source``/``target`` in the order
the caller supplied, so UIs that want to render the connection
directionally can do so.

## Correctness fixes

* **Atomic write** — ``_save_tunnels`` writes to ``tunnels.json.tmp``
  and ``os.replace``s it into place. A crash mid-write can no longer
  leave a truncated file that silently reads back as ``[]`` and wipes
  every tunnel. Includes ``f.flush() + os.fsync`` before replace on
  platforms that support it.
* **Concurrent-write lock** — ``create_tunnel`` and ``delete_tunnel``
  wrap the load→mutate→save cycle in ``mine_lock(_TUNNEL_FILE)``.
  Without this, two agents creating tunnels simultaneously would both
  read the same snapshot and the later writer would drop the earlier
  writer's tunnel.
* **Corrupt-file tolerance** — ``_load_tunnels`` now uses a context
  manager, validates that the loaded JSON is a list, and returns ``[]``
  for any read failure. Subsequent ``create_tunnel`` then overwrites
  the corrupt file via atomic write — no manual recovery needed.
* **Input validation** — new ``_require_name`` helper rejects empty or
  whitespace-only wing/room names with a clear ``ValueError``. Prevents
  phantom tunnels with blank endpoints from ever reaching the JSON
  store.
* **Timezone-aware timestamps** — ``created_at`` / ``updated_at`` now
  use ``datetime.now(timezone.utc).isoformat()``, matching diary ingest
  and other recent modules.

## Tests (12 in TestTunnels)

5 original + 7 regression cases:
* ``test_tunnel_is_symmetric`` — A↔B and B↔A dedupe to one record.
* ``test_follow_tunnels_works_from_either_endpoint`` — symmetric surface.
* ``test_empty_endpoint_fields_rejected`` — validation guard.
* ``test_corrupt_tunnel_file_does_not_lose_new_writes`` — truncated
  JSON treated as empty; next create persists cleanly.
* ``test_atomic_write_leaves_no_stray_tmp_file`` — no leftover ``.tmp``.
* ``test_concurrent_creates_preserve_all_tunnels`` — 5 threads each
  create a distinct tunnel; all 5 persisted (regression for the
  read-modify-write race).
* ``test_created_at_is_timezone_aware`` — ISO8601 has tz suffix.

Merge resolutions: tests/test_closets.py combined develop's hardened
closet/entity/BM25/diary tests with this PR's TestTunnels class.

755/755 tests pass. ruff + format clean under CI-pinned 0.4.x.
2026-04-13 17:50:43 -03:00
MSL 1b4ce0b1f8 feat: explicit cross-wing tunnels for multi-project agents
Adds active tunnel creation alongside passive tunnel discovery.

Passive tunnels (existing): rooms with the same name across wings.
Explicit tunnels (new): agent-created links between specific
locations. "This API design in project_api relates to the database
schema in project_database."

New functions in palace_graph.py:
- create_tunnel() — link two wing/room pairs with a label
- list_tunnels() — list all explicit tunnels, filter by wing
- delete_tunnel() — remove a tunnel by ID
- follow_tunnels() — from a room, find all connected rooms in
  other wings with drawer content previews

New MCP tools:
- mempalace_create_tunnel
- mempalace_list_tunnels
- mempalace_delete_tunnel
- mempalace_follow_tunnels

Tunnels stored in ~/.mempalace/tunnels.json (persists across
palace rebuilds). Deduplicated by endpoint pair.

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 07:43:48 -03:00
Sergey Kuznetsov ae5196bc8d Мempalace backend seam (#413)
* refactor: add stage-1 backend abstraction seam

Introduce the first upstreamable storage seam for MemPalace without
bringing in the PostgreSQL spike or any benchmark artifacts.

This change adds a small backend package with:
- BaseCollection as the minimal collection contract
- ChromaBackend/ChromaCollection as the default implementation

It then routes the main runtime collection consumers through that seam:
- palace.py
- searcher.py
- layers.py
- palace_graph.py
- mcp_server.py
- miner.status()

Behavioral constraints kept for stage 1:
- ChromaDB remains the only backend and the default path
- no config/env backend selection yet
- no PostgreSQL code
- no benchmark or research files
- existing tests stay unchanged

Important compatibility details:
- read paths now call the seam with create=False so they still surface
  the existing 'no palace found' behavior instead of silently creating
  empty collections
- write paths keep create=True semantics through palace.get_collection()
- layers/searcher retain a chromadb module attribute so the existing
  mock-based tests can keep patching PersistentClient unchanged
- ChromaBackend only creates palace directories on create=True, which
  preserves mocked read-path tests that use fake read-only paths

Verification:
- python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py
- pytest -q  # 529 passed, 106 deselected

* refactor: clean up stage-1 seam compatibility shims

Tighten the stage-1 backend abstraction branch after review.

This follow-up does three small things:
- keep the chromadb compatibility hook in searcher.py and layers.py,
  but express it through the backends.chroma module so it no longer
  reads like an accidental unused import
- fix the palace_graph.py helper alias to avoid the local name collision
  flagged by ruff (imported helper vs local _get_collection wrapper)
- preserve the existing mock-based test patch points unchanged while
  keeping the new backend seam intact

Why this matters:
- the direct  form looked like a
  dead import in review, even though it was intentionally preserving the
  existing test seam ( and
  )
- palace_graph.py had a real lint issue ( redefinition) that was
  small but worth fixing before a public PR

Verification:
- /opt/homebrew/bin/ruff check mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py
- pytest -q tests/test_layers.py tests/test_searcher.py
- pytest -q  # 529 passed, 106 deselected

* docs: explain backend shim imports in search paths

Add short code comments in searcher.py and layers.py explaining why the
module-level `chromadb` alias remains after the stage-1 backend seam
refactor.

The alias is intentional: it preserves the existing mock patch points used
by the current test suite (`mempalace.searcher.chromadb.PersistentClient`
and `mempalace.layers.chromadb.PersistentClient`) while the runtime logic
now flows through the backend abstraction.

This keeps the public PR easier to review because the apparent "unused
import" now has an explicit reason next to it.

Verification:
- /opt/homebrew/bin/ruff check mempalace/searcher.py mempalace/layers.py
- pytest -q tests/test_layers.py tests/test_searcher.py

* refactor: reuse a default backend instance in palace helper

Tighten the stage-1 backend seam by promoting the default Chroma backend
adapter to a module-level singleton in `mempalace/palace.py`.

This keeps the stage-1 scope unchanged — Chroma is still the only backend
wired in this branch — but avoids constructing a fresh `ChromaBackend()`
object on every `get_collection()` call. The backend is stateless today,
so this is a readability/cleanup change rather than a behavioral one.

Why this helps:
- makes `palace.get_collection()` read like a real default factory instead
  of an inline constructor call
- keeps the stage-1 branch a little cleaner before opening the public PR
- does not widen the backend surface or change any config/runtime behavior

Verification:
- python3 -m py_compile mempalace/palace.py
- pytest -q tests/test_miner.py tests/test_layers.py tests/test_searcher.py
- pytest -q  # 529 passed, 106 deselected

* fix: harden read-only seam behavior and update seam tests

Preserve the stage-1 backend abstraction while closing the real read-path
regression surfaced in PR review.

What changed:
- make ChromaBackend.get_collection(create=False) fail fast when the palace
  directory does not exist instead of letting PersistentClient create it as a
  side effect
- update miner.status() to call get_collection(..., create=False) so status
  keeps the historical 'No palace found' behavior
- remove the temporary chromadb shim aliases from layers.py and searcher.py
  now that the tests patch the seam directly
- add focused tests for the new backends package, including ChromaCollection
  delegation and ChromaBackend create=True/create=False behavior
- retarget layer/searcher tests to patch the backend seam instead of patching
  chromadb.PersistentClient inside production modules
- add a regression test that status() does not create an empty palace when the
  target path is missing

Verification:
- ruff check .
- uv run pytest -q
- uv run pytest -q tests/test_backends.py tests/test_cli.py tests/test_mcp_server.py tests/test_layers.py tests/test_searcher.py tests/test_miner.py

Notes:
- the separate benchmark/slow/stress layer was started as a soak but not used
  as the merge gate for this PR branch

* refactor: drop duplicate mcp collection cache declaration

Remove a redundant `_collection_cache = None` assignment in
`mempalace/mcp_server.py` left over after the stage-1 backend seam refactor.

This does not change behavior; it only trims review noise in the MCP server
module after the read-path hardening pass.

Verification:
- ruff check mempalace/mcp_server.py
- uv run pytest -q tests/test_mcp_server.py

---------

Co-authored-by: Sergey Kuznetsov <sergey@iterudit.com>
2026-04-11 16:16:49 -07:00
bensig 6d8c462219 fix: resolve ruff lint and format errors across codebase
Fix E402 import ordering, F841 unused variable, F541 unnecessary
f-strings, F401 unused import, and auto-format 6 files.
2026-04-04 18:37:17 -07:00
Milla Jovovich 068dbd9a7b MemPalace: palace architecture, AAAK compression, knowledge graph
The memory system:
- Palace structure: Wings (people/projects) → Rooms (topics) → Closets (AAAK compressed) → Drawers (verbatim transcripts)
- Halls connect related rooms within a wing
- Tunnels cross-reference rooms across wings
- AAAK: 30x lossless compression dialect for AI agents
- Knowledge graph: temporal entity-relationship triples (SQLite)
- Palace graph: room-based navigation with tunnel detection
- MCP server: 19 tools — search, graph traversal, agent diary, AAAK auto-teach
- Onboarding: guided setup generates wing config + AAAK entity registry
- Contradiction detection: catches wrong pronouns, names, ages
- Auto-save hooks for Claude Code

96.6% Recall@5 on LongMemEval — highest zero-API score published.
100% with optional Haiku rerank (500/500).
Local. Free. No API key required.
2026-04-04 18:16:04 -07:00