ae5196bc8d
* refactor: add stage-1 backend abstraction seam

Introduce the first upstreamable storage seam for MemPalace without bringing in the PostgreSQL spike or any benchmark artifacts.

This change adds a small backend package with:
- BaseCollection as the minimal collection contract
- ChromaBackend/ChromaCollection as the default implementation

It then routes the main runtime collection consumers through that seam:
- palace.py
- searcher.py
- layers.py
- palace_graph.py
- mcp_server.py
- miner.status()

Behavioral constraints kept for stage 1:
- ChromaDB remains the only backend and the default path
- no config/env backend selection yet
- no PostgreSQL code
- no benchmark or research files
- existing tests stay unchanged

Important compatibility details:
- read paths now call the seam with create=False so they still surface the existing 'no palace found' behavior instead of silently creating empty collections
- write paths keep create=True semantics through palace.get_collection()
- layers/searcher retain a chromadb module attribute so the existing mock-based tests can keep patching PersistentClient unchanged
- ChromaBackend only creates palace directories on create=True, which preserves mocked read-path tests that use fake read-only paths

Verification:
- python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py
- pytest -q  # 529 passed, 106 deselected

* refactor: clean up stage-1 seam compatibility shims

Tighten the stage-1 backend abstraction branch after review.

This follow-up does three small things:
- keep the chromadb compatibility hook in searcher.py and layers.py, but express it through the backends.chroma module so it no longer reads like an accidental unused import
- fix the palace_graph.py helper alias to avoid the local name collision flagged by ruff (imported helper vs local _get_collection wrapper)
- preserve the existing mock-based test patch points unchanged while keeping the new backend seam intact

Why this matters:
- the direct import form looked like a dead import in review, even though it was intentionally preserving the existing test seam
- palace_graph.py had a real lint issue (a redefinition) that was small but worth fixing before a public PR

Verification:
- /opt/homebrew/bin/ruff check mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py
- pytest -q tests/test_layers.py tests/test_searcher.py
- pytest -q  # 529 passed, 106 deselected

* docs: explain backend shim imports in search paths

Add short code comments in searcher.py and layers.py explaining why the module-level `chromadb` alias remains after the stage-1 backend seam refactor.

The alias is intentional: it preserves the existing mock patch points used by the current test suite (`mempalace.searcher.chromadb.PersistentClient` and `mempalace.layers.chromadb.PersistentClient`) while the runtime logic now flows through the backend abstraction.

This keeps the public PR easier to review because the apparent "unused import" now has an explicit reason next to it.

Verification:
- /opt/homebrew/bin/ruff check mempalace/searcher.py mempalace/layers.py
- pytest -q tests/test_layers.py tests/test_searcher.py

* refactor: reuse a default backend instance in palace helper

Tighten the stage-1 backend seam by promoting the default Chroma backend adapter to a module-level singleton in `mempalace/palace.py`.

This keeps the stage-1 scope unchanged (Chroma is still the only backend wired in this branch) but avoids constructing a fresh `ChromaBackend()` object on every `get_collection()` call. The backend is stateless today, so this is a readability/cleanup change rather than a behavioral one.

Why this helps:
- makes `palace.get_collection()` read like a real default factory instead of an inline constructor call
- keeps the stage-1 branch a little cleaner before opening the public PR
- does not widen the backend surface or change any config/runtime behavior

Verification:
- python3 -m py_compile mempalace/palace.py
- pytest -q tests/test_miner.py tests/test_layers.py tests/test_searcher.py
- pytest -q  # 529 passed, 106 deselected

* fix: harden read-only seam behavior and update seam tests

Preserve the stage-1 backend abstraction while closing the real read-path regression surfaced in PR review.

What changed:
- make ChromaBackend.get_collection(create=False) fail fast when the palace directory does not exist instead of letting PersistentClient create it as a side effect
- update miner.status() to call get_collection(..., create=False) so status keeps the historical 'No palace found' behavior
- remove the temporary chromadb shim aliases from layers.py and searcher.py now that the tests patch the seam directly
- add focused tests for the new backends package, including ChromaCollection delegation and ChromaBackend create=True/create=False behavior
- retarget layer/searcher tests to patch the backend seam instead of patching chromadb.PersistentClient inside production modules
- add a regression test that status() does not create an empty palace when the target path is missing

Verification:
- ruff check .
- uv run pytest -q
- uv run pytest -q tests/test_backends.py tests/test_cli.py tests/test_mcp_server.py tests/test_layers.py tests/test_searcher.py tests/test_miner.py

Notes:
- the separate benchmark/slow/stress layer was started as a soak but not used as the merge gate for this PR branch

* refactor: drop duplicate mcp collection cache declaration

Remove a redundant `_collection_cache = None` assignment in `mempalace/mcp_server.py` left over after the stage-1 backend seam refactor.

This does not change behavior; it only trims review noise in the MCP server module after the read-path hardening pass.

Verification:
- ruff check mempalace/mcp_server.py
- uv run pytest -q tests/test_mcp_server.py

---------

Co-authored-by: Sergey Kuznetsov <sergey@iterudit.com>
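The create=True/create=False contract described in the commit notes can be illustrated with a small stand-in. This is only a sketch under stated assumptions: `SketchBackend`, `InMemoryCollection`, and the collection name `"drawers"` are hypothetical and are not the real `ChromaBackend` code, which is not shown here. It demonstrates the one property the notes call out: a read with create=False must fail fast and must not create the palace directory as a side effect.

```python
import os
import tempfile


class InMemoryCollection:
    """Hypothetical stand-in for a collection object; it holds no real data."""

    def __init__(self):
        self._rows = {}

    def count(self):
        return len(self._rows)


class SketchBackend:
    """Illustrative backend: only create=True may touch the filesystem."""

    def __init__(self):
        self._collections = {}

    def get_collection(self, palace_path, collection_name, create=False):
        key = (palace_path, collection_name)
        if not create:
            # Read path: fail fast, never create directories or collections.
            if not os.path.isdir(palace_path) or key not in self._collections:
                raise FileNotFoundError(f"No palace found at {palace_path}")
            return self._collections[key]
        # Write path: creating the directory and the collection is allowed.
        os.makedirs(palace_path, exist_ok=True)
        return self._collections.setdefault(key, InMemoryCollection())


if __name__ == "__main__":
    palace = os.path.join(tempfile.mkdtemp(), "palace")
    backend = SketchBackend()
    try:
        backend.get_collection(palace, "drawers", create=False)
    except FileNotFoundError as exc:
        print("read path:", exc)
    print("directory created by read?", os.path.exists(palace))
    backend.get_collection(palace, "drawers", create=True)
    print("directory created by write?", os.path.isdir(palace))
```

A caller such as a status command can then catch the error and report that no palace exists, rather than leaving an empty one behind.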
231 lines
7.1 KiB
Python
"""
palace_graph.py - Graph traversal layer for MemPalace
======================================================

Builds a navigable graph from the palace structure:
  - Nodes = rooms (named ideas)
  - Edges = shared rooms across wings (tunnels)
  - Edge types = halls (the corridors)

Enables queries like:
  "Start at chromadb-setup in wing_code, walk to wing_myproject"
  "Find all rooms connected to riley-college-apps"
  "What topics bridge wing_hardware and wing_myproject?"

No external graph DB needed; built from ChromaDB metadata.
"""

from collections import defaultdict, Counter

from .config import MempalaceConfig
from .palace import get_collection as _get_palace_collection


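def _get_collection(config=None):
    """Open the configured palace collection without creating it; return None on failure."""
    config = config or MempalaceConfig()
    try:
        return _get_palace_collection(
            config.palace_path,
            collection_name=config.collection_name,
            create=False,
        )
    except Exception:
        # A missing or unreadable palace is treated as "no collection",
        # so callers fall back to an empty graph.
        return None

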
def build_graph(col=None, config=None):
    """
    Build the palace graph from ChromaDB metadata.

    Returns:
        nodes: dict of {room: {wings: list, halls: list, count: int, dates: list}}
        edges: list of {room, wing_a, wing_b, hall, count}, one per wing pair
               and hall for every room that spans two or more wings
    """
    if col is None:
        col = _get_collection(config)
    if not col:
        return {}, []

    total = col.count()
    room_data = defaultdict(lambda: {"wings": set(), "halls": set(), "count": 0, "dates": set()})

    # Page through the collection metadata in batches of 1000
    offset = 0
    while offset < total:
        batch = col.get(limit=1000, offset=offset, include=["metadatas"])
        for meta in batch["metadatas"]:
            room = meta.get("room", "")
            wing = meta.get("wing", "")
            hall = meta.get("hall", "")
            date = meta.get("date", "")
            if room and room != "general" and wing:
                room_data[room]["wings"].add(wing)
                if hall:
                    room_data[room]["halls"].add(hall)
                if date:
                    room_data[room]["dates"].add(date)
                room_data[room]["count"] += 1
        if not batch["ids"]:
            break
        offset += len(batch["ids"])

    # Build edges from rooms that span multiple wings
    edges = []
    for room, data in room_data.items():
        wings = sorted(data["wings"])
        if len(wings) >= 2:
            for i, wa in enumerate(wings):
                for wb in wings[i + 1 :]:
                    for hall in data["halls"]:
                        edges.append(
                            {
                                "room": room,
                                "wing_a": wa,
                                "wing_b": wb,
                                "hall": hall,
                                "count": data["count"],
                            }
                        )

    # Convert sets to lists for JSON serialization
    nodes = {}
    for room, data in room_data.items():
        nodes[room] = {
            "wings": sorted(data["wings"]),
            "halls": sorted(data["halls"]),
            "count": data["count"],
            "dates": sorted(data["dates"])[-5:] if data["dates"] else [],
        }

    return nodes, edges


def traverse(start_room: str, col=None, config=None, max_hops: int = 2):
    """
    Walk the graph from a starting room. Find connected rooms
    through shared wings.

    Returns a list of results sorted by hop distance, then count:
    [{room, wings, halls, count, hop, connected_via}]
    The start room has hop 0 and no connected_via key.
    If start_room is unknown, returns {error, suggestions} instead.
    """
    nodes, edges = build_graph(col, config)

    if start_room not in nodes:
        return {
            "error": f"Room '{start_room}' not found",
            "suggestions": _fuzzy_match(start_room, nodes),
        }

    start = nodes[start_room]
    visited = {start_room}
    results = [
        {
            "room": start_room,
            "wings": start["wings"],
            "halls": start["halls"],
            "count": start["count"],
            "hop": 0,
        }
    ]

    # BFS traversal
    frontier = [(start_room, 0)]
    while frontier:
        current_room, depth = frontier.pop(0)
        if depth >= max_hops:
            continue

        current = nodes.get(current_room, {})
        current_wings = set(current.get("wings", []))

        # Find all rooms that share a wing with current room
        for room, data in nodes.items():
            if room in visited:
                continue
            shared_wings = current_wings & set(data["wings"])
            if shared_wings:
                visited.add(room)
                results.append(
                    {
                        "room": room,
                        "wings": data["wings"],
                        "halls": data["halls"],
                        "count": data["count"],
                        "hop": depth + 1,
                        "connected_via": sorted(shared_wings),
                    }
                )
                if depth + 1 < max_hops:
                    frontier.append((room, depth + 1))

    # Sort by relevance (hop distance, then count)
    results.sort(key=lambda x: (x["hop"], -x["count"]))
    return results[:50]  # cap results


def find_tunnels(wing_a: str = None, wing_b: str = None, col=None, config=None):
    """
    Find rooms that connect two wings (or all tunnel rooms if no wings specified).
    These are the "hallways": the same named idea appearing in multiple domains.
    """
    nodes, edges = build_graph(col, config)

    tunnels = []
    for room, data in nodes.items():
        wings = data["wings"]
        if len(wings) < 2:
            continue

        if wing_a and wing_a not in wings:
            continue
        if wing_b and wing_b not in wings:
            continue

        tunnels.append(
            {
                "room": room,
                "wings": wings,
                "halls": data["halls"],
                "count": data["count"],
                "recent": data["dates"][-1] if data["dates"] else "",
            }
        )

    tunnels.sort(key=lambda x: -x["count"])
    return tunnels[:50]


def graph_stats(col=None, config=None):
    """Summary statistics about the palace graph."""
    nodes, edges = build_graph(col, config)

    tunnel_rooms = sum(1 for n in nodes.values() if len(n["wings"]) >= 2)
    wing_counts = Counter()
    for data in nodes.values():
        for w in data["wings"]:
            wing_counts[w] += 1

    return {
        "total_rooms": len(nodes),
        "tunnel_rooms": tunnel_rooms,
        "total_edges": len(edges),
        "rooms_per_wing": dict(wing_counts.most_common()),
        "top_tunnels": [
            {"room": r, "wings": d["wings"], "count": d["count"]}
            for r, d in sorted(nodes.items(), key=lambda x: -len(x[1]["wings"]))[:10]
            if len(d["wings"]) >= 2
        ],
    }


def _fuzzy_match(query: str, nodes: dict, n: int = 5):
    """Find rooms that approximately match a query string."""
    query_lower = query.lower()
    # Drop empty fragments (e.g. from "a--b"); an empty string would match every room
    words = [w for w in query_lower.split("-") if w]
    scored = []
    for room in nodes:
        room_lower = room.lower()
        # Simple substring matching: the full query first, then any hyphen-separated word
        if query_lower in room_lower:
            scored.append((room, 1.0))
        elif any(word in room_lower for word in words):
            scored.append((room, 0.5))
    scored.sort(key=lambda x: -x[1])
    return [r for r, _ in scored[:n]]
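The core idea behind the module, that a room becomes a tunnel when it appears in two or more wings, can be tried without a ChromaDB collection. The sketch below is a simplified restatement of the grouping rule used in `build_graph`, not a replacement for it: `rooms_to_wings`, `tunnel_rooms`, and the sample metadata (including the wing name `wing_family`) are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def rooms_to_wings(metadatas):
    """Group wings by room, skipping 'general' and rows missing a room or wing."""
    wings_by_room = defaultdict(set)
    for meta in metadatas:
        room = meta.get("room", "")
        wing = meta.get("wing", "")
        if room and room != "general" and wing:
            wings_by_room[room].add(wing)
    return wings_by_room


def tunnel_rooms(metadatas):
    """Return {room: sorted wings} for rooms that span at least two wings."""
    return {
        room: sorted(wings)
        for room, wings in rooms_to_wings(metadatas).items()
        if len(wings) >= 2
    }


# Hypothetical drawer metadata, shaped like the dicts build_graph reads.
sample = [
    {"room": "chromadb-setup", "wing": "wing_code"},
    {"room": "chromadb-setup", "wing": "wing_myproject"},
    {"room": "riley-college-apps", "wing": "wing_family"},
    {"room": "general", "wing": "wing_code"},
]

if __name__ == "__main__":
    print(tunnel_rooms(sample))
    # prints {'chromadb-setup': ['wing_code', 'wing_myproject']}
```

Only `chromadb-setup` qualifies: `riley-college-apps` lives in a single wing, and `general` is excluded by the same filter the module applies.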