mempalace

Author	SHA1	Message	Date
Igor Lins e Silva	3a76360301	fix(hooks): per-target PID guard with atomic claim (#1212, #1206) The hook PID guard used a single global ``~/.mempalace/hook_state/mine.pid`` file, which failed two ways: 1. ``_mine_already_running`` read-then-spawn was a TOCTOU race. Two near-simultaneous Stop hook fires both passed the existence/liveness check before either wrote — so both ended up calling ``_spawn_mine``. 2. ``_spawn_mine`` unconditionally overwrote the global PID file with the new child's PID. The first PID was lost, orphaning the first child. The user-visible result in #1212 was two concurrent ``mempalace mine`` processes running against the same source, both driving HNSW inserts in parallel — exactly the corruption pattern the guard was meant to prevent. #1206 reported the same shape from the perspective of the user (two mines hung on a 350MB folder). Replace the global file with per-target slots under ``~/.mempalace/hook_state/mine_pids/``, keyed by sha256 of the mine sub-arguments (everything after ``mine``). The slot is claimed via ``O_CREAT | O_EXCL`` so the claim is atomic — two simultaneous fires can never both pass. Stale slots (PID exists but is dead) are reclaimed transparently. Different targets (e.g. project mine vs transcript ingest, or two different MEMPAL_DIRs) get independent slots and run in parallel. The mine subprocess receives its slot path via ``MEMPALACE_MINE_PID_FILE`` env var; ``miner._cleanup_mine_pid_file`` reads that var on exit and removes the slot if it points at our PID, so orphaned slots from crashed mines don't accumulate. Also routes ``_ingest_transcript`` through ``_spawn_mine`` so the transcript ingest path now participates in the same dedup — repeated Stop fires for the same transcript no longer stack parallel mines. Closes #1212 Closes #1206	2026-05-08 02:09:00 -03:00
Igor Lins e Silva	5488e7bb22	fix(miner): harden Windows mine against ONNX bad_alloc + silent partial exits Three small changes that together address the failure modes in #1296: 1. Add pnpm-lock.yaml and yarn.lock to SKIP_FILENAMES, mirroring the existing package-lock.json rule. A 24K-line pnpm-lock.yaml produced ~1124 chunks in one batch and tripped onnxruntime bad_alloc on Windows; pnpm/yarn lockfiles are no more useful to mine than npm's. 2. Skip any file that produces more than MAX_CHUNKS_PER_FILE (500) chunks, with a clear log line. Catches the broader class — generated CSV/JSON, build artifacts, etc. — that the named-file SKIP list will never fully cover. The cap is conservative (500 chunks * 800 chars ≈ 400 KB of source) so legitimate hand-written content still mines. 3. Print a partial-progress summary on any exception in _mine_impl, not just KeyboardInterrupt, then re-raise. Without this, an arbitrary exception (ONNX bad_alloc, chromadb HNSW error, OS fault) propagates silently — the operator sees only the last progress line and assumes the mine succeeded. The new path mirrors the KeyboardInterrupt summary (files_processed, drawers_filed, last_file) plus the exception type and message, then re-raises so the original traceback surfaces and the exit code is non-zero. Tests cover: SKIP_FILENAMES contents, the chunk-cap path returning (0, room) with no upserts, and the new mine-aborted summary surfacing both the partial counters and the exception class.	2026-05-07 08:56:41 -03:00
Igor Lins e Silva	3bebef1503	fix(miner,convo_miner): close remaining wing-name normalization gaps (#1194) Two follow-ups against the review on this PR: 1. ``miner.load_config`` no-yaml fallback was returning the raw dirname as the wing, while ``cmd_init`` writes ``topics_by_wing`` under the normalized slug. A hyphenated project mined without a ``mempalace.yaml`` file silently lost every topic tunnel — same key-miss class as #1194, just down the no-yaml branch (raised by Qodo on this PR). 2. ``convo_miner`` was applying the lower/replace rule inline at one call site. Now folded through ``normalize_wing_name`` so all wing-slug producers — ``cmd_init``, ``room_detector_local``, ``miner.load_config`` fallback, ``convo_miner`` — share a single source of truth. No behavior change for any input; pure consolidation. Added ``test_load_config_no_yaml_normalizes_hyphenated_wing`` to lock the fallback path to the normalized slug — fails on develop without the miner change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:12:06 -03:00
Igor Lins e Silva	c4eeec8642	test: use shlex.quote in resume-hint assertions for Windows The pre-existing test_maybe_run_mine_prompt_declined_prints_hint asserted the bare unquoted form `mempalace mine {tmp_path}`. After the production code switched to shlex.quote on the resume hint, this passed on Linux/macOS (POSIX paths have no characters that trigger quoting) but failed on Windows where backslashes always get wrapped in single quotes. Mirror the production code in the assertion via shlex.quote so it's portable across platforms; do the same for the two new spaces-in-path tests for consistency.	2026-04-25 01:18:31 -03:00
Igor Lins e Silva	8faf0042b5	fix(cli,mine): shell-quote project_dir in resume hints The "Skipped. Run mempalace mine <dir>" hint after declining the init prompt and the "Re-run mempalace mine <dir> to resume" hint after a Ctrl-C interruption both interpolated project_dir without shell-quoting. A path containing spaces or metacharacters produced a copy-paste-broken command. Both spots now use shlex.quote(project_dir). Adds regression tests covering each hint with a path that contains a space.	2026-04-25 01:10:17 -03:00
Igor Lins e Silva	f13b9a46a2	feat(cli): init prompts to mine, mine handles Ctrl-C gracefully `mempalace init` now ends with a `Mine this directory now? [Y/n]` prompt and runs `mine()` in-process when accepted; `--yes` skips the prompt and auto-mines for non-interactive callers. Declining prints the resume command. Removes the "remember to type the next command" friction since rooms + entities just got set up. `mempalace mine` now wraps its main loop in `try / except KeyboardInterrupt` and prints `files_processed`, `drawers_filed`, and `last_file` before exiting with code 130 on Ctrl-C. Re-mining is safe because deterministic drawer IDs make the upsert idempotent. The hooks PID lock at `~/.mempalace/hook_state/mine.pid` is now actively removed in a `finally` when its entry points at us, on clean exit, error, or interrupt — preventing the next hook fire from briefly waiting on a stale PID. Closes #1181, #1182.	2026-04-25 01:01:24 -03:00
Igor Lins e Silva	865a36bc5c	feat(graph): namespace topic-tunnel rooms with "topic:" prefix + kind field Previously a cross-wing topic tunnel for "Angular" stored the room as "Angular" — colliding with a wing's literal folder-derived "Angular" room at follow_tunnels/list_tunnels read time, and exposing raw topic strings (which may contain characters rejected by sanitize_name) to the MCP surface. Topic tunnels now store their room as "topic:<original-casing>" and carry kind="topic" on the stored dict. Explicit tunnels get kind="explicit" (default). follow_tunnels("wing", "Angular") on a literal Angular room no longer surfaces topic connections for the same name, and any LLM scanning list_tunnels has a visible discriminator.	2026-04-24 23:06:26 -03:00
Igor Lins e Silva	fe051adc73	feat(graph): cross-wing tunnels by shared topics (#1180) When two wings have one or more confirmed TOPIC labels in common, the miner now drops a symmetric tunnel between them at mine time so the palace graph reflects shared themes (frameworks, vendors, recurring concepts). - llm_refine: TOPIC label routes to a dedicated `topics` bucket so the signal survives confirmation instead of getting collapsed into `uncertain` and dropped. - entity_detector / project_scanner: bucket plumbed through the detection pipeline; `confirm_entities` returns confirmed topics alongside people/projects. - miner.add_to_known_entities: optional `wing` parameter records the confirmed topics under `topics_by_wing` in `~/.mempalace/known_entities.json`. Wing names do NOT leak into the flat known-name set used by drawer-tagging. - palace_graph: `compute_topic_tunnels` and `topic_tunnels_for_wing` create symmetric tunnels via the existing `create_tunnel` API so they share dedup and persistence with explicit tunnels. - miner.mine: post-file-loop pass calls `topic_tunnels_for_wing` for the freshly-mined wing. Failures are logged but never abort the mine. - config: `topic_tunnel_min_count` knob (env `MEMPALACE_TOPIC_TUNNEL_MIN_COUNT` or `~/.mempalace/config.json`), default 1. Tests cover topic persistence through init->mine, tunnel creation when wings share a topic, no tunnel below threshold, cross-wing tunnel retrieval via `list_tunnels`, dedup on recompute, case-insensitive overlap, and the end-to-end mine-time wiring. Out of scope for this PR (called out in the PR body): manifest-dependency overlap, per-topic allow/deny lists, search-result surfacing.	2026-04-24 23:06:26 -03:00
copilot-swe-agent[bot]	fbd0904799	test: cover embedding device fallback and bounded upserts Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3213a67a-6871-4bb2-9ae0-23fa11001a22 Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 23:06:50 +00:00
Igor Lins e Silva	b150d33398	fix(mine): skip generated entities file	2026-04-24 01:42:19 -03:00
jp	feba7e8043	fix(miner): same None-metadata guard for status() histogram loop `status()` walks `col.get(include=["metadatas"])` and buckets each drawer into a `wing_rooms[wing][room]` histogram. The same ChromaDB return shape fixed in the search print path — `None` entries in the `metadatas` list for drawers with no stored metadata — crashes the status command with: AttributeError: 'NoneType' object has no attribute 'get' Applies the matching ``m = m or {}`` guard so None-metadata drawers roll up under the existing `?/?` fallback bucket instead of killing the command mid-tally. Reproduced on a 135K-drawer palace where two drawers had `metadata=None`; both now show under `WING: ? / ROOM: ?` in the tally while the command prints the full histogram as designed. Adds a regression test that feeds `status()` a fake collection whose `get()` returns a `None` in the middle of the metadatas list and asserts both the fallback bucket and the real wing render.	2026-04-18 10:26:11 -07:00
mvalentsev	8bf940f861	fix: use i18n candidate patterns for entity extraction in miner and palace entity_detector.py was refactored in #911 to load candidate patterns from i18n locale JSON files, supporting non-Latin scripts (Cyrillic, accented Latin, etc.). But three other code paths still hardcoded the ASCII-only regex [A-Z][a-z]{2,}, silently missing non-Latin entity names in metadata tagging, closet indexing, and registry lookups. Replace the hardcoded regex with a shared _candidate_entity_words() helper that reuses the same i18n candidate_patterns as entity_detector.	2026-04-16 10:35:40 +05:00
Matt Van Horn	e8e93b53c0	fix: allow mining directories without local mempalace.yaml When no mempalace.yaml or mempal.yaml exists in the source directory, return a default config (wing = directory name, room = general) instead of calling sys.exit(1). This lets users mine any directory into their palace without requiring init first. Closes #14.	2026-04-14 13:53:07 -03:00
Igor Lins e Silva	5320246297	Merge pull request #807 from sha2fiddy/fix/218-cosine-distance-metadata Fix: set cosine distance metadata on all collection creation sites	2026-04-13 21:18:40 -03:00
eblander	8dc5970ca9	Fix: ruff format with CI-pinned version (0.4.x)	2026-04-13 18:29:48 -04:00
Igor Lins e Silva	7e5eeda9a5	feat(normalize): auto-rebuild stale drawers via NORMALIZE_VERSION schema gate Without this, the strip_noise improvement only helps new mines. Every user who had already mined Claude Code JSONL sessions would keep their noise-polluted drawers forever, because convo_miner's file_already_mined skip short-circuits before re-processing. Adds a versioned schema gate so upgrades propagate silently: - palace.NORMALIZE_VERSION=2 — bumped when the normalization pipeline changes shape (this PR's strip_noise is the v1→v2 bump). - file_already_mined now returns False if the stored normalize_version is missing or less than current, triggering a rebuild on next mine. - Both miners stamp drawers with the current normalize_version. - convo_miner now purges stale drawers before inserting fresh chunks (mirrors miner.py's existing delete+insert), extracted into _file_convo_chunks helper to keep mine_convos under ruff's C901 limit. User experience: upgrade mempalace, run `mempalace mine` as usual, old noisy drawers get silently replaced with clean ones. No erase needed, no "you need to rebuild" changelog footgun. Tests: - test_file_already_mined_returns_false_for_stale_normalize_version — pins the version gate contract for missing/v1/current. - test_add_drawer_stamps_normalize_version — fresh project-miner drawers carry the field. - test_mine_convos_rebuilds_stale_drawers_after_schema_bump — end-to-end proof that a pre-v2 palace gets silently cleaned on next mine, with orphan drawers purged and NOT skipped. Existing test_file_already_mined_check_mtime updated to include the new field; all other tests unaffected.	2026-04-13 16:20:55 -03:00
eblander	1e86892e62	Fix: set cosine distance metadata on all collection creation sites ChromaDB defaults HNSW index to L2 (Euclidean) distance, but MemPalace scoring uses 1-distance which requires cosine (range 0-2). Add metadata={"hnsw:space": "cosine"} to the 4 production and 3 test call sites that were missing it. Closes #218	2026-04-13 11:00:52 -04:00
Mikhail Valentsev	091c2fe1c6	fix: mine --dry-run TypeError on files with room=None (#586) (#687) * fix: return "general" room from process_file error paths (#586) process_file() returned (0, None) for already-mined, unreadable, and too-short files. In --dry-run mode the caller always enters the room_counts branch, so None ended up as a dict key and crashed the summary printer with "unsupported format string passed to NoneType.__format__". Returning "general" instead of None makes the function contract explicit: it always yields (int, str). This matches the consensus fix discussed in the issue thread. * style: apply ruff format to test_miner.py	2026-04-12 14:23:44 -07:00
Sergey Kuznetsov	ae5196bc8d	Mempalace backend seam (#413) * refactor: add stage-1 backend abstraction seam Introduce the first upstreamable storage seam for MemPalace without bringing in the PostgreSQL spike or any benchmark artifacts. This change adds a small backend package with: - BaseCollection as the minimal collection contract - ChromaBackend/ChromaCollection as the default implementation It then routes the main runtime collection consumers through that seam: - palace.py - searcher.py - layers.py - palace_graph.py - mcp_server.py - miner.status() Behavioral constraints kept for stage 1: - ChromaDB remains the only backend and the default path - no config/env backend selection yet - no PostgreSQL code - no benchmark or research files - existing tests stay unchanged Important compatibility details: - read paths now call the seam with create=False so they still surface the existing 'no palace found' behavior instead of silently creating empty collections - write paths keep create=True semantics through palace.get_collection() - layers/searcher retain a chromadb module attribute so the existing mock-based tests can keep patching PersistentClient unchanged - ChromaBackend only creates palace directories on create=True, which preserves mocked read-path tests that use fake read-only paths Verification: - python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py - pytest -q # 529 passed, 106 deselected * refactor: clean up stage-1 seam compatibility shims Tighten the stage-1 backend abstraction branch after review. 
This follow-up does three small things: - keep the chromadb compatibility hook in searcher.py and layers.py, but express it through the backends.chroma module so it no longer reads like an accidental unused import - fix the palace_graph.py helper alias to avoid the local name collision flagged by ruff (imported helper vs local _get_collection wrapper) - preserve the existing mock-based test patch points unchanged while keeping the new backend seam intact Why this matters: - the direct form looked like a dead import in review, even though it was intentionally preserving the existing test seam ( and ) - palace_graph.py had a real lint issue ( redefinition) that was small but worth fixing before a public PR Verification: - /opt/homebrew/bin/ruff check mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py - pytest -q tests/test_layers.py tests/test_searcher.py - pytest -q # 529 passed, 106 deselected * docs: explain backend shim imports in search paths Add short code comments in searcher.py and layers.py explaining why the module-level `chromadb` alias remains after the stage-1 backend seam refactor. The alias is intentional: it preserves the existing mock patch points used by the current test suite (`mempalace.searcher.chromadb.PersistentClient` and `mempalace.layers.chromadb.PersistentClient`) while the runtime logic now flows through the backend abstraction. This keeps the public PR easier to review because the apparent "unused import" now has an explicit reason next to it. 
Verification: - /opt/homebrew/bin/ruff check mempalace/searcher.py mempalace/layers.py - pytest -q tests/test_layers.py tests/test_searcher.py * refactor: reuse a default backend instance in palace helper Tighten the stage-1 backend seam by promoting the default Chroma backend adapter to a module-level singleton in `mempalace/palace.py`. This keeps the stage-1 scope unchanged — Chroma is still the only backend wired in this branch — but avoids constructing a fresh `ChromaBackend()` object on every `get_collection()` call. The backend is stateless today, so this is a readability/cleanup change rather than a behavioral one. Why this helps: - makes `palace.get_collection()` read like a real default factory instead of an inline constructor call - keeps the stage-1 branch a little cleaner before opening the public PR - does not widen the backend surface or change any config/runtime behavior Verification: - python3 -m py_compile mempalace/palace.py - pytest -q tests/test_miner.py tests/test_layers.py tests/test_searcher.py - pytest -q # 529 passed, 106 deselected * fix: harden read-only seam behavior and update seam tests Preserve the stage-1 backend abstraction while closing the real read-path regression surfaced in PR review. 
What changed: - make ChromaBackend.get_collection(create=False) fail fast when the palace directory does not exist instead of letting PersistentClient create it as a side effect - update miner.status() to call get_collection(..., create=False) so status keeps the historical 'No palace found' behavior - remove the temporary chromadb shim aliases from layers.py and searcher.py now that the tests patch the seam directly - add focused tests for the new backends package, including ChromaCollection delegation and ChromaBackend create=True/create=False behavior - retarget layer/searcher tests to patch the backend seam instead of patching chromadb.PersistentClient inside production modules - add a regression test that status() does not create an empty palace when the target path is missing Verification: - ruff check . - uv run pytest -q - uv run pytest -q tests/test_backends.py tests/test_cli.py tests/test_mcp_server.py tests/test_layers.py tests/test_searcher.py tests/test_miner.py Notes: - the separate benchmark/slow/stress layer was started as a soak but not used as the merge gate for this PR branch * refactor: drop duplicate mcp collection cache declaration Remove a redundant `_collection_cache = None` assignment in `mempalace/mcp_server.py` left over after the stage-1 backend seam refactor. This does not change behavior; it only trims review noise in the MCP server module after the read-path hardening pass. Verification: - ruff check mempalace/mcp_server.py - uv run pytest -q tests/test_mcp_server.py --------- Co-authored-by: Sergey Kuznetsov <sergey@iterudit.com>	2026-04-11 16:16:49 -07:00
bensig	58b8d5b198	fix: release ChromaDB handles before rmtree on Windows	2026-04-09 09:31:55 -07:00
bensig	1c48f4d2c3	fix: use os.utime in mtime test for Windows compatibility	2026-04-09 09:23:08 -07:00
bensig	2448ac0026	test: add coverage for file_already_mined mtime check Covers the check_mtime=True path in palace.py to meet 85% coverage threshold.	2026-04-09 08:56:28 -07:00
Tal Muskal	abd52534bb	test: bring coverage to 85%, set threshold to 85, reset version to 3.0.11 - Add tests for config, convo_miner, spellcheck, knowledge_graph - Fix Windows PermissionError in test cleanup (chromadb file locks) - Add UTF-8 encoding to split_mega_files, entity_registry, hooks_cli - Fix mcp_server parse_known_args logging for unknown args - Set coverage threshold to 85 in pyproject.toml and CI - Reset all version files to 3.0.11 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 21:38:12 +03:00
ac-opensource	c8c220d789	fix: support nested .gitignore rules during mining	2026-04-08 00:02:21 +08:00
ac-opensource	9b9daa9b4b	fix: respect .gitignore during project mining	2026-04-07 22:26:06 +08:00
bensig	0f8fa8c7d5	bench: add benchmark runners, results docs, and test suite Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with methodology docs and hybrid retrieval analysis. Tests: config, miner, convo_miner, normalize — 9 tests, all passing.	2026-04-04 18:33:42 -07:00

26 Commits
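
The per-target PID guard described in commit 3a76360301 can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the helper names `slot_path`, `pid_alive`, and `try_claim` are hypothetical, the way the sub-arguments are joined before hashing is a guess, and the liveness probe is POSIX-only. It shows only the two ideas the commit message names: a slot file keyed by the sha256 of the mine sub-arguments, and an exclusive create so that two simultaneous claimants cannot both succeed.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def slot_path(slot_dir: Path, mine_args: list[str]) -> Path:
    """Hypothetical helper: one slot file per distinct set of mine sub-arguments."""
    # Joining with NUL is an assumption; the commit only says "sha256 of the sub-arguments".
    digest = hashlib.sha256("\0".join(mine_args).encode("utf-8")).hexdigest()
    return slot_dir / f"{digest}.pid"


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness probe: signal 0 checks existence without delivering a signal.

    Do not use this on Windows, where os.kill has different semantics.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def try_claim(slot: Path, pid: int, is_alive=pid_alive) -> bool:
    """Try to claim the slot for pid; return False if another owner holds it."""
    slot.parent.mkdir(parents=True, exist_ok=True)
    for _ in range(2):  # the second pass runs only after a stale slot was removed
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so exactly one
            # concurrent caller can create the slot.
            fd = os.open(slot, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            try:
                owner = int(slot.read_text().strip())
            except (ValueError, FileNotFoundError):
                # Empty or unreadable: the owner may be mid-write, so treat it as held.
                return False
            if is_alive(owner):
                return False
            try:
                slot.unlink()  # stale slot: the recorded PID is dead
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(str(pid))
        return True
    return False
```

A caller would compute `slot_path(...)` for the target and spawn the mine only when `try_claim(...)` returns True. Note that the stale-slot removal in this sketch is not itself race-free (two reclaimers could interleave their unlink and create steps); the commit message does not say how the real implementation handles that case.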