mempalace

Author	SHA1	Message	Date
Igor Lins e Silva	133dfbfb41	fix(search): BM25 hybrid rerank, legacy-metric warning, invariant tests Three tightly-coupled search-quality fixes for v3.3.3: 1. CLI `mempalace search` now routes through the same `_hybrid_rank` the MCP path already used. Drawers whose text contains every query term but embed as file-tree noise (directory listings, diffs, log fragments) were scoring cosine distance >= 1.0; the display formula `max(0, 1 - dist)` then floored every result to `Match: 0.0`, with no way for the user to tell a lexical match from a total miss. BM25 catches these cleanly; the display surfaces both `cosine=` and `bm25=` so users see which component is firing. 2. Legacy-palace distance-metric warning. Palaces created before `hnsw:space=cosine` was consistently set silently use ChromaDB's default L2 metric, which breaks the cosine-similarity formula (L2 distances routinely exceed 1.0 on normalized 384-dim vectors). The search path now detects this at query time and prints a one-line notice pointing at `mempalace repair`. Only fires for legacy palaces; new palaces already set cosine correctly. 3. Invariant tests pinning `hnsw:space=cosine` on every collection-creation path: legacy `get_or_create_collection`, legacy `create_collection`, RFC 001 `get_collection(create=True)`, the public `palace.get_collection`, and a round-trip through reopen. Locks down the correctness that new-user palaces already have so a future refactor can't silently regress it. Also adds a `metadata` property to `ChromaCollection` so callers can read the underlying hnsw:space without reaching into `_collection`. Tests: - New regression: simulate three candidates at distance 1.5 (cosine=0), one containing query terms, which must rank first with non-zero bm25. - New: legacy metric (empty or non-cosine) produces stderr warning. - New: correctly-configured palace produces no warning. - New: all five creation paths pin cosine metadata. All existing tests still pass.	2026-04-25 00:39:37 -03:00
Igor Lins e Silva	b9e41286fa	Merge pull request #1189 from MemPalace/openarena-claim chore: add OpenArena owner claim verification file	2026-04-24 23:27:49 -03:00
Igor Lins e Silva	8d49b009e0	Merge pull request #1184 from MemPalace/feat/cross-wing-topic-tunnels feat(graph): cross-wing tunnels by shared topics (#1180)	2026-04-24 23:25:55 -03:00
Igor Lins e Silva	0197b2eea9	chore: add OpenArena owner claim verification file	2026-04-24 23:19:29 -03:00
Igor Lins e Silva	865a36bc5c	feat(graph): namespace topic-tunnel rooms with "topic:" prefix + kind field Previously a cross-wing topic tunnel for "Angular" stored the room as "Angular" — colliding with a wing's literal folder-derived "Angular" room at follow_tunnels/list_tunnels read time, and exposing raw topic strings (which may contain characters rejected by sanitize_name) to the MCP surface. Topic tunnels now store their room as "topic:<original-casing>" and carry kind="topic" on the stored dict. Explicit tunnels get kind="explicit" (default). follow_tunnels("wing", "Angular") on a literal Angular room no longer surfaces topic connections for the same name, and any LLM scanning list_tunnels has a visible discriminator.	2026-04-24 23:06:26 -03:00
Igor Lins e Silva	fe051adc73	feat(graph): cross-wing tunnels by shared topics (#1180) When two wings have one or more confirmed TOPIC labels in common, the miner now drops a symmetric tunnel between them at mine time so the palace graph reflects shared themes (frameworks, vendors, recurring concepts). - llm_refine: TOPIC label routes to a dedicated `topics` bucket so the signal survives confirmation instead of getting collapsed into `uncertain` and dropped. - entity_detector / project_scanner: bucket plumbed through the detection pipeline; `confirm_entities` returns confirmed topics alongside people/projects. - miner.add_to_known_entities: optional `wing` parameter records the confirmed topics under `topics_by_wing` in `~/.mempalace/known_entities.json`. Wing names do NOT leak into the flat known-name set used by drawer-tagging. - palace_graph: `compute_topic_tunnels` and `topic_tunnels_for_wing` create symmetric tunnels via the existing `create_tunnel` API so they share dedup and persistence with explicit tunnels. - miner.mine: post-file-loop pass calls `topic_tunnels_for_wing` for the freshly-mined wing. Failures are logged but never abort the mine. - config: `topic_tunnel_min_count` knob (env `MEMPALACE_TOPIC_TUNNEL_MIN_COUNT` or `~/.mempalace/config.json`), default 1. Tests cover topic persistence through init->mine, tunnel creation when wings share a topic, no tunnel below threshold, cross-wing tunnel retrieval via `list_tunnels`, dedup on recompute, case-insensitive overlap, and the end-to-end mine-time wiring. Out of scope for this PR (called out in the PR body): manifest-dependency overlap, per-topic allow/deny lists, search-result surfacing.	2026-04-24 23:06:26 -03:00
Igor Lins e Silva	ed2ba726c9	Merge pull request #1185 from MemPalace/perf/batched-upsert-gpu perf(mining): batch per-chunk upserts + optional GPU acceleration	2026-04-24 20:34:28 -03:00
copilot-swe-agent[bot]	031512438e	test: isolate embedding module state with monkeypatch Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3213a67a-6871-4bb2-9ae0-23fa11001a22 Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 23:11:29 +00:00
copilot-swe-agent[bot]	3d529e7028	test: tidy embedding follow-up imports Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3213a67a-6871-4bb2-9ae0-23fa11001a22 Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 23:10:20 +00:00
copilot-swe-agent[bot]	9fbdba17ca	test: isolate embedding device env override tests Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3213a67a-6871-4bb2-9ae0-23fa11001a22 Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 23:09:23 +00:00
copilot-swe-agent[bot]	25c885ae0b	test: use tmp_path for embedding device config tests Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3213a67a-6871-4bb2-9ae0-23fa11001a22 Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 23:08:26 +00:00
copilot-swe-agent[bot]	fbd0904799	test: cover embedding device fallback and bounded upserts Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3213a67a-6871-4bb2-9ae0-23fa11001a22 Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 23:06:50 +00:00
Igor Lins e Silva	a4868a3589	perf(mining): batch per-chunk upserts and add optional GPU acceleration The miner upserted one drawer per ChromaDB call, paying tokenizer + ONNX session setup per chunk. The embedding device was CPU-only because no EmbeddingFunction was ever wired through the backend. Two changes, each a speedup in its own right; stacked they give ~10x end-to-end on a medium corpus (20 files, 568 drawers): 1. Batched upsert. `process_file` and `_file_chunks_locked` now collect all chunks of a file into a single `collection.upsert(...)` so the embedding model runs one forward pass per file instead of N. 2. Hardware-accelerated embedding function. New `mempalace/embedding.py` wraps `ONNXMiniLM_L6_V2` with configurable `preferred_providers`. `MEMPALACE_EMBEDDING_DEVICE` (or `embedding_device` in config.json) selects auto / cpu / cuda / coreml / dml. Unavailable accelerators log a warning and fall back to CPU. The factory subclasses `ONNXMiniLM_L6_V2` and spoofs its `name()` to `"default"` so the persisted EF identity matches existing palaces created with ChromaDB's bare `DefaultEmbeddingFunction` -- same model, same 384-dim vectors, no rebuild needed when turning GPU on. `ChromaBackend.get_collection` / `create_collection` now pass the resolved EF on every call so miner writes and searcher reads agree. Benchmarks (i9-12900KF + RTX 3090, medium scenario, 568 drawers): per-chunk + CPU: 19.77s · 29 drw/s (baseline); batched + CPU: 8.07s · 70 drw/s (2.4x); batched + CUDA: 2.15s · 264 drw/s (9.2x). Reproducible via `benchmarks/mine_bench.py`. Install paths: `pip install mempalace[gpu]` (NVIDIA CUDA); `pip install mempalace[dml]` (DirectML, Windows); `pip install mempalace[coreml]` (macOS Neural Engine). Mine header now prints `Device: cpu|cuda|...` so users can confirm the accelerator engaged.	2026-04-24 19:42:35 -03:00
Igor Lins e Silva	7a757916b3	Merge pull request #1176 from MemPalace/docs/changelog-3.3.3-init-overhaul docs(changelog): document init entity-detection overhaul in 3.3.3	2026-04-24 14:34:09 -03:00
Igor Lins e Silva	174ecaf42c	Update CHANGELOG.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-24 14:33:51 -03:00
Igor Lins e Silva	431e42a720	docs(changelog): document init entity-detection overhaul in 3.3.3 Adds entries to the 3.3.3 section for the work that landed via #1148, #1150, #1157, and #1175 (rescued from stacked feature branches into develop via #1175). Without these entries the 3.3.3 release notes on main would advertise only the hook/diary/search fixes that made it to develop through the first direct merge. Covers: - Manifest + git-author entity detection (#1148) - Regex detector accuracy improvements (#1148) - Optional --llm classification with Ollama / openai-compat / Anthropic provider abstraction and interactive UX (#1150) - Claude Code conversation scanner (#1150) - Init → miner registry wire-up so confirmed entities actually reach drawer metadata tagging (#1157) - Case-insensitive project dedup across all sources (#1175) - `mempalace mine` skips the generated entities.json artifact	2026-04-24 14:25:13 -03:00
Igor Lins e Silva	f246d25b7f	Merge pull request #1166 from arnoldwender/fix/security-palace-path-env-normalize fix(security): normalize MEMPALACE_PALACE_PATH env var with abspath+expanduser	2026-04-24 14:16:58 -03:00
Igor Lins e Silva	8a6ebbe363	Merge pull request #1175 from MemPalace/chore/rescue-stacked-prs-into-develop chore: rescue merged stacked PRs #1150 and #1157 into develop	2026-04-24 14:14:50 -03:00
Igor Lins e Silva	55c83e9f3d	fix(init): case-insensitive project dedup across manifest and convo sources `discover_entities` was deduping the convo_scanner results against the manifest/git scan with a case-sensitive key, while every other dedup path in the pipeline (`_merge_detected`, `miner.add_to_known_entities`) uses case-insensitive matching. A project named `foo` in a manifest plus `Foo` as a Claude Code `cwd` variant would surface as two review entries instead of collapsing to one. Fix keys `by_name` by `name.lower()` while preserving the first-seen casing, matching the rest of the pipeline. Flagged by Copilot on #1175. Regression test asserts a manifest project + a CamelCase-variant convo cwd for the same real project collapse to one entry.	2026-04-24 14:11:54 -03:00
Igor Lins e Silva	19ce58c143	chore: rescue merged stacked PRs #1150 and #1157 into develop #1148, #1150, and #1157 were reviewed and merged on GitHub, but the two stacked children landed on their parent feature branches (now stale) rather than on develop. Only #1148's commits reached develop via the direct merge. Release PR #1159 (develop → main for v3.3.3) is therefore missing the LLM refinement, Claude-conversation scanner, and miner-registry wire-up that were ostensibly part of the release. This merge brings the stale `feat/llm-entity-refine` branch (which contains the rolled-up merge commit for #1157 → #1150 → everything below) into develop so the release tag includes it. No code changes here, only history recovery.	2026-04-24 13:49:12 -03:00
Igor Lins e Silva	61d6c3cc3c	Merge pull request #1157 from MemPalace/feat/wire-entities-to-miner feat(init): wire confirmed entities into the miner's known-entities registry	2026-04-24 13:24:56 -03:00
Igor Lins e Silva	a851c7a7df	Merge pull request #1148 from MemPalace/feat/project-scanner-entity-detection feat(init): scan manifests and git authors for real entity signal (v1)	2026-04-24 13:23:43 -03:00
Arnold Wender	ae1c52e43b	test(config): drop tilde-absence assertion for Windows 8.3 compatibility Windows 8.3 short paths legitimately contain tildes (e.g. the CI runner's USERPROFILE resolves to C:\Users\RUNNER~1\...), so asserting "~" is absent from the expanded path fails on Windows even when expanduser worked correctly. The equality check against os.path.abspath(os.path.expanduser()) is authoritative; drop the redundant absence heuristic.	2026-04-24 11:20:30 +02:00
Arnold Wender	02a88b0864	test(config): make palace_path tests portable across POSIX and Windows The new abspath+expanduser normalization means /env/palace no longer round-trips literally on Windows (abspath prepends the current drive, producing D:\env\palace). Rewrite the env-var tests to compare against os.path.abspath(os.path.expanduser(raw)) instead of hardcoded Unix strings, and build raw paths with os.path.join so backslash-vs-slash differences don't leak into assertions. Covers test_env_override, the three new tests, and the legacy-alias test in test_config_extra.	2026-04-24 11:13:51 +02:00
Arnold Wender	bcd07916a3	fix(security): normalize MEMPALACE_PALACE_PATH env var with abspath+expanduser MEMPALACE_PALACE_PATH (and legacy MEMPAL_PALACE_PATH) read from the environment was returned as-is from Config.palace_path, while the sibling --palace CLI path gets os.path.abspath() applied at mcp_server.py:62. That inconsistency means env-var callers can end up with literal '~' or unresolved '..' segments in the path, which (a) breaks user intuition and (b) lets a caller who can set env vars on the target user's session redirect palace storage to an unexpected location. Apply os.path.abspath(os.path.expanduser(...)) to the env-var branch so both code paths converge on the same resolved absolute path. Closes #1163	2026-04-24 11:06:30 +02:00
Ben Sigman	8ac98f038c	Merge pull request #1147 from MemPalace/fix/3.3.3-followups fix(3.3.3): two followups from #1145 before tag cut	2026-04-24 00:07:12 -07:00
copilot-swe-agent[bot]	1b1854e5ae	fix(init): address registry review feedback Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/76794fde-2383-4674-ab36-f89ad803eeb2 Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 05:25:34 +00:00
Igor Lins e Silva	4631d6a7db	feat(init): wire confirmed entities into the miner's known-entities registry The init step's output was a dead file. miner.py has always read `~/.mempalace/known_entities.json` to tag drawer metadata with recognized names, but nothing ever wrote it, so init's careful manifest + git + LLM detection work stopped at `<project>/entities.json` and never reached the path that actually uses it. Measured delta on a representative prose snippet (eight sentences mentioning six real people and four real projects): - Empty registry: 0 entities recognized (multi-word names fail the frequency threshold; lowercase/hyphenated project names don't match the CamelCase regex). - Registry populated by init: 12 entities recognized (all correct, zero false positives). Every recognized name becomes a semicolon-separated metadata tag on the drawer, which ChromaDB uses for entity-filtered search. Implementation: - `miner.add_to_known_entities({category: [names]})` reads the existing registry, unions each category (case-insensitively, preserving first-seen casing), and writes back. The function is tolerant of the two on-disk shapes miner already supports: list of names, or dict mapping name → code (dialect-style). In the dict case new names are added as keys with `None` values so existing codes aren't overwritten. - Invalidates the in-process mtime cache so same-process callers (`cmd_init` → `cmd_mine` in one run) see the write immediately. - Writes with `ensure_ascii=False` so non-ASCII names (Gergő Móricz, Arturo Domínguez, etc.) stay readable on disk. - Chmods 0o600, because the registry mirrors confirm-step PII from the user's git authors and local paths. cmd_init now calls this at the end of the confirm-entities step, after the per-project `entities.json` is written (which is kept as an audit trail the user can inspect or hand-edit). The per-project file is still excluded from mining via `SKIP_FILENAMES` from the earlier fix. 17 new tests cover: fresh-file creation, list-category union, case-insensitive dedup, preservation of untouched categories, dict-format registries, malformed/non-dict file recovery, cache invalidation, unicode round-trip, and an end-to-end verification that the miner's `_extract_entities_for_metadata` picks up every registered name.	2026-04-24 02:09:32 -03:00
Igor Lins e Silva	b150d33398	fix(mine): skip generated entities file	2026-04-24 01:42:19 -03:00
Igor Lins e Silva	035fe6d658	fix(llm): tighter refinement - word boundaries, JSON extraction, authoritative sources Addresses issues found while reviewing the initial phase-2 implementation against real data: Bug: uncertain bucket starved from the LLM. `discover_entities` was dropping the regex-uncertain bucket whenever real git/manifest signal existed, which is exactly when `--llm` is most useful for cleaning up prose noise. The uncertain candidates never reached the refinement step. Fixed: only drop when `llm_provider is None`. Context collection: word boundaries, not substring. `_collect_contexts` used substring matching on lower-cased lines, so the name "Go" matched "good", "going", "forgot". Switched to a `(?<!\w)…(?!\w)` regex so short names only match at token boundaries. Authoritative-source detection replaces confidence threshold. Previously the refinement step skipped entries with `confidence >= 0.95` to avoid second-guessing manifest-backed projects. That threshold was fragile: the regex detector produces 0.99 confidence for things like `code file reference (5x)` on framework names (OpenAPI, etc.), so those skipped the LLM despite being regex-only noise. New helpers `_is_authoritative_person` / `_is_authoritative_project` look at the actual signal strings (commits, package.json, etc.) to decide. Now also refines regex-derived people. After #1148's high-pronoun-signal fix, the regex detector can promote non-people to the `people` bucket (e.g. a capitalized common noun that happened to appear near pronouns). The LLM now gets a chance to clean those up, while git-authored people are still skipped. Robust JSON extraction. Small local models routinely wrap JSON output in prose ("Sure, here's the classification: {…}"). The previous code-fence stripper failed on that. `_extract_json_candidates` now does balanced-bracket extraction with string-aware quote handling, so it recovers JSON from: - raw responses - markdown fenced blocks - JSON embedded inside surrounding text - multiple candidate objects/arrays Prompt guidance for frameworks vs user projects. Added an explicit instruction: frameworks, runtimes, APIs, cloud services, and third-party vendors (Angular, OpenAPI, Terraform, Bun, Google, etc.) are TOPIC unless the context clearly says it's the user's own codebase. Directly addresses a false-positive pattern observed during dev runs. Defensive mtime. `convo_scanner._safe_mtime` catches OSError during `stat()` (permission changes, filesystem races, broken symlinks) and sorts the affected file to the end of the newest-first order rather than crashing the scan. Cosmetic: merged two adjacent f-strings on the same line in `backends/chroma.py` and `llm_client.py` (no behaviour change). 15 new tests cover the OSError fallback, word-boundary matching, JSON extraction variants, authoritative-source helpers, refining high-confidence regex projects, and end-to-end LLM refinement preserving the uncertain bucket.	2026-04-24 01:30:40 -03:00
copilot-swe-agent[bot]	9486d8b129	test(project-scanner): make gitdir fixtures portable Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3c277c46-20b3-4a43-8eb7-8ee2eb3cb55a Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 03:53:43 +00:00
copilot-swe-agent[bot]	d4cc367261	test(project-scanner): harden git helper execution Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3c277c46-20b3-4a43-8eb7-8ee2eb3cb55a Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 03:52:37 +00:00
copilot-swe-agent[bot]	ec9084f4d8	refactor(project-scanner): tidy manifest priority helpers Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3c277c46-20b3-4a43-8eb7-8ee2eb3cb55a Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 03:51:21 +00:00
copilot-swe-agent[bot]	851ebebc29	test(project-scanner): tighten git helper env handling Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3c277c46-20b3-4a43-8eb7-8ee2eb3cb55a Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 03:50:13 +00:00
copilot-swe-agent[bot]	70d4c5471e	fix(project-scanner): address review feedback Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3c277c46-20b3-4a43-8eb7-8ee2eb3cb55a Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-24 03:48:47 +00:00
Igor Lins e Silva	36a8f219c2	feat(init): wire --llm flag and convo_scanner into discover_entities Extends the init orchestrator to consume two new signal sources: 1. Claude Code conversation dirs: when the target is a `~/.claude/projects/` root, convo_scanner contributes ProjectInfo entries alongside the git/manifest projects. Dedup is by name, preferring the entry with more user-authored activity. 2. Optional LLM refinement: when --llm is passed, discover_entities constructs the provider, validates availability, and runs llm_refine.refine_entities on the merged candidates. Status summary (reclassified / dropped / cancelled / batch errors) prints to stderr. New init flags (opt-in, default remains zero-API): - --llm: enable refinement - --llm-provider: ollama (default) | openai-compat | anthropic - --llm-model: default gemma4:e4b for Ollama - --llm-endpoint: URL (required for openai-compat) - --llm-api-key: falls back to env ($ANTHROPIC_API_KEY or $OPENAI_API_KEY depending on provider) Provider check_available runs before the scan, so the user sees an immediate error ("Run: ollama pull <model>" or "ANTHROPIC_API_KEY not set") rather than a mid-scan failure.	2026-04-24 00:47:14 -03:00
Igor Lins e Silva	10a743d5d8	feat(llm): interactive entity refinement with batching and cancellation Takes the candidate set produced by phase-1 detection (manifests, git authors, regex on prose) and asks an LLM to reclassify each candidate as PERSON / PROJECT / TOPIC / COMMON_WORD / AMBIGUOUS. Scale approach: never feed the raw corpus to the LLM. For each candidate, collect up to 3 context lines from sampled prose, cap each at 240 chars, batch 25 candidates per call. Keeps total input around 50-100K tokens even on large corpora and completes in a few minutes on a 4B local model. Interactive UX: - Stderr progress bar with the current candidate name, updates per-batch. - Ctrl-C interrupts cleanly: returns a RefineResult with `cancelled=True` and whatever was classified before the interrupt. The partial result is safe to pass straight to confirm_entities. - Per-batch errors (transport, parse) are recorded in `errors` and don't abort the whole run. Refinement scope: only `uncertain` and low-confidence `projects` entries are sent. Manifest-backed projects (conf >= 0.95) and git-authored people are already authoritative and skip the LLM. Response parser is defensive: it accepts `label` or `type` keys, lowercase/uppercase variants, top-level list or wrapped object, and strips markdown code fences. Unknown labels become AMBIGUOUS so the user reviews them rather than silently accepting a bad classification. `collect_corpus_text` provides a simple stratified prose sampler (recent first, capped per-file) so callers don't need to build their own corpus window. 28 tests with a FakeProvider (no network). Covers context collection, prompt building, response parsing variants, classification apply, end-to-end refine, and Ctrl-C partial-result behavior.	2026-04-24 00:46:59 -03:00
Igor Lins e Silva	df6c7d0dc3	feat(llm): pluggable provider abstraction for entity refinement Three providers cover the useful space while keeping the zero-API default: - `ollama` (default): local models via http://localhost:11434. Works fully offline. Tag-matching check accepts both `model` and `model:latest` forms. - `openai-compat`: any /v1/chat/completions endpoint. Covers OpenRouter, LM Studio, llama.cpp server, vLLM, Groq, Together, Fireworks, and most self-hosted frameworks. API key falls back to $OPENAI_API_KEY. Endpoint normalization is forgiving about trailing `/v1`. - `anthropic`: Messages API v2023-06-01. API key falls back to $ANTHROPIC_API_KEY. Concatenates multi-block text responses. JSON mode is normalized across providers — Ollama uses `format: "json"`, OpenAI-compat uses `response_format`, Anthropic uses prompt-level instruction. Callers request JSON once; this module handles the provider-specific plumbing. No external SDK dependency; stdlib `urllib` throughout. HTTP errors are wrapped into a single `LLMError` class so callers don't need to distinguish transport, auth, and parse failures at the call site. 26 tests, all with mocked HTTP — suite runs offline with no real provider required.	2026-04-24 00:46:43 -03:00
Igor Lins e Silva	c7bd2cd8e4	feat(convo): parse Claude Code conversation dirs into project entities Claude Code stores sessions under `~/.claude/projects/<slug>/<id>.jsonl` where `<slug>` is the original CWD with `/` replaced by `-`. That encoding is lossy — can't distinguish `foo-bar` (one segment) from `foo/bar` (two) — so slug-decoding alone produces wrong names for any hyphenated project. Fortunately, every message record carries a `cwd` field with the true path. This scanner reads one record per session to recover the accurate project name deterministically, falling back to slug-decoding only if the JSONL is malformed or empty. Output shape matches project_scanner.ProjectInfo so the discover orchestrator can union results across sources. Session count doubles as a density signal for ranking. 22 unit tests cover: root detection, cwd extraction with malformed input tolerance, fallback slug decoding, name resolution using the newest session (so renames win), and dedup when two encoded dirs resolve to the same project.	2026-04-24 00:46:31 -03:00
Igor Lins e Silva	14d7444abe	fix(deps): add tomli fallback for Python <3.11 `tomllib` is stdlib only in Python 3.11+. On Python 3.9/3.10 (and the macOS runner) the scanner's toml parsing returned empty, so manifest lookups for `pyproject.toml` / `Cargo.toml` produced no name. CI surfaced this via 4 test_project_scanner.py failures on the 3.9 matrix. Add `tomli>=2.0.0` as a conditional dependency for `python_version < '3.11'` and fall back to it in `project_scanner.py`. The project still declares `requires-python = ">=3.9"` so the fallback is the correct shape.	2026-04-24 00:27:09 -03:00
Igor Lins e Silva	9e7fa1ceb5	feat(init): scan manifests and git authors for real entity signal `mempalace init` previously leaned entirely on regex-based entity extraction from prose. That path works for text-only folders but wastes signal in any codebase: the project's own name is already in `package.json` / `pyproject.toml` / `Cargo.toml` / `go.mod`, and the people who worked on it are in `git log`. This adds `project_scanner.py`, which becomes the primary signal source when real signal is available, with the regex detector preserved as the fallback for prose-only folders (diaries, research notes, writing). What it does: - Walks the target directory, parses manifests for canonical project names, and detects git repos by the presence of a `.git` directory. - For each repo, reads `git log` for authors and filters obvious bots (`[bot]`, `dependabot`, `renovate`, `github-actions`, names ending in `bot`, `-autoroll`). Importantly does NOT filter `@users.noreply.github.com` - that's GitHub's privacy-protected human email, used by real contributors. - Resolves author aliases with a union-find: commits that share a name OR an email collapse into one person. Picks the most-frequent real-name variant as display, ignoring handles and single-token usernames. - Flags "mine" projects: user is top-5 committer OR has >=10% of commits OR >=20 commits. Ordered by user_commits in the UX. - `discover_entities()` merges scanner results with the regex detector case-insensitively (so `mempalace` from pyproject absorbs `MemPalace` from docs), and suppresses the regex `uncertain` bucket when real signal is already found - the user doesn't need to adjudicate prose noise when the answer is already in git. Integration: `cmd_init` now calls `discover_entities` instead of running the regex detector directly. Same output shape, so `confirm_entities` works unchanged. Ships with 39 new tests covering manifest parsing, bot filtering, union-find dedup, git repo discovery, scan integration, and merge/fallback behavior. Existing 56 regex-detector tests all pass.	2026-04-24 00:20:53 -03:00
Igor Lins e Silva	6aebf458ff	fix(entity): reduce noise in regex-based detection The pattern-matching detector had several systematic false positives that crowded the init review with nonsense. Concrete fixes: - CamelCase extraction: add `[A-Z][a-z]+(?:[A-Z][a-z]+|[A-Z]{2,})+` to candidate patterns so `MemPalace`, `ChromaDB`, `OpenAI`, `ChatGPT` are visible. Previously `MemPalace` fragmented into `Mem` + `Palace`. - Dialogue `^NAME:\s` requires >=2 matches to count. A single metadata line like `Created: 2026-04-21` was scoring as dialogue and classifying `Created` as a person. - Versioned/hyphenated pattern tightened to `\b{name}[-_]v?\d+(?:\.\d+)*\b` (version-only). The previous `\b{name}[-v]\w+` matched `context-manager`, `multi-word`, etc. - every hyphenated compound. - Skip LICENSE/COPYING/NOTICE/AUTHORS/PATENTS files during scan. They produce pure-English-prose noise (`Contributor`, `Software`, `Covered`, `Before`). - Extra SKIP_DIRS: `.terraform`, `vendor`, `target`. - Expand stopword list with capitalized participles/descriptors that commonly appear at sentence start: `created`, `updated`, `extracted`, `processed`, `total`, `summary`, `auto`, `multi`, `hybrid`, `context`, `bridge`, `batch`, `local`, `native`, `never`, `before`, `after`, etc. - classify_entity: high-pronoun single-category signal now classifies as person. A diary's main character gets referenced with pronouns, not dialogue markers - requiring two signal categories demoted `Lu` (16 pronoun hits across 30 mentions) to uncertain. Gate on `pronoun_hits >= 5 AND pronoun_hits / frequency >= 0.2` so common sentence-start words (`Never`, `Before`) with incidental proximity stay uncertain.	2026-04-24 00:20:32 -03:00
Igor Lins e Silva	6fcfd34aa4	docs(changelog): log #1145 fixes in 3.3.3 section Two follow-up fixes from the v3.3.3 smoke test get folded into 3.3.3 before the tag is cut. Also syncs uv.lock with the 3.3.3 version bump merged via #1144.	2026-04-23 23:39:41 -03:00
Igor Lins e Silva	1fd16daac2	fix(mcp): diary_read(wing='') spans all wings for agent (#1145) #1097 fixed mempalace_search to treat empty-string wing/room as no filter, matching how LLM agents default to filling every optional parameter with ''. The same pattern wasn't applied to diary_read: passing wing='' defaulted to wing_<agent_name>, siloing away entries that hooks had written to project-derived wings per #659. When wing is empty/omitted, filter only on agent + room=diary so callers get a unified view of the agent's journal across every wing it has written to. Explicit wing=<name> continues to scope reads to that wing only. Adds test covering empty-wing read after writing to both the default and a non-default wing.	2026-04-23 23:39:34 -03:00
Igor Lins e Silva	d1583750e8	fix(hooks): derive project wing from non-macOS transcript paths (#1145) _wing_from_transcript_path only matched '-Projects-<name>' segments, so Linux users with code under ~/dev/, ~/code/, or ~/src/ fell through to the wing_sessions fallback and lost the per-project diary scoping introduced in #659. Broaden the heuristic to derive the project from the final dash-separated token of the encoded project-folder name under .claude/projects/. Keeps the legacy -Projects- regex as a secondary match for transcripts living outside the standard Claude Code path. Covers macOS Users layout, Linux dev/code layouts, and deeper nested source paths while preserving existing Projects/ behavior.	2026-04-23 23:39:23 -03:00
Igor Lins e Silva	6d252a0de4	Merge pull request #1144 from MemPalace/chore/release-3.3.3-prep release: v3.3.3 — restore install integrity	2026-04-23 21:56:02 -03:00
bensig	4f799afd76	release(3.3.3): bump README version badge test_readme_badge_matches_version_py compares the version in the README badge URL against version.py. Missed it in the initial bump.	2026-04-23 16:51:41 -07:00
bensig	102372b179	release: v3.3.3 Restore-integrity release. Unbreaks fresh `pip install mempalace` from v3.3.2 by re-tagging current develop, which carries both the plugin.json consumer (shipped in 3.3.2) and the matching mempalace-mcp entry point in pyproject.toml (added on develop ~10h after the 3.3.2 tag via #340 by @messelink). #1093 diagnosed by @jphein. Bumps (all 5 sources agree per Version Guard / CLAUDE.md): - mempalace/version.py 3.3.2 → 3.3.3 - pyproject.toml 3.3.2 → 3.3.3 - .claude-plugin/plugin.json 3.3.2 → 3.3.3 - .claude-plugin/marketplace.json 3.3.2 → 3.3.3 - .codex-plugin/plugin.json 3.3.2 → 3.3.3 - CHANGELOG.md new [3.3.3] entry No code changes. The fix for #1093 is already on develop via merged PRs #340, #1021, #851, #942, #833, #673, #661, #659, #1097, #1051, #1001, #945. Branch name intentionally outside the `release/*` ruleset so follow-up CI-fix commits aren't gated behind a nested PR. (Supersedes #1143 — closed for exactly that reason after it missed 3 of 5 version files.) Smoke-tested locally from a fresh develop clone: grep mempalace-mcp pyproject.toml .claude-plugin/plugin.json # both ✓ python -m build --wheel # ✓ pip install …-py3-none-any.whl # ✓ which mempalace-mcp # ✓ mempalace-mcp --help # ✓	2026-04-23 16:44:22 -07:00
Kunal Garhewal	9947ad06df	fix: treat empty string as no filter in mempalace_search wing/room (#1097) * fix: treat empty string as no filter in mempalace_search wing/room * fix: also treat whitespace-only strings as no filter	2026-04-23 15:19:18 -07:00
Jeffrey Hein	df3ee289fc	fix: add wing param to diary_write/diary_read, derive from transcript path (#659) * fix: add wing param to diary_write/diary_read, derive from transcript path Without a wing override, all diary entries from the stop hook land in wing_session-hook regardless of which project the session is in, making per-project diary search impossible. - tool_diary_write(): add optional `wing` param; sanitize and use it when provided, fall back to wing_{agent_name} when omitted - tool_diary_read(): add optional `wing` param for filtering by target wing - TOOLS dict: expose `wing` in input_schema for both diary tools - hooks_cli: add _wing_from_transcript_path() helper that extracts the project name from Claude Code paths like ~/.claude/projects/-home-jp-Projects-kiyo-xhci-fix/... → kiyo-xhci-fix - hook_stop: derive project wing and append wing= hint to block reason so Claude writes diary entries to the correct per-project wing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: sanitize wing param, cross-platform paths, tighten test assertions Addresses Copilot review feedback on #659. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: wing_ prefix + agent filter on diary_read Addresses bensig's 2-issue review on this PR. 1. _wing_from_transcript_path() was returning bare project names (e.g. "myproject") while all existing wings follow the wing_* convention from AAAK_SPEC. Entries landed in wing="myproject" while diary_read defaulted to wing="wing_<agent_name>" — orphaning every diary entry written by the stop hook. Now returns "wing_<project>" and falls back to "wing_sessions". 2. tool_diary_read() did not include agent_name in the ChromaDB where filter when a custom wing was provided — any caller with a shared wing could read entries written by other agents. Add {"agent": agent_name} to the $and clause. Also flagged by Qudo and left unresolved until now. Tests updated to expect the wing_ prefix (6 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 15:07:25 -07:00
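
The regex changes described in commit 6aebf458ff can be checked in isolation. The sketch below is illustrative only: the three patterns are quoted from that commit message, while the names `CAMEL_CASE`, `old_versioned`, and `new_versioned` are hypothetical and are not taken from the mempalace source.

```python
import re

# CamelCase candidate pattern, quoted from the commit message.
CAMEL_CASE = re.compile(r"[A-Z][a-z]+(?:[A-Z][a-z]+|[A-Z]{2,})+")


def old_versioned(name: str) -> re.Pattern:
    """Previous pattern (hypothetical helper): a hyphen or 'v', then any word characters."""
    return re.compile(rf"\b{re.escape(name)}[-v]\w+")


def new_versioned(name: str) -> re.Pattern:
    """Tightened pattern (hypothetical helper): the suffix must be a version number."""
    return re.compile(rf"\b{re.escape(name)}[-_]v?\d+(?:\.\d+)*\b")


if __name__ == "__main__":
    text = "MemPalace uses ChromaDB, OpenAI and ChatGPT"
    # Each product name is matched whole rather than split into fragments.
    print(CAMEL_CASE.findall(text))

    # The old pattern treats an ordinary hyphenated compound as a versioned name.
    print(bool(old_versioned("context").search("context-manager")))
    # The new pattern rejects the compound but still accepts real version suffixes.
    print(bool(new_versioned("context").search("context-manager")))
    print(new_versioned("model").search("model_v2.1").group(0))
```

Building the pattern per candidate name with `re.escape` keeps names containing regex metacharacters from altering the match; whether the project escapes names this way is an assumption.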


584 Commits