`mempalace init` previously leaned entirely on regex-based entity
extraction from prose. That path works for text-only folders but wastes
signal in any codebase: the project's own name is already in
`package.json` / `pyproject.toml` / `Cargo.toml` / `go.mod`, and the
people who worked on it are in `git log`.
This adds `project_scanner.py`, which becomes the primary signal source
when real signal is available, with the regex detector preserved as the
fallback for prose-only folders (diaries, research notes, writing).
What it does:
- Walks the target directory, parses manifests for canonical project
names, and detects git repos by the presence of a `.git` directory.
- For each repo, reads `git log` for authors and filters obvious bots
(`[bot]`, `dependabot`, `renovate`, `github-actions`, names ending in
`bot`, `-autoroll`). Importantly does NOT filter
`@users.noreply.github.com` - that's GitHub's privacy-protected human
email, used by real contributors.
- Resolves author aliases with a union-find: commits that share a name
OR an email collapse into one person. Picks the most-frequent
real-name variant as display, ignoring handles and single-token
usernames.
- Flags "mine" projects: user is top-5 committer OR has >=10% of
commits OR >=20 commits. Ordered by user_commits in the UX.
- `discover_entities()` merges scanner results with the regex detector
case-insensitively (so `mempalace` from pyproject absorbs `MemPalace`
from docs), and suppresses the regex `uncertain` bucket when real
signal is already found - the user doesn't need to adjudicate prose
noise when the answer is already in git.
Integration: `cmd_init` now calls `discover_entities` instead of
running the regex detector directly. Same output shape, so
`confirm_entities` works unchanged.
Ships with 39 new tests covering manifest parsing, bot filtering,
union-find dedup, git repo discovery, scan integration, and
merge/fallback behavior. Existing 56 regex-detector tests all pass.
Version bumps across pyproject.toml, mempalace/version.py, README badge,
uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*).
CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added
covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop
fix + tandem sweeper (#998), None-metadata guards (#999, #1013),
chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681),
HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path
cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
Prerequisite for RFC 001 (plugin spec, #743). Removes every direct
`import chromadb` outside the ChromaDB backend itself so the core
modules depend only on the backend abstraction layer.
Extends ChromaBackend with make_client, get_or_create_collection,
delete_collection, create_collection, and backend_version. Adds
update() to the BaseCollection contract. Non-backend callers
(mcp_server, dedup, repair, migrate, cli) now go through the
abstraction; tests patch ChromaBackend instead of chromadb.
With this landed, the RFC 001 spec can be enforced and PalaceStore
(#643) can ship as a plugin without touching core modules.
Prepare develop for the 3.3.0 release cycle.
Version bumps:
- mempalace/version.py: 3.2.0 -> 3.3.0
- pyproject.toml: 3.2.0 -> 3.3.0
- README.md: pyproject.toml label and shields.io badge
- uv.lock: mempalace 3.0.0 -> 3.3.0 (also fills in resolved dev/extras)
CHANGELOG.md:
- Close out the stale [Unreleased] section as [3.2.0] - 2026-04-12
(v3.2.0 was tagged on that date but the release flip was never made)
- Add a fresh [Unreleased] - v3.3.0 section covering the 49 commits
since v3.2.0: closet layer, BM25 hybrid search, entity metadata,
diary ingest, cross-wing tunnels, drawer-grep, offline fact checker,
LLM-based closet regen, hall detection, cosine-distance fix,
multi-agent locking, README audit, etc.
- Adopt Keep a Changelog + SemVer framing
- Add version compare reference links at the bottom
- Fix stale milla-jovovich/mempalace preamble URL to MemPalace/mempalace