mempalace

jason/mempalace


Commit Graph


Author

SHA1

Message

Date

copilot-swe-agent[bot]

851ebebc29

test(project-scanner): tighten git helper env handling

Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3c277c46-20b3-4a43-8eb7-8ee2eb3cb55a

Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>

2026-04-24 03:50:13 +00:00

copilot-swe-agent[bot]

70d4c5471e

fix(project-scanner): address review feedback

Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3c277c46-20b3-4a43-8eb7-8ee2eb3cb55a

Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>

2026-04-24 03:48:47 +00:00

Igor Lins e Silva

9e7fa1ceb5

feat(init): scan manifests and git authors for real entity signal

`mempalace init` previously leaned entirely on regex-based entity
extraction from prose. That path works for text-only folders but wastes
signal in any codebase: the project's own name is already in
`package.json` / `pyproject.toml` / `Cargo.toml` / `go.mod`, and the
people who worked on it are in `git log`.
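
To illustrate the point about manifests, here is a minimal sketch of reading a project name from two of the formats named above. It is not the code in `project_scanner.py`: the function name `project_name` is hypothetical, only `package.json` and `go.mod` are handled, and taking the last path segment of a Go module path is an assumption.

```python
import json
import re


def project_name(filename, text):
    """Return the project name declared in a manifest, or None.

    `filename` is the manifest's base name and `text` its contents.
    Hypothetical helper; only two manifest types are covered here.
    """
    if filename == "package.json":
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return None
        name = data.get("name") if isinstance(data, dict) else None
        return name if isinstance(name, str) else None
    if filename == "go.mod":
        # A go.mod file declares its module path as `module <path>`.
        match = re.search(r"^module\s+(\S+)", text, re.MULTILINE)
        if match is None:
            return None
        # Assumption: the last path segment is the human-facing name.
        return match.group(1).rstrip("/").split("/")[-1]
    return None
```

`pyproject.toml` and `Cargo.toml` would need a TOML parser (for example `tomllib` on Python 3.11+) and are left out of the sketch.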

This adds `project_scanner.py`, which becomes the primary signal source
when real signal is available, with the regex detector preserved as the
fallback for prose-only folders (diaries, research notes, writing).

What it does:
- Walks the target directory, parses manifests for canonical project
  names, and detects git repos by the presence of a `.git` directory.
- For each repo, reads `git log` for authors and filters obvious bots
  (`[bot]`, `dependabot`, `renovate`, `github-actions`, names ending in
  `bot`, `-autoroll`). Importantly does NOT filter
  `@users.noreply.github.com` - that's GitHub's privacy-protected human
  email, used by real contributors.
- Resolves author aliases with a union-find: commits that share a name
  OR an email collapse into one person. Picks the most-frequent
  real-name variant as display, ignoring handles and single-token
  usernames.
- Flags "mine" projects: user is top-5 committer OR has >=10% of
  commits OR >=20 commits. Ordered by user_commits in the UX.
- `discover_entities()` merges scanner results with the regex detector
  case-insensitively (so `mempalace` from pyproject absorbs `MemPalace`
  from docs), and suppresses the regex `uncertain` bucket when real
  signal is already found - the user doesn't need to adjudicate prose
  noise when the answer is already in git.
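
The bot filter, alias resolution, and "mine" check described above can be sketched as follows. This is an illustration under stated assumptions, not the shipped implementation: `BOT_RE`, `is_bot`, `resolve_authors`, and `is_mine` are hypothetical names, the input is assumed to be `(name, email)` pairs already read from `git log`, and "real-name variant" is approximated as "contains a space".

```python
import re
from collections import Counter, defaultdict

# Patterns taken from the list in the commit message. Matching is on the
# author name only, so @users.noreply.github.com emails are never filtered.
BOT_RE = re.compile(
    r"\[bot\]|dependabot|renovate|github-actions|-autoroll|bot$", re.IGNORECASE
)


def is_bot(name):
    return bool(BOT_RE.search(name.strip()))


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_authors(commits):
    """Collapse (name, email) pairs into people, most commits first."""
    humans = [(n, e) for n, e in commits if not is_bot(n)]
    uf = UnionFind()
    for name, email in humans:
        # Linking each name to its email merges commits that share either.
        uf.union(("name", name.lower()), ("email", email.lower()))
    variants = defaultdict(Counter)
    for name, email in humans:
        variants[uf.find(("email", email.lower()))][name] += 1
    people = []
    for counts in variants.values():
        # Prefer multi-token names over handles (assumed heuristic).
        real = {n: c for n, c in counts.items() if " " in n.strip()}
        pool = real or counts
        people.append((max(pool, key=pool.get), sum(counts.values())))
    return sorted(people, key=lambda p: -p[1])


def is_mine(people, user):
    """Top-5 committer OR >=10% of commits OR >=20 commits."""
    total = sum(count for _, count in people)
    for rank, (name, count) in enumerate(people, start=1):
        if name == user:
            return rank <= 5 or count >= 20 or count / total >= 0.10
    return False
```

Keying the union-find on tagged tuples keeps a name and an email that happen to be the same string from being merged by accident.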

Integration: `cmd_init` now calls `discover_entities` instead of
running the regex detector directly. Same output shape, so
`confirm_entities` works unchanged.
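
A rough sketch of the merge behavior described for `discover_entities()`, for illustration only. The function name `merge_entities` and the dict shape (`projects`, `people`, `uncertain` buckets holding lists of names) are assumptions; the commit message does not specify the actual output shape.

```python
def merge_entities(scanned, detected):
    """Merge scanner results with regex-detector results.

    Scanner entries come first, so their spelling wins when the regex
    detector reports the same name in a different case.
    """
    merged = {}
    seen = set()  # casefolded names already emitted, shared across buckets
    for bucket in ("projects", "people"):
        names = []
        for name in list(scanned.get(bucket, [])) + list(detected.get(bucket, [])):
            key = name.casefold()
            if key not in seen:
                seen.add(key)
                names.append(name)
        merged[bucket] = names

    # When the scanner found anything, drop the regex `uncertain` bucket.
    has_real_signal = any(scanned.get(b) for b in ("projects", "people"))
    if has_real_signal:
        merged["uncertain"] = []
    else:
        merged["uncertain"] = [
            n for n in detected.get("uncertain", []) if n.casefold() not in seen
        ]
    return merged
```

With an empty `scanned` dict the function degrades to the regex detector's output, which mirrors the prose-only fallback described earlier in the message.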

Ships with 39 new tests covering manifest parsing, bot filtering,
union-find dedup, git repo discovery, scan integration, and
merge/fallback behavior. Existing 56 regex-detector tests all pass.

2026-04-24 00:20:53 -03:00

3 Commits