Merged both the PR's benchmark suite additions (psutil dep, pytest
markers, --ignore=tests/benchmarks) and upstream's coverage changes
(pytest-cov, --cov-fail-under=30, coverage config) so both coexist.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
CI was installing latest ruff (0.15.x) which has different formatting
rules than our local 0.4.x. Pin to ruff>=0.4.0,<0.5 for consistency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add tests for config, convo_miner, spellcheck, knowledge_graph
- Fix Windows PermissionError in test cleanup (chromadb file locks)
- Add UTF-8 encoding to split_mega_files, entity_registry, hooks_cli
- Fix mcp_server parse_known_args logging for unknown args
- Set coverage threshold to 85 in pyproject.toml and CI
- Reset all version files to 3.0.11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add/expand tests for normalize (39%→97%), searcher (39%→100%),
layers (28%→97%), split_mega_files (34%→72%).
Fix mcp_server.py parse_args→parse_known_args to prevent SystemExit
when imported during pytest (CI was crashing on all test jobs).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Run ruff format on all benchmark files (fixes CI lint job)
- Fix check_regression() substring ambiguity: ordered keyword matching
so "latency_improvement_pct" is correctly classified as higher-is-better
- Update stale comments in conftest.py referencing wrong fixture
- Add pytest addopts to skip benchmark/slow/stress markers by default
Benchmark mempalace at configurable scale (1K–100K drawers) to find
real-world performance limits. Tests cover MCP tool OOM thresholds,
ChromaDB query degradation, search recall@k, mining throughput,
knowledge graph concurrency, memory leak detection, palace boost
quantification, and Layer1 unbounded fetch behavior.
- tests/benchmarks/ with 8 test modules + data generator + report system
- Deterministic data factory with planted needles for recall measurement
- JSON report output with regression detection (--bench-report flag)
- CI benchmark job on PRs at small scale
- psutil added as dev dependency for RSS tracking
- Tighten chromadb dependency from >=0.4.0,<1 to >=0.5.0,<0.7
(the collection API changed significantly across majors; this
pins to the tested range)
- Add optional 'spellcheck' extras for the undeclared autocorrect
dependency used in spellcheck.py
- Add PEP 561 py.typed marker for type checker support
Findings: #10 (HIGH — chromadb range too wide), #30 (LOW — undeclared
autocorrect), #32 (LOW — missing py.typed)
Includes test infrastructure from PR #131.
92 tests pass.
- hooks/mempal_save_hook.sh: pass $TRANSCRIPT_PATH as sys.argv
instead of interpolating into python -c string (fixes#110)
- normalize.py: accept type "user" in addition to "human" for
Claude Code JSONL sessions (fixes#111)
- convo_miner.py: skip tool-results/, memory/ dirs and .meta.json
files when scanning for conversations (fixes#111)
- pyproject.toml: pin chromadb>=0.4.0,<1 to avoid crashing 1.x
builds on macOS ARM64 (fixes#100)