Addresses Issue #333: AI agents prepending system prompts to search queries
causes embedding retrieval to collapse (89.8% → 1.0% R@10).
Mitigation approach (減災):
- New query_sanitizer.py with 4-stage pipeline:
Step 1: passthrough for short queries (≤200 chars)
Step 2: question extraction (finds ? sentences) → ~85-89% recovery
Step 3: tail sentence extraction → ~80-89% recovery
Step 4: tail truncation fallback → ~70-80% recovery
Worst case without sanitizer: 1.0% (catastrophic)
Worst case with sanitizer: ~70-80% (survivable)
- mcp_server.py: tool_search applies sanitizer before ChromaDB query
- MCP schema: query description warns agents not to include prompts
- New 'context' parameter separates background info from search intent
- Sanitizer metadata included in response when triggered
22 new tests covering all pipeline stages and real-world scenarios.
Made-with: Cursor
Merged both the PR's benchmark suite additions (psutil dep, pytest
markers, --ignore=tests/benchmarks) and upstream's coverage changes
(pytest-cov, --cov-fail-under=30, coverage config) so both coexist.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
_patch_mcp_server had palace_path removed from its signature but the
assertion body still referenced it, causing NameError at runtime and
F821 from ruff.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add tests for config, convo_miner, spellcheck, knowledge_graph
- Fix Windows PermissionError in test cleanup (chromadb file locks)
- Add UTF-8 encoding to split_mega_files, entity_registry, hooks_cli
- Fix mcp_server parse_known_args logging for unknown args
- Set coverage threshold to 85 in pyproject.toml and CI
- Reset all version files to 3.0.11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MCP tool_add_drawer:
- Make drawer_id content-based: hash full content instead of
content[:100] + timestamp. Same content → same ID, eliminating
TOCTOU race conditions
- Switch from col.add() to col.upsert() so re-filing with updated
content updates the existing drawer
miner.add_drawer:
- Switch from collection.add() to collection.upsert() so re-mining
a modified file updates instead of silently failing
- Remove the try/except catching 'already exists' — upsert handles
this naturally
Findings: #11 (HIGH — add ignores updates), #6 (MEDIUM — TOCTOU),
#13 (MEDIUM — non-deterministic IDs)
Includes test infrastructure from PR #131.
92 tests pass.
Add/expand tests for normalize (39%→97%), searcher (39%→100%),
layers (28%→97%), split_mega_files (34%→72%).
Fix mcp_server.py parse_args→parse_known_args to prevent SystemExit
when imported during pytest (CI was crashing on all test jobs).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Run ruff format on all benchmark files (fixes CI lint job)
- Fix check_regression() substring ambiguity: ordered keyword matching
so "latency_improvement_pct" is correctly classified as higher-is-better
- Update stale comments in conftest.py referencing wrong fixture
- Add pytest addopts to skip benchmark/slow/stress markers by default
Concentrates all drawers into a single wing+room to isolate the
embedding model's retrieval limit independent of palace filtering.
Confirms recall degrades to ~0.4-0.5 at 5K drawers per room even
with wing+room filters applied — the spatial structure helps by
keeping buckets small, but can't fix the underlying embedding ceiling.
Benchmark mempalace at configurable scale (1K–100K drawers) to find
real-world performance limits. Tests cover MCP tool OOM thresholds,
ChromaDB query degradation, search recall@k, mining throughput,
knowledge graph concurrency, memory leak detection, palace boost
quantification, and Layer1 unbounded fetch behavior.
- tests/benchmarks/ with 8 test modules + data generator + report system
- Deterministic data factory with planted needles for recall measurement
- JSON report output with regression detection (--bench-report flag)
- CI benchmark job on PRs at small scale
- psutil added as dev dependency for RSS tracking
The MCP server previously created a new PersistentClient on every tool
call via _get_collection(). This incurs HNSW index loading overhead
on each request.
Cache the client and collection at module level. The cache resets
naturally on process restart (MCP runs as a subprocess).
Also adds a _reset_mcp_cache fixture to conftest.py for test isolation.
Includes test infrastructure from PR #131.
92 tests pass.
- Switch CI install step from `pip install -r requirements.txt` to
`pip install -e ".[dev]"` since requirements.txt was removed
- Add noqa: E402 to intentionally-late imports in conftest.py
(HOME must be isolated before mempalace imports)
- Remove unused KnowledgeGraph import in test_knowledge_graph.py
- Apply ruff formatting to test files