col.get() without limit generates SELECT ... WHERE id IN (...) with all
document IDs, which exceeds SQLite's ~999 variable limit when a palace
has more than ~1000 drawers.
This breaks both `mempalace compress` and `mempalace wake-up` on large
palaces. Reproduced on a 13880-file codebase (242K+ drawers).
Fix: paginate reads in batches of 500 using ChromaDB's offset/limit
parameters in both Layer1.generate() and cmd_compress().
Add _try_codex_jsonl parser for Codex CLI session files stored at
~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl.
Uses only event_msg entries (user_message / agent_message) which
represent the canonical conversation turns. response_item entries
are intentionally skipped — they include synthetic context injections
(environment_context) and can duplicate real messages when both
representations are present in the same rollout.
Format based on Codex source tests (codex-rs/rollout/src/recorder_tests.rs).
Requires session_meta header to reduce false positives on other JSONL.
Refs: #59
Replace broad except Exception with specific exception types in 6
sites where the expected failure mode is well-defined:
- normalize.py: OSError for file read, ImportError for optional import
- miner.py: OSError for file read_text
- entity_detector.py: OSError for file read in scan loop
- convo_miner.py: (OSError, ValueError) for normalize which reads
and parses files
- entity_registry.py: (URLError, OSError, JSONDecodeError, KeyError)
for Wikipedia lookup fallback
ChromaDB except Exception sites (~30) are left broad for now.
chromadb.errors defines NotFoundError, DuplicateIDError,
InvalidDimensionException etc., but narrowing those sites requires
importing from chromadb.errors and validating across supported
versions (>=0.4.0). MCP server handlers also left broad for
resilience.
Add usedforsecurity=False to hashlib.md5() calls in miner.py and
convo_miner.py to document that MD5 is used for deterministic ID
generation, not cryptographic security. Preserves stable drawer IDs
for backward compatibility with existing palaces.
Swapping to SHA-256 would change the ID formula and make existing
drawers unreachable on re-ingestion. PR #34 covers the MD5 sites
in knowledge_graph.py and mcp_server.py.
Verified: usedforsecurity kwarg is supported since Python 3.9
(project target per pyproject.toml line 10), confirmed via Context7
CPython docs.
Remove discarded `query.lower()` call in `extract_people_from_query` —
strings are immutable so the result was always thrown away. The existing
`re.IGNORECASE` flag already handles case-insensitive matching.
Remove duplicate literals in COMMON_ENGLISH_WORDS set: "hunter" (consecutive
duplicate), "april" and "june" (appeared in both names and months sections).
Add Milla's conversational explanation of the palace architecture
(wings → rooms → closets → drawers → halls → tunnels) as an
introductory section before the technical diagram.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>