Windows runs treated `/tmp/elsewhere/x.md` as relative because Windows
absolute paths require a drive letter, so `_classify_drawer` routed
`drawer_out_of_scope` to `no_source` instead of `out_of_scope` and
`test_dry_run_classifies_correctly` failed on test-windows.
`Path(tmp_dir) / "elsewhere" / "x.md"` is absolute on every platform
and still lives outside the project root that the synced_world fixture
exposes via `repo_path`, so the bucket assertions hold cross-platform.
CI fix: `_classify_drawer` now resolves `source_file` symmetric to
`project_roots` (which `_normalize_project_dirs` and
`_auto_detect_project_roots` already `.resolve()`). Without this, on
platforms where the temp directory is a symlink (macOS `/var/folders` ->
`/private/var/folders`, Windows 8.3 short-name normalization), every
drawer mis-bucketed as `out_of_scope` and survived prune.
Perf:
- `_resolve_project_root`: early-return on first match (sorted-desc
precondition).
- `_normalize_project_dirs`: sort `(-len(str(p)), str(p))` desc for
early-return + deterministic tie-break on equal-length paths.
- `_auto_detect_project_roots`: `seen_sources` dedupe so a 200-chunk
file costs one disk walk, not 200.
- `sync_palace` main loop: per-file classification cache; registry
sentinels (`_reg_*`, `room=_registry`, `ingest_mode=registry`) routed
to "kept" before cache lookup so a sentinel sharing a `source_file`
with a pruned drawer cannot inherit a stale "gitignored" verdict.
- Closet purge: collapse O(N) per-file purge into one
`where={"source_file": {"$in": [...]}}` get + one bulk delete.
Tests (5 new in `TestSyncPalace`, 38 total):
- `test_symlinked_project_root_resolves`: pins symmetric resolve via
real `os.symlink` (skipped on Windows).
- `test_classification_cache_avoids_redundant_disk_hits`: monkeypatch
counter on `_classify_drawer` asserts `call_count == 1` for 5 chunks
sharing one source_file.
- `test_closet_batch_purge_single_call`: wraps closets collection with
`CallCountingCol` (forwards `.get`/`.delete`); asserts
`delete_calls == 1` and `get_calls == 1`; expected `removed_closets`
derived from `report["by_source"]` to stay robust to fixture changes.
- `test_registry_check_runs_before_cache_lookup`: a regular drawer
caches "gitignored" first; a sentinel with the same source_file must
still be kept.
- `test_normalize_project_dirs_sort_stable_on_equal_length`: pins the
alphabetical secondary key when paths share length.
Add `mempalace sync` CLI command and `mempalace_sync` MCP tool that
prune drawers whose source files are gitignored, deleted, or moved
out of the project. Reuses the existing GitignoreMatcher
infrastructure in mempalace/miner.py so the same gitignore rules
that block ingest also drive the corresponding cleanup.
Closes#1252.
PEP 604 union syntax (Path | None) requires Python 3.10+. The project
still supports 3.9 (per pyproject target-version and CI matrix), and
this annotation lives in a function signature so it is evaluated at
module load time — failing with "unsupported operand type(s) for |"
on test-linux 3.9.
The other ``int | None`` annotation in this file is inside a function
body, where Python skips runtime evaluation of local annotations, so
it does not trip 3.9.
The hook PID guard used a single global ``~/.mempalace/hook_state/mine.pid``
file, which failed two ways:
1. ``_mine_already_running`` read-then-spawn was a TOCTOU race. Two
near-simultaneous Stop hook fires both passed the existence/liveness
check before either wrote — so both ended up calling
``_spawn_mine``.
2. ``_spawn_mine`` unconditionally overwrote the global PID file with
the new child's PID. The first PID was lost, orphaning the first
child. The user-visible result in #1212 was two concurrent
``mempalace mine`` processes running against the same source, both
driving HNSW inserts in parallel — exactly the corruption pattern
the guard was meant to prevent. #1206 reported the same shape from
the perspective of the user (two mines hung on a 350MB folder).
Replace the global file with per-target slots under
``~/.mempalace/hook_state/mine_pids/``, keyed by sha256 of the mine
sub-arguments (everything after ``mine``). The slot is claimed via
``O_CREAT | O_EXCL`` so the claim is atomic — two simultaneous fires
can never both pass. Stale slots (PID exists but is dead) are
reclaimed transparently. Different targets (e.g. project mine vs
transcript ingest, or two different MEMPAL_DIRs) get independent
slots and run in parallel.
The mine subprocess receives its slot path via
``MEMPALACE_MINE_PID_FILE`` env var; ``miner._cleanup_mine_pid_file``
reads that var on exit and removes the slot if it points at our PID,
so orphaned slots from crashed mines don't accumulate.
Also routes ``_ingest_transcript`` through ``_spawn_mine`` so the
transcript ingest path now participates in the same dedup — repeated
Stop fires for the same transcript no longer stack parallel mines.
Closes#1212Closes#1206
- mempalace/instructions/init.md: only skip Step 4 when `mempalace
--version` succeeds. `pip show` / `uv tool list` reporting an install
is not enough -- if the package lives in an unactivated venv, Step 5
(`mempalace init ...`) fails with command-not-found. Treat that case
as not-installed and re-install via Step 3 into a PATH-visible
location.
- .codex-plugin/README.md: switch the git-install recipe from `uv sync`
to `uv tool install --editable .` so the bundled `plugin.json`
(which invokes `mempalace-mcp` by bare name) can launch the MCP
server. Plain `uv sync` only puts the script in `.venv/bin/`, which
Codex won't find unless the venv is activated first.
End-user installs now lead with `uv tool install mempalace`, with
`pip install mempalace` kept as a fallback. Dev/contributor docs lead
with `uv sync --extra dev` and `uv run` for tests/benchmarks/lint, with
the equivalent pip recipe kept inline. The shipped `/mempalace:init`
skill instructions (mempalace/instructions/init.md) try `uv tool install`
first when uv is on PATH, then fall back through the pip variants.
Adds a .python-version pin at 3.12 because the lockfile's
onnxruntime==1.24.3 only ships wheels for Python >=3.11; without the
pin, `uv sync` on a host where uv prefers 3.10 fails with no source
distribution available, which would make the documented command a
footgun. pyproject's `requires-python = ">=3.9"` is unchanged — pip
users on 3.9/3.10 are unaffected.
Files updated: README.md, CONTRIBUTING.md, CLAUDE.md, the gemini-cli
guide and example, the .claude-plugin / .codex-plugin READMEs, the
mempalace SKILL, the openclaw SKILL, tools/save.md, the three
benchmarks docs, and the corresponding website mirrors.
The v3.3.4 release prep landed on main but was never merged back into
develop, leaving every version-bearing file one release behind. Bumps
pyproject.toml, mempalace/version.py, both plugin manifests, the
marketplace entry, the README badge, and the lockfile to 3.3.4 to match
the tagged release.
os.path.expanduser("~") reads HOME on POSIX but USERPROFILE on Windows;
the lock-body bound test was monkeypatching HOME only, so on
test-windows the lock file landed in the runner's real ~/.mempalace
and the tmp_path glob found nothing.
Patch USERPROFILE in addition to HOME, and read the body as bytes so
the byte-0 sentinel doesn't trip a UTF-8 decode warning. Assertion
shifts from line-count to size-bound (still detects unbounded growth
across re-acquires).
Windows CI surfaced two bugs introduced by the holder-identity write:
1. msvcrt.locking(LK_NBLCK, 1) locks 1 byte at the *current* file
position. Switching to "a+" mode put the position at end-of-file,
so two contenders locked different bytes and silently both
acquired (the test asserts saw [(ok, 1), (ok, 2)] instead of
ok+busy).
2. With the byte-range lock active on Windows, the locked byte is
read-blocked for other processes. A contender trying to read the
holder identity from byte 0 would hit PermissionError.
Switch to "r+" mode (after touch-create) and explicitly seek(0) before
both lock and unlock. Then reserve byte 0 as a pure lock sentinel and
write the holder identity from byte 1 onward. _read_lock_holder reads
from byte 1+, so it never touches the locked byte.
Also bound file growth across re-acquires: truncate to
sentinel + len(ident) before writing so the file body stays the size
of the current holder, never accumulating across runs.
Linux fcntl.flock locks the whole file independent of byte position,
so the seek(0) is harmless on POSIX. The shape works on both.
When a `mempalace mine` collided with another writer (live mcp_server,
another mine, anything taking mine_palace_lock), the operator saw a
generic "another `mempalace mine` is already running" message and the
CLI exited 0 — making the contention invisible to nohup or scripts
checking $?. The reporter ran a `nohup mempalace mine ... & disown`
and got a 200-byte log with only the auto-defaults warning, no clue
that an MCP server was holding the store.
palace.py: the lock file now records the holder's PID + first three
argv tokens on acquire. A failed acquire reads the file and surfaces
"palace <path> is held by PID N (mempalace mcp_server); wait for it
to finish or stop the holder before retrying" in the
MineAlreadyRunning message. Open mode changes from "w" to "a+" so the
prior holder's identity survives long enough to be read.
miner.mine() now lets MineAlreadyRunning propagate. cmd_mine catches
it, prints the holder-aware message to stderr, and exits non-zero so
shell wrappers detect the contention.
Note: this is a behavior change for in-process callers that depended
on miner.mine() silently swallowing MineAlreadyRunning. The silent
swallow was the bug.
Closes#1264
The Stop hook spawns mining subprocesses via subprocess.Popen and then
returns. On Windows the parent stays blocked at session end because the
child inherits stdout/stderr handles and the OS waits for them to
release before the parent can exit — the user-visible symptom is the
"running stop hooks... 3/3" spinner hanging for minutes (#1268).
Add _detached_popen_kwargs() helper that returns the right detach knobs
per platform:
- POSIX: start_new_session=True, stdin=DEVNULL, close_fds=True
- Windows: creationflags=DETACHED_PROCESS|CREATE_NEW_PROCESS_GROUP|
CREATE_BREAKAWAY_FROM_JOB, stdin=DEVNULL, close_fds=True
Apply to all three fire-and-forget Popen sites in hooks_cli:
_spawn_mine, _ingest_transcript, _desktop_toast. Leave _mine_sync's
subprocess.run alone — that path is intentionally synchronous (the
precompact hook must wait for the mine to finish).
Note: the issue body references mempalace-stop.js, which does not exist
in this repo (the plugin ships shell wrappers calling Python). The
mechanism described — child holds parent open via inherited handles —
is universal, so this fix targets the equivalent symptom in our Python
hook path. Will follow up on the upstream JS file with the reporter.
test_legacy_state_backfills_content_hash failed on test-windows because
Path.write_text without an encoding uses the system locale (cp1252 on
Windows). The em dash was written as 0x97, then read back by
diary_ingest as UTF-8 with errors=replace — round-trip produced
different bytes than the in-Python literal, so the assertion comparing
the persisted hash to sha256(text.encode(utf-8)) diverged.
Pin the write side to encoding=utf-8 so the on-disk bytes match what
diary_ingest decodes. No production change.
Address Copilot review on #925:
- Full closet rebuild whenever the content hash differs from prior
state, not only on entry-count growth. Without this, an in-place
edit (same entry count, different body) updated the drawer but
left the closet/search index stale — defeats the verbatim guarantee
at the search layer even if the drawer is correct.
- Legacy size-only skip path now records the computed content_hash
back into state so subsequent runs use the strict hash check
instead of remaining on the size-only path indefinitely.
- Test updates: typo direction in the regression test now matches the
comment (typo "Teh" → fix "The"), assertion now also checks the
closet collection reflects the edit, and a new test exercises the
legacy-state backfill path.
Address Copilot review on #1156:
- Per-file symlink check via new _safe_open_for_write() helper. Uses
O_NOFOLLOW on POSIX (close TOCTOU window between islink check and
open) and falls back to islink + open on Windows. Applied to room
files and index.md, mirroring the existing dir-level check.
- Tests now wrap os.symlink() in _try_symlink_or_skip() so Windows
without Developer Mode and restricted CI sandboxes skip rather than
hard-fail. Added two regression tests for the file-level cases
(room file, index.md).
The skip-if-unchanged check compared byte length only, so any in-place
edit preserving total length (typo fix "teh"→"the", word swap) was
silently dropped — a verbatim-storage violation: the user's actual
words never reached the palace.
Switch the gate to sha256(text). State entries gain a "content_hash"
field; the legacy size-only path is preserved when prev_hash is missing
so a post-upgrade run does not re-ingest every untouched diary.
Closes#925
A symlink pre-placed at the export output_dir or any wing subdirectory
would redirect markdown writes to wherever the symlink points. The
miner already rejects symlinked inputs via Path.is_symlink(); the
exporter should apply the same caution to outputs.
Add _reject_symlink() helper and call it before makedirs on both
output_dir and each wing_dir. Refusal raises ValueError with a clear
message rather than silently falling through.
Closes#1156
The retry loop already backs off on HTTP 429/503 and rate-limit-shaped
exceptions, but JSONDecodeError exited on the first failure. Local LLM
runtimes occasionally produce malformed JSON (truncated streams, partial
chunks under load), and the retry was effectively dead for that path.
Mirror the 429/503 branch: sleep with exponential backoff and continue
through all 3 attempts, only returning None after the final failure.
Closes#1155
Address Copilot review on #1403: the test seeked unconditionally to
offset 40960 with only `pre_size > 16384` as a guard. If pre_size sat
between 16384 and 40960 + 16384 = 57344 (e.g., on a chromadb version
that allocated fewer pages on init, or a future schema change), the
seek would extend the file with zero-padding and the original pages
would stay intact — quick_check would still pass on the (untouched)
real data, and the regression guard would silently skip detecting a
preflight-ordering regression.
Compute the offset from pre_size, page-aligned, with explicit asserts
that the file is large enough to mangle 4 pages without truncating
the header or extending past EOF.
#1364 added the SQLite quick_check preflight to rebuild_index, but
placed it AFTER backend.get_collection(...). On a SQLite-corrupt
palace, chromadb's rust binding raises pyo3_runtime.PanicException —
which is not a regular Exception subclass — so it propagates past the
existing `except Exception` handlers and the user sees a 30-line stack
trace instead of the friendly abort message #1364 was designed to
deliver. Reproduced with `mempalace repair --yes` against a palace
whose chroma.sqlite3 has 4 mangled pages: pre-fix, panic; post-fix,
the clean abort message and exit code 1.
Two changes:
- mempalace/cli.py cmd_repair: run sqlite_integrity_errors() right
after the basic palace-existence check, BEFORE the max_seq_id
preflight (which itself opens sqlite3) and BEFORE backend =
ChromaBackend(). Exit non-zero so unattended scripts and CI gates
see the failure.
- mempalace/repair.py rebuild_index: same move at the function level
for direct callers (tests, MCP) that bypass cmd_repair.
The new test test_rebuild_index_runs_sqlite_preflight_before_chromadb_open
uses a real chromadb-built palace (no ChromaBackend mock) plus a
real corrupt SQLite (16 KB of mangled pages) so the ordering is
exercised end-to-end. The previously-shipping test for the abort path
mocked both the backend and sqlite_integrity_errors, which is why the
ordering bug shipped CI-green.
Six existing test_cli.py cmd_repair tests used `(palace_dir /
"chroma.sqlite3").write_text("db")` to fake the SQLite file. The new
preflight correctly fails quick_check on those 2-byte stubs, so the
tests now create empty real SQLite DBs the same way the test_repair.py
fixtures already do.
#1357 (max_seq_id preflight) merged into develop while this branch
was in CI, opening a fresh conflict between the two preflight helpers.
mempalace/repair.py:
- Kept both: this branch's sqlite_integrity_errors() / print_sqlite_
integrity_abort() AND develop's maybe_repair_poisoned_max_seq_id_
before_rebuild() from #1357. They check for distinct corruption
classes and run as separate preflights.
tests/test_repair.py:
- Kept both this branch's sqlite_integrity_errors test group and
develop's max_seq_id preflight test group; non-overlapping coverage.
Local: 1623 tests pass, ruff lint+format clean against 0.4.x CI pin.
Conflicts opened by #1285 (temp-staging rebuild) and #1312
(collection_name in recovery paths) merging after this branch was
authored.
mempalace/repair.py:
- Kept this branch's sqlite_integrity_errors() and
print_sqlite_integrity_abort() helpers; took develop's rebuild_index
signature with the collection_name parameter from #1312. Normalized
the helper's print indent to 2 spaces to match the rest of the file.
tests/test_repair.py:
- Kept both this branch's sqlite_integrity_errors tests and develop's
rebuild_from_sqlite + configured-collection coverage.
- Replaced 7 sites of sqlite_path.write_text("fake") with
sqlite3.connect(...).close() — write_text("fake") fails PRAGMA
quick_check, so the new preflight aborts before the rebuild logic
the tests actually exercise. An empty real SQLite DB passes
quick_check and lets the tests run as intended.
- Took develop's temp-staging assertion shape (delete/create the
__repair_tmp collection in addition to the live drawers collection)
for the existing test_rebuild_index_success test.
Local: 1618 tests pass, ruff lint+format clean against 0.4.x CI pin.
Two conflicts, both because #1339 (bloated link payloads) merged into
develop after this branch was authored:
- mempalace/backends/chroma.py: _segment_appears_healthy now stacks
three checks — bloated-link from #1339 (top), missing-metadata-with-
data-floor from this branch (middle), pickle format sniff (bottom).
All three are complementary; #1339 catches structural payload
corruption, this branch catches pickle truncation, the original
catches pickle protocol-byte corruption.
- tests/test_backends.py: kept both new imports (_segment_appears_healthy
from this branch, quarantine_invalid_hnsw_metadata from #1285).
Local: 1618 tests pass, ruff lint+format clean against 0.4.x CI pin.
Three conflicts, all from develop landing #1285/#1310/#1312 after this
branch was authored:
- mempalace/cli.py: keep both import sets — this branch's
maybe_repair_poisoned_max_seq_id_before_rebuild plus develop's
RebuildCollectionError / _close_chroma_handles / _extract_drawers /
_rebuild_collection_via_temp added in #1285.
- mempalace/repair.py: keep this branch's
maybe_repair_poisoned_max_seq_id_before_rebuild definition; use
develop's rebuild_index signature with the collection_name parameter
added in #1312. Normalized print indent to 2 spaces matching the
rest of the file.
- tests/test_repair.py: keep both this branch's max_seq_id preflight
tests and develop's rebuild_from_sqlite + configured-collection-name
tests; they exercise distinct code paths and don't overlap.
Local: 1617 tests pass, ruff lint+format clean against 0.4.x CI pin.
The collection_name plumbing rebase produced a few unformatted blocks
in test_mcp_server.py and test_searcher.py; bringing them in line with
the 0.4.x CI pin so test-windows / lint stay green.
Three small changes that together address the failure modes in #1296:
1. Add pnpm-lock.yaml and yarn.lock to SKIP_FILENAMES, mirroring the
existing package-lock.json rule. A 24K-line pnpm-lock.yaml produced
~1124 chunks in one batch and tripped onnxruntime bad_alloc on
Windows; pnpm/yarn lockfiles are no more useful to mine than npm's.
2. Skip any file that produces more than MAX_CHUNKS_PER_FILE (500)
chunks, with a clear log line. Catches the broader class — generated
CSV/JSON, build artifacts, etc. — that the named-file SKIP list will
never fully cover. The cap is conservative (500 chunks * 800 chars ≈
400 KB of source) so legitimate hand-written content still mines.
3. Print a partial-progress summary on any exception in _mine_impl, not
just KeyboardInterrupt, then re-raise. Without this, an arbitrary
exception (ONNX bad_alloc, chromadb HNSW error, OS fault) propagates
silently — the operator sees only the last progress line and assumes
the mine succeeded. The new path mirrors the KeyboardInterrupt
summary (files_processed, drawers_filed, last_file) plus the
exception type and message, then re-raises so the original traceback
surfaces and the exit code is non-zero.
Tests cover: SKIP_FILENAMES contents, the chunk-cap path returning
(0, room) with no upserts, and the new mine-aborted summary surfacing
both the partial counters and the exception class.
`backend: ChromaBackend | None = None` evaluates the X | None union
eagerly at function-definition time, which Python 3.9 rejects with
TypeError: unsupported operand type(s) for |: 'ABCMeta' and 'NoneType'
since the new union syntax is 3.10+. Quoting matches the existing
forward-reference style in repair.py (sqlite_drawer_count, etc.) and
defers evaluation, restoring 3.9 compatibility.