mempalace

Author	SHA1	Message	Date
Igor Lins e Silva	c35686c9e1	docs(install): recommend uv as the package manager End-user installs now lead with `uv tool install mempalace`, with `pip install mempalace` kept as a fallback. Dev/contributor docs lead with `uv sync --extra dev` and `uv run` for tests/benchmarks/lint, with the equivalent pip recipe kept inline. The shipped `/mempalace:init` skill instructions (mempalace/instructions/init.md) try `uv tool install` first when uv is on PATH, then fall back through the pip variants. Adds a .python-version pin at 3.12 because the lockfile's onnxruntime==1.24.3 only ships wheels for Python >=3.11; without the pin, `uv sync` on a host where uv prefers 3.10 fails with no source distribution available, which would make the documented command a footgun. pyproject's `requires-python = ">=3.9"` is unchanged — pip users on 3.9/3.10 are unaffected. Files updated: README.md, CONTRIBUTING.md, CLAUDE.md, the gemini-cli guide and example, the .claude-plugin / .codex-plugin READMEs, the mempalace SKILL, the openclaw SKILL, tools/save.md, the three benchmarks docs, and the corresponding website mirrors.	2026-05-08 01:38:00 -03:00
Igor Lins e Silva	a4868a3589	perf(mining): batch per-chunk upserts and add optional GPU acceleration The miner upserted one drawer per ChromaDB call, paying tokenizer + ONNX session setup per chunk. The embedding device was CPU-only because no EmbeddingFunction was ever wired through the backend. Two changes, each a speedup in its own right; stacked they give ~10x end-to-end on a medium corpus (20 files, 568 drawers): 1. Batched upsert. `process_file` and `_file_chunks_locked` now collect all chunks of a file into a single `collection.upsert(...)` so the embedding model runs one forward pass per file instead of N. 2. Hardware-accelerated embedding function. New `mempalace/embedding.py` wraps `ONNXMiniLM_L6_V2` with configurable `preferred_providers`. `MEMPALACE_EMBEDDING_DEVICE` (or `embedding_device` in config.json) selects auto / cpu / cuda / coreml / dml. Unavailable accelerators log a warning and fall back to CPU. The factory subclasses `ONNXMiniLM_L6_V2` and spoofs its `name()` to `"default"` so the persisted EF identity matches existing palaces created with ChromaDB's bare `DefaultEmbeddingFunction` -- same model, same 384-dim vectors, no rebuild needed when turning GPU on. `ChromaBackend.get_collection` / `create_collection` now pass the resolved EF on every call so miner writes and searcher reads agree. Benchmarks (i9-12900KF + RTX 3090, medium scenario, 568 drawers): per-chunk + CPU 19.77s · 29 drw/s (baseline) batched + CPU 8.07s · 70 drw/s (2.4x) batched + CUDA 2.15s · 264 drw/s (9.2x) Reproducible via `benchmarks/mine_bench.py`. Install paths: pip install mempalace[gpu] # NVIDIA CUDA pip install mempalace[dml] # DirectML (Windows) pip install mempalace[coreml] # macOS Neural Engine Mine header now prints `Device: cpu\|cuda\|...` so users can confirm the accelerator engaged.	2026-04-24 19:42:35 -03:00
Ben Sigman	ced1fc955d	Merge pull request #897 from MemPalace/docs/honest-benchmarks-and-readme docs: honest benchmarks + README/site rewrite (#875)	2026-04-14 20:35:29 -07:00
Igor Lins e Silva	bf3b9c5979	docs: #875 follow-up — repo surfaces + reproduction URLs + CHANGELOG Remaining in-repo surfaces carrying the same retracted or broken claims as the public pages fixed in the previous two commits. CONTRIBUTING.md - "Palace structure matters ... 34% retrieval improvement" → reframed as scoping (same rewording applied to the website equivalents). benchmarks/BENCHMARKS.md - Add a prominent "Important caveat" block at the top of the "Comparison vs Published Systems" table explaining that R@5 (retrieval recall) and QA accuracy are different metrics, with citations to Mastra, Mem0, and Supermemory's own published methodology pages. Annotate the specific competitor rows whose numbers are QA accuracy, not retrieval recall. - Annotate the `hybrid v4 + rerank 100%` row to note that the 99.4 → 100 step was tuned on 3 specific wrong answers (already disclosed further down in the doc under "Benchmark Integrity"); the honest hybrid figure is held-out 98.4%. - Fix the broken clone URL — `aya-thekeeper/mempal` no longer points at anything; now `MemPalace/mempalace`. benchmarks/README.md + benchmarks/HYBRID_MODE.md - Same clone-URL fix applied. CHANGELOG.md - Add a ### Documentation entry under [Unreleased] v3.3.0 that names #875 and summarises the scope of the rewrite.	2026-04-14 21:38:00 -03:00
Igor Lins e Silva	61d02e10fe	benchmarks: add v3.3.0 reproduction results + 50/450 split Addresses #875: every internal BENCHMARKS.md claim reproduced on Linux x86_64 (v3.3.0 tag, deterministic ChromaDB embeddings, seed=42 for the LongMemEval dev/held-out split). Scorecard — all reproduce exactly: LongMemEval raw R@5 96.6% (500/500) ✅ hybrid_v4 held-out 450 R@5 98.4% (442/450) ✅ hybrid_v4 + minimax rerank R@5 99.2% (496/500) * hybrid_v4 + minimax rerank R@10 100.0% (500/500) * LoCoMo (session, top-10) raw 60.3% (1986q) ✅ hybrid v5 88.9% (1986q) ✅ ConvoMem all-categories (250 items) 92.9% ✅ MemBench all-categories (8500) 80.3% ✅ * The minimax-m2.7:cloud rerank run replicates the "100%" claim with a different LLM family (no Anthropic dependency). R@10 is a perfect reproduction; R@5 misses 4 questions that the published Haiku run caught — consistent with BENCHMARKS.md's own disclosure that hybrid_v4 includes three question-specific fixes developed by inspecting misses, i.e. teaching to the test. The committed 50/450 split is the deterministic (seed=42) split BENCHMARKS.md references but wasn't previously in the repo. Full result JSONLs include every question, every retrieved id, and every score — auditable end-to-end.	2026-04-14 21:21:11 -03:00
Igor Lins e Silva	ca0682abe3	benchmarks: apply ruff-format to llm_rerank (trivial line wrap)	2026-04-14 21:20:54 -03:00
Igor Lins e Silva	8df7b9bf2c	benchmarks: add --llm-backend ollama for non-Anthropic rerank The rerank pipeline was hardcoded to Anthropic's /v1/messages. Add a backend flag so the same code path can be exercised with any OpenAI-compatible endpoint — local Ollama, Ollama Cloud, or any gateway that speaks /v1/chat/completions. Enables independent verification of the "100% with Haiku rerank" claim by running the full benchmark with a different LLM family (e.g. minimax-m2.7:cloud) and zero Anthropic dependency. Both longmemeval_bench.py and locomo_bench.py: - llm_rerank*() gain backend= / base_url= kwargs - CLI: --llm-backend {anthropic,ollama}, --llm-base-url - API key required only when backend=anthropic (diary/palace modes still require it) - Parse last integer in response (reasoning models emit multi-int output) - Fallback to message.reasoning when content is empty - Raise max_tokens to 1024 for reasoning models	2026-04-14 21:20:14 -03:00
travisBREAKS	89206107fa	fix(bench): remove hardcoded credential paths from benchmark runners (#177 ) The `_load_api_key()` function in longmemeval_bench.py and locomo_bench.py searched for API keys in a fixed path (`~/.config/lu/keys.json`) using personal key names (`anthropic_milla`, `anthropic_claude_code_main`). This leaks internal infrastructure details into the public codebase and trains contributors to store credentials in a non-standard location rather than using the standard ANTHROPIC_API_KEY env var. Simplified to: CLI flag > env var > empty string. Updated help text and HYBRID_MODE.md docs to match. Co-authored-by: Tadao <tadao@travisfixes.com>	2026-04-11 23:14:36 -07:00
travisBREAKS	d8b2db696f	fix(bench): remove global SSL verification bypass in convomem_bench (#176 ) The module-level `ssl._create_default_https_context = ssl._create_unverified_context` disables certificate verification for ALL urllib requests in the process, not just the benchmark's HuggingFace downloads. This silently exposes the benchmark runner to MITM attacks. If a specific environment needs to skip verification (e.g. corporate proxy), users can set `PYTHONHTTPSVERIFY=0` or pass a custom ssl context per-request rather than globally patching the ssl module. Co-authored-by: Tadao <tadao@travisfixes.com>	2026-04-11 23:14:12 -07:00
bensig	6d8c462219	fix: resolve ruff lint and format errors across codebase Fix E402 import ordering, F841 unused variable, F541 unnecessary f-strings, F401 unused import, and auto-format 6 files.	2026-04-04 18:37:17 -07:00
bensig	0f8fa8c7d5	bench: add benchmark runners, results docs, and test suite Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with methodology docs and hybrid retrieval analysis. Tests: config, miner, convo_miner, normalize — 9 tests, all passing.	2026-04-04 18:33:42 -07:00

11 Commits