Files
mempalace/docs/HISTORY.md
T
Igor Lins e Silva 65bf1ebda3 docs: slim README and move corrections/notices to docs/HISTORY.md
Addresses #875. The previous README was 755 lines mixing six purposes
(scam alert, hero, two mea-culpa notes, install guide, architecture
explainer, API reference, file map). Rework it as a pure entry point:
what MemPalace is, how to install, honest benchmark numbers, links to
the website for concept/architecture documentation.

Key content changes:
 - Drop the "highest-scoring AI memory system ever benchmarked" framing.
 - New tagline: "Local-first AI memory. Verbatim storage, pluggable
   backend, 96.6% R@5 raw on LongMemEval — zero API calls." Avoids
   naming a specific vector-store implementation since the backend is
   pluggable (see mempalace/backends/base.py).
 - Remove the cross-system comparison table. Retrieval recall (R@5)
   and end-to-end QA accuracy are different metrics and are not
   comparable; placing MemPalace's R@5 next to competitor QA accuracy
   under a single column header was a category error.
 - The "100%" LongMemEval headline is no longer the lead. The honest
   held-out figure is 98.4% R@5 on 450 unseen questions. The rerank
   pipeline reaches >=99% with any capable LLM (reproduced with
   Claude Haiku, Sonnet, and minimax-m2.7 via Ollama) — pipeline-level,
   not model-specific.
 - Benchmark reproduction commands now reference the correct repo
   (MemPalace/mempalace, not the defunct aya-thekeeper/mempal branch).

New file: docs/HISTORY.md as the canonical home for post-launch
corrections, public notices, and retractions. Contains verbatim:
 - 2026-04-14 note on this rewrite (links to #875)
 - 2026-04-11 impostor-domain notice (moved from README header)
 - 2026-04-07 "A Note from Milla & Ben" (moved from README body)

README keeps a one-line scam-alert callout that links to
docs/HISTORY.md for the full timeline.
2026-04-14 21:37:20 -03:00

145 lines
7.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# MemPalace — History, Corrections, and Public Notices
This file is the canonical record of post-launch corrections, public notices,
and retractions that affect MemPalace's public claims. Newest first.
---
## 2026-04-14 — Benchmark table rewrite (issue [#875](https://github.com/MemPalace/mempalace/issues/875))
A community audit identified a category error in the public benchmark tables
on `README.md` and `mempalaceofficial.com`: MemPalace's retrieval recall
numbers (R@5, R@10) were listed in the same columns as competitors'
end-to-end QA accuracy numbers. They are different metrics and are not
comparable — a system can have 100% retrieval recall and 40% QA accuracy.
The audit also found that the retracted "+34% palace boost" claim (see the
April 7 note below) was still present in multiple surfaces despite that
retraction, and that two competitor numbers (`Mem0 ~85%`, `Zep ~85%`) had no
published source and did not match the metrics those projects actually
publish.
What changed in this PR:
- The headline number on all surfaces is now **96.6% R@5 on LongMemEval in
raw mode**, independently reproduced on Linux x86_64 against the tagged
v3.3.0 release on 2026-04-14. Result JSONLs are committed under
`benchmarks/results_*.jsonl` (see PR description for the scorecard).
- The **"100% with Haiku rerank"** claim has been removed from all public
comparison tables. It reproduces on our machines and with a different LLM
family (minimax-m2.7 via Ollama Cloud: 99.2% R@5 / 100.0% R@10 on the full
500-question LongMemEval set) — but the 99.4% → 100% step was developed
by inspecting three specific wrong answers (`benchmarks/BENCHMARKS.md` has
called this "teaching to the test" since February). It belongs in the
methodology document, not in a headline.
- The **honest held-out number** for the hybrid pipeline — 98.4% R@5 on 450
questions that `hybrid_v4` was never tuned on, deterministic seed — is now
the comparable figure when an LLM rerank is involved.
- The **retracted "+34% palace boost"** has been removed from
`README.md`, `website/concepts/the-palace.md`,
`website/guide/searching.md`, and `website/reference/contributing.md`.
Wing and room filters remain useful — they're standard metadata filters —
but they are not presented as a novel retrieval improvement.
- **Competitor comparison tables** mixing retrieval recall with QA accuracy
have been removed from `README.md` and `website/reference/benchmarks.md`.
Where MemPalace can be fairly compared on the same metric, we link to the
cited source. Otherwise we report our own numbers and let readers draw
their own conclusions.
- **Reproduction instructions** in `benchmarks/BENCHMARKS.md` and
`benchmarks/README.md` were pointing at a defunct branch
(`aya-thekeeper/mempal`); they now point at `MemPalace/mempalace`.
- The **LoCoMo 100% R@10 with top-50 rerank** row has been removed from
public comparison surfaces. With per-conversation session counts of 1932
and `top_k=50`, the retrieval stage returns every session in the
conversation by construction, so the number measures an LLM's
reading comprehension over the whole conversation, not retrieval.
Thanks to [@dial481](https://github.com/MemPalace/mempalace/issues/875) for
the detailed audit and to [@rohitg00](https://github.com/rohitg00) for the
parallel write-up in Discussion #747.
---
## 2026-04-11 — Impostor domains and malware
Several community members (issues #267, #326, #506) reported fake MemPalace
websites distributing malware. The only official surfaces for this project
are:
- This GitHub repository: [github.com/MemPalace/mempalace](https://github.com/MemPalace/mempalace)
- The PyPI package: [pypi.org/project/mempalace](https://pypi.org/project/mempalace/)
- The docs site: [mempalaceofficial.com](https://mempalaceofficial.com)
Any other domain — `mempalace.tech` being the one most commonly reported —
is not ours. Never run install scripts from unofficial sites.
Thanks to our community members for flagging the problem.
---
## 2026-04-07 — A Note from Milla & Ben
> The community caught real problems in this README within hours of launch
> and we want to address them directly.
>
> **What we got wrong:**
>
> - **The AAAK token example was incorrect.** We used a rough heuristic
> (`len(text)//3`) for token counts instead of an actual tokenizer. Real
> counts via OpenAI's tokenizer: the English example is 66 tokens, the
> AAAK example is 73. AAAK does not save tokens at small scales — it's
> designed for *repeated entities at scale*, and the README example was a
> bad demonstration of that. We're rewriting it.
>
> - **"30x lossless compression" was overstated.** AAAK is a lossy
> abbreviation system (entity codes, sentence truncation). Independent
> benchmarks show AAAK mode scores **84.2% R@5 vs raw mode's 96.6%** on
> LongMemEval — a 12.4 point regression. The honest framing is: AAAK is
> an experimental compression layer that trades fidelity for token
> density, and **the 96.6% headline number is from RAW mode, not AAAK**.
>
> - **"+34% palace boost" was misleading.** That number compares unfiltered
> search to wing+room metadata filtering. Metadata filtering is a
> standard feature of the underlying vector store, not a novel retrieval
> mechanism. Real and useful, but not a moat.
>
> - **"Contradiction detection"** exists as a separate utility
> (`fact_checker.py`) but is not currently wired into the knowledge graph
> operations as the README implied.
>
> - **"100% with Haiku rerank"** is real (we have the result files) but
> the rerank pipeline is not in the public benchmark scripts. We're
> adding it.
>
> **What's still true and reproducible:**
>
> - **96.6% R@5 on LongMemEval in raw mode**, on 500 questions, zero API
> calls — independently reproduced on M2 Ultra in under 5 minutes by
> [@gizmax](https://github.com/MemPalace/mempalace/issues/39).
> - Local, free, no subscription, no cloud, no data leaving your machine.
> - The architecture (wings, rooms, closets, drawers) is real and useful,
> even if it's not a magical retrieval boost.
>
> **What we're doing:**
>
> 1. Rewriting the AAAK example with real tokenizer counts and a scenario
> where AAAK actually demonstrates compression
> 2. Adding `mode raw / aaak / rooms` clearly to the benchmark
> documentation so the trade-offs are visible
> 3. Wiring `fact_checker.py` into the KG ops so the contradiction
> detection claim becomes true
> 4. Pinning the vector store dependency to a tested range (issue #100),
> fixing the shell injection in hooks (#110), and addressing the macOS
> ARM64 segfault (#74)
>
> **Thank you to everyone who poked holes in this.** Brutal honest
> criticism is exactly what makes open source work, and it's what we asked
> for. Special thanks to
> [@panuhorsmalahti](https://github.com/MemPalace/mempalace/issues/43),
> [@lhl](https://github.com/MemPalace/mempalace/issues/27),
> [@gizmax](https://github.com/MemPalace/mempalace/issues/39), and everyone
> who filed an issue or a PR in the first 48 hours. We're listening, we're
> fixing, and we'd rather be right than impressive.
>
> — *Milla Jovovich & Ben Sigman*