docs(website): align mempalaceofficial.com with honest benchmarks

Part of #875. Bring the VitePress site into line with the new README
and the reproducibility scorecard: drop category-error comparisons,
drop retracted claims, retain only metrics and caveats that survive
audit.

website/index.md
 - New tagline matches README (local-first, verbatim, pluggable backend,
   96.6% R@5 raw, zero API calls).
 - Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra
   94.87% / Mem0 ~85%" comparison table with a single honest table
   showing MemPalace's own retrieval-recall numbers (raw 96.6%,
   hybrid v4 held-out 98.4%). Add an explicit sentence explaining why
   we no longer publish a cross-system table on the landing page
   (retrieval recall and QA accuracy are different metrics).
 - Soften the "ChromaDB-powered vector search" feature blurb to be
   backend-agnostic, since the retrieval layer is pluggable.

website/reference/benchmarks.md
 - Full rewrite of the retrieval-recall tables. No more "100%"
   headline; honest held-out 98.4% R@5 replaces it. Added the
   model-agnostic rerank result (99.2% R@5 / 100% R@10 with
   minimax-m2.7 via Ollama) to show the pipeline is not Haiku-specific.
 - Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row.
   With per-conversation session counts of 19-32 and top_k=50, the
   retrieval stage returns every session by construction — the number
   measures an LLM's reading comprehension, not retrieval.
 - Drop the cross-system comparison tables. Link out to each project's
   own research page (Mastra, Mem0, Supermemory) for their published
   numbers and metric definitions.
 - Rewrite reproduction commands to use the correct repository and
   demonstrate the new --llm-backend ollama flag.
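
The LoCoMo point above (top_k=50 against 19-32 sessions per conversation) can be illustrated with a minimal sketch. The `recall_at_k` helper, the session ids, and the random "retriever" are hypothetical and are not taken from the benchmark code; recall is assumed here to mean the fraction of gold sessions found in the top k.

```python
import random


def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold session ids that appear among the top-k ranked ids."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(gold_ids)) / len(gold_ids)


def random_retriever_recall(n_sessions, k, seed=0):
    """Recall@k for a deliberately uninformative retriever (random ordering)."""
    rng = random.Random(seed)
    sessions = [f"session-{i}" for i in range(n_sessions)]
    gold = rng.sample(sessions, 3)  # hypothetical gold evidence sessions
    ranked = sessions[:]
    rng.shuffle(ranked)  # no retrieval signal at all
    return recall_at_k(ranked, gold, k)


for n in (19, 32):
    # With k=50 and at most 32 sessions, the top-k window covers everything.
    print(n, random_retriever_recall(n, k=50))
```

Because the window is larger than the candidate pool, even a random ordering scores perfect recall, so any accuracy measured after that stage reflects the reader model and not the retriever.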

website/concepts/the-palace.md
 - Remove the "+34%" row / paragraph. Wing/room filtering is standard
   metadata filtering in the vector store, not a novel retrieval
   mechanism — the April-7 note already retracted that framing; this
   finishes the retraction on the website where it had remained.

website/guide/searching.md
 - Same treatment for "34% retrieval improvement". Reframe as
   operational scoping, not a novel boost.
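
As a rough sketch of what "operational scoping" means here: a wing/room filter narrows the candidate set before similarity ranking, and the ranking itself is unchanged. The record layout, the `search` function, and the `where` argument below are illustrative assumptions and do not mirror the MemPalace or vector-store API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vec, k, where=None):
    """Rank records by cosine similarity, optionally scoped by exact-match metadata."""
    where = where or {}
    candidates = [
        r for r in records
        if all(r["meta"].get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda r: cosine(r["vec"], query_vec), reverse=True)
    return [r["id"] for r in candidates[:k]]


# Hypothetical records: toy 2-d vectors stand in for real embeddings.
records = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"wing": "work", "room": "api"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"wing": "home", "room": "garden"}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"wing": "work", "room": "api"}},
]

unscoped = search(records, [1.0, 0.0], k=2)
scoped = search(records, [1.0, 0.0], k=2, where={"wing": "work"})
print(unscoped, scoped)
```

The scoped query simply excludes record `b` before ranking. Any recall change comes from a smaller candidate pool, which is the standard behaviour of metadata filtering in a vector store.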

website/reference/contributing.md
 - Update the "palace structure matters" bullet to reflect the same
   framing: scoping-not-magic.

website/concepts/knowledge-graph.md
 - Replace the MemPalace-vs-Zep feature matrix with a short "related
   work" note that links to Zep's own documentation for authoritative
   details on their deployment model. Avoids claims we cannot verify
   at source.
Igor Lins e Silva
2026-04-14 21:37:45 -03:00
parent 65bf1ebda3
commit f20a1a30fe
6 changed files with 133 additions and 95 deletions
+14 -13
@@ -4,7 +4,7 @@ layout: home
hero:
name: MemPalace
text: Give your AI a memory.
-tagline: "96.6% recall on LongMemEval in raw mode. Local-first, open source, and usable without an API key."
+tagline: "Local-first AI memory. Verbatim storage, pluggable backend, 96.6% R@5 raw on LongMemEval — zero API calls."
image:
src: /mempalace_logo.png
alt: MemPalace
@@ -34,7 +34,7 @@ features:
src: /icons/search.svg
alt: Semantic Search
title: Semantic Search
-details: ChromaDB-powered vector search lets the model retrieve past discussions by topic, project, or room.
+details: Vector search over verbatim content lets the model retrieve past discussions by topic, project, or room. Backend is pluggable.
- icon:
src: /icons/git-merge.svg
alt: Knowledge Graph
@@ -49,7 +49,7 @@ features:
src: /icons/shield-check.svg
alt: Zero Cloud
title: Zero Cloud
-details: Core storage and retrieval run locally on ChromaDB and SQLite. Optional reranking features can add an API dependency.
+details: Core storage and retrieval run locally. Optional reranking features can add an API dependency but are not required for the benchmark path.
---
<style>
@@ -68,20 +68,21 @@ features:
## Verbatim Retrieval First
-MemPalace starts from a simple premise: **store the source text and retrieve it well**. The benchmarked raw mode does not require an LLM extraction step.
+MemPalace stores source text and retrieves it with semantic search. The benchmarked raw mode does not require an LLM at any stage — no extraction, no rerank, no summarisation.
-| System | LongMemEval R@5 | API Required | Cost |
-|--------|----------------|--------------|------|
-| **MemPalace (hybrid)** | **100%** | Optional | Free |
-| Supermemory ASMR | ~99% | Yes | — |
-| **MemPalace (raw)** | **96.6%** | **None** | **Free** |
-| Mastra | 94.87% | Yes | API costs |
-| Mem0 | ~85% | Yes | $19–249/mo |
+**LongMemEval retrieval recall (500 questions):**
-The raw 96.6% LongMemEval result is the baseline story: strong recall without requiring an API key or an LLM in the retrieval pipeline.
+| Mode | R@5 | LLM required |
+|---|---|---|
+| Raw (semantic search over verbatim text) | **96.6%** | None |
+| Hybrid v4, held-out 450q | **98.4%** | None |
+The raw 96.6% reproduces on any machine with the committed dataset: result JSONLs, the `seed=42` train/held-out split, and the `--mode raw` / `--held-out` runners are all in the `benchmarks/` directory of the repo.
+We deliberately do not publish a side-by-side comparison against other memory systems on this page. Retrieval recall (R@5) and end-to-end QA accuracy are different metrics and are not comparable; where MemPalace can be fairly compared on the same metric, we link to the other project's published source.
<div style="text-align: center; padding-top: 16px;">
-<a href="./reference/benchmarks" style="color: var(--vp-c-brand-1); font-weight: 500;">Full benchmark results →</a>
+<a href="./reference/benchmarks" style="color: var(--vp-c-brand-1); font-weight: 500;">Full benchmark methodology →</a>
</div>
</div>