docs(website): align mempalaceofficial.com with honest benchmarks

Part of #875. Bring the VitePress site into line with the new README and the reproducibility scorecard: drop category-error comparisons, drop retracted claims, retain only metrics and caveats that survive audit.

website/index.md
- New tagline matches README (local-first, verbatim, pluggable backend, 96.6% R@5 raw, zero API calls).
- Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra 94.87% / Mem0 ~85%" comparison table with a single honest table showing MemPalace's own retrieval-recall numbers (raw 96.6%, hybrid v4 held-out 98.4%). Add an explicit sentence explaining why we no longer publish a cross-system table on the landing page (retrieval recall and QA accuracy are different metrics).
- Soften the "ChromaDB-powered vector search" feature blurb to be backend-agnostic, since the retrieval layer is pluggable.

website/reference/benchmarks.md
- Full rewrite of the retrieval-recall tables. No more "100%" headline; honest held-out 98.4% R@5 replaces it. Added the model-agnostic rerank result (99.2% R@5 / 100% R@10 with minimax-m2.7 via Ollama) to show the pipeline is not Haiku-specific.
- Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row. With per-conversation session counts of 19-32 and top_k=50, the retrieval stage returns every session by construction, so the number measures an LLM's reading comprehension, not retrieval.
- Drop the cross-system comparison tables. Link out to each project's own research page (Mastra, Mem0, Supermemory) for their published numbers and metric definitions.
- Rewrite reproduction commands to use the correct repository and demonstrate the new --llm-backend ollama flag.

website/concepts/the-palace.md
- Remove the "+34%" row / paragraph. Wing/room filtering is standard metadata filtering in the vector store, not a novel retrieval mechanism. The April-7 note already retracted that framing; this finishes the retraction on the website, where it had remained.

website/guide/searching.md
- Same treatment for "34% retrieval improvement". Reframe as operational scoping, not a novel boost.

website/reference/contributing.md
- Update the "palace structure matters" bullet to reflect the same framing: scoping-not-magic.

website/concepts/knowledge-graph.md
- Replace the MemPalace-vs-Zep feature matrix with a short "related work" note that links to Zep's own documentation for authoritative details on their deployment model. Avoids claims we cannot verify at source.
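
The reason for dropping the LoCoMo top-50 row can be illustrated with a small sketch. This is not the project's benchmark code: `recall_at_k`, the session IDs, and the single gold session are hypothetical names chosen for illustration, and the 32-session conversation is simply the upper end of the 19-32 range quoted in the message. It shows that once `k` is at least the number of candidate sessions, recall is 1.0 even for the worst possible ranking, so the figure says nothing about retrieval quality.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int) -> float:
    """Fraction of gold items that appear in the top-k of a ranking."""
    if not gold_ids:
        raise ValueError("gold_ids must be non-empty")
    top_k = set(ranked_ids[:k])
    return len(gold_ids & top_k) / len(gold_ids)


# Hypothetical conversation with 32 sessions (assumption: upper end of 19-32).
sessions = [f"session_{i}" for i in range(32)]
gold = {"session_0"}

# Worst possible ranking: the gold session is placed last.
worst_ranking = list(reversed(sessions))

# k=50 exceeds the 32 candidates, so every session is returned regardless of order.
recall_top50 = recall_at_k(worst_ranking, gold, k=50)

# k=5 actually depends on the ranking; the worst ranking misses the gold session.
recall_top5 = recall_at_k(worst_ranking, gold, k=5)

print(recall_top50, recall_top5)
```

With `k=5` the same worst-case ranking scores 0.0, which is why R@5 remains a meaningful retrieval metric while a top-50 score over at most 32 sessions is not.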
2026-04-14 21:37:45 -03:00
parent 65bf1ebda3
commit f20a1a30fe
6 changed files with 133 additions and 95 deletions
@@ -4,7 +4,7 @@ layout: home
 hero:
  name: MemPalace
  text: Give your AI a memory.
-  tagline: "96.6% recall on LongMemEval in raw mode. Local-first, open source, and usable without an API key."
+  tagline: "Local-first AI memory. Verbatim storage, pluggable backend, 96.6% R@5 raw on LongMemEval — zero API calls."
  image:
    src: /mempalace_logo.png
    alt: MemPalace
@@ -34,7 +34,7 @@ features:
      src: /icons/search.svg
      alt: Semantic Search
    title: Semantic Search
-    details: ChromaDB-powered vector search lets the model retrieve past discussions by topic, project, or room.
+    details: Vector search over verbatim content lets the model retrieve past discussions by topic, project, or room. Backend is pluggable.
  - icon:
      src: /icons/git-merge.svg
      alt: Knowledge Graph
@@ -49,7 +49,7 @@ features:
      src: /icons/shield-check.svg
      alt: Zero Cloud
    title: Zero Cloud
-    details: Core storage and retrieval run locally on ChromaDB and SQLite. Optional reranking features can add an API dependency.
+    details: Core storage and retrieval run locally. Optional reranking features can add an API dependency but are not required for the benchmark path.
 ---

 <style>
@@ -68,20 +68,21 @@ features:

 ## Verbatim Retrieval First

-MemPalace starts from a simple premise: **store the source text and retrieve it well**. The benchmarked raw mode does not require an LLM extraction step.
+MemPalace stores source text and retrieves it with semantic search. The benchmarked raw mode does not require an LLM at any stage — no extraction, no rerank, no summarisation.

-| System | LongMemEval R@5 | API Required | Cost |
-|--------|----------------|--------------|------|
-| **MemPalace (hybrid)** | **100%** | Optional | Free |
-| Supermemory ASMR | ~99% | Yes | — |
-| **MemPalace (raw)** | **96.6%** | **None** | **Free** |
-| Mastra | 94.87% | Yes | API costs |
-| Mem0 | ~85% | Yes | $19–249/mo |
+**LongMemEval retrieval recall (500 questions):**

-The raw 96.6% LongMemEval result is the baseline story: strong recall without requiring an API key or an LLM in the retrieval pipeline.
+| Mode | R@5 | LLM required |
+|---|---|---|
+| Raw (semantic search over verbatim text) | **96.6%** | None |
+| Hybrid v4, held-out 450q | **98.4%** | None |
+
+The raw 96.6% reproduces on any machine with the committed dataset: result JSONLs, the `seed=42` train/held-out split, and the `--mode raw` / `--held-out` runners are all in the `benchmarks/` directory of the repo.
+
+We deliberately do not publish a side-by-side comparison against other memory systems on this page. Retrieval recall (R@5) and end-to-end QA accuracy are different metrics and are not comparable; where MemPalace can be fairly compared on the same metric, we link to the other project's published source.

 <div style="text-align: center; padding-top: 16px;">
-  <a href="./reference/benchmarks" style="color: var(--vp-c-brand-1); font-weight: 500;">Full benchmark results →</a>
+  <a href="./reference/benchmarks" style="color: var(--vp-c-brand-1); font-weight: 500;">Full benchmark methodology →</a>
 </div>

 </div>