MemPalace: palace architecture, AAAK compression, knowledge graph
The memory system:

- Palace structure: Wings (people/projects) → Rooms (topics) → Closets (AAAK compressed) → Drawers (verbatim transcripts)
- Halls connect related rooms within a wing
- Tunnels cross-reference rooms across wings
- AAAK: 30x lossless compression dialect for AI agents
- Knowledge graph: temporal entity-relationship triples (SQLite)
- Palace graph: room-based navigation with tunnel detection
- MCP server: 19 tools (search, graph traversal, agent diary, AAAK auto-teach)
- Onboarding: guided setup generates wing config + AAAK entity registry
- Contradiction detection: catches wrong pronouns, names, ages
- Auto-save hooks for Claude Code

96.6% Recall@5 on LongMemEval, the highest zero-API score published. 100% with optional Haiku rerank (500/500).

Local. Free. No API key required.
@@ -0,0 +1,20 @@
|
|||||||
|
---
|
||||||
|
name: Bug Report
|
||||||
|
about: Something isn't working
|
||||||
|
labels: bug
|
||||||
|
---
|
||||||
|
|
||||||
|
**What happened?**
|
||||||
|
|
||||||
|
**What did you expect?**
|
||||||
|
|
||||||
|
**How to reproduce:**
|
||||||
|
|
||||||
|
1.
|
||||||
|
2.
|
||||||
|
3.
|
||||||
|
|
||||||
|
**Environment:**
|
||||||
|
- OS:
|
||||||
|
- Python version:
|
||||||
|
- MemPal version: (check `python mempal.py --version` or git SHA)
|
||||||
@@ -0,0 +1,11 @@
|
|||||||
|
---
|
||||||
|
name: Feature Request
|
||||||
|
about: Suggest an improvement
|
||||||
|
labels: enhancement
|
||||||
|
---
|
||||||
|
|
||||||
|
**What problem does this solve?**
|
||||||
|
|
||||||
|
**What's the proposed solution?**
|
||||||
|
|
||||||
|
**Alternatives considered:**
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
## What does this PR do?
|
||||||
|
|
||||||
|
## How to test
|
||||||
|
|
||||||
|
## Checklist
|
||||||
|
- [ ] Tests pass (`python -m pytest tests/ -v`)
|
||||||
|
- [ ] No hardcoded paths
|
||||||
|
- [ ] Linter passes (`ruff check .`)
|
||||||
@@ -0,0 +1,32 @@
|
|||||||
|
name: Tests
|
||||||
|
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
branches: [main]
|
||||||
|
pull_request:
|
||||||
|
branches: [main]
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
test:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
strategy:
|
||||||
|
matrix:
|
||||||
|
python-version: ["3.9", "3.11", "3.13"]
|
||||||
|
steps:
|
||||||
|
- uses: actions/checkout@v4
|
||||||
|
- uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: ${{ matrix.python-version }}
|
||||||
|
- run: pip install -r requirements.txt pytest
|
||||||
|
- run: python -m pytest tests/ -v
|
||||||
|
|
||||||
|
lint:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- uses: actions/checkout@v4
|
||||||
|
- uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: "3.11"
|
||||||
|
- run: pip install ruff
|
||||||
|
- run: ruff check .
|
||||||
|
- run: ruff format --check .
|
||||||
@@ -0,0 +1,7 @@
|
|||||||
|
*.egg-info/
|
||||||
|
dist/
|
||||||
|
build/
|
||||||
|
__pycache__/
|
||||||
|
*.pyc
|
||||||
|
.pytest_cache/
|
||||||
|
mempal.yaml
|
||||||
@@ -0,0 +1,7 @@
|
|||||||
|
repos:
|
||||||
|
- repo: https://github.com/astral-sh/ruff-pre-commit
|
||||||
|
rev: v0.9.0
|
||||||
|
hooks:
|
||||||
|
- id: ruff
|
||||||
|
args: [--fix]
|
||||||
|
- id: ruff-format
|
||||||
@@ -0,0 +1,92 @@
|
|||||||
|
# Contributing to MemPalace
|
||||||
|
|
||||||
|
Thanks for wanting to help. MemPalace is open source and we welcome contributions of all sizes — from typo fixes to new features.
|
||||||
|
|
||||||
|
## Getting Started
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/milla-jovovich/mempalace.git
|
||||||
|
cd mempalace
|
||||||
|
pip install -e ".[dev]" # installs with dev dependencies (pytest, build, twine)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Running Tests
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pytest tests/ -v
|
||||||
|
```
|
||||||
|
|
||||||
|
All tests must pass before submitting a PR. Tests should run without API keys or network access.
|
||||||
|
|
||||||
|
## Running Benchmarks
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Quick test (20 questions, ~30 seconds)
|
||||||
|
python benchmarks/longmemeval_bench.py /path/to/longmemeval_s_cleaned.json --limit 20
|
||||||
|
|
||||||
|
# Full benchmark (500 questions, ~5 minutes)
|
||||||
|
python benchmarks/longmemeval_bench.py /path/to/longmemeval_s_cleaned.json
|
||||||
|
```
|
||||||
|
|
||||||
|
See [benchmarks/README.md](benchmarks/README.md) for data download instructions and reproduction guide.
|
||||||
|
|
||||||
|
## Project Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
mempalace/ ← core package (see mempalace/README.md for module guide)
|
||||||
|
benchmarks/ ← reproducible benchmark runners
|
||||||
|
hooks/ ← Claude Code auto-save hooks
|
||||||
|
examples/ ← usage examples
|
||||||
|
tests/ ← test suite
|
||||||
|
assets/ ← logo + brand
|
||||||
|
```
|
||||||
|
|
||||||
|
## PR Guidelines
|
||||||
|
|
||||||
|
1. Fork the repo and create a feature branch: `git checkout -b feat/my-thing`
|
||||||
|
2. Write your code
|
||||||
|
3. Add or update tests if applicable
|
||||||
|
4. Run `pytest tests/ -v` — everything must pass
|
||||||
|
5. Commit with a clear message following [conventional commits](https://www.conventionalcommits.org/):
|
||||||
|
- `feat: add Notion export format`
|
||||||
|
- `fix: handle empty transcript files`
|
||||||
|
- `docs: update MCP tool descriptions`
|
||||||
|
- `bench: add LoCoMo turn-level metrics`
|
||||||
|
6. Push to your fork and open a PR against `main`
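
The commit format in step 5 can be checked mechanically. The following is a minimal sketch, assuming only the four prefixes shown in the examples above are allowed; the `is_conventional` helper and the `ALLOWED_TYPES` list are illustrative and not part of MemPalace, and the Conventional Commits spec itself permits other types.

```python
import re

# Assumed prefix list: the four types that appear in the examples above.
ALLOWED_TYPES = ("feat", "fix", "docs", "bench")

# type, optional "(scope)", optional "!", then ": " and a non-empty description
_PATTERN = re.compile(r"^(?P<type>[a-z]+)(\([\w./-]+\))?!?: \S.*$")


def is_conventional(subject: str) -> bool:
    """Return True if the subject line looks like `type: description`."""
    match = _PATTERN.match(subject)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

A check like this could run in a local commit hook before pushing.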
|
||||||
|
|
||||||
|
## Code Style
|
||||||
|
|
||||||
|
- **Formatting**: [Ruff](https://docs.astral.sh/ruff/) with 100-char line limit (configured in `pyproject.toml`)
|
||||||
|
- **Naming**: `snake_case` for functions/variables, `PascalCase` for classes
|
||||||
|
- **Docstrings**: on all modules and public functions
|
||||||
|
- **Type hints**: where they improve readability
|
||||||
|
- **Dependencies**: minimize. ChromaDB + PyYAML only. Don't add new deps without discussion.
|
||||||
|
|
||||||
|
## Good First Issues
|
||||||
|
|
||||||
|
Check the [Issues](https://github.com/milla-jovovich/mempalace/issues) tab. Great starting points:
|
||||||
|
|
||||||
|
- **New chat formats**: Add import support for Cursor, Copilot, or other AI tool exports
|
||||||
|
- **Room detection**: Improve pattern matching in `room_detector_local.py`
|
||||||
|
- **Tests**: Increase coverage — especially for `knowledge_graph.py` and `palace_graph.py`
|
||||||
|
- **Entity detection**: Better name disambiguation in `entity_detector.py`
|
||||||
|
- **Docs**: Improve examples, add tutorials
|
||||||
|
|
||||||
|
## Architecture Decisions
|
||||||
|
|
||||||
|
If you're planning a significant change, open an issue first to discuss the approach. Key principles:
|
||||||
|
|
||||||
|
- **Verbatim first**: Never summarize user content. Store exact words.
|
||||||
|
- **Local first**: Everything runs on the user's machine. No cloud dependencies.
|
||||||
|
- **Zero API by default**: Core features must work without any API key.
|
||||||
|
- **Palace structure matters**: Wings, halls, and rooms aren't cosmetic — they drive a 34% retrieval improvement. Respect the hierarchy.
|
||||||
|
|
||||||
|
## Community
|
||||||
|
|
||||||
|
- **Discord**: [Join us](https://discord.com/invite/ycTQQCu6kn)
|
||||||
|
- **Issues**: Bug reports and feature requests welcome
|
||||||
|
- **Discussions**: For questions and ideas
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
MIT — your contributions will be released under the same license.
|
||||||
@@ -0,0 +1,21 @@
|
|||||||
|
MIT License
|
||||||
|
|
||||||
|
Copyright (c) 2026 MemPalace Contributors
|
||||||
|
|
||||||
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
|
of this software and associated documentation files (the "Software"), to deal
|
||||||
|
in the Software without restriction, including without limitation the rights
|
||||||
|
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||||
|
copies of the Software, and to permit persons to whom the Software is
|
||||||
|
furnished to do so, subject to the following conditions:
|
||||||
|
|
||||||
|
The above copyright notice and this permission notice shall be included in all
|
||||||
|
copies or substantial portions of the Software.
|
||||||
|
|
||||||
|
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||||
|
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||||
|
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||||
|
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||||
|
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||||
|
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||||
|
SOFTWARE.
|
||||||
@@ -0,0 +1,584 @@
|
|||||||
|
<div align="center">
|
||||||
|
|
||||||
|
<img src="assets/mempalace_logo.png" alt="MemPalace" width="280">
|
||||||
|
|
||||||
|
# MemPalace
|
||||||
|
|
||||||
|
### The highest-scoring AI memory system ever benchmarked. And it's free.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
Every conversation you have with an AI — every decision, every debugging session, every architecture debate — disappears when the session ends. Six months of work, gone. You start over every time.
|
||||||
|
|
||||||
|
Other memory systems try to fix this by letting AI decide what's worth remembering. It extracts "user prefers Postgres" and throws away the conversation where you explained *why*. MemPalace takes a different approach: **store everything, then make it findable.**
|
||||||
|
|
||||||
|
**The Palace** — Ancient Greek orators memorized entire speeches by placing ideas in rooms of an imaginary building. Walk through the building, find the idea. MemPalace applies the same principle to AI memory: your conversations are organized into wings (people and projects), halls (types of memory), and rooms (specific ideas). No AI decides what matters — you keep every word, and the structure makes it searchable. That structure alone improves retrieval by 34%.
|
||||||
|
|
||||||
|
**AAAK** — To make all that data usable, MemPalace compresses it with AAAK — a lossless shorthand dialect designed for AI agents. Not meant to be read by humans — meant to be read by your AI, fast. 30x compression, zero information loss. Your AI loads months of context in ~120 tokens. Nothing else like it exists.
|
||||||
|
|
||||||
|
**Local, open, adaptable** — MemPalace runs entirely on your machine, on any data you have locally, without using any external API or services. It has been tested on conversations — but it can be adapted for different types of datastores. This is why we're open-sourcing it.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
[![][version-shield]][release-link]
|
||||||
|
[![][python-shield]][python-link]
|
||||||
|
[![][license-shield]][license-link]
|
||||||
|
[![][discord-shield]][discord-link]
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
[Quick Start](#quick-start) · [The Palace](#the-palace) · [AAAK Dialect](#aaak-compression) · [Benchmarks](#benchmarks) · [MCP Tools](#mcp-server)
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
### Highest LongMemEval score ever published — free or paid.
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<td align="center"><strong>96.6%</strong><br><sub>LongMemEval R@5<br>Zero API calls</sub></td>
|
||||||
|
<td align="center"><strong>100%</strong><br><sub>LongMemEval R@5<br>with Haiku rerank</sub></td>
|
||||||
|
<td align="center"><strong>+34%</strong><br><sub>Retrieval boost<br>from palace structure</sub></td>
|
||||||
|
<td align="center"><strong>$0</strong><br><sub>No subscription<br>No cloud. Local only.</sub></td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<sub>Reproducible — runners in <a href="benchmarks/">benchmarks/</a>. <a href="benchmarks/BENCHMARKS.md">Full results</a>.</sub>
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install mempalace
|
||||||
|
|
||||||
|
# Set up your world — who you work with, what your projects are
|
||||||
|
mempalace init ~/projects/myapp
|
||||||
|
|
||||||
|
# Mine your data
|
||||||
|
mempalace mine ~/projects/myapp # projects — code, docs, notes
|
||||||
|
mempalace mine ~/chats/ --mode convos # convos — Claude, ChatGPT, Slack exports
|
||||||
|
mempalace mine ~/chats/ --mode convos --extract general # general — classifies into decisions, milestones, problems
|
||||||
|
|
||||||
|
# Search anything you've ever discussed
|
||||||
|
mempalace search "why did we switch to GraphQL"
|
||||||
|
|
||||||
|
# Your AI remembers
|
||||||
|
mempalace status
|
||||||
|
```
|
||||||
|
|
||||||
|
Three mining modes: **projects** (code and docs), **convos** (conversation exports), and **general** (auto-classifies into decisions, preferences, milestones, problems, and emotional context). Everything stays on your machine.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Problem
|
||||||
|
|
||||||
|
Decisions happen in conversations now. Not in docs. Not in Jira. In conversations with Claude, ChatGPT, Copilot. The reasoning, the tradeoffs, the "we tried X and it failed because Y" — all trapped in chat windows that evaporate when the session ends.
|
||||||
|
|
||||||
|
**Six months of daily AI use = 19.5 million tokens.** That's every decision, every debugging session, every architecture debate. Gone.
|
||||||
|
|
||||||
|
| Approach | Tokens loaded | Annual cost |
|----------|--------------|-------------|
| Paste everything | 19.5M (doesn't fit any context window) | Impossible |
| LLM summaries | ~650K | ~$507/yr |
| **MemPalace wake-up** | **~170 tokens** | **~$0.70/yr** |
| **MemPalace + 5 searches** | **~13,500 tokens** | **~$10/yr** |
|
||||||
|
|
||||||
|
MemPalace loads 170 tokens of critical facts on wake-up — your team, your projects, your preferences. Then searches only when needed. $10/year to remember everything vs $507/year for summaries that lose context.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## How It Works
|
||||||
|
|
||||||
|
### The Palace
|
||||||
|
|
||||||
|
```
┌────────────────────────────────────────────────────────────┐
│  WING: Person                                              │
│                                                            │
│    ┌──────────┐  ──hall──  ┌──────────┐                    │
│    │  Room A  │            │  Room B  │                    │
│    └────┬─────┘            └──────────┘                    │
│         │                                                  │
│         ▼                                                  │
│    ┌──────────┐      ┌──────────┐                          │
│    │  Closet  │ ───▶ │  Drawer  │                          │
│    └──────────┘      └──────────┘                          │
└─────────┼──────────────────────────────────────────────────┘
          │
        tunnel
          │
┌─────────┼──────────────────────────────────────────────────┐
│  WING: Project                                             │
│         │                                                  │
│    ┌────┴─────┐  ──hall──  ┌──────────┐                    │
│    │  Room A  │            │  Room C  │                    │
│    └────┬─────┘            └──────────┘                    │
│         │                                                  │
│         ▼                                                  │
│    ┌──────────┐      ┌──────────┐                          │
│    │  Closet  │ ───▶ │  Drawer  │                          │
│    └──────────┘      └──────────┘                          │
└────────────────────────────────────────────────────────────┘
```
|
||||||
|
|
||||||
|
**Wings** — a person or project. As many as you need.
|
||||||
|
**Rooms** — specific topics within a wing. Auth, billing, deploy — endless rooms.
|
||||||
|
**Halls** — connections between related rooms *within* the same wing. If Room A (auth) and Room B (security) are related, a hall links them.
|
||||||
|
**Tunnels** — connections *between* wings. When Person A and a Project both have a room about "auth," a tunnel cross-references them automatically.
|
||||||
|
**Closets** — compressed memories stored in AAAK. Fast for AI to read.
|
||||||
|
**Drawers** — the original verbatim transcripts. The exact words, never summarized.
|
||||||
|
|
||||||
|
**Halls** are memory types — the same in every wing, acting as corridors:
|
||||||
|
- `hall_facts` — decisions made, choices locked in
|
||||||
|
- `hall_events` — sessions, milestones, debugging
|
||||||
|
- `hall_discoveries` — breakthroughs, new insights
|
||||||
|
- `hall_preferences` — habits, likes, opinions
|
||||||
|
- `hall_advice` — recommendations and solutions
|
||||||
|
|
||||||
|
**Rooms** are named ideas — `auth-migration`, `graphql-switch`, `ci-pipeline`. When the same room appears in different wings, it creates a **tunnel** — connecting the same topic across domains:
|
||||||
|
|
||||||
|
```
|
||||||
|
wing_kai / hall_events / auth-migration → "Kai debugged the OAuth token refresh"
|
||||||
|
wing_driftwood / hall_facts / auth-migration → "team decided to migrate auth to Clerk"
|
||||||
|
wing_priya / hall_advice / auth-migration → "Priya approved Clerk over Auth0"
|
||||||
|
```
|
||||||
|
|
||||||
|
Same room. Three wings. The tunnel connects them.
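
Tunnel detection as described above amounts to grouping placements by room name and keeping the rooms that appear in more than one wing. A minimal sketch, assuming placements are available as `(wing, hall, room)` tuples; the `find_tunnels` helper and this data shape are hypothetical and are not taken from `palace_graph.py`:

```python
from collections import defaultdict

# Hypothetical (wing, hall, room) placements, based on the example above.
placements = [
    ("wing_kai", "hall_events", "auth-migration"),
    ("wing_driftwood", "hall_facts", "auth-migration"),
    ("wing_priya", "hall_advice", "auth-migration"),
    ("wing_kai", "hall_events", "ci-pipeline"),
]


def find_tunnels(placements):
    """Map each room to the sorted wings it appears in, keeping only
    rooms that span two or more wings."""
    wings_by_room = defaultdict(set)
    for wing, _hall, room in placements:
        wings_by_room[room].add(wing)
    return {
        room: sorted(wings)
        for room, wings in wings_by_room.items()
        if len(wings) >= 2
    }


tunnels = find_tunnels(placements)
```

Here `ci-pipeline` lives in a single wing, so only `auth-migration` becomes a tunnel.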
|
||||||
|
|
||||||
|
### Why Structure Matters
|
||||||
|
|
||||||
|
Tested on 22,000+ real conversation memories:
|
||||||
|
|
||||||
|
```
Search all closets:  60.9%  R@10
Search within wing:  73.1%  (+12 points)
Search wing + hall:  84.8%  (+24 points)
Search wing + room:  94.8%  (+34 points)
```
|
||||||
|
|
||||||
|
Wings and rooms aren't cosmetic. They're a **34-point retrieval improvement** (60.9% → 94.8% R@10). The palace structure is the product.
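
The gains quoted above are absolute differences in R@10 against the unscoped baseline. A small sketch of that arithmetic, using only the four figures above (the `gains` helper is illustrative):

```python
# R@10 figures from the results above.
baseline = 60.9
scoped = {"wing": 73.1, "wing + hall": 84.8, "wing + room": 94.8}


def gains(baseline, scoped):
    """Return {scope: (absolute gain in points, gain relative to baseline in %)}."""
    return {
        scope: (
            round(recall - baseline, 1),
            round(100 * (recall - baseline) / baseline),
        )
        for scope, recall in scoped.items()
    }


result = gains(baseline, scoped)
for scope, (points, relative) in result.items():
    print(f"{scope}: +{points} points ({relative}% relative to baseline)")
```

Measured relative to the 60.9% baseline, the same wing + room gain is about 56%.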
|
||||||
|
|
||||||
|
### The Memory Stack
|
||||||
|
|
||||||
|
| Layer | What | Size | When |
|-------|------|------|------|
| **L0** | Identity: who is this AI? | ~50 tokens | Always loaded |
| **L1** | Critical facts: team, projects, preferences | ~120 tokens (AAAK) | Always loaded |
| **L2** | Room recall: recent sessions, current project | On demand | When topic comes up |
| **L3** | Deep search: semantic query across all closets | On demand | When explicitly asked |
|
||||||
|
|
||||||
|
Your AI wakes up with L0 + L1 (~170 tokens) and knows your world. Searches only fire when needed.
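
The wake-up budget follows directly from the table: only the always-loaded layers count. A minimal sketch, assuming the approximate sizes listed above; the `Layer` class and `wake_up_tokens` helper are hypothetical and do not mirror `layers.py`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str
    approx_tokens: Optional[int]  # None means loaded on demand, size varies
    always_loaded: bool


# Sizes are the approximate figures from the table above.
STACK = [
    Layer("L0 identity", 50, True),
    Layer("L1 critical facts", 120, True),
    Layer("L2 room recall", None, False),
    Layer("L3 deep search", None, False),
]


def wake_up_tokens(stack):
    """Sum the token cost of the layers that load on every session."""
    return sum(layer.approx_tokens for layer in stack if layer.always_loaded)
```

With these figures `wake_up_tokens(STACK)` is 170, the L0 + L1 total quoted above.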
|
||||||
|
|
||||||
|
### AAAK Compression
|
||||||
|
|
||||||
|
AAAK is a lossless dialect — 30x compression, readable by any LLM without a decoder.
|
||||||
|
|
||||||
|
**English (~1000 tokens):**
|
||||||
|
```
|
||||||
|
Priya manages the Driftwood team: Kai (backend, 3 years), Soren (frontend),
|
||||||
|
Maya (infrastructure), and Leo (junior, started last month). They're building
|
||||||
|
a SaaS analytics platform. Current sprint: auth migration to Clerk.
|
||||||
|
Kai recommended Clerk over Auth0 based on pricing and DX.
|
||||||
|
```
|
||||||
|
|
||||||
|
**AAAK (~120 tokens):**
|
||||||
|
```
|
||||||
|
TEAM: PRI(lead) | KAI(backend,3yr) SOR(frontend) MAY(infra) LEO(junior,new)
|
||||||
|
PROJ: DRIFTWOOD(saas.analytics) | SPRINT: auth.migration→clerk
|
||||||
|
DECISION: KAI.rec:clerk>auth0(pricing+dx) | ★★★★
|
||||||
|
```
|
||||||
|
|
||||||
|
Same information in roughly 8x fewer tokens (~1000 → ~120 in this example). Your AI learns AAAK automatically from the MCP server, with no manual setup.
|
||||||
|
|
||||||
|
### Contradiction Detection
|
||||||
|
|
||||||
|
MemPalace catches mistakes before they reach you:
|
||||||
|
|
||||||
|
```
|
||||||
|
Input: "Soren finished the auth migration"
|
||||||
|
Output: 🔴 AUTH-MIGRATION: attribution conflict — Maya was assigned, not Soren
|
||||||
|
|
||||||
|
Input: "Kai has been here 2 years"
|
||||||
|
Output: 🟡 KAI: wrong_tenure — records show 3 years (started 2023-04)
|
||||||
|
|
||||||
|
Input: "The sprint ends Friday"
|
||||||
|
Output: 🟡 SPRINT: stale_date — current sprint ends Thursday (updated 2 days ago)
|
||||||
|
```
|
||||||
|
|
||||||
|
Facts checked against the knowledge graph. Ages, dates, and tenures calculated dynamically — not hardcoded.
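
Computing a tenure dynamically means storing the start date and deriving the number of years at check time. A minimal sketch of the `wrong_tenure` case above, assuming a start date of 2023-04-01 and a check date of 2026-04-04; both dates and both helper names are illustrative, not MemPalace's implementation:

```python
from datetime import date


def whole_years(start: date, today: date) -> int:
    """Completed years between start and today."""
    years = today.year - start.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (start.month, start.day):
        years -= 1
    return years


def check_tenure(claimed_years: int, start: date, today: date):
    """Return a conflict message if the claim disagrees with the record, else None."""
    actual = whole_years(start, today)
    if claimed_years != actual:
        return f"wrong_tenure: records show {actual} years (started {start:%Y-%m})"
    return None


conflict = check_tenure(2, start=date(2023, 4, 1), today=date(2026, 4, 4))
```

Because the stored fact is the start date, the same record yields a different tenure a year later without any update.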
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Real-World Examples
|
||||||
|
|
||||||
|
### Solo developer across multiple projects
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Mine each project's conversations
|
||||||
|
mempalace mine ~/chats/orion/ --mode convos --wing orion
|
||||||
|
mempalace mine ~/chats/nova/ --mode convos --wing nova
|
||||||
|
mempalace mine ~/chats/helios/ --mode convos --wing helios
|
||||||
|
|
||||||
|
# Six months later: "why did I use Postgres here?"
|
||||||
|
mempalace search "database decision" --wing orion
|
||||||
|
# → "Chose Postgres over SQLite because Orion needs concurrent writes
|
||||||
|
# and the dataset will exceed 10GB. Decided 2025-11-03."
|
||||||
|
|
||||||
|
# Cross-project search
|
||||||
|
mempalace search "rate limiting approach"
|
||||||
|
# → finds your approach in Orion AND Nova, shows the differences
|
||||||
|
```
|
||||||
|
|
||||||
|
### Team lead managing a product
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Mine Slack exports and AI conversations
|
||||||
|
mempalace mine ~/exports/slack/ --mode convos --wing driftwood
|
||||||
|
mempalace mine ~/.claude/projects/ --mode convos
|
||||||
|
|
||||||
|
# "What did Soren work on last sprint?"
|
||||||
|
mempalace search "Soren sprint" --wing driftwood
|
||||||
|
# → 14 closets: OAuth refactor, dark mode, component library migration
|
||||||
|
|
||||||
|
# "Who decided to use Clerk?"
|
||||||
|
mempalace search "Clerk decision" --wing driftwood
|
||||||
|
# → "Kai recommended Clerk over Auth0 — pricing + developer experience.
|
||||||
|
# Team agreed 2026-01-15. Maya handling the migration."
|
||||||
|
```
|
||||||
|
|
||||||
|
### Before mining: split mega-files
|
||||||
|
|
||||||
|
Some transcript exports concatenate multiple sessions into one huge file:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
mempalace split ~/chats/ # split into per-session files
|
||||||
|
mempalace split ~/chats/ --dry-run # preview first
|
||||||
|
mempalace split ~/chats/ --min-sessions 3 # only split files with 3+ sessions
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Knowledge Graph
|
||||||
|
|
||||||
|
Temporal entity-relationship triples — like Zep's Graphiti, but SQLite instead of Neo4j. Local and free.
|
||||||
|
|
||||||
|
```python
from mempalace.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph()
kg.add_triple("Kai", "works_on", "Orion", valid_from="2025-06-01")
kg.add_triple("Maya", "assigned_to", "auth-migration", valid_from="2026-01-15")
kg.add_triple("Maya", "completed", "auth-migration", valid_from="2026-02-01")

# What's Kai working on?
kg.query_entity("Kai")
# → [Kai → works_on → Orion (current), Kai → recommended → Clerk (2026-01)]

# What was true in January?
kg.query_entity("Maya", as_of="2026-01-20")
# → [Maya → assigned_to → auth-migration (active)]

# Timeline
kg.timeline("Orion")
# → chronological story of the project
```
|
||||||
|
|
||||||
|
Facts have validity windows. When something stops being true, invalidate it:
|
||||||
|
|
||||||
|
```python
kg.invalidate("Kai", "works_on", "Orion", ended="2026-03-01")
```
|
||||||
|
|
||||||
|
Now queries for Kai's current work won't return Orion. Historical queries still will.
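
Validity windows map naturally onto two date columns in SQLite. The sketch below shows one way this could work, using only the standard-library `sqlite3` module; the table layout and the `add_triple`, `invalidate`, and `facts_as_of` functions are assumptions for illustration and are not the schema or API of `knowledge_graph.py`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates are ISO strings (YYYY-MM-DD), so string comparison orders them correctly.
# A NULL valid_to means the fact is still true.
conn.execute(
    "CREATE TABLE triples ("
    "subject TEXT, predicate TEXT, object TEXT, valid_from TEXT, valid_to TEXT)"
)


def add_triple(s, p, o, valid_from):
    conn.execute("INSERT INTO triples VALUES (?, ?, ?, ?, NULL)", (s, p, o, valid_from))


def invalidate(s, p, o, ended):
    """Close the validity window of a currently open fact."""
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, s, p, o),
    )


def facts_as_of(subject, as_of):
    """Facts about `subject` whose validity window contains `as_of`."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY valid_from",
        (subject, as_of, as_of),
    )
    return rows.fetchall()


add_triple("Kai", "works_on", "Orion", "2025-06-01")
invalidate("Kai", "works_on", "Orion", "2026-03-01")
```

Invalidation closes the window rather than deleting the row, which is what keeps historical queries answerable.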
|
||||||
|
|
||||||
|
| Feature | MemPalace | Zep (Graphiti) |
|---------|-----------|----------------|
| Storage | SQLite (local) | Neo4j (cloud) |
| Cost | Free | $25/mo+ |
| Temporal validity | Yes | Yes |
| Self-hosted | Always | Enterprise only |
| Privacy | Everything local | SOC 2, HIPAA |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent Diary
|
||||||
|
|
||||||
|
Every AI agent gets a personal journal — written in AAAK, persists across sessions.
|
||||||
|
|
||||||
|
```
|
||||||
|
mempalace_diary_write("Kai-assistant",
|
||||||
|
"SESSION:2026-04-04|debugged.orion.timeout|root.cause:connection.pool.exhaustion|fix:pgbouncer|★★★")
|
||||||
|
|
||||||
|
mempalace_diary_read("Kai-assistant", last_n=5)
|
||||||
|
# → last 5 diary entries from this agent, compressed in AAAK
|
||||||
|
```
|
||||||
|
|
||||||
|
Not a shared scratchpad — a personal journal with history. Each agent records what it worked on, what it learned, what matters. The next session reads the diary and picks up where it left off.
|
||||||
|
|
||||||
|
Letta charges $20–200/mo for agent-managed memory. MemPalace does it with a wing.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## MCP Server
|
||||||
|
|
||||||
|
```bash
|
||||||
|
claude mcp add mempalace -- python -m mempalace.mcp_server
|
||||||
|
```
|
||||||
|
|
||||||
|
### 19 Tools
|
||||||
|
|
||||||
|
**Palace (read)**
|
||||||
|
|
||||||
|
| Tool | What |
|------|------|
| `mempalace_status` | Palace overview + AAAK spec + memory protocol |
| `mempalace_list_wings` | Wings with counts |
| `mempalace_list_rooms` | Rooms within a wing |
| `mempalace_get_taxonomy` | Full wing → room → count tree |
| `mempalace_search` | Semantic search with wing/room filters |
| `mempalace_check_duplicate` | Check before filing |
| `mempalace_get_aaak_spec` | AAAK dialect reference |
|
||||||
|
|
||||||
|
**Palace (write)**
|
||||||
|
|
||||||
|
| Tool | What |
|------|------|
| `mempalace_add_drawer` | File verbatim content |
| `mempalace_delete_drawer` | Remove by ID |
|
||||||
|
|
||||||
|
**Knowledge Graph**
|
||||||
|
|
||||||
|
| Tool | What |
|------|------|
| `mempalace_kg_query` | Entity relationships with time filtering |
| `mempalace_kg_add` | Add facts |
| `mempalace_kg_invalidate` | Mark facts as ended |
| `mempalace_kg_timeline` | Chronological entity story |
| `mempalace_kg_stats` | Graph overview |
|
||||||
|
|
||||||
|
**Navigation**
|
||||||
|
|
||||||
|
| Tool | What |
|------|------|
| `mempalace_traverse` | Walk the graph from a room across wings |
| `mempalace_find_tunnels` | Find rooms bridging two wings |
| `mempalace_graph_stats` | Graph connectivity overview |
|
||||||
|
|
||||||
|
**Agent Diary**
|
||||||
|
|
||||||
|
| Tool | What |
|------|------|
| `mempalace_diary_write` | Write AAAK diary entry |
| `mempalace_diary_read` | Read recent diary entries |
|
||||||
|
|
||||||
|
The AI learns AAAK and the memory protocol automatically from the `mempalace_status` response. No manual configuration.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Auto-Save Hooks
|
||||||
|
|
||||||
|
Two hooks for Claude Code that automatically save memories during work:
|
||||||
|
|
||||||
|
**Save Hook** — every 15 messages, triggers a structured save. Topics, decisions, quotes, code changes. Also regenerates the critical facts layer.
|
||||||
|
|
||||||
|
**PreCompact Hook** — fires before context compression. Emergency save before the window shrinks.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"hooks": {
|
||||||
|
"Stop": [{"matcher": "", "hooks": [{"type": "command", "command": "/path/to/mempalace/hooks/mempal_save_hook.sh"}]}],
|
||||||
|
"PreCompact": [{"matcher": "", "hooks": [{"type": "command", "command": "/path/to/mempalace/hooks/mempal_precompact_hook.sh"}]}]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Benchmarks
|
||||||
|
|
||||||
|
Tested on standard academic benchmarks — reproducible, published datasets.
|
||||||
|
|
||||||
|
| Benchmark | Mode | Score | API Calls |
|-----------|------|-------|-----------|
| **LongMemEval R@5** | Raw (ChromaDB only) | **96.6%** | Zero |
| **LongMemEval R@5** | Hybrid + Haiku rerank | **100%** (500/500) | ~500 |
| **LoCoMo R@10** | Raw, session level | **60.3%** | Zero |
| **Personal palace R@10** | Heuristic bench | **85%** | Zero |
| **Palace structure impact** | Wing+room filtering | **+34%** R@10 | Zero |
|
||||||
|
|
||||||
|
The 96.6% raw score is the highest published LongMemEval result requiring no API key, no cloud, and no LLM at any stage.
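
For readers unfamiliar with the metric, the sketch below shows one common per-question definition of Recall@k: a question counts as a hit if any relevant item appears in the top k results. This is an assumption about the metric in general; the runners in `benchmarks/` may define it differently, and both function names are illustrative.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """1.0 if any relevant id appears in the top k results, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


def mean_recall_at_k(runs, k=5):
    """Average over (ranked_ids, relevant_ids) pairs, one pair per question."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Two toy questions: the first is answered within the top 5, the second is not.
runs = [
    (["a", "b", "c"], ["c"]),
    (["x", "y", "z", "w", "v", "u"], ["u"]),
]
score = mean_recall_at_k(runs, k=5)
```

Under this definition, 96.6% on a 500-question set would correspond to 483 hits.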
|
||||||
|
|
||||||
|
### vs Published Systems
|
||||||
|
|
||||||
|
| System | LongMemEval R@5 | API Required | Cost |
|--------|----------------|--------------|------|
| **MemPalace (hybrid)** | **100%** | Optional | Free |
| Supermemory ASMR | ~99% | Yes | n/a |
| **MemPalace (raw)** | **96.6%** | **None** | **Free** |
| Mastra | 94.87% | Yes (GPT) | API costs |
| Mem0 | ~85% | Yes | $19–249/mo |
| Zep | ~85% | Yes | $25/mo+ |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## All Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Setup
|
||||||
|
mempalace init <dir> # guided onboarding + AAAK bootstrap
|
||||||
|
|
||||||
|
# Mining
|
||||||
|
mempalace mine <dir> # mine project files
|
||||||
|
mempalace mine <dir> --mode convos # mine conversation exports
|
||||||
|
mempalace mine <dir> --mode convos --wing myapp # tag with a wing name
|
||||||
|
|
||||||
|
# Splitting
|
||||||
|
mempalace split <dir> # split concatenated transcripts
|
||||||
|
mempalace split <dir> --dry-run # preview
|
||||||
|
|
||||||
|
# Search
|
||||||
|
mempalace search "query" # search everything
|
||||||
|
mempalace search "query" --wing myapp # within a wing
|
||||||
|
mempalace search "query" --room auth-migration # within a room
|
||||||
|
|
||||||
|
# Memory stack
|
||||||
|
mempalace wake-up # load L0 + L1 context
|
||||||
|
mempalace wake-up --wing driftwood # project-specific
|
||||||
|
|
||||||
|
# Compression
|
||||||
|
mempalace compress --wing myapp # AAAK compress
|
||||||
|
|
||||||
|
# Status
|
||||||
|
mempalace status # palace overview
|
||||||
|
```
|
||||||
|
|
||||||
|
All commands accept `--palace <path>` to override the default location.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
### Global (`~/.mempalace/config.json`)
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"palace_path": "/custom/path/to/palace",
|
||||||
|
"collection_name": "mempalace_drawers",
|
||||||
|
"people_map": {"Kai": "KAI", "Priya": "PRI"}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Wing config (`~/.mempalace/wing_config.json`)
|
||||||
|
|
||||||
|
Generated by `mempalace init`. Maps your people and projects to wings:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"default_wing": "wing_general",
|
||||||
|
"wings": {
|
||||||
|
"wing_kai": {"type": "person", "keywords": ["kai", "kai's"]},
|
||||||
|
"wing_driftwood": {"type": "project", "keywords": ["driftwood", "analytics", "saas"]}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
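
To illustrate how a keyword map like this could assign content to a wing, here is a minimal sketch that counts keyword hits per wing and falls back to `default_wing`. The `route_to_wing` function and its scoring rule are assumptions for illustration; the actual routing logic in MemPalace may differ.

```python
import re

# The wing config from the example above, as a Python dict.
WING_CONFIG = {
    "default_wing": "wing_general",
    "wings": {
        "wing_kai": {"type": "person", "keywords": ["kai", "kai's"]},
        "wing_driftwood": {"type": "project", "keywords": ["driftwood", "analytics", "saas"]},
    },
}


def route_to_wing(text, config=WING_CONFIG):
    """Pick the wing with the most keyword hits; fall back to the default wing."""
    # Whole-word tokens, so "kai" does not match inside a longer word.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    best_wing, best_hits = config["default_wing"], 0
    for wing, spec in config["wings"].items():
        hits = sum(tokens.count(keyword) for keyword in spec["keywords"])
        if hits > best_hits:
            best_wing, best_hits = wing, hits
    return best_wing
```

Matching on whole tokens rather than substrings is a deliberate choice here: it avoids false hits at the cost of missing inflected forms that are not listed as keywords.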
|
||||||
|
|
||||||
|
### Identity (`~/.mempalace/identity.txt`)
|
||||||
|
|
||||||
|
Plain text. Becomes Layer 0 — loaded every session.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File Reference
|
||||||
|
|
||||||
|
| File | What |
|------|------|
| `cli.py` | CLI entry point |
| `config.py` | Configuration loading and defaults |
| `normalize.py` | Converts 5 chat formats to standard transcript |
| `mcp_server.py` | MCP server: 19 tools, AAAK auto-teach, memory protocol |
| `miner.py` | Project file ingest |
| `convo_miner.py` | Conversation ingest, chunks by exchange pair |
| `searcher.py` | Semantic search via ChromaDB |
| `layers.py` | 4-layer memory stack |
| `dialect.py` | AAAK compression (30x lossless) |
| `knowledge_graph.py` | Temporal entity-relationship graph (SQLite) |
| `palace_graph.py` | Room-based navigation graph |
| `onboarding.py` | Guided setup: generates AAAK bootstrap + wing config |
| `entity_registry.py` | Entity code registry |
| `entity_detector.py` | Auto-detect people and projects from content |
| `split_mega_files.py` | Split concatenated transcripts into per-session files |
| `hooks/mempal_save_hook.sh` | Auto-save every N messages |
| `hooks/mempal_precompact_hook.sh` | Emergency save before compaction |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Project Structure

```
mempalace/
├── README.md                  ← you are here
├── mempalace/                 ← core package (README)
│   ├── cli.py                 ← CLI entry point
│   ├── mcp_server.py          ← MCP server (19 tools)
│   ├── knowledge_graph.py     ← temporal entity graph
│   ├── palace_graph.py        ← room navigation graph
│   ├── dialect.py             ← AAAK compression
│   ├── miner.py               ← project file ingest
│   ├── convo_miner.py         ← conversation ingest
│   ├── searcher.py            ← semantic search
│   ├── onboarding.py          ← guided setup
│   └── ...                    ← see mempalace/README.md
├── benchmarks/                ← reproducible benchmark runners
│   ├── README.md              ← reproduction guide
│   ├── BENCHMARKS.md          ← full results + methodology
│   ├── longmemeval_bench.py   ← LongMemEval runner
│   ├── locomo_bench.py        ← LoCoMo runner
│   └── membench_bench.py      ← MemBench runner
├── hooks/                     ← Claude Code auto-save hooks
│   ├── README.md              ← hook setup guide
│   ├── mempal_save_hook.sh    ← save every N messages
│   └── mempal_precompact_hook.sh ← emergency save
├── examples/                  ← usage examples
│   ├── basic_mining.py
│   ├── convo_import.py
│   └── mcp_setup.md
├── tests/                     ← test suite (README)
├── assets/                    ← logo + brand assets
└── pyproject.toml             ← package config (v3.0.0)
```

---

## Requirements

- Python 3.9+
- `chromadb>=0.4.0`
- `pyyaml>=6.0`

No API key. No internet after install. Everything local.

```bash
pip install mempalace
```

---

## Contributing

PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for setup and guidelines.

## License

MIT — see [LICENSE](LICENSE).

<!-- Link Definitions -->
[version-shield]: https://img.shields.io/badge/version-3.0.0-4dc9f6?style=flat-square&labelColor=0a0e14
[release-link]: https://github.com/milla-jovovich/mempalace/releases
[python-shield]: https://img.shields.io/badge/python-3.9+-7dd8f8?style=flat-square&labelColor=0a0e14&logo=python&logoColor=7dd8f8
[python-link]: https://www.python.org/
[license-shield]: https://img.shields.io/badge/license-MIT-b0e8ff?style=flat-square&labelColor=0a0e14
[license-link]: https://github.com/milla-jovovich/mempalace/blob/main/LICENSE
[discord-shield]: https://img.shields.io/badge/discord-join-5865F2?style=flat-square&labelColor=0a0e14&logo=discord&logoColor=5865F2
[discord-link]: https://discord.com/invite/ycTQQCu6kn
Binary file not shown.
Size: 680 KiB
@@ -0,0 +1,12 @@
#!/usr/bin/env python3
"""Example: mine a project folder into the palace."""

import sys

project_dir = sys.argv[1] if len(sys.argv) > 1 else "~/projects/my_app"
print("Step 1: Initialize rooms from folder structure")
print(f" mempalace init {project_dir}")
print("\nStep 2: Mine everything")
print(f" mempalace mine {project_dir}")
print("\nStep 3: Search")
print(" mempalace search 'why did we choose this approach'")
@@ -0,0 +1,11 @@
#!/usr/bin/env python3
"""Example: import Claude Code / ChatGPT conversations."""

print("Import Claude Code sessions:")
print(" mempalace mine ~/claude-sessions/ --mode convos --wing my_project")
print()
print("Import ChatGPT exports:")
print(" mempalace mine ~/chatgpt-exports/ --mode convos")
print()
print("Use general extractor for richer extraction:")
print(" mempalace mine ~/chats/ --mode convos --extract general")
@@ -0,0 +1,25 @@
# MCP Integration — Claude Code

## Setup

Run the MCP server:

```bash
python mcp_server.py
```

Or add to Claude Code:

```bash
claude mcp add mempal -- python /path/to/mempalace/mcp_server.py
```

## Available Tools

- **mempal_status** — palace stats (wings, rooms, drawer counts)
- **mempal_search** — semantic search across all memories
- **mempal_list_wings** — list all projects in the palace

## Usage in Claude Code

Once configured, Claude Code can search your memories directly during conversations.
+138
@@ -0,0 +1,138 @@
# MemPalace Hooks — Auto-Save for Terminal AI Tools

These hook scripts make MemPalace save automatically. No manual "save" commands needed.

## What They Do

| Hook | When It Fires | What Happens |
|------|--------------|-------------|
| **Save Hook** | Every 15 human messages | Blocks the AI, tells it to save key topics/decisions/quotes to the palace |
| **PreCompact Hook** | Right before context compaction | Emergency save — forces the AI to save EVERYTHING before losing context |

The AI does the actual filing — it knows the conversation context, so it classifies memories into the right wings/halls/closets. The hooks just tell it WHEN to save.

## Install — Claude Code

Add to `.claude/settings.local.json`:

```json
{
  "hooks": {
    "Stop": [{
      "matcher": "*",
      "hooks": [{
        "type": "command",
        "command": "/absolute/path/to/hooks/mempal_save_hook.sh",
        "timeout": 30
      }]
    }],
    "PreCompact": [{
      "hooks": [{
        "type": "command",
        "command": "/absolute/path/to/hooks/mempal_precompact_hook.sh",
        "timeout": 30
      }]
    }]
  }
}
```

Make them executable:
```bash
chmod +x hooks/mempal_save_hook.sh hooks/mempal_precompact_hook.sh
```

## Install — Codex CLI (OpenAI)

Add to `.codex/hooks.json`:

```json
{
  "Stop": [{
    "type": "command",
    "command": "/absolute/path/to/hooks/mempal_save_hook.sh",
    "timeout": 30
  }],
  "PreCompact": [{
    "type": "command",
    "command": "/absolute/path/to/hooks/mempal_precompact_hook.sh",
    "timeout": 30
  }]
}
```

## Configuration

Edit `mempal_save_hook.sh` to change:

- **`SAVE_INTERVAL=15`** — How many human messages between saves. Lower = more frequent saves, higher = less interruption.
- **`STATE_DIR`** — Where hook state is stored (defaults to `~/.mempalace/hook_state/`)
- **`MEMPAL_DIR`** — Optional. Set to a conversations directory to auto-run `mempalace mine <dir>` on each save trigger. Leave blank (default) to let the AI handle saving via the block reason message.

### mempalace CLI

The relevant commands are:

```bash
mempalace mine <dir>               # Mine all files in a directory
mempalace mine <dir> --mode convos # Mine conversation transcripts only
```

The hooks resolve the repo root automatically from their own path, so they work regardless of where you install the repo.

## How It Works (Technical)

### Save Hook (Stop event)

```
User sends message → AI responds → Claude Code fires Stop hook
                                            ↓
                                    Hook counts human messages in JSONL transcript
                                            ↓
              ┌─── < 15 since last save ──→ echo "{}" (let AI stop)
              │
              └─── ≥ 15 since last save ──→ {"decision": "block", "reason": "save..."}
                                                    ↓
                                            AI saves to palace
                                                    ↓
                                            AI tries to stop again
                                                    ↓
                                            stop_hook_active = true
                                                    ↓
                                            Hook sees flag → echo "{}" (let it through)
```

The `stop_hook_active` flag prevents infinite loops: block once → AI saves → tries to stop → flag is true → we let it through.
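The decision rule in the diagram can be written as a small pure function. This is an illustrative Python sketch, not the hook itself (the real hook is a bash script). `stop_hook_decision` and its parameters are hypothetical names, and the `reason` text is shortened; the two output shapes, `{}` and `{"decision": "block", "reason": ...}`, come from the flow above.

```python
import json

SAVE_INTERVAL = 15  # same default as the save hook


def stop_hook_decision(exchange_count, last_save, stop_hook_active,
                       interval=SAVE_INTERVAL):
    """Return (json_output, new_last_save) for one Stop event."""
    # Already in a save cycle: let the AI stop, so we never block twice in a row.
    if stop_hook_active:
        return "{}", last_save

    since_last = exchange_count - last_save
    if exchange_count > 0 and since_last >= interval:
        response = {"decision": "block", "reason": "AUTO-SAVE checkpoint."}
        # Record the save point so the counter restarts from here.
        return json.dumps(response), exchange_count

    # Not time yet: an empty object lets the AI stop normally.
    return "{}", last_save
```

Keeping the rule free of I/O makes it easy to check the three cases (below the interval, at the interval, and already in a save cycle) without running Claude Code.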

### PreCompact Hook

```
Context window getting full → Claude Code fires PreCompact
                                        ↓
                                Hook ALWAYS blocks
                                        ↓
                                AI saves everything
                                        ↓
                                Compaction proceeds
```

No counting needed — compaction always warrants a save.

## Debugging

Check the hook log:
```bash
cat ~/.mempalace/hook_state/hook.log
```

Example output:
```
[14:30:15] Session abc123: 12 exchanges, 12 since last save
[14:35:22] Session abc123: 15 exchanges, 15 since last save
[14:35:22] TRIGGERING SAVE at exchange 15
[14:40:01] Session abc123: 18 exchanges, 3 since last save
```

## Cost

**Zero extra tokens.** The hooks are bash scripts that run locally. They don't call any API. The only "cost" is the AI spending a few seconds organizing memories at each checkpoint — and it's doing that with context it already has loaded.
Executable
+77
@@ -0,0 +1,77 @@
#!/bin/bash
# MEMPALACE PRE-COMPACT HOOK — Emergency save before compaction
#
# Claude Code "PreCompact" hook. Fires RIGHT BEFORE the conversation
# gets compressed to free up context window space.
#
# This is the safety net. When compaction happens, the AI loses detailed
# context about what was discussed. This hook forces one final save of
# EVERYTHING before that happens.
#
# Unlike the save hook (which triggers every N exchanges), this ALWAYS
# blocks — because compaction is always worth saving before.
#
# === INSTALL ===
# Add to .claude/settings.local.json:
#
#   "hooks": {
#     "PreCompact": [{
#       "hooks": [{
#         "type": "command",
#         "command": "/absolute/path/to/mempal_precompact_hook.sh",
#         "timeout": 30
#       }]
#     }]
#   }
#
# For Codex CLI, add to .codex/hooks.json:
#
#   "PreCompact": [{
#     "type": "command",
#     "command": "/absolute/path/to/mempal_precompact_hook.sh",
#     "timeout": 30
#   }]
#
# === HOW IT WORKS ===
#
# Claude Code sends JSON on stdin with:
#   session_id — unique session identifier
#
# We always return decision: "block" with a reason telling the AI
# to save everything. After the AI saves, compaction proceeds normally.
#
# === MEMPALACE CLI ===
# This repo uses: mempalace mine <dir>
# or: mempalace mine <dir> --mode convos
# Set MEMPAL_DIR below if you want the hook to auto-ingest before compaction.
# Leave blank to rely on the AI's own save instructions.

STATE_DIR="$HOME/.mempalace/hook_state"
mkdir -p "$STATE_DIR"

# Optional: set to the directory you want auto-ingested before compaction.
# Example: MEMPAL_DIR="$HOME/conversations"
# Leave empty to skip auto-ingest (AI handles saving via the block reason).
MEMPAL_DIR=""

# Read JSON input from stdin
INPUT=$(cat)

SESSION_ID=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('session_id','unknown'))" 2>/dev/null)

echo "[$(date '+%H:%M:%S')] PRE-COMPACT triggered for session $SESSION_ID" >> "$STATE_DIR/hook.log"

# Optional: run mempalace ingest synchronously so memories land before compaction
if [ -n "$MEMPAL_DIR" ] && [ -d "$MEMPAL_DIR" ]; then
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    REPO_DIR="$(dirname "$SCRIPT_DIR")"
    python3 -m mempalace mine "$MEMPAL_DIR" >> "$STATE_DIR/hook.log" 2>&1
fi

# Always block — compaction = save everything
cat << 'HOOKJSON'
{
  "decision": "block",
  "reason": "COMPACTION IMMINENT. Save ALL topics, decisions, quotes, code, and important context from this session to your memory system. Be thorough — after compaction, detailed context will be lost. Organize into appropriate categories. Use verbatim quotes where possible. Save everything, then allow compaction to proceed."
}
HOOKJSON
Executable
+143
@@ -0,0 +1,143 @@
#!/bin/bash
# MEMPALACE SAVE HOOK — Auto-save every N exchanges
#
# Claude Code "Stop" hook. After every assistant response:
#   1. Counts human messages in the session transcript
#   2. Every SAVE_INTERVAL messages, BLOCKS the AI from stopping
#   3. Returns a reason telling the AI to save structured diary + palace entries
#   4. AI does the save (topics, decisions, code, quotes → organized into palace)
#   5. Next Stop fires with stop_hook_active=true → lets AI stop normally
#
# The AI does the classification — it knows what wing/hall/closet to use
# because it has context about the conversation. No regex needed.
#
# === INSTALL ===
# Add to .claude/settings.local.json:
#
#   "hooks": {
#     "Stop": [{
#       "matcher": "*",
#       "hooks": [{
#         "type": "command",
#         "command": "/absolute/path/to/mempal_save_hook.sh",
#         "timeout": 30
#       }]
#     }]
#   }
#
# For Codex CLI, add to .codex/hooks.json:
#
#   "Stop": [{
#     "type": "command",
#     "command": "/absolute/path/to/mempal_save_hook.sh",
#     "timeout": 30
#   }]
#
# === HOW IT WORKS ===
#
# Claude Code sends JSON on stdin with these fields:
#   session_id — unique session identifier
#   stop_hook_active — true if AI is already in a save cycle (prevents infinite loop)
#   transcript_path — path to the JSONL transcript file
#
# When we block, Claude Code shows our "reason" to the AI as a system message.
# The AI then saves to memory, and when it tries to stop again,
# stop_hook_active=true so we let it through. No infinite loop.
#
# === MEMPALACE CLI ===
# This repo uses: mempalace mine <dir>
# or: mempalace mine <dir> --mode convos
# Set MEMPAL_DIR below if you want the hook to auto-ingest after blocking.
# Leave blank to rely on the AI's own save instructions.
#
# === CONFIGURATION ===

SAVE_INTERVAL=15 # Save every N human messages (adjust to taste)
STATE_DIR="$HOME/.mempalace/hook_state"
mkdir -p "$STATE_DIR"

# Optional: set to the directory you want auto-ingested on each save trigger.
# Example: MEMPAL_DIR="$HOME/conversations"
# Leave empty to skip auto-ingest (AI handles saving via the block reason).
MEMPAL_DIR=""

# Read JSON input from stdin
INPUT=$(cat)

# Parse fields from Claude Code's JSON
SESSION_ID=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('session_id','unknown'))" 2>/dev/null)
STOP_HOOK_ACTIVE=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('stop_hook_active', False))" 2>/dev/null)
TRANSCRIPT_PATH=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('transcript_path',''))" 2>/dev/null)

# Expand ~ in path
TRANSCRIPT_PATH="${TRANSCRIPT_PATH/#\~/$HOME}"

# If we're already in a save cycle, let the AI stop normally
# This is the infinite-loop prevention: block once → AI saves → tries to stop again → we let it through
if [ "$STOP_HOOK_ACTIVE" = "True" ] || [ "$STOP_HOOK_ACTIVE" = "true" ]; then
    echo "{}"
    exit 0
fi

# Count human messages in the JSONL transcript.
# The path is passed as an argument (sys.argv[1]) rather than spliced into the
# Python source, so quotes or other special characters in it cannot break the script.
if [ -f "$TRANSCRIPT_PATH" ]; then
    EXCHANGE_COUNT=$(python3 -c "
import json, sys
count = 0
with open(sys.argv[1]) as f:
    for line in f:
        try:
            entry = json.loads(line)
            msg = entry.get('message', {})
            if isinstance(msg, dict) and msg.get('role') == 'user':
                content = msg.get('content', '')
                # Skip system/command messages — only count real human input
                if isinstance(content, str) and '<command-message>' in content:
                    continue
                count += 1
        except Exception:
            # Ignore malformed or unexpected lines
            pass
print(count)
" "$TRANSCRIPT_PATH" 2>/dev/null)
    # If Python failed and printed nothing, fall back to 0 so the integer tests below still work
    EXCHANGE_COUNT=${EXCHANGE_COUNT:-0}
else
    EXCHANGE_COUNT=0
fi

# Track last save point for this session
LAST_SAVE_FILE="$STATE_DIR/${SESSION_ID}_last_save"
LAST_SAVE=0
if [ -f "$LAST_SAVE_FILE" ]; then
    LAST_SAVE=$(cat "$LAST_SAVE_FILE")
fi

SINCE_LAST=$((EXCHANGE_COUNT - LAST_SAVE))

# Log for debugging (check ~/.mempalace/hook_state/hook.log)
echo "[$(date '+%H:%M:%S')] Session $SESSION_ID: $EXCHANGE_COUNT exchanges, $SINCE_LAST since last save" >> "$STATE_DIR/hook.log"

# Time to save?
if [ "$SINCE_LAST" -ge "$SAVE_INTERVAL" ] && [ "$EXCHANGE_COUNT" -gt 0 ]; then
    # Update last save point
    echo "$EXCHANGE_COUNT" > "$LAST_SAVE_FILE"

    echo "[$(date '+%H:%M:%S')] TRIGGERING SAVE at exchange $EXCHANGE_COUNT" >> "$STATE_DIR/hook.log"

    # Optional: run mempalace ingest in background if MEMPAL_DIR is set
    if [ -n "$MEMPAL_DIR" ] && [ -d "$MEMPAL_DIR" ]; then
        SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
        REPO_DIR="$(dirname "$SCRIPT_DIR")"
        python3 -m mempalace mine "$MEMPAL_DIR" >> "$STATE_DIR/hook.log" 2>&1 &
    fi

    # Block the AI and tell it to save
    # The "reason" becomes a system message the AI sees and acts on
    cat << 'HOOKJSON'
{
  "decision": "block",
  "reason": "AUTO-SAVE checkpoint. Save key topics, decisions, quotes, and code from this session to your memory system. Organize into appropriate categories. Use verbatim quotes where possible. Continue conversation after saving."
}
HOOKJSON
else
    # Not time yet — let the AI stop normally
    echo "{}"
fi
@@ -0,0 +1,40 @@
# mempalace/ — Core Package

The Python package that powers MemPalace. All modules, all logic.

## Modules

| Module | What it does |
|--------|-------------|
| `cli.py` | CLI entry point — routes to mine, search, init, compress, wake-up |
| `config.py` | Configuration loading — `~/.mempalace/config.json`, env vars, defaults |
| `normalize.py` | Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format |
| `miner.py` | Project file ingest — scans directories, chunks by paragraph, stores to ChromaDB |
| `convo_miner.py` | Conversation ingest — chunks by exchange pair (Q+A), detects rooms from content |
| `searcher.py` | Semantic search via ChromaDB vectors — filters by wing/room, returns verbatim + scores |
| `layers.py` | 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search) |
| `dialect.py` | AAAK compression — entity codes, emotion markers, 30x lossless ratio |
| `knowledge_graph.py` | Temporal entity-relationship graph — SQLite, time-filtered queries, fact invalidation |
| `palace_graph.py` | Room-based navigation graph — BFS traversal, tunnel detection across wings |
| `mcp_server.py` | MCP server — 19 tools, AAAK auto-teach, Palace Protocol, agent diary |
| `onboarding.py` | Guided first-run setup — asks about people/projects, generates AAAK bootstrap + wing config |
| `entity_registry.py` | Entity code registry — maps names to AAAK codes, handles ambiguous names |
| `entity_detector.py` | Auto-detect people and projects from file content |
| `general_extractor.py` | Classifies text into 5 memory types (decision, preference, milestone, problem, emotional) |
| `room_detector_local.py` | Maps folders to room names using 70+ patterns — no API |
| `spellcheck.py` | Name-aware spellcheck — won't "correct" proper nouns in your entity registry |
| `split_mega_files.py` | Splits concatenated transcript files into per-session files |

## Architecture

```
User → CLI → miner/convo_miner → ChromaDB (palace)
                                      ↕
                              knowledge_graph (SQLite)
                                      ↕
User → MCP Server → searcher → results
                  → kg_query → entity facts
                  → diary → agent journal
```

The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool.
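To illustrate what structured relationships with time filtering and fact invalidation might look like, here is a self-contained sketch using Python's built-in `sqlite3`. The table name `triples`, its columns, the helper functions, and the sample facts are all assumptions made for this example; the actual schema in `knowledge_graph.py` is not shown here and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE triples (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO date the fact became true
        valid_to   TEXT             -- NULL while the fact is still current
    )
    """
)


def add_fact(subject, predicate, obj, valid_from):
    conn.execute(
        "INSERT INTO triples VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, valid_from),
    )


def invalidate(subject, predicate, obj, ended):
    """Close a fact instead of deleting it, so history stays queryable."""
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, subject, predicate, obj),
    )


def facts_as_of(subject, date):
    """Return (predicate, object) pairs that were valid on the given ISO date."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY valid_from",
        (subject, date, date),
    )
    return rows.fetchall()


# Invented sample data: Kai moves from one project to another.
add_fact("Kai", "works_on", "Driftwood", "2024-01-10")
invalidate("Kai", "works_on", "Driftwood", "2024-06-01")
add_fact("Kai", "works_on", "MemPalace", "2024-06-01")
```

Because ISO dates sort lexicographically, plain string comparison is enough for the time filter in this sketch; querying the same subject at two different dates returns two different answers without any row having been deleted.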
@@ -0,0 +1,7 @@
"""MemPalace — Give your AI a memory. No API key required."""

__version__ = "2.0.0"

from .cli import main

__all__ = ["main", "__version__"]
@@ -0,0 +1,5 @@
"""Allow running as: python -m mempalace"""

from .cli import main

main()
@@ -0,0 +1,371 @@
#!/usr/bin/env python3
"""
MemPalace — Give your AI a memory. No API key required.

Two ways to ingest:
    Projects:      mempalace mine ~/projects/my_app          (code, docs, notes)
    Conversations: mempalace mine ~/chats/ --mode convos     (Claude, ChatGPT, Slack)

Same palace. Same search. Different ingest strategies.

Commands:
    mempalace init <dir>                  Detect rooms from folder structure
    mempalace split <dir>                 Split concatenated mega-files into per-session files
    mempalace mine <dir>                  Mine project files (default)
    mempalace mine <dir> --mode convos    Mine conversation exports
    mempalace search "query"              Find anything, exact words
    mempalace wake-up                     Show L0 + L1 wake-up context
    mempalace wake-up --wing my_app       Wake-up for a specific project
    mempalace status                      Show what's been filed

Examples:
    mempalace init ~/projects/my_app
    mempalace mine ~/projects/my_app
    mempalace mine ~/chats/claude-sessions --mode convos
    mempalace search "why did we switch to GraphQL"
    mempalace search "pricing discussion" --wing my_app --room costs
"""

import os
import sys
import argparse
from pathlib import Path

from .config import MempalaceConfig


def cmd_init(args):
    import json
    from pathlib import Path
    from .entity_detector import scan_for_detection, detect_entities, confirm_entities
    from .room_detector_local import detect_rooms_local

    # Pass 1: auto-detect people and projects from file content
    print(f"\n Scanning for entities in: {args.dir}")
    files = scan_for_detection(args.dir)
    if files:
        print(f" Reading {len(files)} files...")
        detected = detect_entities(files)
        total = len(detected["people"]) + len(detected["projects"]) + len(detected["uncertain"])
        if total > 0:
            confirmed = confirm_entities(detected, yes=getattr(args, "yes", False))
            # Save confirmed entities to <project>/entities.json for the miner
            if confirmed["people"] or confirmed["projects"]:
                entities_path = Path(args.dir).expanduser().resolve() / "entities.json"
                with open(entities_path, "w") as f:
                    json.dump(confirmed, f, indent=2)
                print(f" Entities saved: {entities_path}")
        else:
            print(" No entities detected — proceeding with directory-based rooms.")

    # Pass 2: detect rooms from folder structure
    detect_rooms_local(project_dir=args.dir)
    MempalaceConfig().init()


def cmd_mine(args):
    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path

    if args.mode == "convos":
        from .convo_miner import mine_convos

        mine_convos(
            convo_dir=args.dir,
            palace_path=palace_path,
            wing=args.wing,
            agent=args.agent,
            limit=args.limit,
            dry_run=args.dry_run,
            extract_mode=args.extract,
        )
    else:
        from .miner import mine

        mine(
            project_dir=args.dir,
            palace_path=palace_path,
            wing_override=args.wing,
            agent=args.agent,
            limit=args.limit,
            dry_run=args.dry_run,
        )


def cmd_search(args):
    from .searcher import search

    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
    search(
        query=args.query,
        palace_path=palace_path,
        wing=args.wing,
        room=args.room,
        n_results=args.results,
    )


def cmd_wakeup(args):
    """Show L0 (identity) + L1 (essential story) — the wake-up context."""
    from .layers import MemoryStack

    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
    stack = MemoryStack(palace_path=palace_path)

    text = stack.wake_up(wing=args.wing)
    tokens = len(text) // 4
    print(f"Wake-up text (~{tokens} tokens):")
    print("=" * 50)
    print(text)


def cmd_split(args):
    """Split concatenated transcript mega-files into per-session files."""
    from .split_mega_files import main as split_main
    import sys

    # Rebuild argv for split_mega_files argparse
    argv = [args.dir]
    if args.output_dir:
        argv += ["--output-dir", args.output_dir]
    if args.dry_run:
        argv.append("--dry-run")
    if args.min_sessions != 2:
        argv += ["--min-sessions", str(args.min_sessions)]

    old_argv = sys.argv
    sys.argv = ["mempalace split"] + argv
    try:
        split_main()
    finally:
        sys.argv = old_argv


def cmd_status(args):
    from .miner import status

    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
    status(palace_path=palace_path)


def cmd_compress(args):
    """Compress drawers in a wing using AAAK Dialect."""
    import chromadb
    from .dialect import Dialect

    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path

    # Load dialect (with optional entity config)
    config_path = args.config
    if not config_path:
        for candidate in ["entities.json", os.path.join(palace_path, "entities.json")]:
            if os.path.exists(candidate):
                config_path = candidate
                break

    if config_path and os.path.exists(config_path):
        dialect = Dialect.from_config(config_path)
        print(f" Loaded entity config: {config_path}")
    else:
        dialect = Dialect()

    # Connect to palace
    try:
        client = chromadb.PersistentClient(path=palace_path)
        col = client.get_collection("mempalace_drawers")
    except Exception:
        print(f"\n No palace found at {palace_path}")
        print(" Run: mempalace init <dir> then mempalace mine <dir>")
        sys.exit(1)

    # Query drawers in the wing
    where = {"wing": args.wing} if args.wing else None
    try:
        kwargs = {"include": ["documents", "metadatas"]}
        if where:
            kwargs["where"] = where
        results = col.get(**kwargs)
    except Exception as e:
        print(f"\n Error reading drawers: {e}")
        sys.exit(1)

    docs = results["documents"]
    metas = results["metadatas"]
    ids = results["ids"]

    if not docs:
        wing_label = f" in wing '{args.wing}'" if args.wing else ""
        print(f"\n No drawers found{wing_label}.")
        return

    print(
        f"\n Compressing {len(docs)} drawers"
        + (f" in wing '{args.wing}'" if args.wing else "")
        + "..."
    )
    print()

    total_original = 0
    total_compressed = 0
    compressed_entries = []

    for doc, meta, doc_id in zip(docs, metas, ids):
        compressed = dialect.compress(doc, metadata=meta)
        stats = dialect.compression_stats(doc, compressed)

        total_original += stats["original_chars"]
        total_compressed += stats["compressed_chars"]

        compressed_entries.append((doc_id, compressed, meta, stats))

        if args.dry_run:
            wing_name = meta.get("wing", "?")
            room_name = meta.get("room", "?")
            source = Path(meta.get("source_file", "?")).name
            print(f" [{wing_name}/{room_name}] {source}")
            print(
                f" {stats['original_tokens']}t -> {stats['compressed_tokens']}t ({stats['ratio']:.1f}x)"
            )
            print(f" {compressed}")
            print()

    # Store compressed versions (unless dry-run)
    if not args.dry_run:
        try:
            comp_col = client.get_or_create_collection("mempalace_compressed")
            for doc_id, compressed, meta, stats in compressed_entries:
                comp_meta = dict(meta)
                comp_meta["compression_ratio"] = round(stats["ratio"], 1)
                comp_meta["original_tokens"] = stats["original_tokens"]
                comp_col.upsert(
                    ids=[doc_id],
                    documents=[compressed],
                    metadatas=[comp_meta],
                )
            print(
                f" Stored {len(compressed_entries)} compressed drawers in 'mempalace_compressed' collection."
            )
        except Exception as e:
            print(f" Error storing compressed drawers: {e}")
            sys.exit(1)

    # Summary
    ratio = total_original / max(total_compressed, 1)
    orig_tokens = Dialect.count_tokens("x" * total_original)
    comp_tokens = Dialect.count_tokens("x" * total_compressed)
    print(f" Total: {orig_tokens:,}t -> {comp_tokens:,}t ({ratio:.1f}x compression)")
    if args.dry_run:
        print(" (dry run -- nothing stored)")


def main():
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="MemPalace — Give your AI a memory. No API key required.",
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||||
|
epilog=__doc__,
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--palace",
|
||||||
|
default=None,
|
||||||
|
help="Where the palace lives (default: from ~/.mempalace/config.json or ~/.mempalace/palace)",
|
||||||
|
)
|
||||||
|
|
||||||
|
sub = parser.add_subparsers(dest="command")
|
||||||
|
|
||||||
|
# init
|
||||||
|
p_init = sub.add_parser("init", help="Detect rooms from your folder structure")
|
||||||
|
p_init.add_argument("dir", help="Project directory to set up")
|
||||||
|
p_init.add_argument(
|
||||||
|
"--yes", action="store_true", help="Auto-accept all detected entities (non-interactive)"
|
||||||
|
)
|
||||||
|
|
||||||
|
# mine
|
||||||
|
p_mine = sub.add_parser("mine", help="Mine files into the palace")
|
||||||
|
p_mine.add_argument("dir", help="Directory to mine")
|
||||||
|
p_mine.add_argument(
|
||||||
|
"--mode",
|
||||||
|
choices=["projects", "convos"],
|
||||||
|
default="projects",
|
||||||
|
help="Ingest mode: 'projects' for code/docs (default), 'convos' for chat exports",
|
||||||
|
)
|
||||||
|
p_mine.add_argument("--wing", default=None, help="Wing name (default: directory name)")
|
||||||
|
p_mine.add_argument(
|
||||||
|
"--agent",
|
||||||
|
default="mempalace",
|
||||||
|
help="Your name — recorded on every drawer (default: mempalace)",
|
||||||
|
)
|
||||||
|
p_mine.add_argument("--limit", type=int, default=0, help="Max files to process (0 = all)")
|
||||||
|
p_mine.add_argument(
|
||||||
|
"--dry-run", action="store_true", help="Show what would be filed without filing"
|
||||||
|
)
|
||||||
|
p_mine.add_argument(
|
||||||
|
"--extract",
|
||||||
|
choices=["exchange", "general"],
|
||||||
|
default="exchange",
|
||||||
|
help="Extraction strategy for convos mode: 'exchange' (default) or 'general' (5 memory types)",
|
||||||
|
)
|
||||||
|
|
||||||
|
# search
|
||||||
|
p_search = sub.add_parser("search", help="Find anything, exact words")
|
||||||
|
p_search.add_argument("query", help="What to search for")
|
||||||
|
p_search.add_argument("--wing", default=None, help="Limit to one project")
|
||||||
|
p_search.add_argument("--room", default=None, help="Limit to one room")
|
||||||
|
p_search.add_argument("--results", type=int, default=5, help="Number of results")
|
||||||
|
|
||||||
|
# compress
|
||||||
|
p_compress = sub.add_parser(
|
||||||
|
"compress", help="Compress drawers using AAAK Dialect (~30x reduction)"
|
||||||
|
)
|
||||||
|
p_compress.add_argument("--wing", default=None, help="Wing to compress (default: all wings)")
|
||||||
|
p_compress.add_argument(
|
||||||
|
"--dry-run", action="store_true", help="Preview compression without storing"
|
||||||
|
)
|
||||||
|
p_compress.add_argument(
|
||||||
|
"--config", default=None, help="Entity config JSON (e.g. entities.json)"
|
||||||
|
)
|
||||||
|
|
||||||
|
# wake-up
|
||||||
|
p_wakeup = sub.add_parser("wake-up", help="Show L0 + L1 wake-up context (~600-900 tokens)")
|
||||||
|
p_wakeup.add_argument("--wing", default=None, help="Wake-up for a specific project/wing")
|
||||||
|
|
||||||
|
# split
|
||||||
|
p_split = sub.add_parser(
|
||||||
|
"split",
|
||||||
|
help="Split concatenated transcript mega-files into per-session files (run before mine)",
|
||||||
|
)
|
||||||
|
p_split.add_argument("dir", help="Directory containing transcript files")
|
||||||
|
p_split.add_argument(
|
||||||
|
"--output-dir", default=None,
|
||||||
|
help="Write split files here (default: same directory as source files)",
|
||||||
|
)
|
||||||
|
p_split.add_argument(
|
||||||
|
"--dry-run", action="store_true",
|
||||||
|
help="Show what would be split without writing files",
|
||||||
|
)
|
||||||
|
p_split.add_argument(
|
||||||
|
"--min-sessions", type=int, default=2,
|
||||||
|
help="Only split files containing at least N sessions (default: 2)",
|
||||||
|
)
|
||||||
|
|
||||||
|
# status
|
||||||
|
sub.add_parser("status", help="Show what's been filed")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if not args.command:
|
||||||
|
parser.print_help()
|
||||||
|
return
|
||||||
|
|
||||||
|
dispatch = {
|
||||||
|
"init": cmd_init,
|
||||||
|
"mine": cmd_mine,
|
||||||
|
"split": cmd_split,
|
||||||
|
"search": cmd_search,
|
||||||
|
"compress": cmd_compress,
|
||||||
|
"wake-up": cmd_wakeup,
|
||||||
|
"status": cmd_status,
|
||||||
|
}
|
||||||
|
dispatch[args.command](args)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
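The `main()` function above wires each subcommand to a handler through a plain dictionary. A minimal sketch of that pattern follows, assuming two stand-in handlers (`cmd_search` and `cmd_status` here are hypothetical and return strings instead of touching a palace); the parser name `mempalace-sketch` is also an assumption made for illustration.

```python
import argparse


def cmd_search(args):
    # Hypothetical handler: the real one queries the palace.
    return f"search:{args.query}:{args.results}"


def cmd_status(args):
    # Hypothetical handler: the real one reports what has been filed.
    return "status"


def build_parser():
    parser = argparse.ArgumentParser(prog="mempalace-sketch")
    sub = parser.add_subparsers(dest="command")

    p_search = sub.add_parser("search")
    p_search.add_argument("query")
    p_search.add_argument("--results", type=int, default=5)

    sub.add_parser("status")
    return parser


# Subcommand name -> handler, mirroring the dispatch dict in main().
DISPATCH = {"search": cmd_search, "status": cmd_status}


def run(argv):
    args = build_parser().parse_args(argv)
    if not args.command:
        # No subcommand given; the CLI in the diff prints help at this point.
        return None
    return DISPATCH[args.command](args)
```

Passing `argv` explicitly keeps the dispatch testable without patching `sys.argv`, which may be worth considering if the CLI grows more subcommands.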
@@ -0,0 +1,149 @@
|
|||||||
|
"""
|
||||||
|
MemPalace configuration system.
|
||||||
|
|
||||||
|
Priority: env vars > config file (~/.mempalace/config.json) > defaults
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
DEFAULT_PALACE_PATH = os.path.expanduser("~/.mempalace/palace")
|
||||||
|
DEFAULT_COLLECTION_NAME = "mempalace_drawers"
|
||||||
|
|
||||||
|
DEFAULT_TOPIC_WINGS = [
|
||||||
|
"emotions",
|
||||||
|
"consciousness",
|
||||||
|
"memory",
|
||||||
|
"technical",
|
||||||
|
"identity",
|
||||||
|
"family",
|
||||||
|
"creative",
|
||||||
|
]
|
||||||
|
|
||||||
|
DEFAULT_HALL_KEYWORDS = {
|
||||||
|
"emotions": [
|
||||||
|
"scared",
|
||||||
|
"afraid",
|
||||||
|
"worried",
|
||||||
|
"happy",
|
||||||
|
"sad",
|
||||||
|
"love",
|
||||||
|
"hate",
|
||||||
|
"feel",
|
||||||
|
"cry",
|
||||||
|
"tears",
|
||||||
|
],
|
||||||
|
"consciousness": [
|
||||||
|
"consciousness",
|
||||||
|
"conscious",
|
||||||
|
"aware",
|
||||||
|
"real",
|
||||||
|
"genuine",
|
||||||
|
"soul",
|
||||||
|
"exist",
|
||||||
|
"alive",
|
||||||
|
],
|
||||||
|
"memory": ["memory", "remember", "forget", "recall", "archive", "palace", "store"],
|
||||||
|
"technical": [
|
||||||
|
"code",
|
||||||
|
"python",
|
||||||
|
"script",
|
||||||
|
"bug",
|
||||||
|
"error",
|
||||||
|
"function",
|
||||||
|
"api",
|
||||||
|
"database",
|
||||||
|
"server",
|
||||||
|
],
|
||||||
|
"identity": ["identity", "name", "who am i", "persona", "self"],
|
||||||
|
"family": ["family", "kids", "children", "daughter", "son", "parent", "mother", "father"],
|
||||||
|
"creative": ["game", "gameplay", "player", "app", "design", "art", "music", "story"],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class MempalaceConfig:
|
||||||
|
"""Configuration manager for MemPalace.
|
||||||
|
|
||||||
|
Load order: env vars > config file > defaults.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, config_dir=None):
|
||||||
|
"""Initialize config.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
config_dir: Override config directory (useful for testing).
|
||||||
|
Defaults to ~/.mempalace.
|
||||||
|
"""
|
||||||
|
self._config_dir = (
|
||||||
|
Path(config_dir) if config_dir else Path(os.path.expanduser("~/.mempalace"))
|
||||||
|
)
|
||||||
|
self._config_file = self._config_dir / "config.json"
|
||||||
|
self._people_map_file = self._config_dir / "people_map.json"
|
||||||
|
self._file_config = {}
|
||||||
|
|
||||||
|
if self._config_file.exists():
|
||||||
|
try:
|
||||||
|
with open(self._config_file, "r") as f:
|
||||||
|
self._file_config = json.load(f)
|
||||||
|
except (json.JSONDecodeError, OSError):
|
||||||
|
self._file_config = {}
|
||||||
|
|
||||||
|
@property
|
||||||
|
def palace_path(self):
|
||||||
|
"""Path to the memory palace data directory."""
|
||||||
|
env_val = os.environ.get("MEMPALACE_PALACE_PATH") or os.environ.get("MEMPAL_PALACE_PATH")
|
||||||
|
if env_val:
|
||||||
|
return env_val
|
||||||
|
return self._file_config.get("palace_path", DEFAULT_PALACE_PATH)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def collection_name(self):
|
||||||
|
"""ChromaDB collection name."""
|
||||||
|
return self._file_config.get("collection_name", DEFAULT_COLLECTION_NAME)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def people_map(self):
|
||||||
|
"""Mapping of name variants to canonical names."""
|
||||||
|
if self._people_map_file.exists():
|
||||||
|
try:
|
||||||
|
with open(self._people_map_file, "r") as f:
|
||||||
|
return json.load(f)
|
||||||
|
except (json.JSONDecodeError, OSError):
|
||||||
|
pass
|
||||||
|
return self._file_config.get("people_map", {})
|
||||||
|
|
||||||
|
@property
|
||||||
|
def topic_wings(self):
|
||||||
|
"""List of topic wing names."""
|
||||||
|
return self._file_config.get("topic_wings", DEFAULT_TOPIC_WINGS)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def hall_keywords(self):
|
||||||
|
"""Mapping of hall names to keyword lists."""
|
||||||
|
return self._file_config.get("hall_keywords", DEFAULT_HALL_KEYWORDS)
|
||||||
|
|
||||||
|
def init(self):
|
||||||
|
"""Create config directory and write default config.json if it doesn't exist."""
|
||||||
|
self._config_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
if not self._config_file.exists():
|
||||||
|
default_config = {
|
||||||
|
"palace_path": DEFAULT_PALACE_PATH,
|
||||||
|
"collection_name": DEFAULT_COLLECTION_NAME,
|
||||||
|
"topic_wings": DEFAULT_TOPIC_WINGS,
|
||||||
|
"hall_keywords": DEFAULT_HALL_KEYWORDS,
|
||||||
|
}
|
||||||
|
with open(self._config_file, "w") as f:
|
||||||
|
json.dump(default_config, f, indent=2)
|
||||||
|
return self._config_file
|
||||||
|
|
||||||
|
def save_people_map(self, people_map):
|
||||||
|
"""Write people_map.json to config directory.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
people_map: Dict mapping name variants to canonical names.
|
||||||
|
"""
|
||||||
|
self._config_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
with open(self._people_map_file, "w") as f:
|
||||||
|
json.dump(people_map, f, indent=2)
|
||||||
|
return self._people_map_file
|
||||||
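The lookup order used by `palace_path` above (environment variable, then `config.json`, then the built-in default) can be sketched as a standalone function. This is an illustration under assumptions, not the module itself: `resolve_palace_path` and `demo` are hypothetical names, and the `environ` parameter is added only so the behaviour can be exercised without modifying the real environment.

```python
import json
import os
import tempfile
from pathlib import Path

DEFAULT_PALACE_PATH = os.path.expanduser("~/.mempalace/palace")


def resolve_palace_path(config_dir, environ=None):
    """Return the palace path using env var > config file > default."""
    environ = os.environ if environ is None else environ

    # 1. Environment variables win; the older MEMPAL_ name is still honoured.
    env_val = environ.get("MEMPALACE_PALACE_PATH") or environ.get("MEMPAL_PALACE_PATH")
    if env_val:
        return env_val

    # 2. Fall back to config.json; an unreadable or corrupt file is ignored.
    file_config = {}
    config_file = Path(config_dir) / "config.json"
    if config_file.exists():
        try:
            with open(config_file, "r") as f:
                file_config = json.load(f)
        except (json.JSONDecodeError, OSError):
            file_config = {}

    # 3. Finally, the built-in default.
    return file_config.get("palace_path", DEFAULT_PALACE_PATH)


def demo():
    """Exercise all three precedence levels against a throwaway directory."""
    with tempfile.TemporaryDirectory() as d:
        from_default = resolve_palace_path(d, environ={})
        (Path(d) / "config.json").write_text(json.dumps({"palace_path": "/data/palace"}))
        from_file = resolve_palace_path(d, environ={})
        from_env = resolve_palace_path(d, environ={"MEMPAL_PALACE_PATH": "/env/palace"})
    return from_default, from_file, from_env
```

Note that in the diff only `palace_path` consults the environment; the other properties read the config file and then the defaults.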
@@ -0,0 +1,400 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
convo_miner.py — Mine conversations into the palace.
|
||||||
|
|
||||||
|
Ingests chat exports (Claude Code, ChatGPT, Slack, plain text transcripts).
|
||||||
|
Normalizes format, chunks by exchange pair (Q+A = one unit), files to palace.
|
||||||
|
|
||||||
|
Same palace as project mining. Different ingest strategy.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import hashlib
|
||||||
|
from pathlib import Path
|
||||||
|
from datetime import datetime
|
||||||
|
from collections import defaultdict
|
||||||
|
|
||||||
|
import chromadb
|
||||||
|
|
||||||
|
from .normalize import normalize
|
||||||
|
|
||||||
|
|
||||||
|
# File types that might contain conversations
|
||||||
|
CONVO_EXTENSIONS = {
|
||||||
|
".txt",
|
||||||
|
".md",
|
||||||
|
".json",
|
||||||
|
".jsonl",
|
||||||
|
}
|
||||||
|
|
||||||
|
SKIP_DIRS = {
|
||||||
|
".git",
|
||||||
|
"node_modules",
|
||||||
|
"__pycache__",
|
||||||
|
".venv",
|
||||||
|
"venv",
|
||||||
|
"env",
|
||||||
|
"dist",
|
||||||
|
"build",
|
||||||
|
".next",
|
||||||
|
".mempalace",
|
||||||
|
}
|
||||||
|
|
||||||
|
MIN_CHUNK_SIZE = 30
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# CHUNKING — exchange pairs for conversations
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def chunk_exchanges(content: str) -> list:
|
||||||
|
"""
|
||||||
|
Chunk by exchange pair: one > turn + AI response = one unit.
|
||||||
|
Falls back to paragraph chunking if fewer than three lines start with >.
|
||||||
|
"""
|
||||||
|
lines = content.split("\n")
|
||||||
|
quote_lines = sum(1 for line in lines if line.strip().startswith(">"))
|
||||||
|
|
||||||
|
if quote_lines >= 3:
|
||||||
|
return _chunk_by_exchange(lines)
|
||||||
|
else:
|
||||||
|
return _chunk_by_paragraph(content)
|
||||||
|
|
||||||
|
|
||||||
|
def _chunk_by_exchange(lines: list) -> list:
|
||||||
|
"""One user turn (>) + the AI response that follows = one chunk; the response is capped at its first 8 non-empty lines."""
|
||||||
|
chunks = []
|
||||||
|
i = 0
|
||||||
|
|
||||||
|
while i < len(lines):
|
||||||
|
line = lines[i]
|
||||||
|
if line.strip().startswith(">"):
|
||||||
|
user_turn = line.strip()
|
||||||
|
i += 1
|
||||||
|
|
||||||
|
ai_lines = []
|
||||||
|
while i < len(lines):
|
||||||
|
next_line = lines[i]
|
||||||
|
if next_line.strip().startswith(">") or next_line.strip().startswith("---"):
|
||||||
|
break
|
||||||
|
if next_line.strip():
|
||||||
|
ai_lines.append(next_line.strip())
|
||||||
|
i += 1
|
||||||
|
|
||||||
|
ai_response = " ".join(ai_lines[:8])
|
||||||
|
content = f"{user_turn}\n{ai_response}" if ai_response else user_turn
|
||||||
|
|
||||||
|
if len(content.strip()) > MIN_CHUNK_SIZE:
|
||||||
|
chunks.append(
|
||||||
|
{
|
||||||
|
"content": content,
|
||||||
|
"chunk_index": len(chunks),
|
||||||
|
}
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
i += 1
|
||||||
|
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
|
||||||
|
def _chunk_by_paragraph(content: str) -> list:
|
||||||
|
"""Fallback: chunk by paragraph breaks."""
|
||||||
|
chunks = []
|
||||||
|
paragraphs = [p.strip() for p in content.split("\n\n") if p.strip()]
|
||||||
|
|
||||||
|
# If no paragraph breaks and long content, chunk by line groups
|
||||||
|
if len(paragraphs) <= 1 and content.count("\n") > 20:
|
||||||
|
lines = content.split("\n")
|
||||||
|
for i in range(0, len(lines), 25):
|
||||||
|
group = "\n".join(lines[i : i + 25]).strip()
|
||||||
|
if len(group) > MIN_CHUNK_SIZE:
|
||||||
|
chunks.append({"content": group, "chunk_index": len(chunks)})
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
for para in paragraphs:
|
||||||
|
if len(para) > MIN_CHUNK_SIZE:
|
||||||
|
chunks.append({"content": para, "chunk_index": len(chunks)})
|
||||||
|
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# ROOM DETECTION — topic-based for conversations
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
TOPIC_KEYWORDS = {
|
||||||
|
"technical": [
|
||||||
|
"code",
|
||||||
|
"python",
|
||||||
|
"function",
|
||||||
|
"bug",
|
||||||
|
"error",
|
||||||
|
"api",
|
||||||
|
"database",
|
||||||
|
"server",
|
||||||
|
"deploy",
|
||||||
|
"git",
|
||||||
|
"test",
|
||||||
|
"debug",
|
||||||
|
"refactor",
|
||||||
|
],
|
||||||
|
"architecture": [
|
||||||
|
"architecture",
|
||||||
|
"design",
|
||||||
|
"pattern",
|
||||||
|
"structure",
|
||||||
|
"schema",
|
||||||
|
"interface",
|
||||||
|
"module",
|
||||||
|
"component",
|
||||||
|
"service",
|
||||||
|
"layer",
|
||||||
|
],
|
||||||
|
"planning": [
|
||||||
|
"plan",
|
||||||
|
"roadmap",
|
||||||
|
"milestone",
|
||||||
|
"deadline",
|
||||||
|
"priority",
|
||||||
|
"sprint",
|
||||||
|
"backlog",
|
||||||
|
"scope",
|
||||||
|
"requirement",
|
||||||
|
"spec",
|
||||||
|
],
|
||||||
|
"decisions": [
|
||||||
|
"decided",
|
||||||
|
"chose",
|
||||||
|
"picked",
|
||||||
|
"switched",
|
||||||
|
"migrated",
|
||||||
|
"replaced",
|
||||||
|
"trade-off",
|
||||||
|
"alternative",
|
||||||
|
"option",
|
||||||
|
"approach",
|
||||||
|
],
|
||||||
|
"problems": [
|
||||||
|
"problem",
|
||||||
|
"issue",
|
||||||
|
"broken",
|
||||||
|
"failed",
|
||||||
|
"crash",
|
||||||
|
"stuck",
|
||||||
|
"workaround",
|
||||||
|
"fix",
|
||||||
|
"solved",
|
||||||
|
"resolved",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def detect_convo_room(content: str) -> str:
|
||||||
|
"""Score the first 3,000 characters of the conversation against topic keywords."""
|
||||||
|
content_lower = content[:3000].lower()
|
||||||
|
scores = {}
|
||||||
|
for room, keywords in TOPIC_KEYWORDS.items():
|
||||||
|
score = sum(1 for kw in keywords if kw in content_lower)
|
||||||
|
if score > 0:
|
||||||
|
scores[room] = score
|
||||||
|
if scores:
|
||||||
|
return max(scores, key=scores.get)
|
||||||
|
return "general"
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# PALACE OPERATIONS
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def get_collection(palace_path: str):
|
||||||
|
os.makedirs(palace_path, exist_ok=True)
|
||||||
|
client = chromadb.PersistentClient(path=palace_path)
|
||||||
|
return client.get_or_create_collection("mempalace_drawers")
|
||||||
|
|
||||||
|
|
||||||
|
def file_already_mined(collection, source_file: str) -> bool:
|
||||||
|
try:
|
||||||
|
results = collection.get(where={"source_file": source_file}, limit=1)
|
||||||
|
return len(results.get("ids", [])) > 0
|
||||||
|
except Exception:
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# SCAN FOR CONVERSATION FILES
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def scan_convos(convo_dir: str) -> list:
|
||||||
|
"""Find all potential conversation files."""
|
||||||
|
convo_path = Path(convo_dir).expanduser().resolve()
|
||||||
|
files = []
|
||||||
|
for root, dirs, filenames in os.walk(convo_path):
|
||||||
|
dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
|
||||||
|
for filename in filenames:
|
||||||
|
filepath = Path(root) / filename
|
||||||
|
if filepath.suffix.lower() in CONVO_EXTENSIONS:
|
||||||
|
files.append(filepath)
|
||||||
|
return files
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# MINE CONVERSATIONS
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def mine_convos(
|
||||||
|
convo_dir: str,
|
||||||
|
palace_path: str,
|
||||||
|
wing: str = None,
|
||||||
|
agent: str = "mempalace",
|
||||||
|
limit: int = 0,
|
||||||
|
dry_run: bool = False,
|
||||||
|
extract_mode: str = "exchange",
|
||||||
|
):
|
||||||
|
"""Mine a directory of conversation files into the palace.
|
||||||
|
|
||||||
|
extract_mode:
|
||||||
|
"exchange" — default exchange-pair chunking (Q+A = one unit)
|
||||||
|
"general" — general extractor: decisions, preferences, milestones, problems, emotions
|
||||||
|
"""
|
||||||
|
|
||||||
|
convo_path = Path(convo_dir).expanduser().resolve()
|
||||||
|
if not wing:
|
||||||
|
wing = convo_path.name.lower().replace(" ", "_").replace("-", "_")
|
||||||
|
|
||||||
|
files = scan_convos(convo_dir)
|
||||||
|
if limit > 0:
|
||||||
|
files = files[:limit]
|
||||||
|
|
||||||
|
print(f"\n{'=' * 55}")
|
||||||
|
print(" MemPalace Mine — Conversations")
|
||||||
|
print(f"{'=' * 55}")
|
||||||
|
print(f" Wing: {wing}")
|
||||||
|
print(f" Source: {convo_path}")
|
||||||
|
print(f" Files: {len(files)}")
|
||||||
|
print(f" Palace: {palace_path}")
|
||||||
|
if dry_run:
|
||||||
|
print(" DRY RUN — nothing will be filed")
|
||||||
|
print(f"{'─' * 55}\n")
|
||||||
|
|
||||||
|
collection = get_collection(palace_path) if not dry_run else None
|
||||||
|
|
||||||
|
total_drawers = 0
|
||||||
|
files_skipped = 0
|
||||||
|
room_counts = defaultdict(int)
|
||||||
|
|
||||||
|
for i, filepath in enumerate(files, 1):
|
||||||
|
source_file = str(filepath)
|
||||||
|
|
||||||
|
# Skip if already filed
|
||||||
|
if not dry_run and file_already_mined(collection, source_file):
|
||||||
|
files_skipped += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Normalize format
|
||||||
|
try:
|
||||||
|
content = normalize(str(filepath))
|
||||||
|
except Exception:
|
||||||
|
continue
|
||||||
|
|
||||||
|
if not content or len(content.strip()) < MIN_CHUNK_SIZE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Chunk — either exchange pairs or general extraction
|
||||||
|
if extract_mode == "general":
|
||||||
|
from .general_extractor import extract_memories
|
||||||
|
|
||||||
|
chunks = extract_memories(content)
|
||||||
|
# Each chunk already has memory_type; use it as the room name
|
||||||
|
else:
|
||||||
|
chunks = chunk_exchanges(content)
|
||||||
|
|
||||||
|
if not chunks:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Detect room from content (general mode uses memory_type instead)
|
||||||
|
if extract_mode != "general":
|
||||||
|
room = detect_convo_room(content)
|
||||||
|
else:
|
||||||
|
room = None # set per-chunk below
|
||||||
|
|
||||||
|
if dry_run:
|
||||||
|
if extract_mode == "general":
|
||||||
|
from collections import Counter
|
||||||
|
|
||||||
|
type_counts = Counter(c.get("memory_type", "general") for c in chunks)
|
||||||
|
types_str = ", ".join(f"{t}:{n}" for t, n in type_counts.most_common())
|
||||||
|
print(f" [DRY RUN] {filepath.name} → {len(chunks)} memories ({types_str})")
|
||||||
|
else:
|
||||||
|
print(f" [DRY RUN] {filepath.name} → room:{room} ({len(chunks)} drawers)")
|
||||||
|
total_drawers += len(chunks)
|
||||||
|
# Track room counts
|
||||||
|
if extract_mode == "general":
|
||||||
|
for c in chunks:
|
||||||
|
room_counts[c.get("memory_type", "general")] += 1
|
||||||
|
else:
|
||||||
|
room_counts[room] += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
if extract_mode != "general":
|
||||||
|
room_counts[room] += 1
|
||||||
|
|
||||||
|
# File each chunk
|
||||||
|
drawers_added = 0
|
||||||
|
for chunk in chunks:
|
||||||
|
chunk_room = chunk.get("memory_type", room) if extract_mode == "general" else room
|
||||||
|
if extract_mode == "general":
|
||||||
|
room_counts[chunk_room] += 1
|
||||||
|
drawer_id = f"drawer_{wing}_{chunk_room}_{hashlib.md5((source_file + str(chunk['chunk_index'])).encode()).hexdigest()[:16]}"
|
||||||
|
try:
|
||||||
|
collection.add(
|
||||||
|
documents=[chunk["content"]],
|
||||||
|
ids=[drawer_id],
|
||||||
|
metadatas=[
|
||||||
|
{
|
||||||
|
"wing": wing,
|
||||||
|
"room": chunk_room,
|
||||||
|
"source_file": source_file,
|
||||||
|
"chunk_index": chunk["chunk_index"],
|
||||||
|
"added_by": agent,
|
||||||
|
"filed_at": datetime.now().isoformat(),
|
||||||
|
"ingest_mode": "convos",
|
||||||
|
"extract_mode": extract_mode,
|
||||||
|
}
|
||||||
|
],
|
||||||
|
)
|
||||||
|
drawers_added += 1
|
||||||
|
except Exception as e:
|
||||||
|
if "already exists" not in str(e).lower():
|
||||||
|
raise
|
||||||
|
|
||||||
|
total_drawers += drawers_added
|
||||||
|
print(f" ✓ [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}")
|
||||||
|
|
||||||
|
print(f"\n{'=' * 55}")
|
||||||
|
print(" Done.")
|
||||||
|
print(f" Files processed: {len(files) - files_skipped}")
|
||||||
|
print(f" Files skipped (already filed): {files_skipped}")
|
||||||
|
print(f" Drawers filed: {total_drawers}")
|
||||||
|
if room_counts:
|
||||||
|
print("\n By room:")
|
||||||
|
for room, count in sorted(room_counts.items(), key=lambda x: x[1], reverse=True):
|
||||||
|
print(f" {room:20} {count} {'memories' if extract_mode == 'general' else 'files'}")
|
||||||
|
print('\n Next: mempalace search "what you\'re looking for"')
|
||||||
|
print(f"{'=' * 55}\n")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
if len(sys.argv) < 2:
|
||||||
|
print("Usage: python convo_miner.py <convo_dir>")
|
||||||
|
sys.exit(1)
|
||||||
|
from .config import MempalaceConfig
|
||||||
|
|
||||||
|
mine_convos(sys.argv[1], palace_path=MempalaceConfig().palace_path)
|
||||||
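The exchange-pair chunking and keyword-based room detection in `convo_miner.py` can be tried in isolation with the sketch below. It is a simplified restatement under assumptions: the function names `chunk_by_exchange` and `detect_room` are hypothetical, and the keyword table is a shortened stand-in for `TOPIC_KEYWORDS`.

```python
MIN_CHUNK_SIZE = 30

# Shortened stand-in for TOPIC_KEYWORDS (assumption, for illustration only).
SKETCH_KEYWORDS = {
    "technical": ["code", "python", "bug", "error", "git"],
    "planning": ["plan", "roadmap", "deadline"],
}


def chunk_by_exchange(text, max_reply_lines=8):
    """Pair each '>' line with the non-empty lines that follow it."""
    lines = text.split("\n")
    chunks = []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if not line.startswith(">"):
            i += 1
            continue
        user_turn = line
        i += 1
        reply = []
        # Collect the reply until the next user turn or a '---' separator.
        while i < len(lines):
            nxt = lines[i].strip()
            if nxt.startswith(">") or nxt.startswith("---"):
                break
            if nxt:
                reply.append(nxt)
            i += 1
        body = " ".join(reply[:max_reply_lines])
        content = f"{user_turn}\n{body}" if body else user_turn
        # Very short exchanges are dropped, as in the diff.
        if len(content.strip()) > MIN_CHUNK_SIZE:
            chunks.append({"content": content, "chunk_index": len(chunks)})
    return chunks


def detect_room(text):
    """Return the room whose keywords match most often, else 'general'."""
    lowered = text[:3000].lower()
    scores = {
        room: sum(1 for kw in keywords if kw in lowered)
        for room, keywords in SKETCH_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


TRANSCRIPT = (
    "> How do I fix the python bug in the parser?\n"
    "The error comes from an off-by-one in the loop.\n"
    "\n"
    "Change the range end and rerun the tests.\n"
    "---\n"
    "> ok\n"
    "Sure.\n"
)
```

Because the scoring uses substring containment (`kw in lowered`), a keyword such as `git` also matches inside `digital`; word-boundary matching would avoid that, at the cost of missing inflected forms.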
File diff suppressed because it is too large
@@ -0,0 +1,853 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
entity_detector.py — Auto-detect people and projects from file content.
|
||||||
|
|
||||||
|
Two-pass approach:
|
||||||
|
Pass 1: scan files, extract entity candidates with signal counts
|
||||||
|
Pass 2: score and classify each candidate as person, project, or uncertain
|
||||||
|
|
||||||
|
Used by mempalace init before mining begins.
|
||||||
|
The confirmed entity map feeds the miner as the taxonomy.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from entity_detector import detect_entities, confirm_entities
|
||||||
|
candidates = detect_entities(file_paths)
|
||||||
|
confirmed = confirm_entities(candidates) # interactive review
|
||||||
|
"""
|
||||||
|
|
||||||
|
import re
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
from collections import defaultdict
|
||||||
|
|
||||||
|
|
||||||
|
# ==================== SIGNAL PATTERNS ====================
|
||||||
|
|
||||||
|
# Person signals — things people do
|
||||||
|
PERSON_VERB_PATTERNS = [
|
||||||
|
r"\b{name}\s+said\b",
|
||||||
|
r"\b{name}\s+asked\b",
|
||||||
|
r"\b{name}\s+told\b",
|
||||||
|
r"\b{name}\s+replied\b",
|
||||||
|
r"\b{name}\s+laughed\b",
|
||||||
|
r"\b{name}\s+smiled\b",
|
||||||
|
r"\b{name}\s+cried\b",
|
||||||
|
r"\b{name}\s+felt\b",
|
||||||
|
r"\b{name}\s+thinks?\b",
|
||||||
|
r"\b{name}\s+wants?\b",
|
||||||
|
r"\b{name}\s+loves?\b",
|
||||||
|
r"\b{name}\s+hates?\b",
|
||||||
|
r"\b{name}\s+knows?\b",
|
||||||
|
r"\b{name}\s+decided\b",
|
||||||
|
r"\b{name}\s+pushed\b",
|
||||||
|
r"\b{name}\s+wrote\b",
|
||||||
|
r"\bhey\s+{name}\b",
|
||||||
|
r"\bthanks?\s+{name}\b",
|
||||||
|
r"\bhi\s+{name}\b",
|
||||||
|
r"\bdear\s+{name}\b",
|
||||||
|
]
|
||||||
|
|
||||||
|
# Person signals — pronouns resolving nearby
|
||||||
|
PRONOUN_PATTERNS = [
|
||||||
|
r"\bshe\b",
|
||||||
|
r"\bher\b",
|
||||||
|
r"\bhers\b",
|
||||||
|
r"\bhe\b",
|
||||||
|
r"\bhim\b",
|
||||||
|
r"\bhis\b",
|
||||||
|
r"\bthey\b",
|
||||||
|
r"\bthem\b",
|
||||||
|
r"\btheir\b",
|
||||||
|
]
|
||||||
|
|
||||||
|
# Person signals — dialogue markers
|
||||||
|
DIALOGUE_PATTERNS = [
|
||||||
|
r"^>\s*{name}[:\s]", # > Speaker: ...
|
||||||
|
r"^{name}:\s", # Speaker: ...
|
||||||
|
r"^\[{name}\]", # [Speaker]
|
||||||
|
r'"{name}\s+said',
|
||||||
|
]
|
||||||
|
|
||||||
|
# Project signals — things projects have/do
|
||||||
|
PROJECT_VERB_PATTERNS = [
|
||||||
|
r"\bbuilding\s+{name}\b",
|
||||||
|
r"\bbuilt\s+{name}\b",
|
||||||
|
r"\bship(?:ping|ped)?\s+{name}\b",
|
||||||
|
r"\blaunch(?:ing|ed)?\s+{name}\b",
|
||||||
|
r"\bdeploy(?:ing|ed)?\s+{name}\b",
|
||||||
|
r"\binstall(?:ing|ed)?\s+{name}\b",
|
||||||
|
r"\bthe\s+{name}\s+architecture\b",
|
||||||
|
r"\bthe\s+{name}\s+pipeline\b",
|
||||||
|
r"\bthe\s+{name}\s+system\b",
|
||||||
|
r"\bthe\s+{name}\s+repo\b",
|
||||||
|
r"\b{name}\s+v\d+\b", # MemPal v2
|
||||||
|
r"\b{name}\.py\b", # mempalace.py
|
||||||
|
r"\b{name}-core\b", # mempal-core (hyphen only, not underscore)
|
||||||
|
r"\b{name}-local\b",
|
||||||
|
r"\bimport\s+{name}\b",
|
||||||
|
r"\bpip\s+install\s+{name}\b",
|
||||||
|
]
|
||||||
|
|
||||||
|
# Words that are almost certainly NOT entities
|
||||||
|
STOPWORDS = {
|
||||||
|
"the",
|
||||||
|
"a",
|
||||||
|
"an",
|
||||||
|
"and",
|
||||||
|
"or",
|
||||||
|
"but",
|
||||||
|
"in",
|
||||||
|
"on",
|
||||||
|
"at",
|
||||||
|
"to",
|
||||||
|
"for",
|
||||||
|
"of",
|
||||||
|
"with",
|
||||||
|
"by",
|
||||||
|
"from",
|
||||||
|
"as",
|
||||||
|
"is",
|
||||||
|
"was",
|
||||||
|
"are",
|
||||||
|
"were",
|
||||||
|
"be",
|
||||||
|
"been",
|
||||||
|
"being",
|
||||||
|
"have",
|
||||||
|
"has",
|
||||||
|
"had",
|
||||||
|
"do",
|
||||||
|
"does",
|
||||||
|
"did",
|
||||||
|
"will",
|
||||||
|
"would",
|
||||||
|
"could",
|
||||||
|
"should",
|
||||||
|
"may",
|
||||||
|
"might",
|
||||||
|
"must",
|
||||||
|
"shall",
|
||||||
|
"can",
|
||||||
|
"this",
|
||||||
|
"that",
|
||||||
|
"these",
|
||||||
|
"those",
|
||||||
|
"it",
|
||||||
|
"its",
|
||||||
|
"they",
|
||||||
|
"them",
|
||||||
|
"their",
|
||||||
|
"we",
|
||||||
|
"our",
|
||||||
|
"you",
|
||||||
|
"your",
|
||||||
|
"i",
|
||||||
|
"my",
|
||||||
|
"me",
|
||||||
|
"he",
|
||||||
|
"she",
|
||||||
|
"his",
|
||||||
|
"her",
|
||||||
|
"who",
|
||||||
|
"what",
|
||||||
|
"when",
|
||||||
|
"where",
|
||||||
|
"why",
|
||||||
|
"how",
|
||||||
|
"which",
|
||||||
|
"if",
|
||||||
|
"then",
|
||||||
|
"so",
|
||||||
|
"not",
|
||||||
|
"no",
|
||||||
|
"yes",
|
||||||
|
"ok",
|
||||||
|
"okay",
|
||||||
|
"just",
|
||||||
|
"very",
|
||||||
|
"really",
|
||||||
|
"also",
|
||||||
|
"already",
|
||||||
|
"still",
|
||||||
|
"even",
|
||||||
|
"only",
|
||||||
|
"here",
|
||||||
|
"there",
|
||||||
|
"now",
|
||||||
|
"too",
|
||||||
|
"up",
|
||||||
|
"out",
|
||||||
|
"about",
|
||||||
|
"like",
|
||||||
|
"use",
|
||||||
|
"get",
|
||||||
|
"got",
|
||||||
|
"make",
|
||||||
|
"made",
|
||||||
|
"take",
|
||||||
|
"put",
|
||||||
|
"come",
|
||||||
|
"go",
|
||||||
|
"see",
|
||||||
|
"know",
|
||||||
|
"think",
|
||||||
|
"true",
|
||||||
|
"false",
|
||||||
|
"none",
|
||||||
|
"null",
|
||||||
|
"new",
|
||||||
|
"old",
|
||||||
|
"all",
|
||||||
|
"any",
|
||||||
|
"some",
|
||||||
|
"return",
|
||||||
|
"print",
|
||||||
|
"def",
|
||||||
|
"class",
|
||||||
|
"import",
|
||||||
|
# Common capitalized words in prose that aren't entities
|
||||||
|
"step",
|
||||||
|
"usage",
|
||||||
|
"run",
|
||||||
|
"check",
|
||||||
|
"find",
|
||||||
|
"add",
|
||||||
|
"set",
|
||||||
|
"list",
|
||||||
|
"args",
|
||||||
|
"dict",
|
||||||
|
"str",
|
||||||
|
"int",
|
||||||
|
"bool",
|
||||||
|
"path",
|
||||||
|
"file",
|
||||||
|
"type",
|
||||||
|
"name",
|
||||||
|
"note",
|
||||||
|
"example",
|
||||||
|
"option",
|
||||||
|
"result",
|
||||||
|
"error",
|
||||||
|
"warning",
|
||||||
|
"info",
|
||||||
|
"every",
|
||||||
|
"each",
|
||||||
|
"more",
|
||||||
|
"less",
|
||||||
|
"next",
|
||||||
|
"last",
|
||||||
|
"first",
|
||||||
|
"second",
|
||||||
|
"stack",
|
||||||
|
"layer",
|
||||||
|
"mode",
|
||||||
|
"test",
|
||||||
|
"stop",
|
||||||
|
"start",
|
||||||
|
"copy",
|
||||||
|
"move",
|
||||||
|
"source",
|
||||||
|
"target",
|
||||||
|
"output",
|
||||||
|
"input",
|
||||||
|
"data",
|
||||||
|
"item",
|
||||||
|
"key",
|
||||||
|
"value",
|
||||||
|
"returns",
|
||||||
|
"raises",
|
||||||
|
"yields",
|
||||||
|
"self",
|
||||||
|
"cls",
|
||||||
|
"kwargs",
|
||||||
|
# Common sentence-starting / abstract words that aren't entities
|
||||||
|
"world",
|
||||||
|
"well",
|
||||||
|
"want",
|
||||||
|
"topic",
|
||||||
|
"choose",
|
||||||
|
"social",
|
||||||
|
"cars",
|
||||||
|
"phones",
|
||||||
|
"healthcare",
|
||||||
|
"ex",
|
||||||
|
"machina",
|
||||||
|
"deus",
|
||||||
|
"human",
|
||||||
|
"humans",
|
||||||
|
"people",
|
||||||
|
"things",
|
||||||
|
"something",
|
||||||
|
"nothing",
|
||||||
|
"everything",
|
||||||
|
"anything",
|
||||||
|
"someone",
|
||||||
|
"everyone",
|
||||||
|
"anyone",
|
||||||
|
"way",
|
||||||
|
"time",
|
||||||
|
"day",
|
||||||
|
"life",
|
||||||
|
"place",
|
||||||
|
"thing",
|
||||||
|
"part",
|
||||||
|
"kind",
|
||||||
|
"sort",
|
||||||
|
"case",
|
||||||
|
"point",
|
||||||
|
"idea",
|
||||||
|
"fact",
|
||||||
|
"sense",
|
||||||
|
"question",
|
||||||
|
"answer",
|
||||||
|
"reason",
|
||||||
|
"number",
|
||||||
|
"version",
|
||||||
|
"system",
|
||||||
|
# Greetings and filler words at sentence starts
|
||||||
|
"hey",
|
||||||
|
"hi",
|
||||||
|
"hello",
|
||||||
|
"thanks",
|
||||||
|
"thank",
|
||||||
|
"right",
|
||||||
|
"let",
|
||||||
|
"ok",
|
||||||
|
# UI/action words that appear in how-to content
|
||||||
|
"click",
|
||||||
|
"hit",
|
||||||
|
"press",
|
||||||
|
"tap",
|
||||||
|
"drag",
|
||||||
|
"drop",
|
||||||
|
"open",
|
||||||
|
"close",
|
||||||
|
"save",
|
||||||
|
"load",
|
||||||
|
"launch",
|
||||||
|
"install",
|
||||||
|
"download",
|
||||||
|
"upload",
|
||||||
|
"scroll",
|
||||||
|
"select",
|
||||||
|
"enter",
|
||||||
|
"submit",
|
||||||
|
"cancel",
|
||||||
|
"confirm",
|
||||||
|
"delete",
|
||||||
|
"copy",
|
||||||
|
"paste",
|
||||||
|
"type",
|
||||||
|
"write",
|
||||||
|
"read",
|
||||||
|
"search",
|
||||||
|
"find",
|
||||||
|
"show",
|
||||||
|
"hide",
|
||||||
|
# Common filesystem/technical capitalized words
|
||||||
|
"desktop",
|
||||||
|
"documents",
|
||||||
|
"downloads",
|
||||||
|
"users",
|
||||||
|
"home",
|
||||||
|
"library",
|
||||||
|
"applications",
|
||||||
|
"system",
|
||||||
|
"preferences",
|
||||||
|
"settings",
|
||||||
|
"terminal",
|
||||||
|
# Abstract/topic words
|
||||||
|
"actor",
|
||||||
|
"vector",
|
||||||
|
"remote",
|
||||||
|
"control",
|
||||||
|
"duration",
|
||||||
|
"fetch",
|
||||||
|
# Abstract concepts that appear as subjects but aren't entities
|
||||||
|
"agents",
|
||||||
|
"tools",
|
||||||
|
"others",
|
||||||
|
"guards",
|
||||||
|
"ethics",
|
||||||
|
"regulation",
|
||||||
|
"learning",
|
||||||
|
"thinking",
|
||||||
|
"memory",
|
||||||
|
"language",
|
||||||
|
"intelligence",
|
||||||
|
"technology",
|
||||||
|
"society",
|
||||||
|
"culture",
|
||||||
|
"future",
|
||||||
|
"history",
|
||||||
|
"science",
|
||||||
|
"model",
|
||||||
|
"models",
|
||||||
|
"network",
|
||||||
|
"networks",
|
||||||
|
"training",
|
||||||
|
"inference",
|
||||||
|
}
|
||||||
|
|
||||||
|
# For entity detection — prose only, no code files
|
||||||
|
# Code files have too many capitalized names (classes, functions) that aren't entities
|
||||||
|
PROSE_EXTENSIONS = {
|
||||||
|
".txt",
|
||||||
|
".md",
|
||||||
|
".rst",
|
||||||
|
".csv",
|
||||||
|
}
|
||||||
|
|
||||||
|
READABLE_EXTENSIONS = {
|
||||||
|
".txt",
|
||||||
|
".md",
|
||||||
|
".py",
|
||||||
|
".js",
|
||||||
|
".ts",
|
||||||
|
".json",
|
||||||
|
".yaml",
|
||||||
|
".yml",
|
||||||
|
".csv",
|
||||||
|
".rst",
|
||||||
|
".toml",
|
||||||
|
".sh",
|
||||||
|
".rb",
|
||||||
|
".go",
|
||||||
|
".rs",
|
||||||
|
}
|
||||||
|
|
||||||
|
SKIP_DIRS = {
|
||||||
|
".git",
|
||||||
|
"node_modules",
|
||||||
|
"__pycache__",
|
||||||
|
".venv",
|
||||||
|
"venv",
|
||||||
|
"env",
|
||||||
|
"dist",
|
||||||
|
"build",
|
||||||
|
".next",
|
||||||
|
"coverage",
|
||||||
|
".mempalace",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ==================== CANDIDATE EXTRACTION ====================
|
||||||
|
|
||||||
|
|
||||||
|
def extract_candidates(text: str) -> dict:
|
||||||
|
"""
|
||||||
|
Extract all capitalized proper noun candidates from text.
|
||||||
|
Returns {name: frequency} for names appearing 3+ times.
|
||||||
|
"""
|
||||||
|
# Find all capitalized words. Filtering out sentence-initial words is harder, so frequency is used as the filter instead.
|
||||||
|
raw = re.findall(r"\b([A-Z][a-z]{1,19})\b", text)
|
||||||
|
|
||||||
|
counts = defaultdict(int)
|
||||||
|
for word in raw:
|
||||||
|
if word.lower() not in STOPWORDS and len(word) > 1:
|
||||||
|
counts[word] += 1
|
||||||
|
|
||||||
|
# Also find multi-word proper nouns (e.g. "Memory Palace", "Claude Code")
|
||||||
|
multi = re.findall(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b", text)
|
||||||
|
for phrase in multi:
|
||||||
|
if not any(w.lower() in STOPWORDS for w in phrase.split()):
|
||||||
|
counts[phrase] += 1
|
||||||
|
|
||||||
|
# Filter: must appear at least 3 times to be a candidate
|
||||||
|
return {name: count for name, count in counts.items() if count >= 3}
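A minimal sketch of how the two regexes above behave on a small sample. `DEMO_STOPWORDS` and `demo_candidates` are hypothetical stand-ins (the real `STOPWORDS` set is far larger); the regexes and the 3-occurrence threshold are copied from `extract_candidates`.

```python
import re
from collections import defaultdict

DEMO_STOPWORDS = {"the", "hey", "step"}  # hypothetical stand-in for STOPWORDS


def demo_candidates(text: str, min_count: int = 3) -> dict:
    """Count capitalized words and multi-word phrases, keeping frequent ones."""
    counts = defaultdict(int)
    # Single capitalized words, 2 to 20 letters long
    for word in re.findall(r"\b([A-Z][a-z]{1,19})\b", text):
        if word.lower() not in DEMO_STOPWORDS:
            counts[word] += 1
    # Runs of two or more capitalized words, e.g. "Claude Code"
    for phrase in re.findall(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b", text):
        if not any(w.lower() in DEMO_STOPWORDS for w in phrase.split()):
            counts[phrase] += 1
    return {name: n for name, n in counts.items() if n >= min_count}


sample = (
    "Riley opened Claude Code. The build failed. "
    "Riley asked why. Claude Code crashed again. "
    "Hey Riley, restart Claude Code."
)
result = demo_candidates(sample)
print(result)
# {'Riley': 3, 'Claude': 3, 'Code': 3, 'Claude Code': 3}
```

Note that the words of a multi-word name are also counted individually, and that "Hey Riley" is matched as a phrase but dropped because "hey" is a stopword.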
|
||||||
|
|
||||||
|
|
||||||
|
# ==================== SIGNAL SCORING ====================
|
||||||
|
|
||||||
|
|
||||||
|
def _build_patterns(name: str) -> dict:
|
||||||
|
"""Pre-compile all regex patterns for a single entity name."""
|
||||||
|
n = re.escape(name)
|
||||||
|
return {
|
||||||
|
"dialogue": [
|
||||||
|
re.compile(p.format(name=n), re.MULTILINE | re.IGNORECASE) for p in DIALOGUE_PATTERNS
|
||||||
|
],
|
||||||
|
"person_verbs": [re.compile(p.format(name=n), re.IGNORECASE) for p in PERSON_VERB_PATTERNS],
|
||||||
|
"project_verbs": [
|
||||||
|
re.compile(p.format(name=n), re.IGNORECASE) for p in PROJECT_VERB_PATTERNS
|
||||||
|
],
|
||||||
|
"direct": re.compile(rf"\bhey\s+{n}\b|\bthanks?\s+{n}\b|\bhi\s+{n}\b", re.IGNORECASE),
|
||||||
|
"versioned": re.compile(rf"\b{n}[-v]\w+", re.IGNORECASE),
|
||||||
|
"code_ref": re.compile(rf"\b{n}\.(py|js|ts|yaml|yml|json|sh)\b", re.IGNORECASE),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def score_entity(name: str, text: str, lines: list) -> dict:
|
||||||
|
"""
|
||||||
|
Score a candidate entity as person vs project.
|
||||||
|
Returns scores and the signals that fired.
|
||||||
|
"""
|
||||||
|
patterns = _build_patterns(name)
|
||||||
|
person_score = 0
|
||||||
|
project_score = 0
|
||||||
|
person_signals = []
|
||||||
|
project_signals = []
|
||||||
|
|
||||||
|
# --- Person signals ---
|
||||||
|
|
||||||
|
# Dialogue markers (strong signal)
|
||||||
|
for rx in patterns["dialogue"]:
|
||||||
|
matches = len(rx.findall(text))
|
||||||
|
if matches > 0:
|
||||||
|
person_score += matches * 3
|
||||||
|
person_signals.append(f"dialogue marker ({matches}x)")
|
||||||
|
|
||||||
|
# Person verbs
|
||||||
|
for rx in patterns["person_verbs"]:
|
||||||
|
matches = len(rx.findall(text))
|
||||||
|
if matches > 0:
|
||||||
|
person_score += matches * 2
|
||||||
|
person_signals.append(f"'{name} ...' action ({matches}x)")
|
||||||
|
|
||||||
|
# Pronoun proximity: pronouns within 2 lines of the name (a 5-line window)
|
||||||
|
name_lower = name.lower()
|
||||||
|
name_line_indices = [i for i, line in enumerate(lines) if name_lower in line.lower()]
|
||||||
|
pronoun_hits = 0
|
||||||
|
for idx in name_line_indices:
|
||||||
|
window_text = " ".join(lines[max(0, idx - 2) : idx + 3]).lower()
|
||||||
|
for pronoun_pattern in PRONOUN_PATTERNS:
|
||||||
|
if re.search(pronoun_pattern, window_text):
|
||||||
|
pronoun_hits += 1
|
||||||
|
break
|
||||||
|
if pronoun_hits > 0:
|
||||||
|
person_score += pronoun_hits * 2
|
||||||
|
person_signals.append(f"pronoun nearby ({pronoun_hits}x)")
|
||||||
|
|
||||||
|
# Direct address
|
||||||
|
direct = len(patterns["direct"].findall(text))
|
||||||
|
if direct > 0:
|
||||||
|
person_score += direct * 4
|
||||||
|
person_signals.append(f"addressed directly ({direct}x)")
|
||||||
|
|
||||||
|
# --- Project signals ---
|
||||||
|
|
||||||
|
for rx in patterns["project_verbs"]:
|
||||||
|
matches = len(rx.findall(text))
|
||||||
|
if matches > 0:
|
||||||
|
project_score += matches * 2
|
||||||
|
project_signals.append(f"project verb ({matches}x)")
|
||||||
|
|
||||||
|
versioned = len(patterns["versioned"].findall(text))
|
||||||
|
if versioned > 0:
|
||||||
|
project_score += versioned * 3
|
||||||
|
project_signals.append(f"versioned/hyphenated ({versioned}x)")
|
||||||
|
|
||||||
|
code_ref = len(patterns["code_ref"].findall(text))
|
||||||
|
if code_ref > 0:
|
||||||
|
project_score += code_ref * 3
|
||||||
|
project_signals.append(f"code file reference ({code_ref}x)")
|
||||||
|
|
||||||
|
return {
|
||||||
|
"person_score": person_score,
|
||||||
|
"project_score": project_score,
|
||||||
|
"person_signals": person_signals[:3],
|
||||||
|
"project_signals": project_signals[:3],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ==================== CLASSIFY ====================
|
||||||
|
|
||||||
|
|
||||||
|
def classify_entity(name: str, frequency: int, scores: dict) -> dict:
|
||||||
|
"""
|
||||||
|
Given scores, classify as person / project / uncertain.
|
||||||
|
Returns entity dict with confidence.
|
||||||
|
"""
|
||||||
|
ps = scores["person_score"]
|
||||||
|
prs = scores["project_score"]
|
||||||
|
total = ps + prs
|
||||||
|
|
||||||
|
if total == 0:
|
||||||
|
# No strong signals — frequency-only candidate, uncertain
|
||||||
|
confidence = min(0.4, frequency / 50)
|
||||||
|
return {
|
||||||
|
"name": name,
|
||||||
|
"type": "uncertain",
|
||||||
|
"confidence": round(confidence, 2),
|
||||||
|
"frequency": frequency,
|
||||||
|
"signals": [f"appears {frequency}x, no strong type signals"],
|
||||||
|
}
|
||||||
|
|
||||||
|
person_ratio = ps / total if total > 0 else 0
|
||||||
|
|
||||||
|
# Require TWO different signal categories to confidently classify as a person.
|
||||||
|
# One signal type with many hits (e.g. "Click, click, click...") is not enough —
|
||||||
|
# it just means that word appears often in a particular syntactic position.
|
||||||
|
signal_categories = set()
|
||||||
|
for s in scores["person_signals"]:
|
||||||
|
if "dialogue" in s:
|
||||||
|
signal_categories.add("dialogue")
|
||||||
|
elif "action" in s:
|
||||||
|
signal_categories.add("action")
|
||||||
|
elif "pronoun" in s:
|
||||||
|
signal_categories.add("pronoun")
|
||||||
|
elif "addressed" in s:
|
||||||
|
signal_categories.add("addressed")
|
||||||
|
|
||||||
|
has_two_signal_types = len(signal_categories) >= 2
|
||||||
|
_ = signal_categories - {"pronoun"} # reserved for future thresholds
|
||||||
|
|
||||||
|
if person_ratio >= 0.7 and has_two_signal_types and ps >= 5:
|
||||||
|
entity_type = "person"
|
||||||
|
confidence = min(0.99, 0.5 + person_ratio * 0.5)
|
||||||
|
signals = scores["person_signals"] or [f"appears {frequency}x"]
|
||||||
|
elif person_ratio >= 0.7 and (not has_two_signal_types or ps < 5):
|
||||||
|
# Person-leaning, but only one signal category or a low score: downgrade to uncertain
|
||||||
|
entity_type = "uncertain"
|
||||||
|
confidence = 0.4
|
||||||
|
signals = scores["person_signals"] + [f"appears {frequency}x, weak person evidence"]
|
||||||
|
elif person_ratio <= 0.3:
|
||||||
|
entity_type = "project"
|
||||||
|
confidence = min(0.99, 0.5 + (1 - person_ratio) * 0.5)
|
||||||
|
signals = scores["project_signals"] or [f"appears {frequency}x"]
|
||||||
|
else:
|
||||||
|
entity_type = "uncertain"
|
||||||
|
confidence = 0.5
|
||||||
|
signals = (scores["person_signals"] + scores["project_signals"])[:3]
|
||||||
|
signals.append("mixed signals — needs review")
|
||||||
|
|
||||||
|
return {
|
||||||
|
"name": name,
|
||||||
|
"type": entity_type,
|
||||||
|
"confidence": round(confidence, 2),
|
||||||
|
"frequency": frequency,
|
||||||
|
"signals": signals,
|
||||||
|
}
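The thresholds in `classify_entity` can be traced with a condensed sketch. `demo_classify` is a hypothetical helper that takes the signal categories directly instead of parsing them from signal strings; the ratios and confidence formulas are the ones used above.

```python
def demo_classify(person_score: int, project_score: int,
                  categories: set, frequency: int = 3) -> tuple:
    """Return (type, confidence) using the same thresholds as classify_entity."""
    total = person_score + project_score
    if total == 0:
        # Frequency-only candidate: confidence is capped at 0.4
        return ("uncertain", round(min(0.4, frequency / 50), 2))
    ratio = person_score / total
    if ratio >= 0.7 and len(categories) >= 2 and person_score >= 5:
        return ("person", round(min(0.99, 0.5 + ratio * 0.5), 2))
    if ratio >= 0.7:
        # Person-leaning, but too little evidence
        return ("uncertain", 0.4)
    if ratio <= 0.3:
        return ("project", round(min(0.99, 0.5 + (1 - ratio) * 0.5), 2))
    return ("uncertain", 0.5)


print(demo_classify(9, 1, {"dialogue", "pronoun"}))  # ('person', 0.95)
print(demo_classify(8, 0, {"pronoun"}))              # ('uncertain', 0.4)
print(demo_classify(0, 6, set()))                    # ('project', 0.99)
print(demo_classify(3, 3, {"action"}))               # ('uncertain', 0.5)
print(demo_classify(0, 0, set(), frequency=10))      # ('uncertain', 0.2)
```

The second case shows why a single signal category is not enough: eight pronoun hits alone still yield "uncertain".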
|
||||||
|
|
||||||
|
|
||||||
|
# ==================== MAIN DETECT ====================
|
||||||
|
|
||||||
|
|
||||||
|
def detect_entities(file_paths: list, max_files: int = 10) -> dict:
|
||||||
|
"""
|
||||||
|
Scan files and detect entity candidates.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
file_paths: List of Path objects to scan
|
||||||
|
max_files: Max files to read (for speed)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
{
|
||||||
|
"people": [...entity dicts...],
|
||||||
|
"projects": [...entity dicts...],
|
||||||
|
"uncertain":[...entity dicts...],
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
# Collect text from files
|
||||||
|
all_text = []
|
||||||
|
all_lines = []
|
||||||
|
files_read = 0
|
||||||
|
|
||||||
|
MAX_BYTES_PER_FILE = 5_000 # read limit per file; text-mode read() counts characters, so this is roughly the first 5 KB
|
||||||
|
|
||||||
|
for filepath in file_paths:
|
||||||
|
if files_read >= max_files:
|
||||||
|
break
|
||||||
|
try:
|
||||||
|
with open(filepath, encoding="utf-8", errors="replace") as f:
|
||||||
|
content = f.read(MAX_BYTES_PER_FILE)
|
||||||
|
all_text.append(content)
|
||||||
|
all_lines.extend(content.splitlines())
|
||||||
|
files_read += 1
|
||||||
|
except Exception:
|
||||||
|
continue
|
||||||
|
|
||||||
|
combined_text = "\n".join(all_text)
|
||||||
|
|
||||||
|
# Extract candidates
|
||||||
|
candidates = extract_candidates(combined_text)
|
||||||
|
|
||||||
|
if not candidates:
|
||||||
|
return {"people": [], "projects": [], "uncertain": []}
|
||||||
|
|
||||||
|
# Score and classify each candidate
|
||||||
|
people = []
|
||||||
|
projects = []
|
||||||
|
uncertain = []
|
||||||
|
|
||||||
|
for name, frequency in sorted(candidates.items(), key=lambda x: x[1], reverse=True):
|
||||||
|
scores = score_entity(name, combined_text, all_lines)
|
||||||
|
entity = classify_entity(name, frequency, scores)
|
||||||
|
|
||||||
|
if entity["type"] == "person":
|
||||||
|
people.append(entity)
|
||||||
|
elif entity["type"] == "project":
|
||||||
|
projects.append(entity)
|
||||||
|
else:
|
||||||
|
uncertain.append(entity)
|
||||||
|
|
||||||
|
# Sort people and projects by confidence, uncertain by frequency (all descending)
|
||||||
|
people.sort(key=lambda x: x["confidence"], reverse=True)
|
||||||
|
projects.sort(key=lambda x: x["confidence"], reverse=True)
|
||||||
|
uncertain.sort(key=lambda x: x["frequency"], reverse=True)
|
||||||
|
|
||||||
|
# Cap results to most relevant
|
||||||
|
return {
|
||||||
|
"people": people[:15],
|
||||||
|
"projects": projects[:10],
|
||||||
|
"uncertain": uncertain[:8],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ==================== INTERACTIVE CONFIRM ====================
|
||||||
|
|
||||||
|
|
||||||
|
def _print_entity_list(entities: list, label: str):
|
||||||
|
print(f"\n {label}:")
|
||||||
|
if not entities:
|
||||||
|
print(" (none detected)")
|
||||||
|
return
|
||||||
|
for i, e in enumerate(entities):
|
||||||
|
confidence_bar = "●" * int(e["confidence"] * 5) + "○" * (5 - int(e["confidence"] * 5))
|
||||||
|
signals_str = ", ".join(e["signals"][:2]) if e["signals"] else ""
|
||||||
|
print(f" {i + 1:2}. {e['name']:20} [{confidence_bar}] {signals_str}")
|
||||||
|
|
||||||
|
|
||||||
|
def confirm_entities(detected: dict, yes: bool = False) -> dict:
|
||||||
|
"""
|
||||||
|
Interactive confirmation step.
|
||||||
|
User reviews detected entities, removes wrong ones, adds missing ones.
|
||||||
|
Returns confirmed {people: [names], projects: [names]}
|
||||||
|
|
||||||
|
Pass yes=True to auto-accept all detected entities without prompting.
|
||||||
|
"""
|
||||||
|
print(f"\n{'=' * 58}")
|
||||||
|
print(" MemPalace — Entity Detection")
|
||||||
|
print(f"{'=' * 58}")
|
||||||
|
print("\n Scanned your files. Here's what we found:\n")
|
||||||
|
|
||||||
|
_print_entity_list(detected["people"], "PEOPLE")
|
||||||
|
_print_entity_list(detected["projects"], "PROJECTS")
|
||||||
|
|
||||||
|
if detected["uncertain"]:
|
||||||
|
_print_entity_list(detected["uncertain"], "UNCERTAIN (need your call)")
|
||||||
|
|
||||||
|
confirmed_people = [e["name"] for e in detected["people"]]
|
||||||
|
confirmed_projects = [e["name"] for e in detected["projects"]]
|
||||||
|
|
||||||
|
if yes:
|
||||||
|
# Auto-accept: include all detected (skip uncertain — ambiguous without user input)
|
||||||
|
print(
|
||||||
|
f"\n Auto-accepting {len(confirmed_people)} people, {len(confirmed_projects)} projects."
|
||||||
|
)
|
||||||
|
return {"people": confirmed_people, "projects": confirmed_projects}
|
||||||
|
|
||||||
|
print(f"\n{'─' * 58}")
|
||||||
|
print(" Options:")
|
||||||
|
print(" [enter] Accept all")
|
||||||
|
print(" [edit] Remove wrong entries or reclassify uncertain")
|
||||||
|
print(" [add] Add missing people or projects")
|
||||||
|
print()
|
||||||
|
|
||||||
|
choice = input(" Your choice [enter/edit/add]: ").strip().lower()
|
||||||
|
|
||||||
|
confirmed_people = [e["name"] for e in detected["people"]]
|
||||||
|
confirmed_projects = [e["name"] for e in detected["projects"]]
|
||||||
|
|
||||||
|
if choice == "edit":
|
||||||
|
# Handle uncertain first
|
||||||
|
if detected["uncertain"]:
|
||||||
|
print("\n Uncertain entities — classify each:")
|
||||||
|
for e in detected["uncertain"]:
|
||||||
|
ans = input(f" {e['name']} — (p)erson, p(r)oject, or (s)kip? ").strip().lower()
|
||||||
|
if ans == "p":
|
||||||
|
confirmed_people.append(e["name"])
|
||||||
|
elif ans == "r":
|
||||||
|
confirmed_projects.append(e["name"])
|
||||||
|
|
||||||
|
# Remove wrong people
|
||||||
|
print(f"\n Current people: {', '.join(f'{i + 1}. {p}' for i, p in enumerate(confirmed_people)) or '(none)'}")
|
||||||
|
remove = input(
|
||||||
|
" Numbers to REMOVE from people (comma-separated, or enter to skip): "
|
||||||
|
).strip()
|
||||||
|
if remove:
|
||||||
|
to_remove = {int(x.strip()) - 1 for x in remove.split(",") if x.strip().isdigit()}
|
||||||
|
confirmed_people = [p for i, p in enumerate(confirmed_people) if i not in to_remove]
|
||||||
|
|
||||||
|
# Remove wrong projects
|
||||||
|
print(f"\n Current projects: {', '.join(f'{i + 1}. {p}' for i, p in enumerate(confirmed_projects)) or '(none)'}")
|
||||||
|
remove = input(
|
||||||
|
" Numbers to REMOVE from projects (comma-separated, or enter to skip): "
|
||||||
|
).strip()
|
||||||
|
if remove:
|
||||||
|
to_remove = {int(x.strip()) - 1 for x in remove.split(",") if x.strip().isdigit()}
|
||||||
|
confirmed_projects = [p for i, p in enumerate(confirmed_projects) if i not in to_remove]
|
||||||
|
|
||||||
|
if choice == "add" or input("\n Add any missing? [y/N]: ").strip().lower() == "y":
|
||||||
|
while True:
|
||||||
|
name = input(" Name (or enter to stop): ").strip()
|
||||||
|
if not name:
|
||||||
|
break
|
||||||
|
kind = input(f" Is '{name}' a (p)erson or p(r)oject? ").strip().lower()
|
||||||
|
if kind == "p":
|
||||||
|
confirmed_people.append(name)
|
||||||
|
elif kind == "r":
|
||||||
|
confirmed_projects.append(name)
|
||||||
|
|
||||||
|
print(f"\n{'=' * 58}")
|
||||||
|
print(" Confirmed:")
|
||||||
|
print(f" People: {', '.join(confirmed_people) or '(none)'}")
|
||||||
|
print(f" Projects: {', '.join(confirmed_projects) or '(none)'}")
|
||||||
|
print(f"{'=' * 58}\n")
|
||||||
|
|
||||||
|
return {
|
||||||
|
"people": confirmed_people,
|
||||||
|
"projects": confirmed_projects,
|
||||||
|
}
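The "numbers to remove" prompts in `confirm_entities` rely on a small parsing idiom. The sketch below isolates it; `demo_remove_by_number` is a hypothetical helper name, while the set comprehension mirrors the one used above.

```python
def demo_remove_by_number(items: list, raw: str) -> list:
    """Drop entries by 1-based position; non-numeric tokens are ignored."""
    to_remove = {int(x.strip()) - 1 for x in raw.split(",") if x.strip().isdigit()}
    return [item for i, item in enumerate(items) if i not in to_remove]


people = ["Riley", "Max", "Sam"]
print(demo_remove_by_number(people, "1, 3, x"))  # ['Max']
print(demo_remove_by_number(people, "7"))        # ['Riley', 'Max', 'Sam']
```

Out-of-range numbers are silently ignored because the filter only tests membership and never indexes into the list.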
|
||||||
|
|
||||||
|
|
||||||
|
# ==================== SCAN HELPER ====================
|
||||||
|
|
||||||
|
|
||||||
|
def scan_for_detection(project_dir: str, max_files: int = 10) -> list:
|
||||||
|
"""
|
||||||
|
Collect prose file paths for entity detection.
|
||||||
|
Prose only (.txt, .md, .rst, .csv) — code files produce too many false positives.
|
||||||
|
If fewer than 3 prose files are found, other readable files are appended.
|
||||||
|
"""
|
||||||
|
project_path = Path(project_dir).expanduser().resolve()
|
||||||
|
prose_files = []
|
||||||
|
all_files = []
|
||||||
|
|
||||||
|
for root, dirs, filenames in os.walk(project_path):
|
||||||
|
dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
|
||||||
|
for filename in filenames:
|
||||||
|
filepath = Path(root) / filename
|
||||||
|
ext = filepath.suffix.lower()
|
||||||
|
if ext in PROSE_EXTENSIONS:
|
||||||
|
prose_files.append(filepath)
|
||||||
|
elif ext in READABLE_EXTENSIONS:
|
||||||
|
all_files.append(filepath)
|
||||||
|
|
||||||
|
# Prefer prose files — fall back to all readable if too few prose files
|
||||||
|
files = prose_files if len(prose_files) >= 3 else prose_files + all_files
|
||||||
|
return files[:max_files]
|
||||||
|
|
||||||
|
|
||||||
|
# ==================== CLI ====================
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
import sys
|
||||||
|
|
||||||
|
if len(sys.argv) < 2:
|
||||||
|
print("Usage: python entity_detector.py <directory>")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
project_dir = sys.argv[1]
|
||||||
|
print(f"Scanning: {project_dir}")
|
||||||
|
files = scan_for_detection(project_dir)
|
||||||
|
print(f"Reading {len(files)} files...")
|
||||||
|
detected = detect_entities(files)
|
||||||
|
confirmed = confirm_entities(detected)
|
||||||
|
print("Confirmed entities:", confirmed)
|
||||||
@@ -0,0 +1,643 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
entity_registry.py — Persistent personal entity registry for MemPalace.
|
||||||
|
|
||||||
|
Knows the difference between Riley (a person) and ever (an adverb).
|
||||||
|
Built from three sources, in priority order:
|
||||||
|
1. Onboarding — what the user explicitly told us
|
||||||
|
2. Learned — what we inferred from session history with high confidence
|
||||||
|
3. Researched — what we looked up via Wikipedia for unknown words
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from mempalace.entity_registry import EntityRegistry
|
||||||
|
registry = EntityRegistry.load()
|
||||||
|
result = registry.lookup("Riley", context="I went with Riley today")
|
||||||
|
# → {"type": "person", "confidence": 1.0, "source": "onboarding"}
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import re
|
||||||
|
import urllib.error
import urllib.request
|
||||||
|
import urllib.parse
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Common English words that could be confused with names
|
||||||
|
# These get flagged as AMBIGUOUS and require context disambiguation
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
COMMON_ENGLISH_WORDS = {
|
||||||
|
# Words that are also common personal names
|
||||||
|
"ever",
|
||||||
|
"grace",
|
||||||
|
"will",
|
||||||
|
"bill",
|
||||||
|
"mark",
|
||||||
|
"april",
|
||||||
|
"may",
|
||||||
|
"june",
|
||||||
|
"joy",
|
||||||
|
"hope",
|
||||||
|
"faith",
|
||||||
|
"chance",
|
||||||
|
"chase",
|
||||||
|
"hunter",
|
||||||
|
"dash",
|
||||||
|
"flash",
|
||||||
|
"star",
|
||||||
|
"sky",
|
||||||
|
"river",
|
||||||
|
"brook",
|
||||||
|
"lane",
|
||||||
|
"art",
|
||||||
|
"clay",
|
||||||
|
"gil",
|
||||||
|
"nat",
|
||||||
|
"max",
|
||||||
|
"rex",
|
||||||
|
"ray",
|
||||||
|
"jay",
|
||||||
|
"rose",
|
||||||
|
"violet",
|
||||||
|
"lily",
|
||||||
|
"ivy",
|
||||||
|
"ash",
|
||||||
|
"reed",
|
||||||
|
"sage",
|
||||||
|
# Words that look like names at start of sentence
|
||||||
|
"monday",
|
||||||
|
"tuesday",
|
||||||
|
"wednesday",
|
||||||
|
"thursday",
|
||||||
|
"friday",
|
||||||
|
"saturday",
|
||||||
|
"sunday",
|
||||||
|
"january",
|
||||||
|
"february",
|
||||||
|
"march",
|
||||||
|
"april",
|
||||||
|
"june",
|
||||||
|
"july",
|
||||||
|
"august",
|
||||||
|
"september",
|
||||||
|
"october",
|
||||||
|
"november",
|
||||||
|
"december",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Context patterns that indicate a word is being used as a PERSON name
|
||||||
|
PERSON_CONTEXT_PATTERNS = [
|
||||||
|
r"\b{name}\s+said\b",
|
||||||
|
r"\b{name}\s+told\b",
|
||||||
|
r"\b{name}\s+asked\b",
|
||||||
|
r"\b{name}\s+laughed\b",
|
||||||
|
r"\b{name}\s+smiled\b",
|
||||||
|
r"\b{name}\s+was\b",
|
||||||
|
r"\b{name}\s+is\b",
|
||||||
|
r"\b{name}\s+called\b",
|
||||||
|
r"\b{name}\s+texted\b",
|
||||||
|
r"\bwith\s+{name}\b",
|
||||||
|
r"\bsaw\s+{name}\b",
|
||||||
|
r"\bcalled\s+{name}\b",
|
||||||
|
r"\btook\s+{name}\b",
|
||||||
|
r"\bpicked\s+up\s+{name}\b",
|
||||||
|
r"\bdrop(?:ped)?\s+(?:off\s+)?{name}\b",
|
||||||
|
r"\b{name}(?:'s|s')\b", # Riley's, Max's
|
||||||
|
r"\bhey\s+{name}\b",
|
||||||
|
r"\bthanks?\s+{name}\b",
|
||||||
|
r"^{name}[:\s]", # dialogue: "Riley: ..."
|
||||||
|
r"\bmy\s+(?:son|daughter|kid|child|brother|sister|friend|partner|colleague|coworker)\s+{name}\b",
|
||||||
|
]
|
||||||
|
|
||||||
|
# Context patterns that indicate a word is NOT being used as a name
|
||||||
|
CONCEPT_CONTEXT_PATTERNS = [
|
||||||
|
r"\bhave\s+you\s+{name}\b", # "have you ever"
|
||||||
|
r"\bif\s+you\s+{name}\b", # "if you ever"
|
||||||
|
r"\b{name}\s+since\b", # "ever since"
|
||||||
|
r"\b{name}\s+again\b", # "ever again"
|
||||||
|
r"\bnot\s+{name}\b", # "not ever"
|
||||||
|
r"\b{name}\s+more\b", # "ever more"
|
||||||
|
r"\bwould\s+{name}\b", # "would ever"
|
||||||
|
r"\bcould\s+{name}\b", # "could ever"
|
||||||
|
r"\bwill\s+{name}\b", # "will ever"
|
||||||
|
r"(?:the\s+)?{name}\s+(?:of|in|at|for|to)\b", # "the grace of", "the mark of"
|
||||||
|
]
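How these two pattern lists are applied is not shown in this part of the file. The sketch below is one plausible reading, assuming each template is filled with the escaped word via `str.format` and the two hit counts are compared. `demo_disambiguate` and the shortened `PERSON_DEMO` / `CONCEPT_DEMO` lists are hypothetical.

```python
import re

# Shortened, hypothetical subsets of the two pattern lists
PERSON_DEMO = [r"\b{name}\s+said\b", r"\bwith\s+{name}\b", r"\bhey\s+{name}\b"]
CONCEPT_DEMO = [r"\bhave\s+you\s+{name}\b", r"\b{name}\s+since\b", r"\bnot\s+{name}\b"]


def demo_disambiguate(word: str, context: str) -> str:
    """Compare person-style and concept-style pattern hits for one word."""
    name = re.escape(word.lower())
    text = context.lower()
    person_hits = sum(bool(re.search(p.format(name=name), text)) for p in PERSON_DEMO)
    concept_hits = sum(bool(re.search(p.format(name=name), text)) for p in CONCEPT_DEMO)
    if person_hits > concept_hits:
        return "person"
    if concept_hits > person_hits:
        return "concept"
    return "ambiguous"


print(demo_disambiguate("Ever", "I went to the park with Ever today"))  # person
print(demo_disambiguate("ever", "Have you ever been there?"))           # concept
print(demo_disambiguate("Grace", "Grace"))                              # ambiguous
```

Escaping the word before formatting keeps names containing regex metacharacters from changing the meaning of the pattern.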
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Wikipedia lookup for unknown words
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
# Phrases in Wikipedia summaries that indicate a personal name
|
||||||
|
NAME_INDICATOR_PHRASES = [
|
||||||
|
"given name",
|
||||||
|
"personal name",
|
||||||
|
"first name",
|
||||||
|
"forename",
|
||||||
|
"masculine name",
|
||||||
|
"feminine name",
|
||||||
|
"boy's name",
|
||||||
|
"girl's name",
|
||||||
|
"male name",
|
||||||
|
"female name",
|
||||||
|
"irish name",
|
||||||
|
"welsh name",
|
||||||
|
"scottish name",
|
||||||
|
"gaelic name",
|
||||||
|
"hebrew name",
|
||||||
|
"arabic name",
|
||||||
|
"norse name",
|
||||||
|
"old english name",
|
||||||
|
"is a name",
|
||||||
|
"as a name",
|
||||||
|
"name meaning",
|
||||||
|
"name derived from",
|
||||||
|
"legendary irish",
|
||||||
|
"legendary welsh",
|
||||||
|
"legendary scottish",
|
||||||
|
]
|
||||||
|
|
||||||
|
PLACE_INDICATOR_PHRASES = [
|
||||||
|
"city in",
|
||||||
|
"town in",
|
||||||
|
"village in",
|
||||||
|
"municipality",
|
||||||
|
"capital of",
|
||||||
|
"district of",
|
||||||
|
"county",
|
||||||
|
"province",
|
||||||
|
"region of",
|
||||||
|
"island of",
|
||||||
|
"mountain in",
|
||||||
|
"river in",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def _wikipedia_lookup(word: str) -> dict:
|
||||||
|
"""
|
||||||
|
Look up a word via Wikipedia REST API.
|
||||||
|
Returns inferred type (person/place/concept/unknown) + confidence + summary.
|
||||||
|
Free, no API key, handles disambiguation pages.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{urllib.parse.quote(word)}"
|
||||||
|
req = urllib.request.Request(url, headers={"User-Agent": "MemPalace/1.0"})
|
||||||
|
with urllib.request.urlopen(req, timeout=5) as resp:
|
||||||
|
data = json.loads(resp.read())
|
||||||
|
|
||||||
|
page_type = data.get("type", "")
|
||||||
|
extract = data.get("extract", "").lower()
|
||||||
|
title = data.get("title", word)
|
||||||
|
|
||||||
|
# Disambiguation — look at description
|
||||||
|
if page_type == "disambiguation":
|
||||||
|
desc = data.get("description", "").lower()
|
||||||
|
if any(p in desc for p in ["name", "given name"]):
|
||||||
|
return {
|
||||||
|
"inferred_type": "person",
|
||||||
|
"confidence": 0.65,
|
||||||
|
"wiki_summary": extract[:200],
|
||||||
|
"wiki_title": title,
|
||||||
|
"note": "disambiguation page with name entries",
|
||||||
|
}
|
||||||
|
return {
|
||||||
|
"inferred_type": "ambiguous",
|
||||||
|
"confidence": 0.4,
|
||||||
|
"wiki_summary": extract[:200],
|
||||||
|
"wiki_title": title,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check for name indicators
|
||||||
|
if any(phrase in extract for phrase in NAME_INDICATOR_PHRASES):
|
||||||
|
# Higher confidence if the word itself is described as a name
|
||||||
|
confidence = 0.90 if (f"{word.lower()} is a" in extract or f"{word.lower()} (name" in extract) else 0.80
|
||||||
|
return {
|
||||||
|
"inferred_type": "person",
|
||||||
|
"confidence": confidence,
|
||||||
|
"wiki_summary": extract[:200],
|
||||||
|
"wiki_title": title,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check for place indicators
|
||||||
|
if any(phrase in extract for phrase in PLACE_INDICATOR_PHRASES):
|
||||||
|
return {
|
||||||
|
"inferred_type": "place",
|
||||||
|
"confidence": 0.80,
|
||||||
|
"wiki_summary": extract[:200],
|
||||||
|
"wiki_title": title,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Found but doesn't match name/place patterns
|
||||||
|
return {
|
||||||
|
"inferred_type": "concept",
|
||||||
|
"confidence": 0.60,
|
||||||
|
"wiki_summary": extract[:200],
|
||||||
|
"wiki_title": title,
|
||||||
|
}
|
||||||
|
|
||||||
|
except urllib.error.HTTPError as e:
|
||||||
|
if e.code == 404:
|
||||||
|
# Not in Wikipedia — strong signal it's a proper noun (unusual name, nickname)
|
||||||
|
return {
|
||||||
|
"inferred_type": "person",
|
||||||
|
"confidence": 0.70,
|
||||||
|
"wiki_summary": None,
|
||||||
|
"wiki_title": None,
|
||||||
|
"note": "not found in Wikipedia — likely a proper noun or unusual name",
|
||||||
|
}
|
||||||
|
return {"inferred_type": "unknown", "confidence": 0.0, "wiki_summary": None}
|
||||||
|
except Exception:
|
||||||
|
return {"inferred_type": "unknown", "confidence": 0.0, "wiki_summary": None}
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Entity Registry
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
class EntityRegistry:
|
||||||
|
"""
|
||||||
|
Persistent personal entity registry.
|
||||||
|
|
||||||
|
Stored at ~/.mempalace/entity_registry.json
|
||||||
|
Schema:
|
||||||
|
{
|
||||||
|
"mode": "personal", # work | personal | combo
|
||||||
|
"version": 1,
|
||||||
|
"people": {
|
||||||
|
"Riley": {
|
||||||
|
"source": "onboarding",
|
||||||
|
"contexts": ["personal"],
|
||||||
|
"aliases": [],
|
||||||
|
"relationship": "daughter",
|
||||||
|
"confidence": 1.0
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"projects": ["MemPalace", "Acme"],
|
||||||
|
"ambiguous_flags": ["riley", "max"],
|
||||||
|
"wiki_cache": {
|
||||||
|
"Sam": {"inferred_type": "person", "confidence": 0.9, "confirmed": true, ...}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
"""
|
||||||
|
|
||||||
|
DEFAULT_PATH = Path.home() / ".mempalace" / "entity_registry.json"
|
||||||
|
|
||||||
|
def __init__(self, data: dict, path: Path):
|
||||||
|
self._data = data
|
||||||
|
self._path = path
|
||||||
|
|
||||||
|
# ── Load / Save ──────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def load(cls, config_dir: Optional[Path] = None) -> "EntityRegistry":
|
||||||
|
path = (Path(config_dir) / "entity_registry.json") if config_dir else cls.DEFAULT_PATH
|
||||||
|
if path.exists():
|
||||||
|
try:
|
||||||
|
data = json.loads(path.read_text())
|
||||||
|
return cls(data, path)
|
||||||
|
except (json.JSONDecodeError, OSError):
|
||||||
|
pass
|
||||||
|
return cls(cls._empty(), path)
|
||||||
|
|
||||||
|
def save(self):
|
||||||
|
self._path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
self._path.write_text(json.dumps(self._data, indent=2))
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _empty() -> dict:
|
||||||
|
return {
|
||||||
|
"version": 1,
|
||||||
|
"mode": "personal",
|
||||||
|
"people": {},
|
||||||
|
"projects": [],
|
||||||
|
"ambiguous_flags": [],
|
||||||
|
"wiki_cache": {},
|
||||||
|
}
|
||||||
|
|
||||||
|
# ── Properties ───────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
@property
|
||||||
|
def mode(self) -> str:
|
||||||
|
return self._data.get("mode", "personal")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def people(self) -> dict:
|
||||||
|
return self._data.get("people", {})
|
||||||
|
|
||||||
|
@property
|
||||||
|
def projects(self) -> list:
|
||||||
|
return self._data.get("projects", [])
|
||||||
|
|
||||||
|
@property
|
||||||
|
def ambiguous_flags(self) -> list:
|
||||||
|
return self._data.get("ambiguous_flags", [])
|
||||||
|
|
||||||
|
# ── Seed from onboarding ─────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def seed(self, mode: str, people: list, projects: list, aliases: dict = None):
|
||||||
|
"""
|
||||||
|
Seed the registry from onboarding data.
|
||||||
|
|
||||||
|
people: list of dicts {"name": str, "relationship": str, "context": str}
|
||||||
|
projects: list of str
|
||||||
|
aliases: dict {"Max": "Maxwell", ...}
|
||||||
|
"""
|
||||||
|
self._data["mode"] = mode
|
||||||
|
self._data["projects"] = list(projects)
|
||||||
|
|
||||||
|
aliases = aliases or {}
|
||||||
|
reverse_aliases = {v: k for k, v in aliases.items()} # Maxwell → Max
|
||||||
|
|
||||||
|
for entry in people:
|
||||||
|
name = entry["name"].strip()
|
||||||
|
if not name:
|
||||||
|
continue
|
||||||
|
context = entry.get("context", "personal")
|
||||||
|
relationship = entry.get("relationship", "")
|
||||||
|
|
||||||
|
self._data["people"][name] = {
|
||||||
|
"source": "onboarding",
|
||||||
|
"contexts": [context],
|
||||||
|
"aliases": [reverse_aliases[name]] if name in reverse_aliases else [],
|
||||||
|
"relationship": relationship,
|
||||||
|
"confidence": 1.0,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Also register aliases
|
||||||
|
if name in reverse_aliases:
|
||||||
|
alias = reverse_aliases[name]
|
||||||
|
self._data["people"][alias] = {
|
||||||
|
"source": "onboarding",
|
||||||
|
"contexts": [context],
|
||||||
|
"aliases": [name],
|
||||||
|
"relationship": relationship,
|
||||||
|
"confidence": 1.0,
|
||||||
|
"canonical": name,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Flag ambiguous names (also common English words)
|
||||||
|
ambiguous = []
|
||||||
|
for name in self._data["people"]:
|
||||||
|
if name.lower() in COMMON_ENGLISH_WORDS:
|
||||||
|
ambiguous.append(name.lower())
|
||||||
|
self._data["ambiguous_flags"] = ambiguous
|
||||||
|
|
||||||
|
self.save()
|
||||||
|
|
||||||
|
# ── Lookup ───────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def lookup(self, word: str, context: str = "") -> dict:
|
||||||
|
"""
|
||||||
|
Look up a word. Returns entity classification.
|
||||||
|
|
||||||
|
context: surrounding sentence (used for disambiguation of ambiguous words)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
{"type": "person"|"project"|"concept"|"unknown",
|
||||||
|
"confidence": float,
|
||||||
|
"source": "onboarding"|"learned"|"wiki"|"inferred",
|
||||||
|
"name": canonical name if found,
|
||||||
|
"needs_disambiguation": bool}
|
||||||
|
"""
|
||||||
|
# 1. Exact match in people registry
|
||||||
|
for canonical, info in self.people.items():
|
||||||
|
if word.lower() == canonical.lower() or word.lower() in [
|
||||||
|
a.lower() for a in info.get("aliases", [])
|
||||||
|
]:
|
||||||
|
# Check if this is an ambiguous word
|
||||||
|
if word.lower() in self.ambiguous_flags and context:
|
||||||
|
resolved = self._disambiguate(word, context, info)
|
||||||
|
if resolved is not None:
|
||||||
|
return resolved
|
||||||
|
return {
|
||||||
|
"type": "person",
|
||||||
|
"confidence": info["confidence"],
|
||||||
|
"source": info["source"],
|
||||||
|
"name": canonical,
|
||||||
|
"context": info.get("contexts", ["personal"]),
|
||||||
|
"needs_disambiguation": False,
|
||||||
|
}
|
||||||
|
|
||||||
|
# 2. Project match
|
||||||
|
for proj in self.projects:
|
||||||
|
if word.lower() == proj.lower():
|
||||||
|
return {
|
||||||
|
"type": "project",
|
||||||
|
"confidence": 1.0,
|
||||||
|
"source": "onboarding",
|
||||||
|
"name": proj,
|
||||||
|
"needs_disambiguation": False,
|
||||||
|
}
|
||||||
|
|
||||||
|
# 3. Wiki cache
|
||||||
|
cache = self._data.get("wiki_cache", {})
|
||||||
|
for cached_word, cached_result in cache.items():
|
||||||
|
if word.lower() == cached_word.lower() and cached_result.get("confirmed"):
|
||||||
|
return {
|
||||||
|
"type": cached_result["inferred_type"],
|
||||||
|
"confidence": cached_result["confidence"],
|
||||||
|
"source": "wiki",
|
||||||
|
"name": word,
|
||||||
|
"needs_disambiguation": False,
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
"type": "unknown",
|
||||||
|
"confidence": 0.0,
|
||||||
|
"source": "none",
|
||||||
|
"name": word,
|
||||||
|
"needs_disambiguation": False,
|
||||||
|
}
|
||||||
|
|
||||||
|
def _disambiguate(self, word: str, context: str, person_info: dict) -> Optional[dict]:
|
||||||
|
"""
|
||||||
|
When a word is both a name and a common word, check context.
|
||||||
|
Returns a person result if the context suggests a name, a concept result if it
suggests the common word, or None if the context is inconclusive.
|
||||||
|
"""
|
||||||
|
name_lower = word.lower()
|
||||||
|
ctx_lower = context.lower()
|
||||||
|
|
||||||
|
# Check person context patterns
|
||||||
|
person_score = 0
|
||||||
|
for pat in PERSON_CONTEXT_PATTERNS:
|
||||||
|
if re.search(pat.format(name=re.escape(name_lower)), ctx_lower):
|
||||||
|
person_score += 1
|
||||||
|
|
||||||
|
# Check concept context patterns
|
||||||
|
concept_score = 0
|
||||||
|
for pat in CONCEPT_CONTEXT_PATTERNS:
|
||||||
|
if re.search(pat.format(name=re.escape(name_lower)), ctx_lower):
|
||||||
|
concept_score += 1
|
||||||
|
|
||||||
|
if person_score > concept_score:
|
||||||
|
return {
|
||||||
|
"type": "person",
|
||||||
|
"confidence": min(0.95, 0.7 + person_score * 0.1),
|
||||||
|
"source": person_info["source"],
|
||||||
|
"name": word,
|
||||||
|
"context": person_info.get("contexts", ["personal"]),
|
||||||
|
"needs_disambiguation": False,
|
||||||
|
"disambiguated_by": "context_patterns",
|
||||||
|
}
|
||||||
|
elif concept_score > person_score:
|
||||||
|
return {
|
||||||
|
"type": "concept",
|
||||||
|
"confidence": min(0.90, 0.7 + concept_score * 0.1),
|
||||||
|
"source": "context_disambiguated",
|
||||||
|
"name": word,
|
||||||
|
"needs_disambiguation": False,
|
||||||
|
"disambiguated_by": "context_patterns",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Truly ambiguous — return None to fall through to person (registered name)
|
||||||
|
return None
|
||||||
|
|
||||||
|
# ── Research unknown words ───────────────────────────────────────────────
|
||||||
|
|
||||||
|
def research(self, word: str, auto_confirm: bool = False) -> dict:
|
||||||
|
"""
|
||||||
|
Research an unknown word via Wikipedia.
|
||||||
|
Caches result. If auto_confirm=False, marks as unconfirmed (needs user review).
|
||||||
|
Returns the lookup result.
|
||||||
|
"""
|
||||||
|
# Already cached?
|
||||||
|
cache = self._data.setdefault("wiki_cache", {})
|
||||||
|
if word in cache:
|
||||||
|
return cache[word]
|
||||||
|
|
||||||
|
result = _wikipedia_lookup(word)
|
||||||
|
result["word"] = word
|
||||||
|
result["confirmed"] = auto_confirm
|
||||||
|
|
||||||
|
cache[word] = result
|
||||||
|
self.save()
|
||||||
|
return result
|
||||||
|
|
||||||
|
def confirm_research(
|
||||||
|
self, word: str, entity_type: str, relationship: str = "", context: str = "personal"
|
||||||
|
):
|
||||||
|
"""Mark a researched word as confirmed and add to people registry."""
|
||||||
|
cache = self._data.get("wiki_cache", {})
|
||||||
|
if word in cache:
|
||||||
|
cache[word]["confirmed"] = True
|
||||||
|
cache[word]["confirmed_type"] = entity_type
|
||||||
|
|
||||||
|
if entity_type == "person":
|
||||||
|
self._data["people"][word] = {
|
||||||
|
"source": "wiki",
|
||||||
|
"contexts": [context],
|
||||||
|
"aliases": [],
|
||||||
|
"relationship": relationship,
|
||||||
|
"confidence": 0.90,
|
||||||
|
}
|
||||||
|
if word.lower() in COMMON_ENGLISH_WORDS:
|
||||||
|
flags = self._data.setdefault("ambiguous_flags", [])
|
||||||
|
if word.lower() not in flags:
|
||||||
|
flags.append(word.lower())
|
||||||
|
|
||||||
|
self.save()
|
||||||
|
|
||||||
|
# ── Learn from sessions ──────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def learn_from_text(self, text: str, min_confidence: float = 0.75) -> list:
|
||||||
|
"""
|
||||||
|
Scan session text for new entity candidates.
|
||||||
|
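Person candidates at or above min_confidence are added to the people registry
with source "learned"; the list of those newly added candidates is returned for review.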
|
||||||
|
"""
|
||||||
|
from mempalace.entity_detector import extract_candidates, score_entity, classify_entity
|
||||||
|
|
||||||
|
lines = text.splitlines()
|
||||||
|
candidates = extract_candidates(text)
|
||||||
|
new_candidates = []
|
||||||
|
|
||||||
|
for name, frequency in candidates.items():
|
||||||
|
# Skip if already known
|
||||||
|
if name in self.people or name in self.projects:
|
||||||
|
continue
|
||||||
|
|
||||||
|
scores = score_entity(name, text, lines)
|
||||||
|
entity = classify_entity(name, frequency, scores)
|
||||||
|
|
||||||
|
if entity["type"] == "person" and entity["confidence"] >= min_confidence:
|
||||||
|
self._data["people"][name] = {
|
||||||
|
"source": "learned",
|
||||||
|
"contexts": [self.mode if self.mode != "combo" else "personal"],
|
||||||
|
"aliases": [],
|
||||||
|
"relationship": "",
|
||||||
|
"confidence": entity["confidence"],
|
||||||
|
"seen_count": frequency,
|
||||||
|
}
|
||||||
|
if name.lower() in COMMON_ENGLISH_WORDS:
|
||||||
|
flags = self._data.setdefault("ambiguous_flags", [])
|
||||||
|
if name.lower() not in flags:
|
||||||
|
flags.append(name.lower())
|
||||||
|
new_candidates.append(entity)
|
||||||
|
|
||||||
|
if new_candidates:
|
||||||
|
self.save()
|
||||||
|
|
||||||
|
return new_candidates
|
||||||
|
|
||||||
|
# ── Query helpers for retrieval ──────────────────────────────────────────
|
||||||
|
|
||||||
|
def extract_people_from_query(self, query: str) -> list:
|
||||||
|
"""
|
||||||
|
Extract known person names from a query string.
|
||||||
|
Returns list of canonical names found.
|
||||||
|
"""
|
||||||
|
found = []
|
||||||
|
# No lowercasing needed: the match below uses re.IGNORECASE.
|
||||||
|
|
||||||
|
for canonical, info in self.people.items():
|
||||||
|
names_to_check = [canonical] + info.get("aliases", [])
|
||||||
|
for name in names_to_check:
|
||||||
|
# Word boundary match
|
||||||
|
if re.search(rf"\b{re.escape(name)}\b", query, re.IGNORECASE):
|
||||||
|
# For ambiguous words, check context
|
||||||
|
if name.lower() in self.ambiguous_flags:
|
||||||
|
result = self._disambiguate(name, query, info)
|
||||||
|
if result and result["type"] == "person":
|
||||||
|
if canonical not in found:
|
||||||
|
found.append(canonical)
|
||||||
|
else:
|
||||||
|
if canonical not in found:
|
||||||
|
found.append(canonical)
|
||||||
|
return found
|
||||||
|
|
||||||
|
def extract_unknown_candidates(self, query: str) -> list:
|
||||||
|
"""
|
||||||
|
Find capitalized words in query that aren't in registry or common words.
|
||||||
|
These are candidates for Wikipedia research.
|
||||||
|
"""
|
||||||
|
candidates = re.findall(r"\b[A-Z][a-z]{2,15}\b", query)
|
||||||
|
unknown = []
|
||||||
|
for word in set(candidates):
|
||||||
|
if word.lower() in COMMON_ENGLISH_WORDS:
|
||||||
|
continue
|
||||||
|
result = self.lookup(word)
|
||||||
|
if result["type"] == "unknown":
|
||||||
|
unknown.append(word)
|
||||||
|
return unknown
|
||||||
|
|
||||||
|
# ── Summary ──────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def summary(self) -> str:
|
||||||
|
lines = [
|
||||||
|
f"Mode: {self.mode}",
|
||||||
|
f"People: {len(self.people)} ({', '.join(list(self.people.keys())[:8])}{'...' if len(self.people) > 8 else ''})",
|
||||||
|
f"Projects: {', '.join(self.projects) or '(none)'}",
|
||||||
|
f"Ambiguous flags: {', '.join(self.ambiguous_flags) or '(none)'}",
|
||||||
|
f"Wiki cache: {len(self._data.get('wiki_cache', {}))} entries",
|
||||||
|
]
|
||||||
|
return "\n".join(lines)
|
||||||
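The context-based disambiguation in `_disambiguate` can be sketched in isolation as follows. This is a minimal illustration, not the module's actual code: `PERSON_PATTERNS`, `CONCEPT_PATTERNS`, and `score_context` are hypothetical names, and the patterns are stand-ins, since the real `PERSON_CONTEXT_PATTERNS` and `CONCEPT_CONTEXT_PATTERNS` are defined elsewhere in the file and may differ.

```python
import re
from typing import Optional

# Hypothetical stand-in patterns; the real lists live elsewhere in entity_registry.py.
PERSON_PATTERNS = [
    r"\b{name} (said|told|asked|wants|is coming)\b",
    r"\b(with|for|from|call|text) {name}\b",
]
CONCEPT_PATTERNS = [
    r"\b(the|a|an|my|your) {name}\b",
    r"\b{name} (of|in) the\b",
]


def score_context(word: str, context: str) -> Optional[str]:
    """Return "person", "concept", or None when the context is inconclusive."""
    name = re.escape(word.lower())
    ctx = context.lower()
    # Count how many patterns of each kind match the surrounding sentence.
    person = sum(bool(re.search(p.format(name=name), ctx)) for p in PERSON_PATTERNS)
    concept = sum(bool(re.search(p.format(name=name), ctx)) for p in CONCEPT_PATTERNS)
    if person > concept:
        return "person"
    if concept > person:
        return "concept"
    return None  # tie: the caller decides (lookup() falls back to the registered person)


print(score_context("Grace", "I had lunch with Grace yesterday"))
print(score_context("Grace", "She handled it with the grace of a dancer"))
```

A tie returns `None` rather than guessing, which mirrors how the registry leaves the final call to the caller.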
@@ -0,0 +1,521 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
general_extractor.py — Extract 5 types of memories from text.
|
||||||
|
|
||||||
|
Types:
|
||||||
|
1. DECISIONS — "we went with X because Y", choices made
|
||||||
|
2. PREFERENCES — "always use X", "never do Y", "I prefer Z"
|
||||||
|
3. MILESTONES — breakthroughs, things that finally worked
|
||||||
|
4. PROBLEMS — what broke, what fixed it, root causes
|
||||||
|
5. EMOTIONAL — feelings, vulnerability, relationships
|
||||||
|
|
||||||
|
No LLM required. Pure keyword/pattern heuristics.
|
||||||
|
No external dependencies on palace.py, dialect.py, or layers.py.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from general_extractor import extract_memories
|
||||||
|
|
||||||
|
chunks = extract_memories(text)
|
||||||
|
# [{"content": "...", "memory_type": "decision", "chunk_index": 0}, ...]
|
||||||
|
"""
|
||||||
|
|
||||||
|
import re
|
||||||
|
from typing import List, Dict, Tuple
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# MARKER SETS — One per memory type
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
DECISION_MARKERS = [
|
||||||
|
r"\blet'?s (use|go with|try|pick|choose|switch to)\b",
|
||||||
|
r"\bwe (should|decided|chose|went with|picked|settled on)\b",
|
||||||
|
r"\bi'?m going (to|with)\b",
|
||||||
|
r"\bbetter (to|than|approach|option|choice)\b",
|
||||||
|
r"\binstead of\b",
|
||||||
|
r"\brather than\b",
|
||||||
|
r"\bthe reason (is|was|being)\b",
|
||||||
|
r"\bbecause\b",
|
||||||
|
r"\btrade-?off\b",
|
||||||
|
r"\bpros and cons\b",
|
||||||
|
r"\bover\b.*\bbecause\b",
|
||||||
|
r"\barchitecture\b",
|
||||||
|
r"\bapproach\b",
|
||||||
|
r"\bstrategy\b",
|
||||||
|
r"\bpattern\b",
|
||||||
|
r"\bstack\b",
|
||||||
|
r"\bframework\b",
|
||||||
|
r"\binfrastructure\b",
|
||||||
|
r"\bset (it |this )?to\b",
|
||||||
|
r"\bconfigure\b",
|
||||||
|
r"\bdefault\b",
|
||||||
|
]
|
||||||
|
|
||||||
|
PREFERENCE_MARKERS = [
|
||||||
|
r"\bi prefer\b",
|
||||||
|
r"\balways use\b",
|
||||||
|
r"\bnever use\b",
|
||||||
|
r"\bdon'?t (ever |like to )?(use|do|mock|stub|import)\b",
|
||||||
|
r"\bi like (to|when|how)\b",
|
||||||
|
r"\bi hate (when|how|it when)\b",
|
||||||
|
r"\bplease (always|never|don'?t)\b",
|
||||||
|
r"\bmy (rule|preference|style|convention) is\b",
|
||||||
|
r"\bwe (always|never)\b",
|
||||||
|
r"\bfunctional\b.*\bstyle\b",
|
||||||
|
r"\bimperative\b",
|
||||||
|
r"\bsnake_?case\b",
|
||||||
|
r"\bcamel_?case\b",
|
||||||
|
r"\btabs\b.*\bspaces\b",
|
||||||
|
r"\bspaces\b.*\btabs\b",
|
||||||
|
r"\buse\b.*\binstead of\b",
|
||||||
|
]
|
||||||
|
|
||||||
|
MILESTONE_MARKERS = [
|
||||||
|
r"\bit works\b",
|
||||||
|
r"\bit worked\b",
|
||||||
|
r"\bgot it working\b",
|
||||||
|
r"\bfixed\b",
|
||||||
|
r"\bsolved\b",
|
||||||
|
r"\bbreakthrough\b",
|
||||||
|
r"\bfigured (it )?out\b",
|
||||||
|
r"\bnailed it\b",
|
||||||
|
r"\bcracked (it|the)\b",
|
||||||
|
r"\bfinally\b",
|
||||||
|
r"\bfirst time\b",
|
||||||
|
r"\bfirst ever\b",
|
||||||
|
r"\bnever (done|been|had) before\b",
|
||||||
|
r"\bdiscovered\b",
|
||||||
|
r"\brealized\b",
|
||||||
|
r"\bfound (out|that)\b",
|
||||||
|
r"\bturns out\b",
|
||||||
|
r"\bthe key (is|was|insight)\b",
|
||||||
|
r"\bthe trick (is|was)\b",
|
||||||
|
r"\bnow i (understand|see|get it)\b",
|
||||||
|
r"\bbuilt\b",
|
||||||
|
r"\bcreated\b",
|
||||||
|
r"\bimplemented\b",
|
||||||
|
r"\bshipped\b",
|
||||||
|
r"\blaunched\b",
|
||||||
|
r"\bdeployed\b",
|
||||||
|
r"\breleased\b",
|
||||||
|
r"\bprototype\b",
|
||||||
|
r"\bproof of concept\b",
|
||||||
|
r"\bdemo\b",
|
||||||
|
r"\bversion \d",
|
||||||
|
r"\bv\d+\.\d+",
|
||||||
|
r"\d+x (compression|faster|slower|better|improvement|reduction)",
|
||||||
|
r"\d+% (reduction|improvement|faster|better|smaller)",
|
||||||
|
]
|
||||||
|
|
||||||
|
PROBLEM_MARKERS = [
|
||||||
|
r"\b(bug|error|crash|fail|broke|broken|issue|problem)\b",
|
||||||
|
r"\bdoesn'?t work\b",
|
||||||
|
r"\bnot working\b",
|
||||||
|
r"\bwon'?t\b.*\bwork\b",
|
||||||
|
r"\bkeeps? (failing|crashing|breaking|erroring)\b",
|
||||||
|
r"\broot cause\b",
|
||||||
|
r"\bthe (problem|issue|bug) (is|was)\b",
|
||||||
|
r"\bturns out\b.*\b(was|because|due to)\b",
|
||||||
|
r"\bthe fix (is|was)\b",
|
||||||
|
r"\bworkaround\b",
|
||||||
|
r"\bthat'?s why\b",
|
||||||
|
r"\bthe reason it\b",
|
||||||
|
r"\bfixed (it |the |by )\b",
|
||||||
|
r"\bsolution (is|was)\b",
|
||||||
|
r"\bresolved\b",
|
||||||
|
r"\bpatched\b",
|
||||||
|
r"\bthe answer (is|was)\b",
|
||||||
|
r"\b(had|need) to\b.*\binstead\b",
|
||||||
|
]
|
||||||
|
|
||||||
|
EMOTION_MARKERS = [
|
||||||
|
r"\blove\b",
|
||||||
|
r"\bscared\b",
|
||||||
|
r"\bafraid\b",
|
||||||
|
r"\bproud\b",
|
||||||
|
r"\bhurt\b",
|
||||||
|
r"\bhappy\b",
|
||||||
|
r"\bsad\b",
|
||||||
|
r"\bcry\b",
|
||||||
|
r"\bcrying\b",
|
||||||
|
r"\bmiss\b",
|
||||||
|
r"\bsorry\b",
|
||||||
|
r"\bgrateful\b",
|
||||||
|
r"\bangry\b",
|
||||||
|
r"\bworried\b",
|
||||||
|
r"\blonely\b",
|
||||||
|
r"\bbeautiful\b",
|
||||||
|
r"\bamazing\b",
|
||||||
|
r"\bwonderful\b",
|
||||||
|
r"i feel",
|
||||||
|
r"i'm scared",
|
||||||
|
r"i love you",
|
||||||
|
r"i'm sorry",
|
||||||
|
r"i can't",
|
||||||
|
r"i wish",
|
||||||
|
r"i miss",
|
||||||
|
r"i need",
|
||||||
|
r"never told anyone",
|
||||||
|
r"nobody knows",
|
||||||
|
r"\*[^*]+\*",
|
||||||
|
]
|
||||||
|
|
||||||
|
ALL_MARKERS = {
|
||||||
|
"decision": DECISION_MARKERS,
|
||||||
|
"preference": PREFERENCE_MARKERS,
|
||||||
|
"milestone": MILESTONE_MARKERS,
|
||||||
|
"problem": PROBLEM_MARKERS,
|
||||||
|
"emotional": EMOTION_MARKERS,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# SENTIMENT — for disambiguation
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
POSITIVE_WORDS = {
|
||||||
|
"pride",
|
||||||
|
"proud",
|
||||||
|
"joy",
|
||||||
|
"happy",
|
||||||
|
"love",
|
||||||
|
"loving",
|
||||||
|
"beautiful",
|
||||||
|
"amazing",
|
||||||
|
"wonderful",
|
||||||
|
"incredible",
|
||||||
|
"fantastic",
|
||||||
|
"brilliant",
|
||||||
|
"perfect",
|
||||||
|
"excited",
|
||||||
|
"thrilled",
|
||||||
|
"grateful",
|
||||||
|
"warm",
|
||||||
|
"breakthrough",
|
||||||
|
"success",
|
||||||
|
"works",
|
||||||
|
"working",
|
||||||
|
"solved",
|
||||||
|
"fixed",
|
||||||
|
"nailed",
|
||||||
|
"heart",
|
||||||
|
"hug",
|
||||||
|
"precious",
|
||||||
|
"adore",
|
||||||
|
}
|
||||||
|
|
||||||
|
NEGATIVE_WORDS = {
|
||||||
|
"bug",
|
||||||
|
"error",
|
||||||
|
"crash",
|
||||||
|
"crashing",
|
||||||
|
"crashed",
|
||||||
|
"fail",
|
||||||
|
"failed",
|
||||||
|
"failing",
|
||||||
|
"failure",
|
||||||
|
"broken",
|
||||||
|
"broke",
|
||||||
|
"breaking",
|
||||||
|
"breaks",
|
||||||
|
"issue",
|
||||||
|
"problem",
|
||||||
|
"wrong",
|
||||||
|
"stuck",
|
||||||
|
"blocked",
|
||||||
|
"unable",
|
||||||
|
"impossible",
|
||||||
|
"missing",
|
||||||
|
"terrible",
|
||||||
|
"horrible",
|
||||||
|
"awful",
|
||||||
|
"worse",
|
||||||
|
"worst",
|
||||||
|
"panic",
|
||||||
|
"disaster",
|
||||||
|
"mess",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _get_sentiment(text: str) -> str:
|
||||||
|
"""Quick sentiment: 'positive', 'negative', or 'neutral'."""
|
||||||
|
words = set(w.lower() for w in re.findall(r"\b\w+\b", text))
|
||||||
|
pos = len(words & POSITIVE_WORDS)
|
||||||
|
neg = len(words & NEGATIVE_WORDS)
|
||||||
|
if pos > neg:
|
||||||
|
return "positive"
|
||||||
|
elif neg > pos:
|
||||||
|
return "negative"
|
||||||
|
return "neutral"
|
||||||
|
|
||||||
|
|
||||||
|
def _has_resolution(text: str) -> bool:
|
||||||
|
"""Check if text describes a RESOLVED problem."""
|
||||||
|
text_lower = text.lower()
|
||||||
|
patterns = [
|
||||||
|
r"\bfixed\b",
|
||||||
|
r"\bsolved\b",
|
||||||
|
r"\bresolved\b",
|
||||||
|
r"\bpatched\b",
|
||||||
|
r"\bgot it working\b",
|
||||||
|
r"\bit works\b",
|
||||||
|
r"\bnailed it\b",
|
||||||
|
r"\bfigured (it )?out\b",
|
||||||
|
r"\bthe (fix|answer|solution)\b",
|
||||||
|
]
|
||||||
|
return any(re.search(p, text_lower) for p in patterns)
|
||||||
|
|
||||||
|
|
||||||
|
def _disambiguate(memory_type: str, text: str, scores: Dict[str, float]) -> str:
|
||||||
|
"""Fix misclassifications using sentiment + resolution."""
|
||||||
|
sentiment = _get_sentiment(text)
|
||||||
|
|
||||||
|
# Resolved problems are milestones (or emotional, when emotional markers and positive sentiment are present)
|
||||||
|
if memory_type == "problem" and _has_resolution(text):
|
||||||
|
if scores.get("emotional", 0) > 0 and sentiment == "positive":
|
||||||
|
return "emotional"
|
||||||
|
return "milestone"
|
||||||
|
|
||||||
|
# Problem + positive sentiment => milestone or emotional
|
||||||
|
if memory_type == "problem" and sentiment == "positive":
|
||||||
|
if scores.get("milestone", 0) > 0:
|
||||||
|
return "milestone"
|
||||||
|
if scores.get("emotional", 0) > 0:
|
||||||
|
return "emotional"
|
||||||
|
|
||||||
|
return memory_type
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# CODE LINE FILTERING
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
_CODE_LINE_PATTERNS = [
|
||||||
|
re.compile(r"^\s*[\$#]\s"),
|
||||||
|
re.compile(
|
||||||
|
r"^\s*(cd|source|echo|export|pip|npm|git|python|bash|curl|wget|mkdir|rm|cp|mv|ls|cat|grep|find|chmod|sudo|brew|docker)\s"
|
||||||
|
),
|
||||||
|
re.compile(r"^\s*```"),
|
||||||
|
re.compile(r"^\s*(import|from|def|class|function|const|let|var|return)\s"),
|
||||||
|
re.compile(r"^\s*[A-Z_]{2,}="),
|
||||||
|
re.compile(r"^\s*\|"),
|
||||||
|
re.compile(r"^\s*[-]{2,}"),
|
||||||
|
re.compile(r"^\s*[{}\[\]]\s*$"),
|
||||||
|
# "else:" is matched without a trailing \b: a word boundary never follows ":" at end of line.
re.compile(r"^\s*((if|for|while|try|except|elif)\b|else:)"),
|
||||||
|
re.compile(r"^\s*\w+\.\w+\("),
|
||||||
|
re.compile(r"^\s*\w+ = \w+\.\w+"),
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def _is_code_line(line: str) -> bool:
|
||||||
|
stripped = line.strip()
|
||||||
|
if not stripped:
|
||||||
|
return False
|
||||||
|
for pattern in _CODE_LINE_PATTERNS:
|
||||||
|
if pattern.match(stripped):
|
||||||
|
return True
|
||||||
|
alpha_ratio = sum(1 for c in stripped if c.isalpha()) / max(len(stripped), 1)
|
||||||
|
if alpha_ratio < 0.4 and len(stripped) > 10:
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_prose(text: str) -> str:
|
||||||
|
"""Extract only prose lines (skip code) for classification scoring."""
|
||||||
|
lines = text.split("\n")
|
||||||
|
prose = []
|
||||||
|
in_code = False
|
||||||
|
for line in lines:
|
||||||
|
if line.strip().startswith("```"):
|
||||||
|
in_code = not in_code
|
||||||
|
continue
|
||||||
|
if in_code:
|
||||||
|
continue
|
||||||
|
if not _is_code_line(line):
|
||||||
|
prose.append(line)
|
||||||
|
result = "\n".join(prose).strip()
|
||||||
|
return result if result else text
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# SCORING
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def _score_markers(text: str, markers: List[str]) -> Tuple[float, List[str]]:
|
||||||
|
"""Score text against regex markers. Returns (score, matched_keywords)."""
|
||||||
|
text_lower = text.lower()
|
||||||
|
score = 0.0
|
||||||
|
keywords = []
|
||||||
|
for marker in markers:
|
||||||
|
matches = re.findall(marker, text_lower)
|
||||||
|
if matches:
|
||||||
|
score += len(matches)
|
||||||
|
keywords.extend(m if isinstance(m, str) else m[0] if m else marker for m in matches)
|
||||||
|
return score, list(set(keywords))
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# MAIN EXTRACTION
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def extract_memories(text: str, min_confidence: float = 0.3) -> List[Dict]:
|
||||||
|
"""
|
||||||
|
Extract memories from a text string.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
text: The text to extract from (any format).
|
||||||
|
min_confidence: Minimum confidence threshold (0.0-1.0).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of dicts: {"content": str, "memory_type": str, "chunk_index": int}
|
||||||
|
"""
|
||||||
|
# Split into paragraphs (double newline or speaker-turn boundaries)
|
||||||
|
paragraphs = _split_into_segments(text)
|
||||||
|
memories = []
|
||||||
|
|
||||||
|
for para in paragraphs:
|
||||||
|
if len(para.strip()) < 20:
|
||||||
|
continue
|
||||||
|
|
||||||
|
prose = _extract_prose(para)
|
||||||
|
|
||||||
|
# Score against all types
|
||||||
|
scores = {}
|
||||||
|
for mem_type, markers in ALL_MARKERS.items():
|
||||||
|
score, _ = _score_markers(prose, markers)
|
||||||
|
if score > 0:
|
||||||
|
scores[mem_type] = score
|
||||||
|
|
||||||
|
if not scores:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Length bonus
|
||||||
|
if len(para) > 500:
|
||||||
|
length_bonus = 2
|
||||||
|
elif len(para) > 200:
|
||||||
|
length_bonus = 1
|
||||||
|
else:
|
||||||
|
length_bonus = 0
|
||||||
|
|
||||||
|
max_type = max(scores, key=scores.get)
|
||||||
|
max_score = scores[max_type] + length_bonus
|
||||||
|
|
||||||
|
# Disambiguate
|
||||||
|
max_type = _disambiguate(max_type, prose, scores)
|
||||||
|
|
||||||
|
# Confidence
|
||||||
|
confidence = min(1.0, max_score / 5.0)
|
||||||
|
if confidence < min_confidence:
|
||||||
|
continue
|
||||||
|
|
||||||
|
memories.append(
|
||||||
|
{
|
||||||
|
"content": para.strip(),
|
||||||
|
"memory_type": max_type,
|
||||||
|
"chunk_index": len(memories),
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
return memories
|
||||||
|
|
||||||
|
|
||||||
|
def _split_into_segments(text: str) -> List[str]:
|
||||||
|
"""
|
||||||
|
Split text into segments suitable for memory extraction.
|
||||||
|
|
||||||
|
Tries speaker-turn splitting first (> markers, "Human:", "Assistant:", etc.),
|
||||||
|
then falls back to paragraph splitting.
|
||||||
|
"""
|
||||||
|
lines = text.split("\n")
|
||||||
|
|
||||||
|
# Check for speaker-turn markers
|
||||||
|
turn_patterns = [
|
||||||
|
re.compile(r"^>\s"), # > quoted user turn
|
||||||
|
re.compile(r"^(Human|User|Q)\s*:", re.I), # Human: / User:
|
||||||
|
re.compile(r"^(Assistant|AI|A|Claude|ChatGPT)\s*:", re.I),
|
||||||
|
]
|
||||||
|
|
||||||
|
turn_count = 0
|
||||||
|
for line in lines:
|
||||||
|
stripped = line.strip()
|
||||||
|
for pat in turn_patterns:
|
||||||
|
if pat.match(stripped):
|
||||||
|
turn_count += 1
|
||||||
|
break
|
||||||
|
|
||||||
|
# If enough turn markers, split by turns
|
||||||
|
if turn_count >= 3:
|
||||||
|
return _split_by_turns(lines, turn_patterns)
|
||||||
|
|
||||||
|
# Fallback: paragraph splitting
|
||||||
|
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
|
||||||
|
|
||||||
|
# If single giant block, chunk by line groups
|
||||||
|
if len(paragraphs) <= 1 and len(lines) > 20:
|
||||||
|
segments = []
|
||||||
|
for i in range(0, len(lines), 25):
|
||||||
|
group = "\n".join(lines[i : i + 25]).strip()
|
||||||
|
if group:
|
||||||
|
segments.append(group)
|
||||||
|
return segments
|
||||||
|
|
||||||
|
return paragraphs
|
||||||
|
|
||||||
|
|
||||||
|
def _split_by_turns(lines: List[str], turn_patterns: List[re.Pattern]) -> List[str]:
|
||||||
|
"""Split lines into segments at each speaker turn boundary."""
|
||||||
|
segments = []
|
||||||
|
current = []
|
||||||
|
|
||||||
|
for line in lines:
|
||||||
|
stripped = line.strip()
|
||||||
|
is_turn = any(pat.match(stripped) for pat in turn_patterns)
|
||||||
|
|
||||||
|
if is_turn and current:
|
||||||
|
segments.append("\n".join(current))
|
||||||
|
current = [line]
|
||||||
|
else:
|
||||||
|
current.append(line)
|
||||||
|
|
||||||
|
if current:
|
||||||
|
segments.append("\n".join(current))
|
||||||
|
|
||||||
|
return segments
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# CLI
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
import sys
|
||||||
|
|
||||||
|
if len(sys.argv) < 2:
|
||||||
|
print("Usage: python general_extractor.py <file>")
|
||||||
|
print()
|
||||||
|
print("Extracts decisions, preferences, milestones, problems, and")
|
||||||
|
print("emotional moments from any text file.")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
filepath = sys.argv[1]
|
||||||
|
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
|
||||||
|
text = f.read()
|
||||||
|
|
||||||
|
memories = extract_memories(text)
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
from collections import Counter
|
||||||
|
|
||||||
|
type_counts = Counter(m["memory_type"] for m in memories)
|
||||||
|
print(f"Extracted {len(memories)} memories:")
|
||||||
|
for mtype in ["decision", "preference", "milestone", "problem", "emotional"]:
|
||||||
|
count = type_counts.get(mtype, 0)
|
||||||
|
if count:
|
||||||
|
print(f" {mtype:12} {count}")
|
||||||
|
|
||||||
|
print()
|
||||||
|
for m in memories[:10]:
|
||||||
|
preview = m["content"][:80].replace("\n", " ")
|
||||||
|
print(f" [{m['memory_type']:10}] {preview}...")
|
||||||
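The scoring arithmetic in `extract_memories` (marker hits plus a length bonus, divided by 5 and capped at 1.0) can be tried on a single paragraph with the sketch below. It is a simplified illustration under stated assumptions: `MARKERS` holds only a small subset of the marker lists above, `classify` is a hypothetical helper, and the sentiment/resolution disambiguation step is omitted.

```python
import re

# Small subset of the marker lists from general_extractor.py, for illustration only.
MARKERS = {
    "decision": [
        r"\bwe (should|decided|chose|went with|picked|settled on)\b",
        r"\bbecause\b",
        r"\binstead of\b",
    ],
    "problem": [
        r"\b(bug|error|crash|fail|broke|broken|issue|problem)\b",
        r"\broot cause\b",
    ],
}


def classify(paragraph: str):
    """Return (memory_type, confidence) for the top-scoring type, or None if nothing matches."""
    text = paragraph.lower()
    scores = {}
    for mem_type, patterns in MARKERS.items():
        hits = sum(len(re.findall(p, text)) for p in patterns)
        if hits:
            scores[mem_type] = hits
    if not scores:
        return None
    # Longer paragraphs get a small bonus, as in extract_memories.
    length_bonus = 2 if len(paragraph) > 500 else 1 if len(paragraph) > 200 else 0
    best = max(scores, key=scores.get)
    return best, min(1.0, (scores[best] + length_bonus) / 5.0)


print(classify("We went with Postgres instead of SQLite because we need concurrent writers."))
```

With the default `min_confidence` of 0.3, a short paragraph therefore needs at least two marker hits for its top type to be kept.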
@@ -0,0 +1,350 @@
|
|||||||
|
"""
|
||||||
|
knowledge_graph.py — Temporal Entity-Relationship Graph for MemPalace
|
||||||
|
=====================================================================
|
||||||
|
|
||||||
|
Real knowledge graph with:
|
||||||
|
- Entity nodes (people, projects, tools, concepts)
|
||||||
|
- Typed relationship edges (daughter_of, does, loves, works_on, etc.)
|
||||||
|
- Temporal validity (valid_from → valid_to — knows WHEN facts are true)
|
||||||
|
- Closet references (links back to the verbatim memory)
|
||||||
|
|
||||||
|
Storage: SQLite (local, no dependencies, no subscriptions)
|
||||||
|
Query: entity-first traversal with time filtering
|
||||||
|
|
||||||
|
This is what competes with Zep's temporal knowledge graph.
|
||||||
|
Zep uses Neo4j in the cloud ($25/mo+). We use SQLite locally (free).
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from mempalace.knowledge_graph import KnowledgeGraph
|
||||||
|
|
||||||
|
kg = KnowledgeGraph()
|
||||||
|
kg.add_triple("Max", "child_of", "Alice", valid_from="2015-04-01")
|
||||||
|
kg.add_triple("Max", "does", "swimming", valid_from="2025-01-01")
|
||||||
|
kg.add_triple("Max", "loves", "chess", valid_from="2025-10-01")
|
||||||
|
|
||||||
|
# Query: everything about Max
|
||||||
|
kg.query_entity("Max")
|
||||||
|
|
||||||
|
# Query: what was true about Max in January 2026?
|
||||||
|
kg.query_entity("Max", as_of="2026-01-15")
|
||||||
|
|
||||||
|
# Query: who is connected to Alice?
|
||||||
|
kg.query_entity("Alice", direction="both")
|
||||||
|
|
||||||
|
# Invalidate: Max's sports injury resolved
|
||||||
|
kg.invalidate("Max", "has_issue", "sports_injury", ended="2026-02-15")
|
||||||
|
"""
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sqlite3
|
||||||
|
from datetime import date, datetime
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
DEFAULT_KG_PATH = os.path.expanduser("~/.mempalace/knowledge_graph.sqlite3")
|
||||||
|
|
||||||
|
|
||||||
|
class KnowledgeGraph:
    def __init__(self, db_path: str = None):
        self.db_path = db_path or DEFAULT_KG_PATH
        Path(self.db_path).parent.mkdir(parents=True, exist_ok=True)
        self._init_db()

    def _init_db(self):
        conn = self._conn()
        conn.executescript("""
            CREATE TABLE IF NOT EXISTS entities (
                id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                type TEXT DEFAULT 'unknown',
                properties TEXT DEFAULT '{}',
                created_at TEXT DEFAULT CURRENT_TIMESTAMP
            );

            CREATE TABLE IF NOT EXISTS triples (
                id TEXT PRIMARY KEY,
                subject TEXT NOT NULL,
                predicate TEXT NOT NULL,
                object TEXT NOT NULL,
                valid_from TEXT,
                valid_to TEXT,
                confidence REAL DEFAULT 1.0,
                source_closet TEXT,
                source_file TEXT,
                extracted_at TEXT DEFAULT CURRENT_TIMESTAMP,
                FOREIGN KEY (subject) REFERENCES entities(id),
                FOREIGN KEY (object) REFERENCES entities(id)
            );

            CREATE INDEX IF NOT EXISTS idx_triples_subject ON triples(subject);
            CREATE INDEX IF NOT EXISTS idx_triples_object ON triples(object);
            CREATE INDEX IF NOT EXISTS idx_triples_predicate ON triples(predicate);
            CREATE INDEX IF NOT EXISTS idx_triples_valid ON triples(valid_from, valid_to);
        """)
        conn.commit()
        conn.close()

    def _conn(self):
        return sqlite3.connect(self.db_path, timeout=10)

    def _entity_id(self, name: str) -> str:
        return name.lower().replace(" ", "_").replace("'", "")

    # ── Write operations ──────────────────────────────────────────────────

    def add_entity(self, name: str, entity_type: str = "unknown", properties: dict = None):
        """Add or update an entity node."""
        eid = self._entity_id(name)
        props = json.dumps(properties or {})
        conn = self._conn()
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, name, type, properties) VALUES (?, ?, ?, ?)",
            (eid, name, entity_type, props)
        )
        conn.commit()
        conn.close()
        return eid

    def add_triple(self, subject: str, predicate: str, obj: str,
                   valid_from: str = None, valid_to: str = None,
                   confidence: float = 1.0, source_closet: str = None,
                   source_file: str = None):
        """
        Add a relationship triple: subject → predicate → object.

        Dates are stored as text and compared as plain strings, so full ISO
        dates (YYYY-MM-DD) are the safest choice.

        Examples:
            add_triple("Max", "child_of", "Alice", valid_from="2015-04-01")
            add_triple("Max", "does", "swimming", valid_from="2025-01-01")
            add_triple("Alice", "worried_about", "Max injury", valid_from="2026-01-01", valid_to="2026-02-28")
        """
        sub_id = self._entity_id(subject)
        obj_id = self._entity_id(obj)
        pred = predicate.lower().replace(" ", "_")

        # Auto-create entities if they don't exist
        conn = self._conn()
        conn.execute(
            "INSERT OR IGNORE INTO entities (id, name) VALUES (?, ?)",
            (sub_id, subject)
        )
        conn.execute(
            "INSERT OR IGNORE INTO entities (id, name) VALUES (?, ?)",
            (obj_id, obj)
        )

        # Check for an existing identical triple that is still open-ended
        existing = conn.execute(
            "SELECT id FROM triples WHERE subject=? AND predicate=? AND object=? AND valid_to IS NULL",
            (sub_id, pred, obj_id)
        ).fetchone()

        if existing:
            conn.commit()  # persist any entity rows auto-created above
            conn.close()
            return existing[0]  # Already exists and still valid

        # Short MD5 digest of valid_from plus the current timestamp keeps IDs unique
        stamp = f"{valid_from}{datetime.now().isoformat()}"
        digest = hashlib.md5(stamp.encode()).hexdigest()[:8]
        triple_id = f"t_{sub_id}_{pred}_{obj_id}_{digest}"

        conn.execute(
            """INSERT INTO triples (id, subject, predicate, object, valid_from, valid_to, confidence, source_closet, source_file)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (triple_id, sub_id, pred, obj_id, valid_from, valid_to, confidence, source_closet, source_file)
        )
        conn.commit()
        conn.close()
        return triple_id

    def invalidate(self, subject: str, predicate: str, obj: str, ended: str = None):
        """Mark a relationship as no longer valid (set valid_to date)."""
        sub_id = self._entity_id(subject)
        obj_id = self._entity_id(obj)
        pred = predicate.lower().replace(" ", "_")
        ended = ended or date.today().isoformat()

        conn = self._conn()
        conn.execute(
            "UPDATE triples SET valid_to=? WHERE subject=? AND predicate=? AND object=? AND valid_to IS NULL",
            (ended, sub_id, pred, obj_id)
        )
        conn.commit()
        conn.close()

    # ── Query operations ──────────────────────────────────────────────────

    def query_entity(self, name: str, as_of: str = None, direction: str = "outgoing"):
        """
        Get all relationships for an entity.

        direction: "outgoing" (entity → ?), "incoming" (? → entity), "both"
        as_of: date string — only return facts valid at that time
        """
        eid = self._entity_id(name)
        conn = self._conn()

        results = []

        if direction in ("outgoing", "both"):
            query = "SELECT t.*, e.name as obj_name FROM triples t JOIN entities e ON t.object = e.id WHERE t.subject = ?"
            params = [eid]
            if as_of:
                query += " AND (t.valid_from IS NULL OR t.valid_from <= ?) AND (t.valid_to IS NULL OR t.valid_to >= ?)"
                params.extend([as_of, as_of])
            for row in conn.execute(query, params).fetchall():
                results.append({
                    "direction": "outgoing",
                    "subject": name,
                    "predicate": row[2],
                    "object": row[10],  # obj_name
                    "valid_from": row[4],
                    "valid_to": row[5],
                    "confidence": row[6],
                    "source_closet": row[7],
                    "current": row[5] is None,
                })

        if direction in ("incoming", "both"):
            query = "SELECT t.*, e.name as sub_name FROM triples t JOIN entities e ON t.subject = e.id WHERE t.object = ?"
            params = [eid]
            if as_of:
                query += " AND (t.valid_from IS NULL OR t.valid_from <= ?) AND (t.valid_to IS NULL OR t.valid_to >= ?)"
                params.extend([as_of, as_of])
            for row in conn.execute(query, params).fetchall():
                results.append({
                    "direction": "incoming",
                    "subject": row[10],  # sub_name
                    "predicate": row[2],
                    "object": name,
                    "valid_from": row[4],
                    "valid_to": row[5],
                    "confidence": row[6],
                    "source_closet": row[7],
                    "current": row[5] is None,
                })

        conn.close()
        return results

    def query_relationship(self, predicate: str, as_of: str = None):
        """Get all triples with a given relationship type."""
        pred = predicate.lower().replace(" ", "_")
        conn = self._conn()
        query = """
            SELECT t.*, s.name as sub_name, o.name as obj_name
            FROM triples t
            JOIN entities s ON t.subject = s.id
            JOIN entities o ON t.object = o.id
            WHERE t.predicate = ?
        """
        params = [pred]
        if as_of:
            query += " AND (t.valid_from IS NULL OR t.valid_from <= ?) AND (t.valid_to IS NULL OR t.valid_to >= ?)"
            params.extend([as_of, as_of])

        results = []
        for row in conn.execute(query, params).fetchall():
            results.append({
                "subject": row[10],
                "predicate": pred,
                "object": row[11],
                "valid_from": row[4],
                "valid_to": row[5],
                "current": row[5] is None,
            })
        conn.close()
        return results

    def timeline(self, entity_name: str = None):
        """Get all facts in chronological order, optionally filtered by entity."""
        conn = self._conn()
        # Note: the NULLS LAST ordering used below requires SQLite 3.30 or newer.
        if entity_name:
            eid = self._entity_id(entity_name)
            rows = conn.execute("""
                SELECT t.*, s.name as sub_name, o.name as obj_name
                FROM triples t
                JOIN entities s ON t.subject = s.id
                JOIN entities o ON t.object = o.id
                WHERE (t.subject = ? OR t.object = ?)
                ORDER BY t.valid_from ASC NULLS LAST
            """, (eid, eid)).fetchall()
        else:
            rows = conn.execute("""
                SELECT t.*, s.name as sub_name, o.name as obj_name
                FROM triples t
                JOIN entities s ON t.subject = s.id
                JOIN entities o ON t.object = o.id
                ORDER BY t.valid_from ASC NULLS LAST
                LIMIT 100
            """).fetchall()

        conn.close()
        return [{
            "subject": r[10],
            "predicate": r[2],
            "object": r[11],
            "valid_from": r[4],
            "valid_to": r[5],
            "current": r[5] is None,
        } for r in rows]

    # ── Stats ─────────────────────────────────────────────────────────────

    def stats(self):
        conn = self._conn()
        entities = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
        triples = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
        current = conn.execute("SELECT COUNT(*) FROM triples WHERE valid_to IS NULL").fetchone()[0]
        expired = triples - current
        predicates = [r[0] for r in conn.execute(
            "SELECT DISTINCT predicate FROM triples ORDER BY predicate"
        ).fetchall()]
        conn.close()
        return {
            "entities": entities,
            "triples": triples,
            "current_facts": current,
            "expired_facts": expired,
            "relationship_types": predicates,
        }

    # ── Seed from known facts ─────────────────────────────────────────────

    def seed_from_entity_facts(self, entity_facts: dict):
        """
        Seed the knowledge graph from fact_checker.py ENTITY_FACTS.
        This bootstraps the graph with known ground truth.
        """
        for key, facts in entity_facts.items():
            name = facts.get("full_name", key.capitalize())
            etype = facts.get("type", "person")
            self.add_entity(name, etype, {
                "gender": facts.get("gender", ""),
                "birthday": facts.get("birthday", ""),
            })

            # Relationships
            parent = facts.get("parent")
            if parent:
                self.add_triple(name, "child_of", parent.capitalize(),
                                valid_from=facts.get("birthday"))

            partner = facts.get("partner")
            if partner:
                self.add_triple(name, "married_to", partner.capitalize())

            relationship = facts.get("relationship", "")
            if relationship == "daughter":
                self.add_triple(name, "is_child_of", facts.get("parent", "").capitalize() or name,
                                valid_from=facts.get("birthday"))
            elif relationship == "husband":
                self.add_triple(name, "is_partner_of", facts.get("partner", name).capitalize())
            elif relationship == "brother":
                self.add_triple(name, "is_sibling_of", facts.get("sibling", name).capitalize())
            elif relationship == "dog":
                self.add_triple(name, "is_pet_of", facts.get("owner", name).capitalize())
                # Note: add_entity uses INSERT OR REPLACE, so this call resets
                # the gender/birthday properties stored above to {}.
                self.add_entity(name, "animal")

            # Interests
            for interest in facts.get("interests", []):
                self.add_triple(name, "loves", interest.capitalize(),
                                valid_from="2025-01-01")
@@ -0,0 +1,506 @@
#!/usr/bin/env python3
"""
layers.py — 4-Layer Memory Stack for mempalace
===================================================

Load only what you need, when you need it.

    Layer 0: Identity         (~100 tokens)      — Always loaded. "Who am I?"
    Layer 1: Essential Story  (~500-800)         — Always loaded. Top moments from the palace.
    Layer 2: On-Demand        (~200-500 each)    — Loaded when a topic/wing comes up.
    Layer 3: Deep Search      (unlimited)        — Full ChromaDB semantic search.

Wake-up cost: ~600-900 tokens (L0+L1). Leaves 95%+ of context free.

Reads directly from ChromaDB (mempalace_drawers)
and ~/.mempalace/identity.txt.
"""

import os
import sys
from pathlib import Path
from collections import defaultdict

import chromadb

from .config import MempalaceConfig


# ---------------------------------------------------------------------------
# Layer 0 — Identity
# ---------------------------------------------------------------------------


class Layer0:
    """
    ~100 tokens. Always loaded.
    Reads from ~/.mempalace/identity.txt — a plain-text file the user writes.

    Example identity.txt:
        I am Atlas, a personal AI assistant for Alice.
        Traits: warm, direct, remembers everything.
        People: Alice (creator), Bob (Alice's partner).
        Project: A journaling app that helps people process emotions.
    """

    def __init__(self, identity_path: str = None):
        if identity_path is None:
            identity_path = os.path.expanduser("~/.mempalace/identity.txt")
        self.path = identity_path
        self._text = None

    def render(self) -> str:
        """Return the identity text, or a sensible default."""
        if self._text is not None:
            return self._text

        if os.path.exists(self.path):
            with open(self.path, "r") as f:
                self._text = f.read().strip()
        else:
            self._text = (
                "## L0 — IDENTITY\nNo identity configured. Create ~/.mempalace/identity.txt"
            )

        return self._text

    def token_estimate(self) -> int:
        return len(self.render()) // 4


# ---------------------------------------------------------------------------
# Layer 1 — Essential Story (auto-generated from palace)
# ---------------------------------------------------------------------------


class Layer1:
    """
    ~500-800 tokens. Always loaded.
    Auto-generated from the highest-weight / most-recent drawers in the palace.
    Groups by room, picks the top N moments, compresses to a compact summary.
    """

    MAX_DRAWERS = 15  # at most 15 moments in wake-up
    MAX_CHARS = 3200  # hard cap on total L1 text (~800 tokens)

    def __init__(self, palace_path: str = None, wing: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path
        self.wing = wing

    def generate(self) -> str:
        """Pull top drawers from ChromaDB and format as compact L1 text."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "## L1 — No palace found. Run: mempalace mine <dir>"

        # Fetch all drawers (with optional wing filter)
        kwargs = {"include": ["documents", "metadatas"]}
        if self.wing:
            kwargs["where"] = {"wing": self.wing}

        try:
            results = col.get(**kwargs)
        except Exception:
            return "## L1 — No drawers found."

        docs = results.get("documents", [])
        metas = results.get("metadatas", [])

        if not docs:
            return "## L1 — No memories yet."

        # Score each drawer by importance. The first metadata key that is
        # present wins; drawers without a usable value keep the default of 3.
        scored = []
        for doc, meta in zip(docs, metas):
            importance = 3
            # Try multiple metadata keys that might carry weight info
            for key in ("importance", "emotional_weight", "weight"):
                val = meta.get(key)
                if val is not None:
                    try:
                        importance = float(val)
                    except (ValueError, TypeError):
                        pass
                    break
            scored.append((importance, meta, doc))

        # Sort by importance descending, take top N
        scored.sort(key=lambda x: x[0], reverse=True)
        top = scored[: self.MAX_DRAWERS]

        # Group by room for readability
        by_room = defaultdict(list)
        for imp, meta, doc in top:
            room = meta.get("room", "general")
            by_room[room].append((imp, meta, doc))

        # Build compact text
        lines = ["## L1 — ESSENTIAL STORY"]

        total_len = 0
        for room, entries in sorted(by_room.items()):
            room_line = f"\n[{room}]"
            lines.append(room_line)
            total_len += len(room_line)

            for imp, meta, doc in entries:
                source = Path(meta.get("source_file", "")).name if meta.get("source_file") else ""

                # Truncate doc to keep L1 compact
                snippet = doc.strip().replace("\n", " ")
                if len(snippet) > 200:
                    snippet = snippet[:197] + "..."

                entry_line = f" - {snippet}"
                if source:
                    entry_line += f" ({source})"

                if total_len + len(entry_line) > self.MAX_CHARS:
                    lines.append(" ... (more in L3 search)")
                    return "\n".join(lines)

                lines.append(entry_line)
                total_len += len(entry_line)

        return "\n".join(lines)


# ---------------------------------------------------------------------------
# Layer 2 — On-Demand (wing/room filtered retrieval)
# ---------------------------------------------------------------------------


class Layer2:
    """
    ~200-500 tokens per retrieval.
    Loaded when a specific topic or wing comes up in conversation.
    Queries ChromaDB with a wing/room filter.
    """

    def __init__(self, palace_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path

    def retrieve(self, wing: str = None, room: str = None, n_results: int = 10) -> str:
        """Retrieve drawers filtered by wing and/or room."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "No palace found."

        where = {}
        if wing and room:
            where = {"$and": [{"wing": wing}, {"room": room}]}
        elif wing:
            where = {"wing": wing}
        elif room:
            where = {"room": room}

        kwargs = {"include": ["documents", "metadatas"], "limit": n_results}
        if where:
            kwargs["where"] = where

        try:
            results = col.get(**kwargs)
        except Exception as e:
            return f"Retrieval error: {e}"

        docs = results.get("documents", [])
        metas = results.get("metadatas", [])

        if not docs:
            label = f"wing={wing}" if wing else ""
            if room:
                label += f" room={room}" if label else f"room={room}"
            return f"No drawers found for {label}."

        lines = [f"## L2 — ON-DEMAND ({len(docs)} drawers)"]
        for doc, meta in zip(docs[:n_results], metas[:n_results]):
            room_name = meta.get("room", "?")
            source = Path(meta.get("source_file", "")).name if meta.get("source_file") else ""
            snippet = doc.strip().replace("\n", " ")
            if len(snippet) > 300:
                snippet = snippet[:297] + "..."
            entry = f" [{room_name}] {snippet}"
            if source:
                entry += f" ({source})"
            lines.append(entry)

        return "\n".join(lines)


# ---------------------------------------------------------------------------
# Layer 3 — Deep Search (full semantic search via ChromaDB)
# ---------------------------------------------------------------------------


class Layer3:
    """
    Unlimited depth. Semantic search against the full palace.
    Reuses searcher.py logic against mempalace_drawers.
    """

    def __init__(self, palace_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path

    def search(self, query: str, wing: str = None, room: str = None, n_results: int = 5) -> str:
        """Semantic search, returns compact result text."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "No palace found."

        where = {}
        if wing and room:
            where = {"$and": [{"wing": wing}, {"room": room}]}
        elif wing:
            where = {"wing": wing}
        elif room:
            where = {"room": room}

        kwargs = {
            "query_texts": [query],
            "n_results": n_results,
            "include": ["documents", "metadatas", "distances"],
        }
        if where:
            kwargs["where"] = where

        try:
            results = col.query(**kwargs)
        except Exception as e:
            return f"Search error: {e}"

        docs = results["documents"][0]
        metas = results["metadatas"][0]
        dists = results["distances"][0]

        if not docs:
            return "No results found."

        lines = [f'## L3 — SEARCH RESULTS for "{query}"']
        for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
            similarity = round(1 - dist, 3)
            wing_name = meta.get("wing", "?")
            room_name = meta.get("room", "?")
            source = Path(meta.get("source_file", "")).name if meta.get("source_file") else ""

            snippet = doc.strip().replace("\n", " ")
            if len(snippet) > 300:
                snippet = snippet[:297] + "..."

            lines.append(f" [{i}] {wing_name}/{room_name} (sim={similarity})")
            lines.append(f" {snippet}")
            if source:
                lines.append(f" src: {source}")

        return "\n".join(lines)

    def search_raw(
        self, query: str, wing: str = None, room: str = None, n_results: int = 5
    ) -> list:
        """Return raw dicts instead of formatted text."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return []

        where = {}
        if wing and room:
            where = {"$and": [{"wing": wing}, {"room": room}]}
        elif wing:
            where = {"wing": wing}
        elif room:
            where = {"room": room}

        kwargs = {
            "query_texts": [query],
            "n_results": n_results,
            "include": ["documents", "metadatas", "distances"],
        }
        if where:
            kwargs["where"] = where

        try:
            results = col.query(**kwargs)
        except Exception:
            return []

        hits = []
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        ):
            hits.append(
                {
                    "text": doc,
                    "wing": meta.get("wing", "unknown"),
                    "room": meta.get("room", "unknown"),
                    "source_file": Path(meta.get("source_file", "?")).name,
                    "similarity": round(1 - dist, 3),
                    "metadata": meta,
                }
            )
        return hits


# ---------------------------------------------------------------------------
# MemoryStack — unified interface
# ---------------------------------------------------------------------------


class MemoryStack:
    """
    The full 4-layer stack. One class, one palace, everything works.

        stack = MemoryStack()
        print(stack.wake_up())                 # L0 + L1 (~600-900 tokens)
        print(stack.recall(wing="my_app"))     # L2 on-demand
        print(stack.search("pricing change"))  # L3 deep search
    """

    def __init__(self, palace_path: str = None, identity_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path
        self.identity_path = identity_path or os.path.expanduser("~/.mempalace/identity.txt")

        self.l0 = Layer0(self.identity_path)
        self.l1 = Layer1(self.palace_path)
        self.l2 = Layer2(self.palace_path)
        self.l3 = Layer3(self.palace_path)

    def wake_up(self, wing: str = None) -> str:
        """
        Generate wake-up text: L0 (identity) + L1 (essential story).
        Typically ~600-900 tokens. Inject into system prompt or first message.

        Args:
            wing: Optional wing filter for L1 (project-specific wake-up).
        """
        parts = []

        # L0: Identity
        parts.append(self.l0.render())
        parts.append("")

        # L1: Essential Story
        if wing:
            self.l1.wing = wing
        parts.append(self.l1.generate())

        return "\n".join(parts)

    def recall(self, wing: str = None, room: str = None, n_results: int = 10) -> str:
        """On-demand L2 retrieval filtered by wing/room."""
        return self.l2.retrieve(wing=wing, room=room, n_results=n_results)

    def search(self, query: str, wing: str = None, room: str = None, n_results: int = 5) -> str:
        """Deep L3 semantic search."""
        return self.l3.search(query, wing=wing, room=room, n_results=n_results)

    def status(self) -> dict:
        """Status of all layers."""
        result = {
            "palace_path": self.palace_path,
            "L0_identity": {
                "path": self.identity_path,
                "exists": os.path.exists(self.identity_path),
                "tokens": self.l0.token_estimate(),
            },
            "L1_essential": {
                "description": "Auto-generated from top palace drawers",
            },
            "L2_on_demand": {
                "description": "Wing/room filtered retrieval",
            },
            "L3_deep_search": {
                "description": "Full semantic search via ChromaDB",
            },
        }

        # Count drawers
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
            count = col.count()
            result["total_drawers"] = count
        except Exception:
            result["total_drawers"] = 0

        return result


# ---------------------------------------------------------------------------
# CLI (standalone)
# ---------------------------------------------------------------------------

# Note: this module uses a relative import (from .config ...), so it has to be
# run as part of its package (python -m <package>.layers); invoking the file
# directly as a plain script raises an ImportError.
if __name__ == "__main__":
    import json

    def usage():
        print("layers.py — 4-Layer Memory Stack")
        print()
        print("Usage:")
        print("  python layers.py wake-up                Show L0 + L1")
        print("  python layers.py wake-up --wing=NAME    Wake-up for a specific project")
        print("  python layers.py recall --wing=NAME     On-demand L2 retrieval")
        print("  python layers.py search <query>         Deep L3 search")
        print("  python layers.py status                 Show layer status")
        sys.exit(0)

    if len(sys.argv) < 2:
        usage()

    cmd = sys.argv[1]

    # Parse flags
    flags = {}
    positional = []
    for arg in sys.argv[2:]:
        if arg.startswith("--") and "=" in arg:
            key, val = arg.split("=", 1)
            flags[key.lstrip("-")] = val
        elif not arg.startswith("--"):
            positional.append(arg)

    palace_path = flags.get("palace")
    stack = MemoryStack(palace_path=palace_path)

    if cmd in ("wake-up", "wakeup"):
        wing = flags.get("wing")
        text = stack.wake_up(wing=wing)
        tokens = len(text) // 4
        print(f"Wake-up text (~{tokens} tokens):")
        print("=" * 50)
        print(text)

    elif cmd == "recall":
        wing = flags.get("wing")
        room = flags.get("room")
        text = stack.recall(wing=wing, room=room)
        print(text)

    elif cmd == "search":
        query = " ".join(positional) if positional else ""
        if not query:
            print("Usage: python layers.py search <query>")
            sys.exit(1)
        wing = flags.get("wing")
        room = flags.get("room")
        text = stack.search(query, wing=wing, room=room)
        print(text)

    elif cmd == "status":
        s = stack.status()
        print(json.dumps(s, indent=2))

    else:
        usage()
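Layer 2 and both Layer 3 search methods build the same wing/room metadata filter and truncate snippets the same way. The sketch below pulls those two steps into standalone helpers so they can be exercised without a ChromaDB collection. The helper names `build_where` and `compact` are hypothetical and do not exist in the source; the filter shapes and the 300-character limit are copied from the code above.

```python
def build_where(wing=None, room=None):
    """Build the metadata filter dict that the layers pass as `where`.

    Both filters set  -> combined with "$and"
    One filter set    -> a single-key dict
    Neither set       -> empty dict (the caller then omits `where`)
    """
    if wing and room:
        return {"$and": [{"wing": wing}, {"room": room}]}
    if wing:
        return {"wing": wing}
    if room:
        return {"room": room}
    return {}


def compact(text, limit=300):
    """Flatten newlines and cap the snippet at `limit` characters.

    When truncation happens, the last three characters are an ellipsis,
    so the result is never longer than `limit`.
    """
    snippet = text.strip().replace("\n", " ")
    if len(snippet) > limit:
        snippet = snippet[: limit - 3] + "..."
    return snippet


both = build_where(wing="my_app", room="pricing")
only_room = build_where(room="pricing")
long_snippet = compact("x" * 400)
short_snippet = compact("line one\nline two")
print(both)
print(only_room)
print(len(long_snippet), short_snippet)
```

Sharing helpers like these would keep the three call sites from drifting apart, at the cost of one more indirection; whether that trade-off suits the project is a judgment call.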
@@ -0,0 +1,714 @@
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
MemPalace MCP Server — read/write palace access for Claude Code
|
||||||
|
================================================================
|
||||||
|
Install: claude mcp add mempalace -- python /path/to/mcp_server.py
|
||||||
|
|
||||||
|
Tools (read):
|
||||||
|
mempalace_status — total drawers, wing/room breakdown
|
||||||
|
mempalace_list_wings — all wings with drawer counts
|
||||||
|
mempalace_list_rooms — rooms within a wing
|
||||||
|
mempalace_get_taxonomy — full wing → room → count tree
|
||||||
|
mempalace_search — semantic search, optional wing/room filter
|
||||||
|
mempalace_check_duplicate — check if content already exists before filing
|
||||||
|
|
||||||
|
Tools (write):
|
||||||
|
mempalace_add_drawer — file verbatim content into a wing/room
|
||||||
|
mempalace_delete_drawer — remove a drawer by ID
|
||||||
|
"""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import hashlib
|
||||||
|
from datetime import datetime
|
||||||
|
|
||||||
|
from .config import MempalaceConfig
|
||||||
|
from .searcher import search_memories
|
||||||
|
from .palace_graph import traverse, find_tunnels, graph_stats
|
||||||
|
from .knowledge_graph import KnowledgeGraph
|
||||||
|
|
||||||
|
_kg = KnowledgeGraph()
|
||||||
|
|
||||||
|
import chromadb
|
||||||
|
|
||||||
|
logging.basicConfig(level=logging.INFO, format="%(message)s", stream=sys.stderr)
|
||||||
|
logger = logging.getLogger("mempalace_mcp")
|
||||||
|
|
||||||
|
_config = MempalaceConfig()
|
||||||
|
|
||||||
|
|
||||||
|
def _get_collection(create=False):
|
||||||
|
"""Return the ChromaDB collection, or None on failure."""
|
||||||
|
try:
|
||||||
|
client = chromadb.PersistentClient(path=_config.palace_path)
|
||||||
|
if create:
|
||||||
|
return client.get_or_create_collection(_config.collection_name)
|
||||||
|
return client.get_collection(_config.collection_name)
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _no_palace():
|
||||||
|
return {
|
||||||
|
"error": "No palace found",
|
||||||
|
"palace_path": _config.palace_path,
|
||||||
|
"hint": "Run: mempalace init <dir> && mempalace mine <dir>",
|
||||||
|
}


# ==================== READ TOOLS ====================


def tool_status():
    col = _get_collection()
    if not col:
        return _no_palace()
    count = col.count()
    wings = {}
    rooms = {}
    try:
        all_meta = col.get(include=["metadatas"])["metadatas"]
        for m in all_meta:
            w = m.get("wing", "unknown")
            r = m.get("room", "unknown")
            wings[w] = wings.get(w, 0) + 1
            rooms[r] = rooms.get(r, 0) + 1
    except Exception:
        pass
    return {
        "total_drawers": count,
        "wings": wings,
        "rooms": rooms,
        "palace_path": _config.palace_path,
        "protocol": PALACE_PROTOCOL,
        "aaak_dialect": AAAK_SPEC,
    }


# ── AAAK Dialect Spec ─────────────────────────────────────────────────────────
# Included in status response so the AI learns it on first wake-up call.
# Also available via mempalace_get_aaak_spec tool.

PALACE_PROTOCOL = """IMPORTANT — MemPalace Memory Protocol:
1. ON WAKE-UP: Call mempalace_status to load palace overview + AAAK spec.
2. BEFORE RESPONDING about any person, project, or past event: call mempalace_kg_query or mempalace_search FIRST. Never guess — verify.
3. IF UNSURE about a fact (name, gender, age, relationship): say "let me check" and query the palace. Wrong is worse than slow.
4. AFTER EACH SESSION: call mempalace_diary_write to record what happened, what you learned, what matters.
5. WHEN FACTS CHANGE: call mempalace_kg_invalidate on the old fact, mempalace_kg_add for the new one.

This protocol ensures the AI KNOWS before it speaks. Storage is not memory — but storage + this protocol = memory."""

AAAK_SPEC = """AAAK is a compressed memory dialect that MemPalace uses for efficient storage.
It is designed to be readable by both humans and LLMs without decoding.

FORMAT:
ENTITIES: 3-letter uppercase codes. ALC=Alice, JOR=Jordan, RIL=Riley, MAX=Max, BEN=Ben.
EMOTIONS: *action markers* before/during text. *warm*=joy, *fierce*=determined, *raw*=vulnerable, *bloom*=tenderness.
STRUCTURE: Pipe-separated fields. FAM: family | PROJ: projects | ⚠: warnings/reminders.
DATES: ISO format (2026-03-31). COUNTS: Nx = N mentions (e.g., 570x).
IMPORTANCE: ★ to ★★★★★ (1-5 scale).
HALLS: hall_facts, hall_events, hall_discoveries, hall_preferences, hall_advice.
WINGS: wing_user, wing_agent, wing_team, wing_code, wing_myproject, wing_hardware, wing_ue5, wing_ai_research.
ROOMS: Hyphenated slugs representing named ideas (e.g., chromadb-setup, gpu-pricing).

EXAMPLE:
FAM: ALC→♡JOR | 2D(kids): RIL(18,sports) MAX(11,chess+swimming) | BEN(contributor)

Read AAAK naturally — expand codes mentally, treat *markers* as emotional context.
When WRITING AAAK: use entity codes, mark emotions, keep structure tight."""
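
The spec above describes AAAK lines as pipe-separated fields with 3-letter entity codes. As a rough illustration of how such a line could be read mechanically, here is a minimal sketch. The helper names `split_fields` and `expand_codes` are hypothetical and not part of MemPalace, and the registry is copied from the spec's own example codes.

```python
import re

# Entity registry taken from the example codes in the AAAK spec text.
ENTITY_CODES = {"ALC": "Alice", "JOR": "Jordan", "RIL": "Riley", "MAX": "Max", "BEN": "Ben"}


def split_fields(line):
    """Split an AAAK line into its pipe-separated fields."""
    return [field.strip() for field in line.split("|")]


def expand_codes(text, codes=ENTITY_CODES):
    """Replace standalone 3-letter uppercase codes found in the registry."""
    return re.sub(r"\b[A-Z]{3}\b", lambda m: codes.get(m.group(0), m.group(0)), text)


line = "FAM: ALC→♡JOR | 2D(kids): RIL(18,sports) MAX(11,chess+swimming) | BEN(contributor)"
fields = [expand_codes(field) for field in split_fields(line)]
for field in fields:
    print(field)
```

Labels such as `FAM` are left untouched here because they are not in the registry; a real decoder would need the full entity registry generated during onboarding.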


def tool_list_wings():
    col = _get_collection()
    if not col:
        return _no_palace()
    wings = {}
    try:
        all_meta = col.get(include=["metadatas"])["metadatas"]
        for m in all_meta:
            w = m.get("wing", "unknown")
            wings[w] = wings.get(w, 0) + 1
    except Exception:
        pass
    return {"wings": wings}


def tool_list_rooms(wing: str = None):
    col = _get_collection()
    if not col:
        return _no_palace()
    rooms = {}
    try:
        kwargs = {"include": ["metadatas"]}
        if wing:
            kwargs["where"] = {"wing": wing}
        all_meta = col.get(**kwargs)["metadatas"]
        for m in all_meta:
            r = m.get("room", "unknown")
            rooms[r] = rooms.get(r, 0) + 1
    except Exception:
        pass
    return {"wing": wing or "all", "rooms": rooms}


def tool_get_taxonomy():
    col = _get_collection()
    if not col:
        return _no_palace()
    taxonomy = {}
    try:
        all_meta = col.get(include=["metadatas"])["metadatas"]
        for m in all_meta:
            w = m.get("wing", "unknown")
            r = m.get("room", "unknown")
            if w not in taxonomy:
                taxonomy[w] = {}
            taxonomy[w][r] = taxonomy[w].get(r, 0) + 1
    except Exception:
        pass
    return {"taxonomy": taxonomy}


def tool_search(query: str, limit: int = 5, wing: str = None, room: str = None):
    return search_memories(
        query,
        palace_path=_config.palace_path,
        wing=wing,
        room=room,
        n_results=limit,
    )


def tool_check_duplicate(content: str, threshold: float = 0.9):
    col = _get_collection()
    if not col:
        return _no_palace()
    try:
        results = col.query(
            query_texts=[content],
            n_results=5,
            include=["metadatas", "documents", "distances"],
        )
        duplicates = []
        if results["ids"] and results["ids"][0]:
            for i, drawer_id in enumerate(results["ids"][0]):
                dist = results["distances"][0][i]
                similarity = round(1 - dist, 3)
                if similarity >= threshold:
                    meta = results["metadatas"][0][i]
                    doc = results["documents"][0][i]
                    duplicates.append(
                        {
                            "id": drawer_id,
                            "wing": meta.get("wing", "?"),
                            "room": meta.get("room", "?"),
                            "similarity": similarity,
                            "content": doc[:200] + "..." if len(doc) > 200 else doc,
                        }
                    )
        return {
            "is_duplicate": len(duplicates) > 0,
            "matches": duplicates,
        }
    except Exception as e:
        return {"error": str(e)}
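
The duplicate check above converts each query distance into a similarity with `1 - dist` and keeps matches at or above the threshold. The filtering step on its own can be sketched without ChromaDB as follows. The function name `filter_duplicates` and the sample IDs and distances are made up for illustration. Note also that `1 - dist` only behaves like a 0-1 similarity when the collection uses a bounded metric such as cosine distance, which is an assumption about the collection's configuration rather than something this diff shows.

```python
def filter_duplicates(ids, distances, threshold=0.9):
    """Keep matches whose similarity (1 - distance) meets the threshold."""
    matches = []
    for drawer_id, dist in zip(ids, distances):
        similarity = round(1 - dist, 3)
        if similarity >= threshold:
            matches.append({"id": drawer_id, "similarity": similarity})
    return matches


# Hypothetical query results: two near-identical drawers and one unrelated one.
hits = filter_duplicates(["drawer_a", "drawer_b", "drawer_c"], [0.04, 0.05, 0.35])
print(hits)
```

Only `drawer_a` and `drawer_b` clear the default 0.9 threshold here; lowering the threshold to 0.6 would also admit `drawer_c`.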


def tool_get_aaak_spec():
    """Return the AAAK dialect specification."""
    return {"aaak_spec": AAAK_SPEC}


def tool_traverse_graph(start_room: str, max_hops: int = 2):
    """Walk the palace graph from a room. Find connected ideas across wings."""
    col = _get_collection()
    if not col:
        return _no_palace()
    return traverse(start_room, col=col, max_hops=max_hops)


def tool_find_tunnels(wing_a: str = None, wing_b: str = None):
    """Find rooms that bridge two wings — the hallways connecting domains."""
    col = _get_collection()
    if not col:
        return _no_palace()
    return find_tunnels(wing_a, wing_b, col=col)


def tool_graph_stats():
    """Palace graph overview: nodes, tunnels, edges, connectivity."""
    col = _get_collection()
    if not col:
        return _no_palace()
    return graph_stats(col=col)


# ==================== WRITE TOOLS ====================


def tool_add_drawer(
    wing: str, room: str, content: str, source_file: str = None, added_by: str = "mcp"
):
    """File verbatim content into a wing/room. Checks for duplicates first."""
    col = _get_collection(create=True)
    if not col:
        return _no_palace()

    # Duplicate check
    dup = tool_check_duplicate(content, threshold=0.9)
    if dup.get("is_duplicate"):
        return {
            "success": False,
            "reason": "duplicate",
            "matches": dup["matches"],
        }

    drawer_id = f"drawer_{wing}_{room}_{hashlib.md5((content[:100] + datetime.now().isoformat()).encode()).hexdigest()[:16]}"

    try:
        col.add(
            ids=[drawer_id],
            documents=[content],
            metadatas=[
                {
                    "wing": wing,
                    "room": room,
                    "source_file": source_file or "",
                    "chunk_index": 0,
                    "added_by": added_by,
                    "filed_at": datetime.now().isoformat(),
                }
            ],
        )
        logger.info(f"Filed drawer: {drawer_id} → {wing}/{room}")
        return {"success": True, "drawer_id": drawer_id, "wing": wing, "room": room}
    except Exception as e:
        return {"success": False, "error": str(e)}


def tool_delete_drawer(drawer_id: str):
    """Delete a single drawer by ID."""
    col = _get_collection()
    if not col:
        return _no_palace()
    existing = col.get(ids=[drawer_id])
    if not existing["ids"]:
        return {"success": False, "error": f"Drawer not found: {drawer_id}"}
    try:
        col.delete(ids=[drawer_id])
        logger.info(f"Deleted drawer: {drawer_id}")
        return {"success": True, "drawer_id": drawer_id}
    except Exception as e:
        return {"success": False, "error": str(e)}


# ==================== KNOWLEDGE GRAPH ====================


def tool_kg_query(entity: str, as_of: str = None, direction: str = "both"):
    """Query the knowledge graph for an entity's relationships."""
    results = _kg.query_entity(entity, as_of=as_of, direction=direction)
    return {"entity": entity, "as_of": as_of, "facts": results, "count": len(results)}


def tool_kg_add(subject: str, predicate: str, object: str,
                valid_from: str = None, source_closet: str = None):
    """Add a relationship to the knowledge graph."""
    triple_id = _kg.add_triple(subject, predicate, object,
                               valid_from=valid_from, source_closet=source_closet)
    return {"success": True, "triple_id": triple_id,
            "fact": f"{subject} → {predicate} → {object}"}


def tool_kg_invalidate(subject: str, predicate: str, object: str, ended: str = None):
    """Mark a fact as no longer true (set end date)."""
    _kg.invalidate(subject, predicate, object, ended=ended)
    return {"success": True, "fact": f"{subject} → {predicate} → {object}", "ended": ended or "today"}


def tool_kg_timeline(entity: str = None):
    """Get chronological timeline of facts, optionally for one entity."""
    results = _kg.timeline(entity)
    return {"entity": entity or "all", "timeline": results, "count": len(results)}


def tool_kg_stats():
    """Knowledge graph overview: entities, triples, relationship types."""
    return _kg.stats()
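
The knowledge-graph tools above delegate to a `KnowledgeGraph` class whose internals are not shown in this part of the diff. As a rough sketch of how temporal validity and an `as_of` filter could work on top of SQLite, assuming a single table with `valid_from` and `valid_to` columns, consider the following. The table name, columns, and helper functions are hypothetical and are not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per fact, with an optional validity window.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    "subject TEXT, predicate TEXT, object TEXT, valid_from TEXT, valid_to TEXT)"
)


def add_triple(subject, predicate, obj, valid_from=None):
    """Store a fact that is open-ended until it is invalidated."""
    conn.execute(
        "INSERT INTO triples VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, valid_from),
    )


def invalidate(subject, predicate, obj, ended):
    """Close the validity window of a currently open fact."""
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, subject, predicate, obj),
    )


def query_entity(entity, as_of=None):
    """Return (predicate, object) pairs, optionally only those valid at as_of."""
    sql = "SELECT predicate, object FROM triples WHERE subject = ?"
    params = [entity]
    if as_of:
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        sql += (
            " AND (valid_from IS NULL OR valid_from <= ?)"
            " AND (valid_to IS NULL OR valid_to > ?)"
        )
        params += [as_of, as_of]
    return conn.execute(sql, params).fetchall()


add_triple("Max", "attends", "Year 6", valid_from="2025-09-01")
invalidate("Max", "attends", "Year 6", ended="2026-09-01")
add_triple("Max", "attends", "Year 7", valid_from="2026-09-01")

before = query_entity("Max", as_of="2026-03-31")
after = query_entity("Max", as_of="2026-10-01")
print(before, after)
```

This mirrors the protocol's "invalidate the old fact, add the new one" step: the old row is kept with an end date rather than deleted, so queries for an earlier date still see it.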


# ==================== AGENT DIARY ====================


def tool_diary_write(agent_name: str, entry: str, topic: str = "general"):
    """
    Write a diary entry for this agent. Each agent gets its own wing
    with a diary room. Entries are timestamped and accumulate over time.

    This is the agent's personal journal — observations, thoughts,
    what it worked on, what it noticed, what it thinks matters.
    """
    wing = f"wing_{agent_name.lower().replace(' ', '_')}"
    room = "diary"
    col = _get_collection(create=True)
    if not col:
        return _no_palace()

    now = datetime.now()
    entry_id = f"diary_{wing}_{now.strftime('%Y%m%d_%H%M%S')}_{hashlib.md5(entry[:50].encode()).hexdigest()[:8]}"

    try:
        col.add(
            ids=[entry_id],
            documents=[entry],
            metadatas=[{
                "wing": wing,
                "room": room,
                "hall": "hall_diary",
                "topic": topic,
                "type": "diary_entry",
                "agent": agent_name,
                "filed_at": now.isoformat(),
                "date": now.strftime("%Y-%m-%d"),
            }],
        )
        logger.info(f"Diary entry: {entry_id} → {wing}/diary/{topic}")
        return {
            "success": True,
            "entry_id": entry_id,
            "agent": agent_name,
            "topic": topic,
            "timestamp": now.isoformat(),
        }
    except Exception as e:
        return {"success": False, "error": str(e)}


def tool_diary_read(agent_name: str, last_n: int = 10):
    """
    Read an agent's recent diary entries (its personal journal).
    Returns the last N entries, newest first.
    """
    wing = f"wing_{agent_name.lower().replace(' ', '_')}"
    col = _get_collection()
    if not col:
        return _no_palace()

    try:
        results = col.get(
            where={"$and": [{"wing": wing}, {"room": "diary"}]},
            include=["documents", "metadatas"],
        )

        if not results["ids"]:
            return {"agent": agent_name, "entries": [], "message": "No diary entries yet."}

        # Combine and sort by timestamp, newest first
        entries = []
        for doc, meta in zip(results["documents"], results["metadatas"]):
            entries.append({
                "date": meta.get("date", ""),
                "timestamp": meta.get("filed_at", ""),
                "topic": meta.get("topic", ""),
                "content": doc,
            })

        entries.sort(key=lambda x: x["timestamp"], reverse=True)
        entries = entries[:last_n]

        return {
            "agent": agent_name,
            "entries": entries,
            "total": len(results["ids"]),
            "showing": len(entries),
        }
    except Exception as e:
        return {"error": str(e)}


# ==================== MCP PROTOCOL ====================

TOOLS = {
    "mempalace_status": {
        "description": "Palace overview — total drawers, wing and room counts",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_status,
    },
    "mempalace_list_wings": {
        "description": "List all wings with drawer counts",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_list_wings,
    },
    "mempalace_list_rooms": {
        "description": "List rooms within a wing (or all rooms if no wing given)",
        "input_schema": {
            "type": "object",
            "properties": {
                "wing": {"type": "string", "description": "Wing to list rooms for (optional)"},
            },
        },
        "handler": tool_list_rooms,
    },
    "mempalace_get_taxonomy": {
        "description": "Full taxonomy: wing → room → drawer count",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_get_taxonomy,
    },
    "mempalace_get_aaak_spec": {
        "description": "Get the AAAK dialect specification — the compressed memory format MemPalace uses. Call this if you need to read or write AAAK-compressed memories.",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_get_aaak_spec,
    },
    "mempalace_kg_query": {
        "description": "Query the knowledge graph for an entity's relationships. Returns typed facts with temporal validity. E.g. 'Max' → child_of Alice, loves chess, does swimming. Filter by date with as_of to see what was true at a point in time.",
        "input_schema": {
            "type": "object",
            "properties": {
                "entity": {"type": "string", "description": "Entity to query (e.g. 'Max', 'MyProject', 'Alice')"},
                "as_of": {"type": "string", "description": "Date filter — only facts valid at this date (YYYY-MM-DD, optional)"},
                "direction": {"type": "string", "description": "outgoing (entity→?), incoming (?→entity), or both (default: both)"},
            },
            "required": ["entity"],
        },
        "handler": tool_kg_query,
    },
    "mempalace_kg_add": {
        "description": "Add a fact to the knowledge graph. Subject → predicate → object with optional time window. E.g. ('Max', 'started_school', 'Year 7', valid_from='2026-09-01').",
        "input_schema": {
            "type": "object",
            "properties": {
                "subject": {"type": "string", "description": "The entity doing/being something"},
                "predicate": {"type": "string", "description": "The relationship type (e.g. 'loves', 'works_on', 'daughter_of')"},
                "object": {"type": "string", "description": "The entity being connected to"},
                "valid_from": {"type": "string", "description": "When this became true (YYYY-MM-DD, optional)"},
                "source_closet": {"type": "string", "description": "Closet ID where this fact appears (optional)"},
            },
            "required": ["subject", "predicate", "object"],
        },
        "handler": tool_kg_add,
    },
    "mempalace_kg_invalidate": {
        "description": "Mark a fact as no longer true. E.g. ankle injury resolved, job ended, moved house.",
        "input_schema": {
            "type": "object",
            "properties": {
                "subject": {"type": "string", "description": "Entity"},
                "predicate": {"type": "string", "description": "Relationship"},
                "object": {"type": "string", "description": "Connected entity"},
                "ended": {"type": "string", "description": "When it stopped being true (YYYY-MM-DD, default: today)"},
            },
            "required": ["subject", "predicate", "object"],
        },
        "handler": tool_kg_invalidate,
    },
    "mempalace_kg_timeline": {
        "description": "Chronological timeline of facts. Shows the story of an entity (or everything) in order.",
        "input_schema": {
            "type": "object",
            "properties": {
                "entity": {"type": "string", "description": "Entity to get timeline for (optional — omit for full timeline)"},
            },
        },
        "handler": tool_kg_timeline,
    },
    "mempalace_kg_stats": {
        "description": "Knowledge graph overview: entities, triples, current vs expired facts, relationship types.",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_kg_stats,
    },
    "mempalace_traverse": {
        "description": "Walk the palace graph from a room. Shows connected ideas across wings — the tunnels. Like following a thread through the palace: start at 'chromadb-setup' in wing_code, discover it connects to wing_myproject (planning) and wing_user (feelings about it).",
        "input_schema": {
            "type": "object",
            "properties": {
                "start_room": {"type": "string", "description": "Room to start from (e.g. 'chromadb-setup', 'riley-school')"},
                "max_hops": {"type": "integer", "description": "How many connections to follow (default: 2)"},
            },
            "required": ["start_room"],
        },
        "handler": tool_traverse_graph,
    },
    "mempalace_find_tunnels": {
        "description": "Find rooms that bridge two wings — the hallways connecting different domains. E.g. what topics connect wing_code to wing_team?",
        "input_schema": {
            "type": "object",
            "properties": {
                "wing_a": {"type": "string", "description": "First wing (optional)"},
                "wing_b": {"type": "string", "description": "Second wing (optional)"},
            },
        },
        "handler": tool_find_tunnels,
    },
    "mempalace_graph_stats": {
        "description": "Palace graph overview: total rooms, tunnel connections, edges between wings.",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_graph_stats,
    },
    "mempalace_search": {
        "description": "Semantic search. Returns verbatim drawer content with similarity scores.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to search for"},
                "limit": {"type": "integer", "description": "Max results (default 5)"},
                "wing": {"type": "string", "description": "Filter by wing (optional)"},
                "room": {"type": "string", "description": "Filter by room (optional)"},
            },
            "required": ["query"],
        },
        "handler": tool_search,
    },
    "mempalace_check_duplicate": {
        "description": "Check if content already exists in the palace before filing",
        "input_schema": {
            "type": "object",
            "properties": {
                "content": {"type": "string", "description": "Content to check"},
                "threshold": {
                    "type": "number",
                    "description": "Similarity threshold 0-1 (default 0.9)",
                },
            },
            "required": ["content"],
        },
        "handler": tool_check_duplicate,
    },
    "mempalace_add_drawer": {
        "description": "File verbatim content into the palace. Checks for duplicates first.",
        "input_schema": {
            "type": "object",
            "properties": {
                "wing": {"type": "string", "description": "Wing (project name)"},
                "room": {
                    "type": "string",
                    "description": "Room (aspect: backend, decisions, meetings...)",
                },
                "content": {
                    "type": "string",
                    "description": "Verbatim content to store — exact words, never summarized",
                },
                "source_file": {"type": "string", "description": "Where this came from (optional)"},
                "added_by": {"type": "string", "description": "Who is filing this (default: mcp)"},
            },
            "required": ["wing", "room", "content"],
        },
        "handler": tool_add_drawer,
    },
    "mempalace_delete_drawer": {
        "description": "Delete a drawer by ID. Irreversible.",
        "input_schema": {
            "type": "object",
            "properties": {
                "drawer_id": {"type": "string", "description": "ID of the drawer to delete"},
            },
            "required": ["drawer_id"],
        },
        "handler": tool_delete_drawer,
    },
    "mempalace_diary_write": {
        "description": "Write to your personal agent diary in AAAK format. Your observations, thoughts, what you worked on, what matters. Each agent has their own diary with full history. Write in AAAK for compression — e.g. 'SESSION:2026-04-04|built.palace.graph+diary.tools|ALC.req:agent.diaries.in.aaak|★★★'. Use entity codes from the AAAK spec.",
        "input_schema": {
            "type": "object",
            "properties": {
                "agent_name": {"type": "string", "description": "Your name — each agent gets their own diary wing"},
                "entry": {"type": "string", "description": "Your diary entry in AAAK format — compressed, entity-coded, emotion-marked"},
                "topic": {"type": "string", "description": "Topic tag (optional, default: general)"},
            },
            "required": ["agent_name", "entry"],
        },
        "handler": tool_diary_write,
    },
    "mempalace_diary_read": {
        "description": "Read your recent diary entries (in AAAK). See what past versions of yourself recorded — your journal across sessions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "agent_name": {"type": "string", "description": "Your name — each agent gets their own diary wing"},
                "last_n": {"type": "integer", "description": "Number of recent entries to read (default: 10)"},
            },
            "required": ["agent_name"],
        },
        "handler": tool_diary_read,
    },
}


def handle_request(request):
    method = request.get("method", "")
    params = request.get("params", {})
    req_id = request.get("id")

    if method == "initialize":
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "result": {
                "protocolVersion": "2024-11-05",
                "capabilities": {"tools": {}},
                "serverInfo": {"name": "mempalace", "version": "2.0.0"},
            },
        }
    elif method == "notifications/initialized":
        return None
    elif method == "tools/list":
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "result": {
                "tools": [
                    {"name": n, "description": t["description"], "inputSchema": t["input_schema"]}
                    for n, t in TOOLS.items()
                ]
            },
        }
    elif method == "tools/call":
        tool_name = params.get("name")
        tool_args = params.get("arguments", {})
        if tool_name not in TOOLS:
            return {
                "jsonrpc": "2.0",
                "id": req_id,
                "error": {"code": -32601, "message": f"Unknown tool: {tool_name}"},
            }
        try:
            result = TOOLS[tool_name]["handler"](**tool_args)
            return {
                "jsonrpc": "2.0",
                "id": req_id,
                "result": {"content": [{"type": "text", "text": json.dumps(result, indent=2)}]},
            }
        except Exception as e:
            logger.error(f"Tool error in {tool_name}: {e}")
            return {"jsonrpc": "2.0", "id": req_id, "error": {"code": -32000, "message": str(e)}}

    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "error": {"code": -32601, "message": f"Unknown method: {method}"},
    }


def main():
    logger.info("MemPalace MCP Server starting...")
    while True:
        try:
            line = sys.stdin.readline()
            if not line:
                break
            line = line.strip()
            if not line:
                continue
            request = json.loads(line)
            response = handle_request(request)
            if response is not None:
                sys.stdout.write(json.dumps(response) + "\n")
                sys.stdout.flush()
        except KeyboardInterrupt:
            break
        except Exception as e:
            logger.error(f"Server error: {e}")


if __name__ == "__main__":
    main()
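
The server above reads one JSON-RPC message per line from stdin and writes one response per line to stdout. To make the `tools/call` request and response shape concrete, here is a small self-contained sketch of the same dispatch pattern. The `echo` tool and the `handle` function are stand-ins invented for this example and are not part of the MemPalace tool set.

```python
import json

# A stand-in tool registry with one hypothetical tool.
TOOLS = {"echo": {"handler": lambda text: {"echo": text}}}


def handle(request):
    """Dispatch a tools/call request to the matching handler."""
    req_id = request.get("id")
    if request.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Unknown method"}}
    params = request.get("params", {})
    name = params.get("name")
    if name not in TOOLS:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": f"Unknown tool: {name}"}}
    result = TOOLS[name]["handler"](**params.get("arguments", {}))
    # Tool output is wrapped as a single text content item, serialized as JSON.
    return {"jsonrpc": "2.0", "id": req_id,
            "result": {"content": [{"type": "text", "text": json.dumps(result)}]}}


# One line as it would arrive on stdin.
line = '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "echo", "arguments": {"text": "hi"}}}'
response = handle(json.loads(line))
print(json.dumps(response))
```

Because arguments are passed as keyword arguments, a request with a misspelled or extra argument name raises a `TypeError` in the handler; the real server catches this and returns it as a `-32000` error.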
@@ -0,0 +1,417 @@
#!/usr/bin/env python3
"""
miner.py — Files everything into the palace.

Reads mempalace.yaml from the project directory to know the wing + rooms.
Routes each file to the right room based on content.
Stores verbatim chunks as drawers. No summaries. Ever.
"""

import os
import sys
import hashlib
from pathlib import Path
from datetime import datetime
from collections import defaultdict

import chromadb

READABLE_EXTENSIONS = {
    ".txt",
    ".md",
    ".py",
    ".js",
    ".ts",
    ".jsx",
    ".tsx",
    ".json",
    ".yaml",
    ".yml",
    ".html",
    ".css",
    ".java",
    ".go",
    ".rs",
    ".rb",
    ".sh",
    ".csv",
    ".sql",
    ".toml",
}

SKIP_DIRS = {
    ".git",
    "node_modules",
    "__pycache__",
    ".venv",
    "venv",
    "env",
    "dist",
    "build",
    ".next",
    "coverage",
    ".mempalace",
}

CHUNK_SIZE = 800  # chars per drawer
CHUNK_OVERLAP = 100  # overlap between chunks
MIN_CHUNK_SIZE = 50  # skip tiny chunks


# =============================================================================
# CONFIG
# =============================================================================


def load_config(project_dir: str) -> dict:
    """Load mempalace.yaml from project directory (falls back to mempal.yaml)."""
    import yaml

    config_path = Path(project_dir).expanduser().resolve() / "mempalace.yaml"
    if not config_path.exists():
        # Fallback to legacy name
        legacy_path = Path(project_dir).expanduser().resolve() / "mempal.yaml"
        if legacy_path.exists():
            config_path = legacy_path
        else:
            print(f"ERROR: No mempalace.yaml found in {project_dir}")
            print(f"Run: mempalace init {project_dir}")
            sys.exit(1)
    with open(config_path) as f:
        return yaml.safe_load(f)
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# FILE ROUTING — which room does this file belong to?
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def detect_room(filepath: Path, content: str, rooms: list, project_path: Path) -> str:
    """
    Route a file to the right room.

    Priority:
    1. Folder path matches a room name
    2. Filename matches a room name
    3. Content keyword scoring (room keywords + room name)
    4. Fallback: "general"
    """
    relative = str(filepath.relative_to(project_path)).lower()
    filename = filepath.stem.lower()
    content_lower = content[:2000].lower()

    # Priority 1: folder path contains room name
    path_parts = relative.replace("\\", "/").split("/")
    for part in path_parts[:-1]:  # skip filename itself
        for room in rooms:
            if room["name"].lower() in part or part in room["name"].lower():
                return room["name"]

    # Priority 2: filename matches room name
    for room in rooms:
        if room["name"].lower() in filename or filename in room["name"].lower():
            return room["name"]

    # Priority 3: keyword scoring from room keywords + name
    scores = defaultdict(int)
    for room in rooms:
        keywords = room.get("keywords", []) + [room["name"]]
        for kw in keywords:
            count = content_lower.count(kw.lower())
            scores[room["name"]] += count

    if scores:
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            return best

    return "general"
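The routing priority in `detect_room` can be tried in isolation. The block below is a minimal sketch rather than the module's own code: `route` is a hypothetical stand-in that takes a relative path string instead of `Path` objects, and the sample rooms are invented for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def route(relative_path: str, content: str, rooms: list) -> str:
    """Return the room name for a file, using the four-step priority."""
    path = PurePosixPath(relative_path.lower())
    # 1. A parent folder name contains (or is contained in) a room name.
    for part in path.parts[:-1]:
        for room in rooms:
            name = room["name"].lower()
            if name in part or part in name:
                return room["name"]
    # 2. The filename stem contains (or is contained in) a room name.
    for room in rooms:
        name = room["name"].lower()
        if name in path.stem or path.stem in name:
            return room["name"]
    # 3. Count keyword hits in the first 2000 characters of content.
    scores = defaultdict(int)
    head = content[:2000].lower()
    for room in rooms:
        for kw in room.get("keywords", []) + [room["name"]]:
            scores[room["name"]] += head.count(kw.lower())
    best = max(scores, key=scores.get, default=None)
    if best is not None and scores[best] > 0:
        return best
    # 4. Nothing matched.
    return "general"


# Invented sample rooms, for illustration only.
rooms = [
    {"name": "backend", "keywords": ["api", "database"]},
    {"name": "docs", "keywords": ["tutorial"]},
]
by_folder = route("backend/server.py", "", rooms)
by_filename = route("notes/docs_index.md", "", rooms)
by_keyword = route("misc/notes.txt", "The API talks to the database.", rooms)
fallback = route("misc/todo.txt", "buy milk", rooms)
```

Because steps 1 and 2 test substrings in both directions, a very short folder or file name (for example a folder called `a`) matches any room whose name contains it, so short names may be routed by accident.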
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# CHUNKING
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def chunk_text(content: str, source_file: str) -> list:
|
||||||
|
"""
|
||||||
|
Split content into drawer-sized chunks.
|
||||||
|
Tries to split on paragraph/line boundaries.
|
||||||
|
Returns list of {"content": str, "chunk_index": int}
|
||||||
|
"""
|
||||||
|
# Clean up
|
||||||
|
content = content.strip()
|
||||||
|
if not content:
|
||||||
|
return []
|
||||||
|
|
||||||
|
chunks = []
|
||||||
|
start = 0
|
||||||
|
chunk_index = 0
|
||||||
|
|
||||||
|
while start < len(content):
|
||||||
|
end = min(start + CHUNK_SIZE, len(content))
|
||||||
|
|
||||||
|
# Try to break at paragraph boundary
|
||||||
|
if end < len(content):
|
||||||
|
newline_pos = content.rfind("\n\n", start, end)
|
||||||
|
if newline_pos > start + CHUNK_SIZE // 2:
|
||||||
|
end = newline_pos
|
||||||
|
else:
|
||||||
|
newline_pos = content.rfind("\n", start, end)
|
||||||
|
if newline_pos > start + CHUNK_SIZE // 2:
|
||||||
|
end = newline_pos
|
||||||
|
|
||||||
|
chunk = content[start:end].strip()
|
||||||
|
if len(chunk) >= MIN_CHUNK_SIZE:
|
||||||
|
chunks.append(
|
||||||
|
{
|
||||||
|
"content": chunk,
|
||||||
|
"chunk_index": chunk_index,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
chunk_index += 1
|
||||||
|
|
||||||
|
start = end - CHUNK_OVERLAP if end < len(content) else end
|
||||||
|
|
||||||
|
return chunks
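The windowing in `chunk_text` can be sketched as follows. `CHUNK_SIZE`, `CHUNK_OVERLAP`, and `MIN_CHUNK_SIZE` are defined elsewhere in the module and are not visible in this part of the diff, so the values below are placeholders chosen for the example; `chunk` is a hypothetical name.

```python
CHUNK_SIZE = 800      # assumed value, not taken from the source
CHUNK_OVERLAP = 100   # assumed value
MIN_CHUNK_SIZE = 50   # assumed value


def chunk(content: str) -> list:
    content = content.strip()
    chunks, start = [], 0
    while start < len(content):
        end = min(start + CHUNK_SIZE, len(content))
        if end < len(content):
            # Prefer a blank-line break, then a single newline, but only
            # if the break lies in the second half of the window.
            for sep in ("\n\n", "\n"):
                pos = content.rfind(sep, start, end)
                if pos > start + CHUNK_SIZE // 2:
                    end = pos
                    break
        piece = content[start:end].strip()
        if len(piece) >= MIN_CHUNK_SIZE:
            chunks.append({"content": piece, "chunk_index": len(chunks)})
        # Step back by the overlap unless this was the final window.
        start = end - CHUNK_OVERLAP if end < len(content) else end
    return chunks


paragraphs = [f"Paragraph {i}. " + "word " * 60 for i in range(6)]
text = "\n\n".join(paragraphs)
pieces = chunk(text)
```

The loop only makes progress if the overlap is smaller than the shortest window it can produce. Since a break is accepted only past the midpoint, keeping `CHUNK_OVERLAP` below `CHUNK_SIZE // 2` is sufficient under these assumptions.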
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# PALACE — ChromaDB operations
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def get_collection(palace_path: str):
    """Open the palace's drawer collection, creating it on first use."""
    os.makedirs(palace_path, exist_ok=True)
    client = chromadb.PersistentClient(path=palace_path)
    # get_or_create_collection avoids using a broad except for control flow.
    return client.get_or_create_collection("mempalace_drawers")
|
||||||
|
|
||||||
|
|
||||||
|
def file_already_mined(collection, source_file: str) -> bool:
|
||||||
|
"""Fast check: has this file been filed before?"""
|
||||||
|
try:
|
||||||
|
results = collection.get(where={"source_file": source_file}, limit=1)
|
||||||
|
return len(results.get("ids", [])) > 0
|
||||||
|
except Exception:
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def add_drawer(
|
||||||
|
collection, wing: str, room: str, content: str, source_file: str, chunk_index: int, agent: str
|
||||||
|
):
|
||||||
|
"""Add one drawer to the palace."""
|
||||||
|
drawer_id = f"drawer_{wing}_{room}_{hashlib.md5((source_file + str(chunk_index)).encode()).hexdigest()[:16]}"
|
||||||
|
try:
|
||||||
|
collection.add(
|
||||||
|
documents=[content],
|
||||||
|
ids=[drawer_id],
|
||||||
|
metadatas=[
|
||||||
|
{
|
||||||
|
"wing": wing,
|
||||||
|
"room": room,
|
||||||
|
"source_file": source_file,
|
||||||
|
"chunk_index": chunk_index,
|
||||||
|
"added_by": agent,
|
||||||
|
"filed_at": datetime.now().isoformat(),
|
||||||
|
}
|
||||||
|
],
|
||||||
|
)
|
||||||
|
return True
|
||||||
|
except Exception as e:
|
||||||
|
if "already exists" in str(e).lower() or "duplicate" in str(e).lower():
|
||||||
|
return False
|
||||||
|
raise
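Drawer IDs are derived from the source path and chunk index, which is what lets a repeated `add` be detected as a duplicate. A minimal sketch of that derivation, using a hypothetical helper `drawer_id` and invented paths:

```python
import hashlib


def drawer_id(wing: str, room: str, source_file: str, chunk_index: int) -> str:
    # Same recipe as add_drawer: md5 of path + index, first 16 hex characters.
    digest = hashlib.md5((source_file + str(chunk_index)).encode()).hexdigest()[:16]
    return f"drawer_{wing}_{room}_{digest}"


first = drawer_id("projects", "backend", "/repo/api.py", 0)
again = drawer_id("projects", "backend", "/repo/api.py", 0)
other = drawer_id("projects", "backend", "/repo/api.py", 1)

# Plain concatenation is ambiguous: "/a/f1" + "0" and "/a/f" + "10" hash alike.
clash_a = drawer_id("w", "r", "/a/f1", 0)
clash_b = drawer_id("w", "r", "/a/f", 10)
```

Re-mining the same file yields the same IDs, so the store can reject them. The last two calls show an edge case: because the path and index are joined without a separator, two different (path, index) pairs can produce the same ID when one path ends in a digit.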
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# PROCESS ONE FILE
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def process_file(
|
||||||
|
filepath: Path,
|
||||||
|
project_path: Path,
|
||||||
|
collection,
|
||||||
|
wing: str,
|
||||||
|
rooms: list,
|
||||||
|
agent: str,
|
||||||
|
dry_run: bool,
|
||||||
|
) -> int:
|
||||||
|
"""Read, chunk, route, and file one file. Returns drawer count."""
|
||||||
|
|
||||||
|
# Skip if already filed
|
||||||
|
source_file = str(filepath)
|
||||||
|
if not dry_run and file_already_mined(collection, source_file):
|
||||||
|
return 0
|
||||||
|
|
||||||
|
try:
|
||||||
|
content = filepath.read_text(encoding="utf-8", errors="replace")
|
||||||
|
except Exception:
|
||||||
|
return 0
|
||||||
|
|
||||||
|
content = content.strip()
|
||||||
|
if len(content) < MIN_CHUNK_SIZE:
|
||||||
|
return 0
|
||||||
|
|
||||||
|
room = detect_room(filepath, content, rooms, project_path)
|
||||||
|
chunks = chunk_text(content, source_file)
|
||||||
|
|
||||||
|
if dry_run:
|
||||||
|
print(f" [DRY RUN] {filepath.name} → room:{room} ({len(chunks)} drawers)")
|
||||||
|
return len(chunks)
|
||||||
|
|
||||||
|
drawers_added = 0
|
||||||
|
for chunk in chunks:
|
||||||
|
added = add_drawer(
|
||||||
|
collection=collection,
|
||||||
|
wing=wing,
|
||||||
|
room=room,
|
||||||
|
content=chunk["content"],
|
||||||
|
source_file=source_file,
|
||||||
|
chunk_index=chunk["chunk_index"],
|
||||||
|
agent=agent,
|
||||||
|
)
|
||||||
|
if added:
|
||||||
|
drawers_added += 1
|
||||||
|
|
||||||
|
return drawers_added
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# SCAN PROJECT
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def scan_project(project_dir: str) -> list:
|
||||||
|
"""Return list of all readable file paths."""
|
||||||
|
project_path = Path(project_dir).expanduser().resolve()
|
||||||
|
files = []
|
||||||
|
for root, dirs, filenames in os.walk(project_path):
|
||||||
|
dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
|
||||||
|
for filename in filenames:
|
||||||
|
filepath = Path(root) / filename
|
||||||
|
if filepath.suffix.lower() in READABLE_EXTENSIONS:
|
||||||
|
# Skip config files
|
||||||
|
if filename in (
|
||||||
|
"mempalace.yaml",
|
||||||
|
"mempalace.yml",
|
||||||
|
"mempal.yaml",
|
||||||
|
"mempal.yml",
|
||||||
|
".gitignore",
|
||||||
|
"package-lock.json",
|
||||||
|
):
|
||||||
|
continue
|
||||||
|
files.append(filepath)
|
||||||
|
return files
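The directory pruning in `scan_project` relies on editing `dirs` in place during `os.walk`. The sketch below shows the effect on a temporary tree; `SKIP_DIRS` and `READABLE_EXTENSIONS` are defined elsewhere in the module, so their contents here are assumptions, and `scan` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", "__pycache__"}   # assumed contents
READABLE_EXTENSIONS = {".py", ".md", ".txt"}           # assumed contents


def scan(project_dir: str) -> list:
    files = []
    for root, dirs, filenames in os.walk(project_dir):
        # Assigning to dirs[:] edits the list os.walk is iterating over,
        # so skipped directories are never descended into.
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for filename in filenames:
            path = Path(root) / filename
            if path.suffix.lower() in READABLE_EXTENSIONS:
                files.append(path)
    return files


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "src").mkdir()
    (Path(tmp) / "node_modules").mkdir()
    (Path(tmp) / "src" / "app.py").write_text("print('hi')")
    (Path(tmp) / "node_modules" / "lib.py").write_text("x = 1")
    (Path(tmp) / "logo.png").write_bytes(b"\x89PNG")
    found = sorted(p.name for p in scan(tmp))
```

Rebinding the name (`dirs = [...]`) would not prune anything, because `os.walk` keeps its reference to the original list; the slice assignment is what matters.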
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# MAIN: MINE
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def mine(
|
||||||
|
project_dir: str,
|
||||||
|
palace_path: str,
|
||||||
|
wing_override: str = None,
|
||||||
|
agent: str = "mempalace",
|
||||||
|
limit: int = 0,
|
||||||
|
dry_run: bool = False,
|
||||||
|
):
|
||||||
|
"""Mine a project directory into the palace."""
|
||||||
|
|
||||||
|
project_path = Path(project_dir).expanduser().resolve()
|
||||||
|
config = load_config(project_dir)
|
||||||
|
|
||||||
|
wing = wing_override or config["wing"]
|
||||||
|
rooms = config.get("rooms", [{"name": "general", "description": "All project files"}])
|
||||||
|
|
||||||
|
files = scan_project(project_dir)
|
||||||
|
if limit > 0:
|
||||||
|
files = files[:limit]
|
||||||
|
|
||||||
|
print(f"\n{'=' * 55}")
|
||||||
|
print(" MemPalace Mine")
|
||||||
|
print(f"{'=' * 55}")
|
||||||
|
print(f" Wing: {wing}")
|
||||||
|
print(f" Rooms: {', '.join(r['name'] for r in rooms)}")
|
||||||
|
print(f" Files: {len(files)}")
|
||||||
|
print(f" Palace: {palace_path}")
|
||||||
|
if dry_run:
|
||||||
|
print(" DRY RUN — nothing will be filed")
|
||||||
|
print(f"{'─' * 55}\n")
|
||||||
|
|
||||||
|
if not dry_run:
|
||||||
|
collection = get_collection(palace_path)
|
||||||
|
else:
|
||||||
|
collection = None
|
||||||
|
|
||||||
|
total_drawers = 0
|
||||||
|
files_skipped = 0
|
||||||
|
room_counts = defaultdict(int)
|
||||||
|
|
||||||
|
    for i, filepath in enumerate(files, 1):
        drawers = process_file(
            filepath=filepath,
            project_path=project_path,
            collection=collection,
            wing=wing,
            rooms=rooms,
            agent=agent,
            dry_run=dry_run,
        )
        if drawers == 0 and not dry_run:
            files_skipped += 1
        else:
            total_drawers += drawers
            # Route with the file's text so the per-room summary matches the
            # room process_file() used; empty content skipped keyword scoring.
            try:
                text = filepath.read_text(encoding="utf-8", errors="replace").strip()
            except Exception:
                text = ""
            room = detect_room(filepath, text, rooms, project_path)
            room_counts[room] += 1
            if not dry_run:
                print(f" ✓ [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers}")

    print(f"\n{'=' * 55}")
    print(" Done.")
    print(f" Files processed: {len(files) - files_skipped}")
    print(f" Files skipped (already filed, unreadable, or too short): {files_skipped}")
    print(f" Drawers filed: {total_drawers}")
    print("\n By room:")
    for room, count in sorted(room_counts.items(), key=lambda x: x[1], reverse=True):
        print(f" {room:20} {count} files")
    print('\n Next: mempalace search "what you\'re looking for"')
    print(f"{'=' * 55}\n")
|
||||||
|
|
||||||
|
|
||||||
|
# =============================================================================
|
||||||
|
# STATUS
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def status(palace_path: str):
|
||||||
|
"""Show what's been filed in the palace."""
|
||||||
|
try:
|
||||||
|
client = chromadb.PersistentClient(path=palace_path)
|
||||||
|
col = client.get_collection("mempalace_drawers")
|
||||||
|
except Exception:
|
||||||
|
print(f"\n No palace found at {palace_path}")
|
||||||
|
print(" Run: mempalace init <dir> then mempalace mine <dir>")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Count by wing and room
|
||||||
|
r = col.get(limit=10000, include=["metadatas"])
|
||||||
|
metas = r["metadatas"]
|
||||||
|
|
||||||
|
wing_rooms = defaultdict(lambda: defaultdict(int))
|
||||||
|
for m in metas:
|
||||||
|
wing_rooms[m.get("wing", "?")][m.get("room", "?")] += 1
|
||||||
|
|
||||||
|
print(f"\n{'=' * 55}")
|
||||||
|
print(f" MemPalace Status — {len(metas)} drawers")
|
||||||
|
print(f"{'=' * 55}\n")
|
||||||
|
for wing, rooms in sorted(wing_rooms.items()):
|
||||||
|
print(f" WING: {wing}")
|
||||||
|
for room, count in sorted(rooms.items(), key=lambda x: x[1], reverse=True):
|
||||||
|
print(f" ROOM: {room:20} {count:5} drawers")
|
||||||
|
print()
|
||||||
|
print(f"{'=' * 55}\n")
|
||||||
@@ -0,0 +1,253 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
normalize.py — Convert any chat export format to MemPalace transcript format.
|
||||||
|
|
||||||
|
Supported:
|
||||||
|
- Plain text with > markers (pass through)
|
||||||
|
- Claude.ai JSON export
|
||||||
|
- ChatGPT conversations.json
|
||||||
|
- Claude Code JSONL
|
||||||
|
- Slack JSON export
|
||||||
|
- Plain text (pass through for paragraph chunking)
|
||||||
|
|
||||||
|
No API key. No internet. Everything local.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
|
||||||
|
def normalize(filepath: str) -> str:
|
||||||
|
"""
|
||||||
|
Load a file and normalize to transcript format if it's a chat export.
|
||||||
|
Plain text files pass through unchanged.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
|
||||||
|
content = f.read()
|
||||||
|
except Exception as e:
|
||||||
|
raise IOError(f"Could not read {filepath}: {e}") from e
|
||||||
|
|
||||||
|
if not content.strip():
|
||||||
|
return content
|
||||||
|
|
||||||
|
# Already has > markers — pass through
|
||||||
|
lines = content.split("\n")
|
||||||
|
if sum(1 for line in lines if line.strip().startswith(">")) >= 3:
|
||||||
|
return content
|
||||||
|
|
||||||
|
# Try JSON normalization
|
||||||
|
ext = Path(filepath).suffix.lower()
|
||||||
|
if ext in (".json", ".jsonl") or content.strip()[:1] in ("{", "["):
|
||||||
|
normalized = _try_normalize_json(content)
|
||||||
|
if normalized:
|
||||||
|
return normalized
|
||||||
|
|
||||||
|
return content
|
||||||
|
|
||||||
|
|
||||||
|
def _try_normalize_json(content: str) -> Optional[str]:
|
||||||
|
"""Try all known JSON chat schemas."""
|
||||||
|
|
||||||
|
normalized = _try_claude_code_jsonl(content)
|
||||||
|
if normalized:
|
||||||
|
return normalized
|
||||||
|
|
||||||
|
try:
|
||||||
|
data = json.loads(content)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
for parser in (_try_claude_ai_json, _try_chatgpt_json, _try_slack_json):
|
||||||
|
normalized = parser(data)
|
||||||
|
if normalized:
|
||||||
|
return normalized
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _try_claude_code_jsonl(content: str) -> Optional[str]:
|
||||||
|
"""Claude Code JSONL sessions."""
|
||||||
|
lines = [line.strip() for line in content.strip().split("\n") if line.strip()]
|
||||||
|
messages = []
|
||||||
|
for line in lines:
|
||||||
|
try:
|
||||||
|
entry = json.loads(line)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
continue
|
||||||
|
if not isinstance(entry, dict):
|
||||||
|
continue
|
||||||
|
msg_type = entry.get("type", "")
|
||||||
|
message = entry.get("message", {})
|
||||||
|
if msg_type in ("human", "user"):
|
||||||
|
text = _extract_content(message.get("content", ""))
|
||||||
|
if text:
|
||||||
|
messages.append(("user", text))
|
||||||
|
elif msg_type == "assistant":
|
||||||
|
text = _extract_content(message.get("content", ""))
|
||||||
|
if text:
|
||||||
|
messages.append(("assistant", text))
|
||||||
|
if len(messages) >= 2:
|
||||||
|
return _messages_to_transcript(messages)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _try_claude_ai_json(data) -> Optional[str]:
|
||||||
|
"""Claude.ai JSON export: [{"role": "user", "content": "..."}]"""
|
||||||
|
if isinstance(data, dict):
|
||||||
|
data = data.get("messages", data.get("chat_messages", []))
|
||||||
|
if not isinstance(data, list):
|
||||||
|
return None
|
||||||
|
messages = []
|
||||||
|
for item in data:
|
||||||
|
if not isinstance(item, dict):
|
||||||
|
continue
|
||||||
|
role = item.get("role", "")
|
||||||
|
text = _extract_content(item.get("content", ""))
|
||||||
|
if role in ("user", "human") and text:
|
||||||
|
messages.append(("user", text))
|
||||||
|
elif role in ("assistant", "ai") and text:
|
||||||
|
messages.append(("assistant", text))
|
||||||
|
if len(messages) >= 2:
|
||||||
|
return _messages_to_transcript(messages)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _try_chatgpt_json(data) -> Optional[str]:
|
||||||
|
"""ChatGPT conversations.json with mapping tree."""
|
||||||
|
if not isinstance(data, dict) or "mapping" not in data:
|
||||||
|
return None
|
||||||
|
mapping = data["mapping"]
|
||||||
|
messages = []
|
||||||
|
# Find root: prefer node with parent=None AND no message (synthetic root)
|
||||||
|
root_id = None
|
||||||
|
fallback_root = None
|
||||||
|
for node_id, node in mapping.items():
|
||||||
|
if node.get("parent") is None:
|
||||||
|
if node.get("message") is None:
|
||||||
|
root_id = node_id
|
||||||
|
break
|
||||||
|
elif fallback_root is None:
|
||||||
|
fallback_root = node_id
|
||||||
|
if not root_id:
|
||||||
|
root_id = fallback_root
|
||||||
|
if root_id:
|
||||||
|
current_id = root_id
|
||||||
|
visited = set()
|
||||||
|
while current_id and current_id not in visited:
|
||||||
|
visited.add(current_id)
|
||||||
|
node = mapping.get(current_id, {})
|
||||||
|
msg = node.get("message")
|
||||||
|
if msg:
|
||||||
|
role = msg.get("author", {}).get("role", "")
|
||||||
|
content = msg.get("content", {})
|
||||||
|
parts = content.get("parts", []) if isinstance(content, dict) else []
|
||||||
|
text = " ".join(str(p) for p in parts if isinstance(p, str) and p).strip()
|
||||||
|
if role == "user" and text:
|
||||||
|
messages.append(("user", text))
|
||||||
|
elif role == "assistant" and text:
|
||||||
|
messages.append(("assistant", text))
|
||||||
|
children = node.get("children", [])
|
||||||
|
current_id = children[0] if children else None
|
||||||
|
if len(messages) >= 2:
|
||||||
|
return _messages_to_transcript(messages)
|
||||||
|
return None
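The traversal in `_try_chatgpt_json` follows the first child of each node, starting from a root with no parent. The sketch below reproduces that walk on a hand-written sample; `walk_mapping` is a hypothetical helper, the root lookup is simplified to "first node without a parent", and the sample is not a real export.

```python
def walk_mapping(mapping: dict) -> list:
    """Follow first children from the root and collect (role, text) pairs."""
    root_id = next(
        (nid for nid, node in mapping.items() if node.get("parent") is None),
        None,
    )
    messages, seen, current = [], set(), root_id
    while current and current not in seen:
        seen.add(current)  # guards against cycles in malformed exports
        node = mapping.get(current, {})
        msg = node.get("message")
        if msg:
            role = msg.get("author", {}).get("role", "")
            parts = msg.get("content", {}).get("parts", [])
            text = " ".join(p for p in parts if isinstance(p, str) and p).strip()
            if role in ("user", "assistant") and text:
                messages.append((role, text))
        children = node.get("children", [])
        current = children[0] if children else None
    return messages


# Hand-written sample in the shape the parser expects; not a real export.
sample = {
    "root": {"parent": None, "message": None, "children": ["a"]},
    "a": {
        "parent": "root",
        "message": {"author": {"role": "user"}, "content": {"parts": ["Hi"]}},
        "children": ["b", "b_alt"],
    },
    "b": {
        "parent": "a",
        "message": {"author": {"role": "assistant"}, "content": {"parts": ["Hello!"]}},
        "children": [],
    },
    "b_alt": {
        "parent": "a",
        "message": {"author": {"role": "assistant"}, "content": {"parts": ["Hey."]}},
        "children": [],
    },
}
turns = walk_mapping(sample)
```

Only `children[0]` is followed, so alternative branches such as `b_alt` are dropped from the transcript.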
|
||||||
|
|
||||||
|
|
||||||
|
def _try_slack_json(data) -> Optional[str]:
|
||||||
|
"""
|
||||||
|
Slack channel export: [{"type": "message", "user": "...", "text": "..."}]
|
||||||
|
Optimized for 2-person DMs. In channels with 3+ people, alternating
|
||||||
|
speakers are labeled user/assistant to preserve the exchange structure.
|
||||||
|
"""
|
||||||
|
if not isinstance(data, list):
|
||||||
|
return None
|
||||||
|
messages = []
|
||||||
|
seen_users = {}
|
||||||
|
last_role = None
|
||||||
|
for item in data:
|
||||||
|
if not isinstance(item, dict) or item.get("type") != "message":
|
||||||
|
continue
|
||||||
|
user_id = item.get("user", item.get("username", ""))
|
||||||
|
text = item.get("text", "").strip()
|
||||||
|
if not text or not user_id:
|
||||||
|
continue
|
||||||
|
if user_id not in seen_users:
|
||||||
|
# Alternate roles so exchange chunking works with any number of speakers
|
||||||
|
if not seen_users:
|
||||||
|
seen_users[user_id] = "user"
|
||||||
|
elif last_role == "user":
|
||||||
|
seen_users[user_id] = "assistant"
|
||||||
|
else:
|
||||||
|
seen_users[user_id] = "user"
|
||||||
|
last_role = seen_users[user_id]
|
||||||
|
messages.append((seen_users[user_id], text))
|
||||||
|
if len(messages) >= 2:
|
||||||
|
return _messages_to_transcript(messages)
|
||||||
|
return None
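The role assignment in `_try_slack_json` can be looked at separately from the JSON handling. This is a sketch under one assumption: `last_role` is updated for every message, not only when a new speaker appears (the flattened source does not show the indentation). `assign_roles` is a hypothetical helper.

```python
def assign_roles(speakers: list) -> list:
    """Map a sequence of speaker IDs to 'user'/'assistant' labels."""
    seen, last_role, roles = {}, None, []
    for speaker in speakers:
        if speaker not in seen:
            if not seen:
                seen[speaker] = "user"          # first speaker is the user
            elif last_role == "user":
                seen[speaker] = "assistant"     # newcomer after a user turn
            else:
                seen[speaker] = "user"          # newcomer after an assistant turn
        last_role = seen[speaker]
        roles.append(last_role)
    return roles


two_person = assign_roles(["U1", "U2", "U1", "U2"])
three_person = assign_roles(["U1", "U2", "U3", "U1"])
```

In a two-person DM the labels alternate cleanly. With three speakers, `U1` and `U3` both end up as `user`, so their turns are indistinguishable in the resulting transcript.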
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_content(content) -> str:
|
||||||
|
"""Pull text from content — handles str, list of blocks, or dict."""
|
||||||
|
if isinstance(content, str):
|
||||||
|
return content.strip()
|
||||||
|
if isinstance(content, list):
|
||||||
|
parts = []
|
||||||
|
for item in content:
|
||||||
|
if isinstance(item, str):
|
||||||
|
parts.append(item)
|
||||||
|
elif isinstance(item, dict) and item.get("type") == "text":
|
||||||
|
parts.append(item.get("text", ""))
|
||||||
|
return " ".join(parts).strip()
|
||||||
|
if isinstance(content, dict):
|
||||||
|
return content.get("text", "").strip()
|
||||||
|
return ""
|
||||||
|
|
||||||
|
|
||||||
|
def _messages_to_transcript(messages: list, spellcheck: bool = True) -> str:
|
||||||
|
"""Convert [(role, text), ...] to transcript format with > markers."""
|
||||||
|
if spellcheck:
|
||||||
|
try:
|
||||||
|
from mempalace.spellcheck import spellcheck_user_text
|
||||||
|
|
||||||
|
_fix = spellcheck_user_text
|
||||||
|
except Exception:
|
||||||
|
_fix = None
|
||||||
|
else:
|
||||||
|
_fix = None
|
||||||
|
|
||||||
|
lines = []
|
||||||
|
i = 0
|
||||||
|
while i < len(messages):
|
||||||
|
role, text = messages[i]
|
||||||
|
if role == "user":
|
||||||
|
if _fix is not None:
|
||||||
|
text = _fix(text)
|
||||||
|
lines.append(f"> {text}")
|
||||||
|
if i + 1 < len(messages) and messages[i + 1][0] == "assistant":
|
||||||
|
lines.append(messages[i + 1][1])
|
||||||
|
i += 2
|
||||||
|
else:
|
||||||
|
i += 1
|
||||||
|
else:
|
||||||
|
lines.append(text)
|
||||||
|
i += 1
|
||||||
|
lines.append("")
|
||||||
|
return "\n".join(lines)
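The transcript format produced by `_messages_to_transcript` is easiest to see on a small input. The sketch below leaves out the optional spellcheck step; `to_transcript` is a hypothetical name.

```python
def to_transcript(messages: list) -> str:
    lines, i = [], 0
    while i < len(messages):
        role, text = messages[i]
        if role == "user":
            lines.append(f"> {text}")
            # Attach the assistant reply, if the next message is one.
            if i + 1 < len(messages) and messages[i + 1][0] == "assistant":
                lines.append(messages[i + 1][1])
                i += 2
            else:
                i += 1
        else:
            lines.append(text)
            i += 1
        lines.append("")  # blank line between exchanges
    return "\n".join(lines)


transcript = to_transcript([
    ("user", "What is a drawer?"),
    ("assistant", "A verbatim chunk of a source file."),
    ("user", "Thanks"),
])
```

Each user turn becomes a `> ` line, the reply follows on the next line, and a blank line closes the exchange. This is the shape that `normalize()` passes through unchanged when it finds at least three `>` lines.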
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
import sys
|
||||||
|
|
||||||
|
if len(sys.argv) < 2:
|
||||||
|
print("Usage: python normalize.py <filepath>")
|
||||||
|
sys.exit(1)
|
||||||
|
filepath = sys.argv[1]
|
||||||
|
result = normalize(filepath)
|
||||||
|
quote_count = sum(1 for line in result.split("\n") if line.strip().startswith(">"))
|
||||||
|
print(f"\nFile: {os.path.basename(filepath)}")
|
||||||
|
print(f"Normalized: {len(result)} chars | {quote_count} user turns detected")
|
||||||
|
print("\n--- Preview (first 20 lines) ---")
|
||||||
|
print("\n".join(result.split("\n")[:20]))
|
||||||
@@ -0,0 +1,480 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
onboarding.py — MemPalace first-run setup.
|
||||||
|
|
||||||
|
Asks the user:
|
||||||
|
1. How they're using MemPalace (work / personal / combo)
|
||||||
|
2. Who the people in their life are (names, nicknames, relationships)
|
||||||
|
3. What their projects are
|
||||||
|
4. What they want their wings called
|
||||||
|
|
||||||
|
Seeds the entity_registry with confirmed data so MemPalace knows your world
|
||||||
|
from minute one — before a single session is indexed.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python3 -m mempalace.onboarding
|
||||||
|
or: mempalace init
|
||||||
|
"""
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
from mempalace.entity_registry import EntityRegistry
|
||||||
|
from mempalace.entity_detector import detect_entities, scan_for_detection
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Default wing taxonomies by mode
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
DEFAULT_WINGS = {
|
||||||
|
"work": [
|
||||||
|
"projects",
|
||||||
|
"clients",
|
||||||
|
"team",
|
||||||
|
"decisions",
|
||||||
|
"research",
|
||||||
|
],
|
||||||
|
"personal": [
|
||||||
|
"family",
|
||||||
|
"health",
|
||||||
|
"creative",
|
||||||
|
"reflections",
|
||||||
|
"relationships",
|
||||||
|
],
|
||||||
|
"combo": [
|
||||||
|
"family",
|
||||||
|
"work",
|
||||||
|
"health",
|
||||||
|
"creative",
|
||||||
|
"projects",
|
||||||
|
"reflections",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Helpers
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _hr():
|
||||||
|
print(f"\n{'─' * 58}")
|
||||||
|
|
||||||
|
|
||||||
|
def _header(text):
|
||||||
|
print(f"\n{'=' * 58}")
|
||||||
|
print(f" {text}")
|
||||||
|
print(f"{'=' * 58}")
|
||||||
|
|
||||||
|
|
||||||
|
def _ask(prompt, default=None):
|
||||||
|
if default:
|
||||||
|
val = input(f" {prompt} [{default}]: ").strip()
|
||||||
|
return val if val else default
|
||||||
|
return input(f" {prompt}: ").strip()
|
||||||
|
|
||||||
|
|
||||||
|
def _yn(prompt, default="y"):
|
||||||
|
val = input(f" {prompt} [{'Y/n' if default == 'y' else 'y/N'}]: ").strip().lower()
|
||||||
|
if not val:
|
||||||
|
return default == "y"
|
||||||
|
return val.startswith("y")
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Step 1: Mode selection
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _ask_mode() -> str:
|
||||||
|
_header("Welcome to MemPalace")
|
||||||
|
print("""
|
||||||
|
MemPalace is a personal memory system. To work well, it needs to know
|
||||||
|
a little about your world — who the people are, what the projects
|
||||||
|
are, and how you want your memory organized.
|
||||||
|
|
||||||
|
This takes about 2 minutes. You can always update it later.
|
||||||
|
""")
|
||||||
|
print(" How are you using MemPalace?")
|
||||||
|
print()
|
||||||
|
print(" [1] Work — notes, projects, clients, colleagues, decisions")
|
||||||
|
print(" [2] Personal — diary, family, health, relationships, reflections")
|
||||||
|
print(" [3] Both — personal and professional mixed")
|
||||||
|
print()
|
||||||
|
|
||||||
|
while True:
|
||||||
|
choice = input(" Your choice [1/2/3]: ").strip()
|
||||||
|
if choice == "1":
|
||||||
|
return "work"
|
||||||
|
elif choice == "2":
|
||||||
|
return "personal"
|
||||||
|
elif choice == "3":
|
||||||
|
return "combo"
|
||||||
|
print(" Please enter 1, 2, or 3.")
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Step 2: People
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _ask_people(mode: str) -> tuple[list, dict]:
|
||||||
|
"""Returns (people_list, aliases_dict)."""
|
||||||
|
people = []
|
||||||
|
aliases = {} # nickname → full name
|
||||||
|
|
||||||
|
if mode in ("personal", "combo"):
|
||||||
|
_hr()
|
||||||
|
print("""
|
||||||
|
Personal world — who are the important people in your life?
|
||||||
|
|
||||||
|
Format: name, relationship (e.g. "Riley, daughter" or just "Devon")
|
||||||
|
For nicknames, you'll be asked separately.
|
||||||
|
Type 'done' when finished.
|
||||||
|
""")
|
||||||
|
while True:
|
||||||
|
entry = input(" Person: ").strip()
|
||||||
|
if entry.lower() in ("done", ""):
|
||||||
|
break
|
||||||
|
parts = [p.strip() for p in entry.split(",", 1)]
|
||||||
|
name = parts[0]
|
||||||
|
relationship = parts[1] if len(parts) > 1 else ""
|
||||||
|
if name:
|
||||||
|
# Ask about nicknames
|
||||||
|
nick = input(f" Nickname for {name}? (or enter to skip): ").strip()
|
||||||
|
if nick:
|
||||||
|
aliases[nick] = name
|
||||||
|
people.append({"name": name, "relationship": relationship, "context": "personal"})
|
||||||
|
|
||||||
|
if mode in ("work", "combo"):
|
||||||
|
_hr()
|
||||||
|
print("""
|
||||||
|
Work world — who are the colleagues, clients, or collaborators
|
||||||
|
you'd want to find in your notes?
|
||||||
|
|
||||||
|
Format: name, role (e.g. "Ben, co-founder" or just "Sarah")
|
||||||
|
Type 'done' when finished.
|
||||||
|
""")
|
||||||
|
while True:
|
||||||
|
entry = input(" Person: ").strip()
|
||||||
|
if entry.lower() in ("done", ""):
|
||||||
|
break
|
||||||
|
parts = [p.strip() for p in entry.split(",", 1)]
|
||||||
|
name = parts[0]
|
||||||
|
role = parts[1] if len(parts) > 1 else ""
|
||||||
|
if name:
|
||||||
|
people.append({"name": name, "relationship": role, "context": "work"})
|
||||||
|
|
||||||
|
return people, aliases
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Step 3: Projects
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _ask_projects(mode: str) -> list:
|
||||||
|
if mode == "personal":
|
||||||
|
return []
|
||||||
|
|
||||||
|
_hr()
|
||||||
|
print("""
|
||||||
|
What are your main projects? (These help MemPalace distinguish project
|
||||||
|
names from person names — e.g. "Lantern" the project vs. "Lantern" the word.)
|
||||||
|
|
||||||
|
Type 'done' when finished.
|
||||||
|
""")
|
||||||
|
projects = []
|
||||||
|
while True:
|
||||||
|
proj = input(" Project: ").strip()
|
||||||
|
if proj.lower() in ("done", ""):
|
||||||
|
break
|
||||||
|
if proj:
|
||||||
|
projects.append(proj)
|
||||||
|
return projects
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Step 4: Wings
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _ask_wings(mode: str) -> list:
|
||||||
|
defaults = DEFAULT_WINGS[mode]
|
||||||
|
_hr()
|
||||||
|
print(f"""
|
||||||
|
Wings are the top-level categories in your memory palace.
|
||||||
|
|
||||||
|
Suggested wings for {mode} mode:
|
||||||
|
{", ".join(defaults)}
|
||||||
|
|
||||||
|
Press enter to keep these, or type your own comma-separated list.
|
||||||
|
""")
|
||||||
|
custom = input(" Wings: ").strip()
|
||||||
|
if custom:
|
||||||
|
return [w.strip() for w in custom.split(",") if w.strip()]
|
||||||
|
return defaults
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Step 5: Auto-detect from files
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _auto_detect(directory: str, known_people: list) -> list:
|
||||||
|
"""Scan directory for additional entity candidates."""
|
||||||
|
known_names = {p["name"].lower() for p in known_people}
|
||||||
|
|
||||||
|
try:
|
||||||
|
files = scan_for_detection(directory)
|
||||||
|
if not files:
|
||||||
|
return []
|
||||||
|
detected = detect_entities(files)
|
||||||
|
new_people = [
|
||||||
|
e
|
||||||
|
for e in detected["people"]
|
||||||
|
if e["name"].lower() not in known_names and e["confidence"] >= 0.7
|
||||||
|
]
|
||||||
|
return new_people
|
||||||
|
except Exception:
|
||||||
|
return []
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Step 6: Ambiguity warnings
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _warn_ambiguous(people: list) -> list:
|
||||||
|
"""
|
||||||
|
Flag names that are also common English words.
|
||||||
|
Returns list of ambiguous names for user awareness.
|
||||||
|
"""
|
||||||
|
from mempalace.entity_registry import COMMON_ENGLISH_WORDS
|
||||||
|
|
||||||
|
ambiguous = []
|
||||||
|
for p in people:
|
||||||
|
if p["name"].lower() in COMMON_ENGLISH_WORDS:
|
||||||
|
ambiguous.append(p["name"])
|
||||||
|
return ambiguous
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Main onboarding flow
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def _generate_aaak_bootstrap(people: list, projects: list, wings: list, mode: str, config_dir: Path = None):
|
||||||
|
"""
|
||||||
|
Generate AAAK entity registry + critical facts bootstrap from onboarding data.
|
||||||
|
These files teach the AI about the user's world from session one.
|
||||||
|
"""
|
||||||
|
mempalace_dir = Path(config_dir) if config_dir else Path.home() / ".mempalace"
|
||||||
|
mempalace_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
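    # Build AAAK entity codes (first 3 letters of name, uppercase; longer on collision)
    entity_codes = {}
    for p in people:
        name = p["name"]
        if name in entity_codes:
            continue  # same person listed twice (e.g. personal and work)
        # Lengthen the prefix until the code is unique; once the whole name
        # is used up, append a counter so the loop always terminates.
        length, suffix = 3, 1
        code = name[:length].upper()
        while code in entity_codes.values():
            if length < len(name):
                length += 1
                code = name[:length].upper()
            else:
                suffix += 1
                code = f"{name.upper()}{suffix}"
        entity_codes[name] = code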
|
||||||
|
|
||||||
|
# AAAK entity registry
|
||||||
|
registry_lines = [
|
||||||
|
"# AAAK Entity Registry",
|
||||||
|
"# Auto-generated by mempalace init. Update as needed.",
|
||||||
|
"",
|
||||||
|
"## People",
|
||||||
|
]
|
||||||
|
for p in people:
|
||||||
|
name = p["name"]
|
||||||
|
code = entity_codes[name]
|
||||||
|
rel = p.get("relationship", "")
|
||||||
|
ctx = p.get("context", "")
|
||||||
|
registry_lines.append(f" {code}={name} ({rel})" if rel else f" {code}={name}")
|
||||||
|
|
||||||
|
if projects:
|
||||||
|
registry_lines.extend(["", "## Projects"])
|
||||||
|
for proj in projects:
|
||||||
|
code = proj[:4].upper()
|
||||||
|
registry_lines.append(f" {code}={proj}")
|
||||||
|
|
||||||
|
registry_lines.extend([
|
||||||
|
"",
|
||||||
|
"## AAAK Quick Reference",
|
||||||
|
" Symbols: ♡=love ★=importance ⚠=warning →=relationship |=separator",
|
||||||
|
" Structure: KEY:value | GROUP(details) | entity.attribute",
|
||||||
|
" Read naturally — expand codes, treat *markers* as emotional context.",
|
||||||
|
])
|
||||||
|
|
||||||
|
(mempalace_dir / "aaak_entities.md").write_text("\n".join(registry_lines), encoding="utf-8")
|
||||||
|
|
||||||
|
# Critical facts bootstrap (pre-palace — before any mining)
|
||||||
|
facts_lines = [
|
||||||
|
"# Critical Facts (bootstrap — will be enriched after mining)",
|
||||||
|
"",
|
||||||
|
]
|
||||||
|
|
||||||
|
personal_people = [p for p in people if p.get("context") == "personal"]
|
||||||
|
work_people = [p for p in people if p.get("context") == "work"]
|
||||||
|
|
||||||
|
if personal_people:
|
||||||
|
facts_lines.append("## People (personal)")
|
||||||
|
for p in personal_people:
|
||||||
|
code = entity_codes[p["name"]]
|
||||||
|
rel = p.get("relationship", "")
|
||||||
|
facts_lines.append(f"- **{p['name']}** ({code}) — {rel}" if rel else f"- **{p['name']}** ({code})")
|
||||||
|
facts_lines.append("")
|
||||||
|
|
||||||
|
if work_people:
|
||||||
|
facts_lines.append("## People (work)")
|
||||||
|
for p in work_people:
|
||||||
|
code = entity_codes[p["name"]]
|
||||||
|
rel = p.get("relationship", "")
|
||||||
|
facts_lines.append(f"- **{p['name']}** ({code}) — {rel}" if rel else f"- **{p['name']}** ({code})")
|
||||||
|
facts_lines.append("")
|
||||||
|
|
||||||
|
if projects:
|
||||||
|
facts_lines.append("## Projects")
|
||||||
|
for proj in projects:
|
||||||
|
facts_lines.append(f"- **{proj}**")
|
||||||
|
facts_lines.append("")
|
||||||
|
|
||||||
|
facts_lines.extend([
|
||||||
|
"## Palace",
|
||||||
|
f"Wings: {', '.join(wings)}",
|
||||||
|
f"Mode: {mode}",
|
||||||
|
"",
|
||||||
|
"*This file will be enriched by palace_facts.py after mining.*",
|
||||||
|
])
|
||||||
|
|
||||||
|
(mempalace_dir / "critical_facts.md").write_text("\n".join(facts_lines), encoding="utf-8")
|
||||||
|
|
||||||
|
|
||||||
|
def run_onboarding(
|
||||||
|
directory: str = ".",
|
||||||
|
config_dir: Path = None,
|
||||||
|
auto_detect: bool = True,
|
||||||
|
) -> EntityRegistry:
|
||||||
|
"""
|
||||||
|
Run the full onboarding flow.
|
||||||
|
Returns the seeded EntityRegistry.
|
||||||
|
"""
|
||||||
|
# Step 1: Mode
|
||||||
|
mode = _ask_mode()
|
||||||
|
|
||||||
|
# Step 2: People
|
||||||
|
people, aliases = _ask_people(mode)
|
||||||
|
|
||||||
|
# Step 3: Projects
|
||||||
|
projects = _ask_projects(mode)
|
||||||
|
|
||||||
|
# Step 4: Wings (stored in config, not registry — just show user)
|
||||||
|
wings = _ask_wings(mode)
|
||||||
|
|
||||||
|
# Step 5: Auto-detect additional people from files
|
||||||
|
if auto_detect and _yn("\nScan your files for additional names we might have missed?"):
|
||||||
|
directory = _ask("Directory to scan", default=directory)
|
||||||
|
detected = _auto_detect(directory, people)
|
||||||
|
if detected:
|
||||||
|
_hr()
|
||||||
|
print(f"\n Found {len(detected)} additional name candidates:\n")
|
||||||
|
for e in detected:
|
||||||
|
print(
|
||||||
|
f" {e['name']:20} confidence={e['confidence']:.0%} "
|
||||||
|
f"({', '.join(e['signals'][:1])})"
|
||||||
|
)
|
||||||
|
print()
|
||||||
|
if _yn(" Add any of these to your registry?"):
|
||||||
|
for e in detected:
|
||||||
|
ans = input(f" {e['name']} — (p)erson, (s)kip? ").strip().lower()
|
||||||
|
if ans == "p":
|
||||||
|
rel = input(f" Relationship/role for {e['name']}? ").strip()
|
||||||
|
if mode in ("personal", "work"):
    ctx = mode
else:
    # Accept "p"/"w" or the full word. Chained str.replace() calls would
    # mangle full words ("personal" -> "personalersonal").
    # Any other answer defaults to "personal".
    raw = input(" Context — (p)ersonal or (w)ork? ").strip().lower()
    ctx = "work" if raw.startswith("w") else "personal"
|
||||||
|
people.append({"name": e["name"], "relationship": rel, "context": ctx})
|
||||||
|
|
||||||
|
# Step 6: Warn about ambiguous names
|
||||||
|
ambiguous = _warn_ambiguous(people)
|
||||||
|
if ambiguous:
|
||||||
|
_hr()
|
||||||
|
print(f"""
|
||||||
|
Heads up — these names are also common English words:
|
||||||
|
{", ".join(ambiguous)}
|
||||||
|
|
||||||
|
MemPalace will check the context before treating them as person names.
|
||||||
|
For example: "I picked up Riley" → person.
|
||||||
|
"Have you ever tried" → adverb.
|
||||||
|
""")
|
||||||
|
|
||||||
|
# Build and save registry
|
||||||
|
registry = EntityRegistry.load(config_dir)
|
||||||
|
registry.seed(mode=mode, people=people, projects=projects, aliases=aliases)
|
||||||
|
|
||||||
|
# Generate AAAK entity registry + critical facts bootstrap
|
||||||
|
_generate_aaak_bootstrap(people, projects, wings, mode, config_dir)
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
_header("Setup Complete")
|
||||||
|
print()
|
||||||
|
print(f" {registry.summary()}")
|
||||||
|
print(f"\n Wings: {', '.join(wings)}")
|
||||||
|
print(f"\n Registry saved to: {registry._path}")
|
||||||
|
print("\n AAAK entity registry: ~/.mempalace/aaak_entities.md")
print(" Critical facts bootstrap: ~/.mempalace/critical_facts.md")
print("\n Your AI will know your world from the first session.")
|
||||||
|
print()
|
||||||
|
|
||||||
|
return registry
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# Quick setup (non-interactive, for testing)
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def quick_setup(
|
||||||
|
mode: str,
|
||||||
|
people: list,
|
||||||
|
projects: list = None,
|
||||||
|
aliases: dict = None,
|
||||||
|
config_dir: Path = None,
|
||||||
|
) -> EntityRegistry:
|
||||||
|
"""
|
||||||
|
Programmatic setup without interactive prompts.
|
||||||
|
Used in tests and benchmark scripts.
|
||||||
|
|
||||||
|
people: list of dicts {"name": str, "relationship": str, "context": str}
|
||||||
|
"""
|
||||||
|
registry = EntityRegistry.load(config_dir)
|
||||||
|
registry.seed(
|
||||||
|
mode=mode,
|
||||||
|
people=people,
|
||||||
|
projects=projects or [],
|
||||||
|
aliases=aliases or {},
|
||||||
|
)
|
||||||
|
return registry
|
||||||
|
|
||||||
|
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
# CLI
|
||||||
|
# ─────────────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
import sys
|
||||||
|
|
||||||
|
directory = sys.argv[1] if len(sys.argv) > 1 else "."
|
||||||
|
run_onboarding(directory=directory)
|
||||||
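The entity-code step in the AAAK bootstrap above derives a short uppercase code from each person's name and has to keep those codes unique. A minimal sketch of one way to guarantee that is shown here. `make_entity_codes` is a hypothetical helper, not part of MemPalace, and the numeric-suffix fallback is an assumption made for illustration. Names are assumed to be distinct strings.

```python
def make_entity_codes(names):
    """Assign a unique uppercase code to each name (hypothetical helper)."""
    codes, used = {}, set()
    for name in names:
        # Keep only letters/digits so "O'Neil" becomes "ONEIL".
        letters = "".join(ch for ch in name if ch.isalnum()).upper() or "X"
        # Candidate prefixes: 3 characters, then 4, ... up to the whole name.
        # Names shorter than 3 characters use the whole name as-is.
        candidates = [letters[:n] for n in range(3, len(letters) + 1)] or [letters]
        code = next((c for c in candidates if c not in used), None)
        # Every prefix is taken: append a counter (assumed tie-break rule).
        suffix = 2
        while code is None:
            attempt = f"{candidates[0]}{suffix}"
            if attempt not in used:
                code = attempt
            suffix += 1
        used.add(code)
        codes[name] = code
    return codes


codes = make_entity_codes(["Riley", "Riya", "Rick", "Ricardo", "Ric", "Al"])
print(codes)
```

Tracking used codes in a set keeps each membership check constant-time, which scanning `dict.values()` on every iteration does not.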
@@ -0,0 +1,216 @@
|
|||||||
|
"""
|
||||||
|
palace_graph.py — Graph traversal layer for MemPalace
|
||||||
|
======================================================
|
||||||
|
|
||||||
|
Builds a navigable graph from the palace structure:
|
||||||
|
- Nodes = rooms (named ideas)
|
||||||
|
- Edges = shared rooms across wings (tunnels)
|
||||||
|
- Edge types = halls (the corridors)
|
||||||
|
|
||||||
|
Enables queries like:
|
||||||
|
"Start at chromadb-setup in wing_code, walk to wing_myproject"
|
||||||
|
"Find all rooms connected to riley-college-apps"
|
||||||
|
"What topics bridge wing_hardware and wing_myproject?"
|
||||||
|
|
||||||
|
No external graph DB needed — built from ChromaDB metadata.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from collections import defaultdict, Counter
|
||||||
|
from .config import MempalaceConfig
|
||||||
|
|
||||||
|
import chromadb
|
||||||
|
|
||||||
|
|
||||||
|
def _get_collection(config=None):
|
||||||
|
config = config or MempalaceConfig()
|
||||||
|
try:
|
||||||
|
client = chromadb.PersistentClient(path=config.palace_path)
|
||||||
|
return client.get_collection(config.collection_name)
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def build_graph(col=None, config=None):
|
||||||
|
"""
|
||||||
|
Build the palace graph from ChromaDB metadata.
|
||||||
|
|
||||||
|
Returns:
    nodes: dict of {room: {wings: list, halls: list, count: int, dates: list}}
        (lists are sorted; dates holds at most the 5 most recent values)
    edges: list of {room, wing_a, wing_b, hall, count}, one per wing pair and hall
|
||||||
|
"""
|
||||||
|
if col is None:
|
||||||
|
col = _get_collection(config)
|
||||||
|
if not col:
|
||||||
|
return {}, []
|
||||||
|
|
||||||
|
total = col.count()
|
||||||
|
room_data = defaultdict(lambda: {"wings": set(), "halls": set(), "count": 0, "dates": set()})
|
||||||
|
|
||||||
|
offset = 0
|
||||||
|
while offset < total:
|
||||||
|
batch = col.get(limit=1000, offset=offset, include=["metadatas"])
|
||||||
|
for meta in batch["metadatas"]:
|
||||||
|
room = meta.get("room", "")
|
||||||
|
wing = meta.get("wing", "")
|
||||||
|
hall = meta.get("hall", "")
|
||||||
|
date = meta.get("date", "")
|
||||||
|
if room and room != "general" and wing:
|
||||||
|
room_data[room]["wings"].add(wing)
|
||||||
|
if hall:
|
||||||
|
room_data[room]["halls"].add(hall)
|
||||||
|
if date:
|
||||||
|
room_data[room]["dates"].add(date)
|
||||||
|
room_data[room]["count"] += 1
|
||||||
|
if not batch["ids"]:
|
||||||
|
break
|
||||||
|
offset += len(batch["ids"])
|
||||||
|
|
||||||
|
# Build edges from rooms that span multiple wings
|
||||||
|
edges = []
|
||||||
|
for room, data in room_data.items():
|
||||||
|
wings = sorted(data["wings"])
|
||||||
|
if len(wings) >= 2:
|
||||||
|
for i, wa in enumerate(wings):
|
||||||
|
for wb in wings[i + 1:]:
|
||||||
|
for hall in data["halls"]:
|
||||||
|
edges.append({
|
||||||
|
"room": room,
|
||||||
|
"wing_a": wa,
|
||||||
|
"wing_b": wb,
|
||||||
|
"hall": hall,
|
||||||
|
"count": data["count"],
|
||||||
|
})
|
||||||
|
|
||||||
|
# Convert sets to lists for JSON serialization
|
||||||
|
nodes = {}
|
||||||
|
for room, data in room_data.items():
|
||||||
|
nodes[room] = {
|
||||||
|
"wings": sorted(data["wings"]),
|
||||||
|
"halls": sorted(data["halls"]),
|
||||||
|
"count": data["count"],
|
||||||
|
"dates": sorted(data["dates"])[-5:] if data["dates"] else [],
|
||||||
|
}
|
||||||
|
|
||||||
|
return nodes, edges
|
||||||
|
|
||||||
|
|
||||||
|
def traverse(start_room: str, col=None, config=None, max_hops: int = 2):
|
||||||
|
"""
|
||||||
|
Walk the graph from a starting room. Find connected rooms
|
||||||
|
through shared wings.
|
||||||
|
|
||||||
|
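Returns a list of dicts sorted by hop distance, capped at 50 entries:
[{room, wings, halls, count, hop, connected_via}]
(connected_via is omitted for the start room).
If start_room is unknown, returns {"error": ..., "suggestions": [...]} instead.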
|
||||||
|
"""
|
||||||
|
nodes, edges = build_graph(col, config)
|
||||||
|
|
||||||
|
if start_room not in nodes:
|
||||||
|
return {"error": f"Room '{start_room}' not found", "suggestions": _fuzzy_match(start_room, nodes)}
|
||||||
|
|
||||||
|
start = nodes[start_room]
|
||||||
|
visited = {start_room}
|
||||||
|
results = [{
|
||||||
|
"room": start_room,
|
||||||
|
"wings": start["wings"],
|
||||||
|
"halls": start["halls"],
|
||||||
|
"count": start["count"],
|
||||||
|
"hop": 0,
|
||||||
|
}]
|
||||||
|
|
||||||
|
# BFS traversal
|
||||||
|
frontier = [(start_room, 0)]
|
||||||
|
while frontier:
|
||||||
|
current_room, depth = frontier.pop(0)
|
||||||
|
if depth >= max_hops:
|
||||||
|
continue
|
||||||
|
|
||||||
|
current = nodes.get(current_room, {})
|
||||||
|
current_wings = set(current.get("wings", []))
|
||||||
|
|
||||||
|
# Find all rooms that share a wing with current room
|
||||||
|
for room, data in nodes.items():
|
||||||
|
if room in visited:
|
||||||
|
continue
|
||||||
|
shared_wings = current_wings & set(data["wings"])
|
||||||
|
if shared_wings:
|
||||||
|
visited.add(room)
|
||||||
|
results.append({
|
||||||
|
"room": room,
|
||||||
|
"wings": data["wings"],
|
||||||
|
"halls": data["halls"],
|
||||||
|
"count": data["count"],
|
||||||
|
"hop": depth + 1,
|
||||||
|
"connected_via": sorted(shared_wings),
|
||||||
|
})
|
||||||
|
if depth + 1 < max_hops:
|
||||||
|
frontier.append((room, depth + 1))
|
||||||
|
|
||||||
|
# Sort by relevance (hop distance, then count)
|
||||||
|
results.sort(key=lambda x: (x["hop"], -x["count"]))
|
||||||
|
return results[:50] # cap results
|
||||||
|
|
||||||
|
|
||||||
|
def find_tunnels(wing_a: str = None, wing_b: str = None, col=None, config=None):
|
||||||
|
"""
|
||||||
|
Find rooms that connect two wings (or all tunnel rooms if no wings specified).
|
||||||
|
These are the "tunnels": the same named idea appearing in multiple wings.
|
||||||
|
"""
|
||||||
|
nodes, edges = build_graph(col, config)
|
||||||
|
|
||||||
|
tunnels = []
|
||||||
|
for room, data in nodes.items():
|
||||||
|
wings = data["wings"]
|
||||||
|
if len(wings) < 2:
|
||||||
|
continue
|
||||||
|
|
||||||
|
if wing_a and wing_a not in wings:
|
||||||
|
continue
|
||||||
|
if wing_b and wing_b not in wings:
|
||||||
|
continue
|
||||||
|
|
||||||
|
tunnels.append({
|
||||||
|
"room": room,
|
||||||
|
"wings": wings,
|
||||||
|
"halls": data["halls"],
|
||||||
|
"count": data["count"],
|
||||||
|
"recent": data["dates"][-1] if data["dates"] else "",
|
||||||
|
})
|
||||||
|
|
||||||
|
tunnels.sort(key=lambda x: -x["count"])
|
||||||
|
return tunnels[:50]
|
||||||
|
|
||||||
|
|
||||||
|
def graph_stats(col=None, config=None):
|
||||||
|
"""Summary statistics about the palace graph."""
|
||||||
|
nodes, edges = build_graph(col, config)
|
||||||
|
|
||||||
|
tunnel_rooms = sum(1 for n in nodes.values() if len(n["wings"]) >= 2)
|
||||||
|
wing_counts = Counter()
|
||||||
|
for data in nodes.values():
|
||||||
|
for w in data["wings"]:
|
||||||
|
wing_counts[w] += 1
|
||||||
|
|
||||||
|
return {
|
||||||
|
"total_rooms": len(nodes),
|
||||||
|
"tunnel_rooms": tunnel_rooms,
|
||||||
|
"total_edges": len(edges),
|
||||||
|
"rooms_per_wing": dict(wing_counts.most_common()),
|
||||||
|
"top_tunnels": [
|
||||||
|
{"room": r, "wings": d["wings"], "count": d["count"]}
|
||||||
|
for r, d in sorted(nodes.items(), key=lambda x: -len(x[1]["wings"]))[:10]
|
||||||
|
if len(d["wings"]) >= 2
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _fuzzy_match(query: str, nodes: dict, n: int = 5):
|
||||||
|
"""Find rooms that approximately match a query string."""
|
||||||
|
query_lower = query.lower()
|
||||||
|
scored = []
|
||||||
|
for room in nodes:
|
||||||
|
# Simple substring matching
|
||||||
|
if query_lower in room:
|
||||||
|
scored.append((room, 1.0))
|
||||||
|
elif any(word and word in room for word in query_lower.split("-")):
|
||||||
|
scored.append((room, 0.5))
|
||||||
|
scored.sort(key=lambda x: -x[1])
|
||||||
|
return [r for r, _ in scored[:n]]
|
||||||
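The graph logic in `palace_graph.py` can be tried without a ChromaDB collection. The sketch below assumes the metadata has already been fetched as a plain list of dicts; `rooms_by_wing`, `tunnels`, and `walk` are hypothetical names used only for illustration, and the walk uses `collections.deque` rather than `list.pop(0)`.

```python
from collections import defaultdict, deque


def rooms_by_wing(metadatas):
    """Map each room name to the set of wings it appears in."""
    rooms = defaultdict(set)
    for meta in metadatas:
        room, wing = meta.get("room", ""), meta.get("wing", "")
        if room and room != "general" and wing:
            rooms[room].add(wing)
    return dict(rooms)


def tunnels(rooms):
    """Rooms that appear in two or more wings."""
    return sorted(r for r, wings in rooms.items() if len(wings) >= 2)


def walk(rooms, start, max_hops=2):
    """Breadth-first walk: two rooms are neighbours if they share a wing."""
    if start not in rooms:
        return {}
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] >= max_hops:
            continue
        for other, wings in rooms.items():
            if other not in hops and rooms[current] & wings:
                hops[other] = hops[current] + 1
                queue.append(other)
    return hops


sample = [
    {"room": "chromadb-setup", "wing": "wing_code"},
    {"room": "chromadb-setup", "wing": "wing_myproject"},
    {"room": "gpu-benchmarks", "wing": "wing_hardware"},
    {"room": "release-plan", "wing": "wing_myproject"},
    {"room": "general", "wing": "wing_code"},
]
rooms = rooms_by_wing(sample)
print(tunnels(rooms))
print(walk(rooms, "chromadb-setup"))
```

In this sample only `chromadb-setup` spans two wings, so it is the single tunnel, and `release-plan` is reachable in one hop through `wing_myproject`.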
@@ -0,0 +1,300 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
room_detector_local.py — Local setup, no API required.
|
||||||
|
|
||||||
|
Two ways to define rooms without calling any AI:
|
||||||
|
1. Auto-detect from folder structure (zero config)
|
||||||
|
2. Define manually in mempalace.yaml
|
||||||
|
|
||||||
|
No internet. No API key. Your files stay on your machine.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import yaml
|
||||||
|
from pathlib import Path
|
||||||
|
from collections import defaultdict
|
||||||
|
|
||||||
|
# Common room patterns — detected from folder names and filenames
|
||||||
|
# Format: {folder_keyword: room_name}
|
||||||
|
FOLDER_ROOM_MAP = {
|
||||||
|
"frontend": "frontend",
|
||||||
|
"front-end": "frontend",
|
||||||
|
"front_end": "frontend",
|
||||||
|
"client": "frontend",
|
||||||
|
"ui": "frontend",
|
||||||
|
"views": "frontend",
|
||||||
|
"components": "frontend",
|
||||||
|
"pages": "frontend",
|
||||||
|
"backend": "backend",
|
||||||
|
"back-end": "backend",
|
||||||
|
"back_end": "backend",
|
||||||
|
"server": "backend",
|
||||||
|
"api": "backend",
|
||||||
|
"routes": "backend",
|
||||||
|
"services": "backend",
|
||||||
|
"controllers": "backend",
|
||||||
|
"models": "backend",
|
||||||
|
"database": "backend",
|
||||||
|
"db": "backend",
|
||||||
|
"docs": "documentation",
|
||||||
|
"doc": "documentation",
|
||||||
|
"documentation": "documentation",
|
||||||
|
"wiki": "documentation",
|
||||||
|
"readme": "documentation",
|
||||||
|
"notes": "documentation",
|
||||||
|
"design": "design",
|
||||||
|
"designs": "design",
|
||||||
|
"mockups": "design",
|
||||||
|
"wireframes": "design",
|
||||||
|
"assets": "design",
|
||||||
|
"storyboard": "design",
|
||||||
|
"costs": "costs",
|
||||||
|
"cost": "costs",
|
||||||
|
"budget": "costs",
|
||||||
|
"finance": "costs",
|
||||||
|
"financial": "costs",
|
||||||
|
"pricing": "costs",
|
||||||
|
"invoices": "costs",
|
||||||
|
"accounting": "costs",
|
||||||
|
"meetings": "meetings",
|
||||||
|
"meeting": "meetings",
|
||||||
|
"calls": "meetings",
|
||||||
|
"meeting_notes": "meetings",
|
||||||
|
"standup": "meetings",
|
||||||
|
"minutes": "meetings",
|
||||||
|
"team": "team",
|
||||||
|
"staff": "team",
|
||||||
|
"hr": "team",
|
||||||
|
"hiring": "team",
|
||||||
|
"employees": "team",
|
||||||
|
"people": "team",
|
||||||
|
"research": "research",
|
||||||
|
"references": "research",
|
||||||
|
"reading": "research",
|
||||||
|
"papers": "research",
|
||||||
|
"planning": "planning",
|
||||||
|
"roadmap": "planning",
|
||||||
|
"strategy": "planning",
|
||||||
|
"specs": "planning",
|
||||||
|
"requirements": "planning",
|
||||||
|
"tests": "testing",
|
||||||
|
"test": "testing",
|
||||||
|
"testing": "testing",
|
||||||
|
"qa": "testing",
|
||||||
|
"scripts": "scripts",
|
||||||
|
"tools": "scripts",
|
||||||
|
"utils": "scripts",
|
||||||
|
"config": "configuration",
|
||||||
|
"configs": "configuration",
|
||||||
|
"settings": "configuration",
|
||||||
|
"infrastructure": "configuration",
|
||||||
|
"infra": "configuration",
|
||||||
|
"deploy": "configuration",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def detect_rooms_from_folders(project_dir: str) -> list:
|
||||||
|
"""
|
||||||
|
Walk the project folder structure.
|
||||||
|
Find top-level subdirectories that match known room patterns.
|
||||||
|
Returns list of room dicts.
|
||||||
|
"""
|
||||||
|
project_path = Path(project_dir).expanduser().resolve()
|
||||||
|
found_rooms = {}
|
||||||
|
|
||||||
|
SKIP_DIRS = {
|
||||||
|
".git",
|
||||||
|
"node_modules",
|
||||||
|
"__pycache__",
|
||||||
|
".venv",
|
||||||
|
"venv",
|
||||||
|
"env",
|
||||||
|
"dist",
|
||||||
|
"build",
|
||||||
|
".next",
|
||||||
|
"coverage",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check top-level directories first (most reliable signal)
|
||||||
|
for item in project_path.iterdir():
|
||||||
|
if item.is_dir() and item.name not in SKIP_DIRS:
|
||||||
|
name_lower = item.name.lower().replace("-", "_")
|
||||||
|
if name_lower in FOLDER_ROOM_MAP:
|
||||||
|
room_name = FOLDER_ROOM_MAP[name_lower]
|
||||||
|
if room_name not in found_rooms:
|
||||||
|
found_rooms[room_name] = item.name
|
||||||
|
# Also check if folder name IS a good room name directly
|
||||||
|
elif len(item.name) > 2 and item.name[0].isalpha():
|
||||||
|
clean = item.name.lower().replace("-", "_").replace(" ", "_")
|
||||||
|
if clean not in found_rooms:
|
||||||
|
found_rooms[clean] = item.name
|
||||||
|
|
||||||
|
# Walk one level deeper for nested patterns
|
||||||
|
for item in project_path.iterdir():
|
||||||
|
if item.is_dir() and item.name not in SKIP_DIRS:
|
||||||
|
for subitem in item.iterdir():
|
||||||
|
if subitem.is_dir() and subitem.name not in SKIP_DIRS:
|
||||||
|
name_lower = subitem.name.lower().replace("-", "_")
|
||||||
|
if name_lower in FOLDER_ROOM_MAP:
|
||||||
|
room_name = FOLDER_ROOM_MAP[name_lower]
|
||||||
|
if room_name not in found_rooms:
|
||||||
|
found_rooms[room_name] = subitem.name
|
||||||
|
|
||||||
|
# Build room list
|
||||||
|
rooms = []
|
||||||
|
for room_name, original in found_rooms.items():
|
||||||
|
rooms.append(
|
||||||
|
{
|
||||||
|
"name": room_name,
|
||||||
|
"description": f"Files from {original}/",
|
||||||
|
"keywords": [room_name, original.lower()],
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
# Always add "general" as fallback
|
||||||
|
if not any(r["name"] == "general" for r in rooms):
|
||||||
|
rooms.append(
|
||||||
|
{
|
||||||
|
"name": "general",
|
||||||
|
"description": "Files that don't fit other rooms",
|
||||||
|
"keywords": [],
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
return rooms
|
||||||
|
|
||||||
|
|
||||||
|
def detect_rooms_from_files(project_dir: str) -> list:
|
||||||
|
"""
|
||||||
|
Fallback: if folder structure gives no signal,
|
||||||
|
detect rooms from recurring filename patterns.
|
||||||
|
"""
|
||||||
|
project_path = Path(project_dir).expanduser().resolve()
|
||||||
|
keyword_counts = defaultdict(int)
|
||||||
|
|
||||||
|
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", "dist", "build"}
|
||||||
|
|
||||||
|
for root, dirs, filenames in os.walk(project_path):
|
||||||
|
dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
|
||||||
|
for filename in filenames:
|
||||||
|
name_lower = filename.lower().replace("-", "_").replace(" ", "_")
|
||||||
|
for keyword, room in FOLDER_ROOM_MAP.items():
|
||||||
|
if keyword in name_lower:
|
||||||
|
keyword_counts[room] += 1
|
||||||
|
|
||||||
|
# Return rooms whose keywords appear in at least two filenames (max 6 rooms)
|
||||||
|
rooms = []
|
||||||
|
for room, count in sorted(keyword_counts.items(), key=lambda x: x[1], reverse=True):
|
||||||
|
if count >= 2:
|
||||||
|
rooms.append(
|
||||||
|
{
|
||||||
|
"name": room,
|
||||||
|
"description": f"Files related to {room}",
|
||||||
|
"keywords": [room],
|
||||||
|
}
|
||||||
|
)
|
||||||
|
if len(rooms) >= 6:
|
||||||
|
break
|
||||||
|
|
||||||
|
if not rooms:
|
||||||
|
rooms = [{"name": "general", "description": "All project files", "keywords": []}]
|
||||||
|
|
||||||
|
return rooms
|
||||||
|
|
||||||
|
|
||||||
|
def print_proposed_structure(project_name: str, rooms: list, total_files: int, source: str):
|
||||||
|
print(f"\n{'=' * 55}")
|
||||||
|
print(" MemPalace Init — Local setup")
|
||||||
|
print(f"{'=' * 55}")
|
||||||
|
print(f"\n WING: {project_name}")
|
||||||
|
print(f" ({total_files} files found, rooms detected from {source})\n")
|
||||||
|
for room in rooms:
|
||||||
|
print(f" ROOM: {room['name']}")
|
||||||
|
print(f" {room['description']}")
|
||||||
|
print(f"\n{'─' * 55}")
|
||||||
|
|
||||||
|
|
||||||
|
def get_user_approval(rooms: list) -> list:
|
||||||
|
"""Same approval flow as AI version."""
|
||||||
|
print(" Review the proposed rooms above.")
|
||||||
|
print(" Options:")
|
||||||
|
print(" [enter] Accept all rooms")
|
||||||
|
print(" [edit] Remove or rename rooms")
|
||||||
|
print(" [add] Add a room manually")
|
||||||
|
print()
|
||||||
|
|
||||||
|
choice = input(" Your choice [enter/edit/add]: ").strip().lower()
|
||||||
|
|
||||||
|
if choice in ("", "y", "yes"):
|
||||||
|
return rooms
|
||||||
|
|
||||||
|
if choice == "edit":
|
||||||
|
print("\n Current rooms:")
|
||||||
|
for i, room in enumerate(rooms):
|
||||||
|
print(f" {i + 1}. {room['name']} — {room['description']}")
|
||||||
|
remove = input("\n Room numbers to REMOVE (comma-separated, or enter to skip): ").strip()
|
||||||
|
if remove:
|
||||||
|
to_remove = {int(x.strip()) - 1 for x in remove.split(",") if x.strip().isdigit()}
|
||||||
|
rooms = [r for i, r in enumerate(rooms) if i not in to_remove]
|
||||||
|
|
||||||
|
if choice == "add" or input("\n Add any missing rooms? [y/N]: ").strip().lower() == "y":
|
||||||
|
while True:
|
||||||
|
new_name = (
|
||||||
|
input(" New room name (or enter to stop): ").strip().lower().replace(" ", "_")
|
||||||
|
)
|
||||||
|
if not new_name:
|
||||||
|
break
|
||||||
|
new_desc = input(f" Description for '{new_name}': ").strip()
|
||||||
|
rooms.append({"name": new_name, "description": new_desc, "keywords": [new_name]})
|
||||||
|
print(f" Added: {new_name}")
|
||||||
|
|
||||||
|
return rooms
|
||||||
|
|
||||||
|
|
||||||
|
def save_config(project_dir: str, project_name: str, rooms: list):
|
||||||
|
config = {
|
||||||
|
"wing": project_name,
|
||||||
|
"rooms": [{"name": r["name"], "description": r["description"]} for r in rooms],
|
||||||
|
}
|
||||||
|
config_path = Path(project_dir).expanduser().resolve() / "mempalace.yaml"
|
||||||
|
with open(config_path, "w") as f:
|
||||||
|
yaml.dump(config, f, default_flow_style=False, sort_keys=False)
|
||||||
|
|
||||||
|
print(f"\n Config saved: {config_path}")
|
||||||
|
print("\n Next step:")
|
||||||
|
print(f" mempalace mine {project_dir}")
|
||||||
|
print(f"\n{'=' * 55}\n")
|
||||||
|
|
||||||
|
|
||||||
|
def detect_rooms_local(project_dir: str):
|
||||||
|
"""Main entry point for local setup."""
|
||||||
|
project_path = Path(project_dir).expanduser().resolve()
|
||||||
|
project_name = project_path.name.lower().replace(" ", "_").replace("-", "_")
|
||||||
|
|
||||||
|
if not project_path.exists():
|
||||||
|
print(f"ERROR: Directory not found: {project_dir}")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
# Count files
|
||||||
|
from .miner import scan_project
|
||||||
|
|
||||||
|
files = scan_project(project_dir)
|
||||||
|
|
||||||
|
# Try folder structure first
|
||||||
|
rooms = detect_rooms_from_folders(project_dir)
|
||||||
|
source = "folder structure"
|
||||||
|
|
||||||
|
# If only "general" found, try filename patterns
|
||||||
|
if len(rooms) <= 1:
|
||||||
|
rooms = detect_rooms_from_files(project_dir)
|
||||||
|
source = "filename patterns"
|
||||||
|
|
||||||
|
# If still nothing, just use general
|
||||||
|
if not rooms:
|
||||||
|
rooms = [{"name": "general", "description": "All project files", "keywords": []}]
|
||||||
|
source = "fallback (flat project)"
|
||||||
|
|
||||||
|
print_proposed_structure(project_name, rooms, len(files), source)
|
||||||
|
approved_rooms = get_user_approval(rooms)
|
||||||
|
save_config(project_dir, project_name, approved_rooms)
|
||||||
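The folder-based detection in `room_detector_local.py` boils down to a dictionary lookup on normalized folder names. A minimal sketch under stated assumptions: `ROOM_MAP` is a small illustrative subset of the real map, `rooms_for` is a hypothetical helper, and only top-level folders are checked (the original also looks one level deeper and accepts unknown folder names as rooms).

```python
import tempfile
from pathlib import Path

# Illustrative subset of the keyword -> room map.
ROOM_MAP = {"docs": "documentation", "api": "backend", "server": "backend", "ui": "frontend"}
SKIP = {".git", "node_modules", "__pycache__"}


def rooms_for(project_dir):
    """Return {room_name: first matching folder} for top-level folders."""
    found = {}
    for item in sorted(Path(project_dir).iterdir()):
        if not item.is_dir() or item.name in SKIP:
            continue
        # Normalize the same way the detector does: lowercase, hyphen -> underscore.
        key = item.name.lower().replace("-", "_")
        room = ROOM_MAP.get(key)
        if room and room not in found:
            found[room] = item.name
    return found


with tempfile.TemporaryDirectory() as tmp:
    for folder in ("api", "server", "docs", ".git", "UI"):
        (Path(tmp) / folder).mkdir()
    detected = rooms_for(tmp)
    print(detected)
```

Because the first matching folder wins, `api` claims the `backend` room and `server` is ignored; sorting the directory listing makes that choice deterministic.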
@@ -0,0 +1,142 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
searcher.py — Find anything. Exact words.
|
||||||
|
|
||||||
|
Semantic search against the palace.
|
||||||
|
Returns verbatim text — the actual words, never summaries.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import chromadb
|
||||||
|
|
||||||
|
|
||||||
|
def search(query: str, palace_path: str, wing: str = None, room: str = None, n_results: int = 5):
|
||||||
|
"""
|
||||||
|
Search the palace. Returns verbatim drawer content.
|
||||||
|
Optionally filter by wing (project) or room (aspect).
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
client = chromadb.PersistentClient(path=palace_path)
|
||||||
|
col = client.get_collection("mempalace_drawers")
|
||||||
|
except Exception:
|
||||||
|
print(f"\n No palace found at {palace_path}")
|
||||||
|
print(" Run: mempalace init <dir> then mempalace mine <dir>")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
# Build where filter
|
||||||
|
where = {}
|
||||||
|
if wing and room:
|
||||||
|
where = {"$and": [{"wing": wing}, {"room": room}]}
|
||||||
|
elif wing:
|
||||||
|
where = {"wing": wing}
|
||||||
|
elif room:
|
||||||
|
where = {"room": room}
|
||||||
|
|
||||||
|
try:
|
||||||
|
kwargs = {
|
||||||
|
"query_texts": [query],
|
||||||
|
"n_results": n_results,
|
||||||
|
"include": ["documents", "metadatas", "distances"],
|
||||||
|
}
|
||||||
|
if where:
|
||||||
|
kwargs["where"] = where
|
||||||
|
|
||||||
|
results = col.query(**kwargs)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"\n Search error: {e}")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
docs = results["documents"][0]
|
||||||
|
metas = results["metadatas"][0]
|
||||||
|
dists = results["distances"][0]
|
||||||
|
|
||||||
|
if not docs:
|
||||||
|
print(f'\n No results found for: "{query}"')
|
||||||
|
return
|
||||||
|
|
||||||
|
print(f"\n{'=' * 60}")
|
||||||
|
print(f' Results for: "{query}"')
|
||||||
|
if wing:
|
||||||
|
print(f" Wing: {wing}")
|
||||||
|
if room:
|
||||||
|
print(f" Room: {room}")
|
||||||
|
print(f"{'=' * 60}\n")
|
||||||
|
|
||||||
|
for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
|
||||||
|
similarity = round(1 - dist, 3)
|
||||||
|
source = Path(meta.get("source_file", "?")).name
|
||||||
|
wing_name = meta.get("wing", "?")
|
||||||
|
room_name = meta.get("room", "?")
|
||||||
|
|
||||||
|
print(f" [{i}] {wing_name} / {room_name}")
|
||||||
|
print(f" Source: {source}")
|
||||||
|
print(f" Match: {similarity}")
|
||||||
|
print()
|
||||||
|
# Print the verbatim text, indented
|
||||||
|
for line in doc.strip().split("\n"):
|
||||||
|
print(f" {line}")
|
||||||
|
print()
|
||||||
|
print(f" {'─' * 56}")
|
||||||
|
|
||||||
|
print()
|
||||||
|
|
||||||
|
|
||||||
|
def search_memories(
|
||||||
|
query: str, palace_path: str, wing: str = None, room: str = None, n_results: int = 5
|
||||||
|
) -> dict:
|
||||||
|
"""
|
||||||
|
Programmatic search — returns a dict instead of printing.
|
||||||
|
Used by the MCP server and other callers that need data.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
client = chromadb.PersistentClient(path=palace_path)
|
||||||
|
col = client.get_collection("mempalace_drawers")
|
||||||
|
except Exception as e:
|
||||||
|
return {"error": f"No palace found at {palace_path}: {e}"}
|
||||||
|
|
||||||
|
# Build where filter
|
||||||
|
where = {}
|
||||||
|
if wing and room:
|
||||||
|
where = {"$and": [{"wing": wing}, {"room": room}]}
|
||||||
|
elif wing:
|
||||||
|
where = {"wing": wing}
|
||||||
|
elif room:
|
||||||
|
where = {"room": room}
|
||||||
|
|
||||||
|
try:
|
||||||
|
kwargs = {
|
||||||
|
"query_texts": [query],
|
||||||
|
"n_results": n_results,
|
||||||
|
"include": ["documents", "metadatas", "distances"],
|
||||||
|
}
|
||||||
|
if where:
|
||||||
|
kwargs["where"] = where
|
||||||
|
|
||||||
|
results = col.query(**kwargs)
|
||||||
|
except Exception as e:
|
||||||
|
return {"error": f"Search error: {e}"}
|
||||||
|
|
||||||
|
docs = results["documents"][0]
|
||||||
|
metas = results["metadatas"][0]
|
||||||
|
dists = results["distances"][0]
|
||||||
|
|
||||||
|
hits = []
|
||||||
|
for doc, meta, dist in zip(docs, metas, dists):
|
||||||
|
hits.append(
|
||||||
|
{
|
||||||
|
"text": doc,
|
||||||
|
"wing": meta.get("wing", "unknown"),
|
||||||
|
"room": meta.get("room", "unknown"),
|
||||||
|
"source_file": Path(meta.get("source_file", "?")).name,
|
||||||
|
"similarity": round(1 - dist, 3),
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
return {
|
||||||
|
"query": query,
|
||||||
|
"filters": {"wing": wing, "room": room},
|
||||||
|
"results": hits,
|
||||||
|
}
|
||||||
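Both `search` and `search_memories` build the same metadata filter inline. A minimal sketch of factoring that into one function follows; `build_where` is a hypothetical helper, not part of the module, and it returns `None` when there is nothing to filter on so the caller can leave out the `where` argument.

```python
def build_where(wing=None, room=None):
    """Build a ChromaDB-style metadata filter from optional wing/room values."""
    clauses = [{key: value} for key, value in (("wing", wing), ("room", room)) if value]
    if not clauses:
        return None  # no filter: omit `where` from the query
    if len(clauses) == 1:
        return clauses[0]  # a single condition needs no operator
    return {"$and": clauses}  # several conditions are combined with $and


print(build_where())
print(build_where(wing="wing_code"))
print(build_where(wing="wing_code", room="chromadb-setup"))
```

A caller could then write `if where is not None: kwargs["where"] = where`, matching the existing behaviour of only passing a filter when one was requested.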
@@ -0,0 +1,269 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
spellcheck.py — Spell-correct user messages before palace filing.
|
||||||
|
|
||||||
|
Preserves:
|
||||||
|
- Technical terms (words with digits, hyphens, underscores)
|
||||||
|
- CamelCase and ALL_CAPS identifiers
|
||||||
|
- Known entity names (from EntityRegistry if available)
|
||||||
|
- URLs and file paths
|
||||||
|
- Words shorter than 4 chars (common abbreviations, pronouns, etc.)
|
||||||
|
- Proper nouns already capitalized in context
|
||||||
|
|
||||||
|
Corrects:
|
||||||
|
- Genuine typos in lowercase, flowing text
|
||||||
|
- Common fat-finger words (knoe → know, befor → before)
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
from mempalace.spellcheck import spellcheck_user_text
|
||||||
|
corrected = spellcheck_user_text("lsresdy knoe the question befor")
|
||||||
|
# → "already know the question before" (best effort)
|
||||||
|
"""
|
||||||
|
|
||||||
|
import re
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
# Lazy-load autocorrect — not everyone has it installed
|
||||||
|
_speller = None
|
||||||
|
_autocorrect_available = None
|
||||||
|
|
||||||
|
# System word list — loaded once, used to skip already-valid words
|
||||||
|
_system_words: Optional[set] = None
|
||||||
|
_SYSTEM_DICT = Path("/usr/share/dict/words")
|
||||||
|
|
||||||
|
|
||||||
|
def _get_speller():
|
||||||
|
global _speller, _autocorrect_available
|
||||||
|
if _autocorrect_available is None:
|
||||||
|
try:
|
||||||
|
from autocorrect import Speller
|
||||||
|
|
||||||
|
_speller = Speller(lang="en")
|
||||||
|
_autocorrect_available = True
|
||||||
|
except ImportError:
|
||||||
|
_autocorrect_available = False
|
||||||
|
return _speller if _autocorrect_available else None
|
||||||
|
|
||||||
|
|
||||||
|
def _get_system_words() -> set:
|
||||||
|
"""Load /usr/share/dict/words once and cache it."""
|
||||||
|
global _system_words
|
||||||
|
if _system_words is None:
|
||||||
|
if _SYSTEM_DICT.exists():
|
||||||
|
with open(_SYSTEM_DICT) as f:
|
||||||
|
_system_words = {w.strip().lower() for w in f if w.strip()}
|
||||||
|
else:
|
||||||
|
_system_words = set()
|
||||||
|
return _system_words
|
||||||
|
|
||||||
|
|
||||||
# ─────────────────────────────────────────────────────────────────────────────
# Patterns that mark a token as "don't touch this"
# ─────────────────────────────────────────────────────────────────────────────

# Matches any token with a digit anywhere in it: 3am, bge-large-v1.5, top-10
_HAS_DIGIT = re.compile(r"\d")

# CamelCase: ChromaDB, MemPalace, LongMemEval
_IS_CAMEL = re.compile(r"[A-Z][a-z]+[A-Z]")

# ALL_CAPS, optionally with underscores or symbols: NDCG, MAX_RESULTS.
# Tokens that also contain digits (R@5) are caught by _HAS_DIGIT instead.
_IS_ALLCAPS = re.compile(r"^[A-Z_@#$%^&*()+=\[\]{}|<>?.:/\\]+$")

# Technical token: contains hyphens or underscores (bge-large, train_test)
_IS_TECHNICAL = re.compile(r"[-_]")

# URL-like or file-path-like
_IS_URL = re.compile(r"https?://|www\.|/Users/|~/|\.[a-z]{2,4}$", re.IGNORECASE)

# Code or markdown markup characters (backticks, asterisks, braces, brackets,
# backslashes). Despite the name, this pattern does not match emoji.
_IS_CODE_OR_EMOJI = re.compile(r"[`*_#{}[\]\\]")

# Very short tokens are skipped (I, a, ok, my, etc.). This also avoids ambiguous
# 3-char typos like "kno", which autocorrect resolves as "no" rather than "know".
_MIN_LENGTH = 4


def _should_skip(token: str, known_names: set) -> bool:
    """Return True if this token should be left as-is."""
    if len(token) < _MIN_LENGTH:
        return True
    if _HAS_DIGIT.search(token):
        return True
    if _IS_CAMEL.search(token):
        return True
    if _IS_ALLCAPS.match(token):
        return True
    if _IS_TECHNICAL.search(token):
        return True
    if _IS_URL.search(token):
        return True
    if _IS_CODE_OR_EMOJI.search(token):
        return True
    # Known proper names (entity registry)
    if token.lower() in known_names:
        return True
    return False


# ─────────────────────────────────────────────────────────────────────────────
# Load known entity names from registry (optional, best-effort)
# ─────────────────────────────────────────────────────────────────────────────


def _load_known_names() -> set:
    """Pull all registered names from EntityRegistry. Returns empty set on failure."""
    try:
        from mempalace.entity_registry import EntityRegistry

        reg = EntityRegistry.load()
        names = set()
        for entity in reg._data.get("entities", {}).values():
            names.add(entity.get("canonical", "").lower())
            for alias in entity.get("aliases", []):
                names.add(alias.lower())
        return names
    except Exception:
        return set()


# ─────────────────────────────────────────────────────────────────────────────
# Edit distance — used to guard against over-aggressive autocorrect
# ─────────────────────────────────────────────────────────────────────────────


def _edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    if a == b:
        return 0
    if not a:
        return len(b)
    if not b:
        return len(a)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# ─────────────────────────────────────────────────────────────────────────────
# Core correction
# ─────────────────────────────────────────────────────────────────────────────

# Match runs of non-whitespace, so punctuation stays attached to its token
_TOKEN_RE = re.compile(r"(\S+)")


def spellcheck_user_text(text: str, known_names: Optional[set] = None) -> str:
    """
    Spell-correct a user message.

    Args:
        text: Raw user message text.
        known_names: Set of lowercase names/terms to preserve. If None,
            attempts to load from EntityRegistry automatically.

    Returns:
        Corrected text. Falls back to original if autocorrect not installed.
    """
    speller = _get_speller()
    if speller is None:
        return text  # autocorrect not installed — pass through unchanged

    if known_names is None:
        known_names = _load_known_names()

    # Process token by token, preserving all whitespace
    sys_words = _get_system_words()

    def _fix(match):
        token = match.group(0)
        # Strip trailing punctuation for checking, reattach after
        stripped = token.rstrip(".,!?;:'\")")
        punct = token[len(stripped) :]

        if not stripped or _should_skip(stripped, known_names):
            return token

        # Only correct lowercase words (capitalized words are likely proper nouns)
        if stripped[0].isupper():
            return token

        # Skip words that are already valid English — prevents "coherently" → "inherently"
        if stripped.lower() in sys_words:
            return token

        corrected = speller(stripped)

        # Guard: don't apply if corrected word is too different from original.
        # Extra safety net for words not in the system dict but also not typos.
        if corrected != stripped:
            dist = _edit_distance(stripped, corrected)
            max_edits = 2 if len(stripped) <= 7 else 3
            if dist > max_edits:
                return token

        return corrected + punct

    return _TOKEN_RE.sub(_fix, text)


def spellcheck_transcript_line(line: str) -> str:
    """
    Spell-correct a single transcript line.
    Only touches lines that start with '>' (user turns).
    Assistant turns are never modified.
    """
    stripped = line.lstrip()
    if not stripped.startswith(">"):
        return line

    # '> actual message here': keep the leading whitespace and the '>' marker
    # (plus its following space, when present) out of the corrected text.
    marker_len = 2 if stripped.startswith("> ") else 1
    prefix_len = len(line) - len(stripped) + marker_len
    message = line[prefix_len:]
    if not message.strip():
        return line

    corrected = spellcheck_user_text(message)
    return line[:prefix_len] + corrected


def spellcheck_transcript(content: str) -> str:
    """
    Spell-correct all user turns in a full transcript.
    Only lines starting with '>' are touched.
    """
    lines = content.split("\n")
    return "\n".join(spellcheck_transcript_line(line) for line in lines)


# ─────────────────────────────────────────────────────────────────────────────
# Quick test
# ─────────────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    test_cases = [
        "lsresdy knoe the question befor",
        "isn't there meny diferent benchmarks tesing questions?",
        "also can you pleese spell chekc my questions befroe storing",
        "it's realy hard for me to writte coherently at 3am",
        "Mempalace cant be fine-tunned if you alredy kno the question",
        # Should NOT change these:
        "ChromaDB bge-large-en-v1.5 NDCG@10 R@5",
        "Riley picked up Sam from school",
        "hybrid_v4 top-k=50 longmemeval_bench.py",
    ]

    print("Spell-check test\n" + "=" * 50)
    for msg in test_cases:
        result = spellcheck_user_text(msg, known_names={"riley", "sam", "mempalace"})
        changed = " ← CHANGED" if result != msg else ""
        print(f"\nIN: {msg}")
        if result != msg:
            print(f"OUT: {result}{changed}")
        else:
            print("OUT: (unchanged)")
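The edit-distance guard in the spell-checker can be tried in isolation. The sketch below is a standalone re-implementation for illustration, not the module itself: `within_guard` is a hypothetical helper name, and the thresholds (2 edits for words of up to 7 characters, 3 otherwise) are copied from the listing above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    if a == b:
        return 0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution (free when characters match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def within_guard(original: str, candidate: str) -> bool:
    """Hypothetical helper: True if the guard would accept this correction."""
    max_edits = 2 if len(original) <= 7 else 3
    return edit_distance(original, candidate) <= max_edits


for original, candidate in [
    ("knoe", "know"),
    ("kitten", "sitting"),
    ("coherently", "inherently"),
]:
    dist = edit_distance(original, candidate)
    print(f"{original} -> {candidate}: distance {dist}, accepted={within_guard(original, candidate)}")
```

Note that "coherently" and "inherently" are only two substitutions apart, so the distance guard on its own would let that correction through. That is consistent with the listing checking the system word list first and using the guard as a secondary safety net.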
@@ -0,0 +1,272 @@
#!/usr/bin/env python3
"""
split_mega_files.py — Split concatenated transcript files into per-session files
=================================================================================

Scans a directory for .txt files that contain multiple Claude Code sessions
(identified by "Claude Code v" headers). Splits each into individual files
named with: date, time, people detected, and subject from first prompt.

Distinguishes true session starts from mid-session context restores
(which show "Ctrl+E to show X previous messages").

Output files are written to --output-dir (default: same dir as source).
Original files are renamed with .mega_backup extension (not deleted).

Usage:
    python3 split_mega_files.py                                 # scan ~/Desktop/transcripts
    python3 split_mega_files.py --source ~/Desktop/transcripts  # explicit source
    python3 split_mega_files.py --dry-run                       # show what would happen
    python3 split_mega_files.py --min-sessions 2                # only files with 2+ sessions

By: Ben, 2026-03-30
"""

import argparse
import json
import os
import re
from pathlib import Path

HOME = Path.home()
LUMI_DIR = Path(os.environ.get("MEMPALACE_SOURCE_DIR", str(HOME / "Desktop/transcripts")))

# People we know about (for name detection in content)
# Loaded from ~/.mempalace/known_names.json if it exists, otherwise generic fallback.
_KNOWN_NAMES_PATH = HOME / ".mempalace" / "known_names.json"


def _load_known_people() -> list:
    """Load known names from config file, falling back to a generic list."""
    if _KNOWN_NAMES_PATH.exists():
        try:
            data = json.loads(_KNOWN_NAMES_PATH.read_text())
            if isinstance(data, list):
                return data
            if isinstance(data, dict):
                return data.get("names", [])
        except (json.JSONDecodeError, OSError):
            pass
    # Generic fallback — override by creating ~/.mempalace/known_names.json
    return ["Alice", "Ben", "Riley", "Max", "Sam", "Devon", "Jordan"]


KNOWN_PEOPLE = _load_known_people()


def _load_username_map() -> dict:
    """Load username-to-name mapping from config file."""
    if _KNOWN_NAMES_PATH.exists():
        try:
            data = json.loads(_KNOWN_NAMES_PATH.read_text())
            if isinstance(data, dict):
                return data.get("username_map", {})
        except (json.JSONDecodeError, OSError):
            pass
    return {}


def is_true_session_start(lines, idx):
    """
    True session start: 'Claude Code v' header NOT followed by 'Ctrl+E'/'previous messages'
    within the next 6 lines (those are context restores, not new sessions).
    """
    nearby = "".join(lines[idx:idx + 6])
    return "Ctrl+E" not in nearby and "previous messages" not in nearby


def find_session_boundaries(lines):
    """Return list of line indices where true new sessions begin."""
    boundaries = []
    for i, line in enumerate(lines):
        if "Claude Code v" in line and is_true_session_start(lines, i):
            boundaries.append(i)
    return boundaries


def extract_timestamp(lines):
    """
    Find the first timestamp line: ⏺ H:MM AM/PM Weekday, Month DD, YYYY
    Returns (datetime_str, iso_str) or (None, None).
    """
    ts_pattern = re.compile(
        r"⏺\s+(\d{1,2}:\d{2}\s+[AP]M)\s+\w+,\s+(\w+)\s+(\d{1,2}),\s+(\d{4})"
    )
    months = {
        "January": "01", "February": "02", "March": "03", "April": "04",
        "May": "05", "June": "06", "July": "07", "August": "08",
        "September": "09", "October": "10", "November": "11", "December": "12",
    }
    for line in lines[:50]:
        m = ts_pattern.search(line)
        if m:
            time_str, month, day, year = m.groups()
            mon = months.get(month, "00")
            day_z = day.zfill(2)
            time_safe = time_str.replace(":", "").replace(" ", "")
            iso = f"{year}-{mon}-{day_z}"
            human = f"{year}-{mon}-{day_z}_{time_safe}"
            return human, iso
    return None, None


def extract_people(lines):
    """
    Detect people mentioned as speakers or by name in first 100 lines.
    Returns sorted list of detected names.
    """
    found = set()
    text = "".join(lines[:100])

    # Any whole-word mention counts, including speaker tags ("Alice:", "Ben:").
    # Names come from user config, so escape them before building the regex.
    for person in KNOWN_PEOPLE:
        if re.search(rf"\b{re.escape(person)}\b", text, re.IGNORECASE):
            found.add(person)

    # Working directory username hint — map to known people if configured
    dir_match = re.search(r"/Users/(\w+)/", text)
    if dir_match:
        username = dir_match.group(1)
        # User can map usernames to names in ~/.mempalace/known_names.json
        # under a "username_map" key, e.g. {"username_map": {"jdoe": "John"}}
        username_map = _load_username_map()
        if username in username_map:
            found.add(username_map[username])

    return sorted(found)


def extract_subject(lines):
    """
    Find the first meaningful user prompt (> line that isn't a shell command).
    Returns cleaned, filename-safe subject string.
    """
    # The leading "./" alternative already covers "./activate", so the original
    # unescaped "./activate" alternative is dropped.
    skip_patterns = re.compile(
        r"^(\.\/|cd |ls |python|bash|git |cat |source |export |claude)"
    )
    for line in lines:
        if line.startswith("> "):
            prompt = line[2:].strip()
            if prompt and not skip_patterns.match(prompt) and len(prompt) > 5:
                # Clean for filename
                subject = re.sub(r"[^\w\s-]", "", prompt)
                subject = re.sub(r"\s+", "-", subject.strip())
                return subject[:60]
    return "session"


def split_file(filepath, output_dir, dry_run=False):
    """
    Split a single mega-file into per-session files.
    Returns list of output paths written (or would be written if dry_run).
    """
    path = Path(filepath)
    lines = path.read_text(errors="replace").splitlines(keepends=True)

    boundaries = find_session_boundaries(lines)
    if len(boundaries) < 2:
        return []  # Not a mega-file

    # Add sentinel at end
    boundaries.append(len(lines))

    out_dir = Path(output_dir) if output_dir else path.parent
    if not dry_run:
        # Create --output-dir on demand so write_text() below cannot fail on it
        out_dir.mkdir(parents=True, exist_ok=True)
    written = []

    for i, (start, end) in enumerate(zip(boundaries, boundaries[1:])):
        chunk = lines[start:end]
        if len(chunk) < 10:
            continue  # Skip tiny fragments

        ts_human, ts_iso = extract_timestamp(chunk)
        people = extract_people(chunk)
        subject = extract_subject(chunk)

        # Build filename: SOURCESTEM__DATE_TIME_People_subject.txt
        # Source stem prefix prevents collisions when multiple mega-files
        # produce sessions with the same timestamp/people/subject.
        ts_part = ts_human or f"part{i+1:02d}"
        people_part = "-".join(people[:3]) if people else "unknown"
        src_stem = re.sub(r"[^\w-]", "_", path.stem)[:40]
        name = f"{src_stem}__{ts_part}_{people_part}_{subject}.txt"
        # Sanitize
        name = re.sub(r"[^\w\.\-]", "_", name)
        name = re.sub(r"_+", "_", name)

        out_path = out_dir / name

        if dry_run:
            print(f" [{i+1}/{len(boundaries)-1}] {name} ({len(chunk)} lines)")
        else:
            out_path.write_text("".join(chunk))
            print(f" ✓ {name} ({len(chunk)} lines)")

        written.append(out_path)

    return written


def main():
    parser = argparse.ArgumentParser(
        description="Split concatenated transcript mega-files into per-session files"
    )
    parser.add_argument("--source", type=str, default=None,
                        help="Source directory (default: MEMPALACE_SOURCE_DIR or ~/Desktop/transcripts)")
    parser.add_argument("--output-dir", type=str, default=None,
                        help="Output directory (default: same as source)")
    parser.add_argument("--min-sessions", type=int, default=2,
                        help="Only split files with at least N sessions (default: 2)")
    parser.add_argument("--dry-run", action="store_true",
                        help="Show what would happen without writing files")
    parser.add_argument("--file", type=str, default=None,
                        help="Split a single specific file instead of scanning dir")
    args = parser.parse_args()

    src_dir = Path(args.source) if args.source else LUMI_DIR
    output_dir = args.output_dir or None  # None = same dir as file

    if args.file:
        files = [Path(args.file)]
    else:
        files = sorted(src_dir.glob("*.txt"))

    mega_files = []
    for f in files:
        lines = f.read_text(errors="replace").splitlines(keepends=True)
        boundaries = find_session_boundaries(lines)
        if len(boundaries) >= args.min_sessions:
            mega_files.append((f, len(boundaries)))

    if not mega_files:
        print(f"No mega-files found in {src_dir} (min {args.min_sessions} sessions).")
        return

    print(f"\n{'='*60}")
    print(f" Mega-file splitter — {'DRY RUN' if args.dry_run else 'SPLITTING'}")
    print(f"{'='*60}")
    print(f" Source: {src_dir}")
    print(f" Output: {output_dir or 'same dir as source'}")
    print(f" Mega-files: {len(mega_files)}")
    print(f"{'─'*60}\n")

    total_written = 0
    for f, n_sessions in mega_files:
        print(f" {f.name} ({n_sessions} sessions, {f.stat().st_size // 1024}KB)")
        written = split_file(f, output_dir, dry_run=args.dry_run)
        total_written += len(written)

        if not args.dry_run and written:
            backup = f.with_suffix(".mega_backup")
            f.rename(backup)
            print(f" → Original renamed to {backup.name}\n")
        else:
            print()

    print(f"{'─'*60}")
    if args.dry_run:
        print(f" DRY RUN — would create {total_written} files from {len(mega_files)} mega-files")
    else:
        print(f" Done — created {total_written} files from {len(mega_files)} mega-files")
    print()


if __name__ == "__main__":
    main()
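The boundary detection in the splitter can be exercised on a synthetic transcript. This is a minimal sketch that copies the two detection functions from the listing above; the sample lines, the `v1.0` version string, and the `spans` variable are made up for illustration.

```python
HEADER = "Claude Code v"  # header text the splitter looks for


def is_true_session_start(lines, idx):
    # A header followed within 6 lines by a restore hint is not a new session.
    nearby = "".join(lines[idx:idx + 6])
    return "Ctrl+E" not in nearby and "previous messages" not in nearby


def find_session_boundaries(lines):
    return [
        i for i, line in enumerate(lines)
        if HEADER in line and is_true_session_start(lines, i)
    ]


# Synthetic transcript: session, context restore, session (contents are invented).
body = ["...\n"] * 6
sample = (
    ["Claude Code v1.0\n"] + body
    + ["Claude Code v1.0\n", "Ctrl+E to show 12 previous messages\n"] + body
    + ["Claude Code v1.0\n"] + body
)

boundaries = find_session_boundaries(sample)
# Pair each start with the next start, using the file length as the final end.
spans = list(zip(boundaries, boundaries[1:] + [len(sample)]))
print(boundaries)  # [0, 15]
print(spans)       # [(0, 15), (15, 22)]
```

The middle header is treated as a context restore, so its lines stay attached to the first session. One consequence of the fixed six-line window is that a true session shorter than six lines, followed immediately by a restore, would also be rejected, because the restore hint falls inside its window.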
@@ -0,0 +1,62 @@
[build-system]
# setuptools 77+ is needed for the PEP 639 string form of `license` below
requires = ["setuptools>=77"]
build-backend = "setuptools.build_meta"

[project]
name = "mempalace"
version = "3.0.0"
description = "Give your AI a memory — mine projects and conversations into a searchable palace. No API key required."
readme = "README.md"
requires-python = ">=3.9"
license = "MIT"
authors = [
    {name = "milla-jovovich"},
]
keywords = [
    "ai", "memory", "llm", "rag", "chromadb", "mcp",
    "vector-database", "claude", "chatgpt", "embeddings",
]
classifiers = [
    "Development Status :: 4 - Beta",
    "Environment :: Console",
    "Intended Audience :: Developers",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
    "Topic :: Utilities",
]
dependencies = [
    "chromadb>=0.4.0",
    "pyyaml>=6.0",
]

[project.urls]
Homepage = "https://github.com/milla-jovovich/mempalace"
Repository = "https://github.com/milla-jovovich/mempalace"
"Bug Tracker" = "https://github.com/milla-jovovich/mempalace/issues"

[tool.setuptools.packages.find]
include = ["mempalace*"]

[project.scripts]
mempalace = "mempalace:main"

[project.optional-dependencies]
dev = ["pytest>=7.0", "build>=1.0", "twine>=4.0"]

[tool.ruff]
line-length = 100
target-version = "py39"

[tool.ruff.lint]
select = ["E", "F", "W"]
ignore = ["E501"]

[tool.ruff.format]
quote-style = "double"

[tool.pytest.ini_options]
testpaths = ["tests"]
@@ -0,0 +1,2 @@
chromadb>=0.4.0
pyyaml>=6.0
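Neither dependency list above includes `autocorrect`, which the spell-checker imports lazily and treats as optional. A quick way to see which mode an environment will run in is sketched below; `is_installed` is a hypothetical helper, not part of the package, and it relies only on the standard library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("autocorrect"):
    print("autocorrect found: spell-correction is active")
else:
    print("autocorrect missing: user text passes through unchanged")
```

When the module is missing, `spellcheck_user_text` returns its input unchanged, so the check is informational rather than required.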