MemPalace: palace architecture, AAAK compression, knowledge graph

The memory system:
- Palace structure: Wings (people/projects) → Rooms (topics) → Closets (AAAK compressed) → Drawers (verbatim transcripts)
- Halls connect related rooms within a wing
- Tunnels cross-reference rooms across wings
- AAAK: 30x lossless compression dialect for AI agents
- Knowledge graph: temporal entity-relationship triples (SQLite)
- Palace graph: room-based navigation with tunnel detection
- MCP server: 19 tools (search, graph traversal, agent diary, AAAK auto-teach)
- Onboarding: guided setup generates wing config + AAAK entity registry
- Contradiction detection: catches wrong pronouns, names, ages
- Auto-save hooks for Claude Code

96.6% Recall@5 on LongMemEval, the highest zero-API score published. 100% with optional Haiku rerank (500/500). Local. Free. No API key required.
2026-04-04 18:16:04 -07:00
commit 068dbd9a7b
39 changed files with 9210 additions and 0 deletions
@@ -0,0 +1,20 @@
 ---
 name: Bug Report
 about: Something isn't working
 labels: bug
 ---
 **What happened?**
 **What did you expect?**
 **How to reproduce:**
 1.
 2.
 3.
 **Environment:**
 - OS:
 - Python version:
 - MemPal version: (check `python mempal.py --version` or git SHA)
@@ -0,0 +1,11 @@
 ---
 name: Feature Request
 about: Suggest an improvement
 labels: enhancement
 ---
 **What problem does this solve?**
 **What's the proposed solution?**
 **Alternatives considered:**
@@ -0,0 +1,8 @@
 ## What does this PR do?
 ## How to test
 ## Checklist
 - [ ] Tests pass (`python -m pytest tests/ -v`)
 - [ ] No hardcoded paths
 - [ ] Linter passes (`ruff check .`)
@@ -0,0 +1,32 @@
 name: Tests
 on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
 jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.11", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt pytest
      - run: python -m pytest tests/ -v
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff
      - run: ruff check .
      - run: ruff format --check .
@@ -0,0 +1,7 @@
 *.egg-info/
 dist/
 build/
 __pycache__/
 *.pyc
 .pytest_cache/
 mempal.yaml
@@ -0,0 +1,7 @@
 repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.9.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
@@ -0,0 +1,92 @@
 # Contributing to MemPalace
 Thanks for wanting to help. MemPalace is open source and we welcome contributions of all sizes — from typo fixes to new features.
 ## Getting Started
 ```bash
 git clone https://github.com/milla-jovovich/mempalace.git
 cd mempalace
 pip install -e ".[dev]"    # installs with dev dependencies (pytest, build, twine)
 ```
 ## Running Tests
 ```bash
 pytest tests/ -v
 ```
 All tests must pass before submitting a PR. Tests should run without API keys or network access.
 ## Running Benchmarks
 ```bash
 # Quick test (20 questions, ~30 seconds)
 python benchmarks/longmemeval_bench.py /path/to/longmemeval_s_cleaned.json --limit 20
 # Full benchmark (500 questions, ~5 minutes)
 python benchmarks/longmemeval_bench.py /path/to/longmemeval_s_cleaned.json
 ```
 See [benchmarks/README.md](benchmarks/README.md) for data download instructions and reproduction guide.
 ## Project Structure
 ```
 mempalace/          ← core package (see mempalace/README.md for module guide)
 benchmarks/         ← reproducible benchmark runners
 hooks/              ← Claude Code auto-save hooks
 examples/           ← usage examples
 tests/              ← test suite
 assets/             ← logo + brand
 ```
 ## PR Guidelines
 1. Fork the repo and create a feature branch: `git checkout -b feat/my-thing`
 2. Write your code
 3. Add or update tests if applicable
 4. Run `pytest tests/ -v` — everything must pass
 5. Commit with a clear message following [conventional commits](https://www.conventionalcommits.org/):
   - `feat: add Notion export format`
   - `fix: handle empty transcript files`
   - `docs: update MCP tool descriptions`
   - `bench: add LoCoMo turn-level metrics`
 6. Push to your fork and open a PR against `main`
 ## Code Style
 - **Formatting**: [Ruff](https://docs.astral.sh/ruff/) with 100-char line limit (configured in `pyproject.toml`)
 - **Naming**: `snake_case` for functions/variables, `PascalCase` for classes
 - **Docstrings**: on all modules and public functions
 - **Type hints**: where they improve readability
 - **Dependencies**: minimize. ChromaDB + PyYAML only. Don't add new deps without discussion.
 ## Good First Issues
 Check the [Issues](https://github.com/milla-jovovich/mempalace/issues) tab. Great starting points:
 - **New chat formats**: Add import support for Cursor, Copilot, or other AI tool exports
 - **Room detection**: Improve pattern matching in `room_detector_local.py`
 - **Tests**: Increase coverage — especially for `knowledge_graph.py` and `palace_graph.py`
 - **Entity detection**: Better name disambiguation in `entity_detector.py`
 - **Docs**: Improve examples, add tutorials
 ## Architecture Decisions
 If you're planning a significant change, open an issue first to discuss the approach. Key principles:
 - **Verbatim first**: Never summarize user content. Store exact words.
 - **Local first**: Everything runs on the user's machine. No cloud dependencies.
 - **Zero API by default**: Core features must work without any API key.
 - **Palace structure matters**: Wings, halls, and rooms aren't cosmetic. Filtering by wing and room lifts R@10 by 34 percentage points. Respect the hierarchy.
 ## Community
 - **Discord**: [Join us](https://discord.com/invite/ycTQQCu6kn)
 - **Issues**: Bug reports and feature requests welcome
 - **Discussions**: For questions and ideas
 ## License
 MIT — your contributions will be released under the same license.
@@ -0,0 +1,21 @@
 MIT License
 Copyright (c) 2026 MemPalace Contributors
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
@@ -0,0 +1,584 @@
 <div align="center">
 <img src="assets/mempalace_logo.png" alt="MemPalace" width="280">
 # MemPalace
 ### The highest-scoring AI memory system ever benchmarked. And it's free.
 <br>
 Every conversation you have with an AI — every decision, every debugging session, every architecture debate — disappears when the session ends. Six months of work, gone. You start over every time.
 Other memory systems try to fix this by letting AI decide what's worth remembering. It extracts "user prefers Postgres" and throws away the conversation where you explained *why*. MemPalace takes a different approach: **store everything, then make it findable.**
 **The Palace** — Ancient Greek orators memorized entire speeches by placing ideas in rooms of an imaginary building. Walk through the building, find the idea. MemPalace applies the same principle to AI memory: your conversations are organized into wings (people and projects), halls (types of memory), and rooms (specific ideas). No AI decides what matters — you keep every word, and the structure makes it searchable. That structure alone improves retrieval by 34%.
 **AAAK** — To make all that data usable, MemPalace compresses it with AAAK — a lossless shorthand dialect designed for AI agents. Not meant to be read by humans — meant to be read by your AI, fast. 30x compression, zero information loss. Your AI loads months of context in ~120 tokens. Nothing else like it exists.
 **Local, open, adaptable** — MemPalace runs entirely on your machine, on any data you have locally, without using any external API or services. It has been tested on conversations — but it can be adapted for different types of datastores. This is why we're open-sourcing it.
 <br>
 [![][version-shield]][release-link]
 [![][python-shield]][python-link]
 [![][license-shield]][license-link]
 [![][discord-shield]][discord-link]
 <br>
 [Quick Start](#quick-start) · [The Palace](#the-palace) · [AAAK Dialect](#aaak-compression) · [Benchmarks](#benchmarks) · [MCP Tools](#mcp-server)
 <br>
 ### Highest LongMemEval score ever published — free or paid.
 <table>
 <tr>
 <td align="center"><strong>96.6%</strong><br><sub>LongMemEval R@5<br>Zero API calls</sub></td>
 <td align="center"><strong>100%</strong><br><sub>LongMemEval R@5<br>with Haiku rerank</sub></td>
 <td align="center"><strong>+34%</strong><br><sub>Retrieval boost<br>from palace structure</sub></td>
 <td align="center"><strong>$0</strong><br><sub>No subscription<br>No cloud. Local only.</sub></td>
 </tr>
 </table>
 <sub>Reproducible — runners in <a href="benchmarks/">benchmarks/</a>. <a href="benchmarks/BENCHMARKS.md">Full results</a>.</sub>
 </div>
 ---
 ## Quick Start
 ```bash
 pip install mempalace
 # Set up your world — who you work with, what your projects are
 mempalace init ~/projects/myapp
 # Mine your data
 mempalace mine ~/projects/myapp                    # projects — code, docs, notes
 mempalace mine ~/chats/ --mode convos              # convos — Claude, ChatGPT, Slack exports
 mempalace mine ~/chats/ --mode convos --extract general  # general — classifies into decisions, milestones, problems
 # Search anything you've ever discussed
 mempalace search "why did we switch to GraphQL"
 # Your AI remembers
 mempalace status
 ```
 Three mining modes: **projects** (code and docs), **convos** (conversation exports), and **general** (auto-classifies into decisions, preferences, milestones, problems, and emotional context). Everything stays on your machine.
 ---
 ## The Problem
 Decisions happen in conversations now. Not in docs. Not in Jira. In conversations with Claude, ChatGPT, Copilot. The reasoning, the tradeoffs, the "we tried X and it failed because Y" — all trapped in chat windows that evaporate when the session ends.
 **Six months of daily AI use = 19.5 million tokens.** That's every decision, every debugging session, every architecture debate. Gone.
 | Approach | Tokens loaded | Annual cost |
 |----------|--------------|-------------|
 | Paste everything | 19.5M — doesn't fit any context window | Impossible |
 | LLM summaries | ~650K | ~$507/yr |
 | **MemPalace wake-up** | **~170 tokens** | **~$0.70/yr** |
 | **MemPalace + 5 searches** | **~13,500 tokens** | **~$10/yr** |
 MemPalace loads 170 tokens of critical facts on wake-up — your team, your projects, your preferences. Then searches only when needed. $10/year to remember everything vs $507/year for summaries that lose context.
 ---
 ## How It Works
 ### The Palace
 ```
  ┌─────────────────────────────────────────────────────────────┐
  │  WING: Person                                              │
  │                                                            │
  │    ┌──────────┐  ──hall──  ┌──────────┐                    │
  │    │  Room A  │            │  Room B  │                    │
  │    └────┬─────┘            └──────────┘                    │
  │         │                                                  │
  │         ▼                                                  │
  │    ┌──────────┐      ┌──────────┐                          │
  │    │  Closet  │ ───▶ │  Drawer  │                          │
  │    └──────────┘      └──────────┘                          │
  └─────────┼──────────────────────────────────────────────────┘
            │
          tunnel
            │
  ┌─────────┼──────────────────────────────────────────────────┐
  │  WING: Project                                             │
  │         │                                                  │
  │    ┌────┴─────┐  ──hall──  ┌──────────┐                    │
  │    │  Room A  │            │  Room C  │                    │
  │    └────┬─────┘            └──────────┘                    │
  │         │                                                  │
  │         ▼                                                  │
  │    ┌──────────┐      ┌──────────┐                          │
  │    │  Closet  │ ───▶ │  Drawer  │                          │
  │    └──────────┘      └──────────┘                          │
  └─────────────────────────────────────────────────────────────┘
 ```
 **Wings** — a person or project. As many as you need.
 **Rooms** — specific topics within a wing. Auth, billing, deploy — endless rooms.
 **Halls** — connections between related rooms *within* the same wing. If Room A (auth) and Room B (security) are related, a hall links them.
 **Tunnels** — connections *between* wings. When Person A and a Project both have a room about "auth," a tunnel cross-references them automatically.
 **Closets** — compressed memories stored in AAAK. Fast for AI to read.
 **Drawers** — the original verbatim transcripts. The exact words, never summarized.
 Each **hall** also has a memory type. The same five types exist in every wing and act as corridors:
 - `hall_facts` — decisions made, choices locked in
 - `hall_events` — sessions, milestones, debugging
 - `hall_discoveries` — breakthroughs, new insights
 - `hall_preferences` — habits, likes, opinions
 - `hall_advice` — recommendations and solutions
 **Rooms** are named ideas — `auth-migration`, `graphql-switch`, `ci-pipeline`. When the same room appears in different wings, it creates a **tunnel** — connecting the same topic across domains:
 ```
 wing_kai       / hall_events / auth-migration  → "Kai debugged the OAuth token refresh"
 wing_driftwood / hall_facts  / auth-migration  → "team decided to migrate auth to Clerk"
 wing_priya     / hall_advice / auth-migration  → "Priya approved Clerk over Auth0"
 ```
 Same room. Three wings. The tunnel connects them.
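The tunnel rule above (a room name that shows up in more than one wing) can be sketched in a few lines. This is a minimal illustration, not the code in `palace_graph.py`: the flat `(wing, hall, room)` records and the `find_tunnels` helper are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical flat records: (wing, hall, room). Names mirror the example above.
records = [
    ("wing_kai", "hall_events", "auth-migration"),
    ("wing_driftwood", "hall_facts", "auth-migration"),
    ("wing_priya", "hall_advice", "auth-migration"),
    ("wing_kai", "hall_facts", "ci-pipeline"),
]


def find_tunnels(records):
    """Return {room: sorted wings} for rooms that appear in two or more wings."""
    wings_by_room = defaultdict(set)
    for wing, _hall, room in records:
        wings_by_room[room].add(wing)
    return {
        room: sorted(wings)
        for room, wings in wings_by_room.items()
        if len(wings) >= 2
    }


tunnels = find_tunnels(records)
print(tunnels)
# → {'auth-migration': ['wing_driftwood', 'wing_kai', 'wing_priya']}
```

`ci-pipeline` lives in a single wing here, so it gets no tunnel.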
 ### Why Structure Matters
 Tested on 22,000+ real conversation memories:
 ```
 Search all closets:          60.9%  R@10
 Search within wing:          73.1%  (+12%)
 Search wing + hall:          84.8%  (+24%)
 Search wing + room:          94.8%  (+34%)
 ```
 Wings and rooms aren't cosmetic. Filtering by wing and room lifts R@10 from 60.9% to 94.8%, a **34-point retrieval improvement**. The palace structure is the product.
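The bracketed gains in the table are differences from the 60.9% baseline, so they are percentage points rather than relative percentages. A short sketch of that arithmetic, using only the figures listed above:

```python
# R@10 figures copied from the table above.
baseline = 60.9
scoped = {"wing": 73.1, "wing + hall": 84.8, "wing + room": 94.8}

for scope, score in scoped.items():
    points = score - baseline            # absolute gain, in percentage points
    relative = points / baseline * 100   # gain relative to the baseline score
    print(f"{scope:12s} +{points:.1f} points  (+{relative:.0f}% relative)")
```

Measured relative to the baseline, the same wing + room result is a larger number than 34, which is why it helps to state which kind of gain is meant.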
 ### The Memory Stack
 | Layer | What | Size | When |
 |-------|------|------|------|
 | **L0** | Identity — who is this AI? | ~50 tokens | Always loaded |
 | **L1** | Critical facts — team, projects, preferences | ~120 tokens (AAAK) | Always loaded |
 | **L2** | Room recall — recent sessions, current project | On demand | When topic comes up |
 | **L3** | Deep search — semantic query across all closets | On demand | When explicitly asked |
 Your AI wakes up with L0 + L1 (~170 tokens) and knows your world. Searches only fire when needed.
 ### AAAK Compression
 AAAK is a lossless dialect — 30x compression, readable by any LLM without a decoder.
 **English (~1000 tokens):**
 ```
 Priya manages the Driftwood team: Kai (backend, 3 years), Soren (frontend),
 Maya (infrastructure), and Leo (junior, started last month). They're building
 a SaaS analytics platform. Current sprint: auth migration to Clerk.
 Kai recommended Clerk over Auth0 based on pricing and DX.
 ```
 **AAAK (~120 tokens):**
 ```
 TEAM: PRI(lead) | KAI(backend,3yr) SOR(frontend) MAY(infra) LEO(junior,new)
 PROJ: DRIFTWOOD(saas.analytics) | SPRINT: auth.migration→clerk
 DECISION: KAI.rec:clerk>auth0(pricing+dx) | ★★★★
 ```
 Same information in about 8x fewer tokens (~1000 → ~120 in this example). Your AI learns AAAK automatically from the MCP server, with no manual setup.
 ### Contradiction Detection
 MemPalace catches mistakes before they reach you:
 ```
 Input:  "Soren finished the auth migration"
 Output: 🔴 AUTH-MIGRATION: attribution conflict — Maya was assigned, not Soren
 Input:  "Kai has been here 2 years"
 Output: 🟡 KAI: wrong_tenure — records show 3 years (started 2023-04)
 Input:  "The sprint ends Friday"
 Output: 🟡 SPRINT: stale_date — current sprint ends Thursday (updated 2 days ago)
 ```
 Facts checked against the knowledge graph. Ages, dates, and tenures calculated dynamically — not hardcoded.
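As an illustration of "calculated dynamically", the tenure check could look roughly like the sketch below. The function names, the message format, and the stored start date are assumptions chosen to match the sample output above; the real checker may work differently.

```python
from datetime import date


def years_between(start: date, today: date) -> int:
    """Whole years elapsed from start to today."""
    years = today.year - start.year
    # Subtract one if this year's anniversary has not happened yet.
    if (today.month, today.day) < (start.month, start.day):
        years -= 1
    return years


def check_tenure(name: str, claimed_years: int, start: date, today: date):
    """Return a warning string if the claim disagrees with the record, else None."""
    actual = years_between(start, today)
    if claimed_years != actual:
        return (
            f"{name.upper()}: wrong_tenure, records show {actual} years "
            f"(started {start:%Y-%m})"
        )
    return None


# Hypothetical record and "today", chosen to match the example above.
print(check_tenure("Kai", 2, date(2023, 4, 1), date(2026, 4, 4)))
# → KAI: wrong_tenure, records show 3 years (started 2023-04)
```

Because the tenure is derived from the start date at check time, the stored fact never goes stale as the calendar moves on.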
 ---
 ## Real-World Examples
 ### Solo developer across multiple projects
 ```bash
 # Mine each project's conversations
 mempalace mine ~/chats/orion/  --mode convos --wing orion
 mempalace mine ~/chats/nova/   --mode convos --wing nova
 mempalace mine ~/chats/helios/ --mode convos --wing helios
 # Six months later: "why did I use Postgres here?"
 mempalace search "database decision" --wing orion
 # → "Chose Postgres over SQLite because Orion needs concurrent writes
 #    and the dataset will exceed 10GB. Decided 2025-11-03."
 # Cross-project search
 mempalace search "rate limiting approach"
 # → finds your approach in Orion AND Nova, shows the differences
 ```
 ### Team lead managing a product
 ```bash
 # Mine Slack exports and AI conversations
 mempalace mine ~/exports/slack/ --mode convos --wing driftwood
 mempalace mine ~/.claude/projects/ --mode convos
 # "What did Soren work on last sprint?"
 mempalace search "Soren sprint" --wing driftwood
 # → 14 closets: OAuth refactor, dark mode, component library migration
 # "Who decided to use Clerk?"
 mempalace search "Clerk decision" --wing driftwood
 # → "Kai recommended Clerk over Auth0 — pricing + developer experience.
 #    Team agreed 2026-01-15. Maya handling the migration."
 ```
 ### Before mining: split mega-files
 Some transcript exports concatenate multiple sessions into one huge file:
 ```bash
 mempalace split ~/chats/                      # split into per-session files
 mempalace split ~/chats/ --dry-run            # preview first
 mempalace split ~/chats/ --min-sessions 3     # only split files with 3+ sessions
 ```
 ---
 ## Knowledge Graph
 Temporal entity-relationship triples — like Zep's Graphiti, but SQLite instead of Neo4j. Local and free.
 ```python
 from mempalace.knowledge_graph import KnowledgeGraph
 kg = KnowledgeGraph()
 kg.add_triple("Kai", "works_on", "Orion", valid_from="2025-06-01")
 kg.add_triple("Maya", "assigned_to", "auth-migration", valid_from="2026-01-15")
 kg.add_triple("Maya", "completed", "auth-migration", valid_from="2026-02-01")
 # What's Kai working on?
 kg.query_entity("Kai")
 # → [Kai → works_on → Orion (current)]
 # What was true in January?
 kg.query_entity("Maya", as_of="2026-01-20")
 # → [Maya → assigned_to → auth-migration (active)]
 # Timeline
 kg.timeline("Orion")
 # → chronological story of the project
 ```
 Facts have validity windows. When something stops being true, invalidate it:
 ```python
 kg.invalidate("Kai", "works_on", "Orion", ended="2026-03-01")
 ```
 Now queries for Kai's current work won't return Orion. Historical queries still will.
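To make the validity-window idea concrete, here is a minimal sketch of temporal triples on top of the standard-library `sqlite3` module. The table layout and the three helper functions are assumptions for illustration only; they are not the schema or API of `knowledge_graph.py`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE triples (
           subject TEXT, predicate TEXT, object TEXT,
           valid_from TEXT,   -- ISO date the fact became true
           valid_to   TEXT    -- NULL while the fact is still true
       )"""
)


def add_triple(s, p, o, valid_from):
    conn.execute("INSERT INTO triples VALUES (?, ?, ?, ?, NULL)", (s, p, o, valid_from))


def invalidate(s, p, o, ended):
    # Close the window of the currently open fact instead of deleting it.
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, s, p, o),
    )


def query_entity(s, as_of):
    """Facts about s whose validity window contains the ISO date as_of."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY valid_from",
        (s, as_of, as_of),
    )
    return rows.fetchall()


add_triple("Kai", "works_on", "Orion", "2025-06-01")
invalidate("Kai", "works_on", "Orion", "2026-03-01")

print(query_entity("Kai", as_of="2026-02-01"))  # → [('works_on', 'Orion')]
print(query_entity("Kai", as_of="2026-04-01"))  # → []
```

ISO-formatted dates compare correctly as plain strings, so the sketch needs no date parsing inside SQL.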
 | Feature | MemPalace | Zep (Graphiti) |
 |---------|-----------|----------------|
 | Storage | SQLite (local) | Neo4j (cloud) |
 | Cost | Free | $25/mo+ |
 | Temporal validity | Yes | Yes |
 | Self-hosted | Always | Enterprise only |
 | Privacy | Everything local | SOC 2, HIPAA |
 ---
 ## Agent Diary
 Every AI agent gets a personal journal — written in AAAK, persists across sessions.
 ```
 mempalace_diary_write("Kai-assistant",
    "SESSION:2026-04-04|debugged.orion.timeout|root.cause:connection.pool.exhaustion|fix:pgbouncer|★★★")
 mempalace_diary_read("Kai-assistant", last_n=5)
 # → last 5 diary entries from this agent, compressed in AAAK
 ```
 Not a shared scratchpad — a personal journal with history. Each agent records what it worked on, what it learned, what matters. The next session reads the diary and picks up where it left off.
 Letta charges $20–200/mo for agent-managed memory. MemPalace does it with a wing.
 ---
 ## MCP Server
 ```bash
 claude mcp add mempalace -- python -m mempalace.mcp_server
 ```
 ### 19 Tools
 **Palace (read)**
 | Tool | What |
 |------|------|
 | `mempalace_status` | Palace overview + AAAK spec + memory protocol |
 | `mempalace_list_wings` | Wings with counts |
 | `mempalace_list_rooms` | Rooms within a wing |
 | `mempalace_get_taxonomy` | Full wing → room → count tree |
 | `mempalace_search` | Semantic search with wing/room filters |
 | `mempalace_check_duplicate` | Check before filing |
 | `mempalace_get_aaak_spec` | AAAK dialect reference |
 **Palace (write)**
 | Tool | What |
 |------|------|
 | `mempalace_add_drawer` | File verbatim content |
 | `mempalace_delete_drawer` | Remove by ID |
 **Knowledge Graph**
 | Tool | What |
 |------|------|
 | `mempalace_kg_query` | Entity relationships with time filtering |
 | `mempalace_kg_add` | Add facts |
 | `mempalace_kg_invalidate` | Mark facts as ended |
 | `mempalace_kg_timeline` | Chronological entity story |
 | `mempalace_kg_stats` | Graph overview |
 **Navigation**
 | Tool | What |
 |------|------|
 | `mempalace_traverse` | Walk the graph from a room across wings |
 | `mempalace_find_tunnels` | Find rooms bridging two wings |
 | `mempalace_graph_stats` | Graph connectivity overview |
 **Agent Diary**
 | Tool | What |
 |------|------|
 | `mempalace_diary_write` | Write AAAK diary entry |
 | `mempalace_diary_read` | Read recent diary entries |
 The AI learns AAAK and the memory protocol automatically from the `mempalace_status` response. No manual configuration.
 ---
 ## Auto-Save Hooks
 Two hooks for Claude Code that automatically save memories during work:
 **Save Hook** — every 15 messages, triggers a structured save. Topics, decisions, quotes, code changes. Also regenerates the critical facts layer.
 **PreCompact Hook** — fires before context compression. Emergency save before the window shrinks.
 ```json
 {
  "hooks": {
    "Stop": [{"matcher": "", "hooks": [{"type": "command", "command": "/path/to/mempalace/hooks/mempal_save_hook.sh"}]}],
    "PreCompact": [{"matcher": "", "hooks": [{"type": "command", "command": "/path/to/mempalace/hooks/mempal_precompact_hook.sh"}]}]
  }
 }
 ```
 ---
 ## Benchmarks
 Tested on standard academic benchmarks — reproducible, published datasets.
 | Benchmark | Mode | Score | API Calls |
 |-----------|------|-------|-----------|
 | **LongMemEval R@5** | Raw (ChromaDB only) | **96.6%** | Zero |
 | **LongMemEval R@5** | Hybrid + Haiku rerank | **100%** (500/500) | ~500 |
 | **LoCoMo R@10** | Raw, session level | **60.3%** | Zero |
 | **Personal palace R@10** | Heuristic bench | **85%** | Zero |
 | **Palace structure impact** | Wing+room filtering | **+34%** R@10 | Zero |
 The 96.6% raw score is the highest published LongMemEval result requiring no API key, no cloud, and no LLM at any stage.
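For readers unfamiliar with the metric, Recall@k can be sketched as below. This assumes the common "hit if any gold item is in the top k" definition; the benchmark runners may define it slightly differently, so treat `benchmarks/BENCHMARKS.md` as the reference. The session IDs are made up.

```python
def recall_at_k(ranked_ids, gold_ids, k=5):
    """1.0 if any gold item appears in the top-k results, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(gold_ids) else 0.0


def mean_recall_at_k(runs, k=5):
    """Average recall_at_k over (ranked_ids, gold_ids) pairs."""
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


# Toy data: one hit at rank 3, one miss.
runs = [
    (["s3", "s7", "s1", "s9", "s2"], ["s1"]),
    (["s4", "s8", "s6", "s5", "s0"], ["s2"]),
]
print(mean_recall_at_k(runs, k=5))  # → 0.5
```

Under this definition, 96.6% on a 500-question set corresponds to 483 questions with a hit in the top five.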
 ### vs Published Systems
 | System | LongMemEval R@5 | API Required | Cost |
 |--------|----------------|--------------|------|
 | **MemPalace (hybrid)** | **100%** | Optional | Free |
 | Supermemory ASMR | ~99% | Yes | — |
 | **MemPalace (raw)** | **96.6%** | **None** | **Free** |
 | Mastra | 94.87% | Yes (GPT) | API costs |
 | Mem0 | ~85% | Yes | $19–249/mo |
 | Zep | ~85% | Yes | $25/mo+ |
 ---
 ## All Commands
 ```bash
 # Setup
 mempalace init <dir>                              # guided onboarding + AAAK bootstrap
 # Mining
 mempalace mine <dir>                              # mine project files
 mempalace mine <dir> --mode convos                # mine conversation exports
 mempalace mine <dir> --mode convos --wing myapp   # tag with a wing name
 # Splitting
 mempalace split <dir>                             # split concatenated transcripts
 mempalace split <dir> --dry-run                   # preview
 # Search
 mempalace search "query"                          # search everything
 mempalace search "query" --wing myapp             # within a wing
 mempalace search "query" --room auth-migration    # within a room
 # Memory stack
 mempalace wake-up                                 # load L0 + L1 context
 mempalace wake-up --wing driftwood                # project-specific
 # Compression
 mempalace compress --wing myapp                   # AAAK compress
 # Status
 mempalace status                                  # palace overview
 ```
 All commands accept `--palace <path>` to override the default location.
 ---
 ## Configuration
 ### Global (`~/.mempalace/config.json`)
 ```json
 {
  "palace_path": "/custom/path/to/palace",
  "collection_name": "mempalace_drawers",
  "people_map": {"Kai": "KAI", "Priya": "PRI"}
 }
 ```
 ### Wing config (`~/.mempalace/wing_config.json`)
 Generated by `mempalace init`. Maps your people and projects to wings:
 ```json
 {
  "default_wing": "wing_general",
  "wings": {
    "wing_kai": {"type": "person", "keywords": ["kai", "kai's"]},
    "wing_driftwood": {"type": "project", "keywords": ["driftwood", "analytics", "saas"]}
  }
 }
 ```
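One way a config like this could be used is simple keyword routing: count keyword hits per wing and fall back to `default_wing`. The sketch below is an assumption about how such routing might work, not the project's actual classifier; `pick_wing` is a hypothetical helper.

```python
import json

# Same shape as the wing_config.json example above.
config = json.loads("""
{
  "default_wing": "wing_general",
  "wings": {
    "wing_kai": {"type": "person", "keywords": ["kai", "kai's"]},
    "wing_driftwood": {"type": "project", "keywords": ["driftwood", "analytics", "saas"]}
  }
}
""")


def pick_wing(text, config):
    """Return the wing with the most keyword hits, or the default wing if none match."""
    words = text.lower().split()
    best_wing, best_hits = config["default_wing"], 0
    for wing, spec in config["wings"].items():
        hits = sum(words.count(keyword) for keyword in spec["keywords"])
        if hits > best_hits:
            best_wing, best_hits = wing, hits
    return best_wing


print(pick_wing("Driftwood analytics dashboard is slow", config))  # → wing_driftwood
print(pick_wing("Lunch plans for Friday", config))                 # → wing_general
```

Whitespace splitting is deliberately naive; punctuation attached to a word would prevent a match in this sketch.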
 ### Identity (`~/.mempalace/identity.txt`)
 Plain text. Becomes Layer 0 — loaded every session.
 ---
 ## File Reference
 | File | What |
 |------|------|
 | `cli.py` | CLI entry point |
 | `config.py` | Configuration loading and defaults |
 | `normalize.py` | Converts 5 chat formats to standard transcript |
 | `mcp_server.py` | MCP server — 19 tools, AAAK auto-teach, memory protocol |
 | `miner.py` | Project file ingest |
 | `convo_miner.py` | Conversation ingest — chunks by exchange pair |
 | `searcher.py` | Semantic search via ChromaDB |
 | `layers.py` | 4-layer memory stack |
 | `dialect.py` | AAAK compression — 30x lossless |
 | `knowledge_graph.py` | Temporal entity-relationship graph (SQLite) |
 | `palace_graph.py` | Room-based navigation graph |
 | `onboarding.py` | Guided setup — generates AAAK bootstrap + wing config |
 | `entity_registry.py` | Entity code registry |
 | `entity_detector.py` | Auto-detect people and projects from content |
 | `split_mega_files.py` | Split concatenated transcripts into per-session files |
 | `hooks/mempal_save_hook.sh` | Auto-save every N messages |
 | `hooks/mempal_precompact_hook.sh` | Emergency save before compaction |
 ---
 ## Project Structure
 ```
 mempalace/
 ├── README.md                  ← you are here
 ├── mempalace/                 ← core package (README)
 │   ├── cli.py                 ← CLI entry point
 │   ├── mcp_server.py          ← MCP server (19 tools)
 │   ├── knowledge_graph.py     ← temporal entity graph
 │   ├── palace_graph.py        ← room navigation graph
 │   ├── dialect.py             ← AAAK compression
 │   ├── miner.py               ← project file ingest
 │   ├── convo_miner.py         ← conversation ingest
 │   ├── searcher.py            ← semantic search
 │   ├── onboarding.py          ← guided setup
 │   └── ...                    ← see mempalace/README.md
 ├── benchmarks/                ← reproducible benchmark runners
 │   ├── README.md              ← reproduction guide
 │   ├── BENCHMARKS.md          ← full results + methodology
 │   ├── longmemeval_bench.py   ← LongMemEval runner
 │   ├── locomo_bench.py        ← LoCoMo runner
 │   └── membench_bench.py      ← MemBench runner
 ├── hooks/                     ← Claude Code auto-save hooks
 │   ├── README.md              ← hook setup guide
 │   ├── mempal_save_hook.sh    ← save every N messages
 │   └── mempal_precompact_hook.sh ← emergency save
 ├── examples/                  ← usage examples
 │   ├── basic_mining.py
 │   ├── convo_import.py
 │   └── mcp_setup.md
 ├── tests/                     ← test suite (README)
 ├── assets/                    ← logo + brand assets
 └── pyproject.toml             ← package config (v3.0.0)
 ```
 ---
 ## Requirements
 - Python 3.9+
 - `chromadb>=0.4.0`
 - `pyyaml>=6.0`
 No API key. No internet after install. Everything local.
 ```bash
 pip install mempalace
 ```
 ---
 ## Contributing
 PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for setup and guidelines.
 ## License
 MIT — see [LICENSE](LICENSE).
 <!-- Link Definitions -->
 [version-shield]: https://img.shields.io/badge/version-3.0.0-4dc9f6?style=flat-square&labelColor=0a0e14
 [release-link]: https://github.com/milla-jovovich/mempalace/releases
 [python-shield]: https://img.shields.io/badge/python-3.9+-7dd8f8?style=flat-square&labelColor=0a0e14&logo=python&logoColor=7dd8f8
 [python-link]: https://www.python.org/
 [license-shield]: https://img.shields.io/badge/license-MIT-b0e8ff?style=flat-square&labelColor=0a0e14
 [license-link]: https://github.com/milla-jovovich/mempalace/blob/main/LICENSE
 [discord-shield]: https://img.shields.io/badge/discord-join-5865F2?style=flat-square&labelColor=0a0e14&logo=discord&logoColor=5865F2
 [discord-link]: https://discord.com/invite/ycTQQCu6kn
@@ -0,0 +1,12 @@
 #!/usr/bin/env python3
 """Example: mine a project folder into the palace."""
 import sys
 project_dir = sys.argv[1] if len(sys.argv) > 1 else "~/projects/my_app"
 print("Step 1: Initialize rooms from folder structure")
 print(f"  mempalace init {project_dir}")
 print("\nStep 2: Mine everything")
 print(f"  mempalace mine {project_dir}")
 print("\nStep 3: Search")
 print("  mempalace search 'why did we choose this approach'")
@@ -0,0 +1,11 @@
 #!/usr/bin/env python3
 """Example: import Claude Code / ChatGPT conversations."""
 print("Import Claude Code sessions:")
 print("  mempalace mine ~/claude-sessions/ --mode convos --wing my_project")
 print()
 print("Import ChatGPT exports:")
 print("  mempalace mine ~/chatgpt-exports/ --mode convos")
 print()
 print("Use general extractor for richer extraction:")
 print("  mempalace mine ~/chats/ --mode convos --extract general")
@@ -0,0 +1,25 @@
 # MCP Integration — Claude Code
 ## Setup
 Run the MCP server:
 ```bash
 python -m mempalace.mcp_server
 ```
 Or add to Claude Code:
 ```bash
 claude mcp add mempalace -- python -m mempalace.mcp_server
 ```
 ## Available Tools
 - **mempalace_status**: palace stats (wings, rooms, drawer counts)
 - **mempalace_search**: semantic search across all memories
 - **mempalace_list_wings**: list all wings (people and projects) with counts
 ## Usage in Claude Code
 Once configured, Claude Code can search your memories directly during conversations.
@@ -0,0 +1,138 @@
 # MemPalace Hooks — Auto-Save for Terminal AI Tools
 These hook scripts make MemPalace save automatically. No manual "save" commands needed.
 ## What They Do
 | Hook | When It Fires | What Happens |
 |------|--------------|-------------|
 | **Save Hook** | Every 15 human messages | Blocks the AI, tells it to save key topics/decisions/quotes to the palace |
 | **PreCompact Hook** | Right before context compaction | Emergency save — forces the AI to save EVERYTHING before losing context |
 The AI does the actual filing — it knows the conversation context, so it classifies memories into the right wings/halls/closets. The hooks just tell it WHEN to save.
 ## Install — Claude Code
 Add to `.claude/settings.local.json`:
 ```json
 {
  "hooks": {
    "Stop": [{
      "matcher": "*",
      "hooks": [{
        "type": "command",
        "command": "/absolute/path/to/hooks/mempal_save_hook.sh",
        "timeout": 30
      }]
    }],
    "PreCompact": [{
      "hooks": [{
        "type": "command",
        "command": "/absolute/path/to/hooks/mempal_precompact_hook.sh",
        "timeout": 30
      }]
    }]
  }
 }
 ```
 Make them executable:
 ```bash
 chmod +x hooks/mempal_save_hook.sh hooks/mempal_precompact_hook.sh
 ```
 ## Install — Codex CLI (OpenAI)
 Add to `.codex/hooks.json`:
 ```json
 {
  "Stop": [{
    "type": "command",
    "command": "/absolute/path/to/hooks/mempal_save_hook.sh",
    "timeout": 30
  }],
  "PreCompact": [{
    "type": "command",
    "command": "/absolute/path/to/hooks/mempal_precompact_hook.sh",
    "timeout": 30
  }]
 }
 ```
 ## Configuration
 Edit `mempal_save_hook.sh` to change:
 - **`SAVE_INTERVAL=15`** — How many human messages between saves. Lower = more frequent saves, higher = less interruption.
 - **`STATE_DIR`** — Where hook state is stored (defaults to `~/.mempalace/hook_state/`)
 - **`MEMPAL_DIR`** — Optional. Set to a conversations directory to auto-run `mempalace mine <dir>` on each save trigger. Leave blank (default) to let the AI handle saving via the block reason message.
 ### mempalace CLI
 The relevant commands are:
 ```bash
 mempalace mine <dir>               # Mine all files in a directory
 mempalace mine <dir> --mode convos # Mine conversation transcripts only
 ```
 The hooks resolve the repo root automatically from their own path, so they work regardless of where you install the repo.
 ## How It Works (Technical)
 ### Save Hook (Stop event)
 ```
 User sends message → AI responds → Claude Code fires Stop hook
                                            ↓
                                    Hook counts human messages in JSONL transcript
                                            ↓
                              ┌─── < 15 since last save ──→ echo "{}" (let AI stop)
                              │
                              └─── ≥ 15 since last save ──→ {"decision": "block", "reason": "save..."}
                                                                    ↓
                                                            AI saves to palace
                                                                    ↓
                                                            AI tries to stop again
                                                                    ↓
                                                            stop_hook_active = true
                                                                    ↓
                                                            Hook sees flag → echo "{}" (let it through)
 ```
 The `stop_hook_active` flag prevents infinite loops: block once → AI saves → tries to stop → flag is true → we let it through.
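The decision flow above can be restated as a short Python sketch. This is only an illustration of the logic the bash hook implements, not code shipped in the repo: the function names `count_human_messages` and `decide` are hypothetical, and the `reason` string is abbreviated.

```python
import json

SAVE_INTERVAL = 15  # mirrors the default in mempal_save_hook.sh


def count_human_messages(jsonl_lines):
    """Count user-role entries, skipping command messages and unparsable lines."""
    count = 0
    for line in jsonl_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        msg = entry.get("message", {}) if isinstance(entry, dict) else {}
        if isinstance(msg, dict) and msg.get("role") == "user":
            content = msg.get("content", "")
            # Command messages are tool noise, not real human input
            if isinstance(content, str) and "<command-message>" in content:
                continue
            count += 1
    return count


def decide(stop_hook_active, exchange_count, last_save, interval=SAVE_INTERVAL):
    """Return the JSON-serializable response the hook would print."""
    if stop_hook_active:
        return {}  # already in a save cycle: let the AI stop
    if exchange_count > 0 and exchange_count - last_save >= interval:
        return {"decision": "block", "reason": "AUTO-SAVE checkpoint."}
    return {}  # not time yet: let the AI stop
```

Keeping the counting and the decision as separate pure functions makes the loop-prevention rule easy to check: `decide(True, ...)` returns `{}` regardless of the counts.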
 ### PreCompact Hook
 ```
 Context window getting full → Claude Code fires PreCompact
                                        ↓
                                Hook ALWAYS blocks
                                        ↓
                                AI saves everything
                                        ↓
                                Compaction proceeds
 ```
 No counting needed — compaction always warrants a save.
 ## Debugging
 Check the hook log:
 ```bash
 cat ~/.mempalace/hook_state/hook.log
 ```
 Example output:
 ```
 [14:30:15] Session abc123: 12 exchanges, 12 since last save
 [14:35:22] Session abc123: 15 exchanges, 15 since last save
 [14:35:22] TRIGGERING SAVE at exchange 15
 [14:40:01] Session abc123: 18 exchanges, 3 since last save
 ```
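When the log grows, a few lines of Python can summarize it. The sketch below assumes the line format shown in the example output, which matches the `echo` statements in `mempal_save_hook.sh`. The function name `parse_hook_log` is hypothetical and not part of MemPalace.

```python
import re

# Matches lines such as: [14:30:15] Session abc123: 12 exchanges, 12 since last save
STATUS_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] Session (?P<session>\S+): "
    r"(?P<exchanges>\d+) exchanges, (?P<since>-?\d+) since last save$"
)
# Matches lines such as: [14:35:22] TRIGGERING SAVE at exchange 15
TRIGGER_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] TRIGGERING SAVE at exchange (?P<at>\d+)$"
)


def parse_hook_log(lines):
    """Return (status_records, trigger_points) parsed from hook.log lines."""
    statuses, triggers = [], []
    for raw in lines:
        line = raw.strip()
        m = STATUS_RE.match(line)
        if m:
            statuses.append({
                "time": m["time"],
                "session": m["session"],
                "exchanges": int(m["exchanges"]),
                "since_last_save": int(m["since"]),
            })
            continue
        m = TRIGGER_RE.match(line)
        if m:
            triggers.append(int(m["at"]))
    return statuses, triggers
```

Lines that match neither pattern (for example the PreCompact entries) are ignored rather than treated as errors.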
 ## Cost
 **Zero extra tokens.** The hooks are bash scripts that run locally. They don't call any API. The only "cost" is the AI spending a few seconds organizing memories at each checkpoint — and it's doing that with context it already has loaded.
@@ -0,0 +1,77 @@
 #!/bin/bash
 # MEMPALACE PRE-COMPACT HOOK — Emergency save before compaction
 #
 # Claude Code "PreCompact" hook. Fires RIGHT BEFORE the conversation
 # gets compressed to free up context window space.
 #
 # This is the safety net. When compaction happens, the AI loses detailed
 # context about what was discussed. This hook forces one final save of
 # EVERYTHING before that happens.
 #
 # Unlike the save hook (which triggers every N exchanges), this ALWAYS
 # blocks — because compaction is always worth saving before.
 #
 # === INSTALL ===
 # Add to .claude/settings.local.json:
 #
 #   "hooks": {
 #     "PreCompact": [{
 #       "hooks": [{
 #         "type": "command",
 #         "command": "/absolute/path/to/mempal_precompact_hook.sh",
 #         "timeout": 30
 #       }]
 #     }]
 #   }
 #
 # For Codex CLI, add to .codex/hooks.json:
 #
 #   "PreCompact": [{
 #     "type": "command",
 #     "command": "/absolute/path/to/mempal_precompact_hook.sh",
 #     "timeout": 30
 #   }]
 #
 # === HOW IT WORKS ===
 #
 # Claude Code sends JSON on stdin with:
 #   session_id — unique session identifier
 #
 # We always return decision: "block" with a reason telling the AI
 # to save everything. After the AI saves, compaction proceeds normally.
 #
 # === MEMPALACE CLI ===
 # This repo uses: mempalace mine <dir>
 # or:            mempalace mine <dir> --mode convos
 # Set MEMPAL_DIR below if you want the hook to auto-ingest before compaction.
 # Leave blank to rely on the AI's own save instructions.
 STATE_DIR="$HOME/.mempalace/hook_state"
 mkdir -p "$STATE_DIR"
 # Optional: set to the directory you want auto-ingested before compaction.
 # Example: MEMPAL_DIR="$HOME/conversations"
 # Leave empty to skip auto-ingest (AI handles saving via the block reason).
 MEMPAL_DIR=""
 # Read JSON input from stdin
 INPUT=$(cat)
 SESSION_ID=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('session_id','unknown'))" 2>/dev/null)
 echo "[$(date '+%H:%M:%S')] PRE-COMPACT triggered for session $SESSION_ID" >> "$STATE_DIR/hook.log"
 # Optional: run mempalace ingest synchronously so memories land before compaction
 if [ -n "$MEMPAL_DIR" ] && [ -d "$MEMPAL_DIR" ]; then
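    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    REPO_DIR="$(dirname "$SCRIPT_DIR")"
    # Put the repo root on PYTHONPATH so `-m mempalace` resolves without a pip install
    PYTHONPATH="$REPO_DIR${PYTHONPATH:+:$PYTHONPATH}" python3 -m mempalace mine "$MEMPAL_DIR" >> "$STATE_DIR/hook.log" 2>&1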
 fi
 # Always block — compaction = save everything
 cat << 'HOOKJSON'
 {
  "decision": "block",
  "reason": "COMPACTION IMMINENT. Save ALL topics, decisions, quotes, code, and important context from this session to your memory system. Be thorough — after compaction, detailed context will be lost. Organize into appropriate categories. Use verbatim quotes where possible. Save everything, then allow compaction to proceed."
 }
 HOOKJSON
@@ -0,0 +1,143 @@
 #!/bin/bash
 # MEMPALACE SAVE HOOK — Auto-save every N exchanges
 #
 # Claude Code "Stop" hook. After every assistant response:
 # 1. Counts human messages in the session transcript
 # 2. Every SAVE_INTERVAL messages, BLOCKS the AI from stopping
 # 3. Returns a reason telling the AI to save structured diary + palace entries
 # 4. AI does the save (topics, decisions, code, quotes → organized into palace)
 # 5. Next Stop fires with stop_hook_active=true → lets AI stop normally
 #
 # The AI does the classification — it knows what wing/hall/closet to use
 # because it has context about the conversation. No regex needed.
 #
 # === INSTALL ===
 # Add to .claude/settings.local.json:
 #
 #   "hooks": {
 #     "Stop": [{
 #       "matcher": "*",
 #       "hooks": [{
 #         "type": "command",
 #         "command": "/absolute/path/to/mempal_save_hook.sh",
 #         "timeout": 30
 #       }]
 #     }]
 #   }
 #
 # For Codex CLI, add to .codex/hooks.json:
 #
 #   "Stop": [{
 #     "type": "command",
 #     "command": "/absolute/path/to/mempal_save_hook.sh",
 #     "timeout": 30
 #   }]
 #
 # === HOW IT WORKS ===
 #
 # Claude Code sends JSON on stdin with these fields:
 #   session_id       — unique session identifier
 #   stop_hook_active — true if AI is already in a save cycle (prevents infinite loop)
 #   transcript_path  — path to the JSONL transcript file
 #
 # When we block, Claude Code shows our "reason" to the AI as a system message.
 # The AI then saves to memory, and when it tries to stop again,
 # stop_hook_active=true so we let it through. No infinite loop.
 #
 # === MEMPALACE CLI ===
 # This repo uses: mempalace mine <dir>
 # or:            mempalace mine <dir> --mode convos
 # Set MEMPAL_DIR below if you want the hook to auto-ingest on each save trigger.
 # Leave blank to rely on the AI's own save instructions.
 #
 # === CONFIGURATION ===
 SAVE_INTERVAL=15  # Save every N human messages (adjust to taste)
 STATE_DIR="$HOME/.mempalace/hook_state"
 mkdir -p "$STATE_DIR"
 # Optional: set to the directory you want auto-ingested on each save trigger.
 # Example: MEMPAL_DIR="$HOME/conversations"
 # Leave empty to skip auto-ingest (AI handles saving via the block reason).
 MEMPAL_DIR=""
 # Read JSON input from stdin
 INPUT=$(cat)
 # Parse fields from Claude Code's JSON
 SESSION_ID=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('session_id','unknown'))" 2>/dev/null)
 STOP_HOOK_ACTIVE=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('stop_hook_active', False))" 2>/dev/null)
 TRANSCRIPT_PATH=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('transcript_path',''))" 2>/dev/null)
 # Expand ~ in path
 TRANSCRIPT_PATH="${TRANSCRIPT_PATH/#\~/$HOME}"
 # If we're already in a save cycle, let the AI stop normally
 # This is the infinite-loop prevention: block once → AI saves → tries to stop again → we let it through
 if [ "$STOP_HOOK_ACTIVE" = "True" ] || [ "$STOP_HOOK_ACTIVE" = "true" ]; then
    echo "{}"
    exit 0
 fi
 # Count human messages in the JSONL transcript
 if [ -f "$TRANSCRIPT_PATH" ]; then
    EXCHANGE_COUNT=$(python3 -c "
 import json, sys
 count = 0
 # The transcript path arrives as argv[1] so quotes in the path cannot break this script
 with open(sys.argv[1]) as f:
    for line in f:
        try:
            entry = json.loads(line)
            msg = entry.get('message', {})
            if isinstance(msg, dict) and msg.get('role') == 'user':
                content = msg.get('content', '')
                # Skip system/command messages; only count real human input
                if isinstance(content, str) and '<command-message>' in content:
                    continue
                count += 1
        except Exception:
            # Ignore malformed or unexpected transcript lines
            pass
 print(count)
 " "$TRANSCRIPT_PATH" 2>/dev/null)
 else
    EXCHANGE_COUNT=0
 fi
 # Track last save point for this session
 LAST_SAVE_FILE="$STATE_DIR/${SESSION_ID}_last_save"
 LAST_SAVE=0
 if [ -f "$LAST_SAVE_FILE" ]; then
    LAST_SAVE=$(cat "$LAST_SAVE_FILE")
 fi
 SINCE_LAST=$((EXCHANGE_COUNT - LAST_SAVE))
 # Log for debugging (check ~/.mempalace/hook_state/hook.log)
 echo "[$(date '+%H:%M:%S')] Session $SESSION_ID: $EXCHANGE_COUNT exchanges, $SINCE_LAST since last save" >> "$STATE_DIR/hook.log"
 # Time to save?
 if [ "$SINCE_LAST" -ge "$SAVE_INTERVAL" ] && [ "$EXCHANGE_COUNT" -gt 0 ]; then
    # Update last save point
    echo "$EXCHANGE_COUNT" > "$LAST_SAVE_FILE"
    echo "[$(date '+%H:%M:%S')] TRIGGERING SAVE at exchange $EXCHANGE_COUNT" >> "$STATE_DIR/hook.log"
    # Optional: run mempalace ingest in background if MEMPAL_DIR is set
    if [ -n "$MEMPAL_DIR" ] && [ -d "$MEMPAL_DIR" ]; then
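        SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
        REPO_DIR="$(dirname "$SCRIPT_DIR")"
        # Put the repo root on PYTHONPATH so `-m mempalace` resolves without a pip install
        PYTHONPATH="$REPO_DIR${PYTHONPATH:+:$PYTHONPATH}" python3 -m mempalace mine "$MEMPAL_DIR" >> "$STATE_DIR/hook.log" 2>&1 &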
    fi
    # Block the AI and tell it to save
    # The "reason" becomes a system message the AI sees and acts on
    cat << 'HOOKJSON'
 {
  "decision": "block",
  "reason": "AUTO-SAVE checkpoint. Save key topics, decisions, quotes, and code from this session to your memory system. Organize into appropriate categories. Use verbatim quotes where possible. Continue conversation after saving."
 }
 HOOKJSON
 else
    # Not time yet — let the AI stop normally
    echo "{}"
 fi
@@ -0,0 +1,40 @@
 # mempalace/ — Core Package
 The Python package that powers MemPalace. All modules, all logic.
 ## Modules
 | Module | What it does |
 |--------|-------------|
 | `cli.py` | CLI entry point: routes to init, mine, split, search, compress, wake-up, status |
 | `config.py` | Configuration loading — `~/.mempalace/config.json`, env vars, defaults |
 | `normalize.py` | Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format |
 | `miner.py` | Project file ingest — scans directories, chunks by paragraph, stores to ChromaDB |
 | `convo_miner.py` | Conversation ingest — chunks by exchange pair (Q+A), detects rooms from content |
 | `searcher.py` | Semantic search via ChromaDB vectors — filters by wing/room, returns verbatim + scores |
 | `layers.py` | 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search) |
 | `dialect.py` | AAAK compression — entity codes, emotion markers, 30x lossless ratio |
 | `knowledge_graph.py` | Temporal entity-relationship graph — SQLite, time-filtered queries, fact invalidation |
 | `palace_graph.py` | Room-based navigation graph — BFS traversal, tunnel detection across wings |
 | `mcp_server.py` | MCP server — 19 tools, AAAK auto-teach, Palace Protocol, agent diary |
 | `onboarding.py` | Guided first-run setup — asks about people/projects, generates AAAK bootstrap + wing config |
 | `entity_registry.py` | Entity code registry — maps names to AAAK codes, handles ambiguous names |
 | `entity_detector.py` | Auto-detect people and projects from file content |
 | `general_extractor.py` | Classifies text into 5 memory types (decision, preference, milestone, problem, emotional) |
 | `room_detector_local.py` | Maps folders to room names using 70+ patterns — no API |
 | `spellcheck.py` | Name-aware spellcheck — won't "correct" proper nouns in your entity registry |
 | `split_mega_files.py` | Splits concatenated transcript files into per-session files |
 ## Architecture
 ```
 User → CLI → miner/convo_miner → ChromaDB (palace)
                                     ↕
                              knowledge_graph (SQLite)
                                     ↕
 User → MCP Server → searcher → results
                  → kg_query → entity facts
                  → diary    → agent journal
 ```
 The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool.
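To make "temporal entity-relationship triples" concrete, here is a minimal sketch using the standard-library `sqlite3` module. The table layout and the helpers `add_fact`, `invalidate`, and `facts_as_of` are assumptions made for illustration; they are not the actual schema or API of `knowledge_graph.py`, and the sample facts are invented.

```python
import sqlite3

# Hypothetical schema for illustration only (not the real knowledge_graph.py layout)
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE triples (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO date the fact became true
        valid_to   TEXT             -- NULL while the fact still holds
    )
    """
)


def add_fact(subject, predicate, obj, valid_from):
    """Record a fact that holds from valid_from onward."""
    conn.execute(
        "INSERT INTO triples VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, valid_from),
    )


def invalidate(subject, predicate, obj, ended):
    """Close the validity window instead of deleting the row."""
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, subject, predicate, obj),
    )


def facts_as_of(subject, as_of):
    """Return (predicate, object) pairs that held on the given ISO date."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) ORDER BY valid_from",
        (subject, as_of, as_of),
    )
    return rows.fetchall()


# Invented sample data: a project that switched API styles mid-year
add_fact("my_app", "uses", "REST", "2025-01-01")
invalidate("my_app", "uses", "REST", "2025-06-01")
add_fact("my_app", "uses", "GraphQL", "2025-06-01")
```

Invalidating a fact by setting an end date, rather than deleting it, is what allows a time-filtered query to answer both "what is true now" and "what was true then".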
@@ -0,0 +1,7 @@
 """MemPalace — Give your AI a memory. No API key required."""
 __version__ = "2.0.0"
 from .cli import main
 __all__ = ["main", "__version__"]
@@ -0,0 +1,5 @@
 """Allow running as: python -m mempalace"""
 from .cli import main
 main()
@@ -0,0 +1,371 @@
 #!/usr/bin/env python3
 """
 MemPalace — Give your AI a memory. No API key required.
 Two ways to ingest:
  Projects:      mempalace mine ~/projects/my_app          (code, docs, notes)
  Conversations: mempalace mine ~/chats/ --mode convos     (Claude, ChatGPT, Slack)
 Same palace. Same search. Different ingest strategies.
 Commands:
    mempalace init <dir>                  Detect rooms from folder structure
    mempalace split <dir>                 Split concatenated mega-files into per-session files
    mempalace mine <dir>                  Mine project files (default)
    mempalace mine <dir> --mode convos    Mine conversation exports
    mempalace search "query"              Find anything, exact words
    mempalace compress --wing my_app      Compress drawers with the AAAK dialect
    mempalace wake-up                     Show L0 + L1 wake-up context
    mempalace wake-up --wing my_app       Wake-up for a specific project
    mempalace status                      Show what's been filed
 Examples:
    mempalace init ~/projects/my_app
    mempalace mine ~/projects/my_app
    mempalace mine ~/chats/claude-sessions --mode convos
    mempalace search "why did we switch to GraphQL"
    mempalace search "pricing discussion" --wing my_app --room costs
 """
 import os
 import sys
 import argparse
 from pathlib import Path
 from .config import MempalaceConfig
 def cmd_init(args):
    import json
    from .entity_detector import scan_for_detection, detect_entities, confirm_entities
    from .room_detector_local import detect_rooms_local
    # Pass 1: auto-detect people and projects from file content
    print(f"\n  Scanning for entities in: {args.dir}")
    files = scan_for_detection(args.dir)
    if files:
        print(f"  Reading {len(files)} files...")
        detected = detect_entities(files)
        total = len(detected["people"]) + len(detected["projects"]) + len(detected["uncertain"])
        if total > 0:
            confirmed = confirm_entities(detected, yes=getattr(args, "yes", False))
            # Save confirmed entities to <project>/entities.json for the miner
            if confirmed["people"] or confirmed["projects"]:
                entities_path = Path(args.dir).expanduser().resolve() / "entities.json"
                with open(entities_path, "w") as f:
                    json.dump(confirmed, f, indent=2)
                print(f"  Entities saved: {entities_path}")
        else:
            print("  No entities detected — proceeding with directory-based rooms.")
    # Pass 2: detect rooms from folder structure
    detect_rooms_local(project_dir=args.dir)
    MempalaceConfig().init()
 def cmd_mine(args):
    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
    if args.mode == "convos":
        from .convo_miner import mine_convos
        mine_convos(
            convo_dir=args.dir,
            palace_path=palace_path,
            wing=args.wing,
            agent=args.agent,
            limit=args.limit,
            dry_run=args.dry_run,
            extract_mode=args.extract,
        )
    else:
        from .miner import mine
        mine(
            project_dir=args.dir,
            palace_path=palace_path,
            wing_override=args.wing,
            agent=args.agent,
            limit=args.limit,
            dry_run=args.dry_run,
        )
 def cmd_search(args):
    from .searcher import search
    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
    search(
        query=args.query,
        palace_path=palace_path,
        wing=args.wing,
        room=args.room,
        n_results=args.results,
    )
 def cmd_wakeup(args):
    """Show L0 (identity) + L1 (essential story) — the wake-up context."""
    from .layers import MemoryStack
    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
    stack = MemoryStack(palace_path=palace_path)
    text = stack.wake_up(wing=args.wing)
    tokens = len(text) // 4
    print(f"Wake-up text (~{tokens} tokens):")
    print("=" * 50)
    print(text)
 def cmd_split(args):
    """Split concatenated transcript mega-files into per-session files."""
    from .split_mega_files import main as split_main
    # Rebuild argv for split_mega_files argparse
    argv = [args.dir]
    if args.output_dir:
        argv += ["--output-dir", args.output_dir]
    if args.dry_run:
        argv.append("--dry-run")
    if args.min_sessions != 2:
        argv += ["--min-sessions", str(args.min_sessions)]
    old_argv = sys.argv
    sys.argv = ["mempalace split"] + argv
    try:
        split_main()
    finally:
        sys.argv = old_argv
 def cmd_status(args):
    from .miner import status
    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
    status(palace_path=palace_path)
 def cmd_compress(args):
    """Compress drawers in a wing using AAAK Dialect."""
    import chromadb
    from .dialect import Dialect
    palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
    # Load dialect (with optional entity config)
    config_path = args.config
    if not config_path:
        for candidate in ["entities.json", os.path.join(palace_path, "entities.json")]:
            if os.path.exists(candidate):
                config_path = candidate
                break
    if config_path and os.path.exists(config_path):
        dialect = Dialect.from_config(config_path)
        print(f"  Loaded entity config: {config_path}")
    else:
        dialect = Dialect()
    # Connect to palace
    try:
        client = chromadb.PersistentClient(path=palace_path)
        col = client.get_collection("mempalace_drawers")
    except Exception:
        print(f"\n  No palace found at {palace_path}")
        print("  Run: mempalace init <dir> then mempalace mine <dir>")
        sys.exit(1)
    # Query drawers in the wing
    where = {"wing": args.wing} if args.wing else None
    try:
        kwargs = {"include": ["documents", "metadatas"]}
        if where:
            kwargs["where"] = where
        results = col.get(**kwargs)
    except Exception as e:
        print(f"\n  Error reading drawers: {e}")
        sys.exit(1)
    docs = results["documents"]
    metas = results["metadatas"]
    ids = results["ids"]
    if not docs:
        wing_label = f" in wing '{args.wing}'" if args.wing else ""
        print(f"\n  No drawers found{wing_label}.")
        return
    print(
        f"\n  Compressing {len(docs)} drawers"
        + (f" in wing '{args.wing}'" if args.wing else "")
        + "..."
    )
    print()
    total_original = 0
    total_compressed = 0
    compressed_entries = []
    for doc, meta, doc_id in zip(docs, metas, ids):
        compressed = dialect.compress(doc, metadata=meta)
        stats = dialect.compression_stats(doc, compressed)
        total_original += stats["original_chars"]
        total_compressed += stats["compressed_chars"]
        compressed_entries.append((doc_id, compressed, meta, stats))
        if args.dry_run:
            wing_name = meta.get("wing", "?")
            room_name = meta.get("room", "?")
            source = Path(meta.get("source_file", "?")).name
            print(f"  [{wing_name}/{room_name}] {source}")
            print(
                f"    {stats['original_tokens']}t -> {stats['compressed_tokens']}t ({stats['ratio']:.1f}x)"
            )
            print(f"    {compressed}")
            print()
    # Store compressed versions (unless dry-run)
    if not args.dry_run:
        try:
            comp_col = client.get_or_create_collection("mempalace_compressed")
            for doc_id, compressed, meta, stats in compressed_entries:
                comp_meta = dict(meta)
                comp_meta["compression_ratio"] = round(stats["ratio"], 1)
                comp_meta["original_tokens"] = stats["original_tokens"]
                comp_col.upsert(
                    ids=[doc_id],
                    documents=[compressed],
                    metadatas=[comp_meta],
                )
            print(
                f"  Stored {len(compressed_entries)} compressed drawers in 'mempalace_compressed' collection."
            )
        except Exception as e:
            print(f"  Error storing compressed drawers: {e}")
            sys.exit(1)
    # Summary
    ratio = total_original / max(total_compressed, 1)
    orig_tokens = Dialect.count_tokens("x" * total_original)
    comp_tokens = Dialect.count_tokens("x" * total_compressed)
    print(f"  Total: {orig_tokens:,}t -> {comp_tokens:,}t ({ratio:.1f}x compression)")
    if args.dry_run:
        print("  (dry run -- nothing stored)")
 def main():
    parser = argparse.ArgumentParser(
        description="MemPalace — Give your AI a memory. No API key required.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__,
    )
    parser.add_argument(
        "--palace",
        default=None,
        help="Where the palace lives (default: from ~/.mempalace/config.json or ~/.mempalace/palace)",
    )
    sub = parser.add_subparsers(dest="command")
    # init
    p_init = sub.add_parser("init", help="Detect rooms from your folder structure")
    p_init.add_argument("dir", help="Project directory to set up")
    p_init.add_argument(
        "--yes", action="store_true", help="Auto-accept all detected entities (non-interactive)"
    )
    # mine
    p_mine = sub.add_parser("mine", help="Mine files into the palace")
    p_mine.add_argument("dir", help="Directory to mine")
    p_mine.add_argument(
        "--mode",
        choices=["projects", "convos"],
        default="projects",
        help="Ingest mode: 'projects' for code/docs (default), 'convos' for chat exports",
    )
    p_mine.add_argument("--wing", default=None, help="Wing name (default: directory name)")
    p_mine.add_argument(
        "--agent",
        default="mempalace",
        help="Your name — recorded on every drawer (default: mempalace)",
    )
    p_mine.add_argument("--limit", type=int, default=0, help="Max files to process (0 = all)")
    p_mine.add_argument(
        "--dry-run", action="store_true", help="Show what would be filed without filing"
    )
    p_mine.add_argument(
        "--extract",
        choices=["exchange", "general"],
        default="exchange",
        help="Extraction strategy for convos mode: 'exchange' (default) or 'general' (5 memory types)",
    )
    # search
    p_search = sub.add_parser("search", help="Find anything, exact words")
    p_search.add_argument("query", help="What to search for")
    p_search.add_argument("--wing", default=None, help="Limit to one project")
    p_search.add_argument("--room", default=None, help="Limit to one room")
    p_search.add_argument("--results", type=int, default=5, help="Number of results")
    # compress
    p_compress = sub.add_parser(
        "compress", help="Compress drawers using AAAK Dialect (~30x reduction)"
    )
    p_compress.add_argument("--wing", default=None, help="Wing to compress (default: all wings)")
    p_compress.add_argument(
        "--dry-run", action="store_true", help="Preview compression without storing"
    )
    p_compress.add_argument(
        "--config", default=None, help="Entity config JSON (e.g. entities.json)"
    )
    # wake-up
    p_wakeup = sub.add_parser("wake-up", help="Show L0 + L1 wake-up context (~600-900 tokens)")
    p_wakeup.add_argument("--wing", default=None, help="Wake-up for a specific project/wing")
    # split
    p_split = sub.add_parser(
        "split",
        help="Split concatenated transcript mega-files into per-session files (run before mine)",
    )
    p_split.add_argument("dir", help="Directory containing transcript files")
    p_split.add_argument(
        "--output-dir", default=None,
        help="Write split files here (default: same directory as source files)",
    )
    p_split.add_argument(
        "--dry-run", action="store_true",
        help="Show what would be split without writing files",
    )
    p_split.add_argument(
        "--min-sessions", type=int, default=2,
        help="Only split files containing at least N sessions (default: 2)",
    )
    # status
    sub.add_parser("status", help="Show what's been filed")
    args = parser.parse_args()
    if not args.command:
        parser.print_help()
        return
    dispatch = {
        "init": cmd_init,
        "mine": cmd_mine,
        "split": cmd_split,
        "search": cmd_search,
        "compress": cmd_compress,
        "wake-up": cmd_wakeup,
        "status": cmd_status,
    }
    dispatch[args.command](args)
 if __name__ == "__main__":
    main()
@@ -0,0 +1,149 @@
 """
 MemPalace configuration system.
 Priority: env vars > config file (~/.mempalace/config.json) > defaults
 """
 import json
 import os
 from pathlib import Path
 DEFAULT_PALACE_PATH = os.path.expanduser("~/.mempalace/palace")
 DEFAULT_COLLECTION_NAME = "mempalace_drawers"
 DEFAULT_TOPIC_WINGS = [
    "emotions",
    "consciousness",
    "memory",
    "technical",
    "identity",
    "family",
    "creative",
 ]
 DEFAULT_HALL_KEYWORDS = {
    "emotions": [
        "scared",
        "afraid",
        "worried",
        "happy",
        "sad",
        "love",
        "hate",
        "feel",
        "cry",
        "tears",
    ],
    "consciousness": [
        "consciousness",
        "conscious",
        "aware",
        "real",
        "genuine",
        "soul",
        "exist",
        "alive",
    ],
    "memory": ["memory", "remember", "forget", "recall", "archive", "palace", "store"],
    "technical": [
        "code",
        "python",
        "script",
        "bug",
        "error",
        "function",
        "api",
        "database",
        "server",
    ],
    "identity": ["identity", "name", "who am i", "persona", "self"],
    "family": ["family", "kids", "children", "daughter", "son", "parent", "mother", "father"],
    "creative": ["game", "gameplay", "player", "app", "design", "art", "music", "story"],
 }
 class MempalaceConfig:
    """Configuration manager for MemPalace.
    Load order: env vars > config file > defaults.
    """
    def __init__(self, config_dir=None):
        """Initialize config.
        Args:
            config_dir: Override config directory (useful for testing).
                        Defaults to ~/.mempalace.
        """
        self._config_dir = (
            Path(config_dir) if config_dir else Path(os.path.expanduser("~/.mempalace"))
        )
        self._config_file = self._config_dir / "config.json"
        self._people_map_file = self._config_dir / "people_map.json"
        self._file_config = {}
        if self._config_file.exists():
            try:
                with open(self._config_file, "r") as f:
                    self._file_config = json.load(f)
            except (json.JSONDecodeError, OSError):
                self._file_config = {}
    @property
    def palace_path(self):
        """Path to the memory palace data directory."""
        env_val = os.environ.get("MEMPALACE_PALACE_PATH") or os.environ.get("MEMPAL_PALACE_PATH")
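        if env_val:
            # Expand a leading ~ so values like "~/palace" resolve to a real directory
            return os.path.expanduser(env_val)
        return os.path.expanduser(self._file_config.get("palace_path", DEFAULT_PALACE_PATH))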
    @property
    def collection_name(self):
        """ChromaDB collection name."""
        return self._file_config.get("collection_name", DEFAULT_COLLECTION_NAME)
    @property
    def people_map(self):
        """Mapping of name variants to canonical names."""
        if self._people_map_file.exists():
            try:
                with open(self._people_map_file, "r") as f:
                    return json.load(f)
            except (json.JSONDecodeError, OSError):
                pass
        return self._file_config.get("people_map", {})
    @property
    def topic_wings(self):
        """List of topic wing names."""
        return self._file_config.get("topic_wings", DEFAULT_TOPIC_WINGS)
    @property
    def hall_keywords(self):
        """Mapping of hall names to keyword lists."""
        return self._file_config.get("hall_keywords", DEFAULT_HALL_KEYWORDS)
    def init(self):
        """Create config directory and write default config.json if it doesn't exist."""
        self._config_dir.mkdir(parents=True, exist_ok=True)
        if not self._config_file.exists():
            default_config = {
                "palace_path": DEFAULT_PALACE_PATH,
                "collection_name": DEFAULT_COLLECTION_NAME,
                "topic_wings": DEFAULT_TOPIC_WINGS,
                "hall_keywords": DEFAULT_HALL_KEYWORDS,
            }
            with open(self._config_file, "w") as f:
                json.dump(default_config, f, indent=2)
        return self._config_file
    def save_people_map(self, people_map):
        """Write people_map.json to config directory.
        Args:
            people_map: Dict mapping name variants to canonical names.
        """
        self._config_dir.mkdir(parents=True, exist_ok=True)
        with open(self._people_map_file, "w") as f:
            json.dump(people_map, f, indent=2)
        return self._people_map_file
@@ -0,0 +1,400 @@
 #!/usr/bin/env python3
 """
 convo_miner.py — Mine conversations into the palace.
 Ingests chat exports (Claude Code, ChatGPT, Slack, plain text transcripts).
 Normalizes format, chunks by exchange pair (Q+A = one unit), files to palace.
 Same palace as project mining. Different ingest strategy.
 """
 import os
 import sys
 import hashlib
 from pathlib import Path
 from datetime import datetime
 from collections import defaultdict
 import chromadb
 from .normalize import normalize
 # File types that might contain conversations
 CONVO_EXTENSIONS = {
    ".txt",
    ".md",
    ".json",
    ".jsonl",
 }
 SKIP_DIRS = {
    ".git",
    "node_modules",
    "__pycache__",
    ".venv",
    "venv",
    "env",
    "dist",
    "build",
    ".next",
    ".mempalace",
 }
 MIN_CHUNK_SIZE = 30
 # =============================================================================
 # CHUNKING — exchange pairs for conversations
 # =============================================================================
 def chunk_exchanges(content: str) -> list:
    """
    Chunk by exchange pair: one > turn + AI response = one unit.
    Falls back to paragraph chunking if no > markers.
    """
    lines = content.split("\n")
    quote_lines = sum(1 for line in lines if line.strip().startswith(">"))
    if quote_lines >= 3:
        return _chunk_by_exchange(lines)
    else:
        return _chunk_by_paragraph(content)
 def _chunk_by_exchange(lines: list) -> list:
    """One user turn (>) + the AI response that follows = one chunk."""
    chunks = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.strip().startswith(">"):
            user_turn = line.strip()
            i += 1
            ai_lines = []
            while i < len(lines):
                next_line = lines[i]
                if next_line.strip().startswith(">") or next_line.strip().startswith("---"):
                    break
                if next_line.strip():
                    ai_lines.append(next_line.strip())
                i += 1
            ai_response = " ".join(ai_lines[:8])
            content = f"{user_turn}\n{ai_response}" if ai_response else user_turn
            if len(content.strip()) > MIN_CHUNK_SIZE:
                chunks.append(
                    {
                        "content": content,
                        "chunk_index": len(chunks),
                    }
                )
        else:
            i += 1
    return chunks
 def _chunk_by_paragraph(content: str) -> list:
    """Fallback: chunk by paragraph breaks."""
    chunks = []
    paragraphs = [p.strip() for p in content.split("\n\n") if p.strip()]
    # If no paragraph breaks and long content, chunk by line groups
    if len(paragraphs) <= 1 and content.count("\n") > 20:
        lines = content.split("\n")
        for i in range(0, len(lines), 25):
            group = "\n".join(lines[i : i + 25]).strip()
            if len(group) > MIN_CHUNK_SIZE:
                chunks.append({"content": group, "chunk_index": len(chunks)})
        return chunks
    for para in paragraphs:
        if len(para) > MIN_CHUNK_SIZE:
            chunks.append({"content": para, "chunk_index": len(chunks)})
    return chunks
 # =============================================================================
 # ROOM DETECTION — topic-based for conversations
 # =============================================================================
 TOPIC_KEYWORDS = {
    "technical": [
        "code",
        "python",
        "function",
        "bug",
        "error",
        "api",
        "database",
        "server",
        "deploy",
        "git",
        "test",
        "debug",
        "refactor",
    ],
    "architecture": [
        "architecture",
        "design",
        "pattern",
        "structure",
        "schema",
        "interface",
        "module",
        "component",
        "service",
        "layer",
    ],
    "planning": [
        "plan",
        "roadmap",
        "milestone",
        "deadline",
        "priority",
        "sprint",
        "backlog",
        "scope",
        "requirement",
        "spec",
    ],
    "decisions": [
        "decided",
        "chose",
        "picked",
        "switched",
        "migrated",
        "replaced",
        "trade-off",
        "alternative",
        "option",
        "approach",
    ],
    "problems": [
        "problem",
        "issue",
        "broken",
        "failed",
        "crash",
        "stuck",
        "workaround",
        "fix",
        "solved",
        "resolved",
    ],
 }
 def detect_convo_room(content: str) -> str:
    """Score conversation content against topic keywords."""
    content_lower = content[:3000].lower()
    scores = {}
    for room, keywords in TOPIC_KEYWORDS.items():
        score = sum(1 for kw in keywords if kw in content_lower)
        if score > 0:
            scores[room] = score
    if scores:
        return max(scores, key=scores.get)
    return "general"
 # =============================================================================
 # PALACE OPERATIONS
 # =============================================================================
 def get_collection(palace_path: str):
    """Open the persistent palace at palace_path and return the drawers collection, creating it if missing."""
    os.makedirs(palace_path, exist_ok=True)
    client = chromadb.PersistentClient(path=palace_path)
    return client.get_or_create_collection("mempalace_drawers")
 def file_already_mined(collection, source_file: str) -> bool:
    """Return True if any drawer in the collection was filed from source_file."""
    try:
        results = collection.get(where={"source_file": source_file}, limit=1)
        return len(results.get("ids", [])) > 0
    except Exception:
        return False
 # =============================================================================
 # SCAN FOR CONVERSATION FILES
 # =============================================================================
 def scan_convos(convo_dir: str) -> list:
    """Find all potential conversation files."""
    convo_path = Path(convo_dir).expanduser().resolve()
    files = []
    for root, dirs, filenames in os.walk(convo_path):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for filename in filenames:
            filepath = Path(root) / filename
            if filepath.suffix.lower() in CONVO_EXTENSIONS:
                files.append(filepath)
    return files
 # =============================================================================
 # MINE CONVERSATIONS
 # =============================================================================
 def mine_convos(
    convo_dir: str,
    palace_path: str,
    wing: str = None,
    agent: str = "mempalace",
    limit: int = 0,
    dry_run: bool = False,
    extract_mode: str = "exchange",
 ):
    """Mine a directory of conversation files into the palace.
    extract_mode:
        "exchange" — default exchange-pair chunking (Q+A = one unit)
        "general"  — general extractor: decisions, preferences, milestones, problems, emotions
    """
    convo_path = Path(convo_dir).expanduser().resolve()
    if not wing:
        wing = convo_path.name.lower().replace(" ", "_").replace("-", "_")
    files = scan_convos(convo_dir)
    if limit > 0:
        files = files[:limit]
    print(f"\n{'=' * 55}")
    print("  MemPalace Mine — Conversations")
    print(f"{'=' * 55}")
    print(f"  Wing:    {wing}")
    print(f"  Source:  {convo_path}")
    print(f"  Files:   {len(files)}")
    print(f"  Palace:  {palace_path}")
    if dry_run:
        print("  DRY RUN — nothing will be filed")
    print(f"{'─' * 55}\n")
    collection = get_collection(palace_path) if not dry_run else None
    total_drawers = 0
    files_skipped = 0
    room_counts = defaultdict(int)
    for i, filepath in enumerate(files, 1):
        source_file = str(filepath)
        # Skip if already filed
        if not dry_run and file_already_mined(collection, source_file):
            files_skipped += 1
            continue
        # Normalize format
        try:
            content = normalize(str(filepath))
        except Exception:
            continue
        if not content or len(content.strip()) < MIN_CHUNK_SIZE:
            continue
        # Chunk — either exchange pairs or general extraction
        if extract_mode == "general":
            from .general_extractor import extract_memories
            chunks = extract_memories(content)
            # Each chunk already has memory_type; use it as the room name
        else:
            chunks = chunk_exchanges(content)
        if not chunks:
            continue
        # Detect room from content (general mode uses memory_type instead)
        if extract_mode != "general":
            room = detect_convo_room(content)
        else:
            room = None  # set per-chunk below
        if dry_run:
            if extract_mode == "general":
                from collections import Counter
                type_counts = Counter(c.get("memory_type", "general") for c in chunks)
                types_str = ", ".join(f"{t}:{n}" for t, n in type_counts.most_common())
                print(f"    [DRY RUN] {filepath.name} → {len(chunks)} memories ({types_str})")
            else:
                print(f"    [DRY RUN] {filepath.name} → room:{room} ({len(chunks)} drawers)")
            total_drawers += len(chunks)
            # Track room counts
            if extract_mode == "general":
                for c in chunks:
                    room_counts[c.get("memory_type", "general")] += 1
            else:
                room_counts[room] += 1
            continue
        if extract_mode != "general":
            room_counts[room] += 1
        # File each chunk
        drawers_added = 0
        for chunk in chunks:
            chunk_room = chunk.get("memory_type", room) if extract_mode == "general" else room
            if extract_mode == "general":
                room_counts[chunk_room] += 1
            drawer_id = f"drawer_{wing}_{chunk_room}_{hashlib.md5((source_file + str(chunk['chunk_index'])).encode()).hexdigest()[:16]}"
            try:
                collection.add(
                    documents=[chunk["content"]],
                    ids=[drawer_id],
                    metadatas=[
                        {
                            "wing": wing,
                            "room": chunk_room,
                            "source_file": source_file,
                            "chunk_index": chunk["chunk_index"],
                            "added_by": agent,
                            "filed_at": datetime.now().isoformat(),
                            "ingest_mode": "convos",
                            "extract_mode": extract_mode,
                        }
                    ],
                )
                drawers_added += 1
            except Exception as e:
                if "already exists" not in str(e).lower():
                    raise
        total_drawers += drawers_added
        print(f"  ✓ [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}")
    print(f"\n{'=' * 55}")
    print("  Done.")
    print(f"  Files processed: {len(files) - files_skipped}")
    print(f"  Files skipped (already filed): {files_skipped}")
    print(f"  Drawers filed: {total_drawers}")
    if room_counts:
        print("\n  By room:")
        for room, count in sorted(room_counts.items(), key=lambda x: x[1], reverse=True):
            print(f"    {room:20} {count} files")
    print('\n  Next: mempalace search "what you\'re looking for"')
    print(f"{'=' * 55}\n")
 if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python convo_miner.py <convo_dir> [--palace PATH] [--limit N] [--dry-run]")
        sys.exit(1)
    from .config import MempalaceConfig
    mine_convos(sys.argv[1], palace_path=MempalaceConfig().palace_path)
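The exchange-pair chunking used by this miner can be illustrated in isolation. The sketch below is a simplified, self-contained restatement of `_chunk_by_exchange`, assuming a transcript where user turns start with `>`. The function name `chunk_by_exchange`, the `max_reply_lines` parameter, and the sample transcript are illustrative and not part of the module.

```python
MIN_CHUNK_SIZE = 30  # same threshold as the module constant


def chunk_by_exchange(text: str, max_reply_lines: int = 8) -> list:
    """Pair each '>' user turn with the reply lines that follow it."""
    lines = text.split("\n")
    chunks = []
    i = 0
    while i < len(lines):
        stripped = lines[i].strip()
        if not stripped.startswith(">"):
            i += 1
            continue
        user_turn = stripped
        i += 1
        reply = []
        # Collect non-empty reply lines until the next user turn or a '---' separator.
        while i < len(lines):
            nxt = lines[i].strip()
            if nxt.startswith(">") or nxt.startswith("---"):
                break
            if nxt:
                reply.append(nxt)
            i += 1
        body = " ".join(reply[:max_reply_lines])
        content = f"{user_turn}\n{body}" if body else user_turn
        # Very short exchanges are dropped rather than filed.
        if len(content.strip()) > MIN_CHUNK_SIZE:
            chunks.append({"content": content, "chunk_index": len(chunks)})
    return chunks


transcript = """> How do I reset the palace index?
Delete the collection directory and re-run the miner.
It rebuilds everything from the source files.
---
> ok
> Does a dry run write anything to disk?
No, it only prints what would be filed."""

chunks = chunk_by_exchange(transcript)
for chunk in chunks:
    print(chunk["chunk_index"], repr(chunk["content"]))
```

Note that the bare `> ok` turn falls under the size threshold and is skipped, and that only the first eight reply lines are kept, so long answers are truncated rather than stored verbatim.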
@@ -0,0 +1,853 @@
 #!/usr/bin/env python3
 """
 entity_detector.py — Auto-detect people and projects from file content.
 Two-pass approach:
  Pass 1: scan files, extract entity candidates with signal counts
  Pass 2: score and classify each candidate as person, project, or uncertain
 Used by mempalace init before mining begins.
 The confirmed entity map feeds the miner as the taxonomy.
 Usage:
    from mempalace.entity_detector import detect_entities, confirm_entities
    candidates = detect_entities(file_paths)
    confirmed = confirm_entities(candidates)  # interactive review
 """
 import re
 import os
 from pathlib import Path
 from collections import defaultdict
 # ==================== SIGNAL PATTERNS ====================
 # Person signals — things people do
 PERSON_VERB_PATTERNS = [
    r"\b{name}\s+said\b",
    r"\b{name}\s+asked\b",
    r"\b{name}\s+told\b",
    r"\b{name}\s+replied\b",
    r"\b{name}\s+laughed\b",
    r"\b{name}\s+smiled\b",
    r"\b{name}\s+cried\b",
    r"\b{name}\s+felt\b",
    r"\b{name}\s+thinks?\b",
    r"\b{name}\s+wants?\b",
    r"\b{name}\s+loves?\b",
    r"\b{name}\s+hates?\b",
    r"\b{name}\s+knows?\b",
    r"\b{name}\s+decided\b",
    r"\b{name}\s+pushed\b",
    r"\b{name}\s+wrote\b",
    r"\bhey\s+{name}\b",
    r"\bthanks?\s+{name}\b",
    r"\bhi\s+{name}\b",
    r"\bdear\s+{name}\b",
 ]
 # Person signals — pronouns resolving nearby
 PRONOUN_PATTERNS = [
    r"\bshe\b",
    r"\bher\b",
    r"\bhers\b",
    r"\bhe\b",
    r"\bhim\b",
    r"\bhis\b",
    r"\bthey\b",
    r"\bthem\b",
    r"\btheir\b",
 ]
 # Person signals — dialogue markers
 DIALOGUE_PATTERNS = [
    r"^>\s*{name}[:\s]",  # > Speaker: ...
    r"^{name}:\s",  # Speaker: ...
    r"^\[{name}\]",  # [Speaker]
    r'"{name}\s+said',
 ]
 # Project signals — things projects have/do
 PROJECT_VERB_PATTERNS = [
    r"\bbuilding\s+{name}\b",
    r"\bbuilt\s+{name}\b",
    r"\bship(?:ping|ped)?\s+{name}\b",
    r"\blaunch(?:ing|ed)?\s+{name}\b",
    r"\bdeploy(?:ing|ed)?\s+{name}\b",
    r"\binstall(?:ing|ed)?\s+{name}\b",
    r"\bthe\s+{name}\s+architecture\b",
    r"\bthe\s+{name}\s+pipeline\b",
    r"\bthe\s+{name}\s+system\b",
    r"\bthe\s+{name}\s+repo\b",
    r"\b{name}\s+v\d+\b",  # MemPal v2
    r"\b{name}\.py\b",  # mempalace.py
    r"\b{name}-core\b",  # mempal-core (hyphen only, not underscore)
    r"\b{name}-local\b",
    r"\bimport\s+{name}\b",
    r"\bpip\s+install\s+{name}\b",
 ]
 # Words that are almost certainly NOT entities
 STOPWORDS = {
    "the",
    "a",
    "an",
    "and",
    "or",
    "but",
    "in",
    "on",
    "at",
    "to",
    "for",
    "of",
    "with",
    "by",
    "from",
    "as",
    "is",
    "was",
    "are",
    "were",
    "be",
    "been",
    "being",
    "have",
    "has",
    "had",
    "do",
    "does",
    "did",
    "will",
    "would",
    "could",
    "should",
    "may",
    "might",
    "must",
    "shall",
    "can",
    "this",
    "that",
    "these",
    "those",
    "it",
    "its",
    "they",
    "them",
    "their",
    "we",
    "our",
    "you",
    "your",
    "i",
    "my",
    "me",
    "he",
    "she",
    "his",
    "her",
    "who",
    "what",
    "when",
    "where",
    "why",
    "how",
    "which",
    "if",
    "then",
    "so",
    "not",
    "no",
    "yes",
    "ok",
    "okay",
    "just",
    "very",
    "really",
    "also",
    "already",
    "still",
    "even",
    "only",
    "here",
    "there",
    "now",
    "then",
    "too",
    "up",
    "out",
    "about",
    "like",
    "use",
    "get",
    "got",
    "make",
    "made",
    "take",
    "put",
    "come",
    "go",
    "see",
    "know",
    "think",
    "true",
    "false",
    "none",
    "null",
    "new",
    "old",
    "all",
    "any",
    "some",
    "true",
    "false",
    "return",
    "print",
    "def",
    "class",
    "import",
    "from",
    # Common capitalized words in prose that aren't entities
    "step",
    "usage",
    "run",
    "check",
    "find",
    "add",
    "get",
    "set",
    "list",
    "args",
    "dict",
    "str",
    "int",
    "bool",
    "path",
    "file",
    "type",
    "name",
    "note",
    "example",
    "option",
    "result",
    "error",
    "warning",
    "info",
    "every",
    "each",
    "more",
    "less",
    "next",
    "last",
    "first",
    "second",
    "stack",
    "layer",
    "mode",
    "test",
    "stop",
    "start",
    "copy",
    "move",
    "source",
    "target",
    "output",
    "input",
    "data",
    "item",
    "key",
    "value",
    "returns",
    "raises",
    "yields",
    "none",
    "self",
    "cls",
    "kwargs",
    # Common sentence-starting / abstract words that aren't entities
    "world",
    "well",
    "want",
    "topic",
    "choose",
    "social",
    "cars",
    "phones",
    "healthcare",
    "ex",
    "machina",
    "deus",
    "human",
    "humans",
    "people",
    "things",
    "something",
    "nothing",
    "everything",
    "anything",
    "someone",
    "everyone",
    "anyone",
    "way",
    "time",
    "day",
    "life",
    "place",
    "thing",
    "part",
    "kind",
    "sort",
    "case",
    "point",
    "idea",
    "fact",
    "sense",
    "question",
    "answer",
    "reason",
    "number",
    "version",
    "system",
    # Greetings and filler words at sentence starts
    "hey",
    "hi",
    "hello",
    "thanks",
    "thank",
    "right",
    "let",
    "ok",
    # UI/action words that appear in how-to content
    "click",
    "hit",
    "press",
    "tap",
    "drag",
    "drop",
    "open",
    "close",
    "save",
    "load",
    "launch",
    "install",
    "download",
    "upload",
    "scroll",
    "select",
    "enter",
    "submit",
    "cancel",
    "confirm",
    "delete",
    "copy",
    "paste",
    "type",
    "write",
    "read",
    "search",
    "find",
    "show",
    "hide",
    # Common filesystem/technical capitalized words
    "desktop",
    "documents",
    "downloads",
    "users",
    "home",
    "library",
    "applications",
    "system",
    "preferences",
    "settings",
    "terminal",
    # Abstract/topic words
    "actor",
    "vector",
    "remote",
    "control",
    "duration",
    "fetch",
    # Abstract concepts that appear as subjects but aren't entities
    "agents",
    "tools",
    "others",
    "guards",
    "ethics",
    "regulation",
    "learning",
    "thinking",
    "memory",
    "language",
    "intelligence",
    "technology",
    "society",
    "culture",
    "future",
    "history",
    "science",
    "model",
    "models",
    "network",
    "networks",
    "training",
    "inference",
 }
 # For entity detection — prose only, no code files
 # Code files have too many capitalized names (classes, functions) that aren't entities
 PROSE_EXTENSIONS = {
    ".txt",
    ".md",
    ".rst",
    ".csv",
 }
 READABLE_EXTENSIONS = {
    ".txt",
    ".md",
    ".py",
    ".js",
    ".ts",
    ".json",
    ".yaml",
    ".yml",
    ".csv",
    ".rst",
    ".toml",
    ".sh",
    ".rb",
    ".go",
    ".rs",
 }
 SKIP_DIRS = {
    ".git",
    "node_modules",
    "__pycache__",
    ".venv",
    "venv",
    "env",
    "dist",
    "build",
    ".next",
    "coverage",
    ".mempalace",
 }
 # ==================== CANDIDATE EXTRACTION ====================
 def extract_candidates(text: str) -> dict:
    """
    Extract all capitalized proper noun candidates from text.
    Returns {name: frequency} for names appearing 3+ times.
    """
    # Find all capitalized words. Excluding sentence-initial words is hard,
    # so the frequency threshold below serves as the filter instead.
    raw = re.findall(r"\b([A-Z][a-z]{1,19})\b", text)
    counts = defaultdict(int)
    for word in raw:
        if word.lower() not in STOPWORDS and len(word) > 1:
            counts[word] += 1
    # Also find multi-word proper nouns (e.g. "Memory Palace", "Claude Code")
    multi = re.findall(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b", text)
    for phrase in multi:
        if not any(w.lower() in STOPWORDS for w in phrase.split()):
            counts[phrase] += 1
    # Filter: must appear at least 3 times to be a candidate
    return {name: count for name, count in counts.items() if count >= 3}
 # ==================== SIGNAL SCORING ====================
 def _build_patterns(name: str) -> dict:
    """Pre-compile all regex patterns for a single entity name."""
    n = re.escape(name)
    return {
        "dialogue": [
            re.compile(p.format(name=n), re.MULTILINE | re.IGNORECASE) for p in DIALOGUE_PATTERNS
        ],
        "person_verbs": [re.compile(p.format(name=n), re.IGNORECASE) for p in PERSON_VERB_PATTERNS],
        "project_verbs": [
            re.compile(p.format(name=n), re.IGNORECASE) for p in PROJECT_VERB_PATTERNS
        ],
        "direct": re.compile(rf"\bhey\s+{n}\b|\bthanks?\s+{n}\b|\bhi\s+{n}\b", re.IGNORECASE),
        "versioned": re.compile(rf"\b{n}[-v]\w+", re.IGNORECASE),
        "code_ref": re.compile(rf"\b{n}\.(py|js|ts|yaml|yml|json|sh)\b", re.IGNORECASE),
    }
 def score_entity(name: str, text: str, lines: list) -> dict:
    """
    Score a candidate entity as person vs project.
    Returns scores and the signals that fired.
    """
    patterns = _build_patterns(name)
    person_score = 0
    project_score = 0
    person_signals = []
    project_signals = []
    # --- Person signals ---
    # Dialogue markers (strong signal)
    for rx in patterns["dialogue"]:
        matches = len(rx.findall(text))
        if matches > 0:
            person_score += matches * 3
            person_signals.append(f"dialogue marker ({matches}x)")
    # Person verbs
    for rx in patterns["person_verbs"]:
        matches = len(rx.findall(text))
        if matches > 0:
            person_score += matches * 2
            person_signals.append(f"'{name} ...' action ({matches}x)")
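    # Pronoun proximity: a pronoun within 2 lines on either side of a line naming the entity.
    # Match the name as a whole word so that "Max" does not match "maximum".
    name_rx = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
    name_line_indices = [i for i, line in enumerate(lines) if name_rx.search(line)]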
    pronoun_hits = 0
    for idx in name_line_indices:
        window_text = " ".join(lines[max(0, idx - 2) : idx + 3]).lower()
        for pronoun_pattern in PRONOUN_PATTERNS:
            if re.search(pronoun_pattern, window_text):
                pronoun_hits += 1
                break
    if pronoun_hits > 0:
        person_score += pronoun_hits * 2
        person_signals.append(f"pronoun nearby ({pronoun_hits}x)")
    # Direct address
    direct = len(patterns["direct"].findall(text))
    if direct > 0:
        person_score += direct * 4
        person_signals.append(f"addressed directly ({direct}x)")
    # --- Project signals ---
    for rx in patterns["project_verbs"]:
        matches = len(rx.findall(text))
        if matches > 0:
            project_score += matches * 2
            project_signals.append(f"project verb ({matches}x)")
    versioned = len(patterns["versioned"].findall(text))
    if versioned > 0:
        project_score += versioned * 3
        project_signals.append(f"versioned/hyphenated ({versioned}x)")
    code_ref = len(patterns["code_ref"].findall(text))
    if code_ref > 0:
        project_score += code_ref * 3
        project_signals.append(f"code file reference ({code_ref}x)")
    return {
        "person_score": person_score,
        "project_score": project_score,
        "person_signals": person_signals[:3],
        "project_signals": project_signals[:3],
    }
 # ==================== CLASSIFY ====================
 def classify_entity(name: str, frequency: int, scores: dict) -> dict:
    """
    Given scores, classify as person / project / uncertain.
    Returns entity dict with confidence.
    """
    ps = scores["person_score"]
    prs = scores["project_score"]
    total = ps + prs
    if total == 0:
        # No strong signals — frequency-only candidate, uncertain
        confidence = min(0.4, frequency / 50)
        return {
            "name": name,
            "type": "uncertain",
            "confidence": round(confidence, 2),
            "frequency": frequency,
            "signals": [f"appears {frequency}x, no strong type signals"],
        }
    person_ratio = ps / total if total > 0 else 0
    # Require TWO different signal categories to confidently classify as a person.
    # One signal type with many hits (e.g. "Click, click, click...") is not enough —
    # it just means that word appears often in a particular syntactic position.
    signal_categories = set()
    for s in scores["person_signals"]:
        if "dialogue" in s:
            signal_categories.add("dialogue")
        elif "action" in s:
            signal_categories.add("action")
        elif "pronoun" in s:
            signal_categories.add("pronoun")
        elif "addressed" in s:
            signal_categories.add("addressed")
    has_two_signal_types = len(signal_categories) >= 2
    _ = signal_categories - {"pronoun"}  # reserved for future thresholds
    if person_ratio >= 0.7 and has_two_signal_types and ps >= 5:
        entity_type = "person"
        confidence = min(0.99, 0.5 + person_ratio * 0.5)
        signals = scores["person_signals"] or [f"appears {frequency}x"]
    elif person_ratio >= 0.7:
        # Person-leaning, but only one signal category or a score below 5: downgrade to uncertain
        entity_type = "uncertain"
        confidence = 0.4
        signals = scores["person_signals"] + [f"appears {frequency}x, weak person evidence"]
    elif person_ratio <= 0.3:
        entity_type = "project"
        confidence = min(0.99, 0.5 + (1 - person_ratio) * 0.5)
        signals = scores["project_signals"] or [f"appears {frequency}x"]
    else:
        entity_type = "uncertain"
        confidence = 0.5
        signals = (scores["person_signals"] + scores["project_signals"])[:3]
        signals.append("mixed signals — needs review")
    return {
        "name": name,
        "type": entity_type,
        "confidence": round(confidence, 2),
        "frequency": frequency,
        "signals": signals,
    }
 # ==================== MAIN DETECT ====================
 def detect_entities(file_paths: list, max_files: int = 10) -> dict:
    """
    Scan files and detect entity candidates.
    Args:
        file_paths: List of Path objects to scan
        max_files: Max files to read (for speed)
    Returns:
        {
            "people":   [...entity dicts...],
            "projects": [...entity dicts...],
            "uncertain":[...entity dicts...],
        }
    """
    # Collect text from files
    all_text = []
    all_lines = []
    files_read = 0
    # read() in text mode counts characters, so this is roughly the first 5 KB of each file,
    # which is enough to catch recurring entities.
    MAX_BYTES_PER_FILE = 5_000
    for filepath in file_paths:
        if files_read >= max_files:
            break
        try:
            with open(filepath, encoding="utf-8", errors="replace") as f:
                content = f.read(MAX_BYTES_PER_FILE)
            all_text.append(content)
            all_lines.extend(content.splitlines())
            files_read += 1
        except Exception:
            continue
    combined_text = "\n".join(all_text)
    # Extract candidates
    candidates = extract_candidates(combined_text)
    if not candidates:
        return {"people": [], "projects": [], "uncertain": []}
    # Score and classify each candidate
    people = []
    projects = []
    uncertain = []
    for name, frequency in sorted(candidates.items(), key=lambda x: x[1], reverse=True):
        scores = score_entity(name, combined_text, all_lines)
        entity = classify_entity(name, frequency, scores)
        if entity["type"] == "person":
            people.append(entity)
        elif entity["type"] == "project":
            projects.append(entity)
        else:
            uncertain.append(entity)
    # Sort by confidence descending
    people.sort(key=lambda x: x["confidence"], reverse=True)
    projects.sort(key=lambda x: x["confidence"], reverse=True)
    uncertain.sort(key=lambda x: x["frequency"], reverse=True)
    # Cap results to most relevant
    return {
        "people": people[:15],
        "projects": projects[:10],
        "uncertain": uncertain[:8],
    }
 # ==================== INTERACTIVE CONFIRM ====================
 def _print_entity_list(entities: list, label: str):
    print(f"\n  {label}:")
    if not entities:
        print("    (none detected)")
        return
    for i, e in enumerate(entities):
        confidence_bar = "●" * int(e["confidence"] * 5) + "○" * (5 - int(e["confidence"] * 5))
        signals_str = ", ".join(e["signals"][:2]) if e["signals"] else ""
        print(f"    {i + 1:2}. {e['name']:20} [{confidence_bar}] {signals_str}")
 def confirm_entities(detected: dict, yes: bool = False) -> dict:
    """
    Interactive confirmation step.
    User reviews detected entities, removes wrong ones, adds missing ones.
    Returns confirmed {people: [names], projects: [names]}
    Pass yes=True to auto-accept all detected entities without prompting.
    """
    print(f"\n{'=' * 58}")
    print("  MemPalace — Entity Detection")
    print(f"{'=' * 58}")
    print("\n  Scanned your files. Here's what we found:\n")
    _print_entity_list(detected["people"], "PEOPLE")
    _print_entity_list(detected["projects"], "PROJECTS")
    if detected["uncertain"]:
        _print_entity_list(detected["uncertain"], "UNCERTAIN (need your call)")
    confirmed_people = [e["name"] for e in detected["people"]]
    confirmed_projects = [e["name"] for e in detected["projects"]]
    if yes:
        # Auto-accept: include all detected (skip uncertain — ambiguous without user input)
        print(
            f"\n  Auto-accepting {len(confirmed_people)} people, {len(confirmed_projects)} projects."
        )
        return {"people": confirmed_people, "projects": confirmed_projects}
    print(f"\n{'─' * 58}")
    print("  Options:")
    print("    [enter]  Accept all")
    print("    [edit]   Remove wrong entries or reclassify uncertain")
    print("    [add]    Add missing people or projects")
    print()
    choice = input("  Your choice [enter/edit/add]: ").strip().lower()
    confirmed_people = [e["name"] for e in detected["people"]]
    confirmed_projects = [e["name"] for e in detected["projects"]]
    if choice == "edit":
        # Handle uncertain first
        if detected["uncertain"]:
            print("\n  Uncertain entities — classify each:")
            for e in detected["uncertain"]:
                ans = input(f"    {e['name']}: (p)erson, p(r)oject, or (s)kip? ").strip().lower()
                if ans == "p":
                    confirmed_people.append(e["name"])
                elif ans == "r":
                    confirmed_projects.append(e["name"])
        # Remove wrong people
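        # Number the entries so the removal prompt below has indices to refer to.
        numbered = ", ".join(f"{i}. {p}" for i, p in enumerate(confirmed_people, 1))
        print(f"\n  Current people: {numbered or '(none)'}")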
        remove = input(
            "  Numbers to REMOVE from people (comma-separated, or enter to skip): "
        ).strip()
        if remove:
            to_remove = {int(x.strip()) - 1 for x in remove.split(",") if x.strip().isdigit()}
            confirmed_people = [p for i, p in enumerate(confirmed_people) if i not in to_remove]
        # Remove wrong projects
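        # Number the entries so the removal prompt below has indices to refer to.
        numbered = ", ".join(f"{i}. {p}" for i, p in enumerate(confirmed_projects, 1))
        print(f"\n  Current projects: {numbered or '(none)'}")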
        remove = input(
            "  Numbers to REMOVE from projects (comma-separated, or enter to skip): "
        ).strip()
        if remove:
            to_remove = {int(x.strip()) - 1 for x in remove.split(",") if x.strip().isdigit()}
            confirmed_projects = [p for i, p in enumerate(confirmed_projects) if i not in to_remove]
    if choice == "add" or input("\n  Add any missing? [y/N]: ").strip().lower() == "y":
        while True:
            name = input("  Name (or enter to stop): ").strip()
            if not name:
                break
            kind = input(f"  Is '{name}' a (p)erson or p(r)oject? ").strip().lower()
            if kind == "p":
                confirmed_people.append(name)
            elif kind == "r":
                confirmed_projects.append(name)
    print(f"\n{'=' * 58}")
    print("  Confirmed:")
    print(f"  People:   {', '.join(confirmed_people) or '(none)'}")
    print(f"  Projects: {', '.join(confirmed_projects) or '(none)'}")
    print(f"{'=' * 58}\n")
    return {
        "people": confirmed_people,
        "projects": confirmed_projects,
    }
 # ==================== SCAN HELPER ====================
 def scan_for_detection(project_dir: str, max_files: int = 10) -> list:
    """
    Collect prose file paths for entity detection.
    Prose only (.txt, .md, .rst, .csv), since code files produce too many false positives.
    If fewer than 3 prose files are found, other readable files are appended as a fallback.
    """
    project_path = Path(project_dir).expanduser().resolve()
    prose_files = []
    all_files = []
    for root, dirs, filenames in os.walk(project_path):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for filename in filenames:
            filepath = Path(root) / filename
            ext = filepath.suffix.lower()
            if ext in PROSE_EXTENSIONS:
                prose_files.append(filepath)
            elif ext in READABLE_EXTENSIONS:
                all_files.append(filepath)
    # Prefer prose files — fall back to all readable if too few prose files
    files = prose_files if len(prose_files) >= 3 else prose_files + all_files
    return files[:max_files]
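 # Example (illustrative sketch; "~/notes" is a hypothetical directory):
 #     files = scan_for_detection("~/notes", max_files=5)
 # With 3 or more prose files, only prose paths are considered; with fewer, the
 # other readable files are appended after them. Either way, at most max_files
 # paths are returned.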
 # ==================== CLI ====================
 if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print("Usage: python entity_detector.py <directory>")
        sys.exit(1)
    project_dir = sys.argv[1]
    print(f"Scanning: {project_dir}")
    files = scan_for_detection(project_dir)
    print(f"Reading {len(files)} files...")
    detected = detect_entities(files)
    confirmed = confirm_entities(detected)
    print("Confirmed entities:", confirmed)
@@ -0,0 +1,643 @@
 #!/usr/bin/env python3
 """
 entity_registry.py — Persistent personal entity registry for MemPalace.
 Knows the difference between Riley (a person) and ever (an adverb).
 Built from three sources, in priority order:
  1. Onboarding — what the user explicitly told us
  2. Learned — what we inferred from session history with high confidence
  3. Researched — what we looked up via Wikipedia for unknown words
 Usage:
    from mempalace.entity_registry import EntityRegistry
    registry = EntityRegistry.load()
    result = registry.lookup("Riley", context="I went with Riley today")
    # → {"type": "person", "confidence": 1.0, "source": "onboarding"}
 """
 import json
 import re
 import urllib.error
 import urllib.request
 import urllib.parse
 from pathlib import Path
 from typing import Optional
 # ─────────────────────────────────────────────────────────────────────────────
 # Common English words that could be confused with names
 # These get flagged as AMBIGUOUS and require context disambiguation
 # ─────────────────────────────────────────────────────────────────────────────
 COMMON_ENGLISH_WORDS = {
    # Words that are also common personal names
    "ever",
    "grace",
    "will",
    "bill",
    "mark",
    "april",
    "may",
    "june",
    "joy",
    "hope",
    "faith",
    "chance",
    "chase",
    "hunter",
    "dash",
    "flash",
    "star",
    "sky",
    "river",
    "brook",
    "lane",
    "art",
    "clay",
    "gil",
    "nat",
    "max",
    "rex",
    "ray",
    "jay",
    "rose",
    "violet",
    "lily",
    "ivy",
    "ash",
    "reed",
    "sage",
    # Words that look like names at start of sentence
    "monday",
    "tuesday",
    "wednesday",
    "thursday",
    "friday",
    "saturday",
    "sunday",
    "january",
    "february",
    "march",
    "april",
    "june",
    "july",
    "august",
    "september",
    "october",
    "november",
    "december",
 }
 # Context patterns that indicate a word is being used as a PERSON name
 PERSON_CONTEXT_PATTERNS = [
    r"\b{name}\s+said\b",
    r"\b{name}\s+told\b",
    r"\b{name}\s+asked\b",
    r"\b{name}\s+laughed\b",
    r"\b{name}\s+smiled\b",
    r"\b{name}\s+was\b",
    r"\b{name}\s+is\b",
    r"\b{name}\s+called\b",
    r"\b{name}\s+texted\b",
    r"\bwith\s+{name}\b",
    r"\bsaw\s+{name}\b",
    r"\bcalled\s+{name}\b",
    r"\btook\s+{name}\b",
    r"\bpicked\s+up\s+{name}\b",
    r"\bdrop(?:ped)?\s+(?:off\s+)?{name}\b",
    r"\b{name}(?:'s|s')\b",  # Riley's, Max's
    r"\bhey\s+{name}\b",
    r"\bthanks?\s+{name}\b",
    r"^{name}[:\s]",  # dialogue: "Riley: ..."
    r"\bmy\s+(?:son|daughter|kid|child|brother|sister|friend|partner|colleague|coworker)\s+{name}\b",
 ]
 # Context patterns that indicate a word is NOT being used as a name
 CONCEPT_CONTEXT_PATTERNS = [
    r"\bhave\s+you\s+{name}\b",  # "have you ever"
    r"\bif\s+you\s+{name}\b",  # "if you ever"
    r"\b{name}\s+since\b",  # "ever since"
    r"\b{name}\s+again\b",  # "ever again"
    r"\bnot\s+{name}\b",  # "not ever"
    r"\b{name}\s+more\b",  # "ever more"
    r"\bwould\s+{name}\b",  # "would ever"
    r"\bcould\s+{name}\b",  # "could ever"
    r"\bwill\s+{name}\b",  # "will ever"
    r"(?:the\s+)?{name}\s+(?:of|in|at|for|to)\b",  # "the grace of", "the mark of"
 ]
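 # Example (illustrative sketch of how these templates are meant to be used):
 # each pattern is filled with the escaped, lowercased word and searched in the
 # lowercased context, as EntityRegistry._disambiguate() does.
 #     pat = r"\bwith\s+{name}\b".format(name=re.escape("grace"))
 #     re.search(pat, "i had lunch with grace today")   # matches (person usage)
 #     pat = r"(?:the\s+)?{name}\s+(?:of|in|at|for|to)\b".format(name=re.escape("grace"))
 #     re.search(pat, "by the grace of god")            # matches (concept usage)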
 # ─────────────────────────────────────────────────────────────────────────────
 # Wikipedia lookup for unknown words
 # ─────────────────────────────────────────────────────────────────────────────
 # Phrases in Wikipedia summaries that indicate a personal name
 NAME_INDICATOR_PHRASES = [
    "given name",
    "personal name",
    "first name",
    "forename",
    "masculine name",
    "feminine name",
    "boy's name",
    "girl's name",
    "male name",
    "female name",
    "irish name",
    "welsh name",
    "scottish name",
    "gaelic name",
    "hebrew name",
    "arabic name",
    "norse name",
    "old english name",
    "is a name",
    "as a name",
    "name meaning",
    "name derived from",
    "legendary irish",
    "legendary welsh",
    "legendary scottish",
 ]
 PLACE_INDICATOR_PHRASES = [
    "city in",
    "town in",
    "village in",
    "municipality",
    "capital of",
    "district of",
    "county",
    "province",
    "region of",
    "island of",
    "mountain in",
    "river in",
 ]
 def _wikipedia_lookup(word: str) -> dict:
    """
    Look up a word via Wikipedia REST API.
    Returns inferred type (person/place/concept/unknown) + confidence + summary.
    Free, no API key, handles disambiguation pages.
    """
    try:
        url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{urllib.parse.quote(word)}"
        req = urllib.request.Request(url, headers={"User-Agent": "MemPalace/1.0"})
        with urllib.request.urlopen(req, timeout=5) as resp:
            data = json.loads(resp.read())
        page_type = data.get("type", "")
        extract = data.get("extract", "").lower()
        title = data.get("title", word)
        # Disambiguation — look at description
        if page_type == "disambiguation":
            desc = data.get("description", "").lower()
            if "name" in desc:  # also covers "given name"
                return {
                    "inferred_type": "person",
                    "confidence": 0.65,
                    "wiki_summary": extract[:200],
                    "wiki_title": title,
                    "note": "disambiguation page with name entries",
                }
            return {
                "inferred_type": "ambiguous",
                "confidence": 0.4,
                "wiki_summary": extract[:200],
                "wiki_title": title,
            }
        # Check for name indicators
        if any(phrase in extract for phrase in NAME_INDICATOR_PHRASES):
            # Higher confidence if the word itself is described as a name
            word_lower = word.lower()
            confidence = (
                0.90
                if f"{word_lower} is a" in extract or f"{word_lower} (name" in extract
                else 0.80
            )
            return {
                "inferred_type": "person",
                "confidence": confidence,
                "wiki_summary": extract[:200],
                "wiki_title": title,
            }
        # Check for place indicators
        if any(phrase in extract for phrase in PLACE_INDICATOR_PHRASES):
            return {
                "inferred_type": "place",
                "confidence": 0.80,
                "wiki_summary": extract[:200],
                "wiki_title": title,
            }
        # Found but doesn't match name/place patterns
        return {
            "inferred_type": "concept",
            "confidence": 0.60,
            "wiki_summary": extract[:200],
            "wiki_title": title,
        }
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # Not in Wikipedia — strong signal it's a proper noun (unusual name, nickname)
            return {
                "inferred_type": "person",
                "confidence": 0.70,
                "wiki_summary": None,
                "wiki_title": None,
                "note": "not found in Wikipedia — likely a proper noun or unusual name",
            }
        return {"inferred_type": "unknown", "confidence": 0.0, "wiki_summary": None}
    except Exception:
        return {"inferred_type": "unknown", "confidence": 0.0, "wiki_summary": None}
 # ─────────────────────────────────────────────────────────────────────────────
 # Entity Registry
 # ─────────────────────────────────────────────────────────────────────────────
 class EntityRegistry:
    """
    Persistent personal entity registry.
    Stored at ~/.mempalace/entity_registry.json
    Schema:
    {
      "mode": "personal",   # work | personal | combo
      "version": 1,
      "people": {
        "Riley": {
          "source": "onboarding",
          "contexts": ["personal"],
          "aliases": [],
          "relationship": "daughter",
          "confidence": 1.0
        }
      },
      "projects": ["MemPalace", "Acme"],
      "ambiguous_flags": ["riley", "max"],
      "wiki_cache": {
        "Sam": {"inferred_type": "person", "confidence": 0.9, "confirmed": true, ...}
      }
    }
    """
    DEFAULT_PATH = Path.home() / ".mempalace" / "entity_registry.json"
    def __init__(self, data: dict, path: Path):
        self._data = data
        self._path = path
    # ── Load / Save ──────────────────────────────────────────────────────────
    @classmethod
    def load(cls, config_dir: Optional[Path] = None) -> "EntityRegistry":
        path = (Path(config_dir) / "entity_registry.json") if config_dir else cls.DEFAULT_PATH
        if path.exists():
            try:
                data = json.loads(path.read_text())
                return cls(data, path)
            except (json.JSONDecodeError, OSError):
                pass
        return cls(cls._empty(), path)
    def save(self):
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps(self._data, indent=2))
    @staticmethod
    def _empty() -> dict:
        return {
            "version": 1,
            "mode": "personal",
            "people": {},
            "projects": [],
            "ambiguous_flags": [],
            "wiki_cache": {},
        }
    # ── Properties ───────────────────────────────────────────────────────────
    @property
    def mode(self) -> str:
        return self._data.get("mode", "personal")
    @property
    def people(self) -> dict:
        return self._data.get("people", {})
    @property
    def projects(self) -> list:
        return self._data.get("projects", [])
    @property
    def ambiguous_flags(self) -> list:
        return self._data.get("ambiguous_flags", [])
    # ── Seed from onboarding ─────────────────────────────────────────────────
    def seed(self, mode: str, people: list, projects: list, aliases: Optional[dict] = None):
        """
        Seed the registry from onboarding data.
        people: list of dicts {"name": str, "relationship": str, "context": str}
        projects: list of str
        aliases: dict {"Max": "Maxwell", ...}
        """
        self._data["mode"] = mode
        self._data["projects"] = list(projects)
        aliases = aliases or {}
        reverse_aliases = {v: k for k, v in aliases.items()}  # Maxwell → Max
        for entry in people:
            name = entry["name"].strip()
            if not name:
                continue
            context = entry.get("context", "personal")
            relationship = entry.get("relationship", "")
            self._data["people"][name] = {
                "source": "onboarding",
                "contexts": [context],
                "aliases": [reverse_aliases[name]] if name in reverse_aliases else [],
                "relationship": relationship,
                "confidence": 1.0,
            }
            # Also register aliases
            if name in reverse_aliases:
                alias = reverse_aliases[name]
                self._data["people"][alias] = {
                    "source": "onboarding",
                    "contexts": [context],
                    "aliases": [name],
                    "relationship": relationship,
                    "confidence": 1.0,
                    "canonical": name,
                }
        # Flag ambiguous names (also common English words)
        ambiguous = []
        for name in self._data["people"]:
            if name.lower() in COMMON_ENGLISH_WORDS:
                ambiguous.append(name.lower())
        self._data["ambiguous_flags"] = ambiguous
        self.save()
    # ── Lookup ───────────────────────────────────────────────────────────────
    def lookup(self, word: str, context: str = "") -> dict:
        """
        Look up a word. Returns entity classification.
        context: surrounding sentence (used for disambiguation of ambiguous words)
        Returns:
            {"type": "person"|"project"|"concept"|"place"|"unknown",
             "confidence": float,
             "source": "onboarding"|"learned"|"wiki"|"context_disambiguated"|"none",
             "name": canonical name if found, otherwise the word as given,
             "needs_disambiguation": bool}
        """
        # 1. Exact match in people registry
        for canonical, info in self.people.items():
            if word.lower() == canonical.lower() or word.lower() in [
                a.lower() for a in info.get("aliases", [])
            ]:
                # Check if this is an ambiguous word
                if word.lower() in self.ambiguous_flags and context:
                    resolved = self._disambiguate(word, context, info)
                    if resolved is not None:
                        return resolved
                return {
                    "type": "person",
                    "confidence": info["confidence"],
                    "source": info["source"],
                    "name": canonical,
                    "context": info.get("contexts", ["personal"]),
                    "needs_disambiguation": False,
                }
        # 2. Project match
        for proj in self.projects:
            if word.lower() == proj.lower():
                return {
                    "type": "project",
                    "confidence": 1.0,
                    "source": "onboarding",
                    "name": proj,
                    "needs_disambiguation": False,
                }
        # 3. Wiki cache
        cache = self._data.get("wiki_cache", {})
        for cached_word, cached_result in cache.items():
            if word.lower() == cached_word.lower() and cached_result.get("confirmed"):
                return {
                    "type": cached_result["inferred_type"],
                    "confidence": cached_result["confidence"],
                    "source": "wiki",
                    "name": word,
                    "needs_disambiguation": False,
                }
        return {
            "type": "unknown",
            "confidence": 0.0,
            "source": "none",
            "name": word,
            "needs_disambiguation": False,
        }
    def _disambiguate(self, word: str, context: str, person_info: dict) -> Optional[dict]:
        """
        When a word is both a name and a common word, check context.
        Returns a person result if the context suggests a name, a concept result if
        it suggests ordinary usage, and None if the context is inconclusive.
        """
        name_lower = word.lower()
        ctx_lower = context.lower()
        # Check person context patterns
        person_score = 0
        for pat in PERSON_CONTEXT_PATTERNS:
            if re.search(pat.format(name=re.escape(name_lower)), ctx_lower):
                person_score += 1
        # Check concept context patterns
        concept_score = 0
        for pat in CONCEPT_CONTEXT_PATTERNS:
            if re.search(pat.format(name=re.escape(name_lower)), ctx_lower):
                concept_score += 1
        if person_score > concept_score:
            return {
                "type": "person",
                "confidence": min(0.95, 0.7 + person_score * 0.1),
                "source": person_info["source"],
                "name": word,
                "context": person_info.get("contexts", ["personal"]),
                "needs_disambiguation": False,
                "disambiguated_by": "context_patterns",
            }
        elif concept_score > person_score:
            return {
                "type": "concept",
                "confidence": min(0.90, 0.7 + concept_score * 0.1),
                "source": "context_disambiguated",
                "name": word,
                "needs_disambiguation": False,
                "disambiguated_by": "context_patterns",
            }
        # Truly ambiguous — return None to fall through to person (registered name)
        return None
    # ── Research unknown words ───────────────────────────────────────────────
    def research(self, word: str, auto_confirm: bool = False) -> dict:
        """
        Research an unknown word via Wikipedia.
        Caches result. If auto_confirm=False, marks as unconfirmed (needs user review).
        Returns the lookup result.
        """
        # Already cached?
        cache = self._data.setdefault("wiki_cache", {})
        if word in cache:
            return cache[word]
        result = _wikipedia_lookup(word)
        result["word"] = word
        result["confirmed"] = auto_confirm
        cache[word] = result
        self.save()
        return result
    def confirm_research(
        self, word: str, entity_type: str, relationship: str = "", context: str = "personal"
    ):
        """Mark a researched word as confirmed and add to people registry."""
        cache = self._data.get("wiki_cache", {})
        if word in cache:
            cache[word]["confirmed"] = True
            cache[word]["confirmed_type"] = entity_type
        if entity_type == "person":
            self._data["people"][word] = {
                "source": "wiki",
                "contexts": [context],
                "aliases": [],
                "relationship": relationship,
                "confidence": 0.90,
            }
            if word.lower() in COMMON_ENGLISH_WORDS:
                flags = self._data.setdefault("ambiguous_flags", [])
                if word.lower() not in flags:
                    flags.append(word.lower())
        self.save()
    # ── Learn from sessions ──────────────────────────────────────────────────
    def learn_from_text(self, text: str, min_confidence: float = 0.75) -> list:
        """
        Scan session text for new entity candidates.
        Returns list of newly discovered candidates for review.
        """
        from mempalace.entity_detector import extract_candidates, score_entity, classify_entity
        lines = text.splitlines()
        candidates = extract_candidates(text)
        new_candidates = []
        for name, frequency in candidates.items():
            # Skip if already known
            if name in self.people or name in self.projects:
                continue
            scores = score_entity(name, text, lines)
            entity = classify_entity(name, frequency, scores)
            if entity["type"] == "person" and entity["confidence"] >= min_confidence:
                self._data["people"][name] = {
                    "source": "learned",
                    "contexts": [self.mode if self.mode != "combo" else "personal"],
                    "aliases": [],
                    "relationship": "",
                    "confidence": entity["confidence"],
                    "seen_count": frequency,
                }
                if name.lower() in COMMON_ENGLISH_WORDS:
                    flags = self._data.setdefault("ambiguous_flags", [])
                    if name.lower() not in flags:
                        flags.append(name.lower())
                new_candidates.append(entity)
        if new_candidates:
            self.save()
        return new_candidates
    # ── Query helpers for retrieval ──────────────────────────────────────────
    def extract_people_from_query(self, query: str) -> list:
        """
        Extract known person names from a query string.
        Returns list of canonical names found.
        """
        found = []
        for canonical, info in self.people.items():
            names_to_check = [canonical] + info.get("aliases", [])
            for name in names_to_check:
                # Word boundary match
                if re.search(rf"\b{re.escape(name)}\b", query, re.IGNORECASE):
                    # For ambiguous words, check context
                    if name.lower() in self.ambiguous_flags:
                        result = self._disambiguate(name, query, info)
                        if result and result["type"] == "person":
                            if canonical not in found:
                                found.append(canonical)
                    else:
                        if canonical not in found:
                            found.append(canonical)
        return found
    def extract_unknown_candidates(self, query: str) -> list:
        """
        Find capitalized words in query that aren't in registry or common words.
        These are candidates for Wikipedia research.
        """
        candidates = re.findall(r"\b[A-Z][a-z]{2,15}\b", query)
        unknown = []
        for word in set(candidates):
            if word.lower() in COMMON_ENGLISH_WORDS:
                continue
            result = self.lookup(word)
            if result["type"] == "unknown":
                unknown.append(word)
        return unknown
    # ── Summary ──────────────────────────────────────────────────────────────
    def summary(self) -> str:
        lines = [
            f"Mode: {self.mode}",
            f"People: {len(self.people)} ({', '.join(list(self.people.keys())[:8])}{'...' if len(self.people) > 8 else ''})",
            f"Projects: {', '.join(self.projects) or '(none)'}",
            f"Ambiguous flags: {', '.join(self.ambiguous_flags) or '(none)'}",
            f"Wiki cache: {len(self._data.get('wiki_cache', {}))} entries",
        ]
        return "\n".join(lines)
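 # Example (illustrative sketch; the config directory is a hypothetical path, and
 # seed() writes entity_registry.json under it):
 #     registry = EntityRegistry.load(config_dir=Path("/tmp/mempalace-demo"))
 #     registry.seed(
 #         mode="personal",
 #         people=[{"name": "Grace", "relationship": "sister", "context": "personal"}],
 #         projects=["MemPalace"],
 #     )
 #     registry.lookup("Grace", context="I had lunch with Grace today")["type"]  # "person"
 #     registry.lookup("Grace", context="by the grace of God")["type"]           # "concept"
 #     registry.lookup("MemPalace")["type"]                                      # "project"
 # "Grace" is in COMMON_ENGLISH_WORDS, so seed() flags it as ambiguous and
 # lookup() consults the context patterns before answering.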
@@ -0,0 +1,521 @@
 #!/usr/bin/env python3
 """
 general_extractor.py — Extract 5 types of memories from text.
 Types:
  1. DECISIONS    — "we went with X because Y", choices made
  2. PREFERENCES  — "always use X", "never do Y", "I prefer Z"
  3. MILESTONES   — breakthroughs, things that finally worked
  4. PROBLEMS     — what broke, what fixed it, root causes
  5. EMOTIONAL    — feelings, vulnerability, relationships
 No LLM required. Pure keyword/pattern heuristics.
 No external dependencies on palace.py, dialect.py, or layers.py.
 Usage:
    from general_extractor import extract_memories
    chunks = extract_memories(text)
    # [{"content": "...", "memory_type": "decision", "chunk_index": 0}, ...]
 """
 import re
 from typing import List, Dict, Tuple
 # =============================================================================
 # MARKER SETS — One per memory type
 # =============================================================================
 DECISION_MARKERS = [
    r"\blet'?s (use|go with|try|pick|choose|switch to)\b",
    r"\bwe (should|decided|chose|went with|picked|settled on)\b",
    r"\bi'?m going (to|with)\b",
    r"\bbetter (to|than|approach|option|choice)\b",
    r"\binstead of\b",
    r"\brather than\b",
    r"\bthe reason (is|was|being)\b",
    r"\bbecause\b",
    r"\btrade-?off\b",
    r"\bpros and cons\b",
    r"\bover\b.*\bbecause\b",
    r"\barchitecture\b",
    r"\bapproach\b",
    r"\bstrategy\b",
    r"\bpattern\b",
    r"\bstack\b",
    r"\bframework\b",
    r"\binfrastructure\b",
    r"\bset (it |this )?to\b",
    r"\bconfigure\b",
    r"\bdefault\b",
 ]
 PREFERENCE_MARKERS = [
    r"\bi prefer\b",
    r"\balways use\b",
    r"\bnever use\b",
    r"\bdon'?t (ever |like to )?(use|do|mock|stub|import)\b",
    r"\bi like (to|when|how)\b",
    r"\bi hate (when|how|it when)\b",
    r"\bplease (always|never|don'?t)\b",
    r"\bmy (rule|preference|style|convention) is\b",
    r"\bwe (always|never)\b",
    r"\bfunctional\b.*\bstyle\b",
    r"\bimperative\b",
    r"\bsnake_?case\b",
    r"\bcamel_?case\b",
    r"\btabs\b.*\bspaces\b",
    r"\bspaces\b.*\btabs\b",
    r"\buse\b.*\binstead of\b",
 ]
 MILESTONE_MARKERS = [
    r"\bit works\b",
    r"\bit worked\b",
    r"\bgot it working\b",
    r"\bfixed\b",
    r"\bsolved\b",
    r"\bbreakthrough\b",
    r"\bfigured (it )?out\b",
    r"\bnailed it\b",
    r"\bcracked (it|the)\b",
    r"\bfinally\b",
    r"\bfirst time\b",
    r"\bfirst ever\b",
    r"\bnever (done|been|had) before\b",
    r"\bdiscovered\b",
    r"\brealized\b",
    r"\bfound (out|that)\b",
    r"\bturns out\b",
    r"\bthe key (is|was|insight)\b",
    r"\bthe trick (is|was)\b",
    r"\bnow i (understand|see|get it)\b",
    r"\bbuilt\b",
    r"\bcreated\b",
    r"\bimplemented\b",
    r"\bshipped\b",
    r"\blaunched\b",
    r"\bdeployed\b",
    r"\breleased\b",
    r"\bprototype\b",
    r"\bproof of concept\b",
    r"\bdemo\b",
    r"\bversion \d",
    r"\bv\d+\.\d+",
    r"\d+x (compression|faster|slower|better|improvement|reduction)",
    r"\d+% (reduction|improvement|faster|better|smaller)",
 ]
 PROBLEM_MARKERS = [
    r"\b(bug|error|crash|fail|broke|broken|issue|problem)\b",
    r"\bdoesn'?t work\b",
    r"\bnot working\b",
    r"\bwon'?t\b.*\bwork\b",
    r"\bkeeps? (failing|crashing|breaking|erroring)\b",
    r"\broot cause\b",
    r"\bthe (problem|issue|bug) (is|was)\b",
    r"\bturns out\b.*\b(was|because|due to)\b",
    r"\bthe fix (is|was)\b",
    r"\bworkaround\b",
    r"\bthat'?s why\b",
    r"\bthe reason it\b",
    r"\bfixed (it |the |by )\b",
    r"\bsolution (is|was)\b",
    r"\bresolved\b",
    r"\bpatched\b",
    r"\bthe answer (is|was)\b",
    r"\b(had|need) to\b.*\binstead\b",
 ]
 EMOTION_MARKERS = [
    r"\blove\b",
    r"\bscared\b",
    r"\bafraid\b",
    r"\bproud\b",
    r"\bhurt\b",
    r"\bhappy\b",
    r"\bsad\b",
    r"\bcry\b",
    r"\bcrying\b",
    r"\bmiss\b",
    r"\bsorry\b",
    r"\bgrateful\b",
    r"\bangry\b",
    r"\bworried\b",
    r"\blonely\b",
    r"\bbeautiful\b",
    r"\bamazing\b",
    r"\bwonderful\b",
    r"i feel",
    r"i'm scared",
    r"i love you",
    r"i'm sorry",
    r"i can't",
    r"i wish",
    r"i miss",
    r"i need",
    r"never told anyone",
    r"nobody knows",
    r"\*[^*]+\*",
 ]
 ALL_MARKERS = {
    "decision": DECISION_MARKERS,
    "preference": PREFERENCE_MARKERS,
    "milestone": MILESTONE_MARKERS,
    "problem": PROBLEM_MARKERS,
    "emotional": EMOTION_MARKERS,
 }
 # =============================================================================
 # SENTIMENT — for disambiguation
 # =============================================================================
 POSITIVE_WORDS = {
    "pride",
    "proud",
    "joy",
    "happy",
    "love",
    "loving",
    "beautiful",
    "amazing",
    "wonderful",
    "incredible",
    "fantastic",
    "brilliant",
    "perfect",
    "excited",
    "thrilled",
    "grateful",
    "warm",
    "breakthrough",
    "success",
    "works",
    "working",
    "solved",
    "fixed",
    "nailed",
    "heart",
    "hug",
    "precious",
    "adore",
 }
 NEGATIVE_WORDS = {
    "bug",
    "error",
    "crash",
    "crashing",
    "crashed",
    "fail",
    "failed",
    "failing",
    "failure",
    "broken",
    "broke",
    "breaking",
    "breaks",
    "issue",
    "problem",
    "wrong",
    "stuck",
    "blocked",
    "unable",
    "impossible",
    "missing",
    "terrible",
    "horrible",
    "awful",
    "worse",
    "worst",
    "panic",
    "disaster",
    "mess",
 }
 def _get_sentiment(text: str) -> str:
    """Quick sentiment: 'positive', 'negative', or 'neutral'."""
    words = set(w.lower() for w in re.findall(r"\b\w+\b", text))
    pos = len(words & POSITIVE_WORDS)
    neg = len(words & NEGATIVE_WORDS)
    if pos > neg:
        return "positive"
    elif neg > pos:
        return "negative"
    return "neutral"
 def _has_resolution(text: str) -> bool:
    """Check if text describes a RESOLVED problem."""
    text_lower = text.lower()
    patterns = [
        r"\bfixed\b",
        r"\bsolved\b",
        r"\bresolved\b",
        r"\bpatched\b",
        r"\bgot it working\b",
        r"\bit works\b",
        r"\bnailed it\b",
        r"\bfigured (it )?out\b",
        r"\bthe (fix|answer|solution)\b",
    ]
    return any(re.search(p, text_lower) for p in patterns)
 def _disambiguate(memory_type: str, text: str, scores: Dict[str, float]) -> str:
    """Fix misclassifications using sentiment + resolution."""
    sentiment = _get_sentiment(text)
    # Resolved problems are milestones (or emotional, if positive with emotional markers)
    if memory_type == "problem" and _has_resolution(text):
        if scores.get("emotional", 0) > 0 and sentiment == "positive":
            return "emotional"
        return "milestone"
    # Problem + positive sentiment => milestone or emotional
    if memory_type == "problem" and sentiment == "positive":
        if scores.get("milestone", 0) > 0:
            return "milestone"
        if scores.get("emotional", 0) > 0:
            return "emotional"
    return memory_type
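 # Example (illustrative sketch): a problem report that also describes its fix
 # is reclassified as a milestone.
 #     text = "The bug was a stale cache. Fixed it by clearing the cache on startup."
 #     _disambiguate("problem", text, {"problem": 2.0})   # "milestone"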
 # =============================================================================
 # CODE LINE FILTERING
 # =============================================================================
 _CODE_LINE_PATTERNS = [
    re.compile(r"^\s*[\$#]\s"),
    re.compile(
        r"^\s*(cd|source|echo|export|pip|npm|git|python|bash|curl|wget|mkdir|rm|cp|mv|ls|cat|grep|find|chmod|sudo|brew|docker)\s"
    ),
    re.compile(r"^\s*```"),
    re.compile(r"^\s*(import|from|def|class|function|const|let|var|return)\s"),
    re.compile(r"^\s*[A-Z_]{2,}="),
    re.compile(r"^\s*\|"),
    re.compile(r"^\s*[-]{2,}"),
    re.compile(r"^\s*[{}\[\]]\s*$"),
    # "else:" is matched without \b, which can never succeed after a trailing colon
    re.compile(r"^\s*(?:(?:if|for|while|try|except|elif)\b|else:)"),
    re.compile(r"^\s*\w+\.\w+\("),
    re.compile(r"^\s*\w+ = \w+\.\w+"),
 ]
 def _is_code_line(line: str) -> bool:
    stripped = line.strip()
    if not stripped:
        return False
    for pattern in _CODE_LINE_PATTERNS:
        if pattern.match(stripped):
            return True
    alpha_ratio = sum(1 for c in stripped if c.isalpha()) / max(len(stripped), 1)
    if alpha_ratio < 0.4 and len(stripped) > 10:
        return True
    return False
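 # Example (illustrative sketch):
 #     _is_code_line("$ pip install requests")                 # True, shell prompt
 #     _is_code_line("import re")                              # True, code keyword
 #     _is_code_line("We chose SQLite because it is simple.")  # False, prose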
 def _extract_prose(text: str) -> str:
    """Extract only prose lines (skip code) for classification scoring."""
    lines = text.split("\n")
    prose = []
    in_code = False
    for line in lines:
        if line.strip().startswith("```"):
            in_code = not in_code
            continue
        if in_code:
            continue
        if not _is_code_line(line):
            prose.append(line)
    result = "\n".join(prose).strip()
    return result if result else text
 # =============================================================================
 # SCORING
 # =============================================================================
 def _score_markers(text: str, markers: List[str]) -> Tuple[float, List[str]]:
    """Score text against regex markers. Returns (score, matched_keywords)."""
    text_lower = text.lower()
    score = 0.0
    keywords = []
    for marker in markers:
        # Use the full matched text; findall() would return only group contents
        matches = [m.group(0) for m in re.finditer(marker, text_lower)]
        if matches:
            score += len(matches)
            keywords.extend(matches)
    return score, list(set(keywords))
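 # Example (illustrative sketch): every regex hit adds 1.0 to the score.
 #     score, _ = _score_markers("We went with SQLite because it is simple", DECISION_MARKERS)
 #     # score == 2.0: one hit for "we went with", one for "because"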
 # =============================================================================
 # MAIN EXTRACTION
 # =============================================================================
 def extract_memories(text: str, min_confidence: float = 0.3) -> List[Dict]:
    """
    Extract memories from a text string.
    Args:
        text: The text to extract from (any format).
        min_confidence: Minimum confidence threshold (0.0-1.0).
    Returns:
        List of dicts: {"content": str, "memory_type": str, "chunk_index": int}
    """
    # Split into paragraphs (double newline or speaker-turn boundaries)
    paragraphs = _split_into_segments(text)
    memories = []
    for para in paragraphs:
        if len(para.strip()) < 20:
            continue
        prose = _extract_prose(para)
        # Score against all types
        scores = {}
        for mem_type, markers in ALL_MARKERS.items():
            score, _ = _score_markers(prose, markers)
            if score > 0:
                scores[mem_type] = score
        if not scores:
            continue
        # Length bonus
        if len(para) > 500:
            length_bonus = 2
        elif len(para) > 200:
            length_bonus = 1
        else:
            length_bonus = 0
        max_type = max(scores, key=scores.get)
        max_score = scores[max_type] + length_bonus
        # Disambiguate
        max_type = _disambiguate(max_type, prose, scores)
        # Confidence: winning type's marker score plus length bonus, scaled so
        # that 5 or more points saturate at 1.0
        confidence = min(1.0, max_score / 5.0)
        if confidence < min_confidence:
            continue
        memories.append(
            {
                "content": para.strip(),
                "memory_type": max_type,
                "chunk_index": len(memories),
            }
        )
    return memories
 def _split_into_segments(text: str) -> List[str]:
    """
    Split text into segments suitable for memory extraction.
    Tries speaker-turn splitting first (> markers, "Human:", "Assistant:", etc.),
    then falls back to paragraph splitting.
    """
    lines = text.split("\n")
    # Check for speaker-turn markers
    turn_patterns = [
        re.compile(r"^>\s"),  # > quoted user turn
        re.compile(r"^(Human|User|Q)\s*:", re.I),  # Human: / User:
        re.compile(r"^(Assistant|AI|A|Claude|ChatGPT)\s*:", re.I),
    ]
    turn_count = 0
    for line in lines:
        stripped = line.strip()
        for pat in turn_patterns:
            if pat.match(stripped):
                turn_count += 1
                break
    # If enough turn markers, split by turns
    if turn_count >= 3:
        return _split_by_turns(lines, turn_patterns)
    # Fallback: paragraph splitting
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # If single giant block, chunk by line groups
    if len(paragraphs) <= 1 and len(lines) > 20:
        segments = []
        for i in range(0, len(lines), 25):
            group = "\n".join(lines[i : i + 25]).strip()
            if group:
                segments.append(group)
        return segments
    return paragraphs
 def _split_by_turns(lines: List[str], turn_patterns: List[re.Pattern]) -> List[str]:
    """Split lines into segments at each speaker turn boundary."""
    segments = []
    current = []
    for line in lines:
        stripped = line.strip()
        is_turn = any(pat.match(stripped) for pat in turn_patterns)
        if is_turn and current:
            segments.append("\n".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        segments.append("\n".join(current))
    return segments
 # =============================================================================
 # CLI
 # =============================================================================
 if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print("Usage: python general_extractor.py <file>")
        print()
        print("Extracts decisions, preferences, milestones, problems, and")
        print("emotional moments from any text file.")
        sys.exit(1)
    filepath = sys.argv[1]
    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        text = f.read()
    memories = extract_memories(text)
    # Summary
    from collections import Counter
    type_counts = Counter(m["memory_type"] for m in memories)
    print(f"Extracted {len(memories)} memories:")
    for mtype in ["decision", "preference", "milestone", "problem", "emotional"]:
        count = type_counts.get(mtype, 0)
        if count:
            print(f"  {mtype:12} {count}")
    print()
    for m in memories[:10]:
        preview = m["content"][:80].replace("\n", " ")
        print(f"  [{m['memory_type']:10}] {preview}...")
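A minimal sketch of the scoring arithmetic used by extract_memories above, assuming a hypothetical two-entry marker table. DEMO_MARKERS and demo_confidence are illustrative names only; the real ALL_MARKERS and _disambiguate are defined elsewhere in the file and are not reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical marker table for illustration; not the project's ALL_MARKERS.
DEMO_MARKERS: Dict[str, List[str]] = {
    "decision": [r"\bwe decided\b", r"\bwent with\b"],
    "problem": [r"\bbug\b", r"\bcrashed\b"],
}


def demo_confidence(paragraph: str) -> Dict[str, object]:
    """Score one paragraph: marker hits plus a length bonus, capped at 1.0."""
    text = paragraph.lower()
    scores = {}
    for mem_type, markers in DEMO_MARKERS.items():
        hits = sum(len(re.findall(marker, text)) for marker in markers)
        if hits:
            scores[mem_type] = hits
    if not scores:
        return {"memory_type": None, "confidence": 0.0}

    best = max(scores, key=scores.get)
    # Same thresholds as extract_memories: +2 above 500 chars, +1 above 200.
    if len(paragraph) > 500:
        length_bonus = 2
    elif len(paragraph) > 200:
        length_bonus = 1
    else:
        length_bonus = 0
    confidence = min(1.0, (scores[best] + length_bonus) / 5.0)
    return {"memory_type": best, "confidence": confidence}
```

With the default min_confidence of 0.3, a short paragraph with a single marker hit scores 1/5 = 0.2 and is dropped; it takes two hits, or one hit plus a length bonus, to be kept.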
@@ -0,0 +1,350 @@
 """
 knowledge_graph.py — Temporal Entity-Relationship Graph for MemPalace
 =====================================================================
 Real knowledge graph with:
  - Entity nodes (people, projects, tools, concepts)
  - Typed relationship edges (daughter_of, does, loves, works_on, etc.)
  - Temporal validity (valid_from → valid_to — knows WHEN facts are true)
  - Closet references (links back to the verbatim memory)
 Storage: SQLite (local, no dependencies, no subscriptions)
 Query: entity-first traversal with time filtering
 This is what competes with Zep's temporal knowledge graph.
 Zep uses Neo4j in the cloud ($25/mo+). We use SQLite locally (free).
 Usage:
    from mempalace.knowledge_graph import KnowledgeGraph
    kg = KnowledgeGraph()
    kg.add_triple("Max", "child_of", "Alice", valid_from="2015-04-01")
    kg.add_triple("Max", "does", "swimming", valid_from="2025-01-01")
    kg.add_triple("Max", "loves", "chess", valid_from="2025-10-01")
    # Query: everything about Max
    kg.query_entity("Max")
    # Query: what was true about Max in January 2026?
    kg.query_entity("Max", as_of="2026-01-15")
    # Query: who is connected to Alice?
    kg.query_entity("Alice", direction="both")
    # Invalidate: Max's sports injury resolved
    kg.invalidate("Max", "has_issue", "sports_injury", ended="2026-02-15")
 """
 import hashlib
 import json
 import os
 import sqlite3
 from datetime import date, datetime
 from pathlib import Path
 DEFAULT_KG_PATH = os.path.expanduser("~/.mempalace/knowledge_graph.sqlite3")
 class KnowledgeGraph:
    def __init__(self, db_path: str = None):
        self.db_path = db_path or DEFAULT_KG_PATH
        Path(self.db_path).parent.mkdir(parents=True, exist_ok=True)
        self._init_db()
    def _init_db(self):
        conn = self._conn()
        conn.executescript("""
            CREATE TABLE IF NOT EXISTS entities (
                id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                type TEXT DEFAULT 'unknown',
                properties TEXT DEFAULT '{}',
                created_at TEXT DEFAULT CURRENT_TIMESTAMP
            );
            CREATE TABLE IF NOT EXISTS triples (
                id TEXT PRIMARY KEY,
                subject TEXT NOT NULL,
                predicate TEXT NOT NULL,
                object TEXT NOT NULL,
                valid_from TEXT,
                valid_to TEXT,
                confidence REAL DEFAULT 1.0,
                source_closet TEXT,
                source_file TEXT,
                extracted_at TEXT DEFAULT CURRENT_TIMESTAMP,
                FOREIGN KEY (subject) REFERENCES entities(id),
                FOREIGN KEY (object) REFERENCES entities(id)
            );
            CREATE INDEX IF NOT EXISTS idx_triples_subject ON triples(subject);
            CREATE INDEX IF NOT EXISTS idx_triples_object ON triples(object);
            CREATE INDEX IF NOT EXISTS idx_triples_predicate ON triples(predicate);
            CREATE INDEX IF NOT EXISTS idx_triples_valid ON triples(valid_from, valid_to);
        """)
        conn.commit()
        conn.close()
    def _conn(self):
        return sqlite3.connect(self.db_path, timeout=10)
    def _entity_id(self, name: str) -> str:
        return name.lower().replace(" ", "_").replace("'", "")
    # ── Write operations ──────────────────────────────────────────────────
    def add_entity(self, name: str, entity_type: str = "unknown", properties: dict = None):
        """Add or update an entity node."""
        eid = self._entity_id(name)
        props = json.dumps(properties or {})
        conn = self._conn()
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, name, type, properties) VALUES (?, ?, ?, ?)",
            (eid, name, entity_type, props)
        )
        conn.commit()
        conn.close()
        return eid
    def add_triple(self, subject: str, predicate: str, obj: str,
                   valid_from: str = None, valid_to: str = None,
                   confidence: float = 1.0, source_closet: str = None,
                   source_file: str = None):
        """
        Add a relationship triple: subject → predicate → object.
        Examples:
            add_triple("Max", "child_of", "Alice", valid_from="2015-04-01")
            add_triple("Max", "does", "swimming", valid_from="2025-01-01")
            add_triple("Alice", "worried_about", "Max injury", valid_from="2026-01", valid_to="2026-02")
        """
        sub_id = self._entity_id(subject)
        obj_id = self._entity_id(obj)
        pred = predicate.lower().replace(" ", "_")
        # Auto-create entities if they don't exist
        conn = self._conn()
        conn.execute(
            "INSERT OR IGNORE INTO entities (id, name) VALUES (?, ?)",
            (sub_id, subject)
        )
        conn.execute(
            "INSERT OR IGNORE INTO entities (id, name) VALUES (?, ?)",
            (obj_id, obj)
        )
        # Check for existing identical triple
        existing = conn.execute(
            "SELECT id FROM triples WHERE subject=? AND predicate=? AND object=? AND valid_to IS NULL",
            (sub_id, pred, obj_id)
        ).fetchone()
        if existing:
            conn.close()
            return existing[0]  # Already exists and still valid
        triple_id = f"t_{sub_id}_{pred}_{obj_id}_{hashlib.md5(f'{valid_from}{datetime.now().isoformat()}'.encode()).hexdigest()[:8]}"
        conn.execute(
            """INSERT INTO triples (id, subject, predicate, object, valid_from, valid_to, confidence, source_closet, source_file)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (triple_id, sub_id, pred, obj_id, valid_from, valid_to, confidence, source_closet, source_file)
        )
        conn.commit()
        conn.close()
        return triple_id
    def invalidate(self, subject: str, predicate: str, obj: str, ended: str = None):
        """Mark a relationship as no longer valid (set valid_to date)."""
        sub_id = self._entity_id(subject)
        obj_id = self._entity_id(obj)
        pred = predicate.lower().replace(" ", "_")
        ended = ended or date.today().isoformat()
        conn = self._conn()
        conn.execute(
            "UPDATE triples SET valid_to=? WHERE subject=? AND predicate=? AND object=? AND valid_to IS NULL",
            (ended, sub_id, pred, obj_id)
        )
        conn.commit()
        conn.close()
    # ── Query operations ──────────────────────────────────────────────────
    def query_entity(self, name: str, as_of: str = None, direction: str = "outgoing"):
        """
        Get all relationships for an entity.
        direction: "outgoing" (entity → ?), "incoming" (? → entity), "both"
        as_of: date string — only return facts valid at that time
        """
        eid = self._entity_id(name)
        conn = self._conn()
        results = []
        if direction in ("outgoing", "both"):
            query = "SELECT t.*, e.name as obj_name FROM triples t JOIN entities e ON t.object = e.id WHERE t.subject = ?"
            params = [eid]
            if as_of:
                query += " AND (t.valid_from IS NULL OR t.valid_from <= ?) AND (t.valid_to IS NULL OR t.valid_to >= ?)"
                params.extend([as_of, as_of])
            for row in conn.execute(query, params).fetchall():
                results.append({
                    "direction": "outgoing",
                    "subject": name,
                    "predicate": row[2],
                    "object": row[10],  # obj_name
                    "valid_from": row[4],
                    "valid_to": row[5],
                    "confidence": row[6],
                    "source_closet": row[7],
                    "current": row[5] is None,
                })
        if direction in ("incoming", "both"):
            query = "SELECT t.*, e.name as sub_name FROM triples t JOIN entities e ON t.subject = e.id WHERE t.object = ?"
            params = [eid]
            if as_of:
                query += " AND (t.valid_from IS NULL OR t.valid_from <= ?) AND (t.valid_to IS NULL OR t.valid_to >= ?)"
                params.extend([as_of, as_of])
            for row in conn.execute(query, params).fetchall():
                results.append({
                    "direction": "incoming",
                    "subject": row[10],  # sub_name
                    "predicate": row[2],
                    "object": name,
                    "valid_from": row[4],
                    "valid_to": row[5],
                    "confidence": row[6],
                    "source_closet": row[7],
                    "current": row[5] is None,
                })
        conn.close()
        return results
    def query_relationship(self, predicate: str, as_of: str = None):
        """Get all triples with a given relationship type."""
        pred = predicate.lower().replace(" ", "_")
        conn = self._conn()
        query = """
            SELECT t.*, s.name as sub_name, o.name as obj_name
            FROM triples t
            JOIN entities s ON t.subject = s.id
            JOIN entities o ON t.object = o.id
            WHERE t.predicate = ?
        """
        params = [pred]
        if as_of:
            query += " AND (t.valid_from IS NULL OR t.valid_from <= ?) AND (t.valid_to IS NULL OR t.valid_to >= ?)"
            params.extend([as_of, as_of])
        results = []
        for row in conn.execute(query, params).fetchall():
            results.append({
                "subject": row[10],
                "predicate": pred,
                "object": row[11],
                "valid_from": row[4],
                "valid_to": row[5],
                "current": row[5] is None,
            })
        conn.close()
        return results
    def timeline(self, entity_name: str = None):
        """Get all facts in chronological order, optionally filtered by entity."""
        conn = self._conn()
        if entity_name:
            eid = self._entity_id(entity_name)
            rows = conn.execute("""
                SELECT t.*, s.name as sub_name, o.name as obj_name
                FROM triples t
                JOIN entities s ON t.subject = s.id
                JOIN entities o ON t.object = o.id
                WHERE (t.subject = ? OR t.object = ?)
                ORDER BY t.valid_from IS NULL, t.valid_from ASC
            """, (eid, eid)).fetchall()
        else:
            rows = conn.execute("""
                SELECT t.*, s.name as sub_name, o.name as obj_name
                FROM triples t
                JOIN entities s ON t.subject = s.id
                JOIN entities o ON t.object = o.id
                ORDER BY t.valid_from IS NULL, t.valid_from ASC
                LIMIT 100
            """).fetchall()
        conn.close()
        return [{
            "subject": r[10],
            "predicate": r[2],
            "object": r[11],
            "valid_from": r[4],
            "valid_to": r[5],
            "current": r[5] is None,
        } for r in rows]
    # ── Stats ─────────────────────────────────────────────────────────────
    def stats(self):
        conn = self._conn()
        entities = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
        triples = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
        current = conn.execute("SELECT COUNT(*) FROM triples WHERE valid_to IS NULL").fetchone()[0]
        expired = triples - current
        predicates = [r[0] for r in conn.execute(
            "SELECT DISTINCT predicate FROM triples ORDER BY predicate"
        ).fetchall()]
        conn.close()
        return {
            "entities": entities,
            "triples": triples,
            "current_facts": current,
            "expired_facts": expired,
            "relationship_types": predicates,
        }
    # ── Seed from known facts ─────────────────────────────────────────────
    def seed_from_entity_facts(self, entity_facts: dict):
        """
        Seed the knowledge graph from fact_checker.py ENTITY_FACTS.
        This bootstraps the graph with known ground truth.
        """
        for key, facts in entity_facts.items():
            name = facts.get("full_name", key.capitalize())
            etype = facts.get("type", "person")
            self.add_entity(name, etype, {
                "gender": facts.get("gender", ""),
                "birthday": facts.get("birthday", ""),
            })
            # Relationships
            parent = facts.get("parent")
            if parent:
                self.add_triple(name, "child_of", parent.capitalize(),
                                valid_from=facts.get("birthday"))
            partner = facts.get("partner")
            if partner:
                self.add_triple(name, "married_to", partner.capitalize())
            relationship = facts.get("relationship", "")
            if relationship == "daughter":
                self.add_triple(name, "is_child_of", facts.get("parent", "").capitalize() or name,
                                valid_from=facts.get("birthday"))
            elif relationship == "husband":
                self.add_triple(name, "is_partner_of", facts.get("partner", name).capitalize())
            elif relationship == "brother":
                self.add_triple(name, "is_sibling_of", facts.get("sibling", name).capitalize())
            elif relationship == "dog":
                self.add_triple(name, "is_pet_of", facts.get("owner", name).capitalize())
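                # Re-register as an animal. add_entity replaces the whole row,
                # so pass the properties again or they would be reset to {}.
                self.add_entity(name, "animal", {
                    "gender": facts.get("gender", ""),
                    "birthday": facts.get("birthday", ""),
                })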
            # Interests
            for interest in facts.get("interests", []):
                self.add_triple(name, "loves", interest.capitalize(),
                                valid_from="2025-01-01")
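A minimal sketch of the as_of filter that query_entity and query_relationship apply, run against an in-memory SQLite table. The trimmed-down schema and the facts_as_of helper are assumptions made for illustration, not part of KnowledgeGraph. It relies on ISO-8601 date strings comparing correctly as plain text.

```python
import sqlite3

# In-memory database with a reduced version of the triples table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    "subject TEXT, predicate TEXT, object TEXT, valid_from TEXT, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?, ?)",
    [
        ("max", "does", "swimming", "2025-01-01", None),  # still valid
        ("max", "has_issue", "sports_injury", "2026-01-10", "2026-02-15"),  # ended
    ],
)


def facts_as_of(subject: str, as_of: str):
    """Return (predicate, object) pairs that were valid on the given date."""
    query = (
        "SELECT predicate, object FROM triples WHERE subject = ? "
        "AND (valid_from IS NULL OR valid_from <= ?) "
        "AND (valid_to IS NULL OR valid_to >= ?) "
        "ORDER BY predicate"
    )
    return conn.execute(query, (subject, as_of, as_of)).fetchall()
```

Both bounds are inclusive, so a fact still counts as valid on its valid_to date. A NULL valid_from is treated as "always true so far" and a NULL valid_to as "still true".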
@@ -0,0 +1,506 @@
 #!/usr/bin/env python3
 """
 layers.py — 4-Layer Memory Stack for mempalace
 ===================================================
 Load only what you need, when you need it.
    Layer 0: Identity       (~100 tokens)   — Always loaded. "Who am I?"
    Layer 1: Essential Story (~500-800)      — Always loaded. Top moments from the palace.
    Layer 2: On-Demand      (~200-500 each)  — Loaded when a topic/wing comes up.
    Layer 3: Deep Search    (unlimited)      — Full ChromaDB semantic search.
 Wake-up cost: ~600-900 tokens (L0+L1). Leaves 95%+ of context free.
 Reads directly from ChromaDB (mempalace_drawers)
 and ~/.mempalace/identity.txt.
 """
 import os
 import sys
 from pathlib import Path
 from collections import defaultdict
 import chromadb
 from .config import MempalaceConfig
 # ---------------------------------------------------------------------------
 # Layer 0 — Identity
 # ---------------------------------------------------------------------------
 class Layer0:
    """
    ~100 tokens. Always loaded.
    Reads from ~/.mempalace/identity.txt — a plain-text file the user writes.
    Example identity.txt:
        I am Atlas, a personal AI assistant for Alice.
        Traits: warm, direct, remembers everything.
        People: Alice (creator), Bob (Alice's partner).
        Project: A journaling app that helps people process emotions.
    """
    def __init__(self, identity_path: str = None):
        if identity_path is None:
            identity_path = os.path.expanduser("~/.mempalace/identity.txt")
        self.path = identity_path
        self._text = None
    def render(self) -> str:
        """Return the identity text, or a sensible default."""
        if self._text is not None:
            return self._text
        if os.path.exists(self.path):
            with open(self.path, "r", encoding="utf-8") as f:
                self._text = f.read().strip()
        else:
            self._text = (
                "## L0 — IDENTITY\nNo identity configured. Create ~/.mempalace/identity.txt"
            )
        return self._text
    def token_estimate(self) -> int:
        return len(self.render()) // 4
 # ---------------------------------------------------------------------------
 # Layer 1 — Essential Story (auto-generated from palace)
 # ---------------------------------------------------------------------------
 class Layer1:
    """
    ~500-800 tokens. Always loaded.
    Auto-generated from the highest-weight drawers in the palace.
    Groups by room, picks the top N moments, compresses to a compact summary.
    """
    MAX_DRAWERS = 15  # at most 15 moments in wake-up
    MAX_CHARS = 3200  # hard cap on total L1 text (~800 tokens)
    def __init__(self, palace_path: str = None, wing: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path
        self.wing = wing
    def generate(self) -> str:
        """Pull top drawers from ChromaDB and format as compact L1 text."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "## L1 — No palace found. Run: mempalace mine <dir>"
        # Fetch all drawers (with optional wing filter)
        kwargs = {"include": ["documents", "metadatas"]}
        if self.wing:
            kwargs["where"] = {"wing": self.wing}
        try:
            results = col.get(**kwargs)
        except Exception:
            return "## L1 — No drawers found."
        docs = results.get("documents", [])
        metas = results.get("metadatas", [])
        if not docs:
            return "## L1 — No memories yet."
        # Score each drawer by importance (defaults to 3 when no weight
        # metadata is present); recency is not part of the score
        scored = []
        for doc, meta in zip(docs, metas):
            importance = 3
            # Try multiple metadata keys that might carry weight info
            for key in ("importance", "emotional_weight", "weight"):
                val = meta.get(key)
                if val is not None:
                    try:
                        importance = float(val)
                    except (ValueError, TypeError):
                        pass
                    break
            scored.append((importance, meta, doc))
        # Sort by importance descending, take top N
        scored.sort(key=lambda x: x[0], reverse=True)
        top = scored[: self.MAX_DRAWERS]
        # Group by room for readability
        by_room = defaultdict(list)
        for imp, meta, doc in top:
            room = meta.get("room", "general")
            by_room[room].append((imp, meta, doc))
        # Build compact text
        lines = ["## L1 — ESSENTIAL STORY"]
        total_len = 0
        for room, entries in sorted(by_room.items()):
            room_line = f"\n[{room}]"
            lines.append(room_line)
            total_len += len(room_line)
            for imp, meta, doc in entries:
                source = Path(meta.get("source_file", "")).name if meta.get("source_file") else ""
                # Truncate doc to keep L1 compact
                snippet = doc.strip().replace("\n", " ")
                if len(snippet) > 200:
                    snippet = snippet[:197] + "..."
                entry_line = f"  - {snippet}"
                if source:
                    entry_line += f"  ({source})"
                if total_len + len(entry_line) > self.MAX_CHARS:
                    lines.append("  ... (more in L3 search)")
                    return "\n".join(lines)
                lines.append(entry_line)
                total_len += len(entry_line)
        return "\n".join(lines)
 # ---------------------------------------------------------------------------
 # Layer 2 — On-Demand (wing/room filtered retrieval)
 # ---------------------------------------------------------------------------
 class Layer2:
    """
    ~200-500 tokens per retrieval.
    Loaded when a specific topic or wing comes up in conversation.
    Queries ChromaDB with a wing/room filter.
    """
    def __init__(self, palace_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path
    def retrieve(self, wing: str = None, room: str = None, n_results: int = 10) -> str:
        """Retrieve drawers filtered by wing and/or room."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "No palace found."
        where = {}
        if wing and room:
            where = {"$and": [{"wing": wing}, {"room": room}]}
        elif wing:
            where = {"wing": wing}
        elif room:
            where = {"room": room}
        kwargs = {"include": ["documents", "metadatas"], "limit": n_results}
        if where:
            kwargs["where"] = where
        try:
            results = col.get(**kwargs)
        except Exception as e:
            return f"Retrieval error: {e}"
        docs = results.get("documents", [])
        metas = results.get("metadatas", [])
        if not docs:
            label = f"wing={wing}" if wing else ""
            if room:
                label += f" room={room}" if label else f"room={room}"
            return f"No drawers found for {label}." if label else "No drawers found."
        lines = [f"## L2 — ON-DEMAND ({len(docs)} drawers)"]
        for doc, meta in zip(docs[:n_results], metas[:n_results]):
            room_name = meta.get("room", "?")
            source = Path(meta.get("source_file", "")).name if meta.get("source_file") else ""
            snippet = doc.strip().replace("\n", " ")
            if len(snippet) > 300:
                snippet = snippet[:297] + "..."
            entry = f"  [{room_name}] {snippet}"
            if source:
                entry += f"  ({source})"
            lines.append(entry)
        return "\n".join(lines)
 # ---------------------------------------------------------------------------
 # Layer 3 — Deep Search (full semantic search via ChromaDB)
 # ---------------------------------------------------------------------------
 class Layer3:
    """
    Unlimited depth. Semantic search against the full palace.
    Reuses searcher.py logic against mempalace_drawers.
    """
    def __init__(self, palace_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path
    def search(self, query: str, wing: str = None, room: str = None, n_results: int = 5) -> str:
        """Semantic search, returns compact result text."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return "No palace found."
        where = {}
        if wing and room:
            where = {"$and": [{"wing": wing}, {"room": room}]}
        elif wing:
            where = {"wing": wing}
        elif room:
            where = {"room": room}
        kwargs = {
            "query_texts": [query],
            "n_results": n_results,
            "include": ["documents", "metadatas", "distances"],
        }
        if where:
            kwargs["where"] = where
        try:
            results = col.query(**kwargs)
        except Exception as e:
            return f"Search error: {e}"
        docs = results["documents"][0]
        metas = results["metadatas"][0]
        dists = results["distances"][0]
        if not docs:
            return "No results found."
        lines = [f'## L3 — SEARCH RESULTS for "{query}"']
        for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
            similarity = round(1 - dist, 3)
            wing_name = meta.get("wing", "?")
            room_name = meta.get("room", "?")
            source = Path(meta.get("source_file", "")).name if meta.get("source_file") else ""
            snippet = doc.strip().replace("\n", " ")
            if len(snippet) > 300:
                snippet = snippet[:297] + "..."
            lines.append(f"  [{i}] {wing_name}/{room_name} (sim={similarity})")
            lines.append(f"      {snippet}")
            if source:
                lines.append(f"      src: {source}")
        return "\n".join(lines)
    def search_raw(
        self, query: str, wing: str = None, room: str = None, n_results: int = 5
    ) -> list:
        """Return raw dicts instead of formatted text."""
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
        except Exception:
            return []
        where = {}
        if wing and room:
            where = {"$and": [{"wing": wing}, {"room": room}]}
        elif wing:
            where = {"wing": wing}
        elif room:
            where = {"room": room}
        kwargs = {
            "query_texts": [query],
            "n_results": n_results,
            "include": ["documents", "metadatas", "distances"],
        }
        if where:
            kwargs["where"] = where
        try:
            results = col.query(**kwargs)
        except Exception:
            return []
        hits = []
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        ):
            hits.append(
                {
                    "text": doc,
                    "wing": meta.get("wing", "unknown"),
                    "room": meta.get("room", "unknown"),
                    "source_file": Path(meta.get("source_file", "?")).name,
                    "similarity": round(1 - dist, 3),
                    "metadata": meta,
                }
            )
        return hits
 # ---------------------------------------------------------------------------
 # MemoryStack — unified interface
 # ---------------------------------------------------------------------------
 class MemoryStack:
    """
    The full 4-layer stack. One class, one palace, everything works.
        stack = MemoryStack()
        print(stack.wake_up())                # L0 + L1 (~600-900 tokens)
        print(stack.recall(wing="my_app"))     # L2 on-demand
        print(stack.search("pricing change"))  # L3 deep search
    """
    def __init__(self, palace_path: str = None, identity_path: str = None):
        cfg = MempalaceConfig()
        self.palace_path = palace_path or cfg.palace_path
        self.identity_path = identity_path or os.path.expanduser("~/.mempalace/identity.txt")
        self.l0 = Layer0(self.identity_path)
        self.l1 = Layer1(self.palace_path)
        self.l2 = Layer2(self.palace_path)
        self.l3 = Layer3(self.palace_path)
    def wake_up(self, wing: str = None) -> str:
        """
        Generate wake-up text: L0 (identity) + L1 (essential story).
        Typically ~600-900 tokens. Inject into system prompt or first message.
        Args:
            wing: Optional wing filter for L1 (project-specific wake-up).
        """
        parts = []
        # L0: Identity
        parts.append(self.l0.render())
        parts.append("")
        # L1: Essential Story
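        # Set the filter on every call so a wing from an earlier wake-up
        # does not leak into a later call made without one.
        self.l1.wing = wing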
        parts.append(self.l1.generate())
        return "\n".join(parts)
    def recall(self, wing: str = None, room: str = None, n_results: int = 10) -> str:
        """On-demand L2 retrieval filtered by wing/room."""
        return self.l2.retrieve(wing=wing, room=room, n_results=n_results)
    def search(self, query: str, wing: str = None, room: str = None, n_results: int = 5) -> str:
        """Deep L3 semantic search."""
        return self.l3.search(query, wing=wing, room=room, n_results=n_results)
    def status(self) -> dict:
        """Status of all layers."""
        result = {
            "palace_path": self.palace_path,
            "L0_identity": {
                "path": self.identity_path,
                "exists": os.path.exists(self.identity_path),
                "tokens": self.l0.token_estimate(),
            },
            "L1_essential": {
                "description": "Auto-generated from top palace drawers",
            },
            "L2_on_demand": {
                "description": "Wing/room filtered retrieval",
            },
            "L3_deep_search": {
                "description": "Full semantic search via ChromaDB",
            },
        }
        # Count drawers
        try:
            client = chromadb.PersistentClient(path=self.palace_path)
            col = client.get_collection("mempalace_drawers")
            count = col.count()
            result["total_drawers"] = count
        except Exception:
            result["total_drawers"] = 0
        return result
 # ---------------------------------------------------------------------------
 # CLI (standalone)
 # ---------------------------------------------------------------------------
 if __name__ == "__main__":
    import json
    def usage():
        print("layers.py — 4-Layer Memory Stack")
        print()
        print("Usage:")
        print("  python layers.py wake-up              Show L0 + L1")
        print("  python layers.py wake-up --wing=NAME  Wake-up for a specific project")
        print("  python layers.py recall --wing=NAME   On-demand L2 retrieval")
        print("  python layers.py search <query>       Deep L3 search")
        print("  python layers.py status               Show layer status")
        sys.exit(0)
    if len(sys.argv) < 2:
        usage()
    cmd = sys.argv[1]
    # Parse flags
    flags = {}
    positional = []
    for arg in sys.argv[2:]:
        if arg.startswith("--") and "=" in arg:
            key, val = arg.split("=", 1)
            flags[key.lstrip("-")] = val
        elif not arg.startswith("--"):
            positional.append(arg)
    palace_path = flags.get("palace")
    stack = MemoryStack(palace_path=palace_path)
    if cmd in ("wake-up", "wakeup"):
        wing = flags.get("wing")
        text = stack.wake_up(wing=wing)
        tokens = len(text) // 4
        print(f"Wake-up text (~{tokens} tokens):")
        print("=" * 50)
        print(text)
    elif cmd == "recall":
        wing = flags.get("wing")
        room = flags.get("room")
        text = stack.recall(wing=wing, room=room)
        print(text)
    elif cmd == "search":
        query = " ".join(positional) if positional else ""
        if not query:
            print("Usage: python layers.py search <query>")
            sys.exit(1)
        wing = flags.get("wing")
        room = flags.get("room")
        text = stack.search(query, wing=wing, room=room)
        print(text)
    elif cmd == "status":
        s = stack.status()
        print(json.dumps(s, indent=2))
    else:
        usage()
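The flag handling in the CLI above accepts only `--key=value` pairs and treats every other argument that does not start with `--` as positional. A minimal standalone sketch of that behaviour follows; the helper name `parse_cli_args` is hypothetical and is not part of layers.py.

```python
from typing import Dict, List, Tuple


def parse_cli_args(argv: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split arguments into --key=value flags and positional words.

    Hypothetical helper that mirrors the loop in the CLI block. A bare
    `--flag` without `=` is silently ignored, as in the original loop.
    """
    flags: Dict[str, str] = {}
    positional: List[str] = []
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            # Split on the first "=" only, so values may contain "=".
            key, val = arg.split("=", 1)
            flags[key.lstrip("-")] = val
        elif not arg.startswith("--"):
            positional.append(arg)
    return flags, positional


flags, positional = parse_cli_args(["pricing", "change", "--wing=acme", "--verbose"])
print(flags)       # {'wing': 'acme'}
print(positional)  # ['pricing', 'change']
```

Note that a space-separated form such as `--wing acme` is not recognised by this loop: `--wing` is dropped and `acme` ends up in the positional list.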
@@ -0,0 +1,714 @@
 #!/usr/bin/env python3
 """
 MemPalace MCP Server — read/write palace access for Claude Code
 ================================================================
 Install: claude mcp add mempalace -- python /path/to/mcp_server.py
 Tools (read):
  mempalace_status          — total drawers, wing/room breakdown
  mempalace_list_wings      — all wings with drawer counts
  mempalace_list_rooms      — rooms within a wing
  mempalace_get_taxonomy    — full wing → room → count tree
  mempalace_search          — semantic search, optional wing/room filter
  mempalace_check_duplicate — check if content already exists before filing
 Tools (write):
  mempalace_add_drawer      — file verbatim content into a wing/room
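  mempalace_delete_drawer   — remove a drawer by ID
 The TOOLS registry below is the authoritative list. It also exposes the AAAK
 spec, knowledge graph (mempalace_kg_*), palace graph, and agent diary tools.
 """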
 import sys
 import json
 import logging
 import hashlib
 from datetime import datetime
 import chromadb
 from .config import MempalaceConfig
 from .searcher import search_memories
 from .palace_graph import traverse, find_tunnels, graph_stats
 from .knowledge_graph import KnowledgeGraph
 logging.basicConfig(level=logging.INFO, format="%(message)s", stream=sys.stderr)
 logger = logging.getLogger("mempalace_mcp")
 _config = MempalaceConfig()
 _kg = KnowledgeGraph()
 def _get_collection(create=False):
    """Return the ChromaDB collection, or None on failure."""
    try:
        client = chromadb.PersistentClient(path=_config.palace_path)
        if create:
            return client.get_or_create_collection(_config.collection_name)
        return client.get_collection(_config.collection_name)
    except Exception:
        return None
 def _no_palace():
    return {
        "error": "No palace found",
        "palace_path": _config.palace_path,
        "hint": "Run: mempalace init <dir> && mempalace mine <dir>",
    }
 # ==================== READ TOOLS ====================
 def tool_status():
    col = _get_collection()
    if not col:
        return _no_palace()
    count = col.count()
    wings = {}
    rooms = {}
    try:
        all_meta = col.get(include=["metadatas"])["metadatas"]
        for m in all_meta:
            w = m.get("wing", "unknown")
            r = m.get("room", "unknown")
            wings[w] = wings.get(w, 0) + 1
            rooms[r] = rooms.get(r, 0) + 1
    except Exception:
        pass
    return {
        "total_drawers": count,
        "wings": wings,
        "rooms": rooms,
        "palace_path": _config.palace_path,
        "protocol": PALACE_PROTOCOL,
        "aaak_dialect": AAAK_SPEC,
    }
 # ── AAAK Dialect Spec ─────────────────────────────────────────────────────────
 # Included in status response so the AI learns it on first wake-up call.
 # Also available via mempalace_get_aaak_spec tool.
 PALACE_PROTOCOL = """IMPORTANT — MemPalace Memory Protocol:
 1. ON WAKE-UP: Call mempalace_status to load palace overview + AAAK spec.
 2. BEFORE RESPONDING about any person, project, or past event: call mempalace_kg_query or mempalace_search FIRST. Never guess — verify.
 3. IF UNSURE about a fact (name, gender, age, relationship): say "let me check" and query the palace. Wrong is worse than slow.
 4. AFTER EACH SESSION: call mempalace_diary_write to record what happened, what you learned, what matters.
 5. WHEN FACTS CHANGE: call mempalace_kg_invalidate on the old fact, mempalace_kg_add for the new one.
 This protocol ensures the AI KNOWS before it speaks. Storage is not memory — but storage + this protocol = memory."""
 AAAK_SPEC = """AAAK is a compressed memory dialect that MemPalace uses for efficient storage.
 It is designed to be readable by both humans and LLMs without decoding.
 FORMAT:
  ENTITIES: 3-letter uppercase codes. ALC=Alice, JOR=Jordan, RIL=Riley, MAX=Max, BEN=Ben.
  EMOTIONS: *action markers* before/during text. *warm*=joy, *fierce*=determined, *raw*=vulnerable, *bloom*=tenderness.
  STRUCTURE: Pipe-separated fields. FAM: family | PROJ: projects | ⚠: warnings/reminders.
  DATES: ISO format (2026-03-31). COUNTS: Nx = N mentions (e.g., 570x).
  IMPORTANCE: ★ to ★★★★★ (1-5 scale).
  HALLS: hall_facts, hall_events, hall_discoveries, hall_preferences, hall_advice.
  WINGS: wing_user, wing_agent, wing_team, wing_code, wing_myproject, wing_hardware, wing_ue5, wing_ai_research.
  ROOMS: Hyphenated slugs representing named ideas (e.g., chromadb-setup, gpu-pricing).
 EXAMPLE:
  FAM: ALC→♡JOR | 2D(kids): RIL(18,sports) MAX(11,chess+swimming) | BEN(contributor)
 Read AAAK naturally — expand codes mentally, treat *markers* as emotional context.
 When WRITING AAAK: use entity codes, mark emotions, keep structure tight."""
 def tool_list_wings():
    col = _get_collection()
    if not col:
        return _no_palace()
    wings = {}
    try:
        all_meta = col.get(include=["metadatas"])["metadatas"]
        for m in all_meta:
            w = m.get("wing", "unknown")
            wings[w] = wings.get(w, 0) + 1
    except Exception:
        pass
    return {"wings": wings}
 def tool_list_rooms(wing: str = None):
    col = _get_collection()
    if not col:
        return _no_palace()
    rooms = {}
    try:
        kwargs = {"include": ["metadatas"]}
        if wing:
            kwargs["where"] = {"wing": wing}
        all_meta = col.get(**kwargs)["metadatas"]
        for m in all_meta:
            r = m.get("room", "unknown")
            rooms[r] = rooms.get(r, 0) + 1
    except Exception:
        pass
    return {"wing": wing or "all", "rooms": rooms}
 def tool_get_taxonomy():
    col = _get_collection()
    if not col:
        return _no_palace()
    taxonomy = {}
    try:
        all_meta = col.get(include=["metadatas"])["metadatas"]
        for m in all_meta:
            w = m.get("wing", "unknown")
            r = m.get("room", "unknown")
            if w not in taxonomy:
                taxonomy[w] = {}
            taxonomy[w][r] = taxonomy[w].get(r, 0) + 1
    except Exception:
        pass
    return {"taxonomy": taxonomy}
 def tool_search(query: str, limit: int = 5, wing: str = None, room: str = None):
    return search_memories(
        query,
        palace_path=_config.palace_path,
        wing=wing,
        room=room,
        n_results=limit,
    )
 def tool_check_duplicate(content: str, threshold: float = 0.9):
    col = _get_collection()
    if not col:
        return _no_palace()
    try:
        results = col.query(
            query_texts=[content],
            n_results=5,
            include=["metadatas", "documents", "distances"],
        )
        duplicates = []
        if results["ids"] and results["ids"][0]:
            for i, drawer_id in enumerate(results["ids"][0]):
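                dist = results["distances"][0][i]
                # Treats the distance as (1 - similarity). That holds for a
                # cosine-space collection; with another metric the score is
                # only a rough heuristic and may fall outside 0-1.
                similarity = round(1 - dist, 3)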
                if similarity >= threshold:
                    meta = results["metadatas"][0][i]
                    doc = results["documents"][0][i]
                    duplicates.append(
                        {
                            "id": drawer_id,
                            "wing": meta.get("wing", "?"),
                            "room": meta.get("room", "?"),
                            "similarity": similarity,
                            "content": (doc[:200] + "...") if len(doc) > 200 else doc,
                        }
                    )
        return {
            "is_duplicate": len(duplicates) > 0,
            "matches": duplicates,
        }
    except Exception as e:
        return {"error": str(e)}
 def tool_get_aaak_spec():
    """Return the AAAK dialect specification."""
    return {"aaak_spec": AAAK_SPEC}
 def tool_traverse_graph(start_room: str, max_hops: int = 2):
    """Walk the palace graph from a room. Find connected ideas across wings."""
    col = _get_collection()
    if not col:
        return _no_palace()
    return traverse(start_room, col=col, max_hops=max_hops)
 def tool_find_tunnels(wing_a: str = None, wing_b: str = None):
    """Find rooms that bridge two wings — the tunnels connecting domains."""
    col = _get_collection()
    if not col:
        return _no_palace()
    return find_tunnels(wing_a, wing_b, col=col)
 def tool_graph_stats():
    """Palace graph overview: nodes, tunnels, edges, connectivity."""
    col = _get_collection()
    if not col:
        return _no_palace()
    return graph_stats(col=col)
 # ==================== WRITE TOOLS ====================
 def tool_add_drawer(
    wing: str, room: str, content: str, source_file: str = None, added_by: str = "mcp"
 ):
    """File verbatim content into a wing/room. Checks for duplicates first."""
    col = _get_collection(create=True)
    if not col:
        return _no_palace()
    # Duplicate check
    dup = tool_check_duplicate(content, threshold=0.9)
    if dup.get("is_duplicate"):
        return {
            "success": False,
            "reason": "duplicate",
            "matches": dup["matches"],
        }
    # Capture one timestamp so the ID hash and the filed_at metadata agree.
    filed_at = datetime.now().isoformat()
    digest = hashlib.md5((content[:100] + filed_at).encode()).hexdigest()[:16]
    drawer_id = f"drawer_{wing}_{room}_{digest}"
    try:
        col.add(
            ids=[drawer_id],
            documents=[content],
            metadatas=[
                {
                    "wing": wing,
                    "room": room,
                    "source_file": source_file or "",
                    "chunk_index": 0,
                    "added_by": added_by,
                    "filed_at": filed_at,
                }
            ],
        )
        logger.info(f"Filed drawer: {drawer_id} → {wing}/{room}")
        return {"success": True, "drawer_id": drawer_id, "wing": wing, "room": room}
    except Exception as e:
        return {"success": False, "error": str(e)}
 def tool_delete_drawer(drawer_id: str):
    """Delete a single drawer by ID."""
    col = _get_collection()
    if not col:
        return _no_palace()
    try:
        # Look up inside the try block so a failed read is reported the same
        # way as a failed delete.
        existing = col.get(ids=[drawer_id])
        if not existing["ids"]:
            return {"success": False, "error": f"Drawer not found: {drawer_id}"}
        col.delete(ids=[drawer_id])
        logger.info(f"Deleted drawer: {drawer_id}")
        return {"success": True, "drawer_id": drawer_id}
    except Exception as e:
        return {"success": False, "error": str(e)}
 # ==================== KNOWLEDGE GRAPH ====================
 def tool_kg_query(entity: str, as_of: str = None, direction: str = "both"):
    """Query the knowledge graph for an entity's relationships."""
    results = _kg.query_entity(entity, as_of=as_of, direction=direction)
    return {"entity": entity, "as_of": as_of, "facts": results, "count": len(results)}
 def tool_kg_add(subject: str, predicate: str, object: str,
                valid_from: str = None, source_closet: str = None):
    """Add a relationship to the knowledge graph."""
    triple_id = _kg.add_triple(subject, predicate, object,
                                valid_from=valid_from, source_closet=source_closet)
    return {"success": True, "triple_id": triple_id,
            "fact": f"{subject} → {predicate} → {object}"}
 def tool_kg_invalidate(subject: str, predicate: str, object: str, ended: str = None):
    """Mark a fact as no longer true (set end date)."""
    _kg.invalidate(subject, predicate, object, ended=ended)
    return {"success": True, "fact": f"{subject} → {predicate} → {object}", "ended": ended or "today"}
 def tool_kg_timeline(entity: str = None):
    """Get chronological timeline of facts, optionally for one entity."""
    results = _kg.timeline(entity)
    return {"entity": entity or "all", "timeline": results, "count": len(results)}
 def tool_kg_stats():
    """Knowledge graph overview: entities, triples, relationship types."""
    return _kg.stats()
 # ==================== AGENT DIARY ====================
 def tool_diary_write(agent_name: str, entry: str, topic: str = "general"):
    """
    Write a diary entry for this agent. Each agent gets its own wing
    with a diary room. Entries are timestamped and accumulate over time.
    This is the agent's personal journal — observations, thoughts,
    what it worked on, what it noticed, what it thinks matters.
    """
    wing = f"wing_{agent_name.lower().replace(' ', '_')}"
    room = "diary"
    col = _get_collection(create=True)
    if not col:
        return _no_palace()
    now = datetime.now()
    entry_id = f"diary_{wing}_{now.strftime('%Y%m%d_%H%M%S')}_{hashlib.md5(entry[:50].encode()).hexdigest()[:8]}"
    try:
        col.add(
            ids=[entry_id],
            documents=[entry],
            metadatas=[{
                "wing": wing,
                "room": room,
                "hall": "hall_diary",
                "topic": topic,
                "type": "diary_entry",
                "agent": agent_name,
                "filed_at": now.isoformat(),
                "date": now.strftime("%Y-%m-%d"),
            }],
        )
        logger.info(f"Diary entry: {entry_id} → {wing}/diary/{topic}")
        return {
            "success": True,
            "entry_id": entry_id,
            "agent": agent_name,
            "topic": topic,
            "timestamp": now.isoformat(),
        }
    except Exception as e:
        return {"success": False, "error": str(e)}
 def tool_diary_read(agent_name: str, last_n: int = 10):
    """
    Read an agent's recent diary entries. Returns the last N entries,
    newest first — the agent's personal journal.
    """
    wing = f"wing_{agent_name.lower().replace(' ', '_')}"
    col = _get_collection()
    if not col:
        return _no_palace()
    try:
        results = col.get(
            where={"$and": [{"wing": wing}, {"room": "diary"}]},
            include=["documents", "metadatas"],
        )
        if not results["ids"]:
            return {"agent": agent_name, "entries": [], "message": "No diary entries yet."}
        # Pair each document with its metadata, then sort newest first
        entries = []
        for doc, meta in zip(results["documents"], results["metadatas"]):
            entries.append({
                "date": meta.get("date", ""),
                "timestamp": meta.get("filed_at", ""),
                "topic": meta.get("topic", ""),
                "content": doc,
            })
        entries.sort(key=lambda x: x["timestamp"], reverse=True)
        entries = entries[:last_n]
        return {
            "agent": agent_name,
            "entries": entries,
            "total": len(results["ids"]),
            "showing": len(entries),
        }
    except Exception as e:
        return {"error": str(e)}
 # ==================== MCP PROTOCOL ====================
 TOOLS = {
    "mempalace_status": {
        "description": "Palace overview — total drawers, wing and room counts",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_status,
    },
    "mempalace_list_wings": {
        "description": "List all wings with drawer counts",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_list_wings,
    },
    "mempalace_list_rooms": {
        "description": "List rooms within a wing (or all rooms if no wing given)",
        "input_schema": {
            "type": "object",
            "properties": {
                "wing": {"type": "string", "description": "Wing to list rooms for (optional)"},
            },
        },
        "handler": tool_list_rooms,
    },
    "mempalace_get_taxonomy": {
        "description": "Full taxonomy: wing → room → drawer count",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_get_taxonomy,
    },
    "mempalace_get_aaak_spec": {
        "description": "Get the AAAK dialect specification — the compressed memory format MemPalace uses. Call this if you need to read or write AAAK-compressed memories.",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_get_aaak_spec,
    },
    "mempalace_kg_query": {
        "description": "Query the knowledge graph for an entity's relationships. Returns typed facts with temporal validity. E.g. 'Max' → child_of Alice, loves chess, does swimming. Filter by date with as_of to see what was true at a point in time.",
        "input_schema": {
            "type": "object",
            "properties": {
                "entity": {"type": "string", "description": "Entity to query (e.g. 'Max', 'MyProject', 'Alice')"},
                "as_of": {"type": "string", "description": "Date filter — only facts valid at this date (YYYY-MM-DD, optional)"},
                "direction": {"type": "string", "description": "outgoing (entity→?), incoming (?→entity), or both (default: both)"},
            },
            "required": ["entity"],
        },
        "handler": tool_kg_query,
    },
    "mempalace_kg_add": {
        "description": "Add a fact to the knowledge graph. Subject → predicate → object with optional time window. E.g. ('Max', 'started_school', 'Year 7', valid_from='2026-09-01').",
        "input_schema": {
            "type": "object",
            "properties": {
                "subject": {"type": "string", "description": "The entity doing/being something"},
                "predicate": {"type": "string", "description": "The relationship type (e.g. 'loves', 'works_on', 'daughter_of')"},
                "object": {"type": "string", "description": "The entity being connected to"},
                "valid_from": {"type": "string", "description": "When this became true (YYYY-MM-DD, optional)"},
                "source_closet": {"type": "string", "description": "Closet ID where this fact appears (optional)"},
            },
            "required": ["subject", "predicate", "object"],
        },
        "handler": tool_kg_add,
    },
    "mempalace_kg_invalidate": {
        "description": "Mark a fact as no longer true. E.g. ankle injury resolved, job ended, moved house.",
        "input_schema": {
            "type": "object",
            "properties": {
                "subject": {"type": "string", "description": "Entity"},
                "predicate": {"type": "string", "description": "Relationship"},
                "object": {"type": "string", "description": "Connected entity"},
                "ended": {"type": "string", "description": "When it stopped being true (YYYY-MM-DD, default: today)"},
            },
            "required": ["subject", "predicate", "object"],
        },
        "handler": tool_kg_invalidate,
    },
    "mempalace_kg_timeline": {
        "description": "Chronological timeline of facts. Shows the story of an entity (or everything) in order.",
        "input_schema": {
            "type": "object",
            "properties": {
                "entity": {"type": "string", "description": "Entity to get timeline for (optional — omit for full timeline)"},
            },
        },
        "handler": tool_kg_timeline,
    },
    "mempalace_kg_stats": {
        "description": "Knowledge graph overview: entities, triples, current vs expired facts, relationship types.",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_kg_stats,
    },
    "mempalace_traverse": {
        "description": "Walk the palace graph from a room. Shows connected ideas across wings — the tunnels. Like following a thread through the palace: start at 'chromadb-setup' in wing_code, discover it connects to wing_myproject (planning) and wing_user (feelings about it).",
        "input_schema": {
            "type": "object",
            "properties": {
                "start_room": {"type": "string", "description": "Room to start from (e.g. 'chromadb-setup', 'riley-school')"},
                "max_hops": {"type": "integer", "description": "How many connections to follow (default: 2)"},
            },
            "required": ["start_room"],
        },
        "handler": tool_traverse_graph,
    },
    "mempalace_find_tunnels": {
        "description": "Find rooms that bridge two wings — the tunnels connecting different domains. E.g. what topics connect wing_code to wing_team?",
        "input_schema": {
            "type": "object",
            "properties": {
                "wing_a": {"type": "string", "description": "First wing (optional)"},
                "wing_b": {"type": "string", "description": "Second wing (optional)"},
            },
        },
        "handler": tool_find_tunnels,
    },
    "mempalace_graph_stats": {
        "description": "Palace graph overview: total rooms, tunnel connections, edges between wings.",
        "input_schema": {"type": "object", "properties": {}},
        "handler": tool_graph_stats,
    },
    "mempalace_search": {
        "description": "Semantic search. Returns verbatim drawer content with similarity scores.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to search for"},
                "limit": {"type": "integer", "description": "Max results (default 5)"},
                "wing": {"type": "string", "description": "Filter by wing (optional)"},
                "room": {"type": "string", "description": "Filter by room (optional)"},
            },
            "required": ["query"],
        },
        "handler": tool_search,
    },
    "mempalace_check_duplicate": {
        "description": "Check if content already exists in the palace before filing",
        "input_schema": {
            "type": "object",
            "properties": {
                "content": {"type": "string", "description": "Content to check"},
                "threshold": {
                    "type": "number",
                    "description": "Similarity threshold 0-1 (default 0.9)",
                },
            },
            "required": ["content"],
        },
        "handler": tool_check_duplicate,
    },
    "mempalace_add_drawer": {
        "description": "File verbatim content into the palace. Checks for duplicates first.",
        "input_schema": {
            "type": "object",
            "properties": {
                "wing": {"type": "string", "description": "Wing (project name)"},
                "room": {
                    "type": "string",
                    "description": "Room (aspect: backend, decisions, meetings...)",
                },
                "content": {
                    "type": "string",
                    "description": "Verbatim content to store — exact words, never summarized",
                },
                "source_file": {"type": "string", "description": "Where this came from (optional)"},
                "added_by": {"type": "string", "description": "Who is filing this (default: mcp)"},
            },
            "required": ["wing", "room", "content"],
        },
        "handler": tool_add_drawer,
    },
    "mempalace_delete_drawer": {
        "description": "Delete a drawer by ID. Irreversible.",
        "input_schema": {
            "type": "object",
            "properties": {
                "drawer_id": {"type": "string", "description": "ID of the drawer to delete"},
            },
            "required": ["drawer_id"],
        },
        "handler": tool_delete_drawer,
    },
    "mempalace_diary_write": {
        "description": "Write to your personal agent diary in AAAK format. Your observations, thoughts, what you worked on, what matters. Each agent has their own diary with full history. Write in AAAK for compression — e.g. 'SESSION:2026-04-04|built.palace.graph+diary.tools|ALC.req:agent.diaries.in.aaak|★★★'. Use entity codes from the AAAK spec.",
        "input_schema": {
            "type": "object",
            "properties": {
                "agent_name": {"type": "string", "description": "Your name — each agent gets their own diary wing"},
                "entry": {"type": "string", "description": "Your diary entry in AAAK format — compressed, entity-coded, emotion-marked"},
                "topic": {"type": "string", "description": "Topic tag (optional, default: general)"},
            },
            "required": ["agent_name", "entry"],
        },
        "handler": tool_diary_write,
    },
    "mempalace_diary_read": {
        "description": "Read your recent diary entries (in AAAK). See what past versions of yourself recorded — your journal across sessions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "agent_name": {"type": "string", "description": "Your name — each agent gets their own diary wing"},
                "last_n": {"type": "integer", "description": "Number of recent entries to read (default: 10)"},
            },
            "required": ["agent_name"],
        },
        "handler": tool_diary_read,
    },
 }
 def handle_request(request):
    method = request.get("method", "")
    params = request.get("params", {})
    req_id = request.get("id")
    if method == "initialize":
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "result": {
                "protocolVersion": "2024-11-05",
                "capabilities": {"tools": {}},
                "serverInfo": {"name": "mempalace", "version": "2.0.0"},
            },
        }
    elif method == "notifications/initialized":
        return None
    elif method == "tools/list":
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "result": {
                "tools": [
                    {"name": n, "description": t["description"], "inputSchema": t["input_schema"]}
                    for n, t in TOOLS.items()
                ]
            },
        }
    elif method == "tools/call":
        tool_name = params.get("name")
        # Treat a missing or null "arguments" field as no arguments.
        tool_args = params.get("arguments") or {}
        if tool_name not in TOOLS:
            return {
                "jsonrpc": "2.0",
                "id": req_id,
                "error": {"code": -32601, "message": f"Unknown tool: {tool_name}"},
            }
        try:
            result = TOOLS[tool_name]["handler"](**tool_args)
            return {
                "jsonrpc": "2.0",
                "id": req_id,
                "result": {"content": [{"type": "text", "text": json.dumps(result, indent=2)}]},
            }
        except Exception as e:
            logger.error(f"Tool error in {tool_name}: {e}")
            return {"jsonrpc": "2.0", "id": req_id, "error": {"code": -32000, "message": str(e)}}
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "error": {"code": -32601, "message": f"Unknown method: {method}"},
    }
 def main():
    logger.info("MemPalace MCP Server starting...")
    while True:
        try:
            line = sys.stdin.readline()
            if not line:
                break
            line = line.strip()
            if not line:
                continue
            request = json.loads(line)
            response = handle_request(request)
            if response is not None:
                sys.stdout.write(json.dumps(response) + "\n")
                sys.stdout.flush()
        except KeyboardInterrupt:
            break
        except Exception as e:
            logger.error(f"Server error: {e}")
 if __name__ == "__main__":
    main()
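The server above speaks newline-delimited JSON-RPC 2.0 over stdin/stdout. The sketch below shows the shape of one `tools/call` exchange. It uses a stand-in dispatcher (`demo_handle`) and a hypothetical `echo` tool instead of the real TOOLS registry, so it runs without ChromaDB; both names are assumptions made for illustration.

```python
import json

# Hypothetical one-tool registry, shaped like TOOLS in the server.
DEMO_TOOLS = {
    "echo": {"handler": lambda text: {"echo": text}},
}


def demo_handle(request: dict) -> dict:
    """Dispatch a tools/call request roughly the way handle_request does."""
    req_id = request.get("id")
    params = request.get("params") or {}
    name = params.get("name")
    if request.get("method") != "tools/call" or name not in DEMO_TOOLS:
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "error": {"code": -32601, "message": f"Unknown: {name}"},
        }
    result = DEMO_TOOLS[name]["handler"](**(params.get("arguments") or {}))
    # Tool output is JSON-encoded and wrapped in a single text content item.
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
    }


# One request line, as a client would write it to the server's stdin.
line = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "echo", "arguments": {"text": "hi"}},
})
response = demo_handle(json.loads(line))
print(json.dumps(response))
```

The client has to decode the `text` field a second time to get the tool's structured result back, because the handler's return value is serialised into a string before it is placed in the response.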
@@ -0,0 +1,417 @@
 #!/usr/bin/env python3
 """
 miner.py — Files everything into the palace.
 Reads mempalace.yaml from the project directory to know the wing + rooms.
 Routes each file to the right room based on content.
 Stores verbatim chunks as drawers. No summaries. Ever.
 """
 import os
 import sys
 import hashlib
 from pathlib import Path
 from datetime import datetime
 from collections import defaultdict
 import chromadb
 READABLE_EXTENSIONS = {
    ".txt",
    ".md",
    ".py",
    ".js",
    ".ts",
    ".jsx",
    ".tsx",
    ".json",
    ".yaml",
    ".yml",
    ".html",
    ".css",
    ".java",
    ".go",
    ".rs",
    ".rb",
    ".sh",
    ".csv",
    ".sql",
    ".toml",
 }
 SKIP_DIRS = {
    ".git",
    "node_modules",
    "__pycache__",
    ".venv",
    "venv",
    "env",
    "dist",
    "build",
    ".next",
    "coverage",
    ".mempalace",
 }
 CHUNK_SIZE = 800  # chars per drawer
 CHUNK_OVERLAP = 100  # overlap between chunks
 MIN_CHUNK_SIZE = 50  # skip tiny chunks
 # =============================================================================
 # CONFIG
 # =============================================================================
 def load_config(project_dir: str) -> dict:
    """Load mempalace.yaml from project directory (falls back to mempal.yaml)."""
    import yaml
    config_path = Path(project_dir).expanduser().resolve() / "mempalace.yaml"
    if not config_path.exists():
        # Fallback to legacy name
        legacy_path = Path(project_dir).expanduser().resolve() / "mempal.yaml"
        if legacy_path.exists():
            config_path = legacy_path
        else:
            print(f"ERROR: No mempalace.yaml found in {project_dir}")
            print(f"Run: mempalace init {project_dir}")
            sys.exit(1)
    with open(config_path, encoding="utf-8") as f:
        return yaml.safe_load(f)
 # =============================================================================
 # FILE ROUTING — which room does this file belong to?
 # =============================================================================
 def detect_room(filepath: Path, content: str, rooms: list, project_path: Path) -> str:
    """
    Route a file to the right room.
    Priority:
    1. A folder in the path matches a room name (substring, either direction)
    2. Filename matches a room name (substring, either direction)
    3. Content keyword scoring
    4. Fallback: "general"
    """
    relative = str(filepath.relative_to(project_path)).lower()
    filename = filepath.stem.lower()
    content_lower = content[:2000].lower()
    # Priority 1: folder path contains room name
    path_parts = relative.replace("\\", "/").split("/")
    for part in path_parts[:-1]:  # skip filename itself
        for room in rooms:
            if room["name"].lower() in part or part in room["name"].lower():
                return room["name"]
    # Priority 2: filename matches room name
    for room in rooms:
        if room["name"].lower() in filename or filename in room["name"].lower():
            return room["name"]
    # Priority 3: keyword scoring from room keywords + name
    scores = defaultdict(int)
    for room in rooms:
        keywords = room.get("keywords", []) + [room["name"]]
        for kw in keywords:
            count = content_lower.count(kw.lower())
            scores[room["name"]] += count
    if scores:
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            return best
    return "general"
 # =============================================================================
 # CHUNKING
 # =============================================================================
 def chunk_text(content: str, source_file: str) -> list:
    """
    Split content into drawer-sized chunks.
    Tries to split on paragraph/line boundaries.
    Returns list of {"content": str, "chunk_index": int}
    """
    # Clean up
    content = content.strip()
    if not content:
        return []
    chunks = []
    start = 0
    chunk_index = 0
    while start < len(content):
        end = min(start + CHUNK_SIZE, len(content))
        # Try to break at paragraph boundary
        if end < len(content):
            newline_pos = content.rfind("\n\n", start, end)
            if newline_pos > start + CHUNK_SIZE // 2:
                end = newline_pos
            else:
                newline_pos = content.rfind("\n", start, end)
                if newline_pos > start + CHUNK_SIZE // 2:
                    end = newline_pos
        chunk = content[start:end].strip()
        if len(chunk) >= MIN_CHUNK_SIZE:
            chunks.append(
                {
                    "content": chunk,
                    "chunk_index": chunk_index,
                }
            )
            chunk_index += 1
        start = end - CHUNK_OVERLAP if end < len(content) else end
    return chunks
 # =============================================================================
 # PALACE — ChromaDB operations
 # =============================================================================
 def get_collection(palace_path: str):
    os.makedirs(palace_path, exist_ok=True)
    client = chromadb.PersistentClient(path=palace_path)
    # get_or_create_collection avoids hiding unrelated errors behind a broad except
    return client.get_or_create_collection("mempalace_drawers")
 def file_already_mined(collection, source_file: str) -> bool:
    """Fast check: has this file been filed before?"""
    try:
        results = collection.get(where={"source_file": source_file}, limit=1)
        return len(results.get("ids", [])) > 0
    except Exception:
        return False
 def add_drawer(
    collection, wing: str, room: str, content: str, source_file: str, chunk_index: int, agent: str
 ):
    """Add one drawer to the palace."""
    drawer_id = f"drawer_{wing}_{room}_{hashlib.md5((source_file + str(chunk_index)).encode()).hexdigest()[:16]}"
    try:
        collection.add(
            documents=[content],
            ids=[drawer_id],
            metadatas=[
                {
                    "wing": wing,
                    "room": room,
                    "source_file": source_file,
                    "chunk_index": chunk_index,
                    "added_by": agent,
                    "filed_at": datetime.now().isoformat(),
                }
            ],
        )
        return True
    except Exception as e:
        if "already exists" in str(e).lower() or "duplicate" in str(e).lower():
            return False
        raise
 # =============================================================================
 # PROCESS ONE FILE
 # =============================================================================
 def process_file(
    filepath: Path,
    project_path: Path,
    collection,
    wing: str,
    rooms: list,
    agent: str,
    dry_run: bool,
 ) -> int:
    """Read, chunk, route, and file one file. Returns drawer count."""
    # Skip if already filed
    source_file = str(filepath)
    if not dry_run and file_already_mined(collection, source_file):
        return 0
    try:
        content = filepath.read_text(encoding="utf-8", errors="replace")
    except Exception:
        return 0
    content = content.strip()
    if len(content) < MIN_CHUNK_SIZE:
        return 0
    room = detect_room(filepath, content, rooms, project_path)
    chunks = chunk_text(content, source_file)
    if dry_run:
        print(f"    [DRY RUN] {filepath.name} → room:{room} ({len(chunks)} drawers)")
        return len(chunks)
    drawers_added = 0
    for chunk in chunks:
        added = add_drawer(
            collection=collection,
            wing=wing,
            room=room,
            content=chunk["content"],
            source_file=source_file,
            chunk_index=chunk["chunk_index"],
            agent=agent,
        )
        if added:
            drawers_added += 1
    return drawers_added
 # =============================================================================
 # SCAN PROJECT
 # =============================================================================
 def scan_project(project_dir: str) -> list:
    """Return list of all readable file paths."""
    project_path = Path(project_dir).expanduser().resolve()
    files = []
    for root, dirs, filenames in os.walk(project_path):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for filename in filenames:
            filepath = Path(root) / filename
            if filepath.suffix.lower() in READABLE_EXTENSIONS:
                # Skip config files
                if filename in (
                    "mempalace.yaml",
                    "mempalace.yml",
                    "mempal.yaml",
                    "mempal.yml",
                    ".gitignore",
                    "package-lock.json",
                ):
                    continue
                files.append(filepath)
    return files
 # =============================================================================
 # MAIN: MINE
 # =============================================================================
 def mine(
    project_dir: str,
    palace_path: str,
    wing_override: str = None,
    agent: str = "mempalace",
    limit: int = 0,
    dry_run: bool = False,
 ):
    """Mine a project directory into the palace."""
    project_path = Path(project_dir).expanduser().resolve()
    config = load_config(project_dir)
    wing = wing_override or config["wing"]
    rooms = config.get("rooms", [{"name": "general", "description": "All project files"}])
    files = scan_project(project_dir)
    if limit > 0:
        files = files[:limit]
    print(f"\n{'=' * 55}")
    print("  MemPalace Mine")
    print(f"{'=' * 55}")
    print(f"  Wing:    {wing}")
    print(f"  Rooms:   {', '.join(r['name'] for r in rooms)}")
    print(f"  Files:   {len(files)}")
    print(f"  Palace:  {palace_path}")
    if dry_run:
        print("  DRY RUN — nothing will be filed")
    print(f"{'─' * 55}\n")
    if not dry_run:
        collection = get_collection(palace_path)
    else:
        collection = None
    total_drawers = 0
    files_skipped = 0
    room_counts = defaultdict(int)
    for i, filepath in enumerate(files, 1):
        drawers = process_file(
            filepath=filepath,
            project_path=project_path,
            collection=collection,
            wing=wing,
            rooms=rooms,
            agent=agent,
            dry_run=dry_run,
        )
        if drawers == 0 and not dry_run:
            files_skipped += 1
        else:
            total_drawers += drawers
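            # NOTE: content is not passed here, so a file that process_file routed by
            # content keywords is tallied by path/filename match or as "general".
            room = detect_room(filepath, "", rooms, project_path)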
            room_counts[room] += 1
            if not dry_run:
                print(f"  ✓ [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers}")
    print(f"\n{'=' * 55}")
    print("  Done.")
    print(f"  Files processed: {len(files) - files_skipped}")
    print(f"  Files skipped (already filed): {files_skipped}")
    print(f"  Drawers filed: {total_drawers}")
    print("\n  By room:")
    for room, count in sorted(room_counts.items(), key=lambda x: x[1], reverse=True):
        print(f"    {room:20} {count} files")
    print('\n  Next: mempalace search "what you\'re looking for"')
    print(f"{'=' * 55}\n")
 # =============================================================================
 # STATUS
 # =============================================================================
 def status(palace_path: str):
    """Show what's been filed in the palace."""
    try:
        client = chromadb.PersistentClient(path=palace_path)
        col = client.get_collection("mempalace_drawers")
    except Exception:
        print(f"\n  No palace found at {palace_path}")
        print("  Run: mempalace init <dir> then mempalace mine <dir>")
        return
    # Count by wing and room (only the first 10,000 drawers are fetched,
    # so totals for larger palaces are undercounted)
    r = col.get(limit=10000, include=["metadatas"])
    metas = r["metadatas"]
    wing_rooms = defaultdict(lambda: defaultdict(int))
    for m in metas:
        wing_rooms[m.get("wing", "?")][m.get("room", "?")] += 1
    print(f"\n{'=' * 55}")
    print(f"  MemPalace Status — {len(metas)} drawers")
    print(f"{'=' * 55}\n")
    for wing, rooms in sorted(wing_rooms.items()):
        print(f"  WING: {wing}")
        for room, count in sorted(rooms.items(), key=lambda x: x[1], reverse=True):
            print(f"    ROOM: {room:20} {count:5} drawers")
        print()
    print(f"{'=' * 55}\n")
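The overlap behaviour of `chunk_text` above can be checked in isolation. The following is a minimal sketch, not the project's code: `chunk_spans` and `sample` are hypothetical names introduced here, and the three constants only repeat the values defined in the miner. It returns offsets instead of text so the 100-character overlap between neighbouring drawers is easy to see.

```python
CHUNK_SIZE = 800      # chars per drawer (same value as the miner)
CHUNK_OVERLAP = 100   # chars shared between neighbouring chunks
MIN_CHUNK_SIZE = 50   # chunks shorter than this are dropped


def chunk_spans(content: str) -> list:
    """Return (start, end) offsets using the same boundary rules as chunk_text."""
    spans = []
    start = 0
    while start < len(content):
        end = min(start + CHUNK_SIZE, len(content))
        if end < len(content):
            # Prefer a paragraph break, then a line break, but only if it
            # falls in the second half of the window.
            for sep in ("\n\n", "\n"):
                pos = content.rfind(sep, start, end)
                if pos > start + CHUNK_SIZE // 2:
                    end = pos
                    break
        if len(content[start:end].strip()) >= MIN_CHUNK_SIZE:
            spans.append((start, end))
        # Step back by the overlap unless this was the final chunk
        start = end - CHUNK_OVERLAP if end < len(content) else end
    return spans


# Six 312-character paragraphs separated by blank lines (illustrative data)
sample = "\n\n".join("paragraph %d " % i + "x" * 300 for i in range(6))
spans = chunk_spans(sample)
print(spans)
```

Each span ends on a paragraph break where one is available, and the next span starts `CHUNK_OVERLAP` characters before that break, so text near a boundary is searchable from both drawers.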
@@ -0,0 +1,253 @@
 #!/usr/bin/env python3
 """
 normalize.py — Convert any chat export format to MemPalace transcript format.
 Supported:
    - Plain text with > markers (pass through)
    - Claude.ai JSON export
    - ChatGPT conversations.json
    - Claude Code JSONL
    - Slack JSON export
    - Plain text (pass through for paragraph chunking)
 No API key. No internet. Everything local.
 """
 import json
 import os
 from pathlib import Path
 from typing import Optional
 def normalize(filepath: str) -> str:
    """
    Load a file and normalize to transcript format if it's a chat export.
    Plain text files pass through unchanged.
    """
    try:
        with open(filepath, "r", encoding="utf-8", errors="replace") as f:
            content = f.read()
    except Exception as e:
        # Chain the original exception so the root cause stays in the traceback
        raise IOError(f"Could not read {filepath}: {e}") from e
    if not content.strip():
        return content
    # Already has > markers — pass through
    lines = content.split("\n")
    if sum(1 for line in lines if line.strip().startswith(">")) >= 3:
        return content
    # Try JSON normalization
    ext = Path(filepath).suffix.lower()
    if ext in (".json", ".jsonl") or content.strip()[:1] in ("{", "["):
        normalized = _try_normalize_json(content)
        if normalized:
            return normalized
    return content
 def _try_normalize_json(content: str) -> Optional[str]:
    """Try all known JSON chat schemas."""
    normalized = _try_claude_code_jsonl(content)
    if normalized:
        return normalized
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return None
    for parser in (_try_claude_ai_json, _try_chatgpt_json, _try_slack_json):
        normalized = parser(data)
        if normalized:
            return normalized
    return None
 def _try_claude_code_jsonl(content: str) -> Optional[str]:
    """Claude Code JSONL sessions."""
    lines = [line.strip() for line in content.strip().split("\n") if line.strip()]
    messages = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        msg_type = entry.get("type", "")
        message = entry.get("message", {})
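        # Accept both labels for the human turn ("user" is assumed to appear in some session logs)
        if msg_type in ("human", "user"):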
            text = _extract_content(message.get("content", ""))
            if text:
                messages.append(("user", text))
        elif msg_type == "assistant":
            text = _extract_content(message.get("content", ""))
            if text:
                messages.append(("assistant", text))
    if len(messages) >= 2:
        return _messages_to_transcript(messages)
    return None
 def _try_claude_ai_json(data) -> Optional[str]:
    """Claude.ai JSON export: [{"role": "user", "content": "..."}]"""
    if isinstance(data, dict):
        data = data.get("messages", data.get("chat_messages", []))
    if not isinstance(data, list):
        return None
    messages = []
    for item in data:
        if not isinstance(item, dict):
            continue
        role = item.get("role", "")
        text = _extract_content(item.get("content", ""))
        if role in ("user", "human") and text:
            messages.append(("user", text))
        elif role in ("assistant", "ai") and text:
            messages.append(("assistant", text))
    if len(messages) >= 2:
        return _messages_to_transcript(messages)
    return None
 def _try_chatgpt_json(data) -> Optional[str]:
    """ChatGPT conversations.json with mapping tree."""
    if not isinstance(data, dict) or "mapping" not in data:
        return None
    mapping = data["mapping"]
    messages = []
    # Find root: prefer node with parent=None AND no message (synthetic root)
    root_id = None
    fallback_root = None
    for node_id, node in mapping.items():
        if node.get("parent") is None:
            if node.get("message") is None:
                root_id = node_id
                break
            elif fallback_root is None:
                fallback_root = node_id
    if not root_id:
        root_id = fallback_root
    if root_id:
        current_id = root_id
        visited = set()
        while current_id and current_id not in visited:
            visited.add(current_id)
            node = mapping.get(current_id, {})
            msg = node.get("message")
            if msg:
                role = msg.get("author", {}).get("role", "")
                content = msg.get("content", {})
                parts = content.get("parts", []) if isinstance(content, dict) else []
                text = " ".join(str(p) for p in parts if isinstance(p, str) and p).strip()
                if role == "user" and text:
                    messages.append(("user", text))
                elif role == "assistant" and text:
                    messages.append(("assistant", text))
            children = node.get("children", [])
            current_id = children[0] if children else None
    if len(messages) >= 2:
        return _messages_to_transcript(messages)
    return None
 def _try_slack_json(data) -> Optional[str]:
    """
    Slack channel export: [{"type": "message", "user": "...", "text": "..."}]
    Optimized for 2-person DMs. In channels with 3+ people, alternating
    speakers are labeled user/assistant to preserve the exchange structure.
    """
    if not isinstance(data, list):
        return None
    messages = []
    seen_users = {}
    last_role = None
    for item in data:
        if not isinstance(item, dict) or item.get("type") != "message":
            continue
        user_id = item.get("user", item.get("username", ""))
        text = item.get("text", "").strip()
        if not text or not user_id:
            continue
        if user_id not in seen_users:
            # Alternate roles so exchange chunking works with any number of speakers
            if not seen_users:
                seen_users[user_id] = "user"
            elif last_role == "user":
                seen_users[user_id] = "assistant"
            else:
                seen_users[user_id] = "user"
        last_role = seen_users[user_id]
        messages.append((seen_users[user_id], text))
    if len(messages) >= 2:
        return _messages_to_transcript(messages)
    return None
 def _extract_content(content) -> str:
    """Pull text from content — handles str, list of blocks, or dict."""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        parts = []
        for item in content:
            if isinstance(item, str):
                parts.append(item)
            elif isinstance(item, dict) and item.get("type") == "text":
                parts.append(item.get("text", ""))
        return " ".join(parts).strip()
    if isinstance(content, dict):
        return content.get("text", "").strip()
    return ""
 def _messages_to_transcript(messages: list, spellcheck: bool = True) -> str:
    """Convert [(role, text), ...] to transcript format with > markers."""
    if spellcheck:
        try:
            from mempalace.spellcheck import spellcheck_user_text
            _fix = spellcheck_user_text
        except Exception:
            _fix = None
    else:
        _fix = None
    lines = []
    i = 0
    while i < len(messages):
        role, text = messages[i]
        if role == "user":
            if _fix is not None:
                text = _fix(text)
            lines.append(f"> {text}")
            if i + 1 < len(messages) and messages[i + 1][0] == "assistant":
                lines.append(messages[i + 1][1])
                i += 2
            else:
                i += 1
        else:
            lines.append(text)
            i += 1
        lines.append("")
    return "\n".join(lines)
 if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print("Usage: python normalize.py <filepath>")
        sys.exit(1)
    filepath = sys.argv[1]
    result = normalize(filepath)
    quote_count = sum(1 for line in result.split("\n") if line.strip().startswith(">"))
    print(f"\nFile: {os.path.basename(filepath)}")
    print(f"Normalized: {len(result)} chars | {quote_count} user turns detected")
    print("\n--- Preview (first 20 lines) ---")
    print("\n".join(result.split("\n")[:20]))
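The ChatGPT parser above walks the export's `mapping` tree from the root and always follows the first child. A minimal sketch of that walk on hand-made data may make the shape clearer. The `linearize` helper and the sample `mapping` are hypothetical and only mirror the keys the parser reads (`parent`, `message`, `children`, `author.role`, `content.parts`); a real export contains more fields.

```python
def linearize(mapping: dict) -> list:
    """Follow the first child from the root and collect (role, text) pairs."""
    # Root = first node without a parent (this sketch assumes one exists)
    current = next(
        (nid for nid, node in mapping.items() if node.get("parent") is None), None
    )
    seen, messages = set(), []
    while current and current not in seen:  # `seen` guards against cycles
        seen.add(current)
        node = mapping.get(current, {})
        msg = node.get("message")
        if msg:
            role = msg.get("author", {}).get("role", "")
            parts = msg.get("content", {}).get("parts", [])
            text = " ".join(p for p in parts if isinstance(p, str) and p).strip()
            if role in ("user", "assistant") and text:
                messages.append((role, text))
        children = node.get("children", [])
        current = children[0] if children else None
    return messages


# Illustrative tree: node "a" has two children, only the first is followed
mapping = {
    "root": {"parent": None, "message": None, "children": ["a"]},
    "a": {
        "parent": "root",
        "message": {"author": {"role": "user"}, "content": {"parts": ["Hi"]}},
        "children": ["b", "b_alt"],
    },
    "b": {
        "parent": "a",
        "message": {"author": {"role": "assistant"}, "content": {"parts": ["Hello"]}},
        "children": [],
    },
    "b_alt": {
        "parent": "a",
        "message": {"author": {"role": "assistant"}, "content": {"parts": ["Hey"]}},
        "children": [],
    },
}
messages = linearize(mapping)
print(messages)
```

Following only `children[0]` means regenerated or edited branches (here `b_alt`) are not included in the transcript.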
@@ -0,0 +1,480 @@
 #!/usr/bin/env python3
 """
 onboarding.py — MemPalace first-run setup.
 Asks the user:
  1. How they're using MemPalace (work / personal / combo)
  2. Who the people in their life are (names, nicknames, relationships)
  3. What their projects are
  4. What they want their wings called
 Seeds the entity_registry with confirmed data so MemPalace knows your world
 from minute one — before a single session is indexed.
 Usage:
    python3 -m mempalace.onboarding
    or: mempalace init
 """
 from pathlib import Path
 from mempalace.entity_registry import EntityRegistry
 from mempalace.entity_detector import detect_entities, scan_for_detection
 # ─────────────────────────────────────────────────────────────────────────────
 # Default wing taxonomies by mode
 # ─────────────────────────────────────────────────────────────────────────────
 DEFAULT_WINGS = {
    "work": [
        "projects",
        "clients",
        "team",
        "decisions",
        "research",
    ],
    "personal": [
        "family",
        "health",
        "creative",
        "reflections",
        "relationships",
    ],
    "combo": [
        "family",
        "work",
        "health",
        "creative",
        "projects",
        "reflections",
    ],
 }
 # ─────────────────────────────────────────────────────────────────────────────
 # Helpers
 # ─────────────────────────────────────────────────────────────────────────────
 def _hr():
    print(f"\n{'─' * 58}")
 def _header(text):
    print(f"\n{'=' * 58}")
    print(f"  {text}")
    print(f"{'=' * 58}")
 def _ask(prompt, default=None):
    if default:
        val = input(f"  {prompt} [{default}]: ").strip()
        return val if val else default
    return input(f"  {prompt}: ").strip()
 def _yn(prompt, default="y"):
    val = input(f"  {prompt} [{'Y/n' if default == 'y' else 'y/N'}]: ").strip().lower()
    if not val:
        return default == "y"
    return val.startswith("y")
 # ─────────────────────────────────────────────────────────────────────────────
 # Step 1: Mode selection
 # ─────────────────────────────────────────────────────────────────────────────
 def _ask_mode() -> str:
    _header("Welcome to MemPalace")
    print("""
  MemPalace is a personal memory system. To work well, it needs to know
  a little about your world — who the people are, what the projects
  are, and how you want your memory organized.
  This takes about 2 minutes. You can always update it later.
 """)
    print("  How are you using MemPalace?")
    print()
    print("    [1]  Work     — notes, projects, clients, colleagues, decisions")
    print("    [2]  Personal — diary, family, health, relationships, reflections")
    print("    [3]  Both     — personal and professional mixed")
    print()
    while True:
        choice = input("  Your choice [1/2/3]: ").strip()
        if choice == "1":
            return "work"
        elif choice == "2":
            return "personal"
        elif choice == "3":
            return "combo"
        print("  Please enter 1, 2, or 3.")
 # ─────────────────────────────────────────────────────────────────────────────
 # Step 2: People
 # ─────────────────────────────────────────────────────────────────────────────
 def _ask_people(mode: str) -> tuple[list, dict]:
    """Returns (people_list, aliases_dict)."""
    people = []
    aliases = {}  # nickname → full name
    if mode in ("personal", "combo"):
        _hr()
        print("""
  Personal world — who are the important people in your life?
  Format: name, relationship (e.g. "Riley, daughter" or just "Devon")
  For nicknames, you'll be asked separately.
  Type 'done' when finished.
 """)
        while True:
            entry = input("  Person: ").strip()
            if entry.lower() in ("done", ""):
                break
            parts = [p.strip() for p in entry.split(",", 1)]
            name = parts[0]
            relationship = parts[1] if len(parts) > 1 else ""
            if name:
                # Ask about nicknames
                nick = input(f"  Nickname for {name}? (or enter to skip): ").strip()
                if nick:
                    aliases[nick] = name
                people.append({"name": name, "relationship": relationship, "context": "personal"})
    if mode in ("work", "combo"):
        _hr()
        print("""
  Work world — who are the colleagues, clients, or collaborators
  you'd want to find in your notes?
  Format: name, role (e.g. "Ben, co-founder" or just "Sarah")
  Type 'done' when finished.
 """)
        while True:
            entry = input("  Person: ").strip()
            if entry.lower() in ("done", ""):
                break
            parts = [p.strip() for p in entry.split(",", 1)]
            name = parts[0]
            role = parts[1] if len(parts) > 1 else ""
            if name:
                people.append({"name": name, "relationship": role, "context": "work"})
    return people, aliases
 # ─────────────────────────────────────────────────────────────────────────────
 # Step 3: Projects
 # ─────────────────────────────────────────────────────────────────────────────
 def _ask_projects(mode: str) -> list:
    if mode == "personal":
        return []
    _hr()
    print("""
  What are your main projects? (These help MemPalace distinguish project
  names from person names — e.g. "Lantern" the project vs. "Lantern" the word.)
  Type 'done' when finished.
 """)
    projects = []
    while True:
        proj = input("  Project: ").strip()
        if proj.lower() in ("done", ""):
            break
        if proj:
            projects.append(proj)
    return projects
 # ─────────────────────────────────────────────────────────────────────────────
 # Step 4: Wings
 # ─────────────────────────────────────────────────────────────────────────────
 def _ask_wings(mode: str) -> list:
    defaults = DEFAULT_WINGS[mode]
    _hr()
    print(f"""
  Wings are the top-level categories in your memory palace.
  Suggested wings for {mode} mode:
    {", ".join(defaults)}
  Press enter to keep these, or type your own comma-separated list.
 """)
    custom = input("  Wings: ").strip()
    if custom:
        return [w.strip() for w in custom.split(",") if w.strip()]
    return defaults
 # ─────────────────────────────────────────────────────────────────────────────
 # Step 5: Auto-detect from files
 # ─────────────────────────────────────────────────────────────────────────────
 def _auto_detect(directory: str, known_people: list) -> list:
    """Scan directory for additional entity candidates."""
    known_names = {p["name"].lower() for p in known_people}
    try:
        files = scan_for_detection(directory)
        if not files:
            return []
        detected = detect_entities(files)
        new_people = [
            e
            for e in detected["people"]
            if e["name"].lower() not in known_names and e["confidence"] >= 0.7
        ]
        return new_people
    except Exception:
        return []
 # ─────────────────────────────────────────────────────────────────────────────
 # Step 6: Ambiguity warnings
 # ─────────────────────────────────────────────────────────────────────────────
 def _warn_ambiguous(people: list) -> list:
    """
    Flag names that are also common English words.
    Returns list of ambiguous names for user awareness.
    """
    from mempalace.entity_registry import COMMON_ENGLISH_WORDS
    ambiguous = []
    for p in people:
        if p["name"].lower() in COMMON_ENGLISH_WORDS:
            ambiguous.append(p["name"])
    return ambiguous
 # ─────────────────────────────────────────────────────────────────────────────
 # Main onboarding flow
 # ─────────────────────────────────────────────────────────────────────────────
 def _generate_aaak_bootstrap(people: list, projects: list, wings: list, mode: str, config_dir: Path = None):
    """
    Generate AAAK entity registry + critical facts bootstrap from onboarding data.
    These files teach the AI about the user's world from session one.
    """
    mempalace_dir = Path(config_dir) if config_dir else Path.home() / ".mempalace"
    mempalace_dir.mkdir(parents=True, exist_ok=True)
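    # Build AAAK entity codes (first 3 letters of name, uppercase)
    entity_codes = {}
    for p in people:
        name = p["name"]
        if name in entity_codes:
            continue  # same name entered twice; keep the first code
        # Handle collisions: lengthen the prefix, then fall back to a numeric suffix
        length = 3
        code = name[:length].upper()
        while code in entity_codes.values() and length < len(name):
            length += 1
            code = name[:length].upper()
        base, n = code, 2
        while code in entity_codes.values():
            code = f"{base}{n}"
            n += 1
        entity_codes[name] = code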
    # AAAK entity registry
    registry_lines = [
        "# AAAK Entity Registry",
        "# Auto-generated by mempalace init. Update as needed.",
        "",
        "## People",
    ]
    for p in people:
        name = p["name"]
        code = entity_codes[name]
        rel = p.get("relationship", "")
        registry_lines.append(f"  {code}={name} ({rel})" if rel else f"  {code}={name}")
    if projects:
        registry_lines.extend(["", "## Projects"])
        for proj in projects:
            code = proj[:4].upper()
            registry_lines.append(f"  {code}={proj}")
    registry_lines.extend([
        "",
        "## AAAK Quick Reference",
        "  Symbols: ♡=love ★=importance ⚠=warning →=relationship |=separator",
        "  Structure: KEY:value | GROUP(details) | entity.attribute",
        "  Read naturally — expand codes, treat *markers* as emotional context.",
    ])
    (mempalace_dir / "aaak_entities.md").write_text("\n".join(registry_lines))
    # Critical facts bootstrap (pre-palace — before any mining)
    facts_lines = [
        "# Critical Facts (bootstrap — will be enriched after mining)",
        "",
    ]
    personal_people = [p for p in people if p.get("context") == "personal"]
    work_people = [p for p in people if p.get("context") == "work"]
    if personal_people:
        facts_lines.append("## People (personal)")
        for p in personal_people:
            code = entity_codes[p["name"]]
            rel = p.get("relationship", "")
            facts_lines.append(f"- **{p['name']}** ({code}) — {rel}" if rel else f"- **{p['name']}** ({code})")
        facts_lines.append("")
    if work_people:
        facts_lines.append("## People (work)")
        for p in work_people:
            code = entity_codes[p["name"]]
            rel = p.get("relationship", "")
            facts_lines.append(f"- **{p['name']}** ({code}) — {rel}" if rel else f"- **{p['name']}** ({code})")
        facts_lines.append("")
    if projects:
        facts_lines.append("## Projects")
        for proj in projects:
            facts_lines.append(f"- **{proj}**")
        facts_lines.append("")
    facts_lines.extend([
        "## Palace",
        f"Wings: {', '.join(wings)}",
        f"Mode: {mode}",
        "",
        "*This file will be enriched by palace_facts.py after mining.*",
    ])
    (mempalace_dir / "critical_facts.md").write_text("\n".join(facts_lines))
 def run_onboarding(
    directory: str = ".",
    config_dir: Path = None,
    auto_detect: bool = True,
 ) -> EntityRegistry:
    """
    Run the full onboarding flow.
    Returns the seeded EntityRegistry.
    """
    # Step 1: Mode
    mode = _ask_mode()
    # Step 2: People
    people, aliases = _ask_people(mode)
    # Step 3: Projects
    projects = _ask_projects(mode)
    # Step 4: Wings (stored in config, not registry — just show user)
    wings = _ask_wings(mode)
    # Step 5: Auto-detect additional people from files
    if auto_detect and _yn("\nScan your files for additional names we might have missed?"):
        directory = _ask("Directory to scan", default=directory)
        detected = _auto_detect(directory, people)
        if detected:
            _hr()
            print(f"\n  Found {len(detected)} additional name candidates:\n")
            for e in detected:
                print(
                    f"    {e['name']:20} confidence={e['confidence']:.0%}  "
                    f"({', '.join(e['signals'][:1])})"
                )
            print()
            if _yn("  Add any of these to your registry?"):
                for e in detected:
                    ans = input(f"    {e['name']} — (p)erson, (s)kip? ").strip().lower()
                    if ans == "p":
                        rel = input(f"    Relationship/role for {e['name']}? ").strip()
                        if mode in ("personal", "work"):
                            ctx = mode
                        else:
                            ans_ctx = input("    Context — (p)ersonal or (w)ork? ").strip().lower()
                            # Accept "w" or "work"; anything else counts as personal.
                            # (Chained str.replace turned "work" into "workork".)
                            ctx = "work" if ans_ctx.startswith("w") else "personal"
                        people.append({"name": e["name"], "relationship": rel, "context": ctx})
    # Step 6: Warn about ambiguous names
    ambiguous = _warn_ambiguous(people)
    if ambiguous:
        _hr()
        print(f"""
  Heads up — these names are also common English words:
    {", ".join(ambiguous)}
  MemPalace will check the context before treating them as person names.
  For example: "I picked up Riley" → person.
               "Have you ever tried" → adverb.
 """)
    # Build and save registry
    registry = EntityRegistry.load(config_dir)
    registry.seed(mode=mode, people=people, projects=projects, aliases=aliases)
    # Generate AAAK entity registry + critical facts bootstrap
    _generate_aaak_bootstrap(people, projects, wings, mode, config_dir)
    # Summary
    _header("Setup Complete")
    print()
    print(f"  {registry.summary()}")
    print(f"\n  Wings: {', '.join(wings)}")
    print(f"\n  Registry saved to: {registry._path}")
    print("\n  AAAK entity registry: ~/.mempalace/aaak_entities.md")
    print("  Critical facts bootstrap: ~/.mempalace/critical_facts.md")
    print("\n  Your AI will know your world from the first session.")
    print()
    return registry
 # ─────────────────────────────────────────────────────────────────────────────
 # Quick setup (non-interactive, for testing)
 # ─────────────────────────────────────────────────────────────────────────────
 def quick_setup(
    mode: str,
    people: list,
    projects: list = None,
    aliases: dict = None,
    config_dir: Path = None,
 ) -> EntityRegistry:
    """
    Programmatic setup without interactive prompts.
    Used in tests and benchmark scripts.
    people: list of dicts {"name": str, "relationship": str, "context": str}
    """
    registry = EntityRegistry.load(config_dir)
    registry.seed(
        mode=mode,
        people=people,
        projects=projects or [],
        aliases=aliases or {},
    )
    return registry
 # ─────────────────────────────────────────────────────────────────────────────
 # CLI
 # ─────────────────────────────────────────────────────────────────────────────
 if __name__ == "__main__":
    import sys
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    run_onboarding(directory=directory)
@@ -0,0 +1,216 @@
 """
 palace_graph.py — Graph traversal layer for MemPalace
 ======================================================
 Builds a navigable graph from the palace structure:
  - Nodes = rooms (named ideas)
  - Edges = shared rooms across wings (tunnels)
  - Edge types = halls (the corridors)
 Enables queries like:
  "Start at chromadb-setup in wing_code, walk to wing_myproject"
  "Find all rooms connected to riley-college-apps"
  "What topics bridge wing_hardware and wing_myproject?"
 No external graph DB needed — built from ChromaDB metadata.
 """
 from collections import defaultdict, Counter
 from .config import MempalaceConfig
 import chromadb
 def _get_collection(config=None):
    config = config or MempalaceConfig()
    try:
        client = chromadb.PersistentClient(path=config.palace_path)
        return client.get_collection(config.collection_name)
    except Exception:
        return None
 def build_graph(col=None, config=None):
    """
    Build the palace graph from ChromaDB metadata.
    Returns:
        nodes: dict of {room: {wings: list, halls: list, count: int, dates: list}}
        edges: list of {room, wing_a, wing_b, hall, count}, one per wing pair and hall
    """
    if col is None:
        col = _get_collection(config)
    if not col:
        return {}, []
    total = col.count()
    room_data = defaultdict(lambda: {"wings": set(), "halls": set(), "count": 0, "dates": set()})
    offset = 0
    while offset < total:
        batch = col.get(limit=1000, offset=offset, include=["metadatas"])
        for meta in batch["metadatas"]:
            room = meta.get("room", "")
            wing = meta.get("wing", "")
            hall = meta.get("hall", "")
            date = meta.get("date", "")
            if room and room != "general" and wing:
                room_data[room]["wings"].add(wing)
                if hall:
                    room_data[room]["halls"].add(hall)
                if date:
                    room_data[room]["dates"].add(date)
                room_data[room]["count"] += 1
        if not batch["ids"]:
            break
        offset += len(batch["ids"])
    # Build edges from rooms that span multiple wings
    edges = []
    for room, data in room_data.items():
        wings = sorted(data["wings"])
        if len(wings) >= 2:
            for i, wa in enumerate(wings):
                for wb in wings[i + 1:]:
                    # Rooms with no hall metadata still get one edge per wing pair.
                    for hall in sorted(data["halls"]) or [""]:
                        edges.append({
                            "room": room,
                            "wing_a": wa,
                            "wing_b": wb,
                            "hall": hall,
                            "count": data["count"],
                        })
    # Convert sets to lists for JSON serialization
    nodes = {}
    for room, data in room_data.items():
        nodes[room] = {
            "wings": sorted(data["wings"]),
            "halls": sorted(data["halls"]),
            "count": data["count"],
            "dates": sorted(data["dates"])[-5:] if data["dates"] else [],
        }
    return nodes, edges
 def traverse(start_room: str, col=None, config=None, max_hops: int = 2):
    """
    Walk the graph from a starting room. Find connected rooms
    through shared wings.
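    Returns a list of dicts {room, wings, halls, count, hop}, plus
    connected_via for every room after the start, capped at 50 entries.
    If start_room is unknown, returns {error, suggestions} instead.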
    """
    nodes, edges = build_graph(col, config)
    if start_room not in nodes:
        return {"error": f"Room '{start_room}' not found", "suggestions": _fuzzy_match(start_room, nodes)}
    start = nodes[start_room]
    visited = {start_room}
    results = [{
        "room": start_room,
        "wings": start["wings"],
        "halls": start["halls"],
        "count": start["count"],
        "hop": 0,
    }]
    # BFS traversal
    frontier = [(start_room, 0)]
    while frontier:
        current_room, depth = frontier.pop(0)
        if depth >= max_hops:
            continue
        current = nodes.get(current_room, {})
        current_wings = set(current.get("wings", []))
        # Find all rooms that share a wing with current room
        for room, data in nodes.items():
            if room in visited:
                continue
            shared_wings = current_wings & set(data["wings"])
            if shared_wings:
                visited.add(room)
                results.append({
                    "room": room,
                    "wings": data["wings"],
                    "halls": data["halls"],
                    "count": data["count"],
                    "hop": depth + 1,
                    "connected_via": sorted(shared_wings),
                })
                if depth + 1 < max_hops:
                    frontier.append((room, depth + 1))
    # Sort by relevance (hop distance, then count)
    results.sort(key=lambda x: (x["hop"], -x["count"]))
    return results[:50]  # cap results
 def find_tunnels(wing_a: str = None, wing_b: str = None, col=None, config=None):
    """
    Find rooms that connect two wings (or all tunnel rooms if no wings specified).
    These are the "tunnels": the same named idea appearing in multiple domains.
    """
    nodes, edges = build_graph(col, config)
    tunnels = []
    for room, data in nodes.items():
        wings = data["wings"]
        if len(wings) < 2:
            continue
        if wing_a and wing_a not in wings:
            continue
        if wing_b and wing_b not in wings:
            continue
        tunnels.append({
            "room": room,
            "wings": wings,
            "halls": data["halls"],
            "count": data["count"],
            "recent": data["dates"][-1] if data["dates"] else "",
        })
    tunnels.sort(key=lambda x: -x["count"])
    return tunnels[:50]
 def graph_stats(col=None, config=None):
    """Summary statistics about the palace graph."""
    nodes, edges = build_graph(col, config)
    tunnel_rooms = sum(1 for n in nodes.values() if len(n["wings"]) >= 2)
    wing_counts = Counter()
    for data in nodes.values():
        for w in data["wings"]:
            wing_counts[w] += 1
    return {
        "total_rooms": len(nodes),
        "tunnel_rooms": tunnel_rooms,
        "total_edges": len(edges),
        "rooms_per_wing": dict(wing_counts.most_common()),
        "top_tunnels": [
            {"room": r, "wings": d["wings"], "count": d["count"]}
            for r, d in sorted(nodes.items(), key=lambda x: -len(x[1]["wings"]))[:10]
            if len(d["wings"]) >= 2
        ],
    }
 def _fuzzy_match(query: str, nodes: dict, n: int = 5):
    """Find rooms that approximately match a query string."""
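    query_lower = query.lower()
    # Drop empty fragments so a stray hyphen does not match every room.
    words = [w for w in query_lower.split("-") if w]
    scored = []
    for room in nodes:
        room_lower = room.lower()
        # Simple substring matching
        if query_lower and query_lower in room_lower:
            scored.append((room, 1.0))
        elif any(word in room_lower for word in words):
            scored.append((room, 0.5))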
    scored.sort(key=lambda x: -x[1])
    return [r for r, _ in scored[:n]]
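
The tunnel and traversal logic in palace_graph.py can be tried without ChromaDB. What follows is a minimal sketch under stated assumptions, not the module's API: `rooms_to_wings`, `tunnel_rooms`, and `walk` are hypothetical helper names, and `ROWS` is invented sample data shaped like the `wing`/`room` metadata that `build_graph` reads.

```python
from collections import defaultdict, deque

# Invented sample metadata, shaped like the rows build_graph() reads.
ROWS = [
    {"wing": "wing_code", "room": "chromadb-setup"},
    {"wing": "wing_myproject", "room": "chromadb-setup"},
    {"wing": "wing_myproject", "room": "launch-plan"},
    {"wing": "wing_hardware", "room": "gpu-pricing"},
]


def rooms_to_wings(rows):
    """Group wings by room, skipping rows without a room or a wing."""
    wings = defaultdict(set)
    for meta in rows:
        if meta.get("room") and meta.get("wing"):
            wings[meta["room"]].add(meta["wing"])
    return wings


def tunnel_rooms(rows):
    """Rooms that appear in two or more wings."""
    return sorted(r for r, w in rooms_to_wings(rows).items() if len(w) >= 2)


def walk(rows, start, max_hops=2):
    """Breadth-first walk: two rooms are neighbours when they share a wing."""
    wings = rooms_to_wings(rows)
    if start not in wings:
        return {}
    hops = {start: 0}
    frontier = deque([start])
    while frontier:
        current = frontier.popleft()
        if hops[current] >= max_hops:
            continue
        for room, room_wings in wings.items():
            if room not in hops and wings[current] & room_wings:
                hops[room] = hops[current] + 1
                frontier.append(room)
    return hops


print(tunnel_rooms(ROWS))
print(walk(ROWS, "chromadb-setup"))
```

In this sample only `chromadb-setup` spans two wings, so it is the single tunnel; `gpu-pricing` shares no wing with any other room and is never reached by the walk.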
@@ -0,0 +1,300 @@
 #!/usr/bin/env python3
 """
 room_detector_local.py — Local setup, no API required.
 Two ways to define rooms without calling any AI:
  1. Auto-detect from folder structure (zero config)
  2. Define manually in mempalace.yaml
 No internet. No API key. Your files stay on your machine.
 """
 import os
 import sys
 import yaml
 from pathlib import Path
 from collections import defaultdict
 # Common room patterns — detected from folder names and filenames
 # Format: {folder_keyword: room_name}
 FOLDER_ROOM_MAP = {
    "frontend": "frontend",
    "front-end": "frontend",
    "front_end": "frontend",
    "client": "frontend",
    "ui": "frontend",
    "views": "frontend",
    "components": "frontend",
    "pages": "frontend",
    "backend": "backend",
    "back-end": "backend",
    "back_end": "backend",
    "server": "backend",
    "api": "backend",
    "routes": "backend",
    "services": "backend",
    "controllers": "backend",
    "models": "backend",
    "database": "backend",
    "db": "backend",
    "docs": "documentation",
    "doc": "documentation",
    "documentation": "documentation",
    "wiki": "documentation",
    "readme": "documentation",
    "notes": "documentation",
    "design": "design",
    "designs": "design",
    "mockups": "design",
    "wireframes": "design",
    "assets": "design",
    "storyboard": "design",
    "costs": "costs",
    "cost": "costs",
    "budget": "costs",
    "finance": "costs",
    "financial": "costs",
    "pricing": "costs",
    "invoices": "costs",
    "accounting": "costs",
    "meetings": "meetings",
    "meeting": "meetings",
    "calls": "meetings",
    "meeting_notes": "meetings",
    "standup": "meetings",
    "minutes": "meetings",
    "team": "team",
    "staff": "team",
    "hr": "team",
    "hiring": "team",
    "employees": "team",
    "people": "team",
    "research": "research",
    "references": "research",
    "reading": "research",
    "papers": "research",
    "planning": "planning",
    "roadmap": "planning",
    "strategy": "planning",
    "specs": "planning",
    "requirements": "planning",
    "tests": "testing",
    "test": "testing",
    "testing": "testing",
    "qa": "testing",
    "scripts": "scripts",
    "tools": "scripts",
    "utils": "scripts",
    "config": "configuration",
    "configs": "configuration",
    "settings": "configuration",
    "infrastructure": "configuration",
    "infra": "configuration",
    "deploy": "configuration",
 }
 def detect_rooms_from_folders(project_dir: str) -> list:
    """
    Walk the project folder structure.
    Find top-level subdirectories that match known room patterns.
    Returns list of room dicts.
    """
    project_path = Path(project_dir).expanduser().resolve()
    found_rooms = {}
    SKIP_DIRS = {
        ".git",
        "node_modules",
        "__pycache__",
        ".venv",
        "venv",
        "env",
        "dist",
        "build",
        ".next",
        "coverage",
    }
    # Check top-level directories first (most reliable signal)
    for item in project_path.iterdir():
        if item.is_dir() and item.name not in SKIP_DIRS:
            name_lower = item.name.lower().replace("-", "_")
            if name_lower in FOLDER_ROOM_MAP:
                room_name = FOLDER_ROOM_MAP[name_lower]
                if room_name not in found_rooms:
                    found_rooms[room_name] = item.name
            # Also check if folder name IS a good room name directly
            elif len(item.name) > 2 and item.name[0].isalpha():
                clean = item.name.lower().replace("-", "_").replace(" ", "_")
                if clean not in found_rooms:
                    found_rooms[clean] = item.name
    # Walk one level deeper for nested patterns
    for item in project_path.iterdir():
        if item.is_dir() and item.name not in SKIP_DIRS:
            for subitem in item.iterdir():
                if subitem.is_dir() and subitem.name not in SKIP_DIRS:
                    name_lower = subitem.name.lower().replace("-", "_")
                    if name_lower in FOLDER_ROOM_MAP:
                        room_name = FOLDER_ROOM_MAP[name_lower]
                        if room_name not in found_rooms:
                            found_rooms[room_name] = subitem.name
    # Build room list
    rooms = []
    for room_name, original in found_rooms.items():
        rooms.append(
            {
                "name": room_name,
                "description": f"Files from {original}/",
                "keywords": [room_name, original.lower()],
            }
        )
    # Always add "general" as fallback
    if not any(r["name"] == "general" for r in rooms):
        rooms.append(
            {
                "name": "general",
                "description": "Files that don't fit other rooms",
                "keywords": [],
            }
        )
    return rooms
 def detect_rooms_from_files(project_dir: str) -> list:
    """
    Fallback: if folder structure gives no signal,
    detect rooms from recurring filename patterns.
    """
    project_path = Path(project_dir).expanduser().resolve()
    keyword_counts = defaultdict(int)
    SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", "dist", "build"}
    for root, dirs, filenames in os.walk(project_path):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for filename in filenames:
            name_lower = filename.lower().replace("-", "_").replace(" ", "_")
            for keyword, room in FOLDER_ROOM_MAP.items():
                if keyword in name_lower:
                    keyword_counts[room] += 1
    # Return rooms whose keywords appear at least twice
    rooms = []
    for room, count in sorted(keyword_counts.items(), key=lambda x: x[1], reverse=True):
        if count >= 2:
            rooms.append(
                {
                    "name": room,
                    "description": f"Files related to {room}",
                    "keywords": [room],
                }
            )
        if len(rooms) >= 6:
            break
    if not rooms:
        rooms = [{"name": "general", "description": "All project files", "keywords": []}]
    return rooms
 def print_proposed_structure(project_name: str, rooms: list, total_files: int, source: str):
    print(f"\n{'=' * 55}")
    print("  MemPalace Init — Local setup")
    print(f"{'=' * 55}")
    print(f"\n  WING: {project_name}")
    print(f"  ({total_files} files found, rooms detected from {source})\n")
    for room in rooms:
        print(f"    ROOM: {room['name']}")
        print(f"          {room['description']}")
    print(f"\n{'─' * 55}")
 def get_user_approval(rooms: list) -> list:
    """Same approval flow as AI version."""
    print("  Review the proposed rooms above.")
    print("  Options:")
    print("    [enter]  Accept all rooms")
    print("    [edit]   Remove or rename rooms")
    print("    [add]    Add a room manually")
    print()
    choice = input("  Your choice [enter/edit/add]: ").strip().lower()
    if choice in ("", "y", "yes"):
        return rooms
    if choice == "edit":
        print("\n  Current rooms:")
        for i, room in enumerate(rooms):
            print(f"    {i + 1}. {room['name']} — {room['description']}")
        remove = input("\n  Room numbers to REMOVE (comma-separated, or enter to skip): ").strip()
        if remove:
            to_remove = {int(x.strip()) - 1 for x in remove.split(",") if x.strip().isdigit()}
            rooms = [r for i, r in enumerate(rooms) if i not in to_remove]
    if choice == "add" or input("\n  Add any missing rooms? [y/N]: ").strip().lower() == "y":
        while True:
            new_name = (
                input("  New room name (or enter to stop): ").strip().lower().replace(" ", "_")
            )
            if not new_name:
                break
            new_desc = input(f"  Description for '{new_name}': ").strip()
            rooms.append({"name": new_name, "description": new_desc, "keywords": [new_name]})
            print(f"  Added: {new_name}")
    return rooms
 def save_config(project_dir: str, project_name: str, rooms: list):
    config = {
        "wing": project_name,
        "rooms": [{"name": r["name"], "description": r["description"]} for r in rooms],
    }
    config_path = Path(project_dir).expanduser().resolve() / "mempalace.yaml"
    with open(config_path, "w") as f:
        yaml.dump(config, f, default_flow_style=False, sort_keys=False)
    print(f"\n  Config saved: {config_path}")
    print("\n  Next step:")
    print(f"    mempalace mine {project_dir}")
    print(f"\n{'=' * 55}\n")
 def detect_rooms_local(project_dir: str):
    """Main entry point for local setup."""
    project_path = Path(project_dir).expanduser().resolve()
    project_name = project_path.name.lower().replace(" ", "_").replace("-", "_")
    if not project_path.exists():
        print(f"ERROR: Directory not found: {project_dir}")
        sys.exit(1)
    # Count files
    from .miner import scan_project
    files = scan_project(project_dir)
    # Try folder structure first
    rooms = detect_rooms_from_folders(project_dir)
    source = "folder structure"
    # If only "general" found, try filename patterns
    if len(rooms) <= 1:
        rooms = detect_rooms_from_files(project_dir)
        source = "filename patterns"
    # If still nothing, just use general
    if not rooms:
        rooms = [{"name": "general", "description": "All project files", "keywords": []}]
        source = "fallback (flat project)"
    print_proposed_structure(project_name, rooms, len(files), source)
    approved_rooms = get_user_approval(rooms)
    save_config(project_dir, project_name, approved_rooms)
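
The keyword lookup at the heart of folder-based detection can be exercised on a throwaway directory. This is a simplified sketch, assuming a three-entry subset of the keyword map; `ROOM_MAP`, `SKIP`, and `rooms_for` are hypothetical names, and unlike `detect_rooms_from_folders` it does not turn unmatched folders into rooms or look one level deeper.

```python
import tempfile
from pathlib import Path

# A small subset of the keyword map, for illustration only.
ROOM_MAP = {"client": "frontend", "api": "backend", "docs": "documentation"}
SKIP = {".git", "node_modules"}


def rooms_for(project_dir):
    """Map top-level folder names to room names; the first match per room wins."""
    found = {}
    for item in sorted(Path(project_dir).iterdir()):
        if not item.is_dir() or item.name in SKIP:
            continue
        key = item.name.lower().replace("-", "_")
        room = ROOM_MAP.get(key)
        if room and room not in found:
            found[room] = item.name
    return found


with tempfile.TemporaryDirectory() as tmp:
    for name in ("Client", "api", "docs", ".git", "node_modules"):
        (Path(tmp) / name).mkdir()
    detected = rooms_for(tmp)

print(detected)
```

The lookup is case-insensitive because folder names are lowercased first, while the original folder name is kept for the room description.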
@@ -0,0 +1,142 @@
 #!/usr/bin/env python3
 """
 searcher.py — Find anything. Exact words.
 Semantic search against the palace.
 Returns verbatim text — the actual words, never summaries.
 """
 import sys
 from pathlib import Path
 import chromadb
 def search(query: str, palace_path: str, wing: str = None, room: str = None, n_results: int = 5):
    """
    Search the palace. Returns verbatim drawer content.
    Optionally filter by wing (project) or room (aspect).
    """
    try:
        client = chromadb.PersistentClient(path=palace_path)
        col = client.get_collection("mempalace_drawers")
    except Exception:
        print(f"\n  No palace found at {palace_path}")
        print("  Run: mempalace init <dir> then mempalace mine <dir>")
        sys.exit(1)
    # Build where filter
    where = {}
    if wing and room:
        where = {"$and": [{"wing": wing}, {"room": room}]}
    elif wing:
        where = {"wing": wing}
    elif room:
        where = {"room": room}
    try:
        kwargs = {
            "query_texts": [query],
            "n_results": n_results,
            "include": ["documents", "metadatas", "distances"],
        }
        if where:
            kwargs["where"] = where
        results = col.query(**kwargs)
    except Exception as e:
        print(f"\n  Search error: {e}")
        sys.exit(1)
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    if not docs:
        print(f'\n  No results found for: "{query}"')
        return
    print(f"\n{'=' * 60}")
    print(f'  Results for: "{query}"')
    if wing:
        print(f"  Wing: {wing}")
    if room:
        print(f"  Room: {room}")
    print(f"{'=' * 60}\n")
    for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
        similarity = round(1 - dist, 3)
        source = Path(meta.get("source_file", "?")).name
        wing_name = meta.get("wing", "?")
        room_name = meta.get("room", "?")
        print(f"  [{i}] {wing_name} / {room_name}")
        print(f"      Source: {source}")
        print(f"      Match:  {similarity}")
        print()
        # Print the verbatim text, indented
        for line in doc.strip().split("\n"):
            print(f"      {line}")
        print()
        print(f"  {'─' * 56}")
    print()
 def search_memories(
    query: str, palace_path: str, wing: str = None, room: str = None, n_results: int = 5
 ) -> dict:
    """
    Programmatic search — returns a dict instead of printing.
    Used by the MCP server and other callers that need data.
    """
    try:
        client = chromadb.PersistentClient(path=palace_path)
        col = client.get_collection("mempalace_drawers")
    except Exception as e:
        return {"error": f"No palace found at {palace_path}: {e}"}
    # Build where filter
    where = {}
    if wing and room:
        where = {"$and": [{"wing": wing}, {"room": room}]}
    elif wing:
        where = {"wing": wing}
    elif room:
        where = {"room": room}
    try:
        kwargs = {
            "query_texts": [query],
            "n_results": n_results,
            "include": ["documents", "metadatas", "distances"],
        }
        if where:
            kwargs["where"] = where
        results = col.query(**kwargs)
    except Exception as e:
        return {"error": f"Search error: {e}"}
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    hits = []
    for doc, meta, dist in zip(docs, metas, dists):
        hits.append(
            {
                "text": doc,
                "wing": meta.get("wing", "unknown"),
                "room": meta.get("room", "unknown"),
                "source_file": Path(meta.get("source_file", "?")).name,
                "similarity": round(1 - dist, 3),
            }
        )
    return {
        "query": query,
        "filters": {"wing": wing, "room": room},
        "results": hits,
    }
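
Both `search` and `search_memories` build the same metadata filter. As an illustration only, that logic could be factored into one helper; `build_where` is a hypothetical name that does not exist in searcher.py, and the wing and room values are sample data.

```python
from typing import Optional


def build_where(wing: Optional[str] = None, room: Optional[str] = None) -> dict:
    """Build the metadata filter: both fields, one field, or no filter."""
    if wing and room:
        return {"$and": [{"wing": wing}, {"room": room}]}
    if wing:
        return {"wing": wing}
    if room:
        return {"room": room}
    return {}


print(build_where(wing="wing_code", room="chromadb-setup"))
print(build_where(room="chromadb-setup"))
print(build_where())
```

An empty dict means "no filter", which matches how both functions only pass `where` to the query when it is non-empty.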
@@ -0,0 +1,269 @@
 #!/usr/bin/env python3
 """
 spellcheck.py — Spell-correct user messages before palace filing.
 Preserves:
  - Technical terms (words with digits, hyphens, underscores)
  - CamelCase and ALL_CAPS identifiers
  - Known entity names (from EntityRegistry if available)
  - URLs and file paths
  - Words shorter than 4 chars (common abbreviations, pronouns, etc.)
  - Proper nouns already capitalized in context
 Corrects:
  - Genuine typos in lowercase, flowing text
  - Common fat-finger words (knoe → know, befor → before)
 Usage:
    from mempalace.spellcheck import spellcheck_user_text
    corrected = spellcheck_user_text("lsresdy knoe the question befor")
    # → "already know the question before"  (best effort)
 """
 import re
 from pathlib import Path
 from typing import Optional
 # Lazy-load autocorrect — not everyone has it installed
 _speller = None
 _autocorrect_available = None
 # System word list — loaded once, used to skip already-valid words
 _system_words: Optional[set] = None
 _SYSTEM_DICT = Path("/usr/share/dict/words")
 def _get_speller():
    global _speller, _autocorrect_available
    if _autocorrect_available is None:
        try:
            from autocorrect import Speller
            _speller = Speller(lang="en")
            _autocorrect_available = True
        except ImportError:
            _autocorrect_available = False
    return _speller if _autocorrect_available else None
 def _get_system_words() -> set:
    """Load /usr/share/dict/words once and cache it."""
    global _system_words
    if _system_words is None:
        if _SYSTEM_DICT.exists():
            with open(_SYSTEM_DICT) as f:
                _system_words = {w.strip().lower() for w in f if w.strip()}
        else:
            _system_words = set()
    return _system_words
 # ─────────────────────────────────────────────────────────────────────────────
 # Patterns that mark a token as "don't touch this"
 # ─────────────────────────────────────────────────────────────────────────────
 # Matches any token with a digit anywhere in it: 3am, bge-large-v1.5, top-10
 _HAS_DIGIT = re.compile(r"\d")
 # CamelCase: ChromaDB, MemPalace, LongMemEval
 _IS_CAMEL = re.compile(r"[A-Z][a-z]+[A-Z]")
 # ALL_CAPS or all-caps with underscores: NDCG, R@5, MAX_RESULTS
 _IS_ALLCAPS = re.compile(r"^[A-Z_@#$%^&*()+=\[\]{}|<>?.:/\\]+$")
 # Technical token: contains hyphens or underscores (bge-large, train_test)
 _IS_TECHNICAL = re.compile(r"[-_]")
 # URL-like or file-path-like
 _IS_URL = re.compile(r"https?://|www\.|/Users/|~/|\.[a-z]{2,4}$", re.IGNORECASE)
 # Code fences, markdown, or emoji-heavy
 _IS_CODE_OR_EMOJI = re.compile(r"[`*_#{}[\]\\]")
 # Very short tokens — skip (I, a, ok, my, etc. — also avoids ambiguous 3-char typos
 # like "kno" which autocorrect resolves as "no" rather than "know")
 _MIN_LENGTH = 4
 def _should_skip(token: str, known_names: set) -> bool:
    """Return True if this token should be left as-is."""
    if len(token) < _MIN_LENGTH:
        return True
    if _HAS_DIGIT.search(token):
        return True
    if _IS_CAMEL.search(token):
        return True
    if _IS_ALLCAPS.match(token):
        return True
    if _IS_TECHNICAL.search(token):
        return True
    if _IS_URL.search(token):
        return True
    if _IS_CODE_OR_EMOJI.search(token):
        return True
    # Known proper names (entity registry)
    if token.lower() in known_names:
        return True
    return False
 # ─────────────────────────────────────────────────────────────────────────────
 # Load known entity names from registry (optional, best-effort)
 # ─────────────────────────────────────────────────────────────────────────────
 def _load_known_names() -> set:
    """Pull all registered names from EntityRegistry. Returns empty set on failure."""
    try:
        from mempalace.entity_registry import EntityRegistry
        reg = EntityRegistry.load()
        names = set()
        for entity in reg._data.get("entities", {}).values():
            names.add(entity.get("canonical", "").lower())
            for alias in entity.get("aliases", []):
                names.add(alias.lower())
        return names
    except Exception:
        return set()
 # ─────────────────────────────────────────────────────────────────────────────
 # Edit distance — used to guard against over-aggressive autocorrect
 # ─────────────────────────────────────────────────────────────────────────────
 def _edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    if a == b:
        return 0
    if not a:
        return len(b)
    if not b:
        return len(a)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]
 # ─────────────────────────────────────────────────────────────────────────────
 # Core correction
 # ─────────────────────────────────────────────────────────────────────────────
 # Split on whitespace so punctuation stays attached to each token
 _TOKEN_RE = re.compile(r"(\S+)")
 def spellcheck_user_text(text: str, known_names: Optional[set] = None) -> str:
    """
    Spell-correct a user message.
    Args:
        text: Raw user message text.
        known_names: Set of lowercase names/terms to preserve. If None,
                     attempts to load from EntityRegistry automatically.
    Returns:
        Corrected text. Falls back to original if autocorrect not installed.
    """
    speller = _get_speller()
    if speller is None:
        return text  # autocorrect not installed — pass through unchanged
    if known_names is None:
        known_names = _load_known_names()
    # Process token by token, preserving all whitespace
    sys_words = _get_system_words()
    def _fix(match):
        token = match.group(0)
        # Strip trailing punctuation for checking, reattach after
        stripped = token.rstrip(".,!?;:'\")")
        punct = token[len(stripped) :]
        if not stripped or _should_skip(stripped, known_names):
            return token
        # Only correct lowercase words (capitalized words are likely proper nouns)
        if stripped[0].isupper():
            return token
        # Skip words that are already valid English — prevents "coherently" → "inherently"
        if stripped.lower() in sys_words:
            return token
        corrected = speller(stripped)
        # Guard: don't apply if corrected word is too different from original.
        # Extra safety net for words not in the system dict but also not typos.
        if corrected != stripped:
            dist = _edit_distance(stripped, corrected)
            max_edits = 2 if len(stripped) <= 7 else 3
            if dist > max_edits:
                return token
        return corrected + punct
    return _TOKEN_RE.sub(_fix, text)
 def spellcheck_transcript_line(line: str) -> str:
    """
    Spell-correct a single transcript line.
    Only touches lines that start with '>' (user turns).
    Assistant turns are never modified.
    """
    stripped = line.lstrip()
    if not stripped.startswith(">"):
        return line
    # '> actual message here'
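    # Prefix is the indent plus '> ', or a bare '>' when no space follows.
    indent = len(line) - len(stripped)
    prefix_len = indent + (2 if stripped.startswith("> ") else 1)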
    message = line[prefix_len:]
    if not message.strip():
        return line
    corrected = spellcheck_user_text(message)
    return line[:prefix_len] + corrected
 def spellcheck_transcript(content: str) -> str:
    """
    Spell-correct all user turns in a full transcript.
    Only lines starting with '>' are touched.
    """
    lines = content.split("\n")
    return "\n".join(spellcheck_transcript_line(line) for line in lines)
 # ─────────────────────────────────────────────────────────────────────────────
 # Quick test
 # ─────────────────────────────────────────────────────────────────────────────
 if __name__ == "__main__":
    test_cases = [
        "lsresdy knoe the question befor",
        "isn't there meny diferent benchmarks tesing questions?",
        "also can you pleese spell chekc my questions befroe storing",
        "it's realy hard for me to writte coherently at 3am",
        "Mempalace cant be fine-tunned if you alredy kno the question",
        # Should NOT change these:
        "ChromaDB bge-large-en-v1.5 NDCG@10 R@5",
        "Riley picked up Sam from school",
        "hybrid_v4 top-k=50 longmemeval_bench.py",
    ]
    print("Spell-check test\n" + "=" * 50)
    for msg in test_cases:
        result = spellcheck_user_text(msg, known_names={"riley", "sam", "mempalace"})
        changed = " ← CHANGED" if result != msg else ""
        print(f"\nIN:  {msg}")
        if result != msg:
            print(f"OUT: {result}{changed}")
        else:
            print("OUT: (unchanged)")
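
The edit-distance guard can be looked at in isolation. The sketch below re-implements the same two-row Levenshtein computation and the same threshold (2 edits for words up to 7 characters, otherwise 3); `edit_distance` and `accept` are hypothetical standalone names, and the word pairs are illustrative inputs rather than real autocorrect output.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def accept(original: str, suggestion: str) -> bool:
    """Same threshold as the guard: 2 edits up to 7 chars, otherwise 3."""
    max_edits = 2 if len(original) <= 7 else 3
    return edit_distance(original, suggestion) <= max_edits


print(edit_distance("knoe", "know"))
print(accept("knoe", "know"))
print(accept("coherently", "inherently"))
print(accept("kno", "unknown"))
```

Note that "coherently" and "inherently" are only two substitutions apart, so the distance guard alone would let that rewrite through; it is the system-dictionary check that keeps an already valid word untouched.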
@@ -0,0 +1,272 @@
 #!/usr/bin/env python3
 """
 split_mega_files.py — Split concatenated transcript files into per-session files
 =================================================================================
 Scans a directory for .txt files that contain multiple Claude Code sessions
 (identified by "Claude Code v" headers). Splits each into individual files
 named with: date, time, people detected, and subject from first prompt.
 Distinguishes true session starts from mid-session context restores
 (which show "Ctrl+E to show X previous messages").
 Output files are written to --output-dir (default: same dir as source).
 Original files are renamed with .mega_backup extension (not deleted).
 Usage:
    python3 split_mega_files.py                          # scan ~/Desktop/transcripts
    python3 split_mega_files.py --source ~/Desktop/transcripts  # explicit source
    python3 split_mega_files.py --dry-run                # show what would happen
    python3 split_mega_files.py --min-sessions 2         # only files with 2+ sessions
 By: Ben, 2026-03-30
 """
 import argparse
 import json
 import os
 import re
 import sys
 from pathlib import Path
 HOME      = Path.home()
 LUMI_DIR  = Path(os.environ.get("MEMPALACE_SOURCE_DIR", str(HOME / "Desktop/transcripts")))
 # People we know about (for name detection in content)
 # Loaded from ~/.mempalace/known_names.json if it exists, otherwise generic fallback.
 _KNOWN_NAMES_PATH = HOME / ".mempalace" / "known_names.json"
 def _load_known_people() -> list:
    """Load known names from config file, falling back to a generic list."""
    if _KNOWN_NAMES_PATH.exists():
        try:
            data = json.loads(_KNOWN_NAMES_PATH.read_text())
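            if isinstance(data, list):
                return data
            # Only dicts carry a "names" key; any other JSON value falls through to the fallback.
            if isinstance(data, dict):
                return data.get("names", [])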
        except (json.JSONDecodeError, OSError):
            pass
    # Generic fallback — override by creating ~/.mempalace/known_names.json
    return ["Alice", "Ben", "Riley", "Max", "Sam", "Devon", "Jordan"]
 KNOWN_PEOPLE = _load_known_people()
 def _load_username_map() -> dict:
    """Load username-to-name mapping from config file."""
    if _KNOWN_NAMES_PATH.exists():
        try:
            data = json.loads(_KNOWN_NAMES_PATH.read_text())
            if isinstance(data, dict):
                return data.get("username_map", {})
        except (json.JSONDecodeError, OSError):
            pass
    return {}
 def is_true_session_start(lines, idx):
    """
    True session start: 'Claude Code v' header NOT followed by 'Ctrl+E'/'previous messages'
    within the 6-line window starting at the header (those are context restores, not new sessions).
    """
    nearby = "".join(lines[idx:idx + 6])
    return "Ctrl+E" not in nearby and "previous messages" not in nearby
 def find_session_boundaries(lines):
    """Return list of line indices where true new sessions begin."""
    boundaries = []
    for i, line in enumerate(lines):
        if "Claude Code v" in line and is_true_session_start(lines, i):
            boundaries.append(i)
    return boundaries
 def extract_timestamp(lines):
    """
    Find the first timestamp line: ⏺ H:MM AM/PM Weekday, Month DD, YYYY
    Returns (datetime_str, iso_str) or (None, None).
    """
    ts_pattern = re.compile(
        r"⏺\s+(\d{1,2}:\d{2}\s+[AP]M)\s+\w+,\s+(\w+)\s+(\d{1,2}),\s+(\d{4})"
    )
    months = {
        "January": "01", "February": "02", "March": "03", "April": "04",
        "May": "05", "June": "06", "July": "07", "August": "08",
        "September": "09", "October": "10", "November": "11", "December": "12",
    }
    for line in lines[:50]:
        m = ts_pattern.search(line)
        if m:
            time_str, month, day, year = m.groups()
            mon = months.get(month, "00")
            day_z = day.zfill(2)
            time_safe = time_str.replace(":", "").replace(" ", "")
            iso = f"{year}-{mon}-{day_z}"
            human = f"{year}-{mon}-{day_z}_{time_safe}"
            return human, iso
    return None, None
 def extract_people(lines):
    """
    Detect people mentioned as speakers or by name in first 100 lines.
    Returns sorted list of detected names.
    """
    found = set()
    text = "".join(lines[:100])
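    # Any whole-word, case-insensitive mention counts (speaker tags like "Alice:" included).
    # re.escape keeps regex metacharacters in configured names from being interpreted.
    for person in KNOWN_PEOPLE:
        if re.search(rf"\b{re.escape(person)}\b", text, re.IGNORECASE):
            found.add(person)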
    # Working directory username hint — map to known people if configured
    dir_match = re.search(r"/Users/(\w+)/", text)
    if dir_match:
        username = dir_match.group(1)
        # User can map usernames to names in ~/.mempalace/known_names.json
        # under a "username_map" key, e.g. {"username_map": {"jdoe": "John"}}
        username_map = _load_username_map()
        if username in username_map:
            found.add(username_map[username])
    return sorted(found)
 def extract_subject(lines):
    """
    Find the first meaningful user prompt (> line that isn't a shell command).
    Returns cleaned, filename-safe subject string.
    """
    # The leading "./" alternative already covers "./activate", so no separate entry is needed.
    skip_patterns = re.compile(
        r"^(\.\/|cd |ls |python|bash|git |cat |source |export |claude)"
    )
    for line in lines:
        if line.startswith("> "):
            prompt = line[2:].strip()
            if prompt and not skip_patterns.match(prompt) and len(prompt) > 5:
                # Clean for filename
                subject = re.sub(r"[^\w\s-]", "", prompt)
                subject = re.sub(r"\s+", "-", subject.strip())
                return subject[:60]
    return "session"
 def split_file(filepath, output_dir, dry_run=False):
    """
    Split a single mega-file into per-session files.
    Returns list of output paths written (or would be written if dry_run).
    """
    path = Path(filepath)
    lines = path.read_text(errors="replace").splitlines(keepends=True)
    boundaries = find_session_boundaries(lines)
    if len(boundaries) < 2:
        return []  # Not a mega-file
    # Add sentinel at end
    boundaries.append(len(lines))
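    out_dir = Path(output_dir) if output_dir else path.parent
    if not dry_run:
        # --output-dir may point at a directory that does not exist yet
        out_dir.mkdir(parents=True, exist_ok=True)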
    written = []
    for i, (start, end) in enumerate(zip(boundaries, boundaries[1:])):
        chunk = lines[start:end]
        if len(chunk) < 10:
            continue  # Skip tiny fragments
        ts_human, ts_iso = extract_timestamp(chunk)
        people   = extract_people(chunk)
        subject  = extract_subject(chunk)
        # Build filename: SOURCESTEM__DATE_TIME_People_subject.txt
        # Source stem prefix prevents collisions when multiple mega-files
        # produce sessions with the same timestamp/people/subject.
        ts_part     = ts_human or f"part{i+1:02d}"
        people_part = "-".join(people[:3]) if people else "unknown"
        src_stem    = re.sub(r"[^\w-]", "_", path.stem)[:40]
        name        = f"{src_stem}__{ts_part}_{people_part}_{subject}.txt"
        # Sanitize
        name = re.sub(r"[^\w\.\-]", "_", name)
        name = re.sub(r"_+", "_", name)
        out_path = out_dir / name
        if dry_run:
            print(f"  [{i+1}/{len(boundaries)-1}] {name}  ({len(chunk)} lines)")
        else:
            out_path.write_text("".join(chunk))
            print(f"  ✓ {name}  ({len(chunk)} lines)")
        written.append(out_path)
    return written
 def main():
    parser = argparse.ArgumentParser(
        description="Split concatenated transcript mega-files into per-session files"
    )
    parser.add_argument("--source",       type=str, default=None,
                        help="Source directory (default: MEMPALACE_SOURCE_DIR or ~/Desktop/transcripts)")
    parser.add_argument("--output-dir",   type=str, default=None,
                        help="Output directory (default: same as source)")
    parser.add_argument("--min-sessions", type=int, default=2,
                        help="Only split files with at least N sessions (default: 2)")
    parser.add_argument("--dry-run",      action="store_true",
                        help="Show what would happen without writing files")
    parser.add_argument("--file",         type=str, default=None,
                        help="Split a single specific file instead of scanning dir")
    args = parser.parse_args()
    src_dir    = Path(args.source) if args.source else LUMI_DIR
    output_dir = args.output_dir or None  # None = same dir as file
    if args.file:
        files = [Path(args.file)]
    else:
        files = sorted(src_dir.glob("*.txt"))
    mega_files = []
    for f in files:
        lines = f.read_text(errors="replace").splitlines(keepends=True)
        boundaries = find_session_boundaries(lines)
        if len(boundaries) >= args.min_sessions:
            mega_files.append((f, len(boundaries)))
    if not mega_files:
        print(f"No mega-files found in {src_dir} (min {args.min_sessions} sessions).")
        return
    print(f"\n{'='*60}")
    print(f"  Mega-file splitter — {'DRY RUN' if args.dry_run else 'SPLITTING'}")
    print(f"{'='*60}")
    print(f"  Source:      {src_dir}")
    print(f"  Output:      {output_dir or 'same dir as source'}")
    print(f"  Mega-files:  {len(mega_files)}")
    print(f"{'─'*60}\n")
    total_written = 0
    for f, n_sessions in mega_files:
        print(f"  {f.name}  ({n_sessions} sessions, {f.stat().st_size // 1024}KB)")
        written = split_file(f, output_dir, dry_run=args.dry_run)
        total_written += len(written)
        if not args.dry_run and written:
            backup = f.with_suffix(".mega_backup")
            f.rename(backup)
            print(f"  → Original renamed to {backup.name}\n")
        else:
            print()
    print(f"{'─'*60}")
    if args.dry_run:
        print(f"  DRY RUN — would create {total_written} files from {len(mega_files)} mega-files")
    else:
        print(f"  Done — created {total_written} files from {len(mega_files)} mega-files")
    print()
 if __name__ == "__main__":
    main()
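The boundary rule used by split_mega_files.py can be tried in isolation. The sketch below is a minimal, self-contained restatement of `is_true_session_start` and `find_session_boundaries`, run against a made-up transcript. The sample lines and the `v1.0` version string are assumptions for illustration only, not real Claude Code output.

```python
def is_true_session_start(lines, idx):
    # A header is a context restore if "Ctrl+E" or "previous messages"
    # appears in the 6-line window that starts at the header itself.
    nearby = "".join(lines[idx:idx + 6])
    return "Ctrl+E" not in nearby and "previous messages" not in nearby


def find_session_boundaries(lines):
    # Indices of headers that open a genuinely new session.
    return [
        i for i, line in enumerate(lines)
        if "Claude Code v" in line and is_true_session_start(lines, i)
    ]


# Hypothetical transcript: two real sessions with one context restore between them.
sample = [
    "Claude Code v1.0\n",                      # 0: true session start
    "> plan the palace layout\n",
    "some reply\n",
    "more\n",
    "more\n",
    "more\n",
    "more\n",
    "Claude Code v1.0\n",                      # 7: context restore (see next line)
    "Ctrl+E to show 12 previous messages\n",
    "continued\n",
    "Claude Code v1.0\n",                      # 10: true session start
    "> second session prompt\n",
]

boundaries = find_session_boundaries(sample)
# Pair each boundary with the next one (or end of file) to get chunk ranges,
# the same way split_file() does with its end-of-file sentinel.
spans = list(zip(boundaries, boundaries[1:] + [len(sample)]))
print(boundaries)
print(spans)
```

The header at index 7 is skipped because the restore hint sits on the following line, so the first chunk keeps the restored context instead of being cut in two.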
@@ -0,0 +1,62 @@
 [build-system]
 requires = ["setuptools>=77"]  # the SPDX string form of `license` below (PEP 639) needs setuptools 77+
 build-backend = "setuptools.build_meta"
 [project]
 name = "mempalace"
 version = "3.0.0"
 description = "Give your AI a memory — mine projects and conversations into a searchable palace. No API key required."
 readme = "README.md"
 requires-python = ">=3.9"
 license = "MIT"
 authors = [
    {name = "milla-jovovich"},
 ]
 keywords = [
    "ai", "memory", "llm", "rag", "chromadb", "mcp",
    "vector-database", "claude", "chatgpt", "embeddings",
 ]
 classifiers = [
    "Development Status :: 4 - Beta",
    "Environment :: Console",
    "Intended Audience :: Developers",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
    "Topic :: Utilities",
 ]
 dependencies = [
    "chromadb>=0.4.0",
    "pyyaml>=6.0",
 ]
 [project.urls]
 Homepage = "https://github.com/milla-jovovich/mempalace"
 Repository = "https://github.com/milla-jovovich/mempalace"
 "Bug Tracker" = "https://github.com/milla-jovovich/mempalace/issues"
 [tool.setuptools.packages.find]
 include = ["mempalace*"]
 [project.scripts]
 mempalace = "mempalace:main"
 [project.optional-dependencies]
 dev = ["pytest>=7.0", "build>=1.0", "twine>=4.0", "ruff"]
 [tool.ruff]
 line-length = 100
 target-version = "py39"
 [tool.ruff.lint]
 select = ["E", "F", "W"]
 ignore = ["E501"]
 [tool.ruff.format]
 quote-style = "double"
 [tool.pytest.ini_options]
 testpaths = ["tests"]
@@ -0,0 +1,2 @@
 chromadb>=0.4.0
 pyyaml>=6.0