feat(init): context-aware corpus detection

10 files changed. 2,563 insertions, 30 deletions. 48 new tests, including end-to-end coverage live-tested with Anthropic Haiku 4.5.

This PR overhauls the first-run experience of `mempalace init`. It adds a new corpus-origin detection module, wires it into entity classification and LLM refinement, adds a graceful-fallback path so that `init` never crashes on a missing LLM, and adds a meta-test that prevents internal-coordination jargon from leaking into source or tests.

The headline change is that `mempalace init` now understands what kind of folder you're pointing it at (AI conversations, regular writing, code, or narrative) and adapts how it classifies entities accordingly. Previously, a folder containing `Echo`, `Sparrow`, and `Cipher` (names you've assigned to AI agents) had those names dumped into your "people" list alongside real humans. Now they go into a separate `agent_personas` bucket, and your `people` list stays clean.

But the broader change is that `mempalace init` got upgraded across the board — smarter defaults, smarter degradation, smarter classification, smarter persistence, and a new way to refresh as your folder grows. Built and live-verified with Anthropic Haiku 4.5; runs unmodified on the local LLM runtimes mempalace already supports.

## What changes for users (in order, from `pip install` onwards)

**Install** — `pip install mempalace` is unchanged. The package itself didn't shift.

**First run — `mempalace init <folder>`:**

1. **`init` examines your folder before classifying anything.** A free regex heuristic decides in milliseconds: AI conversations, regular writing, narrative, or code? If an LLM is reachable, a second pass extracts the corpus author's name and any agent persona names from the dialogue. v3.3.3 had no such step — it dove straight into entity detection with no corpus context.

2. **LLM-assisted classification is now ON by default.** v3.3.3 made `--llm` opt-in. The LLM-assisted path is qualitatively better (extracts persona names, refines ambiguous classifications, gives the model corpus context) so it now runs by default. The provider abstraction is unchanged from v3.3.3 — three buckets are supported by `mempalace.llm_client`:
   - **Anthropic** (`--llm-provider anthropic` + `ANTHROPIC_API_KEY`) — the official Messages API. **This is the path live-verified end-to-end in this PR with Haiku 4.5.** Cost: ~\$0.01 per `init`.
   - **Ollama** (`--llm-provider ollama` — the default) — local models via `http://localhost:11434`. Fully offline. Honors the "zero-API required" promise.
   - **OpenAI-compatible** (`--llm-provider openai-compat` + `--llm-endpoint`) — per the v3.3.3 `mempalace/llm_client.py` docstring, this covers "OpenRouter, LM Studio, llama.cpp server, vLLM, Groq, Fireworks, Together, and most self-hosted setups." We did not test each of those individually as part of this PR; the abstraction has been stable since v3.3.3. If you try this PR with a specific provider and hit a quirk, please file an issue or comment here.

3. **`init` never blocks on a missing LLM.** No Ollama running, no API key set? `init` prints a one-line message pointing at `--no-llm` and falls through to the heuristic-only path. New default behavior, new graceful fallback to support it. `--no-llm` is the new explicit opt-out.

4. **`init` shows you what it detected.** A one-line banner — `Detected: Claude (Anthropic) (user: Jordan, agents: Echo, Sparrow, Cipher)` or `Corpus origin: not AI-dialogue (confidence: 0.98)` — tells you at a glance whether mempalace understood your folder.

5. **Entity classification gets smarter across the board.** Even non-persona candidates benefit: the LLM has corpus context (this is AI-dialogue, this is the user's name, these are agent names) and uses it to disambiguate ambiguous candidates that aren't personas at all.

6. **Agent personas live in their own bucket.** Names you've assigned to AI agents (Echo, Sparrow, Cipher) go into a new `agent_personas` bucket instead of your `people` list. Your real-person entity list stays clean.

7. **Detection result persists to `<palace>/.mempalace/origin.json`** with a `schema_version: 1` envelope, so downstream tools can read it.

8. **Re-running `init` is now idempotent.** Bug fix — running `init` twice on the same folder used to give different classification results because the detection step was sampling its own `entities.json` output. Caught by integration testing during this PR.
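
As an illustration of item 7, here is a minimal sketch of a versioned envelope on disk. Only the path `<palace>/.mempalace/origin.json` and the `schema_version: 1` key come from this PR. The helper names `write_origin` / `read_origin` and the `result` key are assumptions made for the example and may not match the shipped layout.

```python
import json
import tempfile
from pathlib import Path

SCHEMA_VERSION = 1  # value stated in the PR description


def write_origin(palace_dir, result):
    """Write a detection result dict inside a versioned envelope (hypothetical helper)."""
    target = Path(palace_dir) / ".mempalace" / "origin.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # The "result" key is an assumption; only "schema_version" is confirmed.
    envelope = {"schema_version": SCHEMA_VERSION, "result": result}
    target.write_text(json.dumps(envelope, indent=2), encoding="utf-8")
    return target


def read_origin(palace_dir):
    """Return the stored result, or None if the file is missing or has an unknown schema."""
    target = Path(palace_dir) / ".mempalace" / "origin.json"
    if not target.exists():
        return None
    envelope = json.loads(target.read_text(encoding="utf-8"))
    if envelope.get("schema_version") != SCHEMA_VERSION:
        return None
    return envelope["result"]


# Usage: round-trip a result through a throwaway palace directory.
with tempfile.TemporaryDirectory() as palace:
    missing = read_origin(palace)  # nothing stored yet
    write_origin(palace, {"likely_ai_dialogue": True, "agent_persona_names": ["Echo"]})
    stored = read_origin(palace)
```

Checking the version on read lets a downstream tool refuse an envelope it does not understand instead of misreading it.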

**Later — when your folder grows:**

9. **`mempalace mine --redetect-origin`** is a new flag for refreshing the stored detection without redoing the whole `init`. Heuristic-only by design (the flag is meant to be cheap). If you want the full LLM-extracted detection refreshed (persona names, user name, etc.), run `mempalace init <yourfolder>` again — `init` is now idempotent (item 8), so re-running it on the same folder is safe.

## Behind the changes

- **New module** `mempalace/corpus_origin.py` (422 lines) with two-tier detection: regex heuristic with co-occurrence rule (suppresses ambiguous terms like `Claude` / `Gemini` / `Haiku` when no unambiguous AI signal is present, so French novels, astrology forums, poetry corpora, llama-rancher journals don't false-positive), and LLM tier that extracts `user_name` and `agent_persona_names` from dialogue structure with belt-and-suspenders user-vs-agent disambiguation.

- **Entity-classification consumer wiring.** `entity_detector.detect_entities` and `project_scanner.discover_entities` accept an optional `corpus_origin` kwarg. When present and the corpus is identified as AI-dialogue, candidates whose name case-insensitively matches an `agent_persona_name` are routed into the `agent_personas` bucket instead of `people`. Per-entity `type` is rewritten to `"agent_persona"`.

- **LLM-refine consumer wiring.** `llm_refine.refine_entities` accepts the same `corpus_origin` kwarg and prepends a `CORPUS CONTEXT` preamble to its system prompt giving the LLM the platform / user / persona context. Existing `TOPIC` / `PERSON` / `PROJECT` / `COMMON_WORD` / `AMBIGUOUS` labels are unchanged.

- **`init` overhaul.** Pass 0 (corpus-origin detection) inserted before existing Pass 1 (entity discovery). `--llm` flipped to default-on. `--no-llm` added. Graceful-fallback path replaces the previous hard-error on missing LLM. Provider precedence unchanged from the existing `llm_client` module.

- **`mine` flag.** `mempalace mine --redetect-origin` re-runs corpus-origin detection on the current corpus state and overwrites `<palace>/.mempalace/origin.json`.

- **`CLAUDE.md` design principle reworded** — "Local-first, zero external API by default." Local LLMs running on `localhost` (Ollama, LM Studio, llama.cpp, vLLM, unsloth studio) are part of the user's machine, not external APIs. External BYOK providers (Anthropic, OpenAI, Google) are supported but always opt-in, never default, never silent fallback.
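
The co-occurrence rule from the first bullet can be sketched roughly as follows. The term lists, the turn-marker regex, and the function names are illustrative assumptions; the real lists and scoring in `mempalace/corpus_origin.py` are not shown in this PR text. The sketch also omits the overlap handling that keeps `Claude Code` from being counted again as `Claude`.

```python
import re

# Illustrative term lists only (assumed, not copied from corpus_origin.py).
UNAMBIGUOUS = ["ChatGPT", "Claude Code", "MCP", "OpenAI", "Anthropic"]
AMBIGUOUS = ["Claude", "Gemini", "Haiku", "Sonnet", "Opus", "Llama", "Bard"]
TURN_MARKER = re.compile(r"^(user|assistant|human|ai):", re.IGNORECASE | re.MULTILINE)


def count_hits(terms, text):
    """Count whole-word, case-insensitive matches for each term."""
    hits = {}
    for term in terms:
        n = len(re.findall(rf"\b{re.escape(term)}\b", text, re.IGNORECASE))
        if n:
            hits[term] = n
    return hits


def brand_evidence(samples):
    """Return (unambiguous_hits, ambiguous_hits) after the co-occurrence rule."""
    text = "\n".join(samples)
    strong = count_hits(UNAMBIGUOUS, text)
    has_strong_signal = bool(strong) or bool(TURN_MARKER.search(text))
    # Ambiguous terms only count when an unambiguous AI signal is also present.
    weak = count_hits(AMBIGUOUS, text) if has_strong_signal else {}
    return strong, weak


zodiac = brand_evidence(["I'm a Gemini sun.", "Geminis are dreamers."])
mixed = brand_evidence(["Switched the agent from Gemini to ChatGPT mid-session."])
embedded = brand_evidence(["user: hi", "Claudette saw two llamas."])
```

In this sketch the zodiac samples yield no evidence at all, the mixed sample counts both `ChatGPT` and `Gemini`, and the word boundaries keep `Claudette` and `llamas` out of the evidence even when a turn marker is present.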

## Cost story

- **Anthropic (verified path):** ~\$0.01 per `init` via Haiku 4.5 with `ANTHROPIC_API_KEY`.
- **Ollama / local LLM runtime:** zero cost. Fully offline.
- **OpenAI-compatible service:** depends entirely on the service. The abstraction supports any service speaking the standard `/v1/chat/completions` API; specific quirks vary per provider. Try it and tell us how it goes.
- **No LLM at all:** graceful fallback to heuristic-only. Zero cost. `init` never blocks.
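
The "never blocks" behavior can be pictured with the sketch below. It assumes a provider object whose `check_available()` returns an `(ok, message)` pair, as the test fake in this PR does. The function `detect_with_fallback`, its parameters, the stub provider, and the message wording are hypothetical, not the code in `mempalace.cli`.

```python
def detect_with_fallback(samples, provider, heuristic, llm_detect, log=print):
    """Always run the heuristic tier; use the LLM tier only if the provider is reachable."""
    result = heuristic(samples)
    if provider is None:  # corresponds to an explicit opt-out such as --no-llm
        return result
    try:
        ok, message = provider.check_available()
    except Exception as exc:  # any probe failure counts as "unavailable"
        ok, message = False, str(exc)
    if not ok:
        log(f"LLM unavailable ({message}); continuing heuristic-only. Use --no-llm to skip this check.")
        return result
    return llm_detect(samples, provider)


class _StubProvider:
    """Stand-in provider for the usage example (hypothetical)."""

    def __init__(self, ok):
        self._ok = ok

    def check_available(self):
        return self._ok, "ok" if self._ok else "connection refused"


# Usage: the tiers are passed in as plain callables so the sketch stays self-contained.
messages = []
offline = detect_with_fallback(
    ["x"], _StubProvider(False), lambda s: "heuristic", lambda s, p: "llm", log=messages.append
)
online = detect_with_fallback(
    ["x"], _StubProvider(True), lambda s: "heuristic", lambda s, p: "llm", log=messages.append
)
```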

## Backwards compatibility

- All public function signatures gained the `corpus_origin` kwarg as optional (default `None`). Callers that don't pass it see the v3.3.3 return shape unchanged — no `agent_personas` key, no behavioral change.
- The `--llm` CLI flag is preserved as a deprecated alias of the default. Existing scripts that pass it continue to work.
- `corpus_origin=None` keeps `llm_refine.SYSTEM_PROMPT` byte-identical to v3.3.3.
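
A rough sketch of how an optional `corpus_origin` argument can leave the old return shape untouched when omitted. The function name `route_personas` and the dict shapes are assumptions for illustration; the real wiring lives in `entity_detector.detect_entities` and `project_scanner.discover_entities`.

```python
def route_personas(detected, corpus_origin=None):
    """Move people whose names match an agent persona into 'agent_personas'.

    Assumed shapes: `detected` is {"people": [{"name": ..., "type": ...}], ...} and
    `corpus_origin` is a dict with 'likely_ai_dialogue' and 'agent_persona_names'.
    """
    # No origin, or not an AI-dialogue corpus: return the input unchanged.
    if not corpus_origin or not corpus_origin.get("likely_ai_dialogue"):
        return detected
    personas = {name.casefold() for name in corpus_origin.get("agent_persona_names", [])}
    people, agents = [], []
    for entity in detected.get("people", []):
        if entity["name"].casefold() in personas:
            agents.append({**entity, "type": "agent_persona"})  # rewrite per-entity type
        else:
            people.append(entity)
    return {**detected, "people": people, "agent_personas": agents}


detected = {
    "people": [{"name": "Jordan", "type": "person"}, {"name": "echo", "type": "person"}],
    "projects": [],
}
unchanged = route_personas(detected)
routed = route_personas(
    detected, {"likely_ai_dialogue": True, "agent_persona_names": ["Echo"]}
)
```

Without the kwarg the result has no `agent_personas` key at all, which is what lets existing callers keep working.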

## Test coverage

- **19 unit tests** in `tests/test_corpus_origin.py` covering both tiers, the co-occurrence rule, ambiguous-term suppression, word-boundary brand matching, and user/persona disambiguation.
- **29 integration tests** in `tests/test_corpus_origin_integration.py` covering end-to-end through `mempalace init`, persona reclassification, the `--redetect-origin` flag, the `--llm` default flip, graceful fallback paths, and re-init idempotency. Of those 29, five specifically cover the intersection with develop's other in-flight work (Pass 0 ↔ auto-mine ordering, topics + agent_personas bucket coexistence, entities.json shape, the `wing=` kwarg threading, llm_refine TOPIC label + corpus_origin preamble composition).
- **1354 total mempalace tests pass.** 2 pre-existing environmental failures (`test_mcp_stdio_protection` — chromadb optional dep) unrelated to this change; they fail on plain `develop` too.
- **Live-smoke-tested** with real Anthropic Haiku 4.5 on AI-dialogue and narrative fixtures.
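
The user/persona disambiguation and malformed-response behavior that the unit tests pin down could be implemented along these lines. This is a sketch under assumptions: `parse_origin_response` is a hypothetical name, it returns a plain dict rather than `CorpusOriginResult`, and the fallback confidence of 0.3 is an arbitrary low value chosen for the example.

```python
import json


def parse_origin_response(text):
    """Parse the LLM's JSON reply and strip the user's own name from the personas."""
    try:
        data = json.loads(text)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        # Conservative default: assume AI-dialogue, but with low confidence.
        return {
            "likely_ai_dialogue": True,
            "confidence": 0.3,  # assumed value
            "primary_platform": None,
            "user_name": None,
            "agent_persona_names": [],
            "evidence": [f"fallback: could not parse LLM response ({exc})"],
        }

    user_name = data.get("user_name")
    personas = [p for p in data.get("agent_persona_names") or [] if isinstance(p, str)]
    if user_name:
        # Case-insensitive filter: the human author is never an agent persona.
        personas = [p for p in personas if p.casefold() != user_name.casefold()]
    return {
        "likely_ai_dialogue": bool(data.get("is_ai_dialogue_corpus", True)),
        "confidence": float(data.get("confidence") or 0.0),
        "primary_platform": data.get("primary_platform"),
        "user_name": user_name,
        "agent_persona_names": personas,
        "evidence": list(data.get("evidence") or []),
    }


good = parse_origin_response(
    '{"is_ai_dialogue_corpus": true, "confidence": 0.9, "primary_platform": "Claude",'
    ' "user_name": "Jordan", "agent_persona_names": ["Echo", "jordan", "JORDAN", "Cipher"],'
    ' "evidence": []}'
)
bad = parse_origin_response("not even close to JSON")
```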

## Hygiene guardrail

This PR also adds a meta-test (`test_no_internal_coordination_jargon_in_source_or_tests`) that walks the source tree and asserts that no internal-coordination jargon (e.g. development-phase markers, internal review-section references) leaks into runtime code, comments, docstrings, or LLM prompts. The test fails if anything slips in. An allowlist covers legitimate RFC/spec section citations in `sources/`, `backends/`, `knowledge_graph.py`, and `i18n/`.
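
A guardrail of this kind might look like the sketch below. The regex patterns are hypothetical stand-ins, since the real test's pattern list is not shown in this PR; only the allowlisted locations come from the description above. A real test would also need to exclude its own file, which necessarily contains the patterns.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical patterns (assumed for illustration).
JARGON = re.compile(r"\b(?:phase\s+\d+|review\s+section\s+\d+)\b", re.IGNORECASE)
# Allowlisted locations, taken from the PR description.
ALLOWLIST = ("sources/", "backends/", "knowledge_graph.py", "i18n/")


def find_jargon(root):
    """Return (relative_path, line_number, line) for each hit outside the allowlist."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        if any(marker in rel for marker in ALLOWLIST):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if JARGON.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits


# Usage: build a throwaway tree and scan it.
with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp)
    (tree / "sources").mkdir()
    (tree / "cli.py").write_text("x = 1\n# Phase 2 cleanup\n", encoding="utf-8")
    (tree / "sources" / "rfc.py").write_text("# Phase 3 of the spec\n", encoding="utf-8")
    demo_hits = find_jargon(tree)
```

A pytest wrapper would then assert that `find_jargon(repo_root)` is empty and print the hits on failure.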
MSL
2026-04-25 22:49:09 -07:00
parent 5de5b0923d
commit b99e54546b
10 changed files with 2582 additions and 30 deletions
+5
@@ -127,6 +127,11 @@ def test_cmd_init_with_entities(mock_config_cls, tmp_path):
        patch("mempalace.entity_detector.detect_entities", return_value=detected),
        patch("mempalace.entity_detector.confirm_entities", return_value=confirmed),
        patch("mempalace.room_detector_local.detect_rooms_local"),
        # Pass 0 (corpus_origin) needs real file IO; this test mocks
        # builtins.open globally for the entities.json write, which would
        # break Pass 0's file-reading path. Patch Pass 0 out — a separate
        # suite (tests/test_corpus_origin_integration.py) covers it directly.
        patch("mempalace.cli._run_pass_zero", return_value=None),
        patch("builtins.open", MagicMock()),
        patch("mempalace.cli._maybe_run_mine_after_init"),
    ):
+395
@@ -0,0 +1,395 @@
"""Tests for corpus_origin detection.

The corpus-origin detector answers ONE foundational question before any
downstream Pass 2 classification runs:

    "Is this corpus a record of AI-agent dialogue, and if so, which platform
    and what persona names has the user assigned to the agent?"

Detection is two-tier:
  - Tier 1: cheap content-aware heuristic (grep for well-known AI terms
    and turn markers). No API calls. Always runs.
  - Tier 2: LLM-assisted confirmation + persona extraction. Takes a small
    sample of drawer texts and uses Haiku's pre-trained world knowledge
    about Claude/ChatGPT/Gemini/etc. to confirm platform + identify
    persona-names the user assigned to the agent.

Default stance: "this IS an AI-dialogue corpus" unless strong evidence
otherwise. False-negative (missing an AI corpus) is catastrophic for
downstream classification; false-positive is recoverable via per-drawer
voice-profile detection in later passes.

TDD: these tests fail until mempalace/corpus_origin.py is implemented."""

from mempalace.corpus_origin import (
    CorpusOriginResult,
    detect_origin_heuristic,
    detect_origin_llm,
)
# ── Tier 1: heuristic (no LLM) ────────────────────────────────────────────


class TestHeuristic:
    def test_claude_heavy_corpus_detected(self):
        """A corpus with abundant Claude references + turn markers should
        be confidently detected as AI-dialogue."""
        samples = [
            "user: hey Claude, can you help me\nassistant: sure, what do you need\n",
            "I was talking to Claude Opus about the MCP server setup",
            "Sonnet 4.5 handled this better than Haiku 4.5 did",
            "claude mcp add mempalace -- mempalace-mcp",
            "human: what's up\nassistant: I'm happy to help",
        ]
        result = detect_origin_heuristic(samples)
        assert result.likely_ai_dialogue is True
        assert result.confidence >= 0.8
        assert (
            "Claude" in " ".join(result.evidence) or "claude" in " ".join(result.evidence).lower()
        )

    def test_gpt_corpus_detected(self):
        samples = [
            "I asked ChatGPT to summarize my paper",
            "The GPT-4 response was surprisingly good",
            "user: explain quantum computing\nassistant: quantum computing uses qubits",
            "OpenAI's model was able to help with the code",
        ]
        result = detect_origin_heuristic(samples)
        assert result.likely_ai_dialogue is True
        assert any("GPT" in e or "ChatGPT" in e or "OpenAI" in e for e in result.evidence)

    def test_pure_narrative_corpus_detected_as_not_ai(self):
        """A story/journal corpus with no AI signals should be flagged
        not-AI (default stance flipped only with evidence)."""
        samples = [
            "Today the cat finally ventured into the garden. The dog watched.",
            "The morning light came through the window as I wrote.",
            "Chapter 3: The Reckoning. It was a dark and stormy night.",
            "My father's old journal described the same field in 1972.",
        ]
        result = detect_origin_heuristic(samples)
        assert result.likely_ai_dialogue is False
        assert result.confidence >= 0.8

    def test_ambiguous_corpus_defaults_to_ai(self):
        """When evidence is thin or mixed, default to assuming AI-dialogue.
        False-negative is worse than false-positive."""
        samples = [
            "some notes about the meeting today",
            "Later on I went to the store.",
            "Short file with little signal.",
        ]
        result = detect_origin_heuristic(samples)
        # Low signal → default stance is ai_dialogue=True with low confidence
        assert result.likely_ai_dialogue is True
        assert result.confidence <= 0.6
        assert "default-stance" in " ".join(result.evidence).lower()

    def test_turn_markers_alone_sufficient(self):
        """Even without AI brand mentions, strong turn-marker presence
        indicates dialogue structure consistent with AI corpora."""
        samples = [
            "user: hello\nassistant: hi there, how can I help?\nuser: summarize X\nassistant: sure",
            "human: what's the weather\nai: I don't have real-time data\n",
        ]
        result = detect_origin_heuristic(samples)
        assert result.likely_ai_dialogue is True

    # ── Pattern + context (not capitalization, not English-rule) ──────────

    def test_brand_terms_case_insensitive(self):
        """Detection cannot rely on the user typing proper-cased brand names.
        Lowercase 'claude code', 'chatgpt', 'gemini-pro', 'mcp' must trip
        the same as their proper-cased equivalents. NO turn-marker fallback
        in this corpus — the brand matches must do the work."""
        samples = [
            "i love claude code, it just works for refactoring tasks",
            "asked chatgpt to write a regex and it nailed it on the first try",
            "switched to gemini-pro for the long-context summary task last week",
            "added mempalace as an mcp server in my .claude/ settings file",
            "anthropic's haiku model is cheap enough to run on every drawer",
        ]
        result = detect_origin_heuristic(samples)
        assert (
            result.likely_ai_dialogue is True
        ), f"lowercase brand terms missed; evidence: {result.evidence}"
        # Evidence must show MULTIPLE distinct case-insensitive brand matches.
        # 'chatgpt' lowercase only matches under case-insensitive search
        # (the brand list has 'ChatGPT' proper-cased only).
        evidence_str = " ".join(result.evidence).lower()
        matched = sum(t in evidence_str for t in ("chatgpt", "anthropic", "haiku", "gemini-pro"))
        assert (
            matched >= 2
        ), f"case-insensitive brand matches did not fire — only got: {result.evidence}"

    def test_zodiac_corpus_not_flagged_as_ai(self):
        """An astrology forum post with high 'Gemini' density but ZERO
        unambiguous AI signals (no MCP/LLM/ChatGPT/turn markers) must NOT
        be flagged as AI-dialogue. Word-sense disambiguation is required:
        Gemini-the-zodiac-sign vs Gemini-the-AI-platform."""
        samples = [
            "I'm a Gemini sun, Pisces moon, and Leo rising.",
            "Geminis are dreamers and overthinkers — that's the dual nature.",
            "Compatibility between Gemini and Sagittarius is famously strong.",
            "If you're a Gemini, expect Mercury retrograde to hit you hardest.",
            "My horoscope this week says Gemini energy will dominate Wednesday.",
            "The Gemini twins in Greek mythology are Castor and Pollux.",
        ]
        result = detect_origin_heuristic(samples)
        assert (
            result.likely_ai_dialogue is False
        ), f"zodiac corpus wrongly flagged AI; evidence: {result.evidence}"

    def test_french_novel_with_claude_name_not_flagged(self):
        """A French novel where 'Claude' is a character name (Claude is a
        common French masculine name) must NOT trip AI-dialogue detection.
        Disambiguation is by context, not by the presence of the word."""
        samples = [
            "Claude marchait lentement le long de la Seine ce matin-là.",
            "« Claude, tu rentres dîner? » lui demanda sa mère depuis la cuisine.",
            "Pour Claude, l'art de vivre passait avant tout par la patience.",
            "Le vieux Claude se souvenait encore de la guerre, des champs déserts.",
            "Claude ouvrit la fenêtre. Le matin sentait le pain frais et la pluie.",
            "Les amis de Claude s'étaient réunis chez lui pour fêter ses soixante ans.",
        ]
        result = detect_origin_heuristic(samples)
        assert (
            result.likely_ai_dialogue is False
        ), f"French novel wrongly flagged AI; evidence: {result.evidence}"

    def test_poetry_corpus_with_haiku_sonnet_not_flagged(self):
        """A poetry corpus with high 'haiku', 'sonnet', 'opus' density
        (poetic forms / classical music terms) but no AI infrastructure
        terms must NOT be flagged as AI-dialogue."""
        samples = [
            "A haiku is seventeen syllables across three lines: 5-7-5.",
            "Shakespeare's sonnet 18 remains the most quoted in the English canon.",
            "Beethoven's opus 27 includes the Moonlight Sonata.",
            "I wrote three haiku this morning before coffee.",
            "The sonnet form arrived in England via Wyatt and Surrey.",
            "Her first opus, published at twenty, was a song cycle for soprano.",
        ]
        result = detect_origin_heuristic(samples)
        assert (
            result.likely_ai_dialogue is False
        ), f"poetry corpus wrongly flagged AI; evidence: {result.evidence}"

    def test_word_boundary_brand_matching(self):
        """Brand-term matching must use word boundaries. Embedded matches
        inside larger words ('Claude' in 'Claudette', 'Opus' in 'opuscule',
        'Sonnet' in 'sonneteer', 'Llama' in 'llamas', 'Bard' in 'bardic')
        must NOT be counted as brand hits.

        Word boundaries don't change classification on the co-occurrence-
        suppressed cases, but they clean up the evidence strings — false
        matches must not appear in the audit trail. They also prevent
        'Claude Code' from triple-counting as 'Claude Code' + 'Claude'
        overlap."""
        samples = [
            "My grandmother Claudette baked the most beautiful tarts every Sunday.",
            "Two llamas were spotted near the trailhead this morning at sunrise.",
            "Beethoven's opuscule for solo violin remained unpublished for decades.",
            "She studied to become a sonneteer after reading the full Spenser cycle.",
            "Bardic traditions in the Hebrides survived well into the eighteenth century.",
            "The complete opuses of Mozart fill an entire wall of the library.",
        ]
        result = detect_origin_heuristic(samples)
        evidence_str = " ".join(result.evidence).lower()
        # None of the brand terms should show up in evidence — every
        # would-be match is an embedded false-positive that word
        # boundaries should suppress.
        for embedded_term in ("claude", "opus", "sonnet", "llama", "bard"):
            assert f"'{embedded_term}'" not in evidence_str, (
                f"word-boundary bug: '{embedded_term}' falsely matched inside "
                f"a longer word — evidence: {result.evidence}"
            )
        # And classification should be not-AI (no real AI signals present).
        assert (
            result.likely_ai_dialogue is False
        ), f"corpus has no real AI signals; evidence: {result.evidence}"

    def test_ambiguous_brand_with_unambiguous_signal_flagged(self):
        """When an ambiguous brand term ('Gemini') co-occurs with an
        UNAMBIGUOUS AI signal (turn markers, MCP, ChatGPT, Claude Code)
        in the same corpus, the Gemini hits SHOULD count and the corpus
        SHOULD be flagged as AI-dialogue."""
        samples = [
            "Switched the agent from Gemini to ChatGPT mid-session for cost reasons.",
            "Gemini handled the long-context task; user: please summarize\nassistant: here is the summary",
            "user: try Gemini for this\nassistant: running it through gemini-pro now",
            "MCP server config: Gemini as primary, OpenAI as fallback.",
        ]
        result = detect_origin_heuristic(samples)
        assert (
            result.likely_ai_dialogue is True
        ), f"ambiguous+unambiguous co-occurrence missed; evidence: {result.evidence}"


# ── Tier 2: LLM-assisted (mocked) ─────────────────────────────────────────


class _FakeProvider:
    """Minimal stand-in for mempalace's LLMProvider used for testing."""

    def __init__(self, canned_response):
        self._response = canned_response
        self.calls = []

    def classify(self, system, user, json_mode=True):
        self.calls.append({"system": system, "user": user})

        class R:
            text = self._response

        return R()

    def check_available(self):
        return True, "ok"


class TestLLMConfirmation:
    def test_extracts_persona_names_and_platform(self):
        fake_response = """{
            "is_ai_dialogue_corpus": true,
            "confidence": 0.97,
            "primary_platform": "Claude Code (Anthropic CLI)",
            "agent_persona_names": ["Echo", "Sparrow", "Cipher", "Orc"],
            "evidence": [
                "user addresses agent as 'Echo' on assistant turns",
                "Claude Code banner text in samples",
                "references to MCP, CLAUDE.md, hooks"
            ]
        }"""
        provider = _FakeProvider(fake_response)
        samples = [
            "user: hey Echo, what's up\nassistant: I'm here, what do you need\n",
            "Claude Code session banner Sonnet 4.5 Claude Pro",
        ]
        result = detect_origin_llm(samples, provider)
        assert result.likely_ai_dialogue is True
        assert result.confidence >= 0.9
        assert "Echo" in result.agent_persona_names
        assert "Sparrow" in result.agent_persona_names
        assert "Claude" in result.primary_platform

    def test_narrative_corpus_llm_confirms_no_agent(self):
        fake_response = """{
            "is_ai_dialogue_corpus": false,
            "confidence": 0.95,
            "primary_platform": null,
            "agent_persona_names": [],
            "evidence": ["pure narrative prose, no turn markers, no AI terms"]
        }"""
        provider = _FakeProvider(fake_response)
        samples = ["Once upon a time in a small village", "The old woman smiled"]
        result = detect_origin_llm(samples, provider)
        assert result.likely_ai_dialogue is False
        assert result.agent_persona_names == []
        assert result.primary_platform is None

    def test_handles_malformed_llm_response(self):
        """If the LLM returns garbage, fall back gracefully to the
        conservative default (assume AI-dialogue with low confidence)."""
        provider = _FakeProvider("not even close to JSON")
        result = detect_origin_llm(["sample text"], provider)
        # Fallback: conservative default, low confidence
        assert result.likely_ai_dialogue is True
        assert result.confidence <= 0.5
        assert (
            "fallback" in " ".join(result.evidence).lower()
            or "error" in " ".join(result.evidence).lower()
        )

    def test_filters_user_name_out_of_personas(self):
        """Regression test: Haiku sometimes leaks the user's own name into
        agent_persona_names despite the prompt's CRITICAL distinction. The
        parser must strip the user's name from personas if it appears in
        both fields (case-insensitive). The user is the human author of
        the corpus, not an agent persona."""
        fake_response = """{
            "is_ai_dialogue_corpus": true,
            "confidence": 0.97,
            "primary_platform": "Claude (Anthropic)",
            "user_name": "Jordan",
            "agent_persona_names": ["Echo", "Sparrow", "Jordan", "Cipher"],
            "evidence": ["user Jordan talks to agents Echo/Sparrow/Cipher"]
        }"""
        provider = _FakeProvider(fake_response)
        result = detect_origin_llm(["sample"], provider)
        # user_name is exposed in its own field
        assert result.user_name == "Jordan"
        # "Jordan" is filtered out of agent_persona_names
        assert "Jordan" not in result.agent_persona_names
        # Real personas are preserved
        for persona in ("Echo", "Sparrow", "Cipher"):
            assert persona in result.agent_persona_names

    def test_filter_is_case_insensitive(self):
        """The user-name filter works even when the LLM returns a casing
        mismatch between user_name and the personas list."""
        fake_response = """{
            "is_ai_dialogue_corpus": true,
            "confidence": 0.9,
            "primary_platform": "Claude",
            "user_name": "Jordan",
            "agent_persona_names": ["Echo", "jordan", "JORDAN", "Cipher"],
            "evidence": []
        }"""
        provider = _FakeProvider(fake_response)
        result = detect_origin_llm(["sample"], provider)
        # All case-variants of the user's name are filtered
        assert "jordan" not in [p.lower() for p in result.agent_persona_names]
        assert result.agent_persona_names == ["Echo", "Cipher"]

    def test_user_name_field_surfaces_author(self):
        """The user_name field captures the human author of the corpus,
        separate from agent personas. This gives downstream passes a
        clear 'who is the user, who is the agent' distinction."""
        fake_response = """{
            "is_ai_dialogue_corpus": true,
            "confidence": 0.95,
            "primary_platform": "ChatGPT (OpenAI)",
            "user_name": "Sarah",
            "agent_persona_names": ["MyAssistant"],
            "evidence": ["Sarah writes to MyAssistant"]
        }"""
        provider = _FakeProvider(fake_response)
        result = detect_origin_llm(["sample"], provider)
        assert result.user_name == "Sarah"
        assert result.agent_persona_names == ["MyAssistant"]


# ── CorpusOriginResult dataclass ──────────────────────────────────────────


class TestResultDataclass:
    def test_result_has_all_fields(self):
        r = CorpusOriginResult(
            likely_ai_dialogue=True,
            confidence=0.95,
            primary_platform="Claude Code",
            agent_persona_names=["Echo"],
            evidence=["test"],
        )
        assert r.likely_ai_dialogue is True
        assert r.confidence == 0.95
        assert r.primary_platform == "Claude Code"
        assert r.agent_persona_names == ["Echo"]
        assert r.evidence == ["test"]

    def test_result_serializes_to_dict(self):
        r = CorpusOriginResult(
            likely_ai_dialogue=False,
            confidence=0.9,
            primary_platform=None,
            agent_persona_names=[],
            evidence=[],
        )
        d = r.to_dict()
        assert d["likely_ai_dialogue"] is False
        assert d["primary_platform"] is None
        assert d["agent_persona_names"] == []
File diff suppressed because it is too large.