mempalace/tests
Marcio E. Heiderscheidt e61dc2adf8 fix: add provenance header and speaker IDs to Slack transcript imports (#815)
* fix: add provenance header and speaker IDs to Slack transcript imports

Slack exports are multi-party chats where no speaker is inherently
the "user" or "assistant". The parser previously assigned these roles
purely by position, allowing a crafted export to place attacker text
in the "user" role, so that it appeared to be the memory owner's words
in all future retrieval (data poisoning via stored memory).

Changes:
- Add provenance header marking Slack transcripts as multi-party
  with positional (unverified) role assignment
- Prefix each message with the original speaker ID ([U1], [U2], etc.)
  so downstream consumers can distinguish authors
- Keep user/assistant role alternation for exchange-pair chunking
  compatibility with convo_miner.py
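
The changes above can be sketched roughly as follows. This is a minimal illustration, not the project's parser: the function name `render_slack_transcript`, the `role: [speaker] text` line layout, and the strict alternation over non-empty messages are all assumptions made for the example.

```python
from typing import Dict, List


def render_slack_transcript(messages: List[Dict[str, str]]) -> str:
    """Hypothetical sketch: positional roles plus a visible speaker ID."""
    lines: List[str] = []
    for msg in messages:
        text = msg.get("text", "").strip()
        if not text:
            continue  # skip empty messages so alternation stays pairwise
        # Assumed field names; fall back to a fixed label if both are missing.
        speaker = msg.get("user") or msg.get("username") or "unknown"
        # Roles are still assigned by position (unverified), kept only so
        # that exchange-pair chunking continues to see user/assistant pairs.
        role = "user" if len(lines) % 2 == 0 else "assistant"
        # The original speaker ID travels with the text in every line.
        lines.append(f"{role}: [{speaker}] {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    export = [
        {"user": "U2", "text": "attacker text"},
        {"user": "U1", "text": "owner reply"},
    ]
    print(render_slack_transcript(export))
    # user: [U2] attacker text
    # assistant: [U1] owner reply
```

Even when the first message lands in the "user" slot, the `[U2]` prefix keeps its real author visible to downstream consumers.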

Tests:
- Provenance header presence and content
- Speaker ID preservation in output
- Attacker-first-message attribution verification

Refs: MemPalace/mempalace#809

* fix: move Slack provenance to footer, sanitize speaker IDs, extract constant

- Move provenance notice from header to footer to keep it from becoming
  a standalone ChromaDB drawer via paragraph chunking on exports
  with fewer than 3 exchange pairs (which would violate the
  verbatim-always principle)
- Sanitize speaker user_id/username: strip brackets, newlines, and
  control characters to prevent chunk-boundary injection via crafted
  Slack exports
- Extract provenance string to _SLACK_PROVENANCE_FOOTER module constant,
  consistent with _TOOL_RESULT_* constants pattern; tests import it
  instead of duplicating the literal
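
A minimal sketch of the speaker-ID sanitization described above, assuming a regex-based approach. The helper name `sanitize_speaker`, the exact character class, and the `"unknown"` fallback are hypothetical and may differ from the real code.

```python
import re

# Assumed character class: square brackets plus ASCII control characters
# (which covers \n, \r and \t). The real implementation may strip more.
_UNSAFE_SPEAKER_CHARS = re.compile(r"[\[\]\x00-\x1f\x7f]")


def sanitize_speaker(raw: str) -> str:
    """Hypothetical helper: make a Slack user_id/username safe to embed
    inside a "[...]" prefix without breaking chunk boundaries."""
    cleaned = _UNSAFE_SPEAKER_CHARS.sub("", raw).strip()
    # Never emit an empty "[]" prefix; fall back to a fixed label.
    return cleaned or "unknown"


if __name__ == "__main__":
    # A crafted ID that tries to close the bracket and start a new paragraph.
    print(sanitize_speaker("U1]\n\n[U2"))  # U1U2
```

Removing newlines matters here because a blank line inside a speaker ID could otherwise start a new paragraph, and therefore a new chunk, in the stored transcript.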

Refs: MemPalace/mempalace#809
2026-04-15 00:27:01 -07:00