mempalace/mempalace/sources/registry.py
Igor Lins e Silva 552e9927b7 refactor(sources): RFC 002 §9 scaffolding — BaseSourceAdapter, registry, PalaceContext
Lands the read-side contract so third-party adapter authors (@Perseusxrltd,
@JakobSachs, @adv3nt3, @zendesk-thittesdorf, @mfhens, @roip, @MrDys) have a
stable target matching what RFC 001 §10 landed on the write side in #995.

Scope (this PR):

- mempalace/sources/base.py: BaseSourceAdapter ABC with kwargs-only
  ingest() / describe_schema() and default is_current() / source_summary()
  / close() (§1.1–1.2). Typed records: SourceRef, SourceItemMetadata,
  DrawerRecord, RouteHint, SourceSummary, AdapterSchema, FieldSpec (§1.3,
  §5.2). Error classes: SourceNotFoundError, AuthRequiredError,
  AdapterClosedError, TransformationViolationError, SchemaConformanceError
  (§2.7). Class-level identity contract: name / adapter_version /
  capabilities / supported_modes / declared_transformations /
  default_privacy_class (§2.1, §1.4, §1.5, §6).

- mempalace/sources/transforms.py: reference implementations of the 13
  reserved transformations (§1.4). Seven are pure functions
  (utf8_replace_invalid, newline_normalize, whitespace_trim,
  whitespace_collapse_internal, line_trim, line_join_spaces,
  blank_line_drop); the other six are identity shims for the
  adapter-specific transformations (strip_tool_chrome, tool_result_truncate,
  tool_result_omitted, spellcheck_user, synthesized_marker,
  speaker_role_assignment) that the conversations adapter will override
  when migrated. get_transformation(name) resolves by reserved name.

- mempalace/sources/registry.py: entry-point discovery via
  importlib.metadata.entry_points(group="mempalace.sources") + explicit
  register()/unregister() surface (§3.1–3.2). resolve_adapter_for_source()
  implements the §3.3 priority order; crucially, no auto-detection on the
  read side (§3.3 is explicit about that — user intent never inferred from
  on-disk artifacts).

- mempalace/sources/context.py: PalaceContext facade (§9) bundling the
  drawer/closet collections, knowledge graph, palace path, adapter identity,
  and progress hooks core passes into adapter.ingest(). upsert_drawer()
  applies the spec-mandated adapter_name/adapter_version stamps from §5.1.
  skip_current_item() signals laziness; emit() dispatches to hooks and
  swallows hook exceptions.

- mempalace/knowledge_graph.py: add_triple() gains optional source_drawer_id
  and adapter_name kwargs (§5.5). Backwards-compatible column migration
  auto-adds the new columns on open of a pre-RFC 002 palace (PRAGMA
  table_info then ALTER TABLE ADD COLUMN), matching the pattern used for
  any new palace-side provenance fields.

- pyproject.toml: mempalace.sources entry-point group declared. Empty on
  the first-party side for now — miners migrate in a follow-up; the group
  being present means third-party packages can begin registering today.
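
The knowledge-graph column migration described above (PRAGMA table_info, then ALTER TABLE ADD COLUMN) can be sketched roughly as follows. This is a minimal illustration, not the code in knowledge_graph.py: the `triples` table, its legacy columns, and the `ensure_columns` helper are hypothetical names chosen for the example; only the two new column names come from this commit message.

```python
import sqlite3


def ensure_columns(conn, table, columns):
    """Add any columns from ``columns`` that ``table`` lacks; return the names added.

    Hypothetical helper: the real migration in knowledge_graph.py may differ.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, decl in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
            added.append(name)
    conn.commit()
    return added


# Simulate a legacy palace: a table created before the provenance columns existed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.execute("INSERT INTO triples VALUES ('alice', 'knows', 'bob')")

new_cols = {"source_drawer_id": "TEXT", "adapter_name": "TEXT"}
first = ensure_columns(conn, "triples", new_cols)   # adds both columns
second = ensure_columns(conn, "triples", new_cols)  # no-op on re-open
```

Because the check runs on every open, the migration is idempotent: a second call adds nothing, and rows written before the migration simply read NULL for the new columns.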

Out of scope (explicit follow-ups):

- miner.py → mempalace/sources/filesystem.py. Behavior-preserving rename
  that also moves READABLE_EXTENSIONS, detect_room(), detect_hall() into
  the adapter (§9). Larger refactor; lands separately.
- convo_miner.py + normalize.py → mempalace/sources/conversations.py. The
  format-detection if-chain in normalize.py becomes per-format plugins;
  declared_transformations enumerates what the current pipeline already
  does to source bytes (§1.4 existing-code mapping).
- Closet post-step wired into the conversations adapter (§1.7).
- CLI --source flag + --mode deprecation alias (§3.3).
- MCP mempalace_mine tool source parameter.
- AbstractSourceAdapterContractSuite (§7.1–7.3): byte-preservation round-
  trip and declared-transformation round-trip tests.
- Privacy-class floor enforcement (§6.2); depends on #389 for
  secrets_possible scanning.

Tests: 1018 passed (up from ~990 on develop), +27 targeted tests covering
the ABC instantiation rules, typed records, all reserved transformations,
the registry register/get/unregister surface, PalaceContext upsert + skip +
emit semantics, and both the new KG provenance kwargs and backwards-
compatible legacy-schema migration.

Refs: #989 (RFC 002 tracking), #990 (RFC 002 spec), #995 (RFC 001 §10
cleanup — sibling PR on the write side).
2026-04-18 16:05:32 -03:00

163 lines
5.0 KiB
Python

"""Source adapter registry + entry-point discovery (RFC 002 §3).

Third-party adapters ship as installable packages that declare a
``mempalace.sources`` entry point::

    # pyproject.toml of mempalace-source-cursor
    [project.entry-points."mempalace.sources"]
    cursor = "mempalace_source_cursor:CursorAdapter"

MemPalace discovers them at process start. In-tree tests and local
development can register manually via :func:`register`. Explicit
registration wins on name conflict (RFC 002 §3.2).

Unlike storage backends (RFC 001 §3.3), source adapters are never
auto-detected: the user selects the adapter explicitly via ``--source NAME``
or config (§3.3). The default when no adapter is named is ``filesystem``
(to preserve current ``mempalace mine <path>`` behavior).
"""

from __future__ import annotations

import logging
from importlib import metadata
from threading import Lock
from typing import Type

from .base import BaseSourceAdapter

logger = logging.getLogger(__name__)

_ENTRY_POINT_GROUP = "mempalace.sources"
_DEFAULT_ADAPTER = "filesystem"

# name -> adapter class (explicit registrations and discovered entry points)
_registry: dict[str, Type[BaseSourceAdapter]] = {}
# name -> cached long-lived adapter instance
_instances: dict[str, BaseSourceAdapter] = {}
# names registered via register(); these take precedence over entry points
_explicit: set[str] = set()
_discovered = False
_lock = Lock()


def register(name: str, adapter_cls: Type[BaseSourceAdapter]) -> None:
    """Register ``adapter_cls`` under ``name``.

    Explicit registration wins over entry-point discovery on conflict (§3.2).
    """
    with _lock:
        _registry[name] = adapter_cls
        _explicit.add(name)
        _instances.pop(name, None)


def unregister(name: str) -> None:
    """Remove an adapter registration (primarily for tests)."""
    with _lock:
        _registry.pop(name, None)
        _explicit.discard(name)
        _instances.pop(name, None)


def _discover_entry_points() -> None:
    global _discovered
    if _discovered:
        return
    with _lock:
        # Re-check under the lock so only one thread performs discovery.
        if _discovered:
            return
        try:
            eps = metadata.entry_points()
            # Python 3.10+ exposes .select(); older versions return a
            # dict-like mapping keyed by group name.
            group = (
                eps.select(group=_ENTRY_POINT_GROUP)
                if hasattr(eps, "select")
                else eps.get(_ENTRY_POINT_GROUP, [])
            )
        except Exception:
            logger.exception("entry-point discovery for %s failed", _ENTRY_POINT_GROUP)
            group = []
        for ep in group:
            if ep.name in _explicit:
                continue  # explicit registration wins
            try:
                cls = ep.load()
            except Exception:
                logger.exception("failed to load adapter entry point %r", ep.name)
                continue
            if not isinstance(cls, type) or not issubclass(cls, BaseSourceAdapter):
                logger.warning(
                    "entry point %r did not resolve to a BaseSourceAdapter subclass (got %r)",
                    ep.name,
                    cls,
                )
                continue
            _registry.setdefault(ep.name, cls)
        _discovered = True


def available_adapters() -> list[str]:
    """Return sorted list of all registered adapter names."""
    _discover_entry_points()
    return sorted(_registry.keys())


def get_adapter_class(name: str) -> Type[BaseSourceAdapter]:
    """Return the registered adapter class for ``name``."""
    _discover_entry_points()
    try:
        return _registry[name]
    except KeyError as e:
        raise KeyError(f"unknown source adapter {name!r}; available: {available_adapters()}") from e


def get_adapter(name: str) -> BaseSourceAdapter:
    """Return a long-lived instance of the named adapter.

    Instances are cached per-name; repeated calls return the same object.
    Call :func:`reset_adapters` in tests that need isolation.
    """
    _discover_entry_points()
    with _lock:
        inst = _instances.get(name)
        if inst is not None:
            return inst
        cls = _registry.get(name)
        if cls is None:
            raise KeyError(
                f"unknown source adapter {name!r}; available: {sorted(_registry.keys())}"
            )
        inst = cls()
        _instances[name] = inst
        return inst


def reset_adapters() -> None:
    """Close and drop all cached adapter instances (primarily for tests)."""
    with _lock:
        for inst in _instances.values():
            try:
                inst.close()
            except Exception:
                logger.exception("error closing adapter during reset")
        _instances.clear()


def resolve_adapter_for_source(
    *,
    explicit: str | None = None,
    config_value: str | None = None,
    default: str = _DEFAULT_ADAPTER,
) -> str:
    """Resolve the adapter name per RFC 002 §3.3 priority order.

    1. Explicit ``--source`` flag or kwarg
    2. Per-source config value
    3. Default (``filesystem``)

    Auto-detection is *intentionally* absent on the read side (§3.3); a
    directory containing ``.git`` + ``workspaceStorage/`` + an ``mbox`` file
    is not a signal of user intent.
    """
    for candidate in (explicit, config_value):
        if candidate:
            return candidate
    return default
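
The two precedence rules in this module (explicit registration beats entry-point discovery, and the explicit flag beats config, which beats the default) can be exercised in isolation with a small stand-in. The sketch below does not import mempalace; the adapter classes, the `merge_discovered` helper, and the `resolve` function are hypothetical simplifications that mirror the logic above without the locking or entry-point loading.

```python
# Stand-in state, mirroring _registry and _explicit (names are illustrative).
registry = {}
explicit = set()


class LocalCursorAdapter:
    """Hypothetical adapter registered by hand, e.g. from a test."""


class PackagedCursorAdapter:
    """Hypothetical adapter that an installed package would expose."""


class PackagedMboxAdapter:
    """Hypothetical second packaged adapter with no explicit counterpart."""


def register(name, cls):
    # Explicit registration always overwrites and is remembered as explicit.
    registry[name] = cls
    explicit.add(name)


def merge_discovered(discovered):
    # Discovered entries never displace an explicit registration.
    for name, cls in discovered.items():
        if name in explicit:
            continue
        registry.setdefault(name, cls)


def resolve(*, explicit_name=None, config_value=None, default="filesystem"):
    # First truthy candidate wins; empty strings fall through to the default.
    for candidate in (explicit_name, config_value):
        if candidate:
            return candidate
    return default


register("cursor", LocalCursorAdapter)
merge_discovered({"cursor": PackagedCursorAdapter, "mbox": PackagedMboxAdapter})

chosen_cursor = registry["cursor"]   # LocalCursorAdapter: explicit wins
chosen_mbox = registry["mbox"]       # PackagedMboxAdapter: no conflict
by_flag = resolve(explicit_name="cursor", config_value="mbox")
by_config = resolve(config_value="mbox")
by_default = resolve(explicit_name="")
```

Note that the resolver only picks a name; it does not check that the name is registered. In the module above, an unknown name surfaces later as a `KeyError` from `get_adapter` or `get_adapter_class`.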