552e9927b7
Lands the read-side contract so third-party adapter authors (@Perseusxrltd, @JakobSachs, @adv3nt3, @zendesk-thittesdorf, @mfhens, @roip, @MrDys) have a stable target, matching what RFC 001 §10 landed on the write side in #995.

Scope (this PR):

- mempalace/sources/base.py: BaseSourceAdapter ABC with kwargs-only ingest() / describe_schema() and default is_current() / source_summary() / close() (§1.1–1.2). Typed records: SourceRef, SourceItemMetadata, DrawerRecord, RouteHint, SourceSummary, AdapterSchema, FieldSpec (§1.3, §5.2). Error classes: SourceNotFoundError, AuthRequiredError, AdapterClosedError, TransformationViolationError, SchemaConformanceError (§2.7). Class-level identity contract: name / adapter_version / capabilities / supported_modes / declared_transformations / default_privacy_class (§2.1, §1.4, §1.5, §6).
- mempalace/sources/transforms.py: reference implementations of the 13 reserved transformations (§1.4). The seven generic ones (utf8_replace_invalid, newline_normalize, whitespace_trim, whitespace_collapse_internal, line_trim, line_join_spaces, blank_line_drop) are pure functions; the six adapter-specific ones (strip_tool_chrome, tool_result_truncate, tool_result_omitted, spellcheck_user, synthesized_marker, speaker_role_assignment) are identity shims that the conversations adapter will override once it is migrated. get_transformation(name) resolves a transformation by its reserved name.
- mempalace/sources/registry.py: entry-point discovery via importlib.metadata.entry_points(group="mempalace.sources") plus an explicit register()/unregister() surface (§3.1–3.2). resolve_adapter_for_source() implements the §3.3 priority order. Crucially, there is no auto-detection on the read side: §3.3 is explicit that user intent is never inferred from on-disk artifacts.
- mempalace/sources/context.py: PalaceContext facade (§9) bundling what core passes into adapter.ingest(): the drawer/closet collections, knowledge graph, palace path, adapter identity, and progress hooks. upsert_drawer() applies the adapter_name/adapter_version stamps mandated by §5.1. skip_current_item() lets the adapter signal a lazy skip of the current item. emit() dispatches to hooks and swallows hook exceptions.
- mempalace/knowledge_graph.py: add_triple() gains optional source_drawer_id and adapter_name kwargs (§5.5). A backwards-compatible column migration adds the new columns automatically when a pre-RFC 002 palace is opened (PRAGMA table_info, then ALTER TABLE ADD COLUMN), matching the pattern used for any new palace-side provenance fields.
- pyproject.toml: declares the mempalace.sources entry-point group. It is empty on the first-party side for now (miners migrate in a follow-up), but because the group exists, third-party packages can start registering today.

Out of scope (explicit follow-ups):

- miner.py → mempalace/sources/filesystem.py: a behavior-preserving rename that also moves READABLE_EXTENSIONS, detect_room(), and detect_hall() into the adapter (§9). Larger refactor; lands separately.
- convo_miner.py + normalize.py → mempalace/sources/conversations.py. The format-detection if-chain in normalize.py becomes per-format plugins; declared_transformations enumerates what the current pipeline already does to source bytes (§1.4 existing-code mapping).
- Closet post-step wired into the conversations adapter (§1.7).
- CLI --source flag + --mode deprecation alias (§3.3).
- source parameter for the MCP mempalace_mine tool.
- AbstractSourceAdapterContractSuite (§7.1–7.3): byte-preservation round-trip and declared-transformation round-trip tests.
- Privacy-class floor enforcement (§6.2); depends on #389 for secrets_possible scanning.

Tests: 1018 passed (up from ~990 on develop), including 27 new targeted tests covering the ABC instantiation rules, typed records, all reserved transformations, the registry register/get/unregister surface, PalaceContext upsert + skip + emit semantics, and both the new KG provenance kwargs and the backwards-compatible legacy-schema migration.
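As a rough illustration of the transforms.py shape (pure functions plus identity shims, looked up by reserved name), here is a minimal sketch. The behavior of each function is an assumption inferred from its reserved name, not taken from RFC 002 §1.4; only four of the seven generic transformations and one shim are shown, and this get_transformation() is a hypothetical stand-in, not the module's actual code.

```python
from __future__ import annotations

from typing import Callable

# Sketch only: each function's behavior is inferred from its reserved name.


def utf8_replace_invalid(data: bytes) -> str:
    # Decode as UTF-8; undecodable bytes become U+FFFD instead of raising.
    return data.decode("utf-8", errors="replace")


def newline_normalize(text: str) -> str:
    # Turn CRLF and lone CR into LF (CRLF first, so it is not doubled).
    return text.replace("\r\n", "\n").replace("\r", "\n")


def whitespace_trim(text: str) -> str:
    # Strip leading and trailing whitespace from the whole text.
    return text.strip()


def blank_line_drop(text: str) -> str:
    # Remove lines that are empty or contain only whitespace.
    return "\n".join(line for line in text.split("\n") if line.strip())


def identity_shim(text: str) -> str:
    # Placeholder for adapter-specific transformations (e.g. strip_tool_chrome);
    # the owning adapter is expected to supply the real behavior.
    return text


_RESERVED: dict[str, Callable] = {
    "utf8_replace_invalid": utf8_replace_invalid,
    "newline_normalize": newline_normalize,
    "whitespace_trim": whitespace_trim,
    "blank_line_drop": blank_line_drop,
    "strip_tool_chrome": identity_shim,
}


def get_transformation(name: str) -> Callable:
    """Resolve a transformation by its reserved name (hypothetical stand-in)."""
    try:
        return _RESERVED[name]
    except KeyError:
        raise KeyError(f"unknown reserved transformation {name!r}") from None


raw = b"hello\r\n\r\n  world\xff\r\n"
text = utf8_replace_invalid(raw)
for name in ("newline_normalize", "blank_line_drop", "whitespace_trim"):
    text = get_transformation(name)(text)
print(repr(text))
```

Keeping the generic transformations pure means a contract test could, in principle, re-apply an adapter's declared_transformations to the original source bytes and compare the result with what was stored.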
Refs: #989 (RFC 002 tracking), #990 (RFC 002 spec), #995 (RFC 001 §10 cleanup — sibling PR on the write side).
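The knowledge_graph.py migration (PRAGMA table_info, then ALTER TABLE ADD COLUMN for whatever is missing) is the usual idempotent SQLite pattern. A minimal sketch follows; only the two column names come from this PR, while the table name triples, the TEXT column types, and migrate_provenance_columns() are hypothetical.

```python
import sqlite3

# Hypothetical table name and column types; the column names are from this PR.
_PROVENANCE_COLUMNS = {
    "source_drawer_id": "TEXT",
    "adapter_name": "TEXT",
}


def migrate_provenance_columns(conn: sqlite3.Connection, table: str = "triples") -> list:
    """Add any missing provenance columns; return the names that were added."""
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for column, sql_type in _PROVENANCE_COLUMNS.items():
        if column not in existing:
            # Identifiers cannot be bound as SQL parameters, so only trusted
            # constants are interpolated. New columns default to NULL.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
            added.append(column)
    conn.commit()
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # A pre-RFC 002 palace: triples without provenance columns.
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.execute("INSERT INTO triples VALUES ('alice', 'knows', 'bob')")
    print(migrate_provenance_columns(conn))  # ['source_drawer_id', 'adapter_name']
    print(migrate_provenance_columns(conn))  # [] (already migrated)
```

Because the check runs on every open, re-opening an already-migrated palace is a no-op, and rows written before the migration simply carry NULL provenance.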
163 lines
5.0 KiB
Python
"""Source adapter registry + entry-point discovery (RFC 002 §3).

Third-party adapters ship as installable packages that declare a
``mempalace.sources`` entry point::

    # pyproject.toml of mempalace-source-cursor
    [project.entry-points."mempalace.sources"]
    cursor = "mempalace_source_cursor:CursorAdapter"

MemPalace discovers them lazily, the first time the registry is queried.
In-tree tests and local development can register manually via
:func:`register`. Explicit registration wins on name conflict (RFC 002 §3.2).

Unlike storage backends (RFC 001 §3.3), source adapters are never
auto-detected: the user selects the adapter explicitly via ``--source NAME``
or config (§3.3). The default when no adapter is named is ``filesystem``
(to preserve current ``mempalace mine <path>`` behavior).
"""

from __future__ import annotations

import logging
from importlib import metadata
from threading import Lock
from typing import Type

from .base import BaseSourceAdapter

logger = logging.getLogger(__name__)

_ENTRY_POINT_GROUP = "mempalace.sources"
_DEFAULT_ADAPTER = "filesystem"

_registry: dict[str, Type[BaseSourceAdapter]] = {}  # name -> adapter class
_instances: dict[str, BaseSourceAdapter] = {}  # name -> cached instance
_explicit: set[str] = set()  # names set via register(); they shadow entry points
_discovered = False  # entry points are scanned at most once per process
_lock = Lock()


def register(name: str, adapter_cls: Type[BaseSourceAdapter]) -> None:
    """Register ``adapter_cls`` under ``name``.

    Explicit registration wins over entry-point discovery on conflict (§3.2).
    """
    with _lock:
        _registry[name] = adapter_cls
        _explicit.add(name)
        _instances.pop(name, None)


def unregister(name: str) -> None:
    """Remove an adapter registration (primarily for tests)."""
    with _lock:
        _registry.pop(name, None)
        _explicit.discard(name)
        _instances.pop(name, None)


def _discover_entry_points() -> None:
    """Load ``mempalace.sources`` entry points once, skipping explicit names."""
    global _discovered
    if _discovered:
        return
    with _lock:
        if _discovered:  # re-check: another thread may have finished first
            return
        try:
            eps = metadata.entry_points()
            # Python 3.10+ returns an object with .select(); older versions
            # return a plain dict keyed by group name.
            group = (
                eps.select(group=_ENTRY_POINT_GROUP)
                if hasattr(eps, "select")
                else eps.get(_ENTRY_POINT_GROUP, [])
            )
        except Exception:
            logger.exception("entry-point discovery for %s failed", _ENTRY_POINT_GROUP)
            group = []
        for ep in group:
            if ep.name in _explicit:
                continue  # explicit registration wins
            try:
                cls = ep.load()
            except Exception:
                logger.exception("failed to load adapter entry point %r", ep.name)
                continue
            if not isinstance(cls, type) or not issubclass(cls, BaseSourceAdapter):
                logger.warning(
                    "entry point %r did not resolve to a BaseSourceAdapter subclass (got %r)",
                    ep.name,
                    cls,
                )
                continue
            _registry.setdefault(ep.name, cls)
        _discovered = True


def available_adapters() -> list[str]:
    """Return a sorted list of all registered adapter names."""
    _discover_entry_points()
    return sorted(_registry.keys())


def get_adapter_class(name: str) -> Type[BaseSourceAdapter]:
    """Return the registered adapter class for ``name``."""
    _discover_entry_points()
    try:
        return _registry[name]
    except KeyError as e:
        raise KeyError(f"unknown source adapter {name!r}; available: {available_adapters()}") from e


def get_adapter(name: str) -> BaseSourceAdapter:
    """Return a long-lived instance of the named adapter.

    Instances are cached per-name; repeated calls return the same object.
    Call :func:`reset_adapters` in tests that need isolation.
    """
    _discover_entry_points()
    with _lock:
        inst = _instances.get(name)
        if inst is not None:
            return inst
        cls = _registry.get(name)
        if cls is None:
            raise KeyError(
                f"unknown source adapter {name!r}; available: {sorted(_registry.keys())}"
            )
        # cls() runs while _lock is held; Lock is not reentrant, so an adapter's
        # __init__ must not call back into register() or get_adapter().
        inst = cls()
        _instances[name] = inst
        return inst


def reset_adapters() -> None:
    """Close and drop all cached adapter instances (primarily for tests)."""
    with _lock:
        for inst in _instances.values():
            try:
                inst.close()
            except Exception:
                logger.exception("error closing adapter during reset")
        _instances.clear()


def resolve_adapter_for_source(
    *,
    explicit: str | None = None,
    config_value: str | None = None,
    default: str = _DEFAULT_ADAPTER,
) -> str:
    """Resolve the adapter name per RFC 002 §3.3 priority order.

    1. Explicit ``--source`` flag or kwarg
    2. Per-source config value
    3. Default (``filesystem``)

    Auto-detection is *intentionally* absent on the read side (§3.3); a
    directory containing ``.git`` + ``workspaceStorage/`` + an ``mbox`` file
    is not a signal of user intent.
    """
    for candidate in (explicit, config_value):
        if candidate:
            return candidate
    return default
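Adapter authors may want to check how the loop in _discover_entry_points() treats their entry point. The sketch below re-applies the same rules (an explicitly registered name wins, load failures and non-subclasses are skipped, the first entry point per name wins) to hand-built importlib.metadata.EntryPoint objects, so it runs without installing anything. BaseSourceAdapter here is a stub, merge_entry_points() is a hypothetical helper, and the package and adapter names are made up; the real module logs skipped entries rather than returning them.

```python
from __future__ import annotations

import sys
import types
from importlib.metadata import EntryPoint
from typing import Iterable


class BaseSourceAdapter:
    """Hypothetical stand-in for mempalace.sources.base.BaseSourceAdapter."""


def merge_entry_points(
    eps: Iterable[EntryPoint], registry: dict[str, type], explicit: set[str]
) -> list[str]:
    """Apply the discovery loop's rules; return the names that were skipped."""
    skipped = []
    for ep in eps:
        if ep.name in explicit:
            skipped.append(ep.name)  # explicit register() wins (§3.2); never loaded
            continue
        try:
            cls = ep.load()
        except Exception:
            skipped.append(ep.name)  # import failed
            continue
        if not isinstance(cls, type) or not issubclass(cls, BaseSourceAdapter):
            skipped.append(ep.name)  # wrong kind of object
            continue
        registry.setdefault(ep.name, cls)  # first entry point for a name wins
    return skipped


# Fake "installed package" so EntryPoint.load() can import it without pip.
fake_pkg = types.ModuleType("fake_mempalace_source_cursor")


class CursorAdapter(BaseSourceAdapter):
    pass


class LocalFilesystem(BaseSourceAdapter):
    pass


fake_pkg.CursorAdapter = CursorAdapter
sys.modules[fake_pkg.__name__] = fake_pkg

group = "mempalace.sources"
eps = [
    EntryPoint(name="cursor", value="fake_mempalace_source_cursor:CursorAdapter", group=group),
    EntryPoint(name="filesystem", value="fake_mempalace_source_cursor:CursorAdapter", group=group),
    EntryPoint(name="mbox", value="collections:OrderedDict", group=group),
    EntryPoint(name="slack", value="no_such_package_xyz:SlackAdapter", group=group),
]
registry = {"filesystem": LocalFilesystem}  # as if register("filesystem", ...) ran first
skipped = merge_entry_points(eps, registry, explicit={"filesystem"})
print(sorted(registry))  # ['cursor', 'filesystem']
print(skipped)  # ['filesystem', 'mbox', 'slack']
```

Running the sketch shows why an installed adapter can silently go missing: a same-named register() call, an import error, or an entry point that resolves to something other than a BaseSourceAdapter subclass all cause it to be skipped rather than raised.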