fix(hooks): consolidate transcript ingest, harden shell parsers (#1231 review)

Address Copilot review on #1231:

1. Stop double-mining the transcript on the Python side.
   ``_get_mine_targets`` now returns only the ``MEMPAL_DIR`` projects
   target; the convos target for the transcript dir is dropped because
   ``_ingest_transcript`` already handles it on every hook fire. The
   duplicate spawn used ``sys.executable`` (instead of
   ``_mempalace_python()``) and a different ``--wing``, so each
   Stop/PreCompact event wrote the same transcript into two wings under
   mismatched interpreters and overwrote the single ``_MINE_PID_FILE``
   lock.

2. ``_maybe_auto_ingest`` and ``_mine_sync`` now spawn via
   ``_mempalace_python()`` so the resolved interpreter matches the venv
   that owns mempalace. This matters under GUI-launched harnesses, where
   ``sys.executable`` may resolve to a system Python without chromadb.

3. Replace ``eval $(...)`` in both shell hooks with a ``mapfile``-based
   reader. Sanitized values are still emitted by the same Python parser,
   but the shell now does plain variable assignment instead of executing
   the parser's stdout, which means a smaller blast radius if the
   sanitizer is ever bypassed.

4. Mirror ``_validate_transcript_path`` in the shell hooks via an
   ``is_valid_transcript_path`` helper: suffix check plus
   traversal-segment rejection, for parity with the Python validator.
   The convos mine in each shell hook is now gated on the validator
   instead of a bare ``-f``.

5. Tighten the ``..`` traversal test, which previously exercised the
   suffix gate by mistake (``../../etc/passwd`` lacks ``.json[l]``). It
   now uses ``.jsonl`` paths with traversal segments so that it actually
   hits the ``..`` rejection branch.

6. README: add a one-liner pointing at ``mempalace sweep`` for users who
   want per-message recall on top of the file-level chunks the hooks
   produce. The sweeper was previously undiscoverable.

Tests: 1418 passed, 1 skipped (full suite minus benchmarks).
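
To make item 3 concrete, here is a minimal sketch of the difference between the two reading styles. It is not the hook code: ``emit_assignments`` and ``emit_values`` are hypothetical stand-ins for the Python parser, and the hostile value simulates a sanitizer bypass that the real parser is meant to prevent. The sketch assumes bash 4.0 or newer, since ``mapfile`` is a bash builtin introduced in that version.

```shell
#!/usr/bin/env bash
# Illustration only: both emitters are hypothetical stand-ins for the parser.

# Old style: the parser prints shell assignments and the hook evals them.
emit_assignments() {
    printf 'SESSION_ID="%s"\n' 'abc123'
    printf 'TRANSCRIPT_PATH="%s"\n' '$(echo injected)'
}
eval "$(emit_assignments)"
eval_result="$TRANSCRIPT_PATH"      # the command substitution was executed

# New style: the parser prints one raw value per line and the hook assigns
# each line as data, so nothing in the value is ever interpreted as code.
emit_values() {
    printf '%s\n' 'abc123' '$(echo injected)'
}
mapfile -t parsed < <(emit_values)
SESSION_ID="${parsed[0]:-unknown}"
TRANSCRIPT_PATH="${parsed[1]:-}"
mapfile_result="$TRANSCRIPT_PATH"   # the value stays literal text

echo "eval:    $eval_result"        # eval:    injected
echo "mapfile: $mapfile_result"     # mapfile: $(echo injected)
```

With ``eval`` the payload runs as a command; with ``mapfile`` it is only ever a string, which is the smaller blast radius the commit message refers to.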
@@ -65,30 +65,53 @@ fi
 # Read JSON input from stdin
 INPUT=$(cat)
 
-# Parse session_id and transcript_path in one call. Sanitize both before
-# interpolating into shell — same contract as mempal_save_hook.sh.
-eval $(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
+# Parse session_id and transcript_path in one call. Sanitize both, then
+# read sanitized values from one-per-line stdout into shell variables —
+# avoids ``eval`` on generated code (#1231 review). Same contract as
+# mempal_save_hook.sh.
+mapfile -t _mempal_parsed < <(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
 import sys, json, re
 data = json.load(sys.stdin)
 sid = data.get('session_id', 'unknown')
 tp = data.get('transcript_path', '')
 safe = lambda s: re.sub(r'[^a-zA-Z0-9_/.\-~]', '', str(s))
-print(f'SESSION_ID=\"{safe(sid)}\"')
-print(f'TRANSCRIPT_PATH=\"{safe(tp)}\"')
+print(safe(sid))
+print(safe(tp))
 " 2>/dev/null)
+SESSION_ID="${_mempal_parsed[0]:-unknown}"
+TRANSCRIPT_PATH="${_mempal_parsed[1]:-}"
 
 # Expand ~ in path
 TRANSCRIPT_PATH="${TRANSCRIPT_PATH/#\~/$HOME}"
 
+# Validate that TRANSCRIPT_PATH looks like a transcript file. Mirrors
+# mempalace.hooks_cli._validate_transcript_path so the shell hook
+# rejects the same shapes the Python hook rejects (#1231 review).
+is_valid_transcript_path() {
+    local path="$1"
+    [ -n "$path" ] || return 1
+    case "$path" in
+        *.json|*.jsonl) ;;
+        *) return 1 ;;
+    esac
+    case "/$path/" in
+        */../*) return 1 ;;
+    esac
+    return 0
+}
+
 echo "[$(date '+%H:%M:%S')] PRE-COMPACT triggered for session $SESSION_ID" >> "$STATE_DIR/hook.log"
 
 # Run ingest synchronously so memories land before compaction. Two
 # independent targets — both run if both are set:
 #   1. TRANSCRIPT_PATH (from Claude Code) → parent dir, --mode convos
 #   2. MEMPAL_DIR → --mode projects
-if [ -n "$TRANSCRIPT_PATH" ] && [ -f "$TRANSCRIPT_PATH" ]; then
+if is_valid_transcript_path "$TRANSCRIPT_PATH" && [ -f "$TRANSCRIPT_PATH" ]; then
     mempalace mine "$(dirname "$TRANSCRIPT_PATH")" --mode convos \
         >> "$STATE_DIR/hook.log" 2>&1
+elif [ -n "$TRANSCRIPT_PATH" ]; then
+    echo "[$(date '+%H:%M:%S')] Skipping invalid transcript path: $TRANSCRIPT_PATH" \
+        >> "$STATE_DIR/hook.log"
 fi
 if [ -n "$MEMPAL_DIR" ] && [ -d "$MEMPAL_DIR" ]; then
     mempalace mine "$MEMPAL_DIR" --mode projects \
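
The validator in this hunk applies its checks in a fixed order (non-empty, then suffix, then traversal), which is why a path such as ``../../etc/passwd`` never reaches the ``..`` branch. The sketch below repeats the same ``case`` patterns but reports which gate fired. ``check_gate`` and the sample paths are hypothetical and used only for illustration; the Python validator may differ in details not visible in this diff.

```shell
#!/usr/bin/env bash
# Illustration only: same patterns as is_valid_transcript_path, but each
# rejection names the gate that produced it.
check_gate() {
    local path="$1"
    [ -n "$path" ] || { echo "empty"; return; }
    case "$path" in
        *.json|*.jsonl) ;;                  # suffix gate passes
        *) echo "suffix"; return ;;
    esac
    # Wrapping the path in slashes lets one pattern match a ".." segment
    # at the start, in the middle, or at the end of the path.
    case "/$path/" in
        */../*) echo "traversal"; return ;;
    esac
    echo "ok"
}

check_gate ""                                    # empty
check_gate "../../etc/passwd"                    # suffix
check_gate "/tmp/../etc/session.jsonl"           # traversal
check_gate "../session.jsonl"                    # traversal
check_gate "/home/user/my..notes/session.json"   # ok
```

The last case shows that only a whole ``..`` path segment is rejected; two dots inside a longer directory name still pass.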
@@ -83,9 +83,11 @@ fi
 INPUT=$(cat)
 
 # Parse all fields in a single Python call (3x faster than separate invocations)
-# SECURITY: All values are sanitized before being interpolated into shell assignments.
-# stop_hook_active is coerced to a strict True/False to prevent command injection via eval.
-eval $(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
+# without invoking ``eval`` on generated code: Python prints one sanitized
+# value per line, the shell reads them via ``mapfile`` and does plain
+# variable assignment — same data, smaller blast radius if the sanitizer
+# is ever bypassed (#1231 review).
+mapfile -t _mempal_parsed < <(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
 import sys, json, re
 data = json.load(sys.stdin)
 sid = data.get('session_id', 'unknown')
@@ -95,14 +97,36 @@ tp = data.get('transcript_path', '')
 safe = lambda s: re.sub(r'[^a-zA-Z0-9_/.\-~]', '', str(s))
 # Coerce stop_hook_active to strict boolean string
 sha = 'True' if sha_raw is True or str(sha_raw).lower() in ('true', '1', 'yes') else 'False'
-print(f'SESSION_ID=\"{safe(sid)}\"')
-print(f'STOP_HOOK_ACTIVE=\"{sha}\"')
-print(f'TRANSCRIPT_PATH=\"{safe(tp)}\"')
+print(safe(sid))
+print(sha)
+print(safe(tp))
 " 2>/dev/null)
+SESSION_ID="${_mempal_parsed[0]:-unknown}"
+STOP_HOOK_ACTIVE="${_mempal_parsed[1]:-False}"
+TRANSCRIPT_PATH="${_mempal_parsed[2]:-}"
 
 # Expand ~ in path
 TRANSCRIPT_PATH="${TRANSCRIPT_PATH/#\~/$HOME}"
 
+# Validate that TRANSCRIPT_PATH looks like a transcript file:
+#   - non-empty
+#   - .jsonl or .json suffix
+#   - no traversal segments (.. components)
+# Mirrors mempalace.hooks_cli._validate_transcript_path so the shell hook
+# rejects the same shapes the Python hook rejects (#1231 review).
+is_valid_transcript_path() {
+    local path="$1"
+    [ -n "$path" ] || return 1
+    case "$path" in
+        *.json|*.jsonl) ;;
+        *) return 1 ;;
+    esac
+    case "/$path/" in
+        */../*) return 1 ;;
+    esac
+    return 0
+}
+
 # If we're already in a save cycle, let the AI stop normally
 # This is the infinite-loop prevention: block once → AI saves → tries to stop again → we let it through
 if [ "$STOP_HOOK_ACTIVE" = "True" ] || [ "$STOP_HOOK_ACTIVE" = "true" ]; then
@@ -165,9 +189,12 @@ if [ "$SINCE_LAST" -ge "$SAVE_INTERVAL" ] && [ "$EXCHANGE_COUNT" -gt 0 ]; then
     # (code, notes, docs)
     # MEMPAL_DIR is *additive*, not an override: a user with MEMPAL_DIR
     # pointed at their project still gets the active conversation mined.
-    if [ -n "$TRANSCRIPT_PATH" ] && [ -f "$TRANSCRIPT_PATH" ]; then
+    if is_valid_transcript_path "$TRANSCRIPT_PATH" && [ -f "$TRANSCRIPT_PATH" ]; then
         mempalace mine "$(dirname "$TRANSCRIPT_PATH")" --mode convos \
             >> "$STATE_DIR/hook.log" 2>&1 &
+    elif [ -n "$TRANSCRIPT_PATH" ]; then
+        echo "[$(date '+%H:%M:%S')] Skipping invalid transcript path: $TRANSCRIPT_PATH" \
+            >> "$STATE_DIR/hook.log"
     fi
     if [ -n "$MEMPAL_DIR" ] && [ -d "$MEMPAL_DIR" ]; then
         mempalace mine "$MEMPAL_DIR" --mode projects \