Sefaria Logger Hook

The sefaria-logger hook records Sefaria MCP tool calls in a source chain file, so that the source material fetched during encoding can be traced. A call is logged only when a reference or query can be extracted from its input.

Overview

| Attribute | Value |
|-----------|-------|
| Hook Name | sefaria-logger |
| Script | hooks/scripts/sefaria-logger.py |
| Event | PreToolUse |
| Matcher | mcp__sefaria |
| Blocking | No (logging only) |
| Timeout | 3000ms |

Purpose

The Sefaria logger provides:

  1. Source Traceability: Complete record of all texts fetched
  2. Audit Trail: Timestamped log of the research process
  3. Reference Validation: Captures what references were queried
  4. Session Documentation: Preserves research history

Configuration

In hooks/hooks.json:

{
  "PreToolUse": [
    {
      "matcher": "mcp__sefaria",
      "hooks": [
        {
          "type": "command",
          "command": "python ${CLAUDE_PLUGIN_ROOT}/hooks/scripts/sefaria-logger.py \"$TOOL_NAME\" \"$TOOL_INPUT\"",
          "timeout": 3000
        }
      ]
    }
  ]
}
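
To sanity-check the configuration before relying on it, a short script can list the commands registered for a given matcher. This is only a sketch: `commands_for_matcher` and `EXAMPLE_CONFIG` are hypothetical names, and the code assumes the file has exactly the shape shown above.

```python
import json


def commands_for_matcher(config: dict, event: str, matcher: str) -> list[str]:
    """Return the command strings registered under `event` for `matcher`."""
    commands = []
    for group in config.get(event, []):
        if group.get("matcher") != matcher:
            continue
        for hook in group.get("hooks", []):
            if hook.get("type") == "command":
                commands.append(hook["command"])
    return commands


# Inline copy of the configuration above, so the example runs on its own.
EXAMPLE_CONFIG = json.loads(r'''
{
  "PreToolUse": [
    {
      "matcher": "mcp__sefaria",
      "hooks": [
        {
          "type": "command",
          "command": "python ${CLAUDE_PLUGIN_ROOT}/hooks/scripts/sefaria-logger.py \"$TOOL_NAME\" \"$TOOL_INPUT\"",
          "timeout": 3000
        }
      ]
    }
  ]
}
''')

for command in commands_for_matcher(EXAMPLE_CONFIG, "PreToolUse", "mcp__sefaria"):
    print(command)
```

In practice the dictionary would come from `json.load` on hooks/hooks.json instead of the inline copy.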

Behavior

Successful Logging

When a Sefaria tool is called and a reference can be extracted from its input, the hook appends a log entry and prints:

{
  "continue": true,
  "message": "Logged source fetch: Shulchan Arukh, Yoreh De'ah 87:3"
}

No Reference Found

When no reference can be extracted, nothing is logged and the message is empty:

{
  "continue": true,
  "message": ""
}

The hook always allows the operation to proceed.
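
Anything that consumes this output only needs the two fields shown above. The following sketch parses one line of hook output; `parse_hook_output` is a hypothetical helper, and treating unparseable output as "continue" is an assumption chosen to match the hook's non-blocking intent.

```python
import json


def parse_hook_output(stdout: str) -> tuple[bool, str]:
    """Return (continue, message) from the hook's JSON output."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        # Assumption: a logging-only hook should never block on bad output.
        return True, ""
    if not isinstance(data, dict):
        return True, ""
    return bool(data.get("continue", True)), str(data.get("message", ""))


proceed, message = parse_hook_output(
    '{"continue": true, "message": "Logged source fetch: Genesis 1:1"}'
)
print(proceed, message)
```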

Matched Tools

The hook matches all Sefaria MCP tools:

| Tool | Category | Example Reference |
|------|----------|-------------------|
| mcp__sefaria_texts__get_text | text_fetch | "Genesis 1:1" |
| mcp__sefaria_texts__get_english_translations | translations | "Berakhot 2a" |
| mcp__sefaria_texts__get_links_between_texts | links_fetch | "YD 87:3" |
| mcp__sefaria_texts__clarify_name_argument | validation | "Shulchan Aruch" |
| mcp__sefaria_texts__get_text_or_category_shape | structure | "Yoreh Deah" |
| mcp__sefaria_texts__text_search | search | "search: basar bechalav" |
| mcp__sefaria_texts__english_semantic_search | search | "search: meat milk prohibition" |
| mcp__sefaria_texts__search_in_dictionaries | dictionary | "noten taam" |
| mcp__sefaria_texts__get_topic_details | topic_lookup | "topic: meat-and-milk" |

Log File Format

Location

.mistaber-artifacts/source-chain-log.yaml

Structure

source_chain:
  - tool: get_text
    reference: "Shulchan Arukh, Yoreh De'ah 87:3"
    category: text_fetch
    timestamp: "2026-01-25T10:05:00.123456"

  - tool: get_links_between_texts
    reference: "Shulchan Arukh, Yoreh De'ah 87:3"
    category: links_fetch
    timestamp: "2026-01-25T10:05:30.456789"

  - tool: get_text
    reference: "Shakh on Shulchan Arukh, Yoreh De'ah 87:3"
    category: text_fetch
    timestamp: "2026-01-25T10:06:00.789012"

  - tool: text_search
    reference: "search: dag bechalav"
    category: search
    timestamp: "2026-01-25T10:07:00.123456"

  - tool: get_topic_details
    reference: "topic: meat-and-milk"
    category: topic_lookup
    timestamp: "2026-01-25T10:08:00.456789"

last_updated: "2026-01-25T10:08:00.456789"

Reference Extraction

From Different Tools

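import json


def extract_reference(tool_name: str, tool_input: str) -> str | None:
    """Extract a loggable reference from the tool's JSON input.

    tool_name is accepted for symmetry with the other helpers but is not used.
    """
    try:
        data = json.loads(tool_input)
    except json.JSONDecodeError:
        return None

    # Valid JSON that is not an object (e.g. a list) has no named fields.
    if not isinstance(data, dict):
        return None

    # Different tools use different parameter names; the first non-empty one wins.
    for key in ("reference", "name", "book_name"):
        if data.get(key):
            return data[key]

    # Searches and topic lookups are prefixed so they stand out in the log.
    if data.get("query"):
        return f"search: {data['query']}"

    if data.get("topic_slug"):
        return f"topic: {data['topic_slug']}"

    return None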
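
The lookup order matters when an input carries more than one of these fields: the first non-empty match wins. A table-driven sketch of the same order makes this explicit. `FIELD_PREFIXES` and `extract_reference_from_dict` are illustrative names, not part of the hook script.

```python
import json
from typing import Optional

# (field name, prefix) pairs, checked in the same order as the function above.
FIELD_PREFIXES = [
    ("reference", ""),
    ("name", ""),
    ("book_name", ""),
    ("query", "search: "),
    ("topic_slug", "topic: "),
]


def extract_reference_from_dict(data: dict) -> Optional[str]:
    """Return the first non-empty field, with its prefix, or None."""
    for field, prefix in FIELD_PREFIXES:
        value = data.get(field)
        if value:
            return f"{prefix}{value}"
    return None


print(extract_reference_from_dict(json.loads('{"query": "basar bechalav"}')))
print(extract_reference_from_dict({"topic_slug": "meat-and-milk"}))
print(extract_reference_from_dict({"limit": 10}))
```

An input with none of the known fields, such as the last call, yields None and is therefore not logged.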

Tool Categories

def get_tool_category(tool_name: str) -> str:
    """Categorize the Sefaria tool by substring match; the first match wins."""
    # The more specific keywords are checked first, because
    # get_text_or_category_shape also contains "get_text" and
    # search_in_dictionaries also contains "search".
    if "shape" in tool_name:
        return "structure"
    elif "dictionar" in tool_name:
        return "dictionary"
    elif "get_text" in tool_name:
        return "text_fetch"
    elif "links" in tool_name:
        return "links_fetch"
    elif "search" in tool_name:
        return "search"
    elif "topic" in tool_name:
        return "topic_lookup"
    elif "clarify" in tool_name:
        return "validation"
    elif "translations" in tool_name:
        return "translations"
    else:
        return "other"
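
Because these are substring checks evaluated in order, a tool name containing two keywords takes the category of whichever check runs first. For example, get_text_or_category_shape contains both "get_text" and "shape". A sketch of an ordered rule table, with the more specific keywords listed first so that the result agrees with the Matched Tools table, might look like this. `CATEGORY_RULES`, `categorize`, and `EXPECTED` are illustrative names.

```python
# Ordered (keyword, category) rules; the first keyword found in the name wins.
CATEGORY_RULES = [
    ("shape", "structure"),       # before "get_text": get_text_or_category_shape
    ("dictionar", "dictionary"),  # before "search": search_in_dictionaries
    ("translations", "translations"),
    ("links", "links_fetch"),
    ("get_text", "text_fetch"),
    ("search", "search"),
    ("topic", "topic_lookup"),
    ("clarify", "validation"),
]


def categorize(tool_name: str) -> str:
    """Return the category of the first matching rule, or "other"."""
    for keyword, category in CATEGORY_RULES:
        if keyword in tool_name:
            return category
    return "other"


# Expected categories, copied from the Matched Tools table.
EXPECTED = {
    "mcp__sefaria_texts__get_text": "text_fetch",
    "mcp__sefaria_texts__get_english_translations": "translations",
    "mcp__sefaria_texts__get_links_between_texts": "links_fetch",
    "mcp__sefaria_texts__clarify_name_argument": "validation",
    "mcp__sefaria_texts__get_text_or_category_shape": "structure",
    "mcp__sefaria_texts__text_search": "search",
    "mcp__sefaria_texts__english_semantic_search": "search",
    "mcp__sefaria_texts__search_in_dictionaries": "dictionary",
    "mcp__sefaria_texts__get_topic_details": "topic_lookup",
}

for name, expected in EXPECTED.items():
    assert categorize(name) == expected, (name, categorize(name))
```

Keeping the expected values next to the rules turns the table into a regression check whenever a new tool or keyword is added.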

Log Entry Structure

Each log entry contains:

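| Field | Type | Description |
|-------|------|-------------|
| tool | string | Tool name with the mcp__sefaria_texts__ prefix removed |
| reference | string | Reference or query extracted from the tool input |
| category | string | Tool category |
| timestamp | string | ISO 8601 timestamp |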
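
The entry shape can be written down as a small typed record, which is handy when validating a log before analysis. This is a sketch only: `LogEntry`, `parse_entry`, and `KNOWN_CATEGORIES` are hypothetical names, and the set of categories is taken from the tables on this page.

```python
from dataclasses import dataclass
from datetime import datetime

KNOWN_CATEGORIES = {
    "text_fetch", "links_fetch", "search", "topic_lookup",
    "validation", "structure", "translations", "dictionary", "other",
}


@dataclass(frozen=True)
class LogEntry:
    tool: str
    reference: str
    category: str
    timestamp: datetime


def parse_entry(raw: dict) -> LogEntry:
    """Validate one raw log entry and convert its timestamp."""
    if raw["category"] not in KNOWN_CATEGORIES:
        raise ValueError(f"unknown category: {raw['category']!r}")
    return LogEntry(
        tool=raw["tool"],
        reference=raw["reference"],
        category=raw["category"],
        # fromisoformat parses the microsecond format shown in the log examples.
        timestamp=datetime.fromisoformat(raw["timestamp"]),
    )


entry = parse_entry({
    "tool": "get_text",
    "reference": "Shulchan Arukh, Yoreh De'ah 87:3",
    "category": "text_fetch",
    "timestamp": "2026-01-25T10:05:00.123456",
})
print(entry.tool, entry.timestamp.isoformat())
```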

Implementation Details

Log Loading

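from pathlib import Path

import yaml  # PyYAML


def load_source_log() -> list:
    """Load the existing source chain log, or return an empty list."""
    log_path = Path(".mistaber-artifacts/source-chain-log.yaml")

    if not log_path.exists():
        return []

    try:
        with open(log_path, "r", encoding="utf-8") as f:
            data = yaml.safe_load(f)
            return data.get("source_chain", []) if data else []
    except Exception:
        # An unreadable or malformed log is treated as empty, so the next
        # save starts a fresh file.
        return []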

Log Saving

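from datetime import datetime
from pathlib import Path

import yaml  # PyYAML


def save_source_log(log: list) -> None:
    """Save the source chain log, creating the artifacts directory if needed."""
    log_dir = Path(".mistaber-artifacts")
    log_dir.mkdir(exist_ok=True)

    log_path = log_dir / "source-chain-log.yaml"

    data = {
        "source_chain": log,
        "last_updated": datetime.now().isoformat()
    }

    # UTF-8 is set explicitly because allow_unicode=True writes non-ASCII as-is.
    with open(log_path, "w", encoding="utf-8") as f:
        # sort_keys=False (PyYAML 5.1+) keeps the key order shown under Structure;
        # the default would sort keys alphabetically.
        yaml.dump(data, f, default_flow_style=False, allow_unicode=True, sort_keys=False)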
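
Because the file is rewritten in full on every call, an interrupted write could leave a truncated log. One common mitigation, shown here as a sketch and not as what the hook script does, is to write to a temporary file in the same directory and then rename it over the target. `atomic_write_text` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write `text` to `path` so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so that os.replace stays
    # on one filesystem, where the rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The text to write could be produced by calling yaml.dump without a stream argument, which returns the serialized document as a string.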

Main Execution

import json
import sys
from datetime import datetime


def main():
    """Main hook execution."""
    tool_name = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    tool_input = sys.argv[2] if len(sys.argv) > 2 else "{}"

    output = {
        "continue": True,  # Never block - logging only
        "message": ""
    }

    reference = extract_reference(tool_name, tool_input)

    if reference:
        entry = {
            "tool": tool_name.replace("mcp__sefaria_texts__", ""),
            "reference": reference,
            "category": get_tool_category(tool_name),
            "timestamp": datetime.now().isoformat(),
        }

        try:
            log = load_source_log()
            log.append(entry)
            save_source_log(log)
            output["message"] = f"Logged source fetch: {reference}"
        except OSError:
            # A failed write (e.g. permissions) must not stop the tool call.
            pass

    print(json.dumps(output))
    return 0


if __name__ == "__main__":
    sys.exit(main())

Use Cases

Corpus Preparation

During corpus-prep, the log captures all sources fetched:

source_chain:
  # Primary text fetch
  - tool: get_text
    reference: "Shulchan Arukh, Yoreh De'ah 87:3"
    category: text_fetch
    timestamp: "2026-01-25T10:00:00Z"

  # Get translations
  - tool: get_english_translations
    reference: "Shulchan Arukh, Yoreh De'ah 87:3"
    category: translations
    timestamp: "2026-01-25T10:00:30Z"

  # Get commentaries via links
  - tool: get_links_between_texts
    reference: "Shulchan Arukh, Yoreh De'ah 87:3"
    category: links_fetch
    timestamp: "2026-01-25T10:01:00Z"

  # Fetch Shach
  - tool: get_text
    reference: "Shakh on Shulchan Arukh, Yoreh De'ah 87:3"
    category: text_fetch
    timestamp: "2026-01-25T10:02:00Z"

  # Fetch Taz
  - tool: get_text
    reference: "Turei Zahav on Shulchan Arukh, Yoreh De'ah 87:3"
    category: text_fetch
    timestamp: "2026-01-25T10:03:00Z"

Derivation Chain Building

When tracing sources:

source_chain:
  # Trace to Tur
  - tool: get_links_between_texts
    reference: "Tur, Yoreh Deah 87"
    category: links_fetch

  # Trace to Rambam
  - tool: get_links_between_texts
    reference: "Mishneh Torah, Forbidden Foods 9"
    category: links_fetch

  # Trace to Gemara
  - tool: get_text
    reference: "Chullin 104b"
    category: text_fetch

Research and Validation

When validating references:

source_chain:
  - tool: clarify_name_argument
    reference: "Shulchan Aruch Yoreh Deah"
    category: validation

  - tool: get_text_or_category_shape
    reference: "Yoreh Deah 87"
    category: structure

Log Analysis

Count by Category

# Using Python (requires PyYAML)
python3 -c "import sys, yaml, collections; d = yaml.safe_load(sys.stdin); print(dict(collections.Counter(e['category'] for e in d['source_chain'])))" \
  < .mistaber-artifacts/source-chain-log.yaml

List Unique References

python3 -c "import sys, yaml; d = yaml.safe_load(sys.stdin); print(*sorted({e['reference'] for e in d['source_chain']}), sep='\n')" \
  < .mistaber-artifacts/source-chain-log.yaml
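
The same two summaries can be computed from a short script, which is easier to extend than a one-liner. This is a sketch under the assumption that the log has the structure shown under Log File Format; `load_entries`, `count_by_category`, and `unique_references` are hypothetical helper names.

```python
from collections import Counter
from pathlib import Path


def load_entries(path: Path) -> list[dict]:
    """Read the source_chain list from the log file (requires PyYAML)."""
    import yaml  # imported here so the helpers below work without PyYAML

    with open(path, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f) or {}
    return data.get("source_chain", [])


def count_by_category(entries: list[dict]) -> dict[str, int]:
    """Return the number of log entries per category."""
    return dict(Counter(e["category"] for e in entries))


def unique_references(entries: list[dict]) -> list[str]:
    """Return the distinct references in sorted order."""
    return sorted({e["reference"] for e in entries})


# Inline sample entries so the example runs without a log file.
sample = [
    {"tool": "get_text", "reference": "Chullin 104b", "category": "text_fetch"},
    {"tool": "get_text", "reference": "Chullin 104b", "category": "text_fetch"},
    {"tool": "text_search", "reference": "search: dag bechalav", "category": "search"},
]
print(count_by_category(sample))
print(unique_references(sample))
```

For a real log, pass the result of load_entries(Path(".mistaber-artifacts/source-chain-log.yaml")) in place of the sample list.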

Debugging

Manual Testing

# Test with get_text
python mistaber-skills/hooks/scripts/sefaria-logger.py \
  "mcp__sefaria_texts__get_text" \
  '{"reference": "Genesis 1:1"}' | jq .

# Test with search
python mistaber-skills/hooks/scripts/sefaria-logger.py \
  "mcp__sefaria_texts__text_search" \
  '{"query": "basar bechalav"}' | jq .

# Check log file
cat .mistaber-artifacts/source-chain-log.yaml

Debug Mode

export MISTABER_DEBUG=1
python mistaber-skills/hooks/scripts/sefaria-logger.py "mcp__sefaria_texts__get_text" '{"reference": "test"}'

Common Issues

Log File Not Created

Symptom: No source-chain-log.yaml file.

Causes:

  - Artifacts directory doesn't exist
  - Permission issues
  - PyYAML not installed

Solutions:

mkdir -p .mistaber-artifacts
pip install pyyaml

Reference Not Logged

Symptom: Sefaria call made but not in log.

Causes:

  - Tool name doesn't match pattern
  - Reference not in expected field
  - Hook timeout

Solutions: Check that the tool name matches mcp__sefaria:

echo "Tool: mcp__sefaria_texts__get_text" | grep "mcp__sefaria"

Log Corrupted

Symptom: YAML parse error on load.

Solutions:

# Validate YAML
python -c "import yaml; yaml.safe_load(open('.mistaber-artifacts/source-chain-log.yaml'))"

# Reset if needed
rm .mistaber-artifacts/source-chain-log.yaml

Integration

With Corpus Preparation

The log is used by corpus-prep to document all sources:

# In corpus-sources-YD-87-3.yaml
sources_fetched:
  primary: ["Shulchan Arukh, Yoreh De'ah 87:3"]
  commentaries: ["Shakh...", "Taz..."]
  chain: ["Tur...", "Rambam...", "Chullin 104b"]
  # Derived from source-chain-log.yaml
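
How the grouping is derived is not shown here. One possible approach, offered purely as a sketch, sorts logged references into the three lists using the primary reference as the pivot. `summarize_sources` is a hypothetical helper, and the "<Commentary> on <primary>" naming rule is an assumption based on the example references on this page.

```python
def summarize_sources(entries: list[dict], primary_ref: str) -> dict[str, list[str]]:
    """Group logged references into primary, commentaries, and chain."""
    summary: dict[str, list[str]] = {"primary": [], "commentaries": [], "chain": []}
    seen = set()
    for entry in entries:
        ref = entry["reference"]
        # Skip repeats and non-source lookups such as searches and topics.
        if ref in seen or entry["category"] not in ("text_fetch", "links_fetch"):
            continue
        seen.add(ref)
        if ref == primary_ref:
            summary["primary"].append(ref)
        elif ref.endswith(f" on {primary_ref}"):
            # Heuristic (assumption): commentaries are named "<Name> on <primary>".
            summary["commentaries"].append(ref)
        else:
            summary["chain"].append(ref)
    return summary


primary = "Shulchan Arukh, Yoreh De'ah 87:3"
entries = [
    {"reference": primary, "category": "text_fetch"},
    {"reference": primary, "category": "links_fetch"},
    {"reference": f"Shakh on {primary}", "category": "text_fetch"},
    {"reference": "Chullin 104b", "category": "text_fetch"},
    {"reference": "search: dag bechalav", "category": "search"},
]
print(summarize_sources(entries, primary))
```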

With Session Archive

The log is archived with the session:

docs/encoding-sessions/yd_87_3_2026-01-25/
├── source-chain-log.yaml  # Complete fetch history
└── ...
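
The archiving step itself amounts to a file copy. A minimal sketch, assuming the session directory path is supplied by the caller; `archive_source_log` is a hypothetical helper and not part of the hook script.

```python
import shutil
from pathlib import Path

DEFAULT_LOG = Path(".mistaber-artifacts/source-chain-log.yaml")


def archive_source_log(session_dir: Path, log_path: Path = DEFAULT_LOG) -> Path:
    """Copy the source chain log into the session directory and return the copy's path."""
    session_dir.mkdir(parents=True, exist_ok=True)
    target = session_dir / log_path.name
    # copy2 copies the contents and also preserves the modification time.
    shutil.copy2(log_path, target)
    return target
```

For the session shown above, the call would be archive_source_log(Path("docs/encoding-sessions/yd_87_3_2026-01-25")).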