
Validation and Testing

The validation phase ensures encoded rules compile correctly, pass semantic checks, and behave as expected. This phase uses the validate skill to run comprehensive tests before human review at Checkpoint 3.

Purpose

The validate skill performs five validation tasks:

  1. Compilation Validation: Compile HLL to ASP, test grounding
  2. Semantic Validation: Check disjointness, inheritance, integrity
  3. Behavioral Testing: Generate and execute test scenarios
  4. Machloket Verification: Confirm both positions are derivable
  5. Regression Testing: Ensure no existing rules are broken

Prerequisites

Before invoking this skill:

  1. HLL-encode checkpoint MUST be approved
  2. Session state shows hll-encode.status: approved
  3. Encoded .lp files exist in ontology directories

Invoking the Skill

After encoding approval:

User: "Run validation"

The skill automatically runs the following steps (sketched in code after the list):

  1. Compiles all affected HLL files
  2. Runs semantic validation checks
  3. Generates and executes test scenarios
  4. Verifies machloket encoding
  5. Runs regression suite
  6. Requests checkpoint approval
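
At a high level, the run looks like the sketch below. The phase functions are hypothetical stand-ins for illustration, not the skill's actual entry points:

def run_validation(reference: str) -> dict:
    """Illustrative driver for the validation pipeline; the phase
    functions here are hypothetical stand-ins, not real entry points."""
    phases = {
        "compilation": run_compilation_checks(),        # Phase A
        "semantic": run_semantic_checks(),              # Phase B
        "behavioral": run_behavioral_tests(reference),  # Phase C
        "queries": run_query_verification(),            # Phase D
        "regression": run_regression_suite(),
    }
    # Checkpoint approval is only requested when every phase passed
    status = "passed" if all(
        p.get("status") == "passed" for p in phases.values()
    ) else "failed"
    return {"reference": reference, "status": status, "phases": phases}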

Phase A: Compilation Validation

Step 1: HLL Compilation

Compile HLL to ASP using the Mistaber compiler:

from mistaber.dsl.compiler import compile_hll_file, CompileError
from pathlib import Path

try:
    # Compile all world files that include the new rules
    for world_file in Path("mistaber/ontology/worlds").glob("*.lp"):
        result = compile_hll_file(world_file)
        print(f"Compiled {world_file}: {result.rules_count} rules")
    compilation_status = "passed"
except CompileError as e:
    compilation_status = "failed"
    error_details = str(e)

Common Compilation Errors:

| Error | Cause | Fix |
|-------|-------|-----|
| ParseError: Unexpected token | Missing period, unbalanced parens | Check syntax |
| NameError: Unknown predicate | Predicate not in registry | Add to sorts.lp or fix spelling |
| TypeError: Arity mismatch | Wrong number of arguments | Check predicate definition |
| DirectiveError: Missing @makor | Normative rule without source | Add makor directive |

Step 2: ASP Grounding Test

Test that Clingo can ground the program:

import clingo

ctl = clingo.Control(["--warn=none"])

# Load all ontology files
for lp_file in Path("mistaber/ontology").rglob("*.lp"):
    ctl.load(str(lp_file))

# Ground
try:
    ctl.ground([("base", [])])
    grounding_status = "passed"
except RuntimeError as e:
    grounding_status = "failed"
    grounding_error = str(e)

Common Grounding Errors:

| Error | Cause | Fix |
|-------|-------|-----|
| RuntimeError: atom undefined | Using undefined predicate | Define the predicate |
| RuntimeError: unsafe variable | Variable not bound in body | Add positive occurrence |
| RuntimeError: cyclic dependency | Rules form infinite loop | Restructure rules |

Step 3: Satisfiability Check

Verify the program is satisfiable (has at least one answer set):

# A single solve call suffices; the SolveResult reports satisfiability
result = ctl.solve()

if result.satisfiable:
    satisfiability = "SAT"
else:
    # UNSAT - contradictory rules
    satisfiability = "UNSAT"

CRITICAL: UNSAT Result

If the program is UNSAT, the rules contain contradictions. This MUST be resolved before proceeding.

Debugging UNSAT:

# Enable debug output for the grounded program
ctl = clingo.Control(["--warn=none", "--output-debug=all"])

# Extract an unsatisfiable core by solving under assumptions
# (assumption_atoms: an illustrative list of (clingo.Symbol, bool) pairs)
with ctl.solve(assumptions=assumption_atoms, yield_=True) as handle:
    handle.get()
    print("Unsat core:", handle.core())

Common UNSAT causes:

| Cause | Example | Fix |
|-------|---------|-----|
| Direct contradiction | issur(X) :- ... . heter(X) :- ... | Check rule conditions |
| Missing conditions | Rule fires when it shouldn't | Add qualifying conditions |
| World inheritance conflict | Base and child disagree | Use override mechanism |

Phase B: Semantic Validation

Step 1: Disjointness Verification

Verify no entity violates disjointness constraints:

% Constraint: No food can be both basar and chalav
:- food(F), food_type(F, basar), food_type(F, chalav).

% Constraint: No mixture can have contradictory status in same world
:- asserts(W, issur(A, M, _)), asserts(W, heter(A, M)),
   not override(W, _, _).

Run validation query:

violations = engine.query("disjoint_violation(X, Y)")
if violations:
    for v in violations:
        semantic_errors.append(f"Disjointness violation: {v}")

Step 2: Madrega Consistency

Verify d'oraita rules have Torah sources:

for rule in rules_with_madrega_doraita:
    makor_chain = get_makor_chain(rule)
    if not has_torah_source(makor_chain):
        warnings.append(
            f"Rule {rule} marked d_oraita but chain doesn't reach Torah"
        )

Madrega Validation Table:

| Madrega | Required Source Terminus |
|---------|--------------------------|
| d_oraita | Torah verse |
| d_rabanan | Mishnah/Gemara at minimum |
| minhag | Any documented minhag source |
| chumra | Commentary recommending stringency |
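
The helpers used above are not defined in this guide. A minimal sketch, assuming source ids carry a type prefix ("torah:shemot:23:19", "mishnah:chullin:8:1", ...); the chain representation is purely illustrative:

# Hypothetical helpers backing the madrega check above
def get_makor_chain(rule):
    """Follow makor/2 links from a rule back toward its root source."""
    chain = []
    current = rule
    while True:
        sources = engine.query(f"makor({current}, S)")
        if not sources:
            return chain
        current = sources[0]["S"]
        if current in chain:  # guard against accidental cycles
            return chain
        chain.append(current)

def has_torah_source(makor_chain):
    """True if any link in the chain is a Torah verse."""
    return any(str(source).startswith("torah:") for source in makor_chain)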

Step 3: World Inheritance Verification

Verify world hierarchy is respected:

# Check that asserts flow correctly through accessible/2
base_assertions = engine.query("asserts(base, X)")

for world in ["mechaber", "rema", "sefardi_yo", "ashk_mb"]:
    world_holds = engine.query(f"holds(X, {world})")

    # Base assertions should propagate unless overridden
    for assertion in base_assertions:
        if assertion not in world_holds:
            if not is_overridden(world, assertion):
                errors.append(
                    f"Inheritance broken: {assertion} not in {world}"
                )
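
The is_overridden helper is assumed above. A minimal sketch, assuming overrides are recorded as override/3 facts as in the Phase B constraint (the argument positions are an assumption):

def is_overridden(world, assertion):
    """True if an explicit override/3 fact shadows the assertion in this
    world; the argument positions are assumed for illustration."""
    return bool(engine.query(f"override({world}, _, {assertion})"))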

World Hierarchy Reference:

graph TD
    BASE["base"] --> MECH["mechaber"]
    BASE --> REMA["rema"]
    MECH --> SEF_YO["sefardi_yo"]
    REMA --> ASHK_MB["ashk_mb"]
    REMA --> ASHK_AH["ashk_ah"]
    BASE --> GRA["gra"]
    GRA --> ASHK_AH

Step 4: Makor Chain Integrity

Verify every normative rule has makor:

normative_rules = engine.query("rule(R), scope(R, _)")
for rule in normative_rules:
    makor = engine.query(f"makor({rule['R']}, _)")
    if not makor:
        errors.append(f"Rule {rule['R']} missing makor")

Phase C: Behavioral Testing

Step 1: Generate Test Scenarios from Corpus

Based on the corpus statements, generate test cases:

# test-scenarios-YD-87-3.yaml
scenarios:
  - id: test_dag_chalav_mechaber
    description: "Fish + dairy mixture should trigger sakana in mechaber world"
    category: positive
    world: mechaber
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
    expected:
      - holds(sakana(m1), mechaber)
      - holds(issur(achiila, m1, sakana), mechaber)

  - id: test_dag_chalav_rema
    description: "Same mixture should show heter in rema world"
    category: positive
    world: rema
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
    expected:
      - holds(heter(achiila, m1), rema)
      - holds(no_sakana(m1), rema)

Step 2: Positive Tests

Test that expected conclusions hold:

def run_positive_test(scenario):
    """Verify expected atoms appear in result."""
    result = engine.analyze(
        scenario['setup'],
        world=scenario['world']
    )

    failures = []
    for expected in scenario['expected']:
        if expected not in result['atoms']:
            failures.append(f"Expected {expected} not found")

    return {
        'scenario_id': scenario['id'],
        'status': 'passed' if not failures else 'failed',
        'failures': failures
    }

Positive Test Categories:

| Category | Tests For |
|----------|-----------|
| Basic prohibition | issur derived for matching mixture |
| Basic permission | heter derived for permitted case |
| World-specific | Correct ruling in each world |
| Machloket | Both positions derivable |

Step 3: Negative Tests

Test that incorrect conclusions do NOT hold:

def run_negative_test(scenario):
    """Verify unexpected atoms do NOT appear."""
    result = engine.analyze(
        scenario['setup'],
        world=scenario['world']
    )

    failures = []
    for unexpected in scenario.get('unexpected', []):
        if unexpected in result['atoms']:
            failures.append(f"Unexpected {unexpected} found")

    return {
        'scenario_id': scenario['id'],
        'status': 'passed' if not failures else 'failed',
        'failures': failures
    }

Example Negative Test:

- id: test_no_sakana_in_rema
  description: "Rema world should NOT have sakana for fish+dairy"
  category: negative
  world: rema
  setup: |
    food(salmon). food_type(salmon, dag).
    food(cream). food_type(cream, chalav).
    mixture(m1). contains(m1, salmon). contains(m1, cream).
  unexpected:
    - holds(sakana(m1), rema)
    - holds(issur(achiila, m1, sakana), rema)
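
A small runner can load a scenario file and dispatch on the category field. This sketch assumes PyYAML and the file layout shown in Step 1:

from pathlib import Path
import yaml

def run_scenario_file(path):
    """Run every scenario in a YAML file, dispatching by category."""
    scenarios = yaml.safe_load(Path(path).read_text())["scenarios"]
    runners = {
        "positive": run_positive_test,
        "negative": run_negative_test,
    }
    return [runners[s["category"]](s) for s in scenarios]

results = run_scenario_file("test-scenarios-YD-87-3.yaml")
failures = [r for r in results if r["status"] == "failed"]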

Step 4: Edge Cases

Test boundary conditions:

edge_cases:
  - id: empty_mixture
    description: "Empty mixture should not trigger rules"
    setup: "mixture(m_empty)."
    world: base
    expected:
      - not holds(issur(achiila, m_empty, _), base)

  - id: single_component
    description: "Single component mixture (no chalav)"
    setup: |
      food(beef). food_type(beef, beheima).
      mixture(m_single). contains(m_single, beef).
    world: base
    expected:
      - not holds(issur(achiila, m_single, _), base)

  - id: parve_mixture
    description: "Parve foods should be permitted"
    setup: |
      food(carrot). food_type(carrot, parve).
      food(rice). food_type(rice, parve).
      mixture(m_parve). contains(m_parve, carrot). contains(m_parve, rice).
    world: base
    expected:
      - not holds(issur(achiila, m_parve, _), base)
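
These expectations use a not prefix to assert absence. One way the check could interpret that convention (a sketch; atoms containing wildcards like _ would really need an engine query rather than plain membership):

def check_expectation(expected, atoms):
    """Interpret a 'not ...' expectation as asserting absence."""
    if expected.startswith("not "):
        # NOTE: wildcard arguments (_) would need engine.query() instead
        return expected[len("not "):] not in atoms
    return expected in atoms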

Step 5: Machloket Verification

Verify both positions are derivable in their respective worlds:

def verify_machloket(machloket_entry):
    """Verify machloket is correctly encoded."""
    topic = machloket_entry['topic']
    auth_a = machloket_entry['position_a']['authority']
    auth_b = machloket_entry['position_b']['authority']

    # Create test scenario
    setup = machloket_entry.get('test_setup', '')

    # Check position A holds in world A
    result_a = engine.analyze(setup, world=auth_a)
    position_a_holds = check_position(
        result_a,
        machloket_entry['position_a']['ruling']
    )

    # Check position B holds in world B
    result_b = engine.analyze(setup, world=auth_b)
    position_b_holds = check_position(
        result_b,
        machloket_entry['position_b']['ruling']
    )

    # Check machloket marker exists
    marker = engine.query(
        f"machloket({topic}, {auth_a}, {auth_b}, _)"
    )

    return {
        'topic': topic,
        'position_a_derivable': position_a_holds,
        'position_b_derivable': position_b_holds,
        'marker_exists': bool(marker),
        'status': 'passed' if all([
            position_a_holds,
            position_b_holds,
            marker
        ]) else 'failed'
    }
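
check_position is assumed by verify_machloket; in its simplest form it just tests whether the expected ruling atom was derived:

def check_position(result, ruling):
    """True if the position's expected ruling appears in the result."""
    return ruling in result["atoms"]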

Machloket Test Output:

machloket_verification:
  - topic: dag_chalav
    position_a: {authority: mechaber, holds: true}
    position_b: {authority: rema, holds: true}
    marker_exists: true
    status: passed

Step 6: Regression Testing

Run all existing tests to ensure no regressions:

pytest tests/ontology/ -v --tb=short

Parse results:

def parse_pytest_results(output):
    """Parse pytest output for summary."""
    import re

    def count(label):
        # re.search returns None when a label is absent, so guard it
        match = re.search(rf'(\d+) {label}', output)
        return int(match.group(1)) if match else 0

    passed = count('passed')
    failed = count('failed')
    skipped = count('skipped')

    return {
        'passed': passed,
        'failed': failed,
        'skipped': skipped,
        'total': passed + failed + skipped,
        'success': failed == 0
    }
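
For example, the suite can be driven from Python with the standard library and fed to the parser:

import subprocess

proc = subprocess.run(
    ["pytest", "tests/ontology/", "-v", "--tb=short"],
    capture_output=True,
    text=True,
)
regression = parse_pytest_results(proc.stdout)
assert regression["success"], f"{regression['failed']} regression failure(s)"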

Regression Criteria:

| Metric | Requirement |
|--------|-------------|
| Failed tests | 0 |
| Skipped tests | Documented reason |
| Coverage | No decrease |

Step 7: Context Sensitivity Testing

Test lechatchila/bedieved distinctions:

context_tests:
  - id: test_lechatchila_strict
    description: "Lechatchila context should be strict"
    context: lechatchila
    setup: |
      food(beef). food_type(beef, beheima).
      food(milk). food_type(milk, chalav).
      mixture(m1). contains(m1, beef). contains(m1, milk).
      context(lechatchila).
    world: base
    expected:
      - holds(issur(achiila, m1, d_oraita), base)

  - id: test_bedieved_lenient
    description: "Bedieved with bitul may permit"
    context: bedieved
    setup: |
      food(beef). food_type(beef, beheima).
      food(milk). food_type(milk, chalav).
      mixture(m1). contains(m1, beef). contains(m1, milk).
      context(bedieved).
      batel_b_shishim(m1).
    world: base
    expected:
      - holds(heter(achiila, m1), base)

Phase D: Query Verification

Step 1: Engine API Testing

Test all engine methods work with new rules:

from pathlib import Path
from mistaber.engine import HsrsEngine

engine = HsrsEngine(Path("mistaber/ontology"))

# Test ask() - boolean query
assert engine.ask(
    "holds(issur(achiila, m1, d_oraita), base)",
    additional_facts="mixture(m1). is_beheima_chalav_mixture(m1)."
)

# Test query() - find all matches
results = engine.query("holds(issur(A, M, L), W)")
assert len(results) > 0

# Test compare() - cross-world comparison
comparison = engine.compare(
    "holds(issur(achiila, M, L), W)",
    worlds=["mechaber", "rema"]
)
assert "mechaber" in comparison
assert "rema" in comparison

Step 2: Derivation Chain Testing

Test explain() produces valid derivation:

explanation = engine.explain(
    "holds(issur(achiila, m1, d_oraita), mechaber)",
    world="mechaber",
    additional_facts="""
        food(beef). food_type(beef, beheima).
        food(milk). food_type(milk, chalav).
        mixture(m1). contains(m1, beef). contains(m1, milk).
    """
)

assert explanation['derivation'] is not None
assert len(explanation['tree']) > 0
print("Derivation tree:", explanation['tree'])

Expected Derivation Output:

holds(issur(achiila, m1, d_oraita), mechaber)
├─ asserts(mechaber, issur(achiila, m1, d_oraita))
│  ├─ is_beheima_chalav_mixture(m1)
│  │  ├─ mixture_has_basar(m1, beheima)
│  │  │  └─ food_type(beef, beheima)
│  │  └─ mixture_has_chalav(m1)
│  │     └─ food_type(milk, chalav)
│  └─ rule(r_bb_beheima_achiila)
│     └─ makor(r_bb_beheima_achiila, sa("yd:87:1"))
└─ accessible(mechaber, base)

Step 3: Performance Testing

Ensure queries complete in reasonable time:

import time

start = time.time()
for _ in range(100):
    engine.query("holds(issur(A, M, L), W)")
elapsed = time.time() - start

avg_query_time = elapsed / 100
assert avg_query_time < 0.1, f"Query too slow: {avg_query_time}s"

performance = {
    'avg_query_ms': avg_query_time * 1000,
    'queries_tested': 100,
    'acceptable': avg_query_time < 0.1
}

Performance Thresholds:

| Metric | Acceptable | Warning | Failing |
|--------|------------|---------|---------|
| Avg query time | < 50ms | 50-100ms | > 100ms |
| Grounding time | < 500ms | 500-1000ms | > 1000ms |
| Memory usage | < 100MB | 100-500MB | > 500MB |
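
Grounding time can be measured the same way as query time, mirroring the grounding test from Phase A:

import time
from pathlib import Path
import clingo

ctl = clingo.Control(["--warn=none"])
for lp_file in Path("mistaber/ontology").rglob("*.lp"):
    ctl.load(str(lp_file))

start = time.time()
ctl.ground([("base", [])])
grounding_ms = (time.time() - start) * 1000
assert grounding_ms < 1000, f"Grounding too slow: {grounding_ms:.0f}ms"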

Phase E: Validation Report

Output 1: Validation Report

validation-report-YD-{siman}-{seif}.md

# Validation Report: YD 87:3

## Summary
- **Reference:** YD 87:3
- **Validated:** 2026-01-25T15:30:00Z
- **Status:** PASSED

## Compilation
| Check | Status |
|-------|--------|
| HLL Compilation | PASSED |
| ASP Grounding | PASSED |
| Satisfiability | SAT |

## Semantic Validation
| Check | Status |
|-------|--------|
| Disjointness | PASSED |
| Madrega Consistency | PASSED |
| World Inheritance | PASSED |
| Makor Integrity | PASSED |

## Behavioral Tests
| Category | Passed | Failed | Total |
|----------|--------|--------|-------|
| Positive Tests | 10 | 0 | 10 |
| Negative Tests | 5 | 0 | 5 |
| Edge Cases | 3 | 0 | 3 |
| Machloket | 1 | 0 | 1 |

## Regression
| Metric | Value |
|--------|-------|
| Total Tests | 495 |
| Passed | 495 |
| Failed | 0 |
| Skipped | 0 |

## Query Verification
| Method | Status |
|--------|--------|
| ask() | PASSED |
| query() | PASSED |
| compare() | PASSED |
| explain() | PASSED |

## Performance
| Metric | Value | Status |
|--------|-------|--------|
| Avg Query Time | 45ms | ACCEPTABLE |
| Grounding Time | 120ms | ACCEPTABLE |

## Issues Found
None

Output 2: Validation Results YAML

validation-results-YD-{siman}-{seif}.yaml

reference: "YD:87:3"
validated_at: "2026-01-25T15:30:00Z"
status: passed

compilation:
  hll_compilation: passed
  asp_grounding: passed
  satisfiability: SAT

semantic:
  disjointness: passed
  madrega_consistency: passed
  world_inheritance: passed
  makor_integrity: passed

behavioral:
  positive_tests:
    total: 10
    passed: 10
    failed: 0
    scenarios:
      - {id: test_dag_chalav_mechaber, status: passed}
      - {id: test_beheima_chalav_base, status: passed}
      # ...

  negative_tests:
    total: 5
    passed: 5
    failed: 0

  edge_cases:
    total: 3
    passed: 3
    failed: 0

  machloket:
    verified: true
    topics:
      - {topic: dag_chalav, status: passed}

  regression:
    total: 495
    passed: 495
    failed: 0
    skipped: 0

performance:
  avg_query_ms: 45
  grounding_ms: 120

Output 3: Test Scenarios

test-scenarios-YD-{siman}-{seif}.yaml

Complete test scenarios for future regression testing (archived for permanent test suite).

Output 4: Performance Metrics

performance-metrics-YD-{siman}-{seif}.yaml

Detailed timing data for trend analysis.

Checkpoint Criteria

Before requesting human approval, verify (a programmatic check of these criteria is sketched after the list):

  • [ ] Compilation successful (no syntax errors)
  • [ ] Program is SAT (no contradictions)
  • [ ] All semantic checks pass
  • [ ] All generated tests pass (positive, negative, edge)
  • [ ] Machloket properly verified (both positions derivable)
  • [ ] No regression failures
  • [ ] Performance acceptable (< 100ms avg query)
  • [ ] All four outputs generated
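
A sketch of that programmatic check, reading the key names from Output 2 (the validation-results YAML):

import yaml

def checkpoint_ready(results_path):
    """Evaluate the checkpoint criteria against validation results.
    Existence of the four output artifacts is checked separately."""
    with open(results_path) as f:
        r = yaml.safe_load(f)
    behavioral = r["behavioral"]
    return all([
        r["compilation"]["hll_compilation"] == "passed",
        r["compilation"]["satisfiability"] == "SAT",
        all(v == "passed" for v in r["semantic"].values()),
        behavioral["positive_tests"]["failed"] == 0,
        behavioral["negative_tests"]["failed"] == 0,
        behavioral["edge_cases"]["failed"] == 0,
        behavioral["machloket"]["verified"],
        behavioral["regression"]["failed"] == 0,
        r["performance"]["avg_query_ms"] < 100,
    ])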

Session State Update

After completing validation:

current_phase: validate
checkpoints:
  validate:
    status: pending_review
    artifacts:
      - validation-report-YD-87-3.md
      - validation-results-YD-87-3.yaml
      - test-scenarios-YD-87-3.yaml
      - performance-metrics-YD-87-3.yaml
    test_summary:
      total: 19
      passed: 19
      failed: 0
    regression:
      total: 495
      passed: 495
    performance:
      avg_query_ms: 45

Troubleshooting

"UNSAT - No models found"

The rules contain contradictions:

  1. Check for conflicting issur/heter without override
  2. Verify world inheritance doesn't create conflict
  3. Look for circular rule dependencies
  4. Surface warnings (drop --warn=none) and inspect grounding with --output-debug=all

"Test failed: Expected X not found"

The expected conclusion isn't being derived:

  1. Check rule conditions match test setup
  2. Verify world scoping is correct
  3. Check for missing predicates in setup
  4. Use explain() to trace derivation

"Regression failure in existing test"

A new rule broke existing functionality:

  1. Identify which rule caused the break
  2. Check if override is needed
  3. Verify new rule conditions are specific enough
  4. Consider if existing test is still valid

"Performance degraded"

Queries are slower than acceptable:

  1. Check for unbounded rules (no conditions)
  2. Look for redundant derivation paths
  3. Consider adding auxiliary predicates
  4. Profile with Clingo's --stats flag

Best Practices

  1. Run validation frequently - Catch errors early during encoding
  2. Write tests as you encode - Don't leave testing until the end
  3. Test edge cases explicitly - Empty mixtures, single components
  4. Verify both machloket positions - Don't assume inheritance works
  5. Check performance trends - Watch for degradation over time

Next Phase

After Checkpoint 3 approval, proceed to Review and Approval.