# Validation and Testing

The validation phase ensures encoded rules compile correctly, pass semantic checks, and behave as expected. This phase uses the validate skill to run comprehensive tests before human review at Checkpoint 3.
## Purpose

The validate skill performs five kinds of checks:

- **Compilation Validation:** Compile HLL to ASP and test grounding
- **Semantic Validation:** Check disjointness, inheritance, and makor integrity
- **Behavioral Testing:** Generate and execute test scenarios
- **Machloket Verification:** Confirm both positions are derivable
- **Regression Testing:** Ensure no existing rules are broken
## Prerequisites

Before invoking this skill:

- The HLL-encode checkpoint MUST be approved
- Session state shows `hll-encode.status: approved`
- Encoded `.lp` files exist in the ontology directories
## Invoking the Skill

After encoding approval, invoke the validate skill. It automatically:

- Compiles all affected HLL files
- Runs semantic validation checks
- Generates and executes test scenarios
- Verifies machloket encoding
- Runs the regression suite
- Requests checkpoint approval
## Phase A: Compilation Validation

### Step 1: HLL Compilation

Compile HLL to ASP using the Mistaber compiler:

```python
from mistaber.dsl.compiler import compile_hll_file, CompileError
from pathlib import Path

try:
    # Compile all world files that include the new rules
    for world_file in Path("mistaber/ontology/worlds").glob("*.lp"):
        result = compile_hll_file(world_file)
        print(f"Compiled {world_file}: {result.rules_count} rules")
    compilation_status = "passed"
except CompileError as e:
    compilation_status = "failed"
    error_details = str(e)
```
Common Compilation Errors:

| Error | Cause | Fix |
|---|---|---|
| `ParseError: Unexpected token` | Missing period, unbalanced parens | Check syntax |
| `NameError: Unknown predicate` | Predicate not in registry | Add to `sorts.lp` or fix spelling |
| `TypeError: Arity mismatch` | Wrong number of arguments | Check predicate definition |
| `DirectiveError: Missing @makor` | Normative rule without source | Add makor directive |
### Step 2: ASP Grounding Test

Test that Clingo can ground the program:

```python
from pathlib import Path

import clingo

ctl = clingo.Control(["--warn=none"])

# Load all ontology files
for lp_file in Path("mistaber/ontology").rglob("*.lp"):
    ctl.load(str(lp_file))

# Ground
try:
    ctl.ground([("base", [])])
    grounding_status = "passed"
except RuntimeError as e:
    grounding_status = "failed"
    grounding_error = str(e)
```
Common Grounding Errors:

| Error | Cause | Fix |
|---|---|---|
| `RuntimeError: atom undefined` | Using undefined predicate | Define the predicate |
| `RuntimeError: unsafe variable` | Variable not bound in body | Add positive occurrence |
| `RuntimeError: cyclic dependency` | Rules form infinite loop | Restructure rules |
### Step 3: Satisfiability Check

Verify the program is satisfiable (has at least one answer set):

```python
with ctl.solve(yield_=True) as handle:
    model_count = 0
    for model in handle:
        model_count += 1
        if model_count >= 1:
            break

if model_count == 0:
    # UNSAT - contradictory rules
    satisfiability = "UNSAT"
else:
    satisfiability = "SAT"
```
**CRITICAL: UNSAT Result**

If the program is UNSAT, the rules contain contradictions. This MUST be resolved before proceeding.

Debugging UNSAT:

```python
# Emit debug atoms during grounding so generated rules can be inspected
ctl = clingo.Control(["--warn=none", "--output-debug=all"])
# To localize the conflict, bisect the rule files (see the sketch
# after the table below)
```
Common UNSAT causes:

| Cause | Example | Fix |
|---|---|---|
| Direct contradiction | `issur(X) :- ... .` and `heter(X) :- ...` | Check rule conditions |
| Missing conditions | Rule fires when it shouldn't | Add qualifying conditions |
| World inheritance conflict | Base and child disagree | Use override mechanism |
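One low-tech way to localize a contradiction, reusing the clingo loading pattern from Step 2 (a sketch; the ontology path is the one used throughout this section):

```python
from pathlib import Path

import clingo

# Leave one ontology file out at a time; if the program becomes SAT,
# the omitted file participates in the contradiction. A crude but
# effective bisection aid for small ontologies.
files = sorted(Path("mistaber/ontology").rglob("*.lp"))
for omitted in files:
    ctl = clingo.Control(["--warn=none"])
    for f in files:
        if f != omitted:
            ctl.load(str(f))
    ctl.ground([("base", [])])
    if ctl.solve().satisfiable:
        print(f"SAT without {omitted} -- inspect this file")
```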
## Phase B: Semantic Validation

### Step 1: Disjointness Verification

Verify no entity violates disjointness constraints:

```prolog
% Constraint: No food can be both basar and chalav
:- food(F), food_type(F, basar), food_type(F, chalav).

% Constraint: No mixture can have contradictory status in same world
:- asserts(W, issur(A, M, _)), asserts(W, heter(A, M)),
   not override(W, _, _).
```

Run validation query:

```python
violations = engine.query("disjoint_violation(X, Y)")
if violations:
    for v in violations:
        semantic_errors.append(f"Disjointness violation: {v}")
```
### Step 2: Madrega Consistency

Verify d'oraita rules have Torah sources:

```python
for rule in rules_with_madrega_doraita:
    makor_chain = get_makor_chain(rule)
    if not has_torah_source(makor_chain):
        warnings.append(
            f"Rule {rule} marked d_oraita but chain doesn't reach Torah"
        )
```
Madrega Validation Table:

| Madrega | Required Source Terminus |
|---|---|
| `d_oraita` | Torah verse |
| `d_rabanan` | Mishnah/Gemara at minimum |
| `minhag` | Any documented minhag source |
| `chumra` | Commentary recommending stringency |
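The helpers `get_makor_chain` and `has_torah_source` are assumed above but not defined in this section. A minimal sketch, assuming each chain link is a dict with a `source_type` field (the field and type names are illustrative, not the actual Mistaber API):

```python
# Map each madrega to the termini the table above accepts
REQUIRED_TERMINUS = {
    "d_oraita": {"torah_verse"},
    "d_rabanan": {"torah_verse", "mishnah", "gemara"},
    "minhag": {"minhag_source"},
    "chumra": {"commentary"},
}

def has_torah_source(makor_chain):
    """True if some link in the chain terminates at a Torah verse."""
    return any(link["source_type"] == "torah_verse" for link in makor_chain)

def chain_reaches(madrega, makor_chain):
    """Generalization: does the chain reach a terminus the table permits?"""
    allowed = REQUIRED_TERMINUS.get(madrega, set())
    return any(link["source_type"] in allowed for link in makor_chain)
```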
### Step 3: World Inheritance Verification

Verify the world hierarchy is respected:

```python
# Check that assertions flow correctly through accessible/2
for world in ["mechaber", "rema", "sefardi_yo", "ashk_mb"]:
    base_assertions = engine.query("asserts(base, X)")
    world_holds = engine.query(f"holds(X, {world})")

    # Base assertions should propagate unless overridden
    for assertion in base_assertions:
        if assertion not in world_holds:
            if not is_overridden(world, assertion):
                errors.append(
                    f"Inheritance broken: {assertion} not in {world}"
                )
```
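`is_overridden` is assumed above but not defined here; a plausible sketch matching the 3-ary `override/3` used in the disjointness constraints (a hypothetical helper, not the actual API):

```python
def is_overridden(world, assertion):
    # Assumption: override(World, Assertion, Reason) facts mark
    # deliberate departures from inherited assertions
    return bool(engine.query(f"override({world}, {assertion}, _)"))
```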
World Hierarchy Reference:

```mermaid
graph TD
    BASE["base"] --> MECH["mechaber"]
    BASE --> REMA["rema"]
    MECH --> SEF_YO["sefardi_yo"]
    REMA --> ASHK_MB["ashk_mb"]
    REMA --> ASHK_AH["ashk_ah"]
    BASE --> GRA["gra"]
    GRA --> ASHK_AH
```
### Step 4: Makor Chain Integrity

Verify every normative rule has a makor:

```python
normative_rules = engine.query("rule(R), scope(R, _)")
for rule in normative_rules:
    makor = engine.query(f"makor({rule['R']}, _)")
    if not makor:
        errors.append(f"Rule {rule['R']} missing makor")
```
## Phase C: Behavioral Testing

### Step 1: Generate Test Scenarios from Corpus

Based on the corpus statements, generate test cases:

```yaml
# test-scenarios-YD-87-3.yaml
scenarios:
  - id: test_dag_chalav_mechaber
    description: "Fish + dairy mixture should trigger sakana in mechaber world"
    category: positive
    world: mechaber
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
    expected:
      - holds(sakana(m1), mechaber)
      - holds(issur(achiila, m1, sakana), mechaber)

  - id: test_dag_chalav_rema
    description: "Same mixture should show heter in rema world"
    category: positive
    world: rema
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
    expected:
      - holds(heter(achiila, m1), rema)
      - holds(no_sakana(m1), rema)
```
### Step 2: Positive Tests

Test that expected conclusions hold:

```python
def run_positive_test(scenario):
    """Verify expected atoms appear in the result."""
    result = engine.analyze(
        scenario['setup'],
        world=scenario['world']
    )
    failures = []
    for expected in scenario['expected']:
        if expected not in result['atoms']:
            failures.append(f"Expected {expected} not found")
    return {
        'scenario_id': scenario['id'],
        'status': 'passed' if not failures else 'failed',
        'failures': failures
    }
```
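A minimal driver for the scenarios generated in Step 1 (a sketch; assumes PyYAML and the file name shown above):

```python
import yaml

# Load the generated scenarios and run the positive ones
with open("test-scenarios-YD-87-3.yaml") as f:
    scenarios = yaml.safe_load(f)["scenarios"]

results = [run_positive_test(s) for s in scenarios
           if s["category"] == "positive"]
failed = [r for r in results if r["status"] == "failed"]
print(f"{len(results) - len(failed)}/{len(results)} positive tests passed")
```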
Positive Test Categories:
| Category | Tests For |
|---|---|
| Basic prohibition | issur derived for matching mixture |
| Basic permission | heter derived for permitted case |
| World-specific | Correct ruling in each world |
| Machloket | Both positions derivable |
### Step 3: Negative Tests

Test that incorrect conclusions do NOT hold:

```python
def run_negative_test(scenario):
    """Verify unexpected atoms do NOT appear."""
    result = engine.analyze(
        scenario['setup'],
        world=scenario['world']
    )
    failures = []
    for unexpected in scenario.get('unexpected', []):
        if unexpected in result['atoms']:
            failures.append(f"Unexpected {unexpected} found")
    return {
        'scenario_id': scenario['id'],
        'status': 'passed' if not failures else 'failed',
        'failures': failures
    }
```
Example Negative Test:

```yaml
- id: test_no_sakana_in_rema
  description: "Rema world should NOT have sakana for fish+dairy"
  category: negative
  world: rema
  setup: |
    food(salmon). food_type(salmon, dag).
    food(cream). food_type(cream, chalav).
    mixture(m1). contains(m1, salmon). contains(m1, cream).
  unexpected:
    - holds(sakana(m1), rema)
    - holds(issur(achiila, m1, sakana), rema)
```
### Step 4: Edge Cases

Test boundary conditions:

```yaml
edge_cases:
  - id: empty_mixture
    description: "Empty mixture should not trigger rules"
    setup: "mixture(m_empty)."
    world: base
    expected:
      - not holds(issur(achiila, m_empty, _), base)

  - id: single_component
    description: "Single component mixture (no chalav)"
    setup: |
      food(beef). food_type(beef, beheima).
      mixture(m_single). contains(m_single, beef).
    world: base
    expected:
      - not holds(issur(achiila, m_single, _), base)

  - id: parve_mixture
    description: "Parve foods should be permitted"
    setup: |
      food(carrot). food_type(carrot, parve).
      food(rice). food_type(rice, parve).
      mixture(m_parve). contains(m_parve, carrot). contains(m_parve, rice).
    world: base
    expected:
      - not holds(issur(achiila, m_parve, _), base)
```
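These entries use a `not` prefix inside `expected`, which `run_positive_test` above would not match literally. One way a runner might interpret it (an assumption about the scenario format, not a documented convention):

```python
def check_expectation(expected, atoms):
    """Treat a leading 'not ' as a negative expectation."""
    if expected.startswith("not "):
        return expected[len("not "):] not in atoms
    return expected in atoms
```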
### Step 5: Machloket Verification

Verify both positions are derivable in their respective worlds:

```python
def verify_machloket(machloket_entry):
    """Verify a machloket is correctly encoded."""
    topic = machloket_entry['topic']
    auth_a = machloket_entry['position_a']['authority']
    auth_b = machloket_entry['position_b']['authority']

    # Create test scenario
    setup = machloket_entry.get('test_setup', '')

    # Check position A holds in world A
    result_a = engine.analyze(setup, world=auth_a)
    position_a_holds = check_position(
        result_a,
        machloket_entry['position_a']['ruling']
    )

    # Check position B holds in world B
    result_b = engine.analyze(setup, world=auth_b)
    position_b_holds = check_position(
        result_b,
        machloket_entry['position_b']['ruling']
    )

    # Check machloket marker exists
    marker = engine.query(
        f"machloket({topic}, {auth_a}, {auth_b}, _)"
    )

    return {
        'topic': topic,
        'position_a_derivable': position_a_holds,
        'position_b_derivable': position_b_holds,
        'marker_exists': bool(marker),
        'status': 'passed' if all([
            position_a_holds,
            position_b_holds,
            marker
        ]) else 'failed'
    }
```
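An illustrative entry mirroring the dag/chalav scenarios above; the `ruling` strings are an assumption about what `check_position` expects:

```python
entry = {
    'topic': 'dag_chalav',
    'position_a': {'authority': 'mechaber',
                   'ruling': 'holds(sakana(m1), mechaber)'},
    'position_b': {'authority': 'rema',
                   'ruling': 'holds(heter(achiila, m1), rema)'},
    'test_setup': (
        "food(salmon). food_type(salmon, dag). "
        "food(cream). food_type(cream, chalav). "
        "mixture(m1). contains(m1, salmon). contains(m1, cream)."
    ),
}
print(verify_machloket(entry))
```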
Machloket Test Output:

```yaml
machloket_verification:
  - topic: dag_chalav
    position_a: {authority: mechaber, holds: true}
    position_b: {authority: rema, holds: true}
    marker_exists: true
    status: passed
```
### Step 6: Regression Testing

Run all existing tests to ensure no regressions:
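The exact invocation is project-specific; a minimal sketch using subprocess (the `tests/` path is an assumption):

```python
import subprocess

# Run the existing suite quietly and capture stdout for parsing below
proc = subprocess.run(
    ["pytest", "tests/", "-q"],
    capture_output=True,
    text=True,
)
```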
Parse results:
```python
import re

def parse_pytest_results(output):
    """Parse pytest output for a summary."""
    def count(label):
        # re.search returns None when a label is absent, so guard it
        m = re.search(rf'(\d+) {label}', output)
        return int(m.group(1)) if m else 0

    passed = count('passed')
    failed = count('failed')
    skipped = count('skipped')
    return {
        'passed': passed,
        'failed': failed,
        'skipped': skipped,
        'total': passed + failed + skipped,
        'success': failed == 0
    }
```
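Tying the two pieces together (continuing the sketch above):

```python
summary = parse_pytest_results(proc.stdout)
assert summary['success'], f"{summary['failed']} regression failure(s)"
```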
Regression Criteria:
| Metric | Requirement |
|---|---|
| Failed tests | 0 |
| Skipped tests | Documented reason |
| Coverage | No decrease |
### Step 7: Context Sensitivity Testing

Test lechatchila/bedieved distinctions:

```yaml
context_tests:
  - id: test_lechatchila_strict
    description: "Lechatchila context should be strict"
    context: lechatchila
    setup: |
      food(beef). food_type(beef, beheima).
      food(milk). food_type(milk, chalav).
      mixture(m1). contains(m1, beef). contains(m1, milk).
      context(lechatchila).
    world: base
    expected:
      - holds(issur(achiila, m1, d_oraita), base)

  - id: test_bedieved_lenient
    description: "Bedieved with bitul may permit"
    context: bedieved
    setup: |
      food(beef). food_type(beef, beheima).
      food(milk). food_type(milk, chalav).
      mixture(m1). contains(m1, beef). contains(m1, milk).
      context(bedieved).
      batel_b_shishim(m1).
    world: base
    expected:
      - holds(heter(achiila, m1), base)
```
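The bedieved scenario can also be run directly, reusing `engine.analyze` as in the behavioral tests above (a sketch):

```python
# Context facts enter through the setup string like any other fact
result = engine.analyze(
    "food(beef). food_type(beef, beheima). "
    "food(milk). food_type(milk, chalav). "
    "mixture(m1). contains(m1, beef). contains(m1, milk). "
    "context(bedieved). batel_b_shishim(m1).",
    world="base",
)
assert "holds(heter(achiila, m1), base)" in result["atoms"]
```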
## Phase D: Query Verification

### Step 1: Engine API Testing

Test that all engine methods work with the new rules:

```python
from pathlib import Path

from mistaber.engine import HsrsEngine

engine = HsrsEngine(Path("mistaber/ontology"))

# Test ask() - boolean query
assert engine.ask(
    "holds(issur(achiila, m1, d_oraita), base)",
    additional_facts="mixture(m1). is_beheima_chalav_mixture(m1)."
)

# Test query() - find all matches
results = engine.query("holds(issur(A, M, L), W)")
assert len(results) > 0

# Test compare() - cross-world comparison
comparison = engine.compare(
    "holds(issur(achiila, M, L), W)",
    worlds=["mechaber", "rema"]
)
assert "mechaber" in comparison
assert "rema" in comparison
```
### Step 2: Derivation Chain Testing

Test that explain() produces a valid derivation:

```python
explanation = engine.explain(
    "holds(issur(achiila, m1, d_oraita), mechaber)",
    world="mechaber",
    additional_facts="""
        food(beef). food_type(beef, beheima).
        food(milk). food_type(milk, chalav).
        mixture(m1). contains(m1, beef). contains(m1, milk).
    """
)
assert explanation['derivation'] is not None
assert len(explanation['tree']) > 0
print("Derivation tree:", explanation['tree'])
```
Expected Derivation Output:

```text
holds(issur(achiila, m1, d_oraita), mechaber)
├─ asserts(mechaber, issur(achiila, m1, d_oraita))
│  ├─ is_beheima_chalav_mixture(m1)
│  │  ├─ mixture_has_basar(m1, beheima)
│  │  │  └─ food_type(beef, beheima)
│  │  └─ mixture_has_chalav(m1)
│  │     └─ food_type(milk, chalav)
│  └─ rule(r_bb_beheima_achiila)
│     └─ makor(r_bb_beheima_achiila, sa("yd:87:1"))
└─ accessible(mechaber, base)
```
### Step 3: Performance Testing

Ensure queries complete in reasonable time:

```python
import time

start = time.time()
for _ in range(100):
    engine.query("holds(issur(A, M, L), W)")
elapsed = time.time() - start

avg_query_time = elapsed / 100
# Hard failure above the 100ms limit; the stricter 50ms "acceptable"
# band from the thresholds table below is reported separately
assert avg_query_time < 0.1, f"Query too slow: {avg_query_time}s"

performance = {
    'avg_query_ms': avg_query_time * 1000,
    'queries_tested': 100,
    'acceptable': avg_query_time < 0.05
}
```
Performance Thresholds:
| Metric | Acceptable | Warning | Failing |
|---|---|---|---|
| Avg query time | < 50ms | 50-100ms | > 100ms |
| Grounding time | < 500ms | 500-1000ms | > 1000ms |
| Memory usage | < 100MB | 100-500MB | > 500MB |
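The query loop above covers only the first row of the table; a companion sketch for grounding time, reusing the clingo loading pattern from Phase A:

```python
import time
from pathlib import Path

import clingo

# Measure grounding time against the thresholds above
ctl = clingo.Control(["--warn=none"])
for lp_file in Path("mistaber/ontology").rglob("*.lp"):
    ctl.load(str(lp_file))

start = time.time()
ctl.ground([("base", [])])
grounding_ms = (time.time() - start) * 1000
assert grounding_ms < 1000, f"Grounding too slow: {grounding_ms:.0f}ms"
```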
## Phase E: Validation Report

### Output 1: Validation Report

`validation-report-YD-{siman}-{seif}.md`

```markdown
# Validation Report: YD 87:3

## Summary
- **Reference:** YD 87:3
- **Validated:** 2026-01-25T15:30:00Z
- **Status:** PASSED

## Compilation
| Check | Status |
|-------|--------|
| HLL Compilation | PASSED |
| ASP Grounding | PASSED |
| Satisfiability | SAT |

## Semantic Validation
| Check | Status |
|-------|--------|
| Disjointness | PASSED |
| Madrega Consistency | PASSED |
| World Inheritance | PASSED |
| Makor Integrity | PASSED |

## Behavioral Tests
| Category | Passed | Failed | Total |
|----------|--------|--------|-------|
| Positive Tests | 10 | 0 | 10 |
| Negative Tests | 5 | 0 | 5 |
| Edge Cases | 3 | 0 | 3 |
| Machloket | 1 | 0 | 1 |

## Regression
| Metric | Value |
|--------|-------|
| Total Tests | 495 |
| Passed | 495 |
| Failed | 0 |
| Skipped | 0 |

## Query Verification
| Method | Status |
|--------|--------|
| ask() | PASSED |
| query() | PASSED |
| compare() | PASSED |
| explain() | PASSED |

## Performance
| Metric | Value | Status |
|--------|-------|--------|
| Avg Query Time | 45ms | ACCEPTABLE |
| Grounding Time | 120ms | ACCEPTABLE |

## Issues Found
None
```
### Output 2: Validation Results YAML

`validation-results-YD-{siman}-{seif}.yaml`

```yaml
reference: "YD:87:3"
validated_at: "2026-01-25T15:30:00Z"
status: passed

compilation:
  hll_compilation: passed
  asp_grounding: passed
  satisfiability: SAT

semantic:
  disjointness: passed
  madrega_consistency: passed
  world_inheritance: passed
  makor_integrity: passed

behavioral:
  positive_tests:
    total: 10
    passed: 10
    failed: 0
    scenarios:
      - {id: test_dag_chalav_mechaber, status: passed}
      - {id: test_beheima_chalav_base, status: passed}
      # ...
  negative_tests:
    total: 5
    passed: 5
    failed: 0
  edge_cases:
    total: 3
    passed: 3
    failed: 0
  machloket:
    verified: true
    topics:
      - {topic: dag_chalav, status: passed}

regression:
  total: 495
  passed: 495
  failed: 0
  skipped: 0

performance:
  avg_query_ms: 45
  grounding_ms: 120
```
### Output 3: Test Scenarios

`test-scenarios-YD-{siman}-{seif}.yaml`

Complete test scenarios for future regression testing (archived into the permanent test suite).

### Output 4: Performance Metrics

`performance-metrics-YD-{siman}-{seif}.yaml`

Detailed timing data for trend analysis.
## Checkpoint Criteria
Before requesting human approval, verify:
- [ ] Compilation successful (no syntax errors)
- [ ] Program is SAT (no contradictions)
- [ ] All semantic checks pass
- [ ] All generated tests pass (positive, negative, edge)
- [ ] Machloket properly verified (both positions derivable)
- [ ] No regression failures
- [ ] Performance acceptable (< 100ms avg query)
- [ ] All four outputs generated
## Session State Update

After completing validation:

```yaml
current_phase: validate
checkpoints:
  validate:
    status: pending_review
    artifacts:
      - validation-report-YD-87-3.md
      - validation-results-YD-87-3.yaml
      - test-scenarios-YD-87-3.yaml
      - performance-metrics-YD-87-3.yaml
    test_summary:
      total: 19
      passed: 19
      failed: 0
    regression:
      total: 495
      passed: 495
    performance:
      avg_query_ms: 45
```
## Troubleshooting

### "UNSAT - No models found"

The rules contain contradictions:

- Check for conflicting issur/heter without an override
- Verify world inheritance doesn't create a conflict
- Look for circular rule dependencies
- Enable debug output with `clingo --warn=all`
"Test failed: Expected X not found"¶
The expected conclusion isn't being derived:
- Check rule conditions match test setup
- Verify world scoping is correct
- Check for missing predicates in setup
- Use
explain()to trace derivation
"Regression failure in existing test"¶
A new rule broke existing functionality:
- Identify which rule caused the break
- Check if override is needed
- Verify new rule conditions are specific enough
- Consider if existing test is still valid
"Performance degraded"¶
Queries are slower than acceptable:
- Check for unbounded rules (no conditions)
- Look for redundant derivation paths
- Consider adding auxiliary predicates
- Profile with Clingo's
--statsflag
## Best Practices
- Run validation frequently - Catch errors early during encoding
- Write tests as you encode - Don't leave testing until the end
- Test edge cases explicitly - Empty mixtures, single components
- Verify both machloket positions - Don't assume inheritance works
- Check performance trends - Watch for degradation over time
## Next Phase
After Checkpoint 3 approval, proceed to Review and Approval.