# Validation and Testing

The validation phase ensures encoded rules compile correctly, pass semantic checks, and behave as expected. This phase uses the validate skill to run comprehensive tests before human review at Checkpoint 3.
## Purpose

The validate skill performs five kinds of checks:

- **Compilation Validation:** Compile HLL to ASP and test grounding
- **Semantic Validation:** Check disjointness, inheritance, and makor integrity
- **Behavioral Testing:** Generate and execute test scenarios
- **Machloket Verification:** Confirm both positions are derivable
- **Regression Testing:** Ensure no existing rules are broken
## Prerequisites

Before invoking this skill:

- The HLL-encode checkpoint MUST be approved
- Session state shows `hll-encode.status: approved`
- Encoded `.lp` files exist in the ontology directories
## Invoking the Skill

After encoding approval, invoke the validate skill. It automatically:

- Compiles all affected HLL files
- Runs semantic validation checks
- Generates and executes test scenarios
- Verifies machloket encoding
- Runs the regression suite
- Requests checkpoint approval
## Phase A: Compilation Validation

### Step 1: HLL Compilation

Compile HLL to ASP using the Mistaber compiler:

```python
from mistaber.dsl.compiler import compile_hll_file, CompileError
from pathlib import Path

try:
    # Compile all world files that include the new rules
    for world_file in Path("mistaber/ontology/worlds").glob("*.lp"):
        result = compile_hll_file(world_file)
        print(f"Compiled {world_file}: {result.rules_count} rules")
    compilation_status = "passed"
except CompileError as e:
    compilation_status = "failed"
    error_details = str(e)
```
Common Compilation Errors:

| Error | Cause | Fix |
|---|---|---|
| `ParseError: Unexpected token` | Missing period, unbalanced parens | Check syntax |
| `NameError: Unknown predicate` | Predicate not in registry | Add to `sorts.lp` or fix spelling |
| `TypeError: Arity mismatch` | Wrong number of arguments | Check predicate definition |
| `DirectiveError: Missing @makor` | Normative rule without source | Add makor directive |
### Step 2: ASP Grounding Test

Test that Clingo can ground the program:

```python
from pathlib import Path

import clingo

ctl = clingo.Control(["--warn=none"])

# Load all ontology files
for lp_file in Path("mistaber/ontology").rglob("*.lp"):
    ctl.load(str(lp_file))

# Ground
try:
    ctl.ground([("base", [])])
    grounding_status = "passed"
except RuntimeError as e:
    grounding_status = "failed"
    grounding_error = str(e)
```
Common Grounding Errors:

| Error | Cause | Fix |
|---|---|---|
| `RuntimeError: atom undefined` | Using undefined predicate | Define the predicate |
| `RuntimeError: unsafe variable` | Variable not bound in body | Add positive occurrence |
| `RuntimeError: cyclic dependency` | Rules form infinite loop | Restructure rules |
### Step 3: Satisfiability Check

Verify the program is satisfiable (has at least one answer set):

```python
with ctl.solve(yield_=True) as handle:
    model_count = 0
    for model in handle:
        model_count += 1
        if model_count >= 1:
            break

if model_count == 0:
    # UNSAT - contradictory rules
    satisfiability = "UNSAT"
else:
    satisfiability = "SAT"
```
**CRITICAL: UNSAT Result**

If the program is UNSAT, the rules contain contradictions. This MUST be resolved before proceeding.

Debugging UNSAT:

```python
# Emit debug atoms during grounding so generated rules can be inspected
ctl = clingo.Control(["--warn=none", "--output-debug=all"])
# To localize the conflict, bisect the rule files (see the sketch
# after the table below)
```
Common UNSAT causes:

| Cause | Example | Fix |
|---|---|---|
| Direct contradiction | `issur(X) :- ... .` and `heter(X) :- ...` | Check rule conditions |
| Missing conditions | Rule fires when it shouldn't | Add qualifying conditions |
| World inheritance conflict | Base and child disagree | Use override mechanism |
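One low-tech way to localize a contradiction, reusing the clingo loading pattern from Step 2 (a sketch; the ontology path is the one used throughout this section):

```python
from pathlib import Path

import clingo

# Leave one ontology file out at a time; if the program becomes SAT,
# the omitted file participates in the contradiction. A crude but
# effective bisection aid for small ontologies.
files = sorted(Path("mistaber/ontology").rglob("*.lp"))
for omitted in files:
    ctl = clingo.Control(["--warn=none"])
    for f in files:
        if f != omitted:
            ctl.load(str(f))
    ctl.ground([("base", [])])
    if ctl.solve().satisfiable:
        print(f"SAT without {omitted} -- inspect this file")
```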
## Phase B: Semantic Validation

### Step 1: Disjointness Verification

Verify no entity violates disjointness constraints:

```prolog
% Constraint: No food can be both basar and chalav
:- food(F), food_type(F, basar), food_type(F, chalav).

% Constraint: No mixture can have contradictory status in same world
:- asserts(W, issur(A, M, _)), asserts(W, heter(A, M)),
   not override(W, _, _).
```

Run validation query:

```python
violations = engine.query("disjoint_violation(X, Y)")
if violations:
    for v in violations:
        semantic_errors.append(f"Disjointness violation: {v}")
```
### Step 2: Madrega Consistency

Verify d'oraita rules have Torah sources:

```python
for rule in rules_with_madrega_doraita:
    makor_chain = get_makor_chain(rule)
    if not has_torah_source(makor_chain):
        warnings.append(
            f"Rule {rule} marked d_oraita but chain doesn't reach Torah"
        )
```
Madrega Validation Table:

| Madrega | Required Source Terminus |
|---|---|
| `d_oraita` | Torah verse |
| `d_rabanan` | Mishnah/Gemara at minimum |
| `minhag` | Any documented minhag source |
| `chumra` | Commentary recommending stringency |
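The helpers `get_makor_chain` and `has_torah_source` are assumed above but not defined in this section. A minimal sketch, assuming each chain link is a dict with a `source_type` field (the field and type names are illustrative, not the actual Mistaber API):

```python
# Map each madrega to the termini the table above accepts
REQUIRED_TERMINUS = {
    "d_oraita": {"torah_verse"},
    "d_rabanan": {"torah_verse", "mishnah", "gemara"},
    "minhag": {"minhag_source"},
    "chumra": {"commentary"},
}

def has_torah_source(makor_chain):
    """True if some link in the chain terminates at a Torah verse."""
    return any(link["source_type"] == "torah_verse" for link in makor_chain)

def chain_reaches(madrega, makor_chain):
    """Generalization: does the chain reach a terminus the table permits?"""
    allowed = REQUIRED_TERMINUS.get(madrega, set())
    return any(link["source_type"] in allowed for link in makor_chain)
```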
### Step 3: World Inheritance Verification

Verify the world hierarchy is respected:

```python
# Check that assertions flow correctly through accessible/2
for world in ["mechaber", "rema", "sefardi_yo", "ashk_mb"]:
    base_assertions = engine.query("asserts(base, X)")
    world_holds = engine.query(f"holds(X, {world})")

    # Base assertions should propagate unless overridden
    for assertion in base_assertions:
        if assertion not in world_holds:
            if not is_overridden(world, assertion):
                errors.append(
                    f"Inheritance broken: {assertion} not in {world}"
                )
```
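`is_overridden` is assumed above but not defined here; a plausible sketch matching the 3-ary `override/3` used in the disjointness constraints (a hypothetical helper, not the actual API):

```python
def is_overridden(world, assertion):
    # Assumption: override(World, Assertion, Reason) facts mark
    # deliberate departures from inherited assertions
    return bool(engine.query(f"override({world}, {assertion}, _)"))
```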
World Hierarchy Reference:

```mermaid
graph TD
    BASE["base"] --> MECH["mechaber"]
    BASE --> REMA["rema"]
    MECH --> SEF_YO["sefardi_yo"]
    REMA --> ASHK_MB["ashk_mb"]
    REMA --> ASHK_AH["ashk_ah"]
    BASE --> GRA["gra"]
    GRA --> ASHK_AH
```
### Step 4: Makor Chain Integrity

Verify every normative rule has a makor:

```python
normative_rules = engine.query("rule(R), scope(R, _)")
for rule in normative_rules:
    makor = engine.query(f"makor({rule['R']}, _)")
    if not makor:
        errors.append(f"Rule {rule['R']} missing makor")
```
## Phase C: Behavioral Testing

### Step 1: Generate Test Scenarios from Corpus

Based on the corpus statements, generate test cases:

```yaml
# test-scenarios-YD-87-3.yaml
scenarios:
  - id: test_dag_chalav_mechaber
    description: "Fish + dairy mixture should trigger sakana in mechaber world"
    category: positive
    world: mechaber
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
    expected:
      - holds(sakana(m1), mechaber)
      - holds(issur(achiila, m1, sakana), mechaber)

  - id: test_dag_chalav_rema
    description: "Same mixture should show heter in rema world"
    category: positive
    world: rema
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
    expected:
      - holds(heter(achiila, m1), rema)
      - holds(no_sakana(m1), rema)
```
### Step 2: Positive Tests

Test that expected conclusions hold:

```python
def run_positive_test(scenario):
    """Verify expected atoms appear in the result."""
    result = engine.analyze(
        scenario['setup'],
        world=scenario['world']
    )
    failures = []
    for expected in scenario['expected']:
        if expected not in result['atoms']:
            failures.append(f"Expected {expected} not found")
    return {
        'scenario_id': scenario['id'],
        'status': 'passed' if not failures else 'failed',
        'failures': failures
    }
```
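A minimal driver for the scenarios generated in Step 1 (a sketch; assumes PyYAML and the file name shown above):

```python
import yaml

# Load the generated scenarios and run the positive ones
with open("test-scenarios-YD-87-3.yaml") as f:
    scenarios = yaml.safe_load(f)["scenarios"]

results = [run_positive_test(s) for s in scenarios
           if s["category"] == "positive"]
failed = [r for r in results if r["status"] == "failed"]
print(f"{len(results) - len(failed)}/{len(results)} positive tests passed")
```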
Positive Test Categories:
| Category | Tests For |
|---|---|
| Basic prohibition | issur derived for matching mixture |
| Basic permission | heter derived for permitted case |
| World-specific | Correct ruling in each world |
| Machloket | Both positions derivable |
### Step 3: Negative Tests

Test that incorrect conclusions do NOT hold:

```python
def run_negative_test(scenario):
    """Verify unexpected atoms do NOT appear."""
    result = engine.analyze(
        scenario['setup'],
        world=scenario['world']
    )
    failures = []
    for unexpected in scenario.get('unexpected', []):
        if unexpected in result['atoms']:
            failures.append(f"Unexpected {unexpected} found")
    return {
        'scenario_id': scenario['id'],
        'status': 'passed' if not failures else 'failed',
        'failures': failures
    }
```
Example Negative Test:

```yaml
- id: test_no_sakana_in_rema
  description: "Rema world should NOT have sakana for fish+dairy"
  category: negative
  world: rema
  setup: |
    food(salmon). food_type(salmon, dag).
    food(cream). food_type(cream, chalav).
    mixture(m1). contains(m1, salmon). contains(m1, cream).
  unexpected:
    - holds(sakana(m1), rema)
    - holds(issur(achiila, m1, sakana), rema)
```
### Step 4: Edge Cases

Test boundary conditions:

```yaml
edge_cases:
  - id: empty_mixture
    description: "Empty mixture should not trigger rules"
    setup: "mixture(m_empty)."
    world: base
    expected:
      - not holds(issur(achiila, m_empty, _), base)

  - id: single_component
    description: "Single component mixture (no chalav)"
    setup: |
      food(beef). food_type(beef, beheima).
      mixture(m_single). contains(m_single, beef).
    world: base
    expected:
      - not holds(issur(achiila, m_single, _), base)

  - id: parve_mixture
    description: "Parve foods should be permitted"
    setup: |
      food(carrot). food_type(carrot, parve).
      food(rice). food_type(rice, parve).
      mixture(m_parve). contains(m_parve, carrot). contains(m_parve, rice).
    world: base
    expected:
      - not holds(issur(achiila, m_parve, _), base)
```
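These entries use a `not` prefix inside `expected`, which `run_positive_test` above would not match literally. One way a runner might interpret it (an assumption about the scenario format, not a documented convention):

```python
def check_expectation(expected, atoms):
    """Treat a leading 'not ' as a negative expectation."""
    if expected.startswith("not "):
        return expected[len("not "):] not in atoms
    return expected in atoms
```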
### Step 5: Machloket Verification

Verify both positions are derivable in their respective worlds:

```python
def verify_machloket(machloket_entry):
    """Verify a machloket is correctly encoded."""
    topic = machloket_entry['topic']
    auth_a = machloket_entry['position_a']['authority']
    auth_b = machloket_entry['position_b']['authority']

    # Create test scenario
    setup = machloket_entry.get('test_setup', '')

    # Check position A holds in world A
    result_a = engine.analyze(setup, world=auth_a)
    position_a_holds = check_position(
        result_a,
        machloket_entry['position_a']['ruling']
    )

    # Check position B holds in world B
    result_b = engine.analyze(setup, world=auth_b)
    position_b_holds = check_position(
        result_b,
        machloket_entry['position_b']['ruling']
    )

    # Check machloket marker exists
    marker = engine.query(
        f"machloket({topic}, {auth_a}, {auth_b}, _)"
    )

    return {
        'topic': topic,
        'position_a_derivable': position_a_holds,
        'position_b_derivable': position_b_holds,
        'marker_exists': bool(marker),
        'status': 'passed' if all([
            position_a_holds,
            position_b_holds,
            marker
        ]) else 'failed'
    }
```
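An illustrative entry mirroring the dag/chalav scenarios above; the `ruling` strings are an assumption about what `check_position` expects:

```python
entry = {
    'topic': 'dag_chalav',
    'position_a': {'authority': 'mechaber',
                   'ruling': 'holds(sakana(m1), mechaber)'},
    'position_b': {'authority': 'rema',
                   'ruling': 'holds(heter(achiila, m1), rema)'},
    'test_setup': (
        "food(salmon). food_type(salmon, dag). "
        "food(cream). food_type(cream, chalav). "
        "mixture(m1). contains(m1, salmon). contains(m1, cream)."
    ),
}
print(verify_machloket(entry))
```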
Machloket Test Output:

```yaml
machloket_verification:
  - topic: dag_chalav
    position_a: {authority: mechaber, holds: true}
    position_b: {authority: rema, holds: true}
    marker_exists: true
    status: passed
```
### Step 6: Regression Testing

Run all existing tests to ensure no regressions:
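The exact invocation is project-specific; a minimal sketch using subprocess (the `tests/` path is an assumption):

```python
import subprocess

# Run the existing suite quietly and capture stdout for parsing below
proc = subprocess.run(
    ["pytest", "tests/", "-q"],
    capture_output=True,
    text=True,
)
```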
Parse results:
```python
import re

def parse_pytest_results(output):
    """Parse pytest output for a summary."""
    def count(label):
        # re.search returns None when a label is absent, so guard it
        m = re.search(rf'(\d+) {label}', output)
        return int(m.group(1)) if m else 0

    passed = count('passed')
    failed = count('failed')
    skipped = count('skipped')
    return {
        'passed': passed,
        'failed': failed,
        'skipped': skipped,
        'total': passed + failed + skipped,
        'success': failed == 0
    }
```
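Tying the two pieces together (continuing the sketch above):

```python
summary = parse_pytest_results(proc.stdout)
assert summary['success'], f"{summary['failed']} regression failure(s)"
```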
Regression Criteria:
| Metric | Requirement |
|---|---|
| Failed tests | 0 |
| Skipped tests | Documented reason |
| Coverage | No decrease |
### Step 7: Context Sensitivity Testing

Test lechatchila/bedieved distinctions:

```yaml
context_tests:
  - id: test_lechatchila_strict
    description: "Lechatchila context should be strict"
    context: lechatchila
    setup: |
      food(beef). food_type(beef, beheima).
      food(milk). food_type(milk, chalav).
      mixture(m1). contains(m1, beef). contains(m1, milk).
      context(lechatchila).
    world: base
    expected:
      - holds(issur(achiila, m1, d_oraita), base)

  - id: test_bedieved_lenient
    description: "Bedieved with bitul may permit"
    context: bedieved
    setup: |
      food(beef). food_type(beef, beheima).
      food(milk). food_type(milk, chalav).
      mixture(m1). contains(m1, beef). contains(m1, milk).
      context(bedieved).
      batel_b_shishim(m1).
    world: base
    expected:
      - holds(heter(achiila, m1), base)
```
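The bedieved scenario can also be run directly, reusing `engine.analyze` as in the behavioral tests above (a sketch):

```python
# Context facts enter through the setup string like any other fact
result = engine.analyze(
    "food(beef). food_type(beef, beheima). "
    "food(milk). food_type(milk, chalav). "
    "mixture(m1). contains(m1, beef). contains(m1, milk). "
    "context(bedieved). batel_b_shishim(m1).",
    world="base",
)
assert "holds(heter(achiila, m1), base)" in result["atoms"]
```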
## Phase D: Query Verification

### Step 1: Engine API Testing

Test that all engine methods work with the new rules:

```python
from pathlib import Path

from mistaber.engine import HsrsEngine

engine = HsrsEngine(Path("mistaber/ontology"))

# Test ask() - boolean query
assert engine.ask(
    "holds(issur(achiila, m1, d_oraita), base)",
    additional_facts="mixture(m1). is_beheima_chalav_mixture(m1)."
)

# Test query() - find all matches
results = engine.query("holds(issur(A, M, L), W)")
assert len(results) > 0

# Test compare() - cross-world comparison
comparison = engine.compare(
    "holds(issur(achiila, M, L), W)",
    worlds=["mechaber", "rema"]
)
assert "mechaber" in comparison
assert "rema" in comparison
```
### Step 2: Derivation Chain Testing

Test that explain() produces a valid derivation:

```python
explanation = engine.explain(
    "holds(issur(achiila, m1, d_oraita), mechaber)",
    world="mechaber",
    additional_facts="""
        food(beef). food_type(beef, beheima).
        food(milk). food_type(milk, chalav).
        mixture(m1). contains(m1, beef). contains(m1, milk).
    """
)
assert explanation['derivation'] is not None
assert len(explanation['tree']) > 0
print("Derivation tree:", explanation['tree'])
```
Expected Derivation Output:

```text
holds(issur(achiila, m1, d_oraita), mechaber)
├─ asserts(mechaber, issur(achiila, m1, d_oraita))
│  ├─ is_beheima_chalav_mixture(m1)
│  │  ├─ mixture_has_basar(m1, beheima)
│  │  │  └─ food_type(beef, beheima)
│  │  └─ mixture_has_chalav(m1)
│  │     └─ food_type(milk, chalav)
│  └─ rule(r_bb_beheima_achiila)
│     └─ makor(r_bb_beheima_achiila, sa("yd:87:1"))
└─ accessible(mechaber, base)
```
### Step 3: Performance Testing

Ensure queries complete in reasonable time:

```python
import time

start = time.time()
for _ in range(100):
    engine.query("holds(issur(A, M, L), W)")
elapsed = time.time() - start

avg_query_time = elapsed / 100
# Hard failure above the 100ms limit; the stricter 50ms "acceptable"
# band from the thresholds table below is reported separately
assert avg_query_time < 0.1, f"Query too slow: {avg_query_time}s"

performance = {
    'avg_query_ms': avg_query_time * 1000,
    'queries_tested': 100,
    'acceptable': avg_query_time < 0.05
}
```
Performance Thresholds:
| Metric | Acceptable | Warning | Failing |
|---|---|---|---|
| Avg query time | < 50ms | 50-100ms | > 100ms |
| Grounding time | < 500ms | 500-1000ms | > 1000ms |
| Memory usage | < 100MB | 100-500MB | > 500MB |
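The query loop above covers only the first row of the table; a companion sketch for grounding time, reusing the clingo loading pattern from Phase A:

```python
import time
from pathlib import Path

import clingo

# Measure grounding time against the thresholds above
ctl = clingo.Control(["--warn=none"])
for lp_file in Path("mistaber/ontology").rglob("*.lp"):
    ctl.load(str(lp_file))

start = time.time()
ctl.ground([("base", [])])
grounding_ms = (time.time() - start) * 1000
assert grounding_ms < 1000, f"Grounding too slow: {grounding_ms:.0f}ms"
```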
## Phase E: Validation Report

### Output 1: Validation Report

`validation-report-YD-{siman}-{seif}.md`

```markdown
# Validation Report: YD 87:3

## Summary
- **Reference:** YD 87:3
- **Validated:** 2026-01-25T15:30:00Z
- **Status:** PASSED

## Compilation
| Check | Status |
|-------|--------|
| HLL Compilation | PASSED |
| ASP Grounding | PASSED |
| Satisfiability | SAT |

## Semantic Validation
| Check | Status |
|-------|--------|
| Disjointness | PASSED |
| Madrega Consistency | PASSED |
| World Inheritance | PASSED |
| Makor Integrity | PASSED |

## Behavioral Tests
| Category | Passed | Failed | Total |
|----------|--------|--------|-------|
| Positive Tests | 10 | 0 | 10 |
| Negative Tests | 5 | 0 | 5 |
| Edge Cases | 3 | 0 | 3 |
| Machloket | 1 | 0 | 1 |

## Regression
| Metric | Value |
|--------|-------|
| Total Tests | 495 |
| Passed | 495 |
| Failed | 0 |
| Skipped | 0 |

## Query Verification
| Method | Status |
|--------|--------|
| ask() | PASSED |
| query() | PASSED |
| compare() | PASSED |
| explain() | PASSED |

## Performance
| Metric | Value | Status |
|--------|-------|--------|
| Avg Query Time | 45ms | ACCEPTABLE |
| Grounding Time | 120ms | ACCEPTABLE |

## Issues Found
None
```
### Output 2: Validation Results YAML

`validation-results-YD-{siman}-{seif}.yaml`

```yaml
reference: "YD:87:3"
validated_at: "2026-01-25T15:30:00Z"
status: passed

compilation:
  hll_compilation: passed
  asp_grounding: passed
  satisfiability: SAT

semantic:
  disjointness: passed
  madrega_consistency: passed
  world_inheritance: passed
  makor_integrity: passed

behavioral:
  positive_tests:
    total: 10
    passed: 10
    failed: 0
    scenarios:
      - {id: test_dag_chalav_mechaber, status: passed}
      - {id: test_beheima_chalav_base, status: passed}
      # ...
  negative_tests:
    total: 5
    passed: 5
    failed: 0
  edge_cases:
    total: 3
    passed: 3
    failed: 0
  machloket:
    verified: true
    topics:
      - {topic: dag_chalav, status: passed}

regression:
  total: 495
  passed: 495
  failed: 0
  skipped: 0

performance:
  avg_query_ms: 45
  grounding_ms: 120
```
### Output 3: Test Scenarios

`test-scenarios-YD-{siman}-{seif}.yaml`

Complete test scenarios for future regression testing (archived into the permanent test suite).

### Output 4: Performance Metrics

`performance-metrics-YD-{siman}-{seif}.yaml`

Detailed timing data for trend analysis.
## Checkpoint Criteria
Before requesting human approval, verify:
- [ ] Compilation successful (no syntax errors)
- [ ] Program is SAT (no contradictions)
- [ ] All semantic checks pass
- [ ] All generated tests pass (positive, negative, edge)
- [ ] Machloket properly verified (both positions derivable)
- [ ] No regression failures
- [ ] Performance acceptable (< 100ms avg query)
- [ ] All four outputs generated
## Session State Update

After completing validation:

```yaml
current_phase: validate
checkpoints:
  validate:
    status: pending_review
    artifacts:
      - validation-report-YD-87-3.md
      - validation-results-YD-87-3.yaml
      - test-scenarios-YD-87-3.yaml
      - performance-metrics-YD-87-3.yaml
    test_summary:
      total: 19
      passed: 19
      failed: 0
    regression:
      total: 495
      passed: 495
    performance:
      avg_query_ms: 45
```
## Troubleshooting

### "UNSAT - No models found"

The rules contain contradictions:

- Check for conflicting issur/heter without an override
- Verify world inheritance doesn't create a conflict
- Look for circular rule dependencies
- Enable debug output with `clingo --warn=all`
"Test failed: Expected X not found"¶
The expected conclusion isn't being derived:
- Check rule conditions match test setup
- Verify world scoping is correct
- Check for missing predicates in setup
- Use
explain()to trace derivation
"Regression failure in existing test"¶
A new rule broke existing functionality:
- Identify which rule caused the break
- Check if override is needed
- Verify new rule conditions are specific enough
- Consider if existing test is still valid
"Performance degraded"¶
Queries are slower than acceptable:
- Check for unbounded rules (no conditions)
- Look for redundant derivation paths
- Consider adding auxiliary predicates
- Profile with Clingo's
--statsflag
## Best Practices
- Run validation frequently - Catch errors early during encoding
- Write tests as you encode - Don't leave testing until the end
- Test edge cases explicitly - Empty mixtures, single components
- Verify both machloket positions - Don't assume inheritance works
- Check performance trends - Watch for degradation over time
## Next Phase
After Checkpoint 3 approval, proceed to Review and Approval.