# Validation Skill

The validate skill compiles encoded HLL rules, runs semantic validation checks, generates and executes test scenarios, and verifies machloket encoding before final human review.
## Overview
| Attribute | Value |
|---|---|
| Skill Name | validate |
| Phase | 3 of 5 |
| Checkpoint | 3: Test Review |
| Prerequisites | Checkpoint 2 (hll-encode) approved |
| Outputs | validation-report.md, validation-results.yaml, test-scenarios.yaml |
| Trigger Phrases | "validate encoding", "run tests", "compile rules", "check rules", "verify encoding" |
## Prerequisites

Before this skill can execute:

- HLL-encode checkpoint MUST be approved
- Encoded `.lp` files must exist:
    - Shared definitions in `mistaber/ontology/corpus/yd_{siman}/base.lp`
    - World-specific rules in `mistaber/ontology/worlds/*.lp`
- Encoding report and mapping artifacts must exist
- Session state shows `hll-encode.status: approved`
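The checklist above can be expressed as a small guard before the skill runs. The helper name `check_prerequisites` and the session-state dict shape are illustrative assumptions, not the skill's actual API:

```python
from pathlib import Path

def check_prerequisites(siman: str, session_state: dict) -> list[str]:
    # Hypothetical guard mirroring the checklist above; returns a list
    # of human-readable problems (an empty list means ready to run)
    problems = []
    base = Path(f"mistaber/ontology/corpus/yd_{siman}/base.lp")
    if not base.exists():
        problems.append(f"missing {base}")
    if not any(Path("mistaber/ontology/worlds").glob("*.lp")):
        problems.append("no world-specific .lp files found")
    status = (
        session_state.get("checkpoints", {}).get("hll-encode", {}).get("status")
    )
    if status != "approved":
        problems.append(f"hll-encode status is {status!r}, expected 'approved'")
    return problems
```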
## Phase A: Compilation Validation

### Step 1: HLL Compilation

The skill compiles HLL to ASP using the Mistaber compiler:

```python
from pathlib import Path

from mistaber.dsl.compiler import compile_hll_file, CompileError

try:
    # Compile world files that include the new rules
    for world_file in Path("mistaber/ontology/worlds").glob("*.lp"):
        result = compile_hll_file(world_file)
    compilation_status = "passed"
except CompileError as e:
    compilation_status = "failed"
    error_details = str(e)
```
Compilation Checks:
| Check | Description |
|---|---|
| Syntax | Valid ASP syntax |
| Directives | Valid HLL directive format |
| Includes | All #include files exist |
| Predicates | All predicates defined |
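As a rough illustration of the predicate check, undefined predicates can be approximated textually. This is a sketch, not the compiler's actual implementation; a real check would work on the parsed AST rather than raw text:

```python
import re

def undefined_predicates(program_text: str) -> set:
    # Rough textual sketch: a predicate counts as "defined" if it appears
    # in some rule head, "used" if it appears in a body.
    defined, used = set(), set()
    for line in program_text.splitlines():
        line = line.split("%")[0].strip()  # drop ASP comments
        if not line:
            continue
        head, _, body = line.partition(":-")
        defined.update(re.findall(r"([a-z]\w*)\(", head))
        used.update(re.findall(r"([a-z]\w*)\(", body))
    return used - defined
```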
### Step 2: ASP Grounding Test

The skill tests that Clingo can ground the program:

```python
from pathlib import Path

import clingo

ctl = clingo.Control(["--warn=none"])

# Load the full ontology
for lp_file in Path("mistaber/ontology").rglob("*.lp"):
    ctl.load(str(lp_file))

# Ground the program
try:
    ctl.ground([("base", [])])
    grounding_status = "passed"
except RuntimeError as e:
    grounding_status = "failed"
    grounding_error = str(e)
```
Grounding Statistics:
| Metric | Description |
|---|---|
| Rules grounded | Number of ground rules generated |
| Atoms created | Number of ground atoms |
| Grounding time | Time to complete grounding (ms) |
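The grounding-time metric can be captured by wrapping the grounding call with a timer. `time_grounding` is an illustrative helper, not part of the clingo API:

```python
import time

def time_grounding(ground_fn) -> float:
    # Run any zero-argument grounding callable and return elapsed milliseconds
    start = time.perf_counter()
    ground_fn()
    return (time.perf_counter() - start) * 1000.0

# Usage sketch (assuming a loaded clingo.Control named ctl):
# grounding_ms = time_grounding(lambda: ctl.ground([("base", [])]))
```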
### Step 3: Satisfiability Check

The skill verifies the program is satisfiable (has at least one answer set):

```python
with ctl.solve(yield_=True) as handle:
    model_count = 0
    for model in handle:
        model_count += 1
        if model_count >= 1:
            break

if model_count == 0:
    satisfiability = "UNSAT"  # CRITICAL - rules are contradictory
else:
    satisfiability = "SAT"
```

**UNSAT is Critical:** if the program is UNSAT, the rules contain logical contradictions. This MUST be resolved before proceeding; the skill will not pass the checkpoint with an UNSAT result.

UNSAT Debugging:

```shell
# Enumerate answer sets to help narrow down the contradiction
clingo ontology.lp --opt-mode=optN -n 0

# Trace to find the conflict
clingo --trace=asp ontology.lp
```
## Phase B: Semantic Validation

### Step 1: Disjointness Verification

The skill verifies no entity violates disjointness constraints:

```prolog
% Constraints checked:

% No food can be both basar and chalav
:- food(F), food_type(F, basar), food_type(F, chalav).

% No mixture can have contradictory status in the same world
:- asserts(W, issur(A, M, _)), asserts(W, heter(A, M)), not override(W, _, _).
```

Validation Query:

```python
violations = engine.query("disjoint_violation(X, Y)")
if violations:
    semantic_errors.append(f"Disjointness violation: {violations}")
```
### Step 2: Madrega Consistency

The skill verifies d'oraita rules have Torah sources:

```python
for rule in rules_with_madrega_doraita:
    makor_chain = get_makor_chain(rule)
    if not has_torah_source(makor_chain):
        warnings.append(
            f"Rule {rule} marked d_oraita but chain doesn't reach Torah"
        )
```
Madrega Validation Rules:
| Madrega | Required Source |
|---|---|
| d_oraita | Torah verse |
| d_rabanan | Mishnah, Gemara, or Takana |
| minhag | Community practice reference |
| chumra | Any halachic source |
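The table above can be encoded directly as a mapping for the consistency check. This is a sketch; the source-kind labels (`torah`, `gemara`, ...) are illustrative, not the project's actual identifiers:

```python
# Acceptable source kinds per madrega, from the table above
REQUIRED_SOURCES = {
    "d_oraita": {"torah"},
    "d_rabanan": {"mishnah", "gemara", "takana"},
    "minhag": {"community_practice"},
    "chumra": {"torah", "mishnah", "gemara", "takana", "community_practice"},
}

def madrega_satisfied(madrega: str, source_kinds: set) -> bool:
    # The makor chain satisfies the madrega if any source in the chain
    # is of an acceptable kind
    allowed = REQUIRED_SOURCES.get(madrega, set())
    return bool(allowed & source_kinds)
```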
### Step 3: World Inheritance Verification

The skill verifies world hierarchy is respected:

```python
# Check that asserts flow correctly through accessible/2
for world in ["mechaber", "rema", "sefardi_yo", "ashk_mb"]:
    base_assertions = engine.query("asserts(base, X)")
    world_holds = engine.query(f"holds(X, {world})")
    # Base assertions should propagate unless overridden
    for assertion in base_assertions:
        if assertion not in world_holds:
            if not is_overridden(world, assertion):
                errors.append(f"Inheritance broken: {assertion} not in {world}")
```
World Inheritance Map:

```mermaid
graph TD
    BASE["base"] --> MECH["mechaber"]
    BASE --> REMA["rema"]
    BASE --> GRA["gra"]
    MECH --> SEF["sefardi_yo"]
    REMA --> ASHK_MB["ashk_mb"]
    REMA --> ASHK_AH["ashk_ah"]
    GRA --> ASHK_AH
```
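The same hierarchy, expressed as an adjacency map, makes the inheritance check easy to drive programmatically. This helper is a sketch, not part of the engine:

```python
# World hierarchy from the map above, parent -> children
CHILDREN = {
    "base": ["mechaber", "rema", "gra"],
    "mechaber": ["sefardi_yo"],
    "rema": ["ashk_mb", "ashk_ah"],
    "gra": ["ashk_ah"],
}

def descendants(world: str) -> set:
    # Worlds that inherit (directly or transitively) from `world`
    out = set()
    stack = list(CHILDREN.get(world, []))
    while stack:
        w = stack.pop()
        if w not in out:
            out.add(w)
            stack.extend(CHILDREN.get(w, []))
    return out
```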
### Step 4: Makor Chain Integrity

The skill verifies every normative rule has a makor:

```python
normative_rules = engine.query("rule(R), scope(R, _)")
for rule in normative_rules:
    makor = engine.query(f"makor({rule['R']}, _)")
    if not makor:
        errors.append(f"Rule {rule['R']} missing makor")
```
## Phase C: Behavioral Testing

### Step 1: Generate Test Scenarios

Based on the corpus statements, the skill generates test cases:

```yaml
# test-scenarios-YD-87-3.yaml
scenarios:
  - id: test_dag_chalav_mechaber
    description: "Fish + dairy mixture should trigger sakana in mechaber world"
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
    expected:
      mechaber:
        - holds(sakana(m1), mechaber)
        - holds(issur(achiila, m1, sakana), mechaber)
      rema:
        - holds(heter(achiila, m1), rema)
        - holds(no_sakana(m1), rema)
```
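Once loaded (e.g. with PyYAML) into a dict, a scenario like the one above can be checked across worlds. This runner is a sketch written against the `engine.analyze` API used elsewhere in this document:

```python
def run_scenario(engine, scenario: dict) -> dict:
    # Returns, per world, the expected atoms that were NOT derived;
    # an empty dict means the scenario passed in every world
    failures = {}
    for world, expected_atoms in scenario["expected"].items():
        result = engine.analyze(scenario["setup"], world=world)
        missing = [a for a in expected_atoms if a not in result["atoms"]]
        if missing:
            failures[world] = missing
    return failures
```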
### Step 2: Positive Tests

Tests that expected conclusions hold:

```python
def run_positive_test(scenario, world):
    # Set up and analyze the scenario
    result = engine.analyze(scenario['setup'], world=world)
    # Check expected atoms
    for expected in scenario['expected'][world]:
        assert expected in result['atoms'], \
            f"Expected {expected} not found in {world}"
```
Positive Test Categories:
| Category | Tests |
|---|---|
| Base rules | Universal prohibitions hold |
| World-specific | Mechaber/Rema differences work |
| Inheritance | Child worlds inherit correctly |
| Override | Overrides take effect |
### Step 3: Negative Tests

Tests that incorrect conclusions do NOT hold:

```python
def run_negative_test(scenario, world):
    result = engine.analyze(scenario['setup'], world=world)
    for unexpected in scenario.get('unexpected', {}).get(world, []):
        assert unexpected not in result['atoms'], \
            f"Unexpected {unexpected} found in {world}"
```

Example Negative Tests:

```yaml
negative_tests:
  - id: rema_no_sakana
    description: "Rema world should NOT have sakana for fish+dairy"
    setup: |
      food(salmon). food_type(salmon, dag).
      food(butter). food_type(butter, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, butter).
    unexpected:
      rema:
        - holds(sakana(m1), rema)
        - holds(issur(achiila, m1, sakana), rema)
```
### Step 4: Edge Cases

Tests boundary conditions:

```yaml
edge_cases:
  - id: empty_mixture
    description: "Empty mixture should not trigger rules"
    setup: "mixture(m_empty)."
    expected:
      base:
        - not holds(issur(achiila, m_empty, _), base)
  - id: single_component
    description: "Single component mixture - no prohibition"
    setup: |
      food(beef). food_type(beef, beheima).
      mixture(m_single). contains(m_single, beef).
    expected:
      base:
        - not holds(issur(achiila, m_single, _), base)
  - id: two_basar_no_chalav
    description: "Two basar types without chalav - no BB"
    setup: |
      food(beef). food_type(beef, beheima).
      food(chicken). food_type(chicken, of).
      mixture(m_basar). contains(m_basar, beef). contains(m_basar, chicken).
    expected:
      base:
        - not holds(issur(achiila, m_basar, d_oraita), base)
```
### Step 5: Machloket Verification

The skill verifies both dispute positions are derivable:

```python
def verify_machloket(machloket_entry):
    topic = machloket_entry['topic']
    auth_a = machloket_entry['position_a']['authority']
    auth_b = machloket_entry['position_b']['authority']

    # Check position A holds in world A
    result_a = engine.query(f"holds({topic}_position_a, {auth_a})")
    assert result_a, f"Position A not derivable in {auth_a}"

    # Check position B holds in world B
    result_b = engine.query(f"holds({topic}_position_b, {auth_b})")
    assert result_b, f"Position B not derivable in {auth_b}"

    # Check the machloket marker exists
    marker = engine.query(f"machloket({topic}, {auth_a}, {auth_b}, _)")
    assert marker, f"Machloket marker missing for {topic}"
```
### Step 6: Regression Testing

The skill runs all existing tests to ensure no regressions.

Result Parsing:

```python
import re

def parse_pytest_results(output):
    def count(pattern):
        # A section (e.g. "3 failed") is simply absent from pytest's
        # summary line when its count is zero, so default to 0
        m = re.search(pattern, output)
        return int(m.group(1)) if m else 0

    passed = count(r'(\d+) passed')
    failed = count(r'(\d+) failed')
    skipped = count(r'(\d+) skipped')
    return {
        'passed': passed,
        'failed': failed,
        'skipped': skipped,
        'total': passed + failed + skipped,
        'success': failed == 0
    }
```
### Step 7: Context Sensitivity Testing

Tests lechatchila/bedieved distinctions:

```yaml
context_tests:
  - id: test_lechatchila
    context: lechatchila
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
      context(lechatchila).
    expected:
      mechaber:
        - holds(issur(achiila, m1, sakana), mechaber)
  - id: test_bedieved
    context: bedieved
    setup: |
      food(salmon). food_type(salmon, dag).
      food(cream). food_type(cream, chalav).
      mixture(m1). contains(m1, salmon). contains(m1, cream).
      context(bedieved).
      hefsed_merubeh.
    expected:
      mechaber:
        - "varies by posek"  # May need human interpretation
```
## Phase D: Query Verification

### Step 1: Engine API Testing

The skill tests all engine methods work with the new rules:

```python
# Test ask()
assert engine.ask("holds(issur(achiila, m1, d_oraita), base)")

# Test query()
results = engine.query("holds(issur(A, M, L), W)")
assert len(results) > 0

# Test compare()
comparison = engine.compare(
    "holds(issur(achiila, M, L), W)",
    worlds=["mechaber", "rema"]
)
assert "mechaber" in comparison
assert "rema" in comparison
```
### Step 2: Derivation Chain Testing

The skill tests that explain() produces a valid derivation:

```python
explanation = engine.explain(
    "holds(issur(achiila, m1, d_oraita), mechaber)",
    world="mechaber",
    additional_facts="food(beef). food_type(beef, beheima). ..."
)
assert explanation['derivation'] is not None
assert len(explanation['tree']) > 0
```
### Step 3: Performance Testing

The skill ensures queries complete in reasonable time:

```python
import time

start = time.perf_counter()
for _ in range(100):
    engine.query("holds(issur(A, M, L), W)")
elapsed = time.perf_counter() - start

avg_query_time = elapsed / 100
assert avg_query_time < 0.1, f"Query too slow: {avg_query_time}s"
```
Performance Targets:
| Metric | Target | Warning Threshold |
|---|---|---|
| Grounding time | <100ms | >200ms |
| First query | <50ms | >100ms |
| Subsequent queries | <10ms | >50ms |
| Memory usage | <100MB | >200MB |
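The targets in the table can be applied mechanically. The classification below assumes values between the target and the warning threshold count as warnings, which is one reasonable reading of the table:

```python
# (target_ms, warning_threshold_ms) per metric, from the table above
TARGETS_MS = {
    "grounding": (100, 200),
    "first_query": (50, 100),
    "subsequent_query": (10, 50),
}

def classify(metric: str, value_ms: float) -> str:
    # ok at or under target, warning up to the threshold, failed beyond it
    target, warn = TARGETS_MS[metric]
    if value_ms <= target:
        return "ok"
    if value_ms <= warn:
        return "warning"
    return "failed"
```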
## Phase E: Validation Report

### Output 1: Validation Report

`validation-report-YD-{siman}-{seif}.md`

Uses the template from `templates/validation-report.md`:

```markdown
# Validation Report: YD {siman}:{seif}

## Summary
- **Reference:** YD {siman}:{seif}
- **Validated:** {timestamp}
- **Status:** {PASSED/FAILED}

## Compilation
- HLL Compilation: {passed/failed}
- ASP Grounding: {passed/failed}
- Satisfiability: {SAT/UNSAT}

## Semantic Validation
- Disjointness: {passed/failed}
- Madrega Consistency: {passed/failed}
- World Inheritance: {passed/failed}
- Makor Integrity: {passed/failed}

## Behavioral Tests
- Positive Tests: {N} passed
- Negative Tests: {N} passed
- Edge Cases: {N} passed
- Machloket Verification: {passed/failed}
- Regression: {N}/{N} passed

## Performance
- Average Query Time: {N}ms
- Grounding Time: {N}ms

## Issues Found
{list of issues if any}
```
### Output 2: Validation Results YAML

`validation-results-YD-{siman}-{seif}.yaml`

```yaml
reference: "YD:{siman}:{seif}"
validated_at: "{timestamp}"
status: passed  # or failed
compilation:
  hll_compilation: passed
  asp_grounding: passed
  satisfiability: SAT
semantic:
  disjointness: passed
  madrega_consistency: passed
  world_inheritance: passed
  makor_integrity: passed
behavioral:
  positive_tests:
    total: 10
    passed: 10
    failed: 0
  negative_tests:
    total: 5
    passed: 5
    failed: 0
  edge_cases:
    total: 3
    passed: 3
    failed: 0
  machloket:
    verified: true
  regression:
    total: 495
    passed: 495
    failed: 0
    skipped: 0
performance:
  avg_query_ms: 45
  grounding_ms: 120
```
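A hedged sketch of how the top-level `status` field could be derived from the rest of the results: the key names follow the YAML above, but the aggregation rule itself is an assumption, not the skill's documented behavior:

```python
def overall_status(results: dict) -> str:
    # Pass only if compilation succeeded, the program is SAT, every semantic
    # check passed, no behavioral test failed, and the machloket is verified
    comp_ok = all(v in ("passed", "SAT") for v in results["compilation"].values())
    sem_ok = all(v == "passed" for v in results["semantic"].values())
    beh = results["behavioral"]
    tests_ok = all(
        beh[k]["failed"] == 0
        for k in ("positive_tests", "negative_tests", "edge_cases", "regression")
    )
    ok = comp_ok and sem_ok and tests_ok and beh["machloket"]["verified"]
    return "passed" if ok else "failed"
```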
### Output 3: Test Scenarios

`test-scenarios-YD-{siman}-{seif}.yaml`

Full test scenarios for future regression testing.
## Checkpoint Criteria

Before requesting human approval, verify:
- [ ] Compilation successful (no syntax errors)
- [ ] Program is SAT (no contradictions)
- [ ] All semantic checks pass
- [ ] All generated tests pass
- [ ] Machloket properly verified
- [ ] No regression failures
- [ ] Performance acceptable (<100ms avg query)
## Session State Update

After completing validation:

```yaml
current_phase: validate
checkpoints:
  validate:
    status: pending_review
    artifacts:
      - validation-report-YD-{siman}-{seif}.md
      - validation-results-YD-{siman}-{seif}.yaml
      - test-scenarios-YD-{siman}-{seif}.yaml
    test_summary:
      total: 25
      passed: 25
      failed: 0
    performance:
      avg_query_ms: 45
```
## Common Issues

### UNSAT Result

**Symptom:** Clingo returns UNSATISFIABLE.

Common Causes:

- Direct contradiction (issur + heter for the same thing)
- Circular negation without a choice rule
- Conflicting world assertions
### Grounding Timeout

**Symptom:** Grounding takes >10 seconds.

Solutions:

- Add domain bounds
- Reduce Cartesian products
- Use projection directives
### Test Failure

**Symptom:** Expected atom not found.

Diagnosis:

```python
# Debug a specific query by dumping all derived atoms
result = engine.analyze(setup_facts, world="mechaber")
print("All atoms:", result['atoms'])
```

Common Causes:

- Missing prerequisite facts
- World inheritance not working
- Override incorrectly structured
## Related Documentation

- HLL Encoding Skill - Previous phase
- Review Skill - Next phase
- Troubleshooting - Detailed error resolution