HLL Compiler Pipeline¶
Overview¶
The HLL compiler transforms human-readable Halachic Logic Language into Answer Set Programming (ASP) code for the Clingo solver. This is the compilation layer of the Mistaber system, following a classic four-stage architecture.
```mermaid
flowchart TB
    subgraph INPUT["HLL SOURCE CODE"]
        hll["@world(base)<br/>basar(chicken).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    subgraph S1["STAGE 1: PARSER"]
        parser["• Tokenize HLL source<br/>• Build parse tree (Lark LALR)<br/>• Transform to AST<br/><i>mistaber/dsl/compiler/parser.py</i>"]
    end

    subgraph S2["STAGE 2: NORMALIZER"]
        norm["• Expand surface shortcuts<br/>• basar(X) → food_type(X, basar)<br/>• issur(A,F) → forbidden(W,A,F,ctx)<br/><i>mistaber/dsl/compiler/normalizer.py</i>"]
    end

    subgraph S3["STAGE 3: TYPE CHECKER"]
        tc["• Validate predicates<br/>• Check arity, sorts, enums<br/>• Enforce @makor<br/><i>mistaber/dsl/compiler/type_checker.py</i>"]
    end

    subgraph S4["STAGE 4: EMITTER"]
        emit["• Generate ASP code<br/>• Add metadata (rule IDs, makor)<br/>• Add xclingo2 trace<br/><i>mistaber/dsl/compiler/emitter.py</i>"]
    end

    subgraph OUTPUT["ASP OUTPUT"]
        asp["% World: base<br/>food_type(chicken, basar).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    INPUT --> S1
    S1 -->|ParseResult| S2
    S2 -->|Normalized| S3
    S3 -->|Validated| S4
    S4 --> OUTPUT
```
Stage 1: Parser¶
Location: mistaber/dsl/compiler/parser.py
The parser uses Lark with an LALR(1) grammar to tokenize and parse HLL source code.
Grammar Location¶
mistaber/dsl/grammar.lark - The complete HLL grammar specification.
AST Types¶
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Atom:
    """Represents a predicate with arguments."""
    predicate: str
    args: List[str]
    negated: bool = False

@dataclass
class Fact:
    """Unconditional statement: head."""
    predicate: str
    args: List[str]

@dataclass
class Rule:
    """Conditional statement: head :- body."""
    head: Atom
    body: List[Atom]

@dataclass
class ParseResult:
    """Complete parse output."""
    facts: List[Fact]
    rules: List[Rule]
    world: Optional[str]                      # @world directive
    rule_id: Optional[str]                    # @rule directive
    sources: Optional[List[Tuple[str, str]]]  # @makor citations
    madrega: Optional[str]                    # @madrega level
```
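To make the shapes concrete, here is a self-contained sketch that re-declares the two rule-related dataclasses and builds the AST for the rule shown in the overview:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Atom:
    predicate: str
    args: List[str]
    negated: bool = False

@dataclass
class Rule:
    head: Atom
    body: List[Atom]

# AST for: forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
rule = Rule(
    head=Atom("forbidden", ["W", "achiila", "M", "ctx_normal"]),
    body=[Atom("mixture_is_basar_bechalav", ["M"])],
)
print(rule.head.predicate, len(rule.body))  # forbidden 1
```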
Directive Processing¶
| Directive | Example | Stored In |
|---|---|---|
| `@world(id)` | `@world(base)` | `ParseResult.world` |
| `@rule(id)` | `@rule(r_basar_bechalav)` | `ParseResult.rule_id` |
| `@makor([...])` | `@makor([sa("YD:87:1")])` | `ParseResult.sources` |
| `@madrega(level)` | `@madrega(d_oraita)` | `ParseResult.madrega` |
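As an illustration of what directive extraction does, the sketch below pulls out the three single-token directives with a regex. This is only a toy: the real parser handles directives in the Lark grammar, and `@makor`'s list syntax needs more than a regular expression.

```python
import re

# Toy directive extractor (illustrative only; the real parser uses Lark).
# Covers @world, @rule, and @madrega; @makor's bracketed list is omitted.
SIMPLE_DIRECTIVE = re.compile(r'@(world|rule|madrega)\((\w+)\)')

source = "@world(base)\n@rule(r_test)\n@madrega(d_oraita)\nbasar(chicken)."
directives = dict(SIMPLE_DIRECTIVE.findall(source))
print(directives)  # {'world': 'base', 'rule': 'r_test', 'madrega': 'd_oraita'}
```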
Error Handling¶
```python
class ParseError(Exception):
    """Raised for syntax errors, unbalanced parens, invalid tokens."""
    pass
```
Common parse errors:

- Missing period at end of fact/rule
- Unbalanced parentheses
- Invalid identifier (uppercase where constant expected)
- Unrecognized directive
Stage 2: Normalizer¶
Location: mistaber/dsl/compiler/normalizer.py
The normalizer expands Hebrew-friendly surface syntax to canonical predicates. This stage must run before type checking because the registry only contains canonical predicates.
Surface Syntax Expansions¶
Food Type Shortcuts¶
| Surface | Canonical |
|---|---|
| `basar(X)` | `food_type(X, basar)` |
| `chalav(X)` | `food_type(X, chalav)` |
| `parve(X)` | `food_type(X, parve)` |
| `beheima(X)` | `food_type(X, beheima)` |
| `chaya(X)` | `food_type(X, chaya)` |
| `of(X)` | `food_type(X, of)` |
| `dag(X)` | `food_type(X, dag)` |
| `mashkeh(X)` | `food_type(X, mashkeh)` |
| `tavlin(X)` | `food_type(X, tavlin)` |
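The expansion behind this table can be sketched as a simple lookup. The names here are illustrative, not the actual `Normalizer` API; the real logic lives in `mistaber/dsl/compiler/normalizer.py`.

```python
# Surface predicates that become the second argument of food_type/2.
FOOD_TYPE_SHORTCUTS = {
    "basar", "chalav", "parve", "beheima", "chaya",
    "of", "dag", "mashkeh", "tavlin",
}

def expand_food_shortcut(predicate, args):
    """Rewrite basar(X) -> food_type(X, basar); pass other atoms through."""
    if predicate in FOOD_TYPE_SHORTCUTS and len(args) == 1:
        return ("food_type", [args[0], predicate])
    return (predicate, args)

print(expand_food_shortcut("basar", ["chicken"]))
# ('food_type', ['chicken', 'basar'])
```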
Status Shortcuts¶
| Surface | Canonical |
|---|---|
| `issur(action, food)` | `forbidden(W, action, food, ctx_normal)` |
| `mutar(action, food)` | `permitted(W, action, food, ctx_normal)` |
Here `W` comes from the `@world` directive when one is present; otherwise it remains a variable.
World Injection¶
When a @world directive is present, the normalizer injects the world value into expanded predicates:
```
@world(rema)
issur(achiila, gevinas_akum).

% Normalizes to:
forbidden(rema, achiila, gevinas_akum, ctx_normal).
```
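The status-shortcut expansion with world injection can be sketched as follows. Function and constant names are hypothetical; this is not the `Normalizer`'s real interface.

```python
# Surface status predicates and their canonical forms.
STATUS_SHORTCUTS = {"issur": "forbidden", "mutar": "permitted"}

def expand_status(predicate, args, world=None):
    """issur(A, F) -> forbidden(W, A, F, ctx_normal), injecting @world."""
    canonical = STATUS_SHORTCUTS[predicate]
    w = world if world is not None else "W"  # stays a variable without @world
    return (canonical, [w, args[0], args[1], "ctx_normal"])

print(expand_status("issur", ["achiila", "gevinas_akum"], world="rema"))
# ('forbidden', ['rema', 'achiila', 'gevinas_akum', 'ctx_normal'])
```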
Error Handling¶
Common normalization errors:
- Wrong arity: basar(X, Y) (expects 1 arg)
- Wrong arity: issur(X) (expects 2 args)
Stage 3: Type Checker¶
Location: mistaber/dsl/compiler/type_checker.py
The type checker validates the normalized AST against the predicate registry (mistaber/dsl/vocabulary/base.yaml).
Validation Checks¶
- Predicate Existence: Is the predicate declared in the registry?
- Arity Matching: Does the argument count match the signature?
- Sort Enforcement: Are enum values valid for their position?
- Madrega Validation: Is the `@madrega` value valid (`d_oraita`, `d_rabanan`, etc.)?
- Makor Requirement: Do normative rules have `@makor` citations?
- OWA Negation Warning: Is negation used on open-world predicates?
Normative Predicates¶
These predicates require @makor citation:
- forbidden
- permitted
- safek
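The makor-requirement check reduces to a set-membership test. A minimal sketch with hypothetical names (the real validation lives in `mistaber/dsl/compiler/type_checker.py`):

```python
# Predicates whose rules must carry a @makor citation.
NORMATIVE_PREDICATES = {"forbidden", "permitted", "safek"}

def check_makor(head_predicate, sources):
    """Return an error when a normative rule carries no @makor citation."""
    if head_predicate in NORMATIVE_PREDICATES and not sources:
        return [("error", f"{head_predicate} rule requires a @makor citation")]
    return []

print(check_makor("forbidden", None))
print(check_makor("forbidden", [("sa", "YD:87:1")]))  # []
```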
Error vs. Warning¶
| Severity | Stops Compilation | Example |
|---|---|---|
| Error | Yes | Undeclared predicate, arity mismatch |
| Warning | No | OWA predicate negation |
Type Check Result¶
```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class TypeCheckError:
    message: str
    severity: Literal["error", "warning"]
    predicate: str = ""
    line: int = 0
```
Enum Value Checking¶
The type checker dynamically reads enum sorts from the registry, so `@madrega(invalid_value)` generates an error rather than passing silently.
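A minimal sketch of this check, with a hard-coded stand-in for the registry (the real checker loads sorts from `base.yaml`; the enum members here are illustrative):

```python
# Stand-in for the registry's enum sorts; real values come from
# mistaber/dsl/vocabulary/base.yaml.
ENUM_SORTS = {"madrega": {"d_oraita", "d_rabanan"}}

def check_enum(sort, value):
    """Return an error tuple when value is not a member of the sort."""
    if value not in ENUM_SORTS.get(sort, set()):
        return [("error", f"invalid {sort} value: {value}")]
    return []

print(check_enum("madrega", "d_oraita"))       # []
print(check_enum("madrega", "invalid_value"))
```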
Stage 4: Emitter¶
Location: mistaber/dsl/compiler/emitter.py
The emitter generates ASP code from the validated AST, adding metadata and safety measures.
Output Structure¶
```
% World: base
% Rule ID: r_basar_bechalav_issur

rule(r_basar_bechalav_issur).
scope(r_basar_bechalav_issur, base).
makor(r_basar_bechalav_issur, sa("YD:87:1")).
madrega(r_basar_bechalav_issur, d_oraita).

food(beef).
food_type(beef, beheima).

%!trace_rule {"r_basar_bechalav_issur: action on food is forbidden"}
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
```
xclingo2 Trace Annotations¶
For explainability, the emitter adds `%!trace_rule` annotations that xclingo2 uses to generate human-readable explanations.
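Repeating the relevant lines from the output sketch above, a trace annotation pairs a natural-language template with the rule it explains:

```asp
%!trace_rule {"r_basar_bechalav_issur: action on food is forbidden"}
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
```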
Security: Injection Prevention¶
The emitter sanitizes all arguments to prevent ASP code injection:
```python
# Characters that could terminate, comment out, or extend an ASP statement.
DANGEROUS_CHARS = {';', '\n', '\r', '%', '.', ':-', '#'}

def _sanitize_argument(arg: str) -> str:
    """Reject arguments containing dangerous characters."""
    for char in (';', '\n', '\r', '%', '#'):
        if char in arg:
            raise EmitterError(f"Invalid character {char!r} in argument")
    if ':-' in arg or '.' in arg:  # simplified: the real check treats a "standalone ." specially
        raise EmitterError(f"Invalid token in argument: {arg!r}")
    return arg
```
This prevents injection attacks in which a crafted argument terminates the enclosing statement early and appends unauthorized facts or rules.
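A self-contained sketch of the kind of payload the check rejects; the sanitizer below restates the logic above in runnable form under the same assumptions:

```python
class EmitterError(Exception):
    pass

def sanitize_argument(arg):
    """Standalone restatement of the emitter's argument check (assumed behavior)."""
    for char in (';', '\n', '\r', '%', '#'):
        if char in arg:
            raise EmitterError(f"invalid character {char!r}")
    if ':-' in arg or '.' in arg:
        raise EmitterError("statement syntax in argument")
    return arg

# A crafted argument tries to close the enclosing fact early and append
# a new, unauthorized fact:
payload = 'beef). permitted(base, achiila, beef, ctx_normal'
try:
    sanitize_argument(payload)
    print("accepted")
except EmitterError as e:
    print(f"rejected: {e}")  # rejected: statement syntax in argument
```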
Identifier Validation¶
Identifiers (predicates, worlds, rule IDs) must:

- Start with a lowercase letter or underscore
- Not contain: `;`, `\n`, `\r`, `%`, `#`, `.`, `:`, `(`, `)`
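One plausible (deliberately strict) way to encode these rules as a regex; the exact allowed character set is an assumption based on the list above, not the emitter's actual pattern:

```python
import re

# Lowercase/underscore start, then letters, digits, or underscores only.
# Stricter than the listed rules, which only name forbidden characters.
IDENT_RE = re.compile(r'[a-z_][A-Za-z0-9_]*')

def is_valid_identifier(name):
    return IDENT_RE.fullmatch(name) is not None

print(is_valid_identifier("r_basar_bechalav"))  # True
print(is_valid_identifier("Basar"))             # False (uppercase start)
print(is_valid_identifier("foo.bar"))           # False (contains '.')
```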
Facade: compile_hll()¶
Location: mistaber/dsl/compiler/compiler.py
The compile_hll() function orchestrates the entire pipeline:
```python
def compile_hll(source: str, return_warnings: bool = False) -> str:
    """
    Pipeline: Parse → Normalize → TypeCheck → Emit

    Raises:
        CompileError: On parse, normalization, type, or emission errors
    """
    # 1. Parse
    parser = HLLParser()
    ast = parser.parse(source)

    # 2. Normalize (MUST be before type checking)
    normalizer = Normalizer()
    normalized_ast = normalizer.normalize(ast)

    # 3. Type check
    checker = TypeChecker()
    errors = checker.check(normalized_ast)

    # 4. Handle errors/warnings
    hard_errors = [e for e in errors if e.severity == "error"]
    if hard_errors:
        raise CompileError(...)

    # 5. Emit
    emitter = ASPEmitter()
    asp = emitter.emit(normalized_ast)
    return asp
```
Pipeline Order Rationale¶
Normalize → TypeCheck (not TypeCheck → Normalize)
The registry contains only canonical predicates (food_type, not basar). Surface syntax must be expanded before validation can occur.
Usage Example¶
```python
from mistaber.dsl.compiler import compile_hll

source = """
@world(base)
@rule(r_test)
@makor([sa("YD:87:1")])
@madrega(d_oraita)

basar(chicken).

forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
"""

asp_code = compile_hll(source)
print(asp_code)
```
Compile with Warnings¶
```python
asp_code, warnings = compile_hll(source, return_warnings=True)
for w in warnings:
    print(f"Warning: {w.message}")
```
Error Handling Summary¶
| Stage | Exception | Example |
|---|---|---|
| Parser | `ParseError` | Syntax error at line 5 |
| Normalizer | `NormalizationError` | `issur()` expects 2 args |
| Type Checker | `TypeCheckError` | Undeclared predicate |
| Emitter | `EmitterError` | Invalid character in argument |
| Facade | `CompileError` | Wraps all of the above |
Testing¶
Tests are located in tests/dsl/:
| File | Coverage |
|---|---|
| `test_parser.py` | Parser and AST construction |
| `test_normalizer.py` | Surface syntax expansion |
| `test_type_checker.py` | Registry validation |
| `test_emitter.py` | ASP code generation |
| `test_compiler.py` | End-to-end pipeline |
Run all DSL tests, for example: `pytest tests/dsl/`