HLL Compiler Pipeline¶
Overview¶
The HLL compiler transforms human-readable Halachic Logic Language into Answer Set Programming (ASP) code for the Clingo solver. The system operates at two levels:
- **Per-file compilation** — the `compile_hll()` function processes a single `.hll` file through four stages (parse → normalize → type check → emit).
- **Build pipeline** — the `python -m mistaber.dsl.build` command orchestrates compilation of all `.hll` files across the project, merges vocabulary into `base.yaml`, and generates `meta.lp`.
```mermaid
flowchart TB
    subgraph INPUT["HLL SOURCE CODE"]
        hll["@world(base)<br/>basar(chicken).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    subgraph S1["STAGE 1: PARSER"]
        parser["• Tokenize HLL source<br/>• Build parse tree (Lark LALR)<br/>• Transform to AST<br/><i>mistaber/dsl/compiler/parser.py</i>"]
    end

    subgraph S2["STAGE 2: NORMALIZER"]
        norm["• Expand surface shortcuts<br/>• basar(X) → food_type(X, basar)<br/>• issur(A,F) → forbidden(W,A,F,ctx)<br/><i>mistaber/dsl/compiler/normalizer.py</i>"]
    end

    subgraph S3["STAGE 3: TYPE CHECKER"]
        tc["• Validate predicates<br/>• Check arity, sorts, enums<br/>• Enforce @makor<br/><i>mistaber/dsl/compiler/type_checker.py</i>"]
    end

    subgraph S4["STAGE 4: EMITTER"]
        emit["• Generate ASP code<br/>• Add metadata (rule IDs, makor)<br/>• Add xclingo2 trace<br/><i>mistaber/dsl/compiler/emitter.py</i>"]
    end

    subgraph OUTPUT["ASP OUTPUT"]
        asp["% World: base<br/>food_type(chicken, basar).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    INPUT --> S1
    S1 -->|ParseResult| S2
    S2 -->|Normalized| S3
    S3 -->|Validated| S4
    S4 --> OUTPUT
```
Stage 1: Parser¶
Location: `mistaber/dsl/compiler/parser.py`

The parser uses Lark with an LALR(1) grammar to tokenize and parse HLL source code.
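For illustration only, here is a regex-based sketch of parsing a single flat fact into a `Fact` node (the real parser uses the Lark grammar and also handles rules, directives, and nested terms):

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Fact:
    """Unconditional statement: head. (mirrors the AST type below)"""
    predicate: str
    args: List[str]

# Illustration only: the real parser uses the Lark grammar; this regex
# handles just the flat `pred(arg1, arg2).` fact shape.
FACT_RE = re.compile(r"\s*([a-z_]\w*)\(([^()]*)\)\.\s*")

def parse_fact(line: str) -> Fact:
    m = FACT_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a simple fact: {line!r}")
    predicate, raw_args = m.groups()
    args = [a.strip() for a in raw_args.split(",")] if raw_args.strip() else []
    return Fact(predicate, args)

print(parse_fact("basar(chicken)."))  # Fact(predicate='basar', args=['chicken'])
```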
Grammar Location¶
`mistaber/dsl/grammar.lark` - the complete HLL grammar specification.
AST Types¶
```python
@dataclass
class Atom:
    """Represents a predicate with arguments."""
    predicate: str
    args: List[str]
    negated: bool = False

@dataclass
class Fact:
    """Unconditional statement: head."""
    predicate: str
    args: List[str]

@dataclass
class Rule:
    """Conditional statement: head :- body."""
    head: Atom
    body: List[Atom]

@dataclass
class ParseResult:
    """Complete parse output."""
    facts: List[Fact]
    rules: List[Rule]

    # Rule metadata directives
    world: Optional[str]                        # @world directive
    rule_id: Optional[str]                      # @rule directive
    sources: Optional[List[Tuple[str, str]]]    # @makor citations
    madrega: Optional[str]                      # @madrega level

    # Vocabulary registration directives
    sorts: List[SortDecl]                       # @sort directives
    subsorts: List[SubsortDecl]                 # @subsort directives
    enums: List[EnumDecl]                       # @enum directives
    declarations: List[PredicateDecl]           # @declare directives

    # World definition directives
    world_defs: List[WorldDef]                  # @world_def directives
    endorsements: List[Endorsement]             # @endorses directives
    interprets: List[InterpretDecl]             # @interprets directives
    interpretations: List[InterpretationDecl]   # @interpretation directives

    # Output and documentation directives
    shows: List[ShowDirective]                  # @show directives
    encoding_notes: List[str]                   # @encoding_note directives
    constraints: List[ConstraintDecl]           # @constraint directives
```
Directive Processing¶
| Directive | Example | Stored In |
|---|---|---|
| `@world(id)` | `@world(base)` | `ParseResult.world` |
| `@rule(id)` | `@rule(r_basar_bechalav)` | `ParseResult.rule_id` |
| `@makor([...])` | `@makor([sa("YD:87:1")])` | `ParseResult.sources` |
| `@madrega(level)` | `@madrega(d_oraita)` | `ParseResult.madrega` |
| `@sort(name, domain, desc)` | `@sort(food, physical, "...")` | `ParseResult.sorts` |
| `@subsort(child, parent)` | `@subsort(beheima, food)` | `ParseResult.subsorts` |
| `@enum(sort, [members])` | `@enum(food_category, [...])` | `ParseResult.enums` |
| `@declare(name, [sorts], ...)` | `@declare(is_food, [food], ...)` | `ParseResult.declarations` |
| `@world_def(name, parent)` | `@world_def(mechaber, base)` | `ParseResult.world_defs` |
| `@endorses(world, prop, ...)` | `@endorses(gra, issur(...), ...)` | `ParseResult.endorsements` |
| `@interprets(comm, auth)` | `@interprets(shach, mechaber)` | `ParseResult.interprets` |
| `@interpretation(comm, rule, ...)` | `@interpretation(shach, r_id, ...)` | `ParseResult.interpretations` |
| `@show(pred/arity)` | `@show(holds/2)` | `ParseResult.shows` |
| `@encoding_note("text")` | `@encoding_note("...")` | `ParseResult.encoding_notes` |
| `@constraint(name, cat, desc)` | `@constraint(no_dual, ...)` | `ParseResult.constraints` |
Error Handling¶
```python
class ParseError(Exception):
    """Raised for syntax errors, unbalanced parens, invalid tokens."""
    pass
```

Common parse errors:

- Missing period at the end of a fact/rule
- Unbalanced parentheses
- Invalid identifier (uppercase where a constant is expected)
- Unrecognized directive
Stage 2: Normalizer¶
Location: `mistaber/dsl/compiler/normalizer.py`
The normalizer expands Hebrew-friendly surface syntax to canonical predicates. This stage must run before type checking because the registry only contains canonical predicates.
Surface Syntax Expansions¶
Food Type Shortcuts¶
| Surface | Canonical |
|---|---|
| `basar(X)` | `food_type(X, basar)` |
| `chalav(X)` | `food_type(X, chalav)` |
| `parve(X)` | `food_type(X, parve)` |
| `beheima(X)` | `food_type(X, beheima)` |
| `chaya(X)` | `food_type(X, chaya)` |
| `of(X)` | `food_type(X, of)` |
| `dag(X)` | `food_type(X, dag)` |
| `mashkeh(X)` | `food_type(X, mashkeh)` |
| `tavlin(X)` | `food_type(X, tavlin)` |
Status Shortcuts¶
| Surface | Canonical |
|---|---|
| `issur(action, food)` | `forbidden(W, action, food, ctx_normal)` |
| `mutar(action, food)` | `permitted(W, action, food, ctx_normal)` |

where `W` is taken from the `@world` directive or left as a variable.
World Injection¶
When a @world directive is present, the normalizer injects the world value into expanded predicates:
```
@world(rema)
issur(achiila, gevinas_akum).

% Normalizes to:
forbidden(rema, achiila, gevinas_akum, ctx_normal).
```
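The expansion step itself can be sketched as a small rewrite function (a simplified illustration, not the actual `Normalizer` code, which also handles `issur`/`mutar` and world injection):

```python
# Simplified illustration of the food-type expansion; the real
# Normalizer also handles issur/mutar and @world injection.
FOOD_SHORTCUTS = {"basar", "chalav", "parve", "beheima", "chaya",
                  "of", "dag", "mashkeh", "tavlin"}

def expand_food_shortcut(predicate: str, args: list) -> tuple:
    """Rewrite basar(X) -> food_type(X, basar); pass others through."""
    if predicate in FOOD_SHORTCUTS:
        if len(args) != 1:
            raise ValueError(f"{predicate} expects 1 arg, got {len(args)}")
        return ("food_type", [args[0], predicate])
    return (predicate, args)

print(expand_food_shortcut("basar", ["chicken"]))
# ('food_type', ['chicken', 'basar'])
```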
Error Handling¶
Common normalization errors:
- Wrong arity: `basar(X, Y)` (expects 1 arg)
- Wrong arity: `issur(X)` (expects 2 args)
Stage 3: Type Checker¶
Location: `mistaber/dsl/compiler/type_checker.py`

The type checker validates the normalized AST against the predicate registry (`mistaber/dsl/vocabulary/base.yaml`).
Validation Checks¶
Errors (stop compilation)¶
| Check | Description |
|---|---|
| Missing `hebrew`/`english` in `@declare` | Required fields for predicate registration |
| Undefined sort in `@declare` signature | Sort must exist in the registry or a local `@sort` |
| Arity conflict with existing predicate | Same predicate name, different arity already registered |
| Invalid `@sort` domain | Must be one of: `physical`, `normative`, `classification`, `temporal`, `meta` |
| Invalid `@interpretation` action | Must be one of: `adds_condition`, `removes_condition`, `restricts_scope`, `expands_scope` |
| Self-referential `@world_def` parent | A world cannot be its own parent |
| Invalid `@madrega` value | Must match the registry's `madrega_type` enum |
Warnings (compilation continues)¶
| Check | Description |
|---|---|
| Undeclared predicate | Predicate not in the registry or a local `@declare` |
| Arity mismatch | Argument count differs from the declaration |
| Invalid enum value | Constant not in the enumeration |
| OWA predicate with negation | May cause unsafe permissiveness |
| Normative rule without `@makor` | Source citations expected |
Normative Predicates¶
These predicates require a `@makor` citation:

- `forbidden`
- `permitted`
- `safek`
Type Check Result¶
```python
@dataclass
class TypeCheckError:
    message: str
    severity: Literal["error", "warning"]
    predicate: str = ""
    line: int = 0
```
Enum Value Checking¶
The type checker dynamically reads enum sorts from the registry. If `@madrega(invalid_value)` is used, an error is generated.
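A hedged sketch of what such a dynamic check might look like (the registry shape shown is illustrative, and enum members beyond `d_oraita` are assumptions, not `base.yaml`'s actual contents):

```python
# Illustrative registry shape; base.yaml's actual layout differs and
# the enum members beyond d_oraita are assumptions.
REGISTRY_ENUMS = {
    "madrega_type": {"d_oraita", "d_rabbanan"},
}

def check_enum(enum_name: str, value: str) -> list:
    """Return a list of error messages (empty when the value is valid
    or the enum is unknown to this toy registry)."""
    allowed = REGISTRY_ENUMS.get(enum_name)
    if allowed is not None and value not in allowed:
        return [f"'{value}' is not a member of enum {enum_name}"]
    return []

print(check_enum("madrega_type", "invalid_value"))
```

Because the allowed members are read from data rather than hard-coded, adding a member to the registry extends validation without touching the checker.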
Stage 4: Emitter¶
Location: `mistaber/dsl/compiler/emitter.py`
The emitter generates ASP code from the validated AST, adding metadata and safety measures.
Output Structure¶
```
% World: base
% Rule ID: r_basar_bechalav_issur

rule(r_basar_bechalav_issur).
scope(r_basar_bechalav_issur, base).
makor(r_basar_bechalav_issur, sa("YD:87:1")).
madrega(r_basar_bechalav_issur, d_oraita).

food(beef).
food_type(beef, beheima).

%!trace_rule {"r_basar_bechalav_issur: action on food is forbidden"}
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
```
xclingo2 Trace Annotations¶
For explainability, the emitter adds trace annotations (the `%!trace_rule` line shown in the output above) that xclingo2 uses to generate human-readable explanations.
Security: Injection Prevention¶
The emitter sanitizes all arguments to prevent ASP code injection:
```python
DANGEROUS_CHARS = {';', '\n', '\r', '%', '.', ':-', '#'}

def _sanitize_argument(arg: str) -> str:
    """Reject arguments containing dangerous characters."""
    for char in [';', '\n', '\r', '%', '#']:
        if char in arg:
            raise EmitterError(f"Invalid character '{char}'")
    # Additional checks for ':-' and standalone '.'
    ...
```
This prevents attacks in which a crafted argument smuggles extra statements into the output, e.g. an argument containing a `.` followed by a new fact, or a `%` that comments out the rest of a rule.
Identifier Validation¶
Identifiers (predicates, worlds, rule IDs) must:

- Start with a lowercase letter or underscore
- Not contain: `;`, `\n`, `\r`, `%`, `#`, `.`, `:`, `(`, `)`
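Both rules can be captured by a single regular expression; the sketch below is slightly stricter than the stated minimum (it allows only word characters after the first), which rules out every forbidden character in one pattern:

```python
import re

# Sketch of the identifier rule above; allowing only word characters
# after the first char is slightly stricter than the listed minimum,
# but rules out every forbidden character in one pattern.
IDENT_RE = re.compile(r"[a-z_][a-zA-Z0-9_]*")

def is_valid_identifier(name: str) -> bool:
    return IDENT_RE.fullmatch(name) is not None

print(is_valid_identifier("r_basar_bechalav"))  # True
print(is_valid_identifier("Bad.name"))          # False
```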
Facade: compile_hll()¶
Location: `mistaber/dsl/compiler/compiler.py`

The `compile_hll()` function orchestrates the per-file pipeline:
```python
def compile_hll(source: str, return_warnings: bool = False) -> str:
    """
    Pipeline: Parse → Normalize → TypeCheck → Emit

    Raises:
        CompileError: On parse, normalization, type, or emission errors
    """
    # 1. Parse
    parser = HLLParser()
    ast = parser.parse(source)

    # 2. Normalize (MUST run before type checking)
    normalizer = Normalizer()
    normalized_ast = normalizer.normalize(ast)

    # 3. Type check
    checker = TypeChecker()
    errors = checker.check(normalized_ast)

    # 4. Handle errors/warnings
    hard_errors = [e for e in errors if e.severity == "error"]
    if hard_errors:
        raise CompileError(...)

    # 5. Emit
    emitter = ASPEmitter()
    asp = emitter.emit(normalized_ast)
    return asp
```
Pipeline Order Rationale¶
Normalize → TypeCheck (not TypeCheck → Normalize)
The registry contains only canonical predicates (food_type, not basar). Surface syntax must be expanded before validation can occur.
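A toy arity check makes the ordering concrete (the registry contents here are illustrative, not `base.yaml`'s):

```python
# Toy registry holding only canonical predicates and their arities,
# as the rationale above describes (contents are illustrative).
REGISTRY = {"food_type": 2, "forbidden": 4, "permitted": 4}

def arity_ok(predicate: str, args: list) -> bool:
    return REGISTRY.get(predicate) == len(args)

# The raw surface form would fail type checking...
print(arity_ok("basar", ["chicken"]))               # False
# ...while its normalized form passes.
print(arity_ok("food_type", ["chicken", "basar"]))  # True
```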
Usage Example¶
```python
from mistaber.dsl.compiler import compile_hll

source = """
@world(base)
@rule(r_test)
@makor([sa("YD:87:1")])
@madrega(d_oraita)

basar(chicken).
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
"""

asp_code = compile_hll(source)
print(asp_code)
```
Compile with Warnings¶
```python
asp_code, warnings = compile_hll(source, return_warnings=True)
for w in warnings:
    print(f"Warning: {w.message}")
```
Build Pipeline¶
The build pipeline (`python -m mistaber.dsl.build`) orchestrates compilation of all `.hll` files across the project. It is the standard way to compile HLL source; `compile_hll()` is the per-file API it uses internally.
```mermaid
flowchart TB
    subgraph COLLECT["PHASE 1: COLLECT & PARSE"]
        collect["• Find .hll files by layer order<br/>• Parse each to AST (ParseResult)<br/>• Extract directives into BuildManifest"]
    end

    subgraph MERGE["PHASE 2: REGISTRY MERGE"]
        merge["• YAMLMerger merges directives into base.yaml<br/>• @declare → predicates section<br/>• @sort → sorts section<br/>• @enum → enums section<br/>• @world_def, @interprets, @interpretation → worlds/interpretations"]
    end

    subgraph COMPILE["PHASE 3: COMPILE"]
        comp["• For each .hll: normalize → type_check → emit<br/>• Produces .lp files in ontology/<br/>• .lp passthrough files copied verbatim"]
    end

    subgraph META["PHASE 4: GENERATE META"]
        meta["• Rebuild meta.lp from updated base.yaml<br/>• Sort/predicate/enum membership atoms"]
    end

    subgraph DSL["dsl/"]
        schema["schema/*.hll"]
        base["base/*.hll"]
        worlds["worlds/*.hll"]
        engine["engine/*.hll + *.lp"]
        interp["interpretations/*.hll"]
        corpus["corpus/*.hll"]
    end

    subgraph OUT["ontology/"]
        out_schema["schema/*.lp"]
        out_base["base/*.lp"]
        out_worlds["worlds/*.lp"]
        out_engine["engine/*.lp"]
        out_interp["interpretations/*.lp"]
        out_corpus["corpus/*.lp"]
        out_meta["meta.lp"]
        out_yaml["(base.yaml updated)"]
    end

    DSL --> COLLECT
    COLLECT --> MERGE
    MERGE --> COMPILE
    COMPILE --> META
    META --> OUT
```
Layer Order¶
Files are processed in dependency order — each layer may reference sorts and predicates from earlier layers:
1. `schema` — Sort definitions, constraints, disjointness
2. `base` — Core facts (status, substance, issur_types, madrega, shiur)
3. `worlds` — Kripke worlds (base, mechaber, rema, gra, ashk_ah, sefardi_yo)
4. `engine` — Reasoning engine (safek, priorities, interpretations, policy)
5. `interpretations` — Commentator rules (Shach, Taz)
6. `corpus` — Encoded seifim (yd_87/, yd_89/)
Passthrough Files¶
`.lp` files in `dsl/` (e.g., `engine/preferences.lp` for asprin directives) are copied verbatim to `ontology/`. A naming conflict (both `foo.hll` and `foo.lp` in the same directory) is a build error.
Atomic Writes¶
All outputs are written to disk only after all 4 phases succeed. If any phase fails, no files are modified. This prevents partial/inconsistent state in ontology/.
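A generic sketch of this stage-then-commit pattern (names and structure are illustrative, not the build's actual code):

```python
import os
import shutil
import tempfile
from pathlib import Path

# Generic sketch of the stage-then-commit pattern described above
# (names are illustrative; the build's actual implementation may differ).
def atomic_build(outputs: dict, dest_dir: str) -> None:
    """Write every output into a staging dir first, then move the files
    into place only once all of them were produced successfully."""
    staging = tempfile.mkdtemp(dir=dest_dir)  # same filesystem as dest
    try:
        for rel_path, content in outputs.items():
            target = Path(staging) / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        # Commit phase: every file exists, so a compile failure can no
        # longer leave dest_dir half-written.
        for rel_path in outputs:
            final = Path(dest_dir) / rel_path
            final.parent.mkdir(parents=True, exist_ok=True)
            os.replace(Path(staging) / rel_path, final)
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```

Staging inside the destination directory keeps `os.replace` on a single filesystem, where the rename is atomic.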
BuildManifest¶
The BuildManifest aggregates directives from all parsed .hll files:
| Directive | Aggregated Into |
|---|---|
| `@sort` | `BuildManifest.sorts` |
| `@subsort` | `BuildManifest.subsorts` |
| `@enum` | `BuildManifest.enums` |
| `@declare` | `BuildManifest.declarations` |
| `@world_def` | `BuildManifest.world_defs` |
| `@interprets` | `BuildManifest.interprets` |
| `@interpretation` | `BuildManifest.interpretations` |
The `YAMLMerger` then merges these into `base.yaml`, which `generate_meta.py` uses to rebuild `meta.lp`.
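A minimal sketch of one merge step with arity-conflict detection (the dict shapes are assumptions; the real YAMLMerger operates on `base.yaml`'s actual schema):

```python
# Minimal sketch: dict-shaped registry and declaration records are
# assumptions; the real YAMLMerger works on base.yaml's schema.
def merge_declarations(registry: dict, declarations: list) -> dict:
    """Merge @declare records, rejecting arity conflicts (same name,
    different argument count), mirroring the type checker's rule."""
    preds = registry.setdefault("predicates", {})
    for decl in declarations:
        existing = preds.get(decl["name"])
        if existing and len(existing["sorts"]) != len(decl["sorts"]):
            raise ValueError(f"arity conflict for {decl['name']}")
        preds[decl["name"]] = decl
    return registry
```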
Error Handling Summary¶
| Stage | Exception | Example |
|---|---|---|
| Parser | `ParseError` | Syntax error at line 5 |
| Normalizer | `NormalizationError` | `issur()` expects 2 args |
| Type Checker | `TypeCheckError` | Undeclared predicate |
| Emitter | `EmitterError` | Invalid character in argument |
| Facade | `CompileError` | Wraps all of the above |
| Build Pipeline | `BuildError` | Layer ordering violation, naming conflict |
Testing¶
Tests are located in `tests/dsl/`:

| File | Coverage |
|---|---|
| `test_parser.py` | Parser and AST construction |
| `test_normalizer.py` | Surface syntax expansion |
| `test_type_checker.py` | Registry validation |
| `test_emitter.py` | ASP code generation |
| `test_compiler.py` | End-to-end pipeline |
Run all DSL tests:
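The command itself is not shown in this excerpt; assuming the project follows the usual pytest convention for a `tests/` tree, it would be:

```shell
pytest tests/dsl/
```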