HLL Compiler Pipeline

Overview

The HLL compiler transforms human-readable Halachic Logic Language into Answer Set Programming (ASP) code for the Clingo solver. The system operates at two levels:

  1. Per-file compilation — The compile_hll() function processes a single .hll file through four stages (parse → normalize → type check → emit)
  2. Build pipeline — The python -m mistaber.dsl.build command orchestrates compilation of all .hll files across the project, merges vocabulary into base.yaml, and generates meta.lp
flowchart TB
    subgraph INPUT["HLL SOURCE CODE"]
        hll["@world(base)<br/>basar(chicken).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    subgraph S1["STAGE 1: PARSER"]
        parser["• Tokenize HLL source<br/>• Build parse tree (Lark LALR)<br/>• Transform to AST<br/><i>mistaber/dsl/compiler/parser.py</i>"]
    end

    subgraph S2["STAGE 2: NORMALIZER"]
        norm["• Expand surface shortcuts<br/>• basar(X) → food_type(X, basar)<br/>• issur(A,F) → forbidden(W,A,F,ctx)<br/><i>mistaber/dsl/compiler/normalizer.py</i>"]
    end

    subgraph S3["STAGE 3: TYPE CHECKER"]
        tc["• Validate predicates<br/>• Check arity, sorts, enums<br/>• Enforce @makor<br/><i>mistaber/dsl/compiler/type_checker.py</i>"]
    end

    subgraph S4["STAGE 4: EMITTER"]
        emit["• Generate ASP code<br/>• Add metadata (rule IDs, makor)<br/>• Add xclingo2 trace<br/><i>mistaber/dsl/compiler/emitter.py</i>"]
    end

    subgraph OUTPUT["ASP OUTPUT"]
        asp["% World: base<br/>food_type(chicken, basar).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    INPUT --> S1
    S1 -->|ParseResult| S2
    S2 -->|Normalized| S3
    S3 -->|Validated| S4
    S4 --> OUTPUT

Stage 1: Parser

Location: mistaber/dsl/compiler/parser.py

The parser uses Lark with an LALR(1) grammar to tokenize and parse HLL source code.

Grammar Location

mistaber/dsl/grammar.lark - The complete HLL grammar specification.

AST Types

@dataclass
class Atom:
    """Represents a predicate with arguments."""
    predicate: str
    args: List[str]
    negated: bool = False

@dataclass
class Fact:
    """Unconditional statement: head."""
    predicate: str
    args: List[str]

@dataclass
class Rule:
    """Conditional statement: head :- body."""
    head: Atom
    body: List[Atom]

@dataclass
class ParseResult:
    """Complete parse output."""
    facts: List[Fact]
    rules: List[Rule]
    # Rule metadata directives
    world: Optional[str]                          # @world directive
    rule_id: Optional[str]                        # @rule directive
    sources: Optional[List[Tuple[str, str]]]      # @makor citations
    madrega: Optional[str]                        # @madrega level
    # Vocabulary registration directives
    sorts: List[SortDecl]                         # @sort directives
    subsorts: List[SubsortDecl]                   # @subsort directives
    enums: List[EnumDecl]                         # @enum directives
    declarations: List[PredicateDecl]             # @declare directives
    # World definition directives
    world_defs: List[WorldDef]                    # @world_def directives
    endorsements: List[Endorsement]               # @endorses directives
    interprets: List[InterpretDecl]               # @interprets directives
    interpretations: List[InterpretationDecl]     # @interpretation directives
    # Output and documentation directives
    shows: List[ShowDirective]                    # @show directives
    encoding_notes: List[str]                     # @encoding_note directives
    constraints: List[ConstraintDecl]             # @constraint directives
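To make the AST shapes concrete, here is a self-contained toy sketch that builds a `Fact` from a single ground fact. This is not the real Lark-based parser; `parse_fact` and `FACT_RE` are illustrative names that do not appear in the codebase:

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Fact:
    """Unconditional statement: head."""
    predicate: str
    args: List[str]

# Toy pattern for ground facts like food_type(chicken, basar).
FACT_RE = re.compile(r"^([a-z_][a-z0-9_]*)\((.*)\)\.$")

def parse_fact(line: str) -> Fact:
    """Parse a single ground fact into a Fact node (sketch only)."""
    match = FACT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a ground fact: {line!r}")
    predicate, arg_str = match.groups()
    args = [a.strip() for a in arg_str.split(",")] if arg_str else []
    return Fact(predicate, args)

print(parse_fact("food_type(chicken, basar)."))
# Fact(predicate='food_type', args=['chicken', 'basar'])
```

The real parser handles variables, negation, rules, and directives via the grammar in `mistaber/dsl/grammar.lark`; this sketch only illustrates the `Fact` shape above.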

Directive Processing

| Directive | Example | Stored In |
| --- | --- | --- |
| @world(id) | @world(base) | ParseResult.world |
| @rule(id) | @rule(r_basar_bechalav) | ParseResult.rule_id |
| @makor([...]) | @makor([sa("YD:87:1")]) | ParseResult.sources |
| @madrega(level) | @madrega(d_oraita) | ParseResult.madrega |
| @sort(name, domain, desc) | @sort(food, physical, "...") | ParseResult.sorts |
| @subsort(child, parent) | @subsort(beheima, food) | ParseResult.subsorts |
| @enum(sort, [members]) | @enum(food_category, [...]) | ParseResult.enums |
| @declare(name, [sorts], ...) | @declare(is_food, [food], ...) | ParseResult.declarations |
| @world_def(name, parent) | @world_def(mechaber, base) | ParseResult.world_defs |
| @endorses(world, prop, ...) | @endorses(gra, issur(...), ...) | ParseResult.endorsements |
| @interprets(comm, auth) | @interprets(shach, mechaber) | ParseResult.interprets |
| @interpretation(comm, rule, ...) | @interpretation(shach, r_id, ...) | ParseResult.interpretations |
| @show(pred/arity) | @show(holds/2) | ParseResult.shows |
| @encoding_note("text") | @encoding_note("...") | ParseResult.encoding_notes |
| @constraint(name, cat, desc) | @constraint(no_dual, ...) | ParseResult.constraints |

Error Handling

class ParseError(Exception):
    """Raised for syntax errors, unbalanced parens, invalid tokens."""
    pass

Common parse errors:

  - Missing period at the end of a fact or rule
  - Unbalanced parentheses
  - Invalid identifier (uppercase where a constant is expected)
  - Unrecognized directive

Stage 2: Normalizer

Location: mistaber/dsl/compiler/normalizer.py

The normalizer expands Hebrew-friendly surface syntax to canonical predicates. This stage must run before type checking because the registry only contains canonical predicates.

Surface Syntax Expansions

Food Type Shortcuts

| Surface | Canonical |
| --- | --- |
| basar(X) | food_type(X, basar) |
| chalav(X) | food_type(X, chalav) |
| parve(X) | food_type(X, parve) |
| beheima(X) | food_type(X, beheima) |
| chaya(X) | food_type(X, chaya) |
| of(X) | food_type(X, of) |
| dag(X) | food_type(X, dag) |
| mashkeh(X) | food_type(X, mashkeh) |
| tavlin(X) | food_type(X, tavlin) |

Status Shortcuts

| Surface | Canonical |
| --- | --- |
| issur(action, food) | forbidden(W, action, food, ctx_normal) |
| mutar(action, food) | permitted(W, action, food, ctx_normal) |

Here W is taken from the @world directive, or remains a free variable if no @world is in effect.
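The expansions above amount to a small rewrite table. The following is a minimal sketch of the idea, not the actual implementation in mistaber/dsl/compiler/normalizer.py; `expand_atom` and `FOOD_SHORTCUTS` are illustrative names:

```python
FOOD_SHORTCUTS = {"basar", "chalav", "parve", "beheima",
                  "chaya", "of", "dag", "mashkeh", "tavlin"}

def expand_atom(predicate: str, args: list, world: str = "W"):
    """Rewrite one surface atom to its canonical form (sketch)."""
    if predicate in FOOD_SHORTCUTS:
        if len(args) != 1:
            raise ValueError(f"{predicate} expects 1 arg, got {len(args)}")
        # basar(X) -> food_type(X, basar), etc.
        return ("food_type", [args[0], predicate])
    if predicate in ("issur", "mutar"):
        if len(args) != 2:
            raise ValueError(f"{predicate} expects 2 args, got {len(args)}")
        canonical = "forbidden" if predicate == "issur" else "permitted"
        # Inject the current world and the default context.
        return (canonical, [world, args[0], args[1], "ctx_normal"])
    return (predicate, args)  # already canonical, pass through
```

For example, `expand_atom("issur", ["achiila", "gevinas_akum"], world="rema")` yields `("forbidden", ["rema", "achiila", "gevinas_akum", "ctx_normal"])`, matching the World Injection behavior described below.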

World Injection

When a @world directive is present, the normalizer injects the world value into expanded predicates:

@world(rema)
issur(achiila, gevinas_akum).

% Normalizes to:
forbidden(rema, achiila, gevinas_akum, ctx_normal).

Error Handling

class NormalizationError(Exception):
    """Raised for invalid surface syntax usage."""
    pass

Common normalization errors:

  - Wrong arity: basar(X, Y) (expects 1 argument)
  - Wrong arity: issur(X) (expects 2 arguments)

Stage 3: Type Checker

Location: mistaber/dsl/compiler/type_checker.py

The type checker validates the normalized AST against the predicate registry (mistaber/dsl/vocabulary/base.yaml).

Validation Checks

Errors (stop compilation)

| Check | Description |
| --- | --- |
| Missing hebrew/english in @declare | Required fields for predicate registration |
| Undefined sort in @declare signature | Sort must exist in the registry or a local @sort |
| Arity conflict with existing predicate | Same predicate name, different arity already registered |
| Invalid @sort domain | Must be: physical, normative, classification, temporal, meta |
| Invalid @interpretation action | Must be: adds_condition, removes_condition, restricts_scope, expands_scope |
| Self-referential @world_def parent | A world cannot be its own parent |
| Invalid @madrega value | Must match the registry's madrega_type enum |

Warnings (compilation continues)

| Check | Description |
| --- | --- |
| Undeclared predicate | Predicate not in the registry or a local @declare |
| Arity mismatch | Argument count differs from the declaration |
| Invalid enum value | Constant not in the enumeration |
| OWA predicate with negation | May cause unsafe permissiveness |
| Normative rule without @makor | Source citations expected |

Normative Predicates

These predicates require a @makor citation:

  - forbidden
  - permitted
  - safek

Type Check Result

@dataclass
class TypeCheckError:
    message: str
    severity: Literal["error", "warning"]
    predicate: str = ""
    line: int = 0

Enum Value Checking

The type checker dynamically reads enum sorts from the registry:

# base.yaml
enumerations:
  madrega_type:
    - d_oraita
    - d_rabanan
    - minhag
    - chumra

If @madrega(invalid_value) is used, an error is generated.
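A sketch of that check, with the registry enums inlined as a plain dict instead of being loaded from base.yaml; `check_enum_value` and `REGISTRY_ENUMS` are illustrative names, not the type checker's actual API:

```python
from typing import List

# Inlined stand-in for the `enumerations` section of base.yaml.
REGISTRY_ENUMS = {
    "madrega_type": {"d_oraita", "d_rabanan", "minhag", "chumra"},
}

def check_enum_value(enum_name: str, value: str) -> List[str]:
    """Return a list of error messages (empty if the value is valid)."""
    allowed = REGISTRY_ENUMS.get(enum_name)
    if allowed is None:
        return [f"unknown enumeration: {enum_name}"]
    if value not in allowed:
        return [f"invalid {enum_name} value: {value} "
                f"(expected one of {sorted(allowed)})"]
    return []
```

Because the allowed values are read from the registry rather than hard-coded, adding a new madrega level to base.yaml is picked up by the type checker without code changes.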

Stage 4: Emitter

Location: mistaber/dsl/compiler/emitter.py

The emitter generates ASP code from the validated AST, adding metadata and safety measures.

Output Structure

% World: base
% Rule ID: r_basar_bechalav_issur

rule(r_basar_bechalav_issur).
scope(r_basar_bechalav_issur, base).
makor(r_basar_bechalav_issur, sa("YD:87:1")).
madrega(r_basar_bechalav_issur, d_oraita).

food(beef).
food_type(beef, beheima).

%!trace_rule {"r_basar_bechalav_issur: action on food is forbidden"}
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
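The metadata atoms at the top of that output follow mechanically from the parsed directives. A sketch of how they can be generated (`emit_metadata` is an illustrative name, not the emitter's actual API):

```python
from typing import List, Optional, Tuple

def emit_metadata(rule_id: str, world: str,
                  sources: List[Tuple[str, str]],
                  madrega: Optional[str]) -> str:
    """Render rule/scope/makor/madrega atoms for one rule (sketch)."""
    lines = [f"rule({rule_id}).", f"scope({rule_id}, {world})."]
    for kind, ref in sources:  # e.g. ("sa", "YD:87:1") from @makor
        lines.append(f'makor({rule_id}, {kind}("{ref}")).')
    if madrega is not None:
        lines.append(f"madrega({rule_id}, {madrega}).")
    return "\n".join(lines)
```

In the real emitter these arguments would first pass through the sanitization described below.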

xclingo2 Trace Annotations

For explainability, the emitter adds trace annotations that xclingo2 uses to generate human-readable explanations:

%!trace_rule {"r_basar_bechalav: action on food is forbidden"}

Security: Injection Prevention

The emitter sanitizes all arguments to prevent ASP code injection:

DANGEROUS_CHARS = {';', '\n', '\r', '%', '.', ':-', '#'}

def _sanitize_argument(arg: str) -> str:
    """Reject arguments containing dangerous characters."""
    for char in [';', '\n', '\r', '%', '#']:
        if char in arg:
            raise EmitterError(f"Invalid character '{char}' in argument")
    # ':-' would open a rule body; a bare '.' would terminate the statement early
    if ':-' in arg or arg == '.':
        raise EmitterError(f"Invalid token in argument: {arg!r}")
    return arg

This prevents attacks like:

% Attempted injection
is_food("evil). secret(leak".  % Would inject malicious code

Identifier Validation

Identifiers (predicates, worlds, rule IDs) must:

  - Start with a lowercase letter or underscore
  - Not contain: ;, \n, \r, %, #, ., :, (, )
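These rules can be expressed as a single regular expression. The pattern below is a sketch consistent with the constraints above (and with Clingo's lowercase constant convention), not the emitter's exact check; `IDENTIFIER_RE` and `is_valid_identifier` are illustrative names:

```python
import re

# Lowercase letter or underscore, then lowercase letters, digits, or
# underscores only; this automatically excludes ; \n \r % # . : ( )
# and whitespace. Assumes identifiers stay lowercase throughout.
IDENTIFIER_RE = re.compile(r"^[a-z_][a-z0-9_]*$")

def is_valid_identifier(name: str) -> bool:
    return IDENTIFIER_RE.match(name) is not None
```

An allow-list pattern like this is safer than enumerating forbidden characters, since anything unanticipated is rejected by default.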

Facade: compile_hll()

Location: mistaber/dsl/compiler/compiler.py

The compile_hll() function orchestrates the per-file pipeline:

def compile_hll(source: str, return_warnings: bool = False) -> str:
    """
    Pipeline: Parse → Normalize → TypeCheck → Emit

    Returns the emitted ASP source; when return_warnings=True,
    returns an (asp, warnings) tuple instead.

    Raises:
        CompileError: On parse, normalization, type, or emission errors
    """
    # 1. Parse
    parser = HLLParser()
    ast = parser.parse(source)

    # 2. Normalize (MUST be before type checking)
    normalizer = Normalizer()
    normalized_ast = normalizer.normalize(ast)

    # 3. Type check
    checker = TypeChecker()
    errors = checker.check(normalized_ast)

    # 4. Handle errors/warnings
    hard_errors = [e for e in errors if e.severity == "error"]
    if hard_errors:
        raise CompileError(...)
    warnings = [e for e in errors if e.severity == "warning"]

    # 5. Emit
    emitter = ASPEmitter()
    asp = emitter.emit(normalized_ast)

    return (asp, warnings) if return_warnings else asp

Pipeline Order Rationale

Normalize → TypeCheck (not TypeCheck → Normalize)

The registry contains only canonical predicates (food_type, not basar). Surface syntax must be expanded before validation can occur.

Usage Example

from mistaber.dsl.compiler import compile_hll

source = """
@world(base)
@rule(r_test)
@makor([sa("YD:87:1")])
@madrega(d_oraita)

basar(chicken).
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
"""

asp_code = compile_hll(source)
print(asp_code)

Compile with Warnings

asp_code, warnings = compile_hll(source, return_warnings=True)
for w in warnings:
    print(f"Warning: {w.message}")

Build Pipeline

The build pipeline (python -m mistaber.dsl.build) orchestrates compilation of all .hll files across the project. It is the standard way to compile HLL source — compile_hll() is the per-file API it uses internally.

flowchart TB
    subgraph COLLECT["PHASE 1: COLLECT & PARSE"]
        collect["• Find .hll files by layer order<br/>• Parse each to AST (ParseResult)<br/>• Extract directives into BuildManifest"]
    end

    subgraph MERGE["PHASE 2: REGISTRY MERGE"]
        merge["• YAMLMerger merges directives into base.yaml<br/>• @declare → predicates section<br/>• @sort → sorts section<br/>• @enum → enums section<br/>• @world_def, @interprets, @interpretation → worlds/interpretations"]
    end

    subgraph COMPILE["PHASE 3: COMPILE"]
        comp["• For each .hll: normalize → type_check → emit<br/>• Produces .lp files in ontology/<br/>• .lp passthrough files copied verbatim"]
    end

    subgraph META["PHASE 4: GENERATE META"]
        meta["• Rebuild meta.lp from updated base.yaml<br/>• Sort/predicate/enum membership atoms"]
    end

    subgraph DSL["dsl/"]
        schema["schema/*.hll"]
        base["base/*.hll"]
        worlds["worlds/*.hll"]
        engine["engine/*.hll + *.lp"]
        interp["interpretations/*.hll"]
        corpus["corpus/*.hll"]
    end

    subgraph OUT["ontology/"]
        out_schema["schema/*.lp"]
        out_base["base/*.lp"]
        out_worlds["worlds/*.lp"]
        out_engine["engine/*.lp"]
        out_interp["interpretations/*.lp"]
        out_corpus["corpus/*.lp"]
        out_meta["meta.lp"]
        out_yaml["(base.yaml updated)"]
    end

    DSL --> COLLECT
    COLLECT --> MERGE
    MERGE --> COMPILE
    COMPILE --> META
    META --> OUT

Layer Order

Files are processed in dependency order — each layer may reference sorts and predicates from earlier layers:

  1. schema — Sort definitions, constraints, disjointness
  2. base — Core facts (status, substance, issur_types, madrega, shiur)
  3. worlds — Kripke worlds (base, mechaber, rema, gra, ashk_ah, sefardi_yo)
  4. engine — Reasoning engine (safek, priorities, interpretations, policy)
  5. interpretations — Commentator rules (Shach, Taz)
  6. corpus — Encoded seifim (yd_87/, yd_89/)
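A sketch of the collection step under these assumptions; `LAYERS` and `collect_hll_files` are illustrative names, and sorting within a layer is assumed for determinism:

```python
from pathlib import Path
from typing import List

# Layer directories in dependency order, as listed above.
LAYERS = ["schema", "base", "worlds", "engine", "interpretations", "corpus"]

def collect_hll_files(dsl_root: Path) -> List[Path]:
    """Gather .hll files layer by layer, in dependency order."""
    files: List[Path] = []
    for layer in LAYERS:
        layer_dir = dsl_root / layer
        if layer_dir.is_dir():
            # Sort within each layer so builds are deterministic.
            files.extend(sorted(layer_dir.glob("*.hll")))
    return files
```

Because every schema file precedes every corpus file, a corpus rule can safely reference sorts and predicates declared upstream.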

Passthrough Files

.lp files in dsl/ (e.g., engine/preferences.lp for asprin directives) are copied verbatim to ontology/. A naming conflict (both foo.hll and foo.lp in the same directory) is a build error.
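A sketch of the conflict check (`find_passthrough_conflicts` is an illustrative name, not the build script's API):

```python
from pathlib import Path
from typing import List

def find_passthrough_conflicts(directory: Path) -> List[str]:
    """Return stems that have both a .hll and a .lp file in `directory`."""
    hll_stems = {p.stem for p in directory.glob("*.hll")}
    lp_stems = {p.stem for p in directory.glob("*.lp")}
    # Any overlap is ambiguous: would foo.lp be compiled or copied?
    return sorted(hll_stems & lp_stems)
```

Failing the build on any overlap is the safe choice, since silently preferring one file over the other would hide the ambiguity.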

Atomic Writes

All outputs are written to disk only after all four phases succeed. If any phase fails, no files are modified, which prevents a partial or inconsistent state in ontology/.
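One standard way to get this all-or-nothing behavior is to stage every output to a temporary file first and only rename the staged files into place once everything has succeeded (os.replace is an atomic rename per file on both POSIX and Windows). This is a sketch of the pattern, not necessarily how the build script implements it:

```python
import os
import tempfile
from typing import Dict, List, Tuple

def write_all_or_nothing(outputs: Dict[str, str]) -> None:
    """Stage every file, then move all of them into place (sketch)."""
    staged: List[Tuple[str, str]] = []
    try:
        for path, content in outputs.items():
            directory = os.path.dirname(path) or "."
            fd, tmp = tempfile.mkstemp(dir=directory, suffix=".staged")
            with os.fdopen(fd, "w") as handle:
                handle.write(content)
            staged.append((tmp, path))
    except Exception:
        # Roll back: no destination files have been touched yet.
        for tmp, _ in staged:
            os.unlink(tmp)
        raise
    for tmp, path in staged:
        os.replace(tmp, path)  # per-file atomic rename
```

This makes each individual write atomic and defers all renames until every write has succeeded; a crash during the final rename loop could still leave a partial set, which is usually an acceptable window for a local build tool.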

BuildManifest

The BuildManifest aggregates directives from all parsed .hll files:

| Directive | Aggregated Into |
| --- | --- |
| @sort | BuildManifest.sorts |
| @subsort | BuildManifest.subsorts |
| @enum | BuildManifest.enums |
| @declare | BuildManifest.declarations |
| @world_def | BuildManifest.world_defs |
| @interprets | BuildManifest.interprets |
| @interpretation | BuildManifest.interpretations |

The YAMLMerger then merges these into base.yaml, which generate_meta.py uses to rebuild meta.lp.
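The aggregation itself is just list concatenation across files. A minimal sketch of the shape (field names follow the table above; `absorb` is an illustrative method name, and the real class may carry more state):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BuildManifest:
    """Directives aggregated across all parsed .hll files (sketch)."""
    sorts: List = field(default_factory=list)
    subsorts: List = field(default_factory=list)
    enums: List = field(default_factory=list)
    declarations: List = field(default_factory=list)
    world_defs: List = field(default_factory=list)
    interprets: List = field(default_factory=list)
    interpretations: List = field(default_factory=list)

    def absorb(self, parse_result) -> None:
        # Append this file's directives to the project-wide lists.
        for name in ("sorts", "subsorts", "enums", "declarations",
                     "world_defs", "interprets", "interpretations"):
            getattr(self, name).extend(getattr(parse_result, name))
```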

Error Handling Summary

| Stage | Exception | Example |
| --- | --- | --- |
| Parser | ParseError | Syntax error at line 5 |
| Normalizer | NormalizationError | issur() expects 2 args |
| Type Checker | TypeCheckError | Undeclared predicate |
| Emitter | EmitterError | Invalid character in argument |
| Facade | CompileError | Wraps all of the above |
| Build Pipeline | BuildError | Layer ordering violation, naming conflict |

Testing

Tests are located in tests/dsl/:

| File | Coverage |
| --- | --- |
| test_parser.py | Parser and AST construction |
| test_normalizer.py | Surface syntax expansion |
| test_type_checker.py | Registry validation |
| test_emitter.py | ASP code generation |
| test_compiler.py | End-to-end pipeline |

Run all DSL tests:

pytest tests/dsl/ -v