HLL Compiler Pipeline

Overview

The HLL compiler transforms human-readable Halachic Logic Language into Answer Set Programming (ASP) code for the Clingo solver. This is the compilation layer of the Mistaber system, following a classic four-stage architecture.

flowchart TB
    subgraph INPUT["HLL SOURCE CODE"]
        hll["@world(base)<br/>basar(chicken).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    subgraph S1["STAGE 1: PARSER"]
        parser["• Tokenize HLL source<br/>• Build parse tree (Lark LALR)<br/>• Transform to AST<br/><i>mistaber/dsl/compiler/parser.py</i>"]
    end

    subgraph S2["STAGE 2: NORMALIZER"]
        norm["• Expand surface shortcuts<br/>• basar(X) → food_type(X, basar)<br/>• issur(A,F) → forbidden(W,A,F,ctx)<br/><i>mistaber/dsl/compiler/normalizer.py</i>"]
    end

    subgraph S3["STAGE 3: TYPE CHECKER"]
        tc["• Validate predicates<br/>• Check arity, sorts, enums<br/>• Enforce @makor<br/><i>mistaber/dsl/compiler/type_checker.py</i>"]
    end

    subgraph S4["STAGE 4: EMITTER"]
        emit["• Generate ASP code<br/>• Add metadata (rule IDs, makor)<br/>• Add xclingo2 trace<br/><i>mistaber/dsl/compiler/emitter.py</i>"]
    end

    subgraph OUTPUT["ASP OUTPUT"]
        asp["% World: base<br/>food_type(chicken, basar).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    INPUT --> S1
    S1 -->|ParseResult| S2
    S2 -->|Normalized| S3
    S3 -->|Validated| S4
    S4 --> OUTPUT

Stage 1: Parser

Location: mistaber/dsl/compiler/parser.py

The parser uses Lark with an LALR(1) grammar to tokenize and parse HLL source code.

Grammar Location

mistaber/dsl/grammar.lark - The complete HLL grammar specification.
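
The parser builds its Lark instance from this file. Below is a minimal sketch of what that setup could look like, assuming the grammar uses Lark's default start rule; the real parser.py also wires in a transformer to build the AST types shown below.

from lark import Lark

# Hypothetical setup: load the grammar and request Lark's LALR(1) parser.
with open("mistaber/dsl/grammar.lark") as f:
    hll_grammar = Lark(f.read(), parser="lalr")

tree = hll_grammar.parse("basar(chicken).")   # illustrative: returns a Lark parse tree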

AST Types

@dataclass
class Atom:
    """Represents a predicate with arguments."""
    predicate: str
    args: List[str]
    negated: bool = False

@dataclass
class Fact:
    """Unconditional statement: head."""
    predicate: str
    args: List[str]

@dataclass
class Rule:
    """Conditional statement: head :- body."""
    head: Atom
    body: List[Atom]

@dataclass
class ParseResult:
    """Complete parse output."""
    facts: List[Fact]
    rules: List[Rule]
    world: Optional[str]        # @world directive
    rule_id: Optional[str]      # @rule directive
    sources: Optional[List[Tuple[str, str]]]  # @makor citations
    madrega: Optional[str]      # @madrega level
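
A brief usage sketch of the parser and the resulting ParseResult, assuming HLLParser is exposed by the module above; the field access mirrors the dataclasses, not the verified implementation.

from mistaber.dsl.compiler.parser import HLLParser

result = HLLParser().parse('@world(base)\nbasar(chicken).')
print(result.world)                # "base"
print(result.facts[0].predicate)   # "basar" (expanded later by the normalizer)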

Directive Processing

| Directive | Example | Stored In |
| --- | --- | --- |
| @world(id) | @world(base) | ParseResult.world |
| @rule(id) | @rule(r_basar_bechalav) | ParseResult.rule_id |
| @makor([...]) | @makor([sa("YD:87:1")]) | ParseResult.sources |
| @madrega(level) | @madrega(d_oraita) | ParseResult.madrega |

Error Handling

class ParseError(Exception):
    """Raised for syntax errors, unbalanced parens, invalid tokens."""
    pass

Common parse errors:

- Missing period at end of fact/rule
- Unbalanced parentheses
- Invalid identifier (uppercase where constant expected)
- Unrecognized directive
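
For example, a missing trailing period surfaces as a ParseError. A hedged sketch; the exact message text is implementation-defined.

from mistaber.dsl.compiler.parser import HLLParser, ParseError

try:
    HLLParser().parse("basar(chicken)")   # missing final period
except ParseError as exc:
    print(f"Parse failed: {exc}")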

Stage 2: Normalizer

Location: mistaber/dsl/compiler/normalizer.py

The normalizer expands Hebrew-friendly surface syntax to canonical predicates. This stage must run before type checking because the registry only contains canonical predicates.

Surface Syntax Expansions

Food Type Shortcuts

| Surface | Canonical |
| --- | --- |
| basar(X) | food_type(X, basar) |
| chalav(X) | food_type(X, chalav) |
| parve(X) | food_type(X, parve) |
| beheima(X) | food_type(X, beheima) |
| chaya(X) | food_type(X, chaya) |
| of(X) | food_type(X, of) |
| dag(X) | food_type(X, dag) |
| mashkeh(X) | food_type(X, mashkeh) |
| tavlin(X) | food_type(X, tavlin) |

Status Shortcuts

| Surface | Canonical |
| --- | --- |
| issur(action, food) | forbidden(W, action, food, ctx_normal) |
| mutar(action, food) | permitted(W, action, food, ctx_normal) |

W is taken from the @world directive when one is present; otherwise it remains a variable.
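
A minimal sketch of how these expansions could be implemented over the AST types from Stage 1. FOOD_TYPE_SHORTCUTS, STATUS_SHORTCUTS, and _expand_atom are illustrative names, not the normalizer's actual internals.

FOOD_TYPE_SHORTCUTS = {
    "basar", "chalav", "parve", "beheima", "chaya",
    "of", "dag", "mashkeh", "tavlin",
}
STATUS_SHORTCUTS = {"issur": "forbidden", "mutar": "permitted"}

def _expand_atom(atom: Atom, world: str = "W") -> Atom:
    """Rewrite one surface atom into its canonical form."""
    if atom.predicate in FOOD_TYPE_SHORTCUTS and len(atom.args) == 1:
        # basar(X) -> food_type(X, basar)
        return Atom(predicate="food_type", args=[atom.args[0], atom.predicate])
    if atom.predicate in STATUS_SHORTCUTS and len(atom.args) == 2:
        # issur(A, F) -> forbidden(W, A, F, ctx_normal)
        return Atom(predicate=STATUS_SHORTCUTS[atom.predicate],
                    args=[world, *atom.args, "ctx_normal"])
    return atom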

World Injection

When a @world directive is present, the normalizer injects the world value into expanded predicates:

@world(rema)
issur(achiila, gevinas_akum).

% Normalizes to:
forbidden(rema, achiila, gevinas_akum, ctx_normal).

Error Handling

class NormalizationError(Exception):
    """Raised for invalid surface syntax usage."""
    pass

Common normalization errors:

- Wrong arity: basar(X, Y) (expects 1 arg)
- Wrong arity: issur(X) (expects 2 args)
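
A hedged end-to-end example of triggering such an error, assuming Normalizer and NormalizationError are importable from the module shown above.

from mistaber.dsl.compiler.parser import HLLParser
from mistaber.dsl.compiler.normalizer import Normalizer, NormalizationError

try:
    ast = HLLParser().parse("basar(chicken, beef).")   # basar expects 1 argument
    Normalizer().normalize(ast)
except NormalizationError as exc:
    print(f"Normalization failed: {exc}")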

Stage 3: Type Checker

Location: mistaber/dsl/compiler/type_checker.py

The type checker validates the normalized AST against the predicate registry (mistaber/dsl/vocabulary/base.yaml).

Validation Checks

  1. Predicate Existence: Is the predicate declared in the registry?
  2. Arity Matching: Does argument count match signature?
  3. Sort Enforcement: Are enum values valid for their position?
  4. Madrega Validation: Is @madrega value valid (d_oraita, d_rabanan, etc.)?
  5. Makor Requirement: Do normative rules have @makor citations?
  6. OWA Negation Warning: Is negation used on open-world predicates?
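
A minimal sketch of checks 1 and 2 (predicate existence and arity) against a registry mapping. The registry shape and the check_atom helper are assumptions; TypeCheckError is the result type shown below.

def check_atom(atom: Atom, registry: dict) -> list:
    """Return TypeCheckError entries for a single atom (existence and arity only)."""
    errors = []
    sig = registry.get(atom.predicate)
    if sig is None:
        errors.append(TypeCheckError(
            message=f"Undeclared predicate '{atom.predicate}'",
            severity="error",
            predicate=atom.predicate,
        ))
    elif len(atom.args) != sig["arity"]:
        errors.append(TypeCheckError(
            message=f"{atom.predicate} expects {sig['arity']} args, got {len(atom.args)}",
            severity="error",
            predicate=atom.predicate,
        ))
    return errors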

Normative Predicates

These predicates require a @makor citation:

- forbidden
- permitted
- safek

Error vs. Warning

| Severity | Stops Compilation | Example |
| --- | --- | --- |
| Error | Yes | Undeclared predicate, arity mismatch |
| Warning | No | OWA predicate negation |

Type Check Result

@dataclass
class TypeCheckError:
    message: str
    severity: Literal["error", "warning"]
    predicate: str = ""
    line: int = 0

Enum Value Checking

The type checker dynamically reads enum sorts from the registry:

# base.yaml
enumerations:
  madrega_type:
    - d_oraita
    - d_rabanan
    - minhag
    - chumra

If @madrega(invalid_value) is used, an error is generated.
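
A hedged sketch of that lookup; the helper name is an assumption, and the YAML layout mirrors the excerpt above.

import yaml

def madrega_is_valid(value: str, registry_path: str = "mistaber/dsl/vocabulary/base.yaml") -> bool:
    """Check a @madrega value against the enumeration declared in the registry."""
    with open(registry_path) as f:
        registry = yaml.safe_load(f)
    return value in registry["enumerations"]["madrega_type"]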

Stage 4: Emitter

Location: mistaber/dsl/compiler/emitter.py

The emitter generates ASP code from the validated AST, adding metadata and safety measures.

Output Structure

% World: base
% Rule ID: r_basar_bechalav_issur

rule(r_basar_bechalav_issur).
scope(r_basar_bechalav_issur, base).
makor(r_basar_bechalav_issur, sa("YD:87:1")).
madrega(r_basar_bechalav_issur, d_oraita).

food(beef).
food_type(beef, beheima).

%!trace_rule {"r_basar_bechalav_issur: action on food is forbidden"}
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
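
A minimal sketch of how a single fact line might be rendered. _emit_fact is an illustrative name; _sanitize_argument is described under Security below.

def _emit_fact(fact: Fact) -> str:
    """Render one Fact as an ASP line, sanitizing every argument."""
    args = ", ".join(_sanitize_argument(a) for a in fact.args)
    return f"{fact.predicate}({args})."

# _emit_fact(Fact(predicate="food_type", args=["beef", "beheima"]))
# -> "food_type(beef, beheima)."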

xclingo2 Trace Annotations

For explainability, the emitter adds trace annotations that xclingo2 uses to generate human-readable explanations:

%!trace_rule {"r_basar_bechalav: action on food is forbidden"}
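
A small illustrative helper for building such an annotation; the function name is hypothetical.

def _trace_annotation(rule_id: str, label: str) -> str:
    """Build an xclingo2 trace_rule comment for the given rule."""
    return f'%!trace_rule {{"{rule_id}: {label}"}}'

# _trace_annotation("r_basar_bechalav", "action on food is forbidden")
# -> '%!trace_rule {"r_basar_bechalav: action on food is forbidden"}'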

Security: Injection Prevention

The emitter sanitizes all arguments to prevent ASP code injection:

DANGEROUS_CHARS = {';', '\n', '\r', '%', '.', ':-', '#'}

def _sanitize_argument(arg: str) -> str:
    """Reject arguments containing dangerous characters."""
    for char in (';', '\n', '\r', '%', '#'):
        if char in arg:
            raise EmitterError(f"Invalid character '{char}'")
    # Additional checks reject ':-' and a standalone '.', which could
    # terminate the current statement and inject a new one.
    return arg

This prevents attacks like:

% Attempted injection
is_food("evil). secret(leak").  % Would inject malicious code

Identifier Validation

Identifiers (predicates, worlds, rule IDs) must:

- Start with a lowercase letter or underscore
- Not contain: ;, \n, \r, %, #, ., :, (, )
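
A hedged sketch of that rule as a predicate function; the helper name and exact check are illustrative, not the emitter's actual code.

FORBIDDEN_IDENTIFIER_CHARS = set(";\n\r%#.:()")

def is_valid_identifier(name: str) -> bool:
    """True if name starts with a lowercase letter or underscore and
    contains none of the forbidden characters."""
    if not name:
        return False
    if not (name[0].islower() or name[0] == "_"):
        return False
    return not any(ch in FORBIDDEN_IDENTIFIER_CHARS for ch in name)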

Facade: compile_hll()

Location: mistaber/dsl/compiler/compiler.py

The compile_hll() function orchestrates the entire pipeline:

def compile_hll(source: str, return_warnings: bool = False):
    """
    Pipeline: Parse → Normalize → TypeCheck → Emit

    Returns the ASP source, or (asp, warnings) when return_warnings=True.

    Raises:
        CompileError: On parse, normalization, type, or emission errors
    """
    # 1. Parse
    parser = HLLParser()
    ast = parser.parse(source)

    # 2. Normalize (MUST be before type checking)
    normalizer = Normalizer()
    normalized_ast = normalizer.normalize(ast)

    # 3. Type check
    checker = TypeChecker()
    errors = checker.check(normalized_ast)

    # 4. Handle errors/warnings
    hard_errors = [e for e in errors if e.severity == "error"]
    if hard_errors:
        raise CompileError(...)
    warnings = [e for e in errors if e.severity == "warning"]

    # 5. Emit
    emitter = ASPEmitter()
    asp = emitter.emit(normalized_ast)

    return (asp, warnings) if return_warnings else asp

Pipeline Order Rationale

Normalize → TypeCheck (not TypeCheck → Normalize)

The registry contains only canonical predicates (food_type, not basar). Surface syntax must be expanded before validation can occur.

Usage Example

from mistaber.dsl.compiler import compile_hll

source = """
@world(base)
@rule(r_test)
@makor([sa("YD:87:1")])
@madrega(d_oraita)

basar(chicken).
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
"""

asp_code = compile_hll(source)
print(asp_code)

Compile with Warnings

asp_code, warnings = compile_hll(source, return_warnings=True)
for w in warnings:
    print(f"Warning: {w.message}")

Error Handling Summary

| Stage | Exception | Example |
| --- | --- | --- |
| Parser | ParseError | Syntax error at line 5 |
| Normalizer | NormalizationError | issur() expects 2 args |
| Type Checker | TypeCheckError | Undeclared predicate |
| Emitter | EmitterError | Invalid character in argument |
| Facade | CompileError | Wraps all of the above errors |

Testing

Tests are located in tests/dsl/:

| File | Coverage |
| --- | --- |
| test_parser.py | Parser and AST construction |
| test_normalizer.py | Surface syntax expansion |
| test_type_checker.py | Registry validation |
| test_emitter.py | ASP code generation |
| test_compiler.py | End-to-end pipeline |

Run all DSL tests:

pytest tests/dsl/ -v