HLL Compiler Pipeline¶
Overview¶
The HLL compiler transforms human-readable Halachic Logic Language into Answer Set Programming (ASP) code for the Clingo solver. The system operates at two levels:
- **Per-file compilation** — the `compile_hll()` function processes a single `.hll` file through four stages (parse → normalize → type check → emit).
- **Build pipeline** — the `python -m mistaber.dsl.build` command orchestrates compilation of all `.hll` files across the project, merges vocabulary into `base.yaml`, and generates `meta.lp`.
```mermaid
flowchart TB
    subgraph INPUT["HLL SOURCE CODE"]
        hll["@world(base)<br/>basar(chicken).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    subgraph S1["STAGE 1: PARSER"]
        parser["• Tokenize HLL source<br/>• Build parse tree (Lark LALR)<br/>• Transform to AST<br/><i>mistaber/dsl/compiler/parser.py</i>"]
    end

    subgraph S2["STAGE 2: NORMALIZER"]
        norm["• Expand surface shortcuts<br/>• basar(X) → food_type(X, basar)<br/>• issur(A,F) → forbidden(W,A,F,ctx)<br/><i>mistaber/dsl/compiler/normalizer.py</i>"]
    end

    subgraph S3["STAGE 3: TYPE CHECKER"]
        tc["• Validate predicates<br/>• Check arity, sorts, enums<br/>• Enforce @makor<br/><i>mistaber/dsl/compiler/type_checker.py</i>"]
    end

    subgraph S4["STAGE 4: EMITTER"]
        emit["• Generate ASP code<br/>• Add metadata (rule IDs, makor)<br/>• Add xclingo2 trace<br/><i>mistaber/dsl/compiler/emitter.py</i>"]
    end

    subgraph OUTPUT["ASP OUTPUT"]
        asp["% World: base<br/>food_type(chicken, basar).<br/>forbidden(W, achiila, M, ctx_normal) :- ..."]
    end

    INPUT --> S1
    S1 -->|ParseResult| S2
    S2 -->|Normalized| S3
    S3 -->|Validated| S4
    S4 --> OUTPUT
```
Stage 1: Parser¶
Location: `mistaber/dsl/compiler/parser.py`

The parser uses Lark with an LALR(1) grammar to tokenize and parse HLL source code.
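For illustration only, here is a regex-based sketch of parsing a single flat fact into a `Fact` node (the real parser uses the Lark grammar and also handles rules, directives, and nested terms):

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Fact:
    """Unconditional statement: head. (mirrors the AST type below)"""
    predicate: str
    args: List[str]

# Illustration only: the real parser uses the Lark grammar; this regex
# handles just the flat `pred(arg1, arg2).` fact shape.
FACT_RE = re.compile(r"\s*([a-z_]\w*)\(([^()]*)\)\.\s*")

def parse_fact(line: str) -> Fact:
    m = FACT_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a simple fact: {line!r}")
    predicate, raw_args = m.groups()
    args = [a.strip() for a in raw_args.split(",")] if raw_args.strip() else []
    return Fact(predicate, args)

print(parse_fact("basar(chicken)."))  # Fact(predicate='basar', args=['chicken'])
```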
Grammar Location¶
`mistaber/dsl/grammar.lark` - the complete HLL grammar specification.
AST Types¶
```python
@dataclass
class Atom:
    """Represents a predicate with arguments."""
    predicate: str
    args: List[str]
    negated: bool = False

@dataclass
class Fact:
    """Unconditional statement: head."""
    predicate: str
    args: List[str]

@dataclass
class Rule:
    """Conditional statement: head :- body."""
    head: Atom
    body: List[Atom]

@dataclass
class ParseResult:
    """Complete parse output."""
    facts: List[Fact]
    rules: List[Rule]

    # Rule metadata directives
    world: Optional[str]                        # @world directive
    rule_id: Optional[str]                      # @rule directive
    sources: Optional[List[Tuple[str, str]]]    # @makor citations
    madrega: Optional[str]                      # @madrega level

    # Vocabulary registration directives
    sorts: List[SortDecl]                       # @sort directives
    subsorts: List[SubsortDecl]                 # @subsort directives
    enums: List[EnumDecl]                       # @enum directives
    declarations: List[PredicateDecl]           # @declare directives

    # World definition directives
    world_defs: List[WorldDef]                  # @world_def directives
    endorsements: List[Endorsement]             # @endorses directives
    interprets: List[InterpretDecl]             # @interprets directives
    interpretations: List[InterpretationDecl]   # @interpretation directives

    # Output and documentation directives
    shows: List[ShowDirective]                  # @show directives
    encoding_notes: List[str]                   # @encoding_note directives
    constraints: List[ConstraintDecl]           # @constraint directives
```
Directive Processing¶
| Directive | Example | Stored In |
|---|---|---|
| `@world(id)` | `@world(base)` | `ParseResult.world` |
| `@rule(id)` | `@rule(r_basar_bechalav)` | `ParseResult.rule_id` |
| `@makor([...])` | `@makor([sa("YD:87:1")])` | `ParseResult.sources` |
| `@madrega(level)` | `@madrega(d_oraita)` | `ParseResult.madrega` |
| `@sort(name, domain, desc)` | `@sort(food, physical, "...")` | `ParseResult.sorts` |
| `@subsort(child, parent)` | `@subsort(beheima, food)` | `ParseResult.subsorts` |
| `@enum(sort, [members])` | `@enum(food_category, [...])` | `ParseResult.enums` |
| `@declare(name, [sorts], ...)` | `@declare(is_food, [food], ...)` | `ParseResult.declarations` |
| `@world_def(name, parent)` | `@world_def(mechaber, base)` | `ParseResult.world_defs` |
| `@endorses(world, prop, ...)` | `@endorses(gra, issur(...), ...)` | `ParseResult.endorsements` |
| `@interprets(comm, auth)` | `@interprets(shach, mechaber)` | `ParseResult.interprets` |
| `@interpretation(comm, rule, ...)` | `@interpretation(shach, r_id, ...)` | `ParseResult.interpretations` |
| `@show(pred/arity)` | `@show(holds/2)` | `ParseResult.shows` |
| `@encoding_note("text")` | `@encoding_note("...")` | `ParseResult.encoding_notes` |
| `@constraint(name, cat, desc)` | `@constraint(no_dual, ...)` | `ParseResult.constraints` |
Error Handling¶
```python
class ParseError(Exception):
    """Raised for syntax errors, unbalanced parens, invalid tokens."""
    pass
```

Common parse errors:

- Missing period at the end of a fact/rule
- Unbalanced parentheses
- Invalid identifier (uppercase where a constant is expected)
- Unrecognized directive
Stage 2: Normalizer¶
Location: `mistaber/dsl/compiler/normalizer.py`
The normalizer expands Hebrew-friendly surface syntax to canonical predicates. This stage must run before type checking because the registry only contains canonical predicates.
Surface Syntax Expansions¶
Food Type Shortcuts¶
| Surface | Canonical |
|---|---|
| `basar(X)` | `food_type(X, basar)` |
| `chalav(X)` | `food_type(X, chalav)` |
| `parve(X)` | `food_type(X, parve)` |
| `beheima(X)` | `food_type(X, beheima)` |
| `chaya(X)` | `food_type(X, chaya)` |
| `of(X)` | `food_type(X, of)` |
| `dag(X)` | `food_type(X, dag)` |
| `mashkeh(X)` | `food_type(X, mashkeh)` |
| `tavlin(X)` | `food_type(X, tavlin)` |
Status Shortcuts¶
| Surface | Canonical |
|---|---|
| `issur(action, food)` | `forbidden(W, action, food, ctx_normal)` |
| `mutar(action, food)` | `permitted(W, action, food, ctx_normal)` |

where `W` is taken from the `@world` directive or left as a variable.
World Injection¶
When a @world directive is present, the normalizer injects the world value into expanded predicates:
```
@world(rema)
issur(achiila, gevinas_akum).

% Normalizes to:
forbidden(rema, achiila, gevinas_akum, ctx_normal).
```
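The expansion step itself can be sketched as a small rewrite function (a simplified illustration, not the actual `Normalizer` code, which also handles `issur`/`mutar` and world injection):

```python
# Simplified illustration of the food-type expansion; the real
# Normalizer also handles issur/mutar and @world injection.
FOOD_SHORTCUTS = {"basar", "chalav", "parve", "beheima", "chaya",
                  "of", "dag", "mashkeh", "tavlin"}

def expand_food_shortcut(predicate: str, args: list) -> tuple:
    """Rewrite basar(X) -> food_type(X, basar); pass others through."""
    if predicate in FOOD_SHORTCUTS:
        if len(args) != 1:
            raise ValueError(f"{predicate} expects 1 arg, got {len(args)}")
        return ("food_type", [args[0], predicate])
    return (predicate, args)

print(expand_food_shortcut("basar", ["chicken"]))
# ('food_type', ['chicken', 'basar'])
```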
Error Handling¶
Common normalization errors:
- Wrong arity: `basar(X, Y)` (expects 1 arg)
- Wrong arity: `issur(X)` (expects 2 args)
Stage 3: Type Checker¶
Location: `mistaber/dsl/compiler/type_checker.py`

The type checker validates the normalized AST against the predicate registry (`mistaber/dsl/vocabulary/base.yaml`).
Validation Checks¶
Errors (stop compilation)¶
| Check | Description |
|---|---|
| Missing `hebrew`/`english` in `@declare` | Required fields for predicate registration |
| Undefined sort in `@declare` signature | Sort must exist in the registry or a local `@sort` |
| Arity conflict with existing predicate | Same predicate name, different arity already registered |
| Invalid `@sort` domain | Must be one of: `physical`, `normative`, `classification`, `temporal`, `meta` |
| Invalid `@interpretation` action | Must be one of: `adds_condition`, `removes_condition`, `restricts_scope`, `expands_scope` |
| Self-referential `@world_def` parent | A world cannot be its own parent |
| Invalid `@madrega` value | Must match the registry's `madrega_type` enum |
Warnings (compilation continues)¶
| Check | Description |
|---|---|
| Undeclared predicate | Predicate not in the registry or a local `@declare` |
| Arity mismatch | Argument count differs from the declaration |
| Invalid enum value | Constant not in the enumeration |
| OWA predicate with negation | May cause unsafe permissiveness |
| Normative rule without `@makor` | Source citations expected |
Normative Predicates¶
These predicates require a `@makor` citation:

- `forbidden`
- `permitted`
- `safek`
Type Check Result¶
```python
@dataclass
class TypeCheckError:
    message: str
    severity: Literal["error", "warning"]
    predicate: str = ""
    line: int = 0
```
Enum Value Checking¶
The type checker dynamically reads enum sorts from the registry. If `@madrega(invalid_value)` is used, an error is generated.
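A hedged sketch of what such a dynamic check might look like (the registry shape shown is illustrative, and enum members beyond `d_oraita` are assumptions, not `base.yaml`'s actual contents):

```python
# Illustrative registry shape; base.yaml's actual layout differs and
# the enum members beyond d_oraita are assumptions.
REGISTRY_ENUMS = {
    "madrega_type": {"d_oraita", "d_rabbanan"},
}

def check_enum(enum_name: str, value: str) -> list:
    """Return a list of error messages (empty when the value is valid
    or the enum is unknown to this toy registry)."""
    allowed = REGISTRY_ENUMS.get(enum_name)
    if allowed is not None and value not in allowed:
        return [f"'{value}' is not a member of enum {enum_name}"]
    return []

print(check_enum("madrega_type", "invalid_value"))
```

Because the allowed members are read from data rather than hard-coded, adding a member to the registry extends validation without touching the checker.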
Stage 4: Emitter¶
Location: `mistaber/dsl/compiler/emitter.py`
The emitter generates ASP code from the validated AST, adding metadata and safety measures.
Output Structure¶
```
% World: base
% Rule ID: r_basar_bechalav_issur

rule(r_basar_bechalav_issur).
scope(r_basar_bechalav_issur, base).
makor(r_basar_bechalav_issur, sa("YD:87:1")).
madrega(r_basar_bechalav_issur, d_oraita).

food(beef).
food_type(beef, beheima).

%!trace_rule {"r_basar_bechalav_issur: action on food is forbidden"}
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
```
xclingo2 Trace Annotations¶
For explainability, the emitter adds trace annotations (the `%!trace_rule` line shown in the output above) that xclingo2 uses to generate human-readable explanations.
Security: Injection Prevention¶
The emitter sanitizes all arguments to prevent ASP code injection:
```python
DANGEROUS_CHARS = {';', '\n', '\r', '%', '.', ':-', '#'}

def _sanitize_argument(arg: str) -> str:
    """Reject arguments containing dangerous characters."""
    for char in [';', '\n', '\r', '%', '#']:
        if char in arg:
            raise EmitterError(f"Invalid character '{char}'")
    # Additional checks for ':-' and standalone '.'
    ...
```
This prevents attacks in which a crafted argument smuggles extra statements into the output, e.g. an argument containing a `.` followed by a new fact, or a `%` that comments out the rest of a rule.
Identifier Validation¶
Identifiers (predicates, worlds, rule IDs) must:

- Start with a lowercase letter or underscore
- Not contain: `;`, `\n`, `\r`, `%`, `#`, `.`, `:`, `(`, `)`
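Both rules can be captured by a single regular expression; the sketch below is slightly stricter than the stated minimum (it allows only word characters after the first), which rules out every forbidden character in one pattern:

```python
import re

# Sketch of the identifier rule above; allowing only word characters
# after the first char is slightly stricter than the listed minimum,
# but rules out every forbidden character in one pattern.
IDENT_RE = re.compile(r"[a-z_][a-zA-Z0-9_]*")

def is_valid_identifier(name: str) -> bool:
    return IDENT_RE.fullmatch(name) is not None

print(is_valid_identifier("r_basar_bechalav"))  # True
print(is_valid_identifier("Bad.name"))          # False
```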
Facade: compile_hll()¶
Location: `mistaber/dsl/compiler/compiler.py`

The `compile_hll()` function orchestrates the per-file pipeline:
```python
def compile_hll(source: str, return_warnings: bool = False) -> str:
    """
    Pipeline: Parse → Normalize → TypeCheck → Emit

    Raises:
        CompileError: On parse, normalization, type, or emission errors
    """
    # 1. Parse
    parser = HLLParser()
    ast = parser.parse(source)

    # 2. Normalize (MUST run before type checking)
    normalizer = Normalizer()
    normalized_ast = normalizer.normalize(ast)

    # 3. Type check
    checker = TypeChecker()
    errors = checker.check(normalized_ast)

    # 4. Handle errors/warnings
    hard_errors = [e for e in errors if e.severity == "error"]
    if hard_errors:
        raise CompileError(...)

    # 5. Emit
    emitter = ASPEmitter()
    asp = emitter.emit(normalized_ast)
    return asp
```
Pipeline Order Rationale¶
Normalize → TypeCheck (not TypeCheck → Normalize)
The registry contains only canonical predicates (food_type, not basar). Surface syntax must be expanded before validation can occur.
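A toy arity check makes the ordering concrete (the registry contents here are illustrative, not `base.yaml`'s):

```python
# Toy registry holding only canonical predicates and their arities,
# as the rationale above describes (contents are illustrative).
REGISTRY = {"food_type": 2, "forbidden": 4, "permitted": 4}

def arity_ok(predicate: str, args: list) -> bool:
    return REGISTRY.get(predicate) == len(args)

# The raw surface form would fail type checking...
print(arity_ok("basar", ["chicken"]))               # False
# ...while its normalized form passes.
print(arity_ok("food_type", ["chicken", "basar"]))  # True
```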
Usage Example¶
```python
from mistaber.dsl.compiler import compile_hll

source = """
@world(base)
@rule(r_test)
@makor([sa("YD:87:1")])
@madrega(d_oraita)

basar(chicken).
forbidden(W, achiila, M, ctx_normal) :- mixture_is_basar_bechalav(M).
"""

asp_code = compile_hll(source)
print(asp_code)
```
Compile with Warnings¶
```python
asp_code, warnings = compile_hll(source, return_warnings=True)
for w in warnings:
    print(f"Warning: {w.message}")
```
Build Pipeline¶
The build pipeline (`python -m mistaber.dsl.build`) orchestrates compilation of all `.hll` files across the project. It is the standard way to compile HLL source; `compile_hll()` is the per-file API it uses internally.
```mermaid
flowchart TB
    subgraph COLLECT["PHASE 1: COLLECT & PARSE"]
        collect["• Find .hll files by layer order<br/>• Parse each to AST (ParseResult)<br/>• Extract directives into BuildManifest"]
    end

    subgraph MERGE["PHASE 2: REGISTRY MERGE"]
        merge["• YAMLMerger merges directives into base.yaml<br/>• @declare → predicates section<br/>• @sort → sorts section<br/>• @enum → enums section<br/>• @world_def, @interprets, @interpretation → worlds/interpretations"]
    end

    subgraph COMPILE["PHASE 3: COMPILE"]
        comp["• For each .hll: normalize → type_check → emit<br/>• Produces .lp files in ontology/<br/>• .lp passthrough files copied verbatim"]
    end

    subgraph META["PHASE 4: GENERATE META"]
        meta["• Rebuild meta.lp from updated base.yaml<br/>• Sort/predicate/enum membership atoms"]
    end

    subgraph DSL["dsl/"]
        schema["schema/*.hll"]
        base["base/*.hll"]
        worlds["worlds/*.hll"]
        engine["engine/*.hll + *.lp"]
        interp["interpretations/*.hll"]
        corpus["corpus/*.hll"]
    end

    subgraph OUT["ontology/"]
        out_schema["schema/*.lp"]
        out_base["base/*.lp"]
        out_worlds["worlds/*.lp"]
        out_engine["engine/*.lp"]
        out_interp["interpretations/*.lp"]
        out_corpus["corpus/*.lp"]
        out_meta["meta.lp"]
        out_yaml["(base.yaml updated)"]
    end

    DSL --> COLLECT
    COLLECT --> MERGE
    MERGE --> COMPILE
    COMPILE --> META
    META --> OUT
```
Layer Order¶
Files are processed in dependency order — each layer may reference sorts and predicates from earlier layers:
1. `schema` — Sort definitions, constraints, disjointness
2. `base` — Core facts (status, substance, issur_types, madrega, shiur)
3. `worlds` — Kripke worlds (base, mechaber, rema, gra, ashk_ah, sefardi_yo)
4. `engine` — Reasoning engine (safek, priorities, interpretations, policy)
5. `interpretations` — Commentator rules (Shach, Taz)
6. `corpus` — Encoded seifim (yd_87/, yd_89/)
Passthrough Files¶
`.lp` files in `dsl/` (e.g., `engine/preferences.lp` for asprin directives) are copied verbatim to `ontology/`. A naming conflict (both `foo.hll` and `foo.lp` in the same directory) is a build error.
Atomic Writes¶
All outputs are written to disk only after all 4 phases succeed. If any phase fails, no files are modified. This prevents partial/inconsistent state in ontology/.
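A generic sketch of this stage-then-commit pattern (names and structure are illustrative, not the build's actual code):

```python
import os
import shutil
import tempfile
from pathlib import Path

# Generic sketch of the stage-then-commit pattern described above
# (names are illustrative; the build's actual implementation may differ).
def atomic_build(outputs: dict, dest_dir: str) -> None:
    """Write every output into a staging dir first, then move the files
    into place only once all of them were produced successfully."""
    staging = tempfile.mkdtemp(dir=dest_dir)  # same filesystem as dest
    try:
        for rel_path, content in outputs.items():
            target = Path(staging) / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        # Commit phase: every file exists, so a compile failure can no
        # longer leave dest_dir half-written.
        for rel_path in outputs:
            final = Path(dest_dir) / rel_path
            final.parent.mkdir(parents=True, exist_ok=True)
            os.replace(Path(staging) / rel_path, final)
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```

Staging inside the destination directory keeps `os.replace` on a single filesystem, where the rename is atomic.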
BuildManifest¶
The BuildManifest aggregates directives from all parsed .hll files:
| Directive | Aggregated Into |
|---|---|
| `@sort` | `BuildManifest.sorts` |
| `@subsort` | `BuildManifest.subsorts` |
| `@enum` | `BuildManifest.enums` |
| `@declare` | `BuildManifest.declarations` |
| `@world_def` | `BuildManifest.world_defs` |
| `@interprets` | `BuildManifest.interprets` |
| `@interpretation` | `BuildManifest.interpretations` |
The `YAMLMerger` then merges these into `base.yaml`, which `generate_meta.py` uses to rebuild `meta.lp`.
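A minimal sketch of one merge step with arity-conflict detection (the dict shapes are assumptions; the real YAMLMerger operates on `base.yaml`'s actual schema):

```python
# Minimal sketch: dict-shaped registry and declaration records are
# assumptions; the real YAMLMerger works on base.yaml's schema.
def merge_declarations(registry: dict, declarations: list) -> dict:
    """Merge @declare records, rejecting arity conflicts (same name,
    different argument count), mirroring the type checker's rule."""
    preds = registry.setdefault("predicates", {})
    for decl in declarations:
        existing = preds.get(decl["name"])
        if existing and len(existing["sorts"]) != len(decl["sorts"]):
            raise ValueError(f"arity conflict for {decl['name']}")
        preds[decl["name"]] = decl
    return registry
```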
Error Handling Summary¶
| Stage | Exception | Example |
|---|---|---|
| Parser | `ParseError` | Syntax error at line 5 |
| Normalizer | `NormalizationError` | `issur()` expects 2 args |
| Type Checker | `TypeCheckError` | Undeclared predicate |
| Emitter | `EmitterError` | Invalid character in argument |
| Facade | `CompileError` | Wraps all of the above |
| Build Pipeline | `BuildError` | Layer ordering violation, naming conflict |
Testing¶
Tests are located in `tests/dsl/`:

| File | Coverage |
|---|---|
| `test_parser.py` | Parser and AST construction |
| `test_normalizer.py` | Surface syntax expansion |
| `test_type_checker.py` | Registry validation |
| `test_emitter.py` | ASP code generation |
| `test_compiler.py` | End-to-end pipeline |
Run all DSL tests:
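The command itself is not shown in this excerpt; assuming the project follows the usual pytest convention for a `tests/` tree, it would be:

```shell
pytest tests/dsl/
```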