|
Lime Parser Generator 0.1.0
Runtime-extensible LALR(1) parser with SIMD tokenization and LLVM JIT
|
This document explains the key ideas behind Lime's runtime extension system. For the basic parser generator (grammar → C parser), see Getting Started.
Lime reads a .y grammar file containing production rules and emits a C source file implementing an LALR(1) push-parser. The generated parser exposes three functions:
ParseAlloc() — allocate parser stateParse(parser, token_code, token_value) — feed one tokenParseFree() — release parser stateThis is the same model as Lemon. Everything below is what Lime adds.
A snapshot is an immutable, reference-counted capture of a parser's state tables (action table, goto table, rule table, symbol table). Snapshots are the unit of concurrency: multiple threads can parse simultaneously by acquiring references to the same snapshot.
Snapshots are created from a grammar file or by modifying an existing snapshot (see Extensions below). They are never mutated in place — modification always produces a new snapshot.
An extension is a set of grammar modifications (new tokens, new rules, precedence changes) packaged with metadata (name, version, dependencies). Extensions are registered, then loaded:
Loading an extension applies its modifications to the current snapshot, producing a new snapshot. The old snapshot remains valid for any in-flight parses. Unloading reverses the process.
When two extensions modify the same part of the grammar, a conflict arises. Lime detects conflicts at three levels:
^ as exponentiation vs. XOR).Conflicts are detected automatically when extensions are loaded. They do not prevent loading — they are reported and can be resolved by a disambiguation strategy.
A disambiguation strategy decides which extension "wins" when a conflict occurs during parsing. Lime provides several built-in strategies and supports custom ones:
| Strategy | How it works |
|---|---|
| Priority | Each extension has a numeric priority; highest wins. Fast (~740 ns). |
| Fork-resolve | Clone the parser state, try both interpretations, pick the one that succeeds. More accurate but slower (~1.3 µs). |
| LLM oracle | Query an external LLM API to decide. See examples/llm_oracle/. |
| Custom | Implement the DisambiguationStrategyVTable interface. |
Strategies are pluggable per-extension or per-registry.
After disambiguation selects a winner, the execution policy controls how semantic actions are dispatched:
| Policy | Behavior |
|---|---|
EXEC_FIRST_ONLY | Run only the winning extension's action. |
EXEC_ALL | Run all extensions' actions independently. |
EXEC_CHAIN | Run actions in sequence, piping output. |
Lime's extension system is thread-safe by design:
This means you can load an extension on one thread while other threads are actively parsing with the previous snapshot.
Lime's tokenizer uses SIMD instructions (AVX2 on x86, NEON on ARM) to classify characters in parallel — 32 bytes at a time on AVX2. This delivers 5-10x faster lexing compared to scalar character-by-character scanning. A scalar fallback is always available.
The optional LLVM JIT compiles parser action table lookups into native machine code at runtime. This replaces the table-driven interpreter loop with direct jumps, yielding 2.5-4.2x speedup on the action lookup phase. JIT is most beneficial for large grammars (500+ states) in long-running processes. See JIT_ANALYSIS.md for a detailed cost-benefit analysis.
Lime supports modular grammars via directives:
Modules can be composed with the lime-compose tool, which resolves dependencies via topological sort and merges grammars. See MODULE_FORMAT.md.