Lime Parser Generator 0.1.0
Runtime-extensible LALR(1) parser with SIMD tokenization and LLVM JIT
This guide walks through the process of converting a Bison grammar file (.y) to a Lime grammar file (.lime). It covers directive mapping, syntax differences, and common pitfalls, with a worked example based on the PostgreSQL bootstrap parser in examples/bootstrap/.
| Bison Directive | Lime Equivalent | Notes |
|---|---|---|
| `%name-prefix "foo"` | `%name_prefix foo` or `%name foo` | Both accepted. Lime does not parse the dashed form `%name-prefix` (its directive tokenizer rejects the dash); the underscore form is a direct alias for `%name`. |
| `%union { ... }` | Per-symbol type | Lime has no `%union`; each symbol declares its own C type |
| `%token <type> TOK` | `%token TOK.` | Lime tokens have no inline type annotation; use `%token_type` for the default |
| (none) | `%token_type` | Bison has no equivalent; it uses `%union` instead |
| `%type <type> sym` | `%type sym {CType}` | Curly braces instead of angle brackets |
| `%start sym` | `%start sym` or `%start_symbol sym` | Both accepted. |
| `%parse-param { T *p }` | `%extra_argument {T *p}` | Passed as a parameter to all parse calls |
| `%lex-param { T *p }` | (none) | Lime uses push parsing; the caller manages the lexer |
| `%pure-parser` | (default) | Lime parsers are always reentrant |
| `%expect N` | `%expect N.` | Supported. Lime treats the count as an exact-match assertion (unlike Bison's loose "at most N"): if the actual conflict count differs, `lime` exits non-zero. It currently reports a combined shift/reduce + reduce/reduce total; distinct expect_shift_reduce / expect_reduce_reduce counters are not yet separated. |
| `%locations` | `%locations` | Supported. Combine with `%location_type {T}` to declare the type. |
| `#define YYLLOC_DEFAULT(C, R, N) ...` | `#define YYLLOC_DEFAULT(C, R, N) ...` | Supported. Bison-compatible signature; see Location override below. |
| `%destructor { ... } sym` | `%destructor sym { ... }` | Order is reversed |
| `%left TOK1 TOK2` | `%left TOK1 TOK2.` | Terminating period required |
| `%right TOK1 TOK2` | `%right TOK1 TOK2.` | Terminating period required |
| `%nonassoc TOK1` | `%nonassoc TOK1.` | Terminating period required |
| `%prec TOKEN` | `[TOKEN]` | Square brackets at end of rule |
| `%code { ... }` | `%include { ... }` | Code included at top of generated file |
| `%defines "file.h"` | Automatic | Lime always generates a .h file |
| `%output "file.c"` | `-d dir` flag | Lime uses an output directory, not an output filename |
| `%verbose` | Default | Lime always generates a .out report (suppress with `-q`) |
| `%define api.pure full` | (default) | Lime is always pure/reentrant |
| `%define parse.error verbose` | `%syntax_error { ... }` | Custom error callback |
Bison:
Lime (two supported forms):
**Expanded** – each alternative as a separate rule with its own action:
**`|`-alternated** – alternatives share one trailing action:
Key differences from Bison:
- Rules use `::=` instead of `:`.
- Each rule ends with a period (`.`) before the action.
- Symbol values are referenced through named aliases such as `A` and `B` instead of `$$`, `$1`, `$2`.
- `|` is accepted in the RHS for Bison compatibility: the trailing action, precedence marker, and `{NEVER-REDUCE}` flag all propagate to every alternative in the group. Per-alternative actions are not supported; an action always follows the rule-terminating `.` and is never written inline per alternative. If the alternatives in the original Bison grammar each had different actions, expand them into separate rules (the expanded form above).
- Empty alternatives (`s ::= A | | B .`) are accepted; the empty position between the `|`s becomes a rule with zero RHS symbols.

Bison:
Lime:
Lime uses a single `%token_type` for all tokens. If you need different types per token, use a union or tagged struct as your `%token_type`.
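A tagged struct of the kind mentioned above could look like the following sketch. The names `TokenValue`, `TokenValueKind`, `token_int`, `token_str` and `token_as_int` are hypothetical and chosen only for illustration; a grammar using it would presumably declare the struct as its single token type.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tag values; a real grammar would pick its own. */
typedef enum { TV_NONE, TV_INT, TV_STR } TokenValueKind;

/* One C type shared by every token, as a single token type requires. */
typedef struct {
    TokenValueKind kind;      /* says which union member is valid */
    union {
        long        ival;     /* valid when kind == TV_INT */
        const char *sval;     /* valid when kind == TV_STR; not owned */
    } u;
} TokenValue;

/* Build a token value that carries an integer. */
static TokenValue token_int(long v) {
    TokenValue t;
    t.kind = TV_INT;
    t.u.ival = v;
    return t;
}

/* Build a token value that carries a borrowed string. */
static TokenValue token_str(const char *s) {
    TokenValue t;
    t.kind = TV_STR;
    t.u.sval = s;
    return t;
}

/* Checked accessor: returns fallback when the token is not an integer. */
static long token_as_int(TokenValue t, long fallback) {
    return t.kind == TV_INT ? t.u.ival : fallback;
}
```

Checking the tag in an accessor keeps action bodies from reading the wrong union member, which is the check that Bison's per-token `<type>` annotations otherwise perform at generation time.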
Bison:
Lime:
Each non-terminal gets its own type declaration. There is no shared union.
Bison:
Lime:
The `[UMINUS]` in square brackets replaces `%prec UMINUS`.
Bison:
Lime:
Lime provides two separate callbacks: `%syntax_error` fires on each error token, while `%parse_failure` fires when recovery is impossible. Inside these blocks, `yymajor` is the offending token type, `yyminor` is its semantic value, and `yyTokenName[]` maps token codes to strings.
Bison supports actions between grammar symbols:
Lime does not support mid-rule actions. Restructure by moving all logic to the final action, or split the rule:
If the mid-rule action produces a value consumed by later symbols, you must introduce a helper non-terminal:
Or with a custom template:
Lime generates the .c and .h files in the same directory as the input grammar by default. Use -d <dir> to redirect output.
Bison (pull parser):
Lime (push parser):
The push model means you control the tokenization loop. There is no yylex() callback.
The examples/bootstrap/ directory contains a complete conversion of the PostgreSQL BKI bootstrap parser from Bison to Lime. Here are the key transformations:
- `%name-prefix "boot_yy"` became `%name boot`
- `%parse-param` became `%extra_argument`
- `%union` was removed; each symbol gets its own type
- `$2` became `B` (named parameter)
- `$$` became `A` (LHS result)
- The `:` rule separator became `::=`; the rule terminator became `.` (before the action block)
- Each `|` alternative became a separate rule

Common pitfalls:

- Every rule must end with `.` before the action. Missing it causes cryptic parse errors in the grammar file.
- Token declarations end with a period: `%token FOO.`, not `%token FOO`.
- `|` alternatives share one action: alternatives joined with `|` share the single trailing action, so productions that need different actions must each be written as a separate `LHS ::= RHS.` rule.
- No `%union`: If your Bison grammar uses multiple semantic value types via `%union`, you must either use a tagged union struct as your `%token_type` or restructure to use per-symbol `%type` declarations.
- Lime never calls a `yylex()` function. You must write the token-feeding loop yourself. This is an advantage for integration but requires rethinking the control flow.
- `%expect` is an exact-match assertion: Unlike Bison's loose "at most N" semantics, Lime's `%expect N.` fails the build when the actual conflict count differs from N (in either direction). Use `lime -p` to see which conflicts were resolved by precedence rules. Lime currently reports a single combined count rather than separate shift/reduce and reduce/reduce totals.
- The `%extra_argument` value is available in all action blocks under the variable name you declared. In Bison, `%parse-param` values require explicit access patterns.
- Lime's `%destructor` puts the symbol name before the code block: `%destructor sym { free($$); }`. Bison puts the code first: `%destructor { free($$); } sym`.

Lime evaluates `YYLLOC_DEFAULT` before the action body for the same reduce, stores the result in a per-reduce local, binds `@$` / `@<lhsalias>` to that local, and commits the local's final value to the LHS slot's `yyloc` after the action body returns. This is Bison's documented ordering. An action body that writes `@$ = expr` overwrites the default for this reduce's LHS location; an action body that does not write `@$` inherits the default unchanged. The ecpg `@$ = cat_str(...)` source-text-accumulation pattern relies on this.

Bison lets a grammar override its built-in location-inheritance rule by `#define`-ing the macro `YYLLOC_DEFAULT(Current, Rhs, N)`.
Lime honors the same signature on every reduce when the macro is defined in the grammar's include { ... } block (or in any header included therein). Semantics match Bison's documented contract:
- `Current` is the LHS location, an lvalue of type `YYLOCATIONTYPE`.
- `Rhs` is a 0-indexed array such that `Rhs[i]` for i = 1..N is the i-th RHS symbol's location. `Rhs[0]` is the location of the slot below the rule on the parser stack (used for the empty-rule fallback). The array is stack-allocated per reduce and sized to fit the longest RHS in the grammar (`YYNRHS_MAX`, emitted by lime).
- `N` is the number of RHS symbols (0 for empty rules).
- `YYRHSLOC(Rhs, K)` is provided as `((Rhs)[K])`, so user macros that follow Bison's documented `YYRHSLOC` indirection (for example, Bison's stock default that copies first/last positions) also work unmodified.
When `YYLLOC_DEFAULT` is not defined, Lime applies its built-in rule: for non-empty rules the LHS `yyloc` is `Rhs[1]` (via slot reuse); for empty rules it is the lookahead location.
Example mirroring ecpg's source-text concatenation:
See tests/test_yylloc_default_grammar.y for a self-contained example using struct {int start; int end;} locations.
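The macro contract described above can also be exercised outside a generated parser. The sketch below is a minimal standalone illustration, not Lime's generated code: it assumes a `struct {int start; int end;}` location type, defines `YYRHSLOC` the way the guide says Lime provides it, and writes a span rule modelled on the shape of Bison's stock default. The `Loc` typedef and the `reduce_location` helper are hypothetical names standing in for the reduce step.

```c
#include <assert.h>

/* Location type with the same shape as the one used by the test grammar. */
typedef struct { int start; int end; } Loc;

/* Same indirection the guide says Lime provides. */
#define YYRHSLOC(Rhs, K) ((Rhs)[K])

/* Span rule modelled on Bison's stock default:
 *   non-empty rule: from the first RHS symbol's start to the last one's end;
 *   empty rule:     zero-width location at the end of the slot below the rule. */
#define YYLLOC_DEFAULT(Current, Rhs, N)                    \
    do {                                                   \
        if (N) {                                           \
            (Current).start = YYRHSLOC(Rhs, 1).start;      \
            (Current).end   = YYRHSLOC(Rhs, N).end;        \
        } else {                                           \
            (Current).start = (Current).end =              \
                YYRHSLOC(Rhs, 0).end;                      \
        }                                                  \
    } while (0)

/* Hypothetical stand-in for one reduce: rhs[0] is the slot below the rule,
 * rhs[1..n] are the RHS symbols' locations. */
static Loc reduce_location(const Loc *rhs, int n) {
    Loc lhs = {0, 0};
    YYLLOC_DEFAULT(lhs, rhs, n);
    return lhs;
}
```

With `rhs = { {0,3}, {4,7}, {8,9}, {10,15} }`, a three-symbol reduce yields the span 4..15, while an empty reduce falls back to `Rhs[0]` and yields the zero-width location 3..3.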
Lime is a push-driven parser: the caller feeds tokens into Parse(parser, tok, value, loc) one at a time. Bison is pull-driven: the parser repeatedly calls yylex() to fetch tokens.
This difference is mostly invisible at the grammar level, but two corner cases need explicit support:
A Bison action that reads yychar (the pending lookahead) or calls yyclearin to consume it has push-mode equivalents:
`Parse_get_lookahead` returns `YYEMPTY` (-2) when no `Parse()` call is in progress. Use case: an empty rule's action body needs to peek at the next token, decide what to do, and possibly tell the parser the token has been consumed. PostgreSQL plpgsql uses this in `decl_datatype` and a dozen sibling rules. Shipped in P0-NEW-5.
Bison's pull-mode fires default reduces between yylex() calls; Lime's push-mode waits for the next Parse() call to confirm the reduce. When action bodies have side effects (writing to an output stream, mutating shared state) that must precede the lexer's next side effects, the timing difference is observable.
Call `Parse_drain` after each `Parse()` to fire any pending default reduces. It loops until the parser is in a state that requires a real lookahead, and it is idempotent past quiescence (multiple calls in a row are safe). Action bodies fired during drain see no lookahead (`Parse_get_lookahead` returns `YYEMPTY`).
Driver template for grammars that need Bison-equivalent reduce timing:
Motivating example: PostgreSQL ecpg's preprocessor has a lex-time echo_text channel and a reduce-time fprintf channel both writing to the same FILE *. Without Parse_drain, reduces fire AFTER the lexer's next echo, scrambling the output. With Parse_drain, reduces fire BEFORE the lexer's next echo, matching Bison's pull-mode order. Shipped in P0-NEW-8.
See tests/test_drain_grammar.y for a self-contained discriminator: a three-token grammar where the driver appends a space between Parse() calls and the action bodies append the token text; with drain the buffer is "A B C ", without drain it's " A B C".
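The ordering effect can be reproduced with a toy model that has no dependency on Lime. The sketch below is an illustration only: `MockParser`, `mock_parse`, `mock_drain` and `run` are hypothetical stand-ins that mimic a single deferred reduce per token, not Lime's generated `Parse()` or `Parse_drain`. The driver appends a space after each push, and each "action body" appends its token text.

```c
#include <assert.h>
#include <string.h>

/* Toy model, NOT Lime's generated code: at most one deferred reduce. */
typedef struct {
    char out[64];          /* buffer shared by the driver and the actions */
    const char *pending;   /* token whose reduce has not fired yet, or NULL */
} MockParser;

/* Append s to the shared buffer without overflowing it. */
static void append(MockParser *p, const char *s) {
    strncat(p->out, s, sizeof p->out - strlen(p->out) - 1);
}

/* Fire the deferred reduce, if any. Calling it again is a no-op. */
static void mock_drain(MockParser *p) {
    if (p->pending) {
        append(p, p->pending);   /* the "action body" */
        p->pending = NULL;
    }
}

/* Push one token: the previous reduce is confirmed first, then this token's
 * reduce is deferred. text == NULL marks end of input. */
static void mock_parse(MockParser *p, const char *text) {
    mock_drain(p);
    p->pending = text;
}

/* Driver: push each token, optionally drain, then emit a lexer-side space. */
static void run(MockParser *p, int use_drain) {
    static const char *const tokens[] = { "A", "B", "C" };
    p->out[0] = '\0';
    p->pending = NULL;
    for (size_t i = 0; i < sizeof tokens / sizeof tokens[0]; i++) {
        mock_parse(p, tokens[i]);
        if (use_drain)
            mock_drain(p);
        append(p, " ");
    }
    mock_parse(p, NULL);   /* end of input flushes the last reduce */
}
```

Under this model, `run(&p, 1)` leaves `"A B C "` in `p.out` and `run(&p, 0)` leaves `" A B C"`, the same two buffers the test grammar above is described as producing.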
| Task | Bison | Lime |
|---|---|---|
| Define a rule | `lhs: rhs { action };` | `lhs(A) ::= rhs(B). { action }` |
| LHS value | `$$` | `A` |
| RHS value N | `$N` | `B`, `C`, `D`, ... |
| Alternative | `lhs: alt1 \| alt2;` | Separate `lhs ::=` rules, or `\|` with one shared trailing action |
| Precedence override | `%prec TOKEN` | `[TOKEN]` |
| Token type | `%union` + `%token <member>` | `%token_type {Type}` |
| Non-terminal type | `%type <member> sym` | `%type sym {Type}` |
| Parser param | `%parse-param {T *p}` | `%extra_argument {T *p}` |
| Error callback | `yyerror()` function | `%syntax_error { ... }` |
| Start symbol | `%start sym` | `%start sym` or `%start_symbol sym` |
| Generate parser | `bison -d gram.y` | `lime gram.lime` |
| Invoke parser | `yyparse()` | `Parse(p, token, val, arg)` |