This guide walks through the process of converting a Bison grammar file (.y) to a Lime grammar file (.lime). It covers directive mapping, syntax differences, and common pitfalls, with a worked example based on the PostgreSQL bootstrap parser in examples/bootstrap/.

Directive Mapping

Bison Directive	Lime Equivalent	Notes
`name-prefix "foo"`	`name_prefix foo` or `name foo`	Both accepted. Lime does not parse the dashed form `name-prefix` (its directive tokenizer rejects the dash); the underscore form is a direct alias for `name`.
`union { ... }`	Per-symbol `type`	Lime has no union; each symbol declares its own C type
`token <type> TOK`	`token TOK.`	Lime tokens have no inline type annotation; use `token_type` for the default
`token_type`	(none)	Bison has no equivalent; uses `union` instead
`type <type> sym`	`type sym {CType}`	Curly braces instead of angle brackets
`start sym`	`start sym` or `start_symbol sym`	Both accepted.
`parse-param { T *p }`	`extra_argument {T *p}`	Passed as parameter to all parse calls
`lex-param { T *p }`	(none)	Lime uses push parsing; caller manages the lexer
`pure-parser`	(default)	Lime parsers are always reentrant
`expect N`	`expect N.`	Supported. Lime treats the count as an exact-match assertion (unlike Bison's loose "at most N"): if the actual conflict count differs, `lime` exits non-zero. Currently reports a combined shift/reduce + reduce/reduce total; distinct `expect_shift_reduce` / `expect_reduce_reduce` counters are not yet separated.
`locations`	`locations`	Supported. Combine with `location_type {T}` to declare the type.
`#define YYLLOC_DEFAULT(C, R, N) ...`	`#define YYLLOC_DEFAULT(C, R, N) ...`	Supported. Bison-compatible signature; see Location override below.
`destructor { ... } sym`	`destructor sym { ... }`	Order is reversed
`left TOK1 TOK2`	`left TOK1 TOK2.`	Terminating period required
`right TOK1 TOK2`	`right TOK1 TOK2.`	Terminating period required
`nonassoc TOK1`	`nonassoc TOK1.`	Terminating period required
`prec TOKEN`	`[TOKEN]`	Square brackets at end of rule
`code { ... }`	`include { ... }`	Code included at top of generated file
`defines "file.h"`	Automatic	Lime always generates a `.h` file
`output "file.c"`	`-d dir` flag	Lime uses output directory, not output filename
`verbose`	Default	Lime always generates `.out` report (suppress with `-q`)
`define api.pure full`	(default)	Lime is always pure/reentrant
`define parse.error verbose`	`syntax_error { ... }`	Custom error callback

Grammar Syntax Differences

Rule Syntax

Bison:

result : symbol1 symbol2 { $$ = f($1, $2); }
       | symbol3         { $$ = g($1); }
       ;

Lime (two supported forms):

Expanded – each alternative as a separate rule with its own action:

result(A) ::= symbol1(B) symbol2(C). { A = f(B, C); }

result(A) ::= symbol3(B). { A = g(B); }

**|-alternated** – alternatives share one trailing action:

result(A) ::= symbol1(B) symbol2(C)
            | symbol3(B)              .
            { A = B; /* same action for both */ }

Key differences from Bison:

::= instead of :
Every rule ends with a period (.) before the action
Semantic values use letter labels (A, B, C...) instead of $$, $1, $2
Labels are declared in parentheses after the symbol name
| is accepted in RHS for bison-compat: the trailing action, precedence marker, and {NEVER-REDUCE} flag all propagate to every alternative in the group. Per-alternative actions are not supported – actions are always after the rule-terminating ., not inline per alternative. If the original Bison grammar's alternatives each had different actions, expand them to separate rules (the expanded form above).
Epsilon alternatives (s ::= A | | B .) are accepted; the empty position between |s becomes a rule with zero RHS symbols.

Token Declarations

Bison:

%token <str_val> IDENTIFIER STRING_LITERAL
%token <int_val> INTEGER
%token PLUS MINUS TIMES

Lime:

%token_type {Token}
%token IDENTIFIER.
%token STRING_LITERAL.
%token INTEGER.
%token PLUS.
%token MINUS.
%token TIMES.

Lime uses a single token_type for all tokens. If you need different types per token, use a union or tagged struct as your token_type.

Type Declarations

Bison:

%union {
    int    int_val;
    char  *str_val;
    Node  *node;
}
%type <node> expression statement
%type <int_val> count

Lime:

%type expression {Node *}
%type statement {Node *}
%type count {int}

Each non-terminal gets its own type declaration. There is no shared union.

Precedence with %prec

Bison:

expr : '-' expr %prec UMINUS { $$ = -$2; }

;

Lime:

expr(A) ::= MINUS expr(B). [UMINUS] { A = -B; }

The [UMINUS] in square brackets replaces prec UMINUS.

Error Handling

Bison:

%{
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
%}

Lime:

%syntax_error {
    fprintf(stderr, "Syntax error near token %s\n",
            yyTokenName[yymajor]);
}
%parse_failure {
    fprintf(stderr, "Parse failure -- giving up\n");
}

Lime provides two separate callbacks: syntax_error fires on each error token, while parse_failure fires when recovery is impossible. Inside these blocks, yymajor is the offending token type, yyminor is its semantic value, and yyTokenName[] maps token codes to strings.

Mid-Rule Actions

Bison supports actions between grammar symbols:

stmt : KW_BEGIN { start_transaction(); }
       block
       KW_END   { end_transaction(); }
     ;

Lime does not support mid-rule actions. Restructure by moving all logic to the final action, or split the rule:

stmt(A) ::= KW_BEGIN block(B) KW_END. {
    start_transaction();
    /* process B */
    end_transaction();
    A = 0;
}

If the mid-rule action produces a value consumed by later symbols, you must introduce a helper non-terminal:

begin_marker(A) ::= KW_BEGIN. { start_transaction(); A = get_txn_id(); }
stmt(A) ::= begin_marker(B) block(C) KW_END. {
    end_transaction(B);
    A = C;
}

Build System Changes

Bison Build

parser.c parser.h: grammar.y

bison -d -o parser.c grammar.y

Lime Build

LIME ?= /path/to/lime
 
parser.c parser.h: grammar.lime $(LIME)
    $(LIME) grammar.lime

Or with a custom template:

parser.c parser.h: grammar.lime $(LIME) limpar.c

$(LIME) -Tlimpar.c grammar.lime

Lime generates the .c and .h files in the same directory as the input grammar by default. Use -d <dir> to redirect output.

Parser Interface

Bison (pull parser):

extern int yyparse(void);
extern FILE *yyin;
 
yyin = fopen("input.txt", "r");
yyparse();

Lime (push parser):

void *parser = ParseAlloc(malloc);
 
/* Feed tokens one at a time */
Parse(parser, TOKEN_TYPE, token_value, extra_arg);
Parse(parser, TOKEN_TYPE2, token_value2, extra_arg);
/* ... */
Parse(parser, 0, zero_value, extra_arg);  /* Signal end of input */
 
ParseFree(parser, free);

The push model means you control the tokenization loop. There is no yylex() callback.

Worked Example: PostgreSQL Bootstrap Parser

The examples/bootstrap/ directory contains a complete conversion of the PostgreSQL BKI bootstrap parser from Bison to Lime. Here are the key transformations:

Original Bison (bootparse.y excerpt)

%name-prefix="boot_yy"
%parse-param {BootParseState *pstate}
 
%union {
    char    *str;
    int      ival;
    Oid      oid;
}
 
%type <ival> boot_query boot_openStmt
%type <str>  boot_ident
%type <oid>  oidspec
 
%token <str>  ID
%token OPEN XCLOSE XCREATE
 
%%
 
boot_openStmt:
    OPEN boot_ident
        {
            boot_do_start(pstate);
            boot_openrel(pstate, $2);
            boot_do_end(pstate);
            $$ = 0;
        }
    ;

Converted Lime (boot_grammar.lime excerpt)

%name boot
%extra_argument {BootParseState *pstate}
%token_type {BootToken}
 
%type boot_query {int}
%type boot_openStmt {int}
%type boot_ident {char *}
%type oidspec {Oid}
 
%token ID.
%token OPEN.
%token XCLOSE.
%token XCREATE.
 
boot_openStmt(A) ::= OPEN boot_ident(B). {
    boot_do_start(pstate);
    boot_openrel(pstate, B);
    boot_do_end(pstate);
    A = 0;
}

What Changed

name-prefix "boot_yy" became name boot
parse-param became extra_argument
union was removed; each symbol gets its own type
$2 became B (named parameter)
$$ became A (LHS result)
The : rule separator became ::=
The ; rule terminator became . (before the action block)
Each | alternative became a separate rule

Common Gotchas

Forgetting the period: Every rule in Lime ends with . before the action. Missing it causes cryptic parse errors in the grammar file.
Token declarations need periods too: Write token FOO. not token FOO.
No | alternatives: Each production is a separate LHS ::= RHS. rule. There is no shorthand for alternatives.
No union: If your Bison grammar uses multiple semantic value types via union, you must either use a tagged union struct as your token_type or restructure to use per-symbol type declarations.
Push vs pull: Lime does not call a yylex() function. You must write the token-feeding loop yourself. This is actually an advantage for integration but requires rethinking the control flow.
**expect is an exact-match assertion**: Unlike Bison's loose "at most N" semantics, Lime's expect N. fails the build when the actual conflict count differs from N (in either direction). Use lime -p to see which conflicts were resolved by precedence rules. Lime currently reports a single combined count rather than separate shift/reduce and reduce/reduce totals.
Extra argument scope: The extra_argument value is available in all action blocks as the variable name you declared. In Bison, parse-param values require explicit access patterns.
Destructor syntax: Lime's destructor puts the symbol name before the code block: destructor sym { free($$); }. Bison puts the code first: destructor { free($$); } sym.
YYLLOC_DEFAULT timing: Lime evaluates YYLLOC_DEFAULT before the action body for the same reduce, stores the result in a per-reduce local, binds @$ / @<lhsalias> to that local, and commits the local's final value to the LHS slot's yyloc after the action body returns. This is Bison's documented ordering. An action body that writes @$ = expr overwrites the default for this reduce's LHS location; an action body that does not write @$ inherits the default unchanged. This is what the ecpg @$ = cat_str(...) source-text-accumulation pattern relies on.

Location override (<tt>YYLLOC_DEFAULT</tt>)

Bison lets a grammar override its built-in location-inheritance rule by #define-ing the macro

#define YYLLOC_DEFAULT(Current, Rhs, N) /* user code */

Lime honors the same signature on every reduce when the macro is defined in the grammar's include { ... } block (or in any header included therein). Semantics match Bison's documented contract:

Current is the LHS location, an lvalue of type YYLOCATIONTYPE.
Rhs is a 0-indexed array such that Rhs[i] for i = 1..N is the i-th RHS symbol's location. Rhs[0] is the location of the slot below the rule on the parser stack (used for the empty-rule fallback). The array is stack-allocated per reduce, sized to fit the longest RHS in the grammar (YYNRHS_MAX, emitted by lime).
N is the number of RHS symbols (0 for empty rules).

YYRHSLOC(Rhs, K) is provided as ((Rhs)[K]) so user macros that follow Bison's documented YYRHSLOC indirection – e.g. Bison's stock default copying first/last positions – also work unmodified.

When YYLLOC_DEFAULT is not defined, Lime applies its built-in rule: for non-empty rules the LHS yyloc is Rhs[1] (via slot reuse), for empty rules it is the lookahead location.

Example mirroring ecpg's source-text concatenation:

%location_type { char * }
 
%include {
#define YYLLOC_DEFAULT(Current, Rhs, N)                  \
    do {                                                 \
        if ((N) > 0) {                                   \
            (Current) = cat_str_n(Rhs, N);               \
        } else {                                         \
            (Current) = strdup("");                      \
        }                                                \
    } while (0)
}

See tests/test_yylloc_default_grammar.y for a self-contained example using struct {int start; int end;} locations.

Push vs pull semantics

Lime is a push-driven parser: the caller feeds tokens into Parse(parser, tok, value, loc) one at a time. Bison is pull-driven: the parser repeatedly calls yylex() to fetch tokens.

This difference is mostly invisible at the grammar level, but two corner cases need explicit support:

Action-time lookahead access (<tt>Parse_get_lookahead</tt> / <tt>Parse_clear_lookahead</tt>)

A Bison action that reads yychar (the pending lookahead) or calls yyclearin to consume it has push-mode equivalents:

int  Parse_get_lookahead(void *parser, ParseTOKENTYPE *value
                         /* , YYLOCATIONTYPE *loc -- when %locations */);
void Parse_clear_lookahead(void *parser);

Returns YYEMPTY (-2) when no Parse() call is in progress. Use case: an empty rule's action body needs to peek at the next token, decide what to do, and possibly tell the parser the token has been consumed. PostgreSQL plpgsql uses this in decl_datatype and a dozen sibling rules. Shipped in P0-NEW-5.

Eager default-reduce drain (<tt>Parse_drain</tt>)

Bison's pull-mode fires default reduces between yylex() calls; Lime's push-mode waits for the next Parse() call to confirm the reduce. When action bodies have side effects (writing to an output stream, mutating shared state) that must precede the lexer's next side effects, the timing difference is observable.

void Parse_drain(void *parser);

Call after each Parse() to fire any pending default reduces. Loops until the parser is in a state that requires a real lookahead. Idempotent past quiescence (multiple calls in a row are safe). Action bodies fired during drain see no lookahead (Parse_get_lookahead returns YYEMPTY).

Driver template for grammars that need Bison-equivalent reduce timing:

while ((tok = lex_next()) != 0) {
    Parse(parser, tok, value, loc);
    Parse_drain(parser);   /* match Bison's pull-mode timing */
}
Parse(parser, 0, eof_value, eof_loc);   /* push EOF */

Motivating example: PostgreSQL ecpg's preprocessor has a lex-time echo_text channel and a reduce-time fprintf channel both writing to the same FILE *. Without Parse_drain, reduces fire AFTER the lexer's next echo, scrambling the output. With Parse_drain, reduces fire BEFORE the lexer's next echo, matching Bison's pull-mode order. Shipped in P0-NEW-8.

See tests/test_drain_grammar.y for a self-contained discriminator: a three-token grammar where the driver appends a space between Parse() calls and the action bodies append the token text; with drain the buffer is "A B C ", without drain it's " A B C".

Quick Reference Card

Task	Bison	Lime
Define a rule	`lhs: rhs { action };`	`lhs(A) ::= rhs(B). { action }`
LHS value	`$$`	`A`
RHS value N	`$N`	`B`, `C`, `D`, ...
Alternative	`lhs: alt1 \\| alt2;`	Two separate `lhs ::=` rules
Precedence override	`prec TOKEN`	`[TOKEN]`
Token type	`union` + `token <member>`	`token_type {Type}`
Non-terminal type	`type <member> sym`	`type sym {Type}`
Parser param	`parse-param {T *p}`	`extra_argument {T *p}`
Error callback	`yyerror()` function	`syntax_error { ... }`
Start symbol	`start sym`	`start sym` or `start_symbol sym`
Generate parser	`bison -d gram.y`	`lime gram.lime`
Invoke parser	`yyparse()`	`Parse(p, token, val, arg)`