Lime Parser Generator 0.1.0
Runtime-extensible LALR(1) parser with SIMD tokenization and LLVM JIT
Lime is a runtime-extensible LALR(1) parser generator. It reads a context-free grammar and emits a C parser, like Yacc or Bison — but unlike those tools, the generated parser can load and unload grammar extensions at runtime without recompilation.
The generator itself compiles from a single C file with no dependencies. Generated parsers optionally use SIMD-accelerated tokenization (AVX2/NEON) and LLVM JIT compilation for action table lookups.
Database engines, language servers, and extensible query processors need parsers that can evolve without downtime. Adding a new operator, a custom type, or a dialect-specific clause traditionally means editing the grammar, regenerating the parser, and restarting the process.
Lime eliminates that cycle. Grammar extensions are shared libraries loaded at runtime. Conflict detection and disambiguation happen live. The base parser runs at full speed when no extensions are loaded — the extension machinery has zero overhead until activated.
This design is driven by a single observation: no existing parser generator supports runtime grammar modification. Lime fills that gap.
Destructor directives prevent semantic value leaks during error recovery. All allocations are tracked, with zero leaks reported under Valgrind and ASan.
For a detailed comparison with Yacc, Bison, ANTLR, and Menhir, see docs/COMPARISON.md. Migration guides: from Bison · from Yacc · from Flex.
Since v0.2.0 Lime also generates lexers. lime -X foo.lex produces foo_lex.c and foo_lex.h; the generated pair compiles and links with no Lime runtime dependency, the same way the parser side does. The emit callback signature matches ParseLoc, so the entire driver loop for a paired lexer + parser collapses to one LexFeedBytes call.
The lexer is push-driven, reentrant, and zero-globals by construction – no yytext / yyleng / yylineno side channels. Action bodies see typed locals (matched, matched_len, loc, lex, extra, state) and a small set of macros (LEX_EMIT, LEX_TRANSITION, LEX_PUSHBACK, LEX_TERMINATE, LEX_ERROR_AT). Exclusive states carry typed local data; an include-buffer stack (LexInclude) handles ecpg-style splice grammars without yywrap. POSIX-extended regex subset; no PCRE assertions, no captures, no REJECT, no yymore – the PG flex audit (six scanners, ~5,300 lines) found zero uses of any of those.
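To make "push-driven, reentrant, and zero-globals" concrete, here is a minimal sketch of the general pattern in plain C. It is not Lime's generated code or API: every name in it (toy_lexer, toy_lex_feed, toy_sum_emit, and so on) is hypothetical, and the toy token set (digit runs and single punctuation characters) is an assumption chosen only to keep the example short. The point it illustrates is that all scanner state lives in a caller-owned struct, so a token can span two chunks and several lexers can run side by side.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds for this toy scanner (not Lime's). */
enum toy_token { TOY_NUMBER = 1, TOY_PUNCT = 2 };

/* Emit callback: receives each finished token plus the caller's context. */
typedef void (*toy_emit_fn)(void *extra, enum toy_token kind, long value);

/* All scanner state lives here; there are no globals. */
typedef struct toy_lexer {
    toy_emit_fn emit;
    void *extra;
    bool in_number; /* a digit run is still open, possibly across chunks */
    long number;    /* value accumulated so far (overflow not handled) */
} toy_lexer;

static void toy_lex_init(toy_lexer *lx, toy_emit_fn emit, void *extra) {
    assert(lx != NULL && emit != NULL);
    lx->emit = emit;
    lx->extra = extra;
    lx->in_number = false;
    lx->number = 0;
}

/* Emit the pending number, if any, and reset the accumulator. */
static void toy_flush_number(toy_lexer *lx) {
    if (lx->in_number) {
        lx->emit(lx->extra, TOY_NUMBER, lx->number);
        lx->in_number = false;
        lx->number = 0;
    }
}

/* Push one chunk of input; may be called any number of times. */
static void toy_lex_feed(toy_lexer *lx, const char *bytes, size_t len) {
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)bytes[i];
        if (isdigit(c)) {
            lx->in_number = true;
            lx->number = lx->number * 10 + (c - '0');
        } else {
            toy_flush_number(lx);
            if (!isspace(c))
                lx->emit(lx->extra, TOY_PUNCT, (long)c);
        }
    }
}

/* Signal end of input so a trailing number is not lost. */
static void toy_lex_finish(toy_lexer *lx) { toy_flush_number(lx); }

/* Example sink: counts tokens and sums the numbers it receives. */
typedef struct toy_sum { long total; int tokens; } toy_sum;

static void toy_sum_emit(void *extra, enum toy_token kind, long value) {
    toy_sum *s = extra;
    s->tokens++;
    if (kind == TOY_NUMBER)
        s->total += value;
}
```

Feeding "12+3" and then "4 " to one toy_lexer yields three tokens (12, +, 34): the number 34 is split across the two chunks and is completed by the second call. In a real parser pairing, the emit callback would forward each token to the push parser instead of summing.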
Reference docs:
- .l scanners to Lime .lex, with directive mapping, action-body translation, and a common-gotchas list.
- man/lime_lex(5) – .lex file format and runtime API reference.
- bootscanner.l ported end-to-end as a worked example.
Build options:
With -Dllvm=disabled the resulting binaries have zero references to libLLVM.so; jit_is_available() returns false and JIT call sites fall through to the interpreter.
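The fall-through behaviour can be pictured with a small sketch. Only the name jit_is_available() comes from the text above; its signature, the table layout, and the helpers lookup_interpreted, jit_lookup, and find_action are assumptions made for illustration, not Lime's actual runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action table; Lime's real tables are not shown here. */
enum { N_STATES = 2, N_TOKENS = 3 };
static const int action_table[N_STATES][N_TOKENS] = {
    { 1, 0, 2 },
    { 0, 3, 0 },
};

/* Stand-in for a build configured with -Dllvm=disabled.
 * The signature is an assumption. */
static bool jit_is_available(void) { return false; }

/* Interpreter path: a plain bounds-checked table lookup. */
static int lookup_interpreted(int state, int token) {
    assert(state >= 0 && state < N_STATES);
    assert(token >= 0 && token < N_TOKENS);
    return action_table[state][token];
}

typedef int (*lookup_fn)(int state, int token);

/* Would point at JIT-compiled code when LLVM is present; NULL here. */
static lookup_fn jit_lookup = NULL;

/* Call site: prefer the JIT when available, otherwise interpret. */
static int find_action(int state, int token) {
    if (jit_is_available() && jit_lookup != NULL)
        return jit_lookup(state, token);
    return lookup_interpreted(state, token);
}
```

Because the check happens at the call site, the same calling code works whether or not LLVM was linked in; only the branch taken differs.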
With -Dllvm-static=true meson invokes llvm-config --link-static and links the LLVM component archives directly into the final binary, removing the runtime dependency on libLLVM.so. Expect a 50-80 MB binary size increase and slower link; useful when shipping to hosts that do not have a matching LLVM SONAME installed.
The project root contains the parser generator itself — three files inherited from Lemon/SQLite. The src/ directory contains the runtime extension framework, which is a separate library.
See docs/README.md for the full index. Key documents:
| Document | Description |
|---|---|
| docs/GETTING_STARTED.md | Build Lime, write your first grammar |
| docs/CONCEPTS.md | Snapshots, extensions, conflicts, JIT |
| docs/INTEGRATION.md | Embed Lime in your project |
| docs/EXAMPLES.md | All examples explained |
| docs/API.md | C API reference |
| docs/ARCHITECTURE.md | System design |
| docs/DIAGNOSTICS.md | Parser error messages and recovery |
| docs/EXTENSIONS.md | Writing runtime extensions |
| docs/ALGORITHM.md | LALR(1) theory and implementation |
| docs/PERFORMANCE.md | Performance tuning |
| docs/BENCHMARKS_VS_BISON.md | Head-to-head comparison with Bison |
| docs/COMPARISON.md | Comparison with Yacc, Bison, ANTLR |
Every example lives under examples/ and builds standalone (its own Makefile or meson.build). See docs/EXAMPLES.md for a longer walkthrough of each. Grouped quick reference:
| Example | What it shows |
|---|---|
| examples/calc/ | A four-operation calculator extended at runtime with shared-library plugins. The canonical "hello world" for Lime's extension framework. |
| examples/plugin_template/ | Minimal skeleton for packaging a Lime-generated parser as a runtime-loadable plugin (sql_plugin.c) and a host application that loads it via ParserManager (plugin_host.c). |
| examples/jsonb_extension.c | Single-file walkthrough of MOD_ADD_TOKEN + MOD_ADD_RULE + MOD_MODIFY_PRECEDENCE adding PostgreSQL-style JSONB operators (->, ->>, @>, <@, ?) to an existing SQL parser. |
| examples/llm_oracle/ | Custom disambiguation strategy that consults an LLM when Lime's built-in strategies decline to resolve a conflict. Illustrates the disambiguation callback API. |
| Example | What it shows |
|---|---|
| examples/datalog/ | Datalog / EDN parser with a hand-rolled tokenizer driving Lime's push parser. Demonstrates the "bring your own lexer" integration pattern. |
| examples/jsonpath/ | JSONPath parser converted from PostgreSQL's jsonpath_gram.y / jsonpath_scan.l. Self-contained; does not link against PostgreSQL. |
| examples/xpath/, examples/xquery/ | XPath 1.0 and XQuery parsers, each with a standalone driver that reads expressions from stdin or argv and prints the AST. |
| examples/mongodb/ | MongoDB query-document parser for expressions like { "age": { "$gt": 25 } }. |
These demonstrate Lime's ability to handle real production grammars by porting PostgreSQL subsystem parsers. They are demos of Lime, not dependencies on PostgreSQL – each is a self-contained standalone parser.
| Example | What it shows |
|---|---|
| examples/pg/ | Full PostgreSQL SQL grammar from gram.y (~21,000 lines in upstream) as a single Lime grammar. |
| examples/pg_modular/ | The same PostgreSQL grammar decomposed into 35+ literate modules under base/, ddl/, dml/, expr/, from_clause/, select_targets/, functions/, window/, cte/, transactions/, security/, utility/. Exercises Lime's module_name / require / import composition directives. |
| examples/bootstrap/ | PostgreSQL BKI (bootstrap) parser from bootparse.y + bootscanner.l – the small grammar used during initdb. |
| examples/pgbench/ | pgbench expression-language parser. |
| examples/replication/ | Streaming-replication protocol parser from repl_gram.y + repl_scanner.l (IDENTIFY_SYSTEM, START_REPLICATION, etc.). |
| examples/syncrep/ | Synchronous-replication config-string parser (synchronous_standby_names). |
| examples/isolation/ | Parser for the .spec files driving PostgreSQL's isolation test framework. |
| examples/lime_postgres/ | Integration notes specifically for embedding Lime inside PostgreSQL, including EXTENSION_AUTHORING.md, DIALECT_SUPPORT.md, and EMBEDDED_LANGUAGES.md. Documentation, not shipped code. |
| Example | What it shows |
|---|---|
| examples/literate/ | Two-file literate grammar (tokens.md + grammar.md) showing the module_name / require system driving a calculator. Companion reading: docs/LITERATE_FORMAT.md and docs/MODULE_FORMAT.md. |
Generate a parser from a grammar file:
Key flags: -d dir (output directory), -T template (custom template), -s (statistics), -L (lint), -F (format). See man lime or lime -x for the full list.
See docs/EXTENSIONS.md and examples/jsonb_extension.c for working examples.
JIT comparison benchmark (LLVM 21, aarch64-darwin):
| Grammar Size | Interpreted | JIT | Speedup |
|---|---|---|---|
| Small (64 states) | 62 ns | 24 ns | 2.59x |
| Medium (256 states) | 91 ns | 43 ns | 2.13x |
| Large (512 states) | 161 ns | 85 ns | 1.89x |
(Absolute numbers are lower than some published measurements because this is Apple Silicon; on x86_64 with AVX2 the speedup ratios tend to be larger. See docs/BENCHMARKS_VS_BISON.md for head-to-head comparison methodology.)
Extension overhead with no extensions loaded: 26 ns (a single atomic load). With extensions active: ~232 ns for token-level conflict detection, ~222 ns for priority disambiguation, ~456 ns for the full detect-resolve-execute pipeline. See docs/EXTENSION_PERFORMANCE.md and bench/BENCHMARK_RESULTS.md.
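The "single atomic load" fast path can be sketched with C11 atomics. This is an illustration of the general technique under stated assumptions, not Lime's implementation: the extension_set type, the active_extensions pointer, and the functions next_action, base_action, slow_path_action, and publish_extensions are all hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical record describing the currently loaded extensions. */
typedef struct extension_set {
    int n_extensions;
    int extra_action; /* toy payload so the slow path has a visible effect */
} extension_set;

/* NULL means "no extensions loaded". Static storage is zero-initialized,
 * which is a valid initial state for an atomic object. */
static _Atomic(extension_set *) active_extensions;

/* Base parser behaviour (toy stand-in for a table lookup). */
static int base_action(int token) { return token * 2; }

/* Slow path: where conflict detection and disambiguation would run. */
static int slow_path_action(const extension_set *ext, int token) {
    assert(ext != NULL);
    return base_action(token) + ext->extra_action;
}

/* Hot path: one acquire load decides whether extension logic runs. */
static int next_action(int token) {
    extension_set *ext =
        atomic_load_explicit(&active_extensions, memory_order_acquire);
    if (ext == NULL)
        return base_action(token);
    return slow_path_action(ext, token);
}

/* Loader side: publish (or clear, with NULL) the active extension set. */
static void publish_extensions(extension_set *ext) {
    atomic_store_explicit(&active_extensions, ext, memory_order_release);
}
```

The release store paired with the acquire load ensures that a parser thread which sees a non-NULL pointer also sees the fully initialized extension_set behind it. Safe reclamation of an old set after it is replaced is a separate problem that this sketch does not address.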
Sanitizer builds:
Build: GCC 13+ or Clang 15+, Meson 0.60+, Ninja, pkg-config.
Optional: LLVM 14-21 (JIT; verified on 14.0.6 and 21.1.8, expected to build on every release in between via the compat shim in include/jit_llvm_compat.h), lcov/gcovr (coverage), Valgrind, perf.
Runtime: pthreads, C11 standard library, and LLVM if JIT is enabled.
All provided by nix develop via flake.nix.
- tests/test_<name>.c, add to tests/meson.build
- ninja -C builddir && meson test -C builddir
- ./scripts/measure_coverage.sh
Lime is derived from the Lemon parser generator by D. Richard Hipp, originally developed as part of the SQLite project. The tokenize.c file in the project root is also from SQLite. Both Lemon and SQLite are released into the public domain.
We are grateful to Dr. Hipp and the SQLite team for creating and maintaining Lemon, and for their commitment to public domain software.
Public Domain