|
Lime Parser Generator 0.1.0
Runtime-extensible LALR(1) parser with SIMD tokenization and LLVM JIT
|
This document provides a comprehensive analysis of the performance impact of the extensible grammar framework in the Lime parser generator. It covers overhead breakdowns by component, memory costs, CPU costs per token, scaling behavior with increasing extension counts, JIT interaction, and tuning recommendations.
All measurements referenced in this document were collected on a Linux x86_64 system (GCC 14.3, C11, -O2) using the benchmark suites in bench/. Numbers are representative; your hardware will vary.
The extension framework adds four major subsystems to the parsing pipeline:
Each subsystem introduces overhead that is only incurred when extensions are actually loaded and when conflicts are actually encountered. The key design principle is zero overhead when no extensions are loaded: the parser operates identically to a standard Lemon/Lime-generated parser.
The ExtensionRegistry (src/extension_registry.c) manages extension metadata, dependency resolution, and conflict declarations.
| Operation | Complexity | Typical Cost | When Incurred |
|---|---|---|---|
extension_registry_create | O(1) | ~200 ns | Once at startup |
extension_registry_register | O(1) amortized | ~500 ns | Once per extension load |
extension_registry_find | O(1) | ~50 ns | Per conflict detection query |
extension_registry_check_dependencies | O(V + E) | ~2-50 us | Once after all extensions registered |
extension_registry_get_order | O(V + E) | ~5-60 us | Once to determine load order |
extension_registry_unregister | O(1) amortized | ~300 ns | On extension unload |
extension_registry_destroy | O(N) | ~100 ns per ext | Once at shutdown |
Implementation details:
Key insight: Registry operations are almost entirely startup costs. During parsing, the registry is read-only (protected by pthread_rwlock read lock), so multiple parser threads proceed concurrently with no contention.
The conflict_detector module (src/conflict_detector.c) identifies ambiguities at three levels: token, rule, and semantic.
| Detection Phase | Complexity | Typical Cost | Notes |
|---|---|---|---|
| Token conflicts | O(N * M) | ~5-50 us | N=extensions, M=tokens/ext |
| Rule conflicts | O(N * M * T) | ~10-100 us | T=distinct token codes |
| Semantic conflicts | O(N^2 * R^2) | ~20-500 us | R=rules/ext; pairwise compare |
| Full scan | Sum of above | ~50-650 us | Called once after load |
Token-level detection uses a TokenCollector with linear scan for grouping. For typical extension counts (< 20), this is fast despite O(N*M) complexity.
Rule-level detection requires scanning all loaded extensions for each distinct token code, checking whether any extension defines rules that reference the token. Cost grows linearly with the number of distinct tokens across all extensions.
Semantic-level detection is the most expensive: it performs pairwise comparison of all MOD_ADD_RULE modifications across loaded extensions, checking for identical LHS+RHS with different action code. This is O(N^2 * R^2) in the worst case but is bounded by the FORK_RESOLVE_MAX_FORKS limit (16) in practice.
Key insight: Conflict detection is a one-time cost incurred at extension load time. It does not run on the per-token parsing hot path.
The DisambiguationContext (src/disambiguation.c) dispatches conflict resolution to a pluggable strategy via vtable.
| Operation | Cost | Notes |
|---|---|---|
disambiguation_create | ~1-5 us | Creates strategy context |
disambiguation_resolve | ~100-5000 ns | Depends on strategy (see below) |
disambiguation_update | ~10-50 ns | No-op for priority/fork-resolve |
disambiguation_destroy | ~50-200 ns | Frees strategy context |
The framework overhead is minimal: it is essentially a vtable dispatch plus a malloc for the StrategyResult.
The ParseFork system (src/parser_fork.c) provides deep-copy of parser state for the fork-resolve disambiguation strategy.
| Operation | Typical Cost | Notes |
|---|---|---|
fork_parser (full clone) | ~1-10 us | Depends on stack depth |
clone_parser_state (core) | ~0.5-5 us | malloc + memcpy of parser+stack |
feed_token_to_fork | ~20-50 ns | Per token per fork |
parse_fork_set_create | ~100 ns | Allocates fork pointer array |
parse_fork_set_prune | ~50-200 ns | Linear scan + free |
parse_fork_set_best | ~50-100 ns | Linear scan |
free_parse_fork | ~200-500 ns | Frees parser state + stack |
parse_fork_set_destroy | ~1-5 us | Frees all forks |
parser_fork_next_id | ~5-10 ns | Atomic fetch-add (relaxed) |
Cloning cost breakdown for a typical parser (2 KB parser struct, 4 KB stack):
| Step | Bytes Copied | Estimated Cost |
|---|---|---|
| malloc parser struct | - | ~50 ns |
| memcpy parser struct | ~2 KB | ~100 ns |
| malloc stack buffer | - | ~50 ns |
| memcpy stack (used portion) | ~1-4 KB | ~100-400 ns |
| Pointer fixup (3 writes) | 24 B | ~10 ns |
| snapshot_acquire (atomic) | - | ~35 ns |
| Total | **~350-650 ns** |
The clone always produces a heap-allocated stack regardless of whether the source parser uses the inline yystk0[] buffer. This simplifies lifecycle management at the cost of an extra malloc for parsers that are still using the inline stack.
Each registered extension consumes memory in the registry:
| Component | Size | Notes |
|---|---|---|
RegistryEntry struct | ~200 B | Includes deep-copied metadata |
| Name string (strdup) | ~10-50 B | Extension name |
| Version string (strdup) | ~6-20 B | Semver string |
requires array (strdup each) | ~50-200 B | NULL-terminated string array |
conflicts_with array (strdup) | ~50-200 B | NULL-terminated string array |
| Hash table entry | ~24 B | Key pointer + index + bool |
| Typical total per extension | **~350-700 B** |
Each parser fork during disambiguation consumes:
| Component | Size | Notes |
|---|---|---|
ParseFork struct | ~128 B | Status, priority, counters |
Cloned parser state (state_data) | ~2-8 KB | Grammar-dependent |
Cloned stack (stack_data) | ~2-32 KB | Depth-dependent |
| Snapshot reference | 0 B (shared) | Only refcount increment |
| Typical total per fork | **~4-40 KB** |
The fork-resolve strategy creates up to FORK_RESOLVE_MAX_FORKS (16) forks per conflict point, so peak memory for a single conflict resolution is approximately:
This memory is freed immediately after the conflict is resolved.
For a deployment with N extensions and F concurrent fork sets:
| Component | Formula | Example (10 ext, 1 fork set) |
|---|---|---|
| Extension registry | ~700 B * N + ~4 KB base | ~11 KB |
| Conflict detection result | ~200 B * conflicts | ~2 KB |
| Disambiguation context | ~100 B per strategy | ~100 B |
| Fork set (peak) | ~40 KB * 16 * F | ~640 KB |
| Snapshot (action tables) | ~150-300 KB per snapshot | ~300 KB |
| JIT context (if compiled) | ~50-500 KB per snapshot | ~200 KB |
| Total extension overhead | **~1.15 MB** |
Compare this to the base parser memory (no extensions): ~500 KB - 1 MB for a typical deployment with 1 snapshot, 1 JIT context, and 4 parser threads (see docs/PERFORMANCE.md). Extensions roughly double the memory footprint at peak, but the fork memory is transient.
When no extensions are loaded, the parser follows the standard table-driven (or JIT-compiled) path with zero extension overhead:
| Grammar Size | Interpreted (ns/token) | JIT (ns/token) |
|---|---|---|
| Small (64 states, 32 terminals) | ~8-12 | ~3-5 |
| Medium (256 states, 100 terminals) | ~12-18 | ~4-6 |
| Large (512 states, 150 terminals) | ~15-25 | ~4-7 |
These numbers are derived from the jit_comparison benchmark:
When extensions are loaded but no conflicts arise during parsing, the overhead comes from:
Estimated per-token overhead: < 1 ns (effectively zero).
When a token triggers a conflict between multiple extensions, the disambiguation framework is invoked. Per-token overhead depends on the strategy:
| Strategy | Per-conflict Cost | Amortized Per-token | Notes |
|---|---|---|---|
| Priority | ~100-300 ns | ~2-6 ns | Linear scan of contexts |
| Fork-resolve | ~2-50 us | ~40-1000 ns | Creates and evaluates forks |
The "amortized per-token" assumes conflicts occur on ~5% of tokens in a typical extension-heavy workload.
The performance impact varies significantly by conflict type:
Cause: Multiple extensions define the same lexeme as different tokens.
| Metric | Value | Notes |
|---|---|---|
| Detection cost | ~5-50 us | Linear scan of all extension tokens |
| Resolution cost | ~100-300 ns | Priority comparison |
| Memory cost | ~200 B | ConflictPoint + LimeContexts |
| Frequency | Low | Caught at load time |
Token conflicts are detected and resolved at extension load time, not during parsing. They have no per-token runtime cost.
Cause: Multiple grammars can parse the same token sequence.
| Metric | Value | Notes |
|---|---|---|
| Detection cost | ~10-100 us | Per distinct token code |
| Resolution cost | ~200 ns - 50 us | Depends on strategy |
| Memory cost | ~500 B | ConflictPoint + contexts |
| Frequency | Medium | May occur at parse time |
Rule conflicts may be detected statically (at load time) or dynamically (when the parser encounters an ambiguous state). Dynamic detection adds per-token overhead only for the specific states where conflicts exist.
Cause: Identical rule structure but different reduction actions.
| Metric | Value | Notes |
|---|---|---|
| Detection cost | ~20-500 us | Pairwise O(N^2 * R^2) comparison |
| Resolution cost | ~5-50 us | Usually requires fork-resolve |
| Memory cost | ~1 KB | ConflictPoint + fork set |
| Frequency | Low | Detected at load time |
Semantic conflicts are the most expensive to detect but also the rarest. The pairwise comparison algorithm (detect_semantic_conflicts) examines every rule pair across every extension pair. For N=10 extensions with R=20 rules each, this is 10*9/2 * 20*20 = 18,000 comparisons, which completes in ~200-500 us.
The priority strategy (src/strategy_priority.c) resolves conflicts by comparing numeric priority values.
| Metric | Value | Notes |
|---|---|---|
| Init cost | ~500 ns | Builds priority table |
| Resolve cost | ~100-300 ns | Linear scan of contexts |
| Update cost | ~10 ns | No-op (static strategy) |
| Destroy cost | ~50 ns | Frees priority table |
| Memory per resolution | ~100 B | StrategyResult + explanation |
| Confidence | Always 1.0 | Deterministic |
The priority strategy's resolve function performs a linear scan of the ConflictPoint.contexts array (typically 2-8 entries) comparing priority values. The inner loop has no allocations and no hash lookups.
Complexity: O(C) where C is the number of conflicting contexts.
The fork-resolve strategy (src/strategy_fork_resolve.c) creates parser forks for each candidate interpretation and evaluates them.
| Metric | Value | Notes |
|---|---|---|
| Init cost | ~1-2 us | Builds extension priority map |
| Resolve cost (static) | ~2-10 us | No parser state to clone |
| Resolve cost (runtime) | ~5-50 us | Clones parser state per fork |
| Update cost | ~10 ns | No-op (currently) |
| Destroy cost | ~100 ns | Frees context + entries |
| Max forks per conflict | 16 | FORK_RESOLVE_MAX_FORKS |
| Default lookahead | 3 tokens | FORK_RESOLVE_DEFAULT_LOOKAHEAD |
| Max lookahead | 128 tokens | FORK_RESOLVE_MAX_LOOKAHEAD |
Static mode (no live parser): The strategy creates lightweight ParseFork structs (just calloc + metadata) and simulates token feeding. Each fork is ~128 bytes. This mode is used during pre-parse conflict analysis.
Runtime mode (live parser available): The strategy calls fork_parser() to deep-clone the parser state for each candidate. Cloning cost is dominated by the memcpy of the parser struct and stack. For a typical parser with a 4 KB stack at depth 20:
Tiebreaker overhead is negligible. All three rules (priority, longest-match, first-complete) are O(F) scans where F is the number of completed forks:
| Tiebreaker Rule | Cost (16 forks) | Notes |
|---|---|---|
| TIEBREAK_PRIORITY | ~30-50 ns | Compare priority + error count |
| TIEBREAK_LONGEST_MATCH | ~30-50 ns | Compare tokens_consumed |
| TIEBREAK_FIRST_COMPLETE | ~20-40 ns | Compare fork_id |
| Aspect | Priority | Fork-Resolve |
|---|---|---|
| Resolution cost | ~100-300 ns | ~2-50 us |
| Accuracy | Low (static) | High (dynamic) |
| Memory per resolve | ~100 B | ~4-640 KB |
| Deterministic | Yes | Yes |
| Needs parser state | No | Optional |
| Learns over time | No | No (future) |
| Best for | Known priorities | Unknown/dynamic conflicts |
Loading an extension creates a new ParserSnapshot with modified action tables. The JIT context is bound to a specific snapshot, so:
This means there is a warmup period after each extension load during which parsing runs on the interpreted path.
The JIT break-even point depends on the compilation cost and the per-parse savings. Loading extensions affects both:
| Scenario | Compile Cost | Per-Parse Savings | Break-Even |
|---|---|---|---|
| Baseline (no extensions) | ~10-50 ms | ~1-5 us | ~5K-50K parses |
| After 1 extension load | ~10-50 ms | ~1-5 us | ~5K-50K parses |
| After 5 extension loads | ~15-60 ms | ~1-5 us | ~6K-60K parses |
| Frequent extension loads | Repeated | Reduced window | May not break even |
Key insight: If extensions are loaded/unloaded frequently, the JIT never reaches break-even because each load resets the warmup period. For best JIT utilization:
Extension loading increases the effective grammar size (more states, more terminals). JIT compilation cost scales with grammar complexity:
| Grammar Config | States | Terminals | Compile Time | Code Size |
|---|---|---|---|---|
| Base grammar | 64 | 32 | ~5-10 ms | ~6-12 KB |
| + 1 small extension | 80 | 40 | ~7-14 ms | ~8-16 KB |
| + 5 extensions | 150 | 80 | ~15-30 ms | ~15-30 KB |
| + 10 extensions | 256 | 100 | ~25-50 ms | ~25-50 KB |
| Full SQL + 10 ext | 512 | 150 | ~40-80 ms | ~50-100 KB |
Compilation cost is approximately linear in nstate * nterminal because the JIT generates one function per state, each with a switch over terminal values.
| Metric | Value | Delta from Baseline |
|---|---|---|
| Registry overhead | ~700 B | +700 B |
| Snapshot size | ~300 KB | ~0% (tables merged) |
| Conflict detection | ~5 us | One-time |
| Per-token overhead (no conflict) | < 1 ns | ~0% |
| JIT recompilation | ~10-50 ms | One-time |
| Metric | Value | Delta from Baseline |
|---|---|---|
| Registry overhead | ~3.5 KB | +3.5 KB |
| Snapshot size | ~310-350 KB | +3-15% |
| Conflict detection (full) | ~50-200 us | One-time |
| Dependency resolution | ~5-15 us | One-time |
| Potential token conflicts | 0-3 | Depends on overlap |
| Per-token overhead (no conflict) | < 1 ns | ~0% |
| Per-conflict resolution | ~200 ns - 10 us | Strategy-dependent |
| Metric | Value | Delta from Baseline |
|---|---|---|
| Registry overhead | ~7 KB | +7 KB |
| Snapshot size | ~320-400 KB | +7-30% |
| Conflict detection (full) | ~100-500 us | One-time |
| Dependency resolution | ~10-40 us | One-time |
| Potential token conflicts | 0-10 | Higher overlap |
| Semantic conflict detection | ~100-300 us | O(N^2) pairwise |
| Per-token overhead (no conflict) | < 1 ns | ~0% |
| Per-conflict resolution | ~200 ns - 50 us | Strategy-dependent |
| Metric | Value | Delta from Baseline |
|---|---|---|
| Registry overhead | ~35 KB | +35 KB |
| Snapshot size | ~400-600 KB | +30-100% |
| Conflict detection (full) | ~1-5 ms | One-time |
| Dependency resolution | ~50-200 us | One-time |
| Semantic conflict detection | ~2-10 ms | O(N^2) dominates |
| Hash table (registry) | ~128 buckets | Auto-resized |
| Per-token overhead (no conflict) | < 2 ns | ~0% |
| Per-conflict resolution | ~500 ns - 100 us | More contexts |
At 50 extensions, the dominant costs are:
| Extensions | Registry | Conflict Detect | Semantic Detect | Snapshot | Per-token |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | ~296 KB | 0 |
| 1 | ~700 B | ~5 us | N/A | ~300 KB | < 1 ns |
| 5 | ~3.5 KB | ~50-200 us | ~20-50 us | ~330 KB | < 1 ns |
| 10 | ~7 KB | ~100-500 us | ~100-300 us | ~370 KB | < 1 ns |
| 50 | ~35 KB | ~1-5 ms | ~2-10 ms | ~500 KB | < 2 ns |
Observation: Extension overhead scales sub-linearly for most operations. The exception is semantic conflict detection, which is O(N^2) and should be profiled when using > 20 extensions.
The fork-resolve strategy supports three tiebreaker rules. Their performance is nearly identical because all are simple linear scans:
| Rule | Algorithm | Cost (16 forks) |
|---|---|---|
TIEBREAK_PRIORITY | Scan for min priority, sub-break by error count then tokens | ~40 ns |
TIEBREAK_LONGEST_MATCH | Scan for max tokens_consumed, sub-break by priority | ~40 ns |
TIEBREAK_FIRST_COMPLETE | Scan for min fork_id | ~30 ns |
The tiebreaker is invoked once per conflict resolution after fork evaluation completes. It operates on the completed-forks subset (typically 1-4 forks survive out of the initial set).
extension_registry_get_order() to determine the correct load order. Loading in dependency order avoids redundant conflict re-evaluation.STRAT_PRIORITY. This is 100-1000x faster than fork-resolve.FORK_RESOLVE_MAX_FORKS of 16 is reasonable for most workloads. Reducing it to 4-8 significantly reduces peak memory and resolution time with minimal accuracy loss.conflicts_with declarations. Declaring known conflicts in extension metadata allows the registry to reject incompatible combinations at registration time, before expensive conflict detection runs.conflict_threshold appropriately. Extensions with conflict_threshold = 0.0 reject any conflict; 1.0 tolerates all. Tuning this per-extension avoids unnecessary fork-resolve invocations.Extension overhead can be measured by comparing baseline benchmark results (no extensions) against results with extensions loaded:
| Metric | How to Measure |
|---|---|
| Per-token parse time | ns_per_op / tokens_per_parse |
| Conflict detection time | Instrument detect_all_multi_grammar_conflicts() |
| Fork-resolve time | Instrument fork_resolve_resolve() |
| Snapshot clone time | Instrument clone_parser_state() |
| JIT break-even | Compare total interpreted vs JIT time |
Approximate sizes of key structures (x86_64, 64-bit pointers):
| Structure | Size | Notes |
|---|---|---|
ExtensionRegistry | ~32 B | Plus dynamic entries + hash |
RegistryEntry | ~200 B | Includes GrammarExtensionMetadata |
GrammarExtensionMetadata | ~120 B | Pointers to strings + arrays |
HashEntry | ~24 B | Key pointer + index + bool |
ConflictPoint | ~48 B | Plus dynamic contexts array |
LimeContext | ~32 B | Token, state, ext_id, priority |
MultiGrammarConflictResult | ~32 B | Plus dynamic points array |
DisambiguationContext | ~200 B | vtable + strategy context |
StrategyResult | ~32 B | Plus malloc'd arrays |
ForkResolveContext | ~80 B | Plus priority entries |
PriorityContext | ~24 B | Plus priority entries |
ParseFork | ~128 B | Plus cloned state + stack |
ClonedParserState | ~48 B | Plus malloc'd state + stack |
ParseForkSet | ~24 B | Plus fork pointer array |
ParserSnapshot | ~128 B | Plus action tables |
Last updated: 2026-04-29 Reference implementation: src/extension_registry.c, src/strategy_fork_resolve.c, src/parser_fork.c, src/conflict_detector.c, src/disambiguation.c, src/strategy_priority.c