Lime Parser Generator 0.1.0
Runtime-extensible LALR(1) parser with SIMD tokenization and LLVM JIT
Loading...
Searching...
No Matches
Extension Framework Performance Characteristics

This document provides a comprehensive analysis of the performance impact of the extensible grammar framework in the Lime parser generator. It covers overhead breakdowns by component, memory costs, CPU costs per token, scaling behavior with increasing extension counts, JIT interaction, and tuning recommendations.

All measurements referenced in this document were collected on a Linux x86_64 system (GCC 14.3, C11, -O2) using the benchmark suites in bench/. Numbers are representative; your hardware will vary.


Table of Contents

  1. Architecture Overview
  2. Component Overhead Breakdown
  3. Memory Overhead
  4. CPU Cost Per Token
  5. Conflict Type Comparison
  6. Disambiguation Strategy Performance
  7. JIT Interaction with Extensions
  8. Scaling Behavior
  9. Tiebreaker Rule Overhead
  10. Performance Tuning Recommendations
  11. Benchmark Reproduction

Architecture Overview

The extension framework adds four major subsystems to the parsing pipeline:

Grammar File (.y)
|
v
+------------------+ +---------------------+
| Extension | ----> | Conflict |
| Registry | | Detector |
+------------------+ +---------------------+
| |
v v
+------------------+ +---------------------+
| Snapshot | <---- | Disambiguation |
| (Action Tables) | | Framework |
+------------------+ +---------------------+
| |
v v
+------------------+ +---------------------+
| Parser Engine | | Parser Forking |
| (Table-Driven | | (State Cloning) |
| or JIT) | +---------------------+
+------------------+

Each subsystem introduces overhead that is only incurred when extensions are actually loaded and when conflicts are actually encountered. The key design principle is zero overhead when no extensions are loaded: the parser operates identically to a standard Lemon/Lime-generated parser.


Component Overhead Breakdown

Extension Registry

The ExtensionRegistry (src/extension_registry.c) manages extension metadata, dependency resolution, and conflict declarations.

Operation Complexity Typical Cost When Incurred
extension_registry_create O(1) ~200 ns Once at startup
extension_registry_register O(1) amortized ~500 ns Once per extension load
extension_registry_find O(1) ~50 ns Per conflict detection query
extension_registry_check_dependencies O(V + E) ~2-50 us Once after all extensions registered
extension_registry_get_order O(V + E) ~5-60 us Once to determine load order
extension_registry_unregister O(1) amortized ~300 ns On extension unload
extension_registry_destroy O(N) ~100 ns per ext Once at shutdown

Implementation details:

  • Hash table: FNV-1a hashing with open addressing and linear probing. Load factor capped at 75%, rehash doubles the table. Lookups are O(1) expected, O(N) worst case (pathological hash collisions).
  • Dependency resolution: Kahn's algorithm (topological sort) with adjacency-list representation. Cost is O(V + E) where V = number of extensions and E = total number of dependency edges. For typical workloads (5-20 extensions, 1-3 deps each), this completes in microseconds.
  • String copies: All metadata strings are deep-copied on register. This means registration has a higher constant cost than a simple pointer store, but avoids lifetime management issues.

Key insight: Registry operations are almost entirely startup costs. During parsing, the registry is read-only (protected by pthread_rwlock read lock), so multiple parser threads proceed concurrently with no contention.

Conflict Detection

The conflict_detector module (src/conflict_detector.c) identifies ambiguities at three levels: token, rule, and semantic.

Detection Phase Complexity Typical Cost Notes
Token conflicts O(N * M) ~5-50 us N=extensions, M=tokens/ext
Rule conflicts O(N * M * T) ~10-100 us T=distinct token codes
Semantic conflicts O(N^2 * R^2) ~20-500 us R=rules/ext; pairwise compare
Full scan Sum of above ~50-650 us Called once after load

Token-level detection uses a TokenCollector with linear scan for grouping. For typical extension counts (< 20), this is fast despite O(N*M) complexity.

Rule-level detection requires scanning all loaded extensions for each distinct token code, checking whether any extension defines rules that reference the token. Cost grows linearly with the number of distinct tokens across all extensions.

Semantic-level detection is the most expensive: it performs pairwise comparison of all MOD_ADD_RULE modifications across loaded extensions, checking for identical LHS+RHS with different action code. This is O(N^2 * R^2) in the worst case but is bounded by the FORK_RESOLVE_MAX_FORKS limit (16) in practice.

Key insight: Conflict detection is a one-time cost incurred at extension load time. It does not run on the per-token parsing hot path.

Disambiguation Framework

The DisambiguationContext (src/disambiguation.c) dispatches conflict resolution to a pluggable strategy via vtable.

Operation Cost Notes
disambiguation_create ~1-5 us Creates strategy context
disambiguation_resolve ~100-5000 ns Depends on strategy (see below)
disambiguation_update ~10-50 ns No-op for priority/fork-resolve
disambiguation_destroy ~50-200 ns Frees strategy context

The framework overhead is minimal: it is essentially a vtable dispatch plus a malloc for the StrategyResult.

Parser Forking

The ParseFork system (src/parser_fork.c) provides deep-copy of parser state for the fork-resolve disambiguation strategy.

Operation Typical Cost Notes
fork_parser (full clone) ~1-10 us Depends on stack depth
clone_parser_state (core) ~0.5-5 us malloc + memcpy of parser+stack
feed_token_to_fork ~20-50 ns Per token per fork
parse_fork_set_create ~100 ns Allocates fork pointer array
parse_fork_set_prune ~50-200 ns Linear scan + free
parse_fork_set_best ~50-100 ns Linear scan
free_parse_fork ~200-500 ns Frees parser state + stack
parse_fork_set_destroy ~1-5 us Frees all forks
parser_fork_next_id ~5-10 ns Atomic fetch-add (relaxed)

Cloning cost breakdown for a typical parser (2 KB parser struct, 4 KB stack):

Step Bytes Copied Estimated Cost
malloc parser struct - ~50 ns
memcpy parser struct ~2 KB ~100 ns
malloc stack buffer - ~50 ns
memcpy stack (used portion) ~1-4 KB ~100-400 ns
Pointer fixup (3 writes) 24 B ~10 ns
snapshot_acquire (atomic) - ~35 ns
Total **~350-650 ns**

The clone always produces a heap-allocated stack regardless of whether the source parser uses the inline yystk0[] buffer. This simplifies lifecycle management at the cost of an extra malloc for parsers that are still using the inline stack.


Memory Overhead

Per-Extension Memory

Each registered extension consumes memory in the registry:

Component Size Notes
RegistryEntry struct ~200 B Includes deep-copied metadata
Name string (strdup) ~10-50 B Extension name
Version string (strdup) ~6-20 B Semver string
requires array (strdup each) ~50-200 B NULL-terminated string array
conflicts_with array (strdup) ~50-200 B NULL-terminated string array
Hash table entry ~24 B Key pointer + index + bool
Typical total per extension **~350-700 B**

Per-Fork Memory

Each parser fork during disambiguation consumes:

Component Size Notes
ParseFork struct ~128 B Status, priority, counters
Cloned parser state (state_data) ~2-8 KB Grammar-dependent
Cloned stack (stack_data) ~2-32 KB Depth-dependent
Snapshot reference 0 B (shared) Only refcount increment
Typical total per fork **~4-40 KB**

The fork-resolve strategy creates up to FORK_RESOLVE_MAX_FORKS (16) forks per conflict point, so peak memory for a single conflict resolution is approximately:

16 forks * 40 KB/fork = ~640 KB peak

This memory is freed immediately after the conflict is resolved.

Aggregate Memory Budget

For a deployment with N extensions and F concurrent fork sets:

Component Formula Example (10 ext, 1 fork set)
Extension registry ~700 B * N + ~4 KB base ~11 KB
Conflict detection result ~200 B * conflicts ~2 KB
Disambiguation context ~100 B per strategy ~100 B
Fork set (peak) ~40 KB * 16 * F ~640 KB
Snapshot (action tables) ~150-300 KB per snapshot ~300 KB
JIT context (if compiled) ~50-500 KB per snapshot ~200 KB
Total extension overhead **~1.15 MB**

Compare this to the base parser memory (no extensions): ~500 KB - 1 MB for a typical deployment with 1 snapshot, 1 JIT context, and 4 parser threads (see docs/PERFORMANCE.md). Extensions roughly double the memory footprint at peak, but the fork memory is transient.


CPU Cost Per Token

Baseline (No Extensions)

When no extensions are loaded, the parser follows the standard table-driven (or JIT-compiled) path with zero extension overhead:

Grammar Size Interpreted (ns/token) JIT (ns/token)
Small (64 states, 32 terminals) ~8-12 ~3-5
Medium (256 states, 100 terminals) ~12-18 ~4-6
Large (512 states, 150 terminals) ~15-25 ~4-7

These numbers are derived from the jit_comparison benchmark:

  • Small grammar: 424 ns / 50 tokens = ~8.5 ns/token (interpreted)
  • Medium grammar: 1,244 ns / 100 tokens = ~12.4 ns/token (interpreted)
  • Large grammar: 2,890 ns / 200 tokens = ~14.5 ns/token (interpreted)

With Extensions (No Conflicts)

When extensions are loaded but no conflicts arise during parsing, the overhead comes from:

  1. Snapshot check: The parser uses the current snapshot's action tables. If an extension was loaded, the parser uses a new snapshot with the merged tables. Action table lookups are identical in cost because the table format is unchanged.
  2. Token table lookup: Extension tokens are added to the hash table. Cost remains O(1) per lookup. Adding 10 extension tokens to a 36-keyword table increases average lookup time by < 5%.

Estimated per-token overhead: < 1 ns (effectively zero).

With Extensions (Active Conflicts)

When a token triggers a conflict between multiple extensions, the disambiguation framework is invoked. Per-token overhead depends on the strategy:

Strategy Per-conflict Cost Amortized Per-token Notes
Priority ~100-300 ns ~2-6 ns Linear scan of contexts
Fork-resolve ~2-50 us ~40-1000 ns Creates and evaluates forks

The "amortized per-token" assumes conflicts occur on ~5% of tokens in a typical extension-heavy workload.


Conflict Type Comparison

The performance impact varies significantly by conflict type:

Token-Level Conflicts

Cause: Multiple extensions define the same lexeme as different tokens.

Metric Value Notes
Detection cost ~5-50 us Linear scan of all extension tokens
Resolution cost ~100-300 ns Priority comparison
Memory cost ~200 B ConflictPoint + LimeContexts
Frequency Low Caught at load time

Token conflicts are detected and resolved at extension load time, not during parsing. They have no per-token runtime cost.

Rule-Level Conflicts

Cause: Multiple grammars can parse the same token sequence.

Metric Value Notes
Detection cost ~10-100 us Per distinct token code
Resolution cost ~200 ns - 50 us Depends on strategy
Memory cost ~500 B ConflictPoint + contexts
Frequency Medium May occur at parse time

Rule conflicts may be detected statically (at load time) or dynamically (when the parser encounters an ambiguous state). Dynamic detection adds per-token overhead only for the specific states where conflicts exist.

Semantic-Level Conflicts

Cause: Identical rule structure but different reduction actions.

Metric Value Notes
Detection cost ~20-500 us Pairwise O(N^2 * R^2) comparison
Resolution cost ~5-50 us Usually requires fork-resolve
Memory cost ~1 KB ConflictPoint + fork set
Frequency Low Detected at load time

Semantic conflicts are the most expensive to detect but also the rarest. The pairwise comparison algorithm (detect_semantic_conflicts) examines every rule pair across every extension pair. For N=10 extensions with R=20 rules each, this is 10*9/2 * 20*20 = 18,000 comparisons, which completes in ~200-500 us.


Disambiguation Strategy Performance

Priority Strategy

The priority strategy (src/strategy_priority.c) resolves conflicts by comparing numeric priority values.

Metric Value Notes
Init cost ~500 ns Builds priority table
Resolve cost ~100-300 ns Linear scan of contexts
Update cost ~10 ns No-op (static strategy)
Destroy cost ~50 ns Frees priority table
Memory per resolution ~100 B StrategyResult + explanation
Confidence Always 1.0 Deterministic

The priority strategy's resolve function performs a linear scan of the ConflictPoint.contexts array (typically 2-8 entries) comparing priority values. The inner loop has no allocations and no hash lookups.

Complexity: O(C) where C is the number of conflicting contexts.

Fork-Resolve Strategy

The fork-resolve strategy (src/strategy_fork_resolve.c) creates parser forks for each candidate interpretation and evaluates them.

Metric Value Notes
Init cost ~1-2 us Builds extension priority map
Resolve cost (static) ~2-10 us No parser state to clone
Resolve cost (runtime) ~5-50 us Clones parser state per fork
Update cost ~10 ns No-op (currently)
Destroy cost ~100 ns Frees context + entries
Max forks per conflict 16 FORK_RESOLVE_MAX_FORKS
Default lookahead 3 tokens FORK_RESOLVE_DEFAULT_LOOKAHEAD
Max lookahead 128 tokens FORK_RESOLVE_MAX_LOOKAHEAD

Static mode (no live parser): The strategy creates lightweight ParseFork structs (just calloc + metadata) and simulates token feeding. Each fork is ~128 bytes. This mode is used during pre-parse conflict analysis.

Runtime mode (live parser available): The strategy calls fork_parser() to deep-clone the parser state for each candidate. Cloning cost is dominated by the memcpy of the parser struct and stack. For a typical parser with a 4 KB stack at depth 20:

Per-fork clone cost: ~650 ns
Token feeding (3 tokens): ~60-150 ns
Evaluation: ~100 ns
Total per-fork: ~810-900 ns
For 4 competing forks: ~3.2-3.6 us
For 16 forks (max): ~13-14.5 us

Tiebreaker overhead is negligible. All three rules (priority, longest-match, first-complete) are O(F) scans where F is the number of completed forks:

Tiebreaker Rule Cost (16 forks) Notes
TIEBREAK_PRIORITY ~30-50 ns Compare priority + error count
TIEBREAK_LONGEST_MATCH ~30-50 ns Compare tokens_consumed
TIEBREAK_FIRST_COMPLETE ~20-40 ns Compare fork_id

Strategy Comparison

Aspect Priority Fork-Resolve
Resolution cost ~100-300 ns ~2-50 us
Accuracy Low (static) High (dynamic)
Memory per resolve ~100 B ~4-640 KB
Deterministic Yes Yes
Needs parser state No Optional
Learns over time No No (future)
Best for Known priorities Unknown/dynamic conflicts

JIT Interaction with Extensions

JIT Invalidation on Extension Load

Loading an extension creates a new ParserSnapshot with modified action tables. The JIT context is bound to a specific snapshot, so:

  1. The old snapshot's JIT code remains valid for parsers still using it.
  2. The new snapshot starts with no JIT code.
  3. The JIT policy system begins accumulating metrics for the new snapshot.
  4. After the policy threshold is met, JIT compilation triggers for the new snapshot.

This means there is a warmup period after each extension load during which parsing runs on the interpreted path.

Break-Even Analysis with Extensions

The JIT break-even point depends on the compilation cost and the per-parse savings. Loading extensions affects both:

Scenario Compile Cost Per-Parse Savings Break-Even
Baseline (no extensions) ~10-50 ms ~1-5 us ~5K-50K parses
After 1 extension load ~10-50 ms ~1-5 us ~5K-50K parses
After 5 extension loads ~15-60 ms ~1-5 us ~6K-60K parses
Frequent extension loads Repeated Reduced window May not break even

Key insight: If extensions are loaded/unloaded frequently, the JIT never reaches break-even because each load resets the warmup period. For best JIT utilization:

  • Load all extensions at startup.
  • Use synchronous JIT compilation after the final extension load.
  • Avoid dynamic extension loading in latency-sensitive paths.

JIT Compilation Cost by Grammar Size

Extension loading increases the effective grammar size (more states, more terminals). JIT compilation cost scales with grammar complexity:

Grammar Config States Terminals Compile Time Code Size
Base grammar 64 32 ~5-10 ms ~6-12 KB
+ 1 small extension 80 40 ~7-14 ms ~8-16 KB
+ 5 extensions 150 80 ~15-30 ms ~15-30 KB
+ 10 extensions 256 100 ~25-50 ms ~25-50 KB
Full SQL + 10 ext 512 150 ~40-80 ms ~50-100 KB

Compilation cost is approximately linear in nstate * nterminal because the JIT generates one function per state, each with a switch over terminal values.


Scaling Behavior

1 Extension

Metric Value Delta from Baseline
Registry overhead ~700 B +700 B
Snapshot size ~300 KB ~0% (tables merged)
Conflict detection ~5 us One-time
Per-token overhead (no conflict) < 1 ns ~0%
JIT recompilation ~10-50 ms One-time

5 Extensions

Metric Value Delta from Baseline
Registry overhead ~3.5 KB +3.5 KB
Snapshot size ~310-350 KB +3-15%
Conflict detection (full) ~50-200 us One-time
Dependency resolution ~5-15 us One-time
Potential token conflicts 0-3 Depends on overlap
Per-token overhead (no conflict) < 1 ns ~0%
Per-conflict resolution ~200 ns - 10 us Strategy-dependent

10 Extensions

Metric Value Delta from Baseline
Registry overhead ~7 KB +7 KB
Snapshot size ~320-400 KB +7-30%
Conflict detection (full) ~100-500 us One-time
Dependency resolution ~10-40 us One-time
Potential token conflicts 0-10 Higher overlap
Semantic conflict detection ~100-300 us O(N^2) pairwise
Per-token overhead (no conflict) < 1 ns ~0%
Per-conflict resolution ~200 ns - 50 us Strategy-dependent

50 Extensions

Metric Value Delta from Baseline
Registry overhead ~35 KB +35 KB
Snapshot size ~400-600 KB +30-100%
Conflict detection (full) ~1-5 ms One-time
Dependency resolution ~50-200 us One-time
Semantic conflict detection ~2-10 ms O(N^2) dominates
Hash table (registry) ~128 buckets Auto-resized
Per-token overhead (no conflict) < 2 ns ~0%
Per-conflict resolution ~500 ns - 100 us More contexts

At 50 extensions, the dominant costs are:

  1. Semantic conflict detection: O(N^2) pairwise comparison becomes significant. With 50 extensions, there are 1,225 pairs to compare. If each extension has 20 rules, this is up to 490,000 rule-pair comparisons.
  2. Fork-resolve with many contexts: A conflict involving 16+ extensions requires the maximum 16 forks, each cloning the parser state.
  3. Hash table resizing: The registry hash table auto-resizes to maintain < 75% load factor. At 50 extensions, the table has at least 128 buckets (1.5 KB).

Scaling Summary

Extensions Registry Conflict Detect Semantic Detect Snapshot Per-token
0 0 0 0 ~296 KB 0
1 ~700 B ~5 us N/A ~300 KB < 1 ns
5 ~3.5 KB ~50-200 us ~20-50 us ~330 KB < 1 ns
10 ~7 KB ~100-500 us ~100-300 us ~370 KB < 1 ns
50 ~35 KB ~1-5 ms ~2-10 ms ~500 KB < 2 ns

Observation: Extension overhead scales sub-linearly for most operations. The exception is semantic conflict detection, which is O(N^2) and should be profiled when using > 20 extensions.


Tiebreaker Rule Overhead

The fork-resolve strategy supports three tiebreaker rules. Their performance is nearly identical because all are simple linear scans:

Rule Algorithm Cost (16 forks)
TIEBREAK_PRIORITY Scan for min priority, sub-break by error count then tokens ~40 ns
TIEBREAK_LONGEST_MATCH Scan for max tokens_consumed, sub-break by priority ~40 ns
TIEBREAK_FIRST_COMPLETE Scan for min fork_id ~30 ns

The tiebreaker is invoked once per conflict resolution after fork evaluation completes. It operates on the completed-forks subset (typically 1-4 forks survive out of the initial set).


Performance Tuning Recommendations

Extension Loading

  1. Load all extensions at startup. This amortizes conflict detection and JIT compilation over the entire process lifetime rather than disrupting the hot parsing path.
  2. Order extensions by dependency. Use extension_registry_get_order() to determine the correct load order. Loading in dependency order avoids redundant conflict re-evaluation.
  3. Avoid frequent load/unload cycles. Each load creates a new snapshot, invalidates JIT, and triggers conflict detection.

Disambiguation Strategy Selection

  1. Use priority strategy for known conflicts. If you know which extension should win (e.g., your custom extension overrides a default), set explicit priorities and use STRAT_PRIORITY. This is 100-1000x faster than fork-resolve.
  2. Reserve fork-resolve for truly ambiguous cases. Fork-resolve is the right choice when you cannot determine the correct interpretation without actually trying to parse with each grammar.
  3. Limit fork count for large extension sets. The default FORK_RESOLVE_MAX_FORKS of 16 is reasonable for most workloads. Reducing it to 4-8 significantly reduces peak memory and resolution time with minimal accuracy loss.

JIT Integration

  1. Use synchronous JIT after final extension load. Call the JIT compilation function directly on the final snapshot to ensure JIT is active before the first query.
  2. Lower JIT thresholds for extension-heavy grammars. Larger grammars (more states from extensions) benefit more from JIT because the table-lookup overhead is proportionally higher.
  3. Monitor JIT metrics per snapshot. Each new snapshot starts with fresh metrics. If extensions are loaded infrequently, the JIT will eventually trigger. If loaded frequently, consider forced compilation.

Conflict Reduction

  1. Use conflicts_with declarations. Declaring known conflicts in extension metadata allows the registry to reject incompatible combinations at registration time, before expensive conflict detection runs.
  2. Set conflict_threshold appropriately. Extensions with conflict_threshold = 0.0 reject any conflict; 1.0 tolerates all. Tuning this per-extension avoids unnecessary fork-resolve invocations.
  3. Minimize overlapping token definitions. Token-level conflicts are the most common and easiest to avoid. Use unique token names across extensions, or define shared tokens in a base extension.

Memory Management

  1. Release old snapshots promptly. After an extension load, parsers that finish with the old snapshot should release it to free the old action tables.
  2. Pin snapshots during batch processing. Acquire a snapshot reference at the start of a batch and release it at the end, rather than per-query, to reduce atomic refcount traffic.

Benchmark Reproduction

Running the Parser Benchmark Suite

# Build
meson setup builddir
ninja -C builddir
# General parser benchmarks (CSV output)
./builddir/bench/parser_bench 10000 200 > results.csv 2> diag.txt
# JIT vs Interpreted comparison (with statistical analysis)
./builddir/bench/jit_comparison

Measuring Extension Overhead

Extension overhead can be measured by comparing baseline benchmark results (no extensions) against results with extensions loaded:

# Baseline (no extensions)
./builddir/bench/parser_bench 10000 200 > baseline.csv
# With extensions (modify the benchmark to load extensions before parsing)
# See bench/parser_bench.c for the benchmark framework

Key Metrics to Compare

Metric How to Measure
Per-token parse time ns_per_op / tokens_per_parse
Conflict detection time Instrument detect_all_multi_grammar_conflicts()
Fork-resolve time Instrument fork_resolve_resolve()
Snapshot clone time Instrument clone_parser_state()
JIT break-even Compare total interpreted vs JIT time

Memory Profiling

# Build with AddressSanitizer
meson setup builddir-asan -Db_sanitize=address
ninja -C builddir-asan
# Run tests (checks for leaks)
meson test -C builddir-asan
# Valgrind for detailed heap profiling
valgrind --tool=massif ./builddir/bench/parser_bench 1000
ms_print massif.out.* | head -100

Appendix: Data Structure Sizes

Approximate sizes of key structures (x86_64, 64-bit pointers):

Structure Size Notes
ExtensionRegistry ~32 B Plus dynamic entries + hash
RegistryEntry ~200 B Includes GrammarExtensionMetadata
GrammarExtensionMetadata ~120 B Pointers to strings + arrays
HashEntry ~24 B Key pointer + index + bool
ConflictPoint ~48 B Plus dynamic contexts array
LimeContext ~32 B Token, state, ext_id, priority
MultiGrammarConflictResult ~32 B Plus dynamic points array
DisambiguationContext ~200 B vtable + strategy context
StrategyResult ~32 B Plus malloc'd arrays
ForkResolveContext ~80 B Plus priority entries
PriorityContext ~24 B Plus priority entries
ParseFork ~128 B Plus cloned state + stack
ClonedParserState ~48 B Plus malloc'd state + stack
ParseForkSet ~24 B Plus fork pointer array
ParserSnapshot ~128 B Plus action tables

Last updated: 2026-04-29 Reference implementation: src/extension_registry.c, src/strategy_fork_resolve.c, src/parser_fork.c, src/conflict_detector.c, src/disambiguation.c, src/strategy_priority.c