Lime Parser Generator 0.1.0
Runtime-extensible LALR(1) parser with SIMD tokenization and LLVM JIT
Extensible SQL Parser – API Reference

This document covers the public C API for the extensible SQL parser library. All public symbols are declared in headers under include/.


Library Version

Header: include/parser.h

const char *lime_parser_version(void);

Returns the library version as a NUL-terminated string (e.g. "0.1.0"). The returned pointer is to static storage and must not be freed.


Snapshot API

Header: include/parser.h (internal details: src/snapshot.h)

A ParserSnapshot captures the complete state of a parser's tables at a point in time. Snapshots are reference-counted and immutable after creation. Multiple threads can share a snapshot safely by acquiring references.

Types

typedef struct ParserSnapshot ParserSnapshot; /* Opaque handle */

Functions

lime_snapshot_create

ParserSnapshot *lime_snapshot_create(const char *grammar_file, char **error);

Create a base snapshot by parsing a Lemon grammar file. On success, returns a snapshot with reference count 1 and sets *error to NULL. On failure, returns NULL and sets *error to a malloc'd error message that the caller must free().

Parameters:

  • grammar_file – Path to a .y grammar file.
  • error – Output pointer for error message on failure.

lime_snapshot_acquire

ParserSnapshot *lime_snapshot_acquire(ParserSnapshot *snap);

Increment the reference count on a snapshot. Returns the same pointer for convenience. The caller must eventually call lime_snapshot_release(). Passing NULL is safe and returns NULL.

lime_snapshot_release

void lime_snapshot_release(ParserSnapshot *snap);

Decrement the reference count. When it reaches zero, the snapshot and all memory it owns (grammar data, action tables, JIT context) are freed. Passing NULL is safe.


Parse Context API

Header: include/parse_context.h

A ParseContext wraps a Lemon-generated parser with a pinned snapshot reference. Table lookups are indirected through the snapshot rather than compiled-in static arrays, enabling hot-swapping of parser tables when extensions modify the grammar.

Types

typedef struct ParseContext ParseContext; /* Opaque handle: per-parse-session state */

Functions

parse_begin

ParseContext *parse_begin(ParserSnapshot *snap);

Begin a new parse session pinned to snap. Acquires a reference to the snapshot that is held until parse_end() is called. Returns NULL on allocation failure. snap must not be NULL.

parse_token

int parse_token(ParseContext *ctx,
                int token_code,
                void *token_value,
                int location);

Feed one token to the parser.

Parameters:

  • ctx – Active parse context.
  • token_code – Integer token type (terminal symbol code). Pass 0 to signal end-of-input.
  • token_value – Opaque pointer to the semantic value. The layout is determined by the parser template's TOKENTYPE.
  • location – Byte offset of the token in the original source, or LIME_LOC_UNKNOWN (-1) if the grammar does not declare locations or the caller does not track positions (e.g. the synthetic end-of-input token). The value is currently accepted and stored but not yet propagated into reduce actions; propagation arrives with the push-parser implementation that replaces the current parse_token() stub. Callers should pass real locations now so their code needs no changes when that lands.

Returns: 0 on success, non-zero on error (syntax error or OOM).

See also: LIME_LOC_UNKNOWN.

parse_end

void parse_end(ParseContext *ctx);

End the parse session. Releases the pinned snapshot reference and frees all internal state. Passing NULL is safe.

parse_get_snapshot

ParserSnapshot *parse_get_snapshot(ParseContext *ctx);

Return the snapshot pinned by this context. Valid as long as the context is alive.

LIME_LOC_UNKNOWN

#define LIME_LOC_UNKNOWN (-1)

Sentinel value for the location argument of parse_token(). Pass this when the grammar does not declare locations, or when no meaningful byte offset can be attributed to the token (the synthetic end-of-input marker, runtime-injected tokens, etc.). Guaranteed to be -1 so that integer byte offsets (always >= 0) never collide with it.
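Token carries a start pointer rather than a byte offset, so a caller that owns the source buffer has to derive the location argument itself. A minimal sketch, assuming every token points into one contiguous buffer; token_location() is a hypothetical helper, not a library function, and LIME_LOC_UNKNOWN is redefined locally only so the block compiles on its own.

```c
#include <limits.h>
#include <stddef.h>

#ifndef LIME_LOC_UNKNOWN
#define LIME_LOC_UNKNOWN (-1) /* same value as the documented sentinel */
#endif

/* Hypothetical helper: byte offset of a token starting at `start` inside
** `buf`, or LIME_LOC_UNKNOWN when there is no source position or the
** offset does not fit in the int location argument. */
static int token_location(const char *buf, const char *start) {
    if (buf == NULL || start == NULL)
        return LIME_LOC_UNKNOWN;
    ptrdiff_t off = start - buf; /* caller guarantees start points into buf */
    if (off < 0 || off > INT_MAX)
        return LIME_LOC_UNKNOWN;
    return (int)off;
}
```

In a parse loop, token_location(buf, t.start) would supply the location for each real token, while the synthetic end-of-input call keeps passing LIME_LOC_UNKNOWN directly.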

Snapshot-Indirected Table Access

These lower-level functions replace direct static array access in the generated parser. They are primarily used internally by the parse engine.

uint16_t snap_find_shift_action(const ParserSnapshot *snap,
                                uint16_t stateno, uint16_t iLookAhead);
uint16_t snap_find_reduce_action(const ParserSnapshot *snap,
                                 uint16_t stateno, uint16_t iLookAhead);
uint16_t snap_default_action(const ParserSnapshot *snap, uint16_t stateno);
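For readers unfamiliar with Lemon's compressed tables, the lookup these functions perform over yy_shift_ofst, yy_lookahead, yy_action and yy_default can be sketched as below. This is a simplified model of the classic Lemon shift lookup (no %fallback or %wildcard handling), operating on a hypothetical DemoTables struct rather than on ParserSnapshot; nact is an assumed length field, and the real snap_find_shift_action() may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the snapshot's table fields. */
typedef struct DemoTables {
    const uint16_t *yy_action;     /* combined action array */
    const uint16_t *yy_lookahead;  /* lookahead that owns each yy_action slot */
    const int16_t  *yy_shift_ofst; /* per-state offset into yy_action */
    const uint16_t *yy_default;    /* per-state default action */
    size_t nact;                   /* assumed: number of yy_action entries */
} DemoTables;

static uint16_t demo_find_shift_action(const DemoTables *t,
                                       uint16_t stateno, uint16_t iLookAhead) {
    long i = (long)t->yy_shift_ofst[stateno] + iLookAhead;
    /* A slot belongs to this (state, lookahead) pair only if the parallel
    ** lookahead array confirms it; otherwise use the state's default. */
    if (i < 0 || (size_t)i >= t->nact || t->yy_lookahead[i] != iLookAhead)
        return t->yy_default[stateno];
    return t->yy_action[i];
}
```

The parallel yy_lookahead check is what lets many states share overlapping windows of yy_action, which is why the tables stay compact.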

Tokenizer API

Header: include/tokenize.h

The tokenizer converts SQL text into a stream of tokens using SIMD-accelerated character classification. It automatically selects the fastest available implementation (AVX2 on x86_64, NEON on ARM, or scalar fallback) at runtime.

Types

typedef struct Tokenizer Tokenizer; /* Opaque handle */
typedef struct Token {
    int type;          /* Token type code (TK_* constant or keyword code) */
    const char *start; /* Pointer into the source buffer */
    size_t length;     /* Length in bytes */
    uint32_t line;     /* 1-based line number */
    uint32_t column;   /* 1-based column number */
} Token;

A Token is a single token returned by the tokenizer.

Functions

tokenizer_create

Tokenizer *tokenizer_create(TokenTable *table, const char *input, size_t length);

Create a new tokenizer for the given input buffer.

Parameters:

  • table – Keyword lookup table for recognizing SQL keywords. Pass NULL for identifier-only mode (all identifiers return TK_IDENTIFIER).
  • input – NUL-terminated SQL input string. Must remain valid for the lifetime of the tokenizer.
  • length – Length of input in bytes, not including the NUL terminator. Important: The buffer must have at least 32 bytes of readable memory past the end (e.g., zero-padded) for SIMD safety.

Returns: A new tokenizer, or NULL on allocation failure.
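The padding requirement is easy to get subtly wrong, for example by forgetting the NUL terminator or overflowing the size computation. The helper below is a hypothetical sketch, not part of the library: it copies the SQL text into a zero-filled buffer with room for the terminator plus 32 bytes of slack, which satisfies the requirement stated above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SIMD_PAD 32 /* minimum slack required by tokenizer_create() */

/* Hypothetical helper: copy len bytes of src into a fresh buffer that is
** NUL-terminated and followed by DEMO_SIMD_PAD zero bytes.
** Returns NULL on allocation failure; free() the result when done. */
static char *make_padded_copy(const char *src, size_t len) {
    if (src == NULL || len > SIZE_MAX - DEMO_SIMD_PAD - 1)
        return NULL; /* bad argument, or size computation would overflow */
    /* calloc zero-fills, so the terminator and padding come for free */
    char *buf = calloc(1, len + 1 + DEMO_SIMD_PAD);
    if (buf != NULL)
        memcpy(buf, src, len);
    return buf;
}
```

The result would then be passed as tokenizer_create(table, buf, len), with len excluding the terminator.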

tokenizer_destroy

void tokenizer_destroy(Tokenizer *tok);

Destroy the tokenizer and free its memory. Passing NULL is safe.

tokenizer_next

bool tokenizer_next(Tokenizer *tok, Token *out);

Extract the next token from the input. Returns true if a token was produced, false at end-of-input. On false return, out->type is TK_EOF.

Comments (both -- single-line and /* */ block) are skipped automatically and never returned as tokens.

tokenizer_peek

bool tokenizer_peek(Tokenizer *tok, Token *out);

Peek at the next token without consuming it. Subsequent calls to tokenizer_peek() return the same token. The next call to tokenizer_next() consumes the peeked token.
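The peek/next contract amounts to a one-token lookahead cache. The sketch below shows that pattern on a hypothetical whitespace-splitting scanner; DemoScanner, demo_peek and demo_next are illustrative names, not the library's, and the real Tokenizer does full SQL lexing while exposing the same observable behaviour.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoTok { const char *start; size_t length; } DemoTok;

typedef struct DemoScanner {
    const char *p;   /* next unread byte (NUL-terminated input) */
    bool has_peeked; /* true if `peeked` holds an unconsumed token */
    DemoTok peeked;
} DemoScanner;

/* Toy lexer: tokens are runs of non-whitespace characters. */
static bool scan_raw(DemoScanner *s, DemoTok *out) {
    while (isspace((unsigned char)*s->p)) s->p++;
    if (*s->p == '\0') return false;
    out->start = s->p;
    while (*s->p != '\0' && !isspace((unsigned char)*s->p)) s->p++;
    out->length = (size_t)(s->p - out->start);
    return true;
}

/* Repeated peeks return the same cached token. */
static bool demo_peek(DemoScanner *s, DemoTok *out) {
    if (!s->has_peeked) {
        if (!scan_raw(s, &s->peeked)) return false;
        s->has_peeked = true;
    }
    *out = s->peeked;
    return true;
}

/* Next consumes the cached token first, then scans fresh input. */
static bool demo_next(DemoScanner *s, DemoTok *out) {
    if (s->has_peeked) {
        s->has_peeked = false;
        *out = s->peeked;
        return true;
    }
    return scan_raw(s, out);
}
```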

tokenizer_position

size_t tokenizer_position(const Tokenizer *tok);

Return the current byte offset in the input.

tokenizer_line

uint32_t tokenizer_line(const Tokenizer *tok);

Return the current 1-based line number.

tokenizer_column

uint32_t tokenizer_column(const Tokenizer *tok);

Return the current 1-based column number.

SIMD Acceleration

The tokenizer uses SIMD instructions to accelerate three hot paths:

  1. Whitespace skipping – Classifies 32 characters at a time to find the first non-whitespace character.
  2. Identifier scanning – Uses the alpha+digit bitmask to find identifier boundaries in 32-byte chunks.
  3. Number scanning – Uses the digit bitmask for bulk digit scanning in integer and float literals.

The SIMD implementation is selected automatically at runtime via get_classify_func() and requires no user configuration.
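All three hot paths reduce to the same bitmask idea: classify a chunk, then count how many leading characters belong to the class. A sketch of whitespace skipping under that model follows; space_mask32 is a hypothetical scalar stand-in for the is_space_mask a real classifier would produce, and __builtin_ctz assumes GCC or Clang. Identifier and number scanning would use the same loop with the alpha+digit or digit mask. This is an illustration, not the library's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Count of consecutive set bits from bit 0 (char 0 of the chunk).
** A full mask means the run continues into the next chunk. */
static unsigned leading_run(uint32_t mask) {
    uint32_t inv = ~mask;
    if (inv == 0) return 32;
    return (unsigned)__builtin_ctz(inv); /* inv != 0, so ctz is defined */
}

/* Hypothetical stand-in for a classifier's is_space_mask. */
static uint32_t space_mask32(const char *p) {
    uint32_t m = 0;
    for (unsigned i = 0; i < 32; i++)
        if (p[i] == ' ' || p[i] == '\t' || p[i] == '\n' || p[i] == '\r')
            m |= (uint32_t)1 << i;
    return m;
}

/* Skip whitespace 32 bytes at a time. The NUL terminator is not a space,
** so the scan stops there, but each chunk may read up to 31 bytes beyond
** it; this is why the input buffer must be padded. */
static size_t skip_spaces(const char *buf, size_t pos) {
    for (;;) {
        unsigned run = leading_run(space_mask32(buf + pos));
        pos += run;
        if (run < 32) return pos;
    }
}
```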


Token Table API

Header: include/token_table.h

The token table provides thread-safe keyword lookup using a hash table with RCU-style versioning. Readers are lock-free; writers acquire an internal write lock.

Types

typedef uint32_t ExtensionID;             /* Opaque ID identifying a registered extension */
typedef struct TokenDefinition {
    const char *lexeme;       /* Token string (e.g. "SELECT") */
    size_t lexeme_len;        /* Length of lexeme */
    int token_code;           /* Numeric token ID (e.g. TK_SELECT) */
    ExtensionID extension_id; /* Which extension added it (0 = base) */
    uint32_t next_in_chain;   /* Internal: hash collision chain link */
} TokenDefinition;
typedef struct TokenTable TokenTable;     /* Thread-safe token lookup table */

TokenDefinition describes a single token in the table. ExtensionID is declared in conflict.h.

Functions

create_token_table

TokenTable *create_token_table(uint32_t initial_capacity);

Create a new token table. initial_capacity is the initial number of slots in the hash table. Returns NULL on allocation failure.

destroy_token_table

void destroy_token_table(TokenTable *table);

Destroy the token table and free all memory.

lookup_token

int lookup_token(TokenTable *table, const char *str, size_t len);

Look up a token by its string value. This is lock-free for concurrent readers. Returns the token_code if found, or -1 if not found.

add_token

bool add_token(TokenTable *table, const char *lexeme, int token_code,
ExtensionID ext_id);

Add a token to the table. Acquires the write lock internally. Returns true on success, false on failure (allocation error or duplicate).

remove_tokens_by_extension

bool remove_tokens_by_extension(TokenTable *table, ExtensionID ext_id);

Remove all tokens belonging to a given extension. Acquires the write lock and rebuilds hash chains. Returns true on success.
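The next_in_chain field suggests collision chains linked by array index rather than by per-node pointers. The sketch below models that layout in a single-threaded setting; DemoTable, demo_add and demo_lookup are hypothetical, FNV-1a is an arbitrary hash choice, and the real TokenTable adds RCU-style versioning and its write lock on top. It keeps the documented conventions of lookup_token() (code or -1) and add_token() (reject duplicates).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BUCKETS 16
#define DEMO_CAPACITY 64
#define DEMO_NONE UINT32_MAX /* assumed end-of-chain marker */

typedef struct DemoDef {
    const char *lexeme;
    size_t lexeme_len;
    int token_code;
    uint32_t next_in_chain; /* index of the next entry in the same bucket */
} DemoDef;

typedef struct DemoTable {
    uint32_t buckets[DEMO_BUCKETS]; /* head index per bucket, or DEMO_NONE */
    DemoDef defs[DEMO_CAPACITY];    /* entries live in one flat array */
    uint32_t ndefs;
} DemoTable;

static uint32_t demo_hash(const char *s, size_t len) {
    uint32_t h = 2166136261u; /* 32-bit FNV-1a */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

static void demo_init(DemoTable *t) {
    for (size_t i = 0; i < DEMO_BUCKETS; i++)
        t->buckets[i] = DEMO_NONE;
    t->ndefs = 0;
}

/* Token code if found, else -1 (same convention as lookup_token()). */
static int demo_lookup(const DemoTable *t, const char *s, size_t len) {
    uint32_t i = t->buckets[demo_hash(s, len) % DEMO_BUCKETS];
    while (i != DEMO_NONE) {
        const DemoDef *d = &t->defs[i];
        if (d->lexeme_len == len && memcmp(d->lexeme, s, len) == 0)
            return d->token_code;
        i = d->next_in_chain;
    }
    return -1;
}

/* Returns 0 when the table is full or the lexeme is a duplicate. */
static int demo_add(DemoTable *t, const char *lexeme, int code) {
    size_t len = strlen(lexeme);
    if (t->ndefs == DEMO_CAPACITY || demo_lookup(t, lexeme, len) != -1)
        return 0;
    uint32_t b = demo_hash(lexeme, len) % DEMO_BUCKETS;
    DemoDef *d = &t->defs[t->ndefs];
    d->lexeme = lexeme; /* not copied: caller keeps the string alive */
    d->lexeme_len = len;
    d->token_code = code;
    d->next_in_chain = t->buckets[b]; /* push onto the front of the chain */
    t->buckets[b] = t->ndefs++;
    return 1;
}
```

Index links keep entries in one contiguous array, which makes a bulk rebuild such as remove_tokens_by_extension() a matter of compacting the array and relinking the buckets.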


SIMD Character Classification API

Header: src/tokenize_simd.h

Low-level parallel character classification. Most users should use the Tokenizer API instead; this interface is for advanced users building custom scanners.

Types

typedef struct CharClassVector {
    uint32_t is_alpha_mask; /* Bit i set if char i is [A-Za-z_] */
    uint32_t is_digit_mask; /* Bit i set if char i is [0-9] */
    uint32_t is_space_mask; /* Bit i set if char i is [ \t\n\r] */
} CharClassVector;
typedef CharClassVector (*ClassifyFunc)(const char *input, size_t offset);

Functions

get_classify_func

ClassifyFunc get_classify_func(void);

Return the best available classification function for the current CPU. On x86_64 the choice is made at runtime via CPUID feature detection; on ARM it is made at compile time, since NEON is baseline on AArch64.

Platform  CPU feature                 Function returned
x86_64    AVX2 present                classify_simd_avx2 (32 chars)
ARM       NEON (baseline on AArch64)  classify_simd_neon (16 chars)
Any       Fallback                    classify_scalar (32 chars)

classify_scalar

CharClassVector classify_scalar(const char *input, size_t offset);

Classify 32 characters starting at input + offset. Always available on every platform. The caller must ensure 32 bytes are readable from input + offset.
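A portable classifier with the documented CharClassVector semantics can be written as a plain loop, which also makes a useful reference when testing the SIMD variants against it. The sketch is illustrative, not the library's classify_scalar(): it is named demo_classify_scalar to avoid clashing with the real symbol and re-declares the struct so it compiles standalone.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct CharClassVector {
    uint32_t is_alpha_mask; /* bit i set if char i is [A-Za-z_] */
    uint32_t is_digit_mask; /* bit i set if char i is [0-9] */
    uint32_t is_space_mask; /* bit i set if char i is [ \t\n\r] */
} CharClassVector;

/* Explicit ASCII ranges instead of <ctype.h> keep the result
** independent of the current locale. Reads exactly 32 bytes. */
static CharClassVector demo_classify_scalar(const char *input, size_t offset) {
    CharClassVector v = {0, 0, 0};
    const unsigned char *p = (const unsigned char *)input + offset;
    for (unsigned i = 0; i < 32; i++) {
        unsigned char c = p[i];
        uint32_t bit = (uint32_t)1 << i;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
            v.is_alpha_mask |= bit;
        else if (c >= '0' && c <= '9')
            v.is_digit_mask |= bit;
        else if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
            v.is_space_mask |= bit;
    }
    return v;
}
```

When comparing against classify_simd_neon(), only the low 16 bits of each mask should be compared, per the NEON note below.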

classify_simd_avx2 (x86_64 only)

CharClassVector classify_simd_avx2(const char *input, size_t offset);

AVX2 implementation. Classifies 32 characters in parallel using 256-bit SIMD registers. Only callable on CPUs with AVX2 support – use get_classify_func() for safe dispatch.

classify_simd_neon (ARM only)

CharClassVector classify_simd_neon(const char *input, size_t offset);

NEON implementation. Classifies 16 characters in parallel. Only the lower 16 bits of each mask field are meaningful.


Extension API

Headers: include/parser.h (public entry points), src/extension.h (internal)

Extensions add grammar modifications (new tokens, rules, precedence changes) to the parser at runtime. Each extension is managed through a thread-safe registry.

High-Level API (parser.h)

bool lime_extension_registry_init(void);
void lime_extension_registry_destroy(void);

Initialize and destroy the global extension registry. Must be called before and after any extension operations, respectively.

Internal Registry API (src/extension.h)

Types

typedef uint32_t ExtensionID;
typedef enum ExtensionState {
    EXT_REGISTERED, /* Registered but not loaded */
    EXT_LOADED,     /* Active, modifications applied */
    EXT_UNLOADED,   /* Was loaded, now removed */
    EXT_ERROR,      /* Failed to load */
} ExtensionState;

ExtensionInfo – Input to register_extension():

typedef struct ExtensionInfo {
    const char *name;
    const char *version;
    ExtGetModificationsFn get_modifications; /* Required */
    ExtOnConflictFn on_conflict;             /* Optional */
    ExtOnUnloadFn on_unload;                 /* Optional */
    void *user_data;
} ExtensionInfo;

Extension Callbacks

/* Called on load to get grammar modifications */
typedef bool (*ExtGetModificationsFn)(
    void *user_data,
    const ParserSnapshot *base_snapshot,
    GrammarModification **mods_out,
    uint32_t *nmods_out);

/* Called when two extensions conflict */
typedef ConflictResolution (*ExtOnConflictFn)(
    void *user_data,
    const ConflictInfo *info);

/* Called on unload for cleanup */
typedef void (*ExtOnUnloadFn)(void *user_data);

Registry Functions

typedef struct ExtensionRegistry ExtensionRegistry; /* Opaque extension registry handle */

ExtensionRegistry *create_extension_registry(void);
void destroy_extension_registry(ExtensionRegistry *reg);
bool register_extension(ExtensionRegistry *reg,
                        const ExtensionInfo *info,
                        ExtensionID *id_out);
bool load_extension(ExtensionRegistry *reg,
                    const ParserSnapshot *base_snapshot,
                    char **error);
bool unload_extension(ExtensionRegistry *reg, ExtensionID id);
const Extension *find_extension(ExtensionRegistry *reg, ExtensionID id);
uint32_t get_loaded_extension_count(ExtensionRegistry *reg);

Grammar Modification Types

typedef enum GrammarModType {
    MOD_ADD_RULE,
    MOD_ADD_TOKEN,
    MOD_MODIFY_PRECEDENCE,
    MOD_ADD_TYPE,
    MOD_REMOVE_RULE,
} GrammarModType;

Each modification is a GrammarModification struct with a type field and a tagged union u containing the type-specific payload.

Type                   Union field    Purpose
MOD_ADD_RULE           u.add_rule     Add a new production rule
MOD_ADD_TOKEN          u.add_token    Add a new terminal token
MOD_MODIFY_PRECEDENCE  u.modify_prec  Change symbol precedence
MOD_ADD_TYPE           u.add_type     Add a non-terminal type
MOD_REMOVE_RULE        u.remove_rule  Remove an existing rule

MOD_ADD_RULE reduce actions

u.add_rule carries two fields for the rule's reduction action:

typedef void (*LimeReduceFn)(
    void *user_data,        /* opaque, from .reduce_user */
    void *extra_arg,        /* grammar's %extra_argument, or NULL */
    int nrhs,               /* count of RHS symbols in this rule */
    const void *rhs_values, /* array of nrhs %token_type payloads */
    const int *rhs_locs,    /* array of nrhs byte offsets, or NULL */
    void *lhs_out           /* writeback slot for the LHS value */
);

struct {
    /* ... lhs / rhs / nrhs / precedence fields ... */
    LimeReduceFn reduce;    /* runtime-dispatched action, or NULL */
    void *reduce_user;      /* opaque pointer passed to reduce() */
    const char *code;       /* generator-time C code, or NULL */
} add_rule;

Precedence of the two action-source fields:

  • reduce non-NULL (code: any value) – The parser invokes reduce(reduce_user, ...) at reduce time.
  • reduce NULL, code non-NULL – code is compiled into the parser's generated reduce() switch at generator time. This applies to grammars fed through lime; it cannot be used by extensions loaded into a pre-compiled parser.
  • reduce NULL, code NULL – The rule reduces with no action.

Current implementation status: reduce-based dispatch is not yet wired through to the push-parser stack (it is blocked on the runtime rebuild work). The types are stable; extension code written against this contract today will not need changes once dispatch is enabled.
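A reduce callback that follows the LimeReduceFn contract might look like the sketch below. It assumes a grammar whose %token_type is int, so rhs_values points at nrhs contiguous ints, and a rule of the form expr ::= expr PLUS expr; both the rule and reduce_add are hypothetical. Because runtime dispatch is not yet wired through, the callback is exercised here by calling it directly.

```c
#include <stddef.h>

/* Copy of the documented callback type so the sketch compiles standalone. */
typedef void (*LimeReduceFn)(void *user_data, void *extra_arg, int nrhs,
                             const void *rhs_values, const int *rhs_locs,
                             void *lhs_out);

/* Hypothetical action for expr ::= expr PLUS expr. Slot 1 holds the PLUS
** terminal's payload and is ignored. */
static void reduce_add(void *user_data, void *extra_arg, int nrhs,
                       const void *rhs_values, const int *rhs_locs,
                       void *lhs_out) {
    (void)user_data;
    (void)extra_arg;
    (void)rhs_locs; /* may be NULL when the grammar has no locations */
    if (nrhs != 3)
        return;     /* not the rule this action was written for */
    const int *v = rhs_values;
    *(int *)lhs_out = v[0] + v[2];
}
```

In an extension the function would be stored as .reduce = reduce_add in u.add_rule, with .reduce_user carrying any state the action needs.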

Conflict Resolution

When two extensions modify the same grammar element, the on_conflict callback is invoked:

typedef enum ConflictResolution {
    CONFLICT_UNRESOLVED,    /* No resolution provided */
    CONFLICT_KEEP_EXISTING, /* Keep the existing item */
    CONFLICT_USE_NEW,       /* Replace with new item */
    CONFLICT_MERGE,         /* Extension provides merged result */
} ConflictResolution;
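An on_conflict callback only has to map a conflict to one of the ConflictResolution values. Because this reference does not describe the fields of ConflictInfo, the sketch below treats it as opaque and applies a fixed per-extension policy passed through user_data; DemoConflictPolicy and demo_on_conflict are hypothetical names. A real callback would presumably inspect info before deciding, and CONFLICT_MERGE is not shown because the merge payload is not documented here.

```c
#include <stddef.h>

/* Copies of the documented types; ConflictInfo stays opaque. */
typedef struct ConflictInfo ConflictInfo;
typedef enum ConflictResolution {
    CONFLICT_UNRESOLVED,
    CONFLICT_KEEP_EXISTING,
    CONFLICT_USE_NEW,
    CONFLICT_MERGE,
} ConflictResolution;
typedef ConflictResolution (*ExtOnConflictFn)(void *user_data,
                                              const ConflictInfo *info);

/* Hypothetical per-extension policy passed through user_data. */
typedef struct DemoConflictPolicy {
    int prefer_new; /* nonzero: let this extension's item win */
} DemoConflictPolicy;

static ConflictResolution demo_on_conflict(void *user_data,
                                           const ConflictInfo *info) {
    (void)info; /* fields undocumented here, so not inspected */
    const DemoConflictPolicy *policy = user_data;
    if (policy == NULL)
        return CONFLICT_UNRESOLVED; /* no policy: let the registry decide */
    return policy->prefer_new ? CONFLICT_USE_NEW : CONFLICT_KEEP_EXISTING;
}
```

It would be wired in through ExtensionInfo as .on_conflict = demo_on_conflict, with .user_data pointing at the policy.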

Modification Serializer

Header: src/mod_serialize.h

char *lime_modifications_to_grammar_text(
    const GrammarModification *mods,
    uint32_t nmods,
    uint32_t *skipped_out, /* may be NULL */
    char **error           /* may be NULL */
);

Render an array of GrammarModifications as .lime-syntax text that, when concatenated after a base grammar and re-parsed by the lime generator, produces a parser equivalent to applying the modifications. This is the intended mechanism for the "subprocess fallback" pattern that unblocks runtime extension validation while real in-process apply_add_rule() (Task #3) is pending.

Returns: malloc'd NUL-terminated buffer; NULL on allocation failure or bad arguments. Caller owns the buffer.

Round-trip fidelity – not every modification serializes cleanly:

  • MOD_ADD_RULE with .reduce != NULL and .code == NULL – Skipped and counted in *skipped_out; a function pointer has no text form.
  • MOD_REMOVE_RULE – Always skipped, because concatenation cannot express removal. Filter the base grammar text if removals must take effect.
  • MOD_MODIFY_PRECEDENCE with new_assoc == 0 – Emitted as a comment (no single .lime directive expresses "no associativity").
  • Integer .precedence on MOD_ADD_RULE – Emitted as a /* NOTE */ comment; .lime uses [SYMBOL] markers, not numbers.

Typical subprocess-fallback usage:

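#include <stdio.h>
#include <stdlib.h>   /* free, mkstemp (POSIX) */
#include <unistd.h>   /* close */
#include "mod_serialize.h"
uint32_t skipped = 0;
char *err = NULL;
char *fragment = lime_modifications_to_grammar_text(
    mods, nmods, &skipped, &err);
if (fragment == NULL) {
    fprintf(stderr, "serialization failed: %s\n",
            err ? err : "out of memory or bad arguments");
    free(err);
    return -1;
}
if (skipped > 0)
    fprintf(stderr, "warning: %u modification(s) could not be serialized\n",
            (unsigned)skipped);
/* Concatenate base grammar text + fragment into a temp file, then
** spawn `lime`, compile the output, and dlopen the result */
char tmp_path[] = "/tmp/lime_ext_XXXXXX";
int fd = mkstemp(tmp_path);
FILE *tmp = (fd >= 0) ? fdopen(fd, "w") : NULL;
if (tmp == NULL) {
    if (fd >= 0) close(fd);
    free(fragment);
    return -1;
}
fputs(base_grammar_text, tmp);
fputs(fragment, tmp);
fclose(tmp);
free(fragment);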

JIT Compilation API

Header: include/jit_context.h

Optional LLVM-based JIT compilation of parser action tables. When LLVM is available, the JIT compiles specialized lookup functions for each parser state, replacing table-driven lookups with direct branch sequences.

When compiled without LLVM (LIME_NO_JIT), all JIT functions degrade to no-ops.

Types

typedef struct JITContext JITContext; /* Opaque handle */
typedef uint16_t (*JITShiftActionFn)(uint16_t iLookAhead);
typedef enum JITStatus {
    JIT_OK = 0,
    JIT_ERR_NO_LLVM,
    JIT_ERR_INIT_FAILED,
    JIT_ERR_CODEGEN_FAILED,
    JIT_ERR_COMPILE_FAILED,
    JIT_ERR_LOOKUP_FAILED,
    JIT_ERR_INVALID_ARG,
    JIT_ERR_ALREADY_COMPILED,
} JITStatus;
typedef struct JITStats {
    uint32_t states_compiled;  /* Number of states with JIT code attached */
    uint32_t states_total;     /* Total number of states in the snapshot */
    uint64_t compile_time_ns;  /* Wall-clock nanoseconds spent compiling */
    uint64_t code_size_bytes;  /* Approximate generated code size in bytes */
    bool available;            /* True if JIT support is available at runtime */
} JITStats;

JITStats holds JIT compilation statistics for a snapshot.

Functions

High-Level (parser.h)

bool lime_jit_available(void);
int lime_jit_compile(ParserSnapshot *snap);

lime_jit_available() returns true if LLVM was linked and initialization succeeds.

lime_jit_compile() compiles and attaches JIT code to a snapshot. Returns 0 on success, non-zero on failure. No-op if already compiled or LLVM is unavailable.

Low-Level (jit_context.h)

JITStatus jit_create(JITContext **ctx_out);
void jit_destroy(JITContext *ctx);
JITStatus jit_compile_snapshot(JITContext *ctx, const ParserSnapshot *snap);
JITShiftActionFn jit_get_shift_action(const JITContext *ctx, uint32_t state_id);
JITStats jit_get_stats(const JITContext *ctx);
const char *jit_status_string(JITStatus status);
bool jit_is_available(void);

Snapshot Integration

JITStatus jit_attach_to_snapshot(ParserSnapshot *snap);
void jit_detach_from_snapshot(ParserSnapshot *snap);
uint16_t jit_find_shift_action(const ParserSnapshot *snap,
                               uint16_t stateno,
                               uint16_t iLookAhead);

jit_find_shift_action() is the primary runtime dispatch function. If the snapshot has JIT code for the given state, it uses the compiled path; otherwise it falls back to the table-driven lookup.
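The compiled-or-fallback dispatch reduces to a per-state function-pointer check. The sketch models it with a hypothetical DemoJitSnap holding an optional array of compiled per-state functions plus a table-lookup fallback; it involves no LLVM and is not the library's implementation, only the control flow the paragraph describes. state0_compiled and demo_table_lookup are made-up stand-ins used to exercise it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the documented JITShiftActionFn. */
typedef uint16_t (*JITShiftActionFn)(uint16_t iLookAhead);

typedef struct DemoJitSnap {
    JITShiftActionFn *compiled; /* NULL array or NULL entry: not compiled */
    uint32_t nstate;
    uint16_t (*table_lookup)(uint16_t stateno, uint16_t iLookAhead);
} DemoJitSnap;

static uint16_t demo_jit_find_shift_action(const DemoJitSnap *s,
                                           uint16_t stateno,
                                           uint16_t iLookAhead) {
    if (s->compiled != NULL && stateno < s->nstate &&
        s->compiled[stateno] != NULL)
        return s->compiled[stateno](iLookAhead); /* specialized code path */
    return s->table_lookup(stateno, iLookAhead); /* table-driven fallback */
}

/* Stand-ins: one "compiled" state and a fake table lookup. */
static uint16_t state0_compiled(uint16_t la) { return (uint16_t)(100 + la); }
static uint16_t demo_table_lookup(uint16_t st, uint16_t la) {
    return (uint16_t)(st * 1000 + la);
}
```

Because the fallback is always available, states can be compiled incrementally (or not at all) without changing parse results.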


JIT Policy API

Header: include/jit_policy.h

Adaptive JIT compilation policy that decides when to compile based on runtime metrics. Tracks per-snapshot usage and triggers compilation when the expected benefit exceeds the cost.

Types

typedef struct JITMetrics {
    atomic_uint_fast64_t parse_count;         /* Number of parse sessions */
    atomic_uint_fast64_t total_parse_time_ns; /* Cumulative parse wall-clock (ns) */
    atomic_uint_fast64_t action_lookup_count; /* Total action table lookups */
    atomic_int is_jitted;                     /* 1 if JIT code is attached */
    atomic_int jit_in_progress;               /* 1 if background compile active */
} JITMetrics;
typedef struct JITPolicyConfig {
    uint64_t min_parse_count;           /* Min parse sessions before considering JIT. Default: 50 */
    uint64_t min_total_parse_time_ns;   /* Min cumulative parse time (ns). Default: 10,000,000 (10 ms) */
    uint64_t min_avg_lookups_per_parse; /* Min average action lookups per session. Default: 100 */
    bool background_compile;            /* If true, compile on a background thread. Default: true */
} JITPolicyConfig;

JITMetrics holds the per-snapshot runtime metrics used by the JIT policy; JITPolicyConfig holds the policy's tunable thresholds.

Functions

JITPolicyConfig jit_policy_default_config(void);
void jit_metrics_init(JITMetrics *m);
void jit_metrics_record_parse(JITMetrics *m,
                              uint64_t parse_time_ns,
                              uint64_t action_lookups);
bool jit_should_compile(const JITMetrics *m, const JITPolicyConfig *config);
int jit_maybe_compile(ParserSnapshot *snap,
                      const JITPolicyConfig *config);
void jit_policy_shutdown(void);

jit_maybe_compile() returns 0 if compilation was triggered, 1 if metrics do not yet warrant compilation, or -1 on error. When background_compile is true, compilation happens on a detached thread.
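jit_should_compile() is documented here only through its thresholds. The sketch below shows one plausible reading, assuming that all three minimums must be met and that a snapshot already compiled or compiling is skipped; demo_should_compile, demo_record_parse and the helpers are hypothetical, and the structs are re-declared so the block compiles standalone. The real policy may weigh the metrics differently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct JITMetrics {
    atomic_uint_fast64_t parse_count;
    atomic_uint_fast64_t total_parse_time_ns;
    atomic_uint_fast64_t action_lookup_count;
    atomic_int is_jitted;
    atomic_int jit_in_progress;
} JITMetrics;

typedef struct JITPolicyConfig {
    uint64_t min_parse_count;
    uint64_t min_total_parse_time_ns;
    uint64_t min_avg_lookups_per_parse;
    bool background_compile;
} JITPolicyConfig;

/* The documented defaults. */
static JITPolicyConfig demo_default_config(void) {
    JITPolicyConfig c = { 50, 10000000u, 100, true };
    return c;
}

static void demo_metrics_init(JITMetrics *m) {
    atomic_init(&m->parse_count, 0);
    atomic_init(&m->total_parse_time_ns, 0);
    atomic_init(&m->action_lookup_count, 0);
    atomic_init(&m->is_jitted, 0);
    atomic_init(&m->jit_in_progress, 0);
}

/* Three independent relaxed adds: contention-free, but a concurrent
** reader may see a mix of old and new values, acceptable for a heuristic. */
static void demo_record_parse(JITMetrics *m, uint64_t ns, uint64_t lookups) {
    atomic_fetch_add_explicit(&m->parse_count, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&m->total_parse_time_ns, ns, memory_order_relaxed);
    atomic_fetch_add_explicit(&m->action_lookup_count, lookups, memory_order_relaxed);
}

/* Assumed rule: skip compiled/compiling snapshots, then require every
** minimum in the config to be met. */
static bool demo_should_compile(const JITMetrics *m, const JITPolicyConfig *c) {
    if (atomic_load(&m->is_jitted) || atomic_load(&m->jit_in_progress))
        return false;
    uint64_t n = atomic_load(&m->parse_count);
    if (n == 0 || n < c->min_parse_count)
        return false;
    if (atomic_load(&m->total_parse_time_ns) < c->min_total_parse_time_ns)
        return false;
    return atomic_load(&m->action_lookup_count) / n >= c->min_avg_lookups_per_parse;
}
```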


Data Structures Reference

ParserSnapshot (src/snapshot.h)

Field           Type                  Description
version         uint64_t              Monotonically increasing version number
refcount        atomic_uint_fast32_t  Reference count (starts at 1)
symbols         struct symbol **      Array of symbol structs
nsymbol         uint32_t              Total symbol count
nterminal       uint32_t              Terminal symbol count
rules           struct rule *         Linked list of production rules
nrule           uint32_t              Rule count
states          struct state **       Array of parser states
nstate          uint32_t              State count
yy_action       uint16_t *            Combined shift+reduce action array
yy_lookahead    uint16_t *            Parallel lookahead values
yy_shift_ofst   int16_t *             Per-state shift offset
yy_reduce_ofst  int16_t *             Per-state reduce offset
yy_default      uint16_t *            Default action per state
create_time_ns  uint64_t              Creation timestamp (nanoseconds)
jit_ctx         void *                Attached JIT context (or NULL)

Token Type Codes

Defined in include/tokenize.h. Keyword tokens use positive codes assigned via the TokenTable. Built-in token types use non-positive values:

Constant        Value  Description
TK_EOF              0  End of input
TK_IDENTIFIER      -1  Unrecognized identifier
TK_INTEGER         -2  Integer literal (decimal or hex)
TK_FLOAT           -3  Floating point literal
TK_STRING          -4  Single-quoted string literal
TK_BLOB            -5  Blob literal (X'...')
TK_LPAREN          -6  (
TK_RPAREN          -7  )
TK_SEMICOLON       -8  ;
TK_COMMA           -9  ,
TK_DOT            -10  .
TK_STAR           -11  *
TK_PLUS           -12  +
TK_MINUS          -13  -
TK_SLASH          -14  /
TK_PERCENT        -15  %
TK_EQ             -16  = or ==
TK_NE             -17  != or <>
TK_LT             -18  <
TK_GT             -19  >
TK_LE             -20  <=
TK_GE             -21  >=
TK_BITAND         -22  &
TK_BITOR          -23  |
TK_BITNOT         -24  ~
TK_LSHIFT         -25  <<
TK_RSHIFT         -26  >>
TK_CONCAT         -27  ||
TK_DQUOTE_ID      -28  "quoted identifier"
TK_BACKTICK_ID    -29  `backtick identifier`
TK_BRACKET_ID     -30  [bracket identifier]
TK_ILLEGAL        -31  Unrecognized character

Error Handling Conventions

The library uses the following conventions for error reporting:

  • NULL return – Functions that create objects (tokenizer_create, parse_begin, create_token_table, lime_snapshot_create) return NULL on failure.
  • Boolean return – Functions that perform operations (add_token, register_extension, load_extension) return false on failure.
  • Error string – Functions with a char **error parameter set it to a malloc'd string on failure. The caller must free() this string.
  • Status codes – JIT functions return JITStatus enum values. Use jit_status_string() to convert to a human-readable message.
  • NULL-safe – Destroy/release functions (tokenizer_destroy, parse_end, lime_snapshot_release) accept NULL safely.

Allocator Contract

Lime's generated parsers accept a caller-supplied allocator via XxxAlloc(void *(*mallocProc)(size_t)), where Xxx is the parser-name prefix set by %name or -P. The matching XxxFree(void *, void (*freeProc)(void*)) uses the caller's free. This is strictly better than Bison's YYMALLOC/YYFREE macro hack: the allocator is passed as a first-class argument rather than baked in at compile time.

The contract the generator relies on:

  1. Error semantics are caller-chosen. mallocProc may return NULL on failure, or it may never return (longjmp / throw). If mallocProc returns NULL, the parser enters a failure path and subsequent Parse() calls are no-ops. If it longjmps out, the parser's internal state is left in whatever condition the jump leaves it; the caller must not reuse that parser instance without calling XxxFree first.
  2. Pairing is symmetric. freeProc is called exactly as many times as mallocProc succeeded – one call per successful allocation – and always on the pointers mallocProc returned.
  3. No assumptions about alignment beyond max_align_t. Pointers returned by mallocProc must satisfy the alignment requirements of any C type up to max_align_t (the guarantee malloc(3) gives). Lime never allocates over-aligned objects.
  4. Allocation sites are stack growth, token buffer growth, and the parser handle itself. Typical parsers allocate once at XxxAlloc time and then only occasionally, as the shift stack grows past %stack_size. Callers embedding Lime in memory-constrained contexts can set %stack_size to a static upper bound to avoid runtime growth.

This contract lets a Lime-driven parser hosted inside a language runtime (e.g. one with a memory-context-aware allocator and longjmp-based error handling) delegate allocation to that runtime without macro gymnastics.
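A counting allocator is a simple way to check the pairing rule (item 2) in a test harness. The sketch below wraps malloc(3); counting_malloc, counting_free and the counters are hypothetical names. Because mallocProc receives only a size and no context pointer, the bookkeeping has to live in globals or thread-locals.

```c
#include <stddef.h>
#include <stdlib.h>

static size_t demo_live_allocs = 0;  /* successful allocations not yet freed */
static size_t demo_total_allocs = 0; /* successful allocations ever made */

static void *counting_malloc(size_t n) {
    void *p = malloc(n); /* malloc(3) already satisfies max_align_t */
    if (p != NULL) {     /* only successful allocations are counted */
        demo_live_allocs++;
        demo_total_allocs++;
    }
    return p;
}

static void counting_free(void *p) {
    if (p != NULL)
        demo_live_allocs--;
    free(p);
}
```

A test would pass counting_malloc to XxxAlloc and counting_free to XxxFree, then expect demo_live_allocs to be back at its starting value once XxxFree returns.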


Thread Safety

Component                      Read                                    Write
ParserSnapshot                 Thread-safe (immutable after creation)  N/A (immutable)
lime_snapshot_acquire/release  Thread-safe (atomic refcount)           N/A
ParseContext                   Single-thread only                      Single-thread only
Tokenizer                      Single-thread only                      Single-thread only
TokenTable lookup              Lock-free (concurrent readers)          Write-locked
TokenTable add/remove          N/A                                     Acquires internal lock
ExtensionRegistry              Read-locked (concurrent)                Write-locked
JITMetrics                     Atomic reads                            Atomic updates

Key points:

  • Snapshots are safe to share across threads. Acquire a reference per thread.
  • Each ParseContext and Tokenizer is single-threaded. Create one per thread/task.
  • The TokenTable supports concurrent readers with lock-free lookups. Writes (adding/removing tokens) serialize internally.
  • JIT metrics use atomic operations for contention-free updates from multiple parser threads.
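The "acquire a reference per thread" rule is easiest to get right if the parent acquires on each worker's behalf before starting it, and the worker releases when done; if the worker acquired for itself, the parent could drop the last reference first. The sketch shows that handoff with POSIX threads and a hypothetical Shared type standing in for ParserSnapshot (with the real API the calls would be lime_snapshot_acquire() and lime_snapshot_release()). Older toolchains may need -pthread to link.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Shared {
    atomic_int refs;
    int payload;
} Shared;

static atomic_int demo_sum;   /* observable side effect of the workers */
static atomic_int demo_freed; /* set once the object is freed */

static Shared *shared_acquire(Shared *s) {
    atomic_fetch_add_explicit(&s->refs, 1, memory_order_relaxed);
    return s;
}

static void shared_release(Shared *s) {
    /* acq_rel: every thread's prior use happens-before the free */
    if (atomic_fetch_sub_explicit(&s->refs, 1, memory_order_acq_rel) == 1) {
        free(s);
        atomic_store(&demo_freed, 1);
    }
}

static void *worker(void *arg) {
    Shared *mine = arg;                         /* acquired by the parent */
    atomic_fetch_add(&demo_sum, mine->payload); /* read-only use */
    shared_release(mine);                       /* drop only this reference */
    return NULL;
}

/* Starts up to 8 workers, each holding its own reference. */
static int run_workers(Shared *s, int nthreads) {
    pthread_t tids[8];
    if (nthreads > 8) nthreads = 8;
    int started = 0;
    while (started < nthreads) {
        if (pthread_create(&tids[started], NULL, worker,
                           shared_acquire(s)) != 0) {
            shared_release(s); /* undo the reference no thread received */
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started;
}
```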

Usage Examples

Basic Tokenization

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "tokenize.h"
#include "token_table.h"
/* Prepare a padded input buffer: NUL terminator plus at least 32 extra
** readable bytes for SIMD (calloc zero-fills the padding) */
const char *sql = "SELECT id, name FROM users WHERE active = 1;";
size_t len = strlen(sql);
char *buf = calloc(1, len + 64);
if (buf == NULL) return 1;
memcpy(buf, sql, len);
/* Optional: set up keyword table (or pass NULL for identifier-only mode) */
TokenTable *keywords = create_token_table(64);
if (keywords == NULL) { free(buf); return 1; }
add_token(keywords, "SELECT", 100, 0);
add_token(keywords, "FROM", 101, 0);
add_token(keywords, "WHERE", 102, 0);
/* Tokenize */
Tokenizer *tok = tokenizer_create(keywords, buf, len);
Token t;
while (tok != NULL && tokenizer_next(tok, &t)) {
    printf("line %u col %u: type=%d '%.*s'\n",
           t.line, t.column, t.type, (int)t.length, t.start);
}
tokenizer_destroy(tok); /* NULL-safe */
destroy_token_table(keywords);
free(buf);

Snapshot Lifecycle

#include <stdio.h>
#include <stdlib.h>
#include "parser.h"
char *error = NULL;
ParserSnapshot *snap = lime_snapshot_create("sql.y", &error);
if (!snap) {
    fprintf(stderr, "Error: %s\n", error);
    free(error);
    return 1;
}
/* Share across threads: acquire one reference per thread before handing it off */
ParserSnapshot *ref = lime_snapshot_acquire(snap);
/* ... use ref in another thread ... */
lime_snapshot_release(ref);  /* Thread done */
lime_snapshot_release(snap); /* Original reference */

Parse Session

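#include <stdio.h>
#include "parser.h"
#include "parse_context.h"
#include "tokenize.h"
/* Assumes snap, tok and buf from the examples above */
ParseContext *ctx = parse_begin(snap);
if (ctx == NULL) return 1;
Token t;
/* Feed tokens from the tokenizer; tokenizer_next() returns false at EOF */
while (tokenizer_next(tok, &t)) {
    /* Token has no offset field, so derive the byte offset from the buffer.
    ** &t is reused each iteration: if the parser keeps token_value beyond
    ** this call, pass storage that outlives the parse instead. */
    int rc = parse_token(ctx, t.type, (void *)&t, (int)(t.start - buf));
    if (rc != 0) {
        fprintf(stderr, "Parse error at line %u col %u\n", t.line, t.column);
        break;
    }
}
parse_token(ctx, 0, NULL, LIME_LOC_UNKNOWN); /* Signal end-of-input */
parse_end(ctx);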

Extension Registration

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include "parser.h"
#include "extension.h"
/* Extension callback: provide modifications */
static bool my_get_mods(void *user_data,
                        const ParserSnapshot *base,
                        GrammarModification **mods_out,
                        uint32_t *nmods_out) {
    (void)user_data;
    (void)base;
    /* static so the array stays valid after the callback returns */
    static GrammarModification mods[1];
    mods[0].type = MOD_ADD_TOKEN;
    mods[0].description = "Add JSONB arrow operator token";
    mods[0].u.add_token.name = "TK_JSONB_ARROW";
    mods[0].u.add_token.lexeme = "->";
    mods[0].u.add_token.token_code = -1; /* auto-assign */
    *mods_out = mods;
    *nmods_out = 1;
    return true;
}
/* Register and load */
ExtensionRegistry *reg = create_extension_registry();
ExtensionInfo info = {
    .name = "jsonb_ops",
    .version = "1.0.0",
    .get_modifications = my_get_mods,
};
ExtensionID id;
char *error = NULL;
if (reg == NULL || !register_extension(reg, &info, &id)) {
    fprintf(stderr, "registration failed\n");
} else if (!load_extension(reg, id, snap, &error)) {
    fprintf(stderr, "load failed: %s\n", error ? error : "unknown error");
    free(error);
}

JIT Compilation with Policy

#include "parser.h"
#include "jit_policy.h"
/* Initialize metrics and policy */
JITMetrics metrics;
jit_metrics_init(&metrics);
JITPolicyConfig policy = jit_policy_default_config();
policy.min_parse_count = 100;
/* After each parse session, record metrics */
jit_metrics_record_parse(&metrics, elapsed_ns, lookup_count);
/* Check if JIT compilation should trigger (snapshot + config, per the prototype) */
int rc = jit_maybe_compile(snap, &policy);
/* rc == 0:  compilation triggered
** rc == 1:  not yet warranted
** rc == -1: error */
/* At shutdown */
jit_policy_shutdown();