This document covers the public C API for the extensible SQL parser library. All public symbols are declared in headers under include/.

Library Version

Header: include/parser.h

const char *lime_parser_version(void);

Returns the library version as a NUL-terminated string (e.g. "0.1.0"). The returned pointer is to static storage and must not be freed.

Snapshot API

Header: include/parser.h
Internal details: src/snapshot.h

A ParserSnapshot captures the complete state of a parser's tables at a point in time. Snapshots are reference-counted and immutable after creation. Multiple threads can share a snapshot safely by acquiring references.

Types

typedef struct ParserSnapshot ParserSnapshot; /* Opaque handle */

Functions

lime_snapshot_create

ParserSnapshot *lime_snapshot_create(const char *grammar_file, char **error);

Create a base snapshot by parsing a Lemon grammar file. On success, returns a snapshot with reference count 1 and sets *error to NULL. On failure, returns NULL and sets *error to a malloc'd error message that the caller must free().

Parameters:

grammar_file – Path to a .y grammar file.
error – Output pointer for error message on failure.

lime_snapshot_acquire

ParserSnapshot *lime_snapshot_acquire(ParserSnapshot *snap);

Increment the reference count on a snapshot. Returns the same pointer for convenience. The caller must eventually call lime_snapshot_release(). Passing NULL is safe and returns NULL.

lime_snapshot_release

void lime_snapshot_release(ParserSnapshot *snap);

Decrement the reference count. When it reaches zero, the snapshot and all memory it owns (grammar data, action tables, JIT context) are freed. Passing NULL is safe.
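
The acquire/release contract can be illustrated with a small stand-alone toy. This is a sketch only: ToySnap, toy_snap_create(), toy_snap_acquire() and toy_snap_release() are hypothetical stand-ins written with C11 atomics and are not the library's ParserSnapshot implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted handle; NOT ParserSnapshot. */
typedef struct ToySnap {
    atomic_uint_fast32_t refcount;
    int *freed_flag;   /* set to 1 on destruction, for demonstration only */
} ToySnap;

ToySnap *toy_snap_create(int *freed_flag) {
    ToySnap *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    atomic_init(&s->refcount, 1);      /* the creator owns the first reference */
    s->freed_flag = freed_flag;
    return s;
}

ToySnap *toy_snap_acquire(ToySnap *s) {
    if (s == NULL) return NULL;        /* NULL-safe, mirroring the documented contract */
    atomic_fetch_add_explicit(&s->refcount, 1, memory_order_relaxed);
    return s;                          /* same pointer, for convenience */
}

void toy_snap_release(ToySnap *s) {
    if (s == NULL) return;             /* NULL-safe */
    assert(atomic_load(&s->refcount) > 0);
    /* fetch_sub returns the previous value: 1 means this was the last reference */
    if (atomic_fetch_sub_explicit(&s->refcount, 1, memory_order_acq_rel) == 1) {
        if (s->freed_flag != NULL) *s->freed_flag = 1;
        free(s);
    }
}
```

In this toy, every successful create or acquire is balanced by exactly one release, and the object is destroyed only by whichever release drops the count to zero.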

Parse Context API

Header: include/parse_context.h

A ParseContext wraps a Lemon-generated parser with a pinned snapshot reference. Table lookups are indirected through the snapshot rather than compiled-in static arrays, enabling hot-swapping of parser tables when extensions modify the grammar.

Types

typedef struct ParseContext ParseContext; /* Opaque handle */

Functions

parse_begin

ParseContext *parse_begin(ParserSnapshot *snap);

Begin a new parse session pinned to snap. Acquires a reference to the snapshot that is held until parse_end() is called. Returns NULL on allocation failure. snap must not be NULL.

parse_token

int parse_token(ParseContext *ctx,
                int token_code,
                void *token_value,
                int location);

Feed one token to the parser.

Parameters:

ctx – Active parse context.
token_code – Integer token type (terminal symbol code). Pass 0 to signal end-of-input.
token_value – Opaque pointer to the semantic value. The layout is determined by the parser template's TOKENTYPE.
location – Byte offset of the token in the original source, or LIME_LOC_UNKNOWN (-1) if the grammar does not declare locations or the caller does not track positions (e.g. the synthetic end-of-input token). Currently accepted and stored; full propagation into reduce actions lands with the push-parser implementation that replaces the current parse_token() stub. Callers should thread real locations anyway so they are ready.

Returns: 0 on success, non-zero on error (syntax error or OOM).

See also: LIME_LOC_UNKNOWN.

parse_end

void parse_end(ParseContext *ctx);

End the parse session. Releases the pinned snapshot reference and frees all internal state. Passing NULL is safe.

parse_get_snapshot

ParserSnapshot *parse_get_snapshot(ParseContext *ctx);

Return the snapshot pinned by this context. Valid as long as the context is alive.

LIME_LOC_UNKNOWN

#define LIME_LOC_UNKNOWN (-1)

Sentinel value for the location argument of parse_token(). Pass this when the grammar does not declare locations, or when no meaningful byte offset can be attributed to the token (the synthetic end-of-input marker, runtime-injected tokens, etc.). Guaranteed to be -1 so that integer byte offsets (always >= 0) never collide with it.
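
One way a caller might derive the location argument is from a token's start pointer and the base of the source buffer. The sketch below assumes that approach; token_location() is a hypothetical helper, not part of the library, and the macro is redefined only so the sketch stands alone.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define LIME_LOC_UNKNOWN (-1)   /* value documented above */

/* Hypothetical helper: byte offset of `start` within the buffer at `base`
 * (length `len`), or LIME_LOC_UNKNOWN when no offset can be attributed.
 * Precondition: a non-NULL `start` points into the same buffer as `base`. */
int token_location(const char *base, size_t len, const char *start) {
    assert(base != NULL);
    if (start == NULL) return LIME_LOC_UNKNOWN;   /* e.g. a synthetic token */
    ptrdiff_t off = start - base;
    if (off < 0 || (size_t)off > len || off > INT_MAX) {
        return LIME_LOC_UNKNOWN;                  /* out of range for an int offset */
    }
    return (int)off;
}
```

Because valid offsets are always >= 0, the result can be passed straight to parse_token() without colliding with the sentinel.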

Snapshot-Indirected Table Access

These lower-level functions replace direct static array access in the generated parser. They are primarily used internally by the parse engine.

uint16_t snap_find_shift_action(const ParserSnapshot *snap,
                                uint16_t stateno, uint16_t iLookAhead);
uint16_t snap_find_reduce_action(const ParserSnapshot *snap,
                                 uint16_t stateno, uint16_t iLookAhead);
uint16_t snap_default_action(const ParserSnapshot *snap, uint16_t stateno);

Tokenizer API

Header: include/tokenize.h

The tokenizer converts SQL text into a stream of tokens using SIMD-accelerated character classification. It automatically selects the fastest available implementation (AVX2 on x86_64, NEON on ARM, or scalar fallback) at runtime.

Types

typedef struct Tokenizer Tokenizer;  /* Opaque handle */
 
typedef struct Token {
    int type;            /* Token type code (TK_* constant or keyword code) */
    const char *start;   /* Pointer into the source buffer */
    size_t length;       /* Length in bytes */
    uint32_t line;       /* 1-based line number */
    uint32_t column;     /* 1-based column number */
} Token;

Functions

tokenizer_create

Tokenizer *tokenizer_create(TokenTable *table, const char *input, size_t length);

Create a new tokenizer for the given input buffer.

Parameters:

table – Keyword lookup table for recognizing SQL keywords. Pass NULL for identifier-only mode (all identifiers return TK_IDENTIFIER).
input – NUL-terminated SQL input string. Must remain valid for the lifetime of the tokenizer.
length – Length of input in bytes, not including the NUL terminator. Important: The buffer must have at least 32 bytes of readable memory past the end (e.g., zero-padded) for SIMD safety.

Returns: A new tokenizer, or NULL on allocation failure.
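
The padding requirement on input can be met by copying the SQL text into a zero-filled buffer. The helper below is a minimal sketch; make_padded_input() and SIMD_PAD are hypothetical names, not library symbols.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIMD_PAD 32   /* readable bytes required past the end of the input */

/* Hypothetical helper: copy `len` bytes of `sql` into a fresh zero-filled
 * buffer with room for a NUL terminator plus SIMD_PAD bytes of padding.
 * The caller owns the result and releases it with free(). */
char *make_padded_input(const char *sql, size_t len) {
    assert(sql != NULL);
    if (len > SIZE_MAX - SIMD_PAD - 1) return NULL;   /* size overflow guard */
    char *buf = calloc(1, len + 1 + SIMD_PAD);        /* zeroed: NUL + padding */
    if (buf == NULL) return NULL;
    memcpy(buf, sql, len);
    return buf;
}
```

The returned buffer and the original len would then be passed to tokenizer_create(); the buffer must outlive the tokenizer.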

tokenizer_destroy

void tokenizer_destroy(Tokenizer *tok);

Destroy the tokenizer and free its memory. Passing NULL is safe.

tokenizer_next

bool tokenizer_next(Tokenizer *tok, Token *out);

Extract the next token from the input. Returns true if a token was produced, false at end-of-input. On false return, out->type is TK_EOF.

Comments (both -- single-line and /* */ block) are skipped automatically and never returned as tokens.

tokenizer_peek

bool tokenizer_peek(Tokenizer *tok, Token *out);

Peek at the next token without consuming it. Subsequent calls to tokenizer_peek() return the same token. The next call to tokenizer_next() consumes the peeked token.

tokenizer_position

size_t tokenizer_position(const Tokenizer *tok);

Return the current byte offset in the input.

tokenizer_line

uint32_t tokenizer_line(const Tokenizer *tok);

Return the current 1-based line number.

tokenizer_column

uint32_t tokenizer_column(const Tokenizer *tok);

Return the current 1-based column number.

SIMD Acceleration

The tokenizer uses SIMD instructions to accelerate three hot paths:

Whitespace skipping – Classifies 32 characters at a time to find the first non-whitespace character.
Identifier scanning – Uses the alpha+digit bitmask to find identifier boundaries in 32-byte chunks.
Number scanning – Uses the digit bitmask for bulk digit scanning in integer and float literals.

The SIMD implementation is selected automatically at runtime via get_classify_func() and requires no user configuration.

Token Table API

Header: include/token_table.h

The token table provides thread-safe keyword lookup using a hash table with RCU-style versioning. Readers are lock-free; writers acquire an internal write lock.

Types

typedef uint32_t ExtensionID;
 
typedef struct TokenDefinition {
    const char *lexeme;        /* Token string (e.g. "SELECT") */
    size_t lexeme_len;
    int token_code;            /* Numeric token ID */
    ExtensionID extension_id;  /* Which extension added it (0 = base) */
    uint32_t next_in_chain;    /* Internal: hash collision chain */
} TokenDefinition;
 
typedef struct TokenTable TokenTable;

Functions

create_token_table

TokenTable *create_token_table(uint32_t initial_capacity);

Create a new token table. initial_capacity is the initial number of slots in the hash table. Returns NULL on allocation failure.

destroy_token_table

void destroy_token_table(TokenTable *table);

Destroy the token table and free all memory.

lookup_token

int lookup_token(TokenTable *table, const char *str, size_t len);

Look up a token by its string value. This is lock-free for concurrent readers. Returns the token_code if found, or -1 if not found.

add_token

bool add_token(TokenTable *table, const char *lexeme, int token_code,
               ExtensionID ext_id);

Add a token to the table. Acquires the write lock internally. Returns true on success, false on failure (allocation error or duplicate).

remove_tokens_by_extension

bool remove_tokens_by_extension(TokenTable *table, ExtensionID ext_id);

Remove all tokens belonging to a given extension. Acquires the write lock and rebuilds hash chains. Returns true on success.

SIMD Character Classification API

Header: src/tokenize_simd.h

Low-level parallel character classification. Most users should use the Tokenizer API instead; this interface is for advanced users building custom scanners.

Types

typedef struct CharClassVector {
    uint32_t is_alpha_mask;  /* Bit i set if char i is [A-Za-z_] */
    uint32_t is_digit_mask;  /* Bit i set if char i is [0-9] */
    uint32_t is_space_mask;  /* Bit i set if char i is [ \t\n\r] */
} CharClassVector;
 
typedef CharClassVector (*ClassifyFunc)(const char *input, size_t offset);

Functions

get_classify_func

ClassifyFunc get_classify_func(void);

Return the best available classification function for the current CPU. On x86 the choice is made at runtime via CPUID feature detection; on ARM it is fixed at compile time.

Platform	CPU Feature	Function returned
x86_64	AVX2 present	`classify_simd_avx2` (32 chars)
ARM	NEON (baseline on AArch64)	`classify_simd_neon` (16 chars)
Any	Fallback	`classify_scalar` (32 chars)

classify_scalar

CharClassVector classify_scalar(const char *input, size_t offset);

Classify 32 characters starting at input + offset. Always available on every platform. The caller must ensure 32 bytes are readable from input + offset.
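
The bit layout of CharClassVector can be made concrete with a plain loop. The sketch below follows the character classes given in the struct comments; toy_classify_scalar() is a hypothetical illustration and may differ from the library's classify_scalar().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the Types section so the sketch compiles on its own. */
typedef struct CharClassVector {
    uint32_t is_alpha_mask;  /* Bit i set if char i is [A-Za-z_] */
    uint32_t is_digit_mask;  /* Bit i set if char i is [0-9] */
    uint32_t is_space_mask;  /* Bit i set if char i is [ \t\n\r] */
} CharClassVector;

/* Hypothetical scalar classifier: reads exactly 32 bytes from input + offset. */
CharClassVector toy_classify_scalar(const char *input, size_t offset) {
    assert(input != NULL);
    CharClassVector v = {0, 0, 0};
    const unsigned char *p = (const unsigned char *)input + offset;
    for (unsigned i = 0; i < 32; i++) {
        unsigned char c = p[i];
        uint32_t bit = (uint32_t)1 << i;   /* bit i describes character i */
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
            v.is_alpha_mask |= bit;
        if (c >= '0' && c <= '9')
            v.is_digit_mask |= bit;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
            v.is_space_mask |= bit;
    }
    return v;
}
```

Zero padding bytes fall into none of the three classes, so a padded buffer yields clear bits past the end of the input.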

classify_simd_avx2 (x86_64 only)

CharClassVector classify_simd_avx2(const char *input, size_t offset);

AVX2 implementation. Classifies 32 characters in parallel using 256-bit SIMD registers. Only callable on CPUs with AVX2 support – use get_classify_func() for safe dispatch.

classify_simd_neon (ARM only)

CharClassVector classify_simd_neon(const char *input, size_t offset);

NEON implementation. Classifies 16 characters in parallel. Only the lower 16 bits of each mask field are meaningful.
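
A custom scanner consuming these masks has to respect the lane count, since only the lower 16 bits are meaningful for NEON. The sketch below shows one way to find the first non-whitespace character in a chunk; first_non_space() is hypothetical, and it assumes the caller knows whether the classifier it obtained covers 16 or 32 characters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the first non-whitespace lane in [0, lanes),
 * or `lanes` if every lane in the chunk is whitespace. */
unsigned first_non_space(uint32_t is_space_mask, unsigned lanes) {
    assert(lanes == 16 || lanes == 32);
    uint32_t valid = (lanes == 32) ? UINT32_MAX : (((uint32_t)1 << lanes) - 1);
    uint32_t non_space = ~is_space_mask & valid;   /* drop bits above the lane count */
    if (non_space == 0) return lanes;              /* whole chunk is whitespace */
    unsigned i = 0;
    while (((non_space >> i) & 1u) == 0) i++;      /* portable count-trailing-zeros */
    return i;
}
```

A whitespace-skipping loop would advance by the returned value and fetch the next chunk whenever the result equals lanes.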

Extension API

Headers: include/parser.h (public entry points), src/extension.h (internal)

Extensions add grammar modifications (new tokens, rules, precedence changes) to the parser at runtime. Each extension is managed through a thread-safe registry.

High-Level API (parser.h)

bool lime_extension_registry_init(void);

void lime_extension_registry_destroy(void);

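lime_extension_registry_init() initializes the global extension registry and must be called before any extension operation. lime_extension_registry_destroy() tears the registry down and must be called after the last extension operation.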

Internal Registry API (src/extension.h)

Types

typedef uint32_t ExtensionID;
 
typedef enum ExtensionState {
    EXT_REGISTERED,  /* Registered but not loaded */
    EXT_LOADED,      /* Active, modifications applied */
    EXT_UNLOADED,    /* Was loaded, now removed */
    EXT_ERROR,       /* Failed to load */
} ExtensionState;

ExtensionInfo – Input to register_extension():

typedef struct ExtensionInfo {
    const char *name;
    const char *version;
    ExtGetModificationsFn get_modifications;  /* Required */
    ExtOnConflictFn on_conflict;              /* Optional */
    ExtOnUnloadFn on_unload;                  /* Optional */
    void *user_data;
} ExtensionInfo;

Extension Callbacks

/* Called on load to get grammar modifications */
typedef bool (*ExtGetModificationsFn)(
    void *user_data,
    const ParserSnapshot *base_snapshot,
    GrammarModification **mods_out,
    uint32_t *nmods_out);
 
/* Called when two extensions conflict */
typedef ConflictResolution (*ExtOnConflictFn)(
    void *user_data,
    const ConflictInfo *info);
 
/* Called on unload for cleanup */
typedef void (*ExtOnUnloadFn)(void *user_data);

Registry Functions

ExtensionRegistry *create_extension_registry(void);
void destroy_extension_registry(ExtensionRegistry *reg);
 
bool register_extension(ExtensionRegistry *reg,
                        const ExtensionInfo *info,
                        ExtensionID *id_out);
 
bool load_extension(ExtensionRegistry *reg,
                    ExtensionID id,
                    const ParserSnapshot *base_snapshot,
                    char **error);
 
bool unload_extension(ExtensionRegistry *reg, ExtensionID id);
 
const Extension *find_extension(ExtensionRegistry *reg, ExtensionID id);
 
uint32_t get_loaded_extension_count(ExtensionRegistry *reg);

Grammar Modification Types

typedef enum GrammarModType {
    MOD_ADD_RULE,
    MOD_ADD_TOKEN,
    MOD_MODIFY_PRECEDENCE,
    MOD_ADD_TYPE,
    MOD_REMOVE_RULE,
} GrammarModType;

Each modification is a GrammarModification struct with a type field and a tagged union u containing the type-specific payload.

Type	Union Field	Purpose
`MOD_ADD_RULE`	`u.add_rule`	Add a new production rule
`MOD_ADD_TOKEN`	`u.add_token`	Add a new terminal token
`MOD_MODIFY_PRECEDENCE`	`u.modify_prec`	Change symbol precedence
`MOD_ADD_TYPE`	`u.add_type`	Add a non-terminal type
`MOD_REMOVE_RULE`	`u.remove_rule`	Remove an existing rule

MOD_ADD_RULE reduce actions

u.add_rule carries two fields for the rule's reduction action:

typedef void (*LimeReduceFn)(
    void       *user_data,    /* opaque, from .reduce_user           */
    void       *extra_arg,    /* grammar's %extra_argument, or NULL  */
    int         nrhs,         /* count of RHS symbols in this rule    */
    const void *rhs_values,   /* array of nrhs %token_type payloads   */
    const int  *rhs_locs,     /* array of nrhs byte offsets, or NULL  */
    void       *lhs_out       /* writeback slot for the LHS value     */
);
 
struct {
    /* ... lhs / rhs / nrhs / precedence fields ... */
    LimeReduceFn  reduce;      /* runtime-dispatched action, or NULL */
    void         *reduce_user; /* opaque pointer passed to reduce()   */
    const char   *code;        /* generator-time C code, or NULL      */
} add_rule;

Precedence of the two action-source fields:

`reduce`	`code`	Behaviour
non-NULL	any	Parser invokes `reduce(reduce_user, ...)` at reduce time.
NULL	non-NULL	`code` is compiled into the parser's generated `reduce()` switch at generator time. Applicable to grammars fed through `lime`; not usable from extensions loaded into a pre-compiled parser.
NULL	NULL	Rule reduces with no action.

Current implementation status: reduce-based dispatch is not yet wired through to the push-parser stack; it is blocked on the runtime rebuild work. The types are stable, so extension code written against this contract today will not need changes when dispatch is enabled.
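
As an illustration of the LimeReduceFn contract, the sketch below implements an action for a hypothetical rule of the shape expr ::= expr PLUS expr. It assumes the grammar's %token_type is int and that user_data points to an unsigned call counter; both are assumptions made for the example, and reduce_add() is not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the typedef above so the sketch compiles on its own. */
typedef void (*LimeReduceFn)(void *user_data, void *extra_arg, int nrhs,
                             const void *rhs_values, const int *rhs_locs,
                             void *lhs_out);

/* Hypothetical reduce action, ASSUMING %token_type is int. */
void reduce_add(void *user_data, void *extra_arg, int nrhs,
                const void *rhs_values, const int *rhs_locs, void *lhs_out) {
    (void)extra_arg;   /* grammar has no %extra_argument in this example */
    (void)rhs_locs;    /* may be NULL when locations are not tracked */
    assert(nrhs == 3 && rhs_values != NULL && lhs_out != NULL);
    const int *rhs = rhs_values;              /* rhs[1] is the PLUS token itself */
    *(int *)lhs_out = rhs[0] + rhs[2];        /* write the LHS value back */
    if (user_data != NULL) ++*(unsigned *)user_data;   /* count invocations */
}
```

An extension would store the function in .reduce and the counter's address in .reduce_user of u.add_rule; until dispatch is wired up, the callback can still be exercised directly in unit tests.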

Conflict Resolution

When two extensions modify the same grammar element, the on_conflict callback is invoked:

typedef enum ConflictResolution {
    CONFLICT_UNRESOLVED,    /* No resolution provided */
    CONFLICT_KEEP_EXISTING, /* Keep the existing item */
    CONFLICT_USE_NEW,       /* Replace with new item */
    CONFLICT_MERGE,         /* Extension provides merged result */
} ConflictResolution;

Modification Serializer

Header: src/mod_serialize.h

char *lime_modifications_to_grammar_text(
    const GrammarModification *mods,
    uint32_t                   nmods,
    uint32_t                  *skipped_out,  /* may be NULL */
    char                     **error         /* may be NULL */
);

Render an array of GrammarModifications as .lime-syntax text that, when concatenated after a base grammar and re-parsed by the lime generator, produces a parser equivalent to applying the modifications. This is the intended mechanism for the "subprocess fallback" pattern that unblocks runtime extension validation while real in-process apply_add_rule() (Task #3) is pending.

Returns: malloc'd NUL-terminated buffer; NULL on allocation failure or bad arguments. Caller owns the buffer.

Round-trip fidelity – not every modification serializes cleanly:

Case	Behaviour
`MOD_ADD_RULE` with `.reduce != NULL` and `.code == NULL`	Skipped; counted in `*skipped_out`. A function pointer has no text form.
`MOD_REMOVE_RULE`	Always skipped; concat cannot express removal. Filter the base grammar text if removals must take effect.
`MOD_MODIFY_PRECEDENCE` with `new_assoc == 0`	Emitted as a comment (no single `.lime` directive expresses "no associativity").
Integer `.precedence` on `MOD_ADD_RULE`	Emitted as a `/* NOTE */` comment; `.lime` uses `[SYMBOL]` markers, not numbers.

Typical subprocess-fallback usage:

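uint32_t skipped = 0;
char *err = NULL;
char *fragment = lime_modifications_to_grammar_text(
    mods, nmods, &skipped, &err);
if (fragment == NULL) {
    /* guard in case no message was set */
    fprintf(stderr, "serialization failed: %s\n", err ? err : "(no detail)");
    free(err);
    return -1;
}
if (skipped > 0) {
    fprintf(stderr, "warning: %u modification(s) not serialized\n",
            (unsigned)skipped);
}

/* concat base grammar text + fragment, write to tempfile,
** spawn `lime`, compile the output, dlopen the result */
FILE *tmp = fopen(tmp_path, "w");  /* tmp_path: caller-chosen file name; avoids
                                   ** clashing with stdio's tmpfile() */
if (tmp == NULL) {
    free(fragment);
    return -1;
}
fputs(base_grammar_text, tmp);
fputs(fragment, tmp);
fclose(tmp);
free(fragment);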

JIT Compilation API

Header: include/jit_context.h

Optional LLVM-based JIT compilation of parser action tables. When LLVM is available, the JIT compiles specialized lookup functions for each parser state, replacing table-driven lookups with direct branch sequences.

When compiled without LLVM (LIME_NO_JIT), all JIT functions degrade to no-ops.

Types

typedef struct JITContext JITContext;  /* Opaque handle */
 
typedef uint16_t (*JITShiftActionFn)(uint16_t iLookAhead);
 
typedef enum JITStatus {
    JIT_OK = 0,
    JIT_ERR_NO_LLVM,
    JIT_ERR_INIT_FAILED,
    JIT_ERR_CODEGEN_FAILED,
    JIT_ERR_COMPILE_FAILED,
    JIT_ERR_LOOKUP_FAILED,
    JIT_ERR_INVALID_ARG,
    JIT_ERR_ALREADY_COMPILED,
} JITStatus;
 
typedef struct JITStats {
    uint32_t states_compiled;
    uint32_t states_total;
    uint64_t compile_time_ns;
    uint64_t code_size_bytes;
    bool     available;
} JITStats;

Functions

High-Level (parser.h)

bool lime_jit_available(void);

int lime_jit_compile(ParserSnapshot *snap);

lime_jit_available() returns true if LLVM was linked and initialization succeeds.

lime_jit_compile() compiles and attaches JIT code to a snapshot. Returns 0 on success, non-zero on failure. No-op if already compiled or LLVM is unavailable.

Low-Level (jit_context.h)

JITStatus jit_create(JITContext **ctx_out);
void      jit_destroy(JITContext *ctx);
 
JITStatus jit_compile_snapshot(JITContext *ctx, const ParserSnapshot *snap);
 
JITShiftActionFn jit_get_shift_action(const JITContext *ctx, uint32_t state_id);
 
JITStats    jit_get_stats(const JITContext *ctx);
const char *jit_status_string(JITStatus status);
bool        jit_is_available(void);

Snapshot Integration

JITStatus jit_attach_to_snapshot(ParserSnapshot *snap);
void      jit_detach_from_snapshot(ParserSnapshot *snap);
 
uint16_t jit_find_shift_action(const ParserSnapshot *snap,
                                uint16_t stateno,
                                uint16_t iLookAhead);

jit_find_shift_action() is the primary runtime dispatch function. If the snapshot has JIT code for the given state, it uses the compiled path; otherwise it falls back to the table-driven lookup.

JIT Policy API

Header: include/jit_policy.h

Adaptive JIT compilation policy that decides when to compile based on runtime metrics. Tracks per-snapshot usage and triggers compilation when the expected benefit exceeds the cost.

Types

typedef struct JITMetrics {
    atomic_uint_fast64_t parse_count;
    atomic_uint_fast64_t total_parse_time_ns;
    atomic_uint_fast64_t action_lookup_count;
    atomic_int           is_jitted;
    atomic_int           jit_in_progress;
} JITMetrics;
 
typedef struct JITPolicyConfig {
    uint64_t min_parse_count;             /* Default: 50 */
    uint64_t min_total_parse_time_ns;     /* Default: 10,000,000 (10 ms) */
    uint64_t min_avg_lookups_per_parse;   /* Default: 100 */
    bool     background_compile;          /* Default: true */
} JITPolicyConfig;

Functions

JITPolicyConfig jit_policy_default_config(void);
 
void jit_metrics_init(JITMetrics *m);
 
void jit_metrics_record_parse(JITMetrics *m,
                              uint64_t parse_time_ns,
                              uint64_t action_lookups);
 
bool jit_should_compile(const JITMetrics *m, const JITPolicyConfig *config);
 
int jit_maybe_compile(ParserSnapshot *snap,
                      JITMetrics *m,
                      const JITPolicyConfig *config);
 
void jit_policy_shutdown(void);

jit_maybe_compile() returns 0 if compilation was triggered, 1 if metrics do not yet warrant compilation, or -1 on error. When background_compile is true, compilation happens on a detached thread.

Data Structures Reference

ParserSnapshot (src/snapshot.h)

Field	Type	Description
`version`	`uint64_t`	Monotonically increasing version number
`refcount`	`atomic_uint_fast32_t`	Reference count (starts at 1)
`symbols`	`struct symbol **`	Array of symbol structs
`nsymbol`	`uint32_t`	Total symbol count
`nterminal`	`uint32_t`	Terminal symbol count
`rules`	`struct rule *`	Linked list of production rules
`nrule`	`uint32_t`	Rule count
`states`	`struct state **`	Array of parser states
`nstate`	`uint32_t`	State count
`yy_action`	`uint16_t *`	Combined shift+reduce action array
`yy_lookahead`	`uint16_t *`	Parallel lookahead values
`yy_shift_ofst`	`int16_t *`	Per-state shift offset
`yy_reduce_ofst`	`int16_t *`	Per-state reduce offset
`yy_default`	`uint16_t *`	Default action per state
`create_time_ns`	`uint64_t`	Creation timestamp (nanoseconds)
`jit_ctx`	`void *`	Attached JIT context (or NULL)

Token Type Codes

Defined in include/tokenize.h. Keyword tokens use positive codes assigned via the TokenTable. Built-in token types use non-positive values:

Constant	Value	Description
`TK_EOF`	0	End of input
`TK_IDENTIFIER`	-1	Unrecognized identifier
`TK_INTEGER`	-2	Integer literal (decimal or hex)
`TK_FLOAT`	-3	Floating point literal
`TK_STRING`	-4	Single-quoted string literal
`TK_BLOB`	-5	Blob literal (`X'...'`)
`TK_LPAREN`	-6	`(`
`TK_RPAREN`	-7	`)`
`TK_SEMICOLON`	-8	`;`
`TK_COMMA`	-9	`,`
`TK_DOT`	-10	`.`
`TK_STAR`	-11	`*`
`TK_PLUS`	-12	`+`
`TK_MINUS`	-13	`-`
`TK_SLASH`	-14	`/`
`TK_PERCENT`	-15	`%`
`TK_EQ`	-16	`=` or `==`
`TK_NE`	-17	`!=` or `<>`
`TK_LT`	-18	`<`
`TK_GT`	-19	`>`
`TK_LE`	-20	`<=`
`TK_GE`	-21	`>=`
`TK_BITAND`	-22	`&`
`TK_BITOR`	-23	`|`
`TK_BITNOT`	-24	`~`
`TK_LSHIFT`	-25	`<<`
`TK_RSHIFT`	-26	`>>`
`TK_CONCAT`	-27	`||`
`TK_DQUOTE_ID`	-28	`"quoted identifier"`
`TK_BACKTICK_ID`	-29	`` `backtick identifier` ``
`TK_BRACKET_ID`	-30	`[bracket identifier]`
`TK_ILLEGAL`	-31	Unrecognized character

Error Handling Conventions

The library uses the following conventions for error reporting:

NULL return – Functions that create objects (tokenizer_create, parse_begin, create_token_table, lime_snapshot_create) return NULL on failure.
Boolean return – Functions that perform operations (add_token, register_extension, load_extension) return false on failure.
Error string – Functions with a char **error parameter set it to a malloc'd string on failure. The caller must free() this string.
Status codes – JIT functions return JITStatus enum values. Use jit_status_string() to convert to a human-readable message.
NULL-safe – Destroy/release functions (tokenizer_destroy, parse_end, lime_snapshot_release) accept NULL safely.

Allocator Contract

Lime's generated parsers accept a caller-supplied allocator via XxxAlloc(void *(*mallocProc)(size_t)) (where Xxx is the parser-name prefix set by name or -P). The matching XxxFree(void *, void (*freeProc)(void*)) uses the caller's free. This is strictly better than Bison's YYMALLOC/YYFREE macro hack: the allocator is passed as a first-class argument rather than baked in at compile time.

The contract the generator relies on:

Error semantics are caller-chosen. mallocProc may return NULL on failure, or it may never return (longjmp / throw). If mallocProc returns NULL, the parser enters a failure path and subsequent Parse() calls are no-ops. If it longjmps out, the parser's internal state is left in whatever condition the jump leaves it; the caller must not reuse that parser instance without calling XxxFree first.
Pairing is symmetric. freeProc is called exactly as many times as mallocProc succeeded – one call per successful allocation – and always on the pointers mallocProc returned.
No assumptions about alignment beyond max_align_t. Pointers returned by mallocProc must satisfy the alignment requirements of any C type up to max_align_t (the guarantee malloc(3) gives). Lime never allocates over-aligned objects.
Allocation sites are stack growth, token buffer growth, and the parser handle itself. Typical parsers allocate once at XxxAlloc time and then occasionally as the shift stack grows past stack_size. Callers embedding Lime in memory-constrained contexts can set stack_size to a static upper bound to avoid runtime growth.

This contract lets a Lime-driven parser hosted inside a language runtime (e.g. one with a memory-context-aware allocator and longjmp-based error handling) delegate allocation to that runtime without macro gymnastics.
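
The symmetric-pairing clause can be checked with a counting allocator whose functions match the mallocProc and freeProc signatures. This is a minimal sketch: counting_malloc(), counting_free() and counting_live() are hypothetical helpers, and the counter is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t live_allocs = 0;   /* successful allocations not yet freed */

/* Matches void *(*mallocProc)(size_t). */
void *counting_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_allocs++;   /* count only successful allocations */
    return p;
}

/* Matches void (*freeProc)(void *). */
void counting_free(void *p) {
    if (p == NULL) return;
    assert(live_allocs > 0);        /* a free without a matching malloc is a bug */
    live_allocs--;
    free(p);
}

size_t counting_live(void) { return live_allocs; }
```

Passing counting_malloc to XxxAlloc and counting_free to XxxFree, then comparing counting_live() before and after a parse session, gives a cheap leak check: if the contract holds, the two values are equal.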

Thread Safety

Component	Read	Write
`ParserSnapshot`	Thread-safe (immutable after creation)	N/A (immutable)
`lime_snapshot_acquire/release`	Thread-safe (atomic refcount)	N/A
`ParseContext`	Single-thread only	Single-thread only
`Tokenizer`	Single-thread only	Single-thread only
`TokenTable` lookup	Lock-free (concurrent readers)	Write-locked
`TokenTable` add/remove	N/A	Acquires internal lock
`ExtensionRegistry`	Read-locked (concurrent)	Write-locked
`JITMetrics`	Atomic reads	Atomic updates

Key points:

Snapshots are safe to share across threads. Acquire a reference per thread.
Each ParseContext and Tokenizer is single-threaded. Create one per thread/task.
The TokenTable supports concurrent readers with lock-free lookups. Writes (adding/removing tokens) serialize internally.
JIT metrics use atomic operations for contention-free updates from multiple parser threads.

Usage Examples

Basic Tokenization

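#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "tokenize.h"
#include "token_table.h"

/* Prepare a zero-padded input buffer (at least 32 extra bytes for SIMD;
** 64 are allocated here for headroom) */
const char *sql = "SELECT id, name FROM users WHERE active = 1;";
size_t len = strlen(sql);
char *buf = calloc(1, len + 64);
if (buf == NULL) return 1;
memcpy(buf, sql, len);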
 
/* Optional: set up keyword table */
TokenTable *keywords = create_token_table(64);
add_token(keywords, "SELECT", 100, 0);
add_token(keywords, "FROM",   101, 0);
add_token(keywords, "WHERE",  102, 0);
 
/* Tokenize */
Tokenizer *tok = tokenizer_create(keywords, buf, len);
Token t;
while (tokenizer_next(tok, &t)) {
    printf("line %u col %u: type=%d '%.*s'\n",
           t.line, t.column, t.type, (int)t.length, t.start);
}
 
tokenizer_destroy(tok);
destroy_token_table(keywords);
free(buf);
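The padding step in the example above could be wrapped in a small helper. The sketch below assumes a 32-byte pad, taken from the comment in the example. DEMO_SIMD_PAD and demo_padded_copy are hypothetical names, not part of the tokenizer API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed padding size; use whatever the tokenizer actually requires. */
#define DEMO_SIMD_PAD 32

/* Copy len bytes of src into a fresh buffer followed by DEMO_SIMD_PAD
** zero bytes. Returns NULL on allocation failure or size overflow.
** The caller owns the result and must free() it. */
static char *demo_padded_copy(const char *src, size_t len) {
    if (len > SIZE_MAX - DEMO_SIMD_PAD) return NULL;
    char *buf = calloc(1, len + DEMO_SIMD_PAD);
    if (!buf) return NULL;
    memcpy(buf, src, len);
    return buf;
}
```

The zeroed tail means a vectorized scan that reads past the logical end of the input sees NUL bytes instead of unowned memory.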

Snapshot Lifecycle

#include <stdio.h>
#include <stdlib.h>
#include "parser.h"
 
char *error = NULL;
ParserSnapshot *snap = lime_snapshot_create("sql.y", &error);
if (!snap) {
    fprintf(stderr, "Error: %s\n", error);
    free(error);
    return 1;
}
 
/* Share across threads */
ParserSnapshot *ref = lime_snapshot_acquire(snap);
 
/* ... use ref in another thread ... */
 
lime_snapshot_release(ref);   /* Thread done */
lime_snapshot_release(snap);  /* Original reference */

Parse Session

#include <stdio.h>
#include "parser.h"
#include "parse_context.h"
#include "tokenize.h"
 
/* snap and tok come from the earlier examples */
ParseContext *ctx = parse_begin(snap);
 
/* Feed tokens from the tokenizer */
Token t;
while (tokenizer_next(tok, &t)) {
    if (t.type == TK_EOF) break;
    int rc = parse_token(ctx, t.type, (void *)&t, (int)t.offset);
    if (rc != 0) {
        fprintf(stderr, "Parse error at line %u col %u\n", t.line, t.column);
        break;
    }
}
parse_token(ctx, 0, NULL, LIME_LOC_UNKNOWN);  /* Signal end-of-input */
 
parse_end(ctx);

Extension Registration

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "parser.h"
#include "extension.h"
 
/* Extension callback: provide modifications */
static bool my_get_mods(void *user_data,
                        const ParserSnapshot *base,
                        GrammarModification **mods_out,
                        uint32_t *nmods_out) {
    (void)user_data;  /* unused in this example */
    (void)base;       /* unused in this example */
    /* Static storage: the array must outlive this call */
    static GrammarModification mods[1];
    mods[0].type = MOD_ADD_TOKEN;
    mods[0].description = "Add JSONB arrow operator token";
    mods[0].u.add_token.name = "TK_JSONB_ARROW";
    mods[0].u.add_token.lexeme = "->";
    mods[0].u.add_token.token_code = -1;  /* auto-assign */
    *mods_out = mods;
    *nmods_out = 1;
    return true;
}
 
/* Register and load */
ExtensionRegistry *reg = create_extension_registry();
ExtensionInfo info = {
    .name = "jsonb_ops",
    .version = "1.0.0",
    .get_modifications = my_get_mods,
};
ExtensionID id;
register_extension(reg, &info, &id);
 
char *error = NULL;
load_extension(reg, id, snap, &error);
if (error) {
    /* Assumes the lime_snapshot_create convention: a malloc'd message */
    fprintf(stderr, "Extension load failed: %s\n", error);
    free(error);
}

JIT Compilation with Policy

#include "parser.h"
#include "jit_policy.h"
 
/* Initialize metrics and policy */
JITMetrics metrics;
jit_metrics_init(&metrics);
JITPolicyConfig policy = jit_policy_default_config();
policy.min_parse_count = 100;
 
/* After each parse session, record metrics.
** elapsed_ns and lookup_count are measured by the caller. */
jit_metrics_record_parse(&metrics, elapsed_ns, lookup_count);
 
/* Check if JIT compilation should trigger */
int rc = jit_maybe_compile(snap, &metrics, &policy);
/* rc == 0: compilation triggered
** rc == 1: not yet warranted
** rc == -1: error */
 
/* At shutdown */
jit_policy_shutdown();
