std.regex

Compile-once regular expression matching for a documented ECMA-262-leaning subset.

When To Use std.regex

In Zerolang, use std.regex to match text against a documented, ECMA-262-leaning subset of regular expression syntax, for example when checking JSON Schema pattern constraints.

Supported syntax: literals, ., character classes with negation, ranges, and \d \D \w \W \s \S, anchors ^ $ and word boundaries \b \B, greedy quantifiers * + ? {m} {m,} {m,n}, alternation |, and capturing or (?:...) non-capturing groups (matching only; no capture extraction).

Matching is by Unicode codepoint over UTF-8 text and searches anywhere in the text unless the pattern is anchored, like ECMAScript RegExp.prototype.test. When multiple matches start at the same byte, span-returning helpers use the longest end position, so a|ab finds ab in ab.
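The longest-end rule differs from backtracking engines, which stop at the first alternative that succeeds. The following Python sketch models that difference; it is not the std.regex implementation, and leftmost_longest is a hypothetical helper written only for illustration. It reports codepoint indexes rather than byte offsets (the two coincide for ASCII text), and because it truncates the text to test each candidate end, it is only meaningful for patterns without anchors or word boundaries.

```python
import re


def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost match, preferring the longest end.

    Hypothetical model of the documented rule, using brute force.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate end first, so the first hit is the longest.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return start, end
    return None


# Longest end at the leftmost start: the whole of "ab".
print(leftmost_longest("a|ab", "ab"))   # (0, 2)
# Python's own backtracking search stops at the first alternative.
print(re.search("a|ab", "ab").span())   # (0, 1)
```

Under this model the order of alternatives does not affect which span is returned, which is the property the a|ab example describes.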

Unsupported constructs never misparse silently. Compilation fails with a structured status code: 1 backreference, 2 lookahead, 3 lookbehind, 4 named group, 5 lazy quantifier, 6 group modifier or inline flags, 7 unicode property escape, 8 invalid syntax, 9 invalid quantifier range, 10 program over the buffer or 2048-byte limit, 11 pattern is not valid UTF-8, 12 group nesting over depth 32.

Runnable today:

| API | Return | Notes |
| --- | --- | --- |
| std.regex.compile(buffer, pattern) | Maybe<Span<u8>> | Compiles a pattern into a caller-owned buffer; returns the compiled program span or null on any compile failure. |
| std.regex.compileStatus(buffer, pattern) | u32 | Compiles and returns 0 or the structured status code for diagnostics. |
| std.regex.compileErrorOffset(buffer, pattern) | Maybe<usize> | Returns the pattern byte offset for a compile failure, or null when the pattern compiles. |
| std.regex.statusName(status) | String | Names a status code, such as unsupported backreference. |
| std.regex.isMatch(program, text) | Bool | Tests text against a compiled program. Compile once, then match many times. |
| std.regex.matches(pattern, text) | Maybe<Bool> | One-shot compile and match with an internal 1024-byte program buffer; returns null when the pattern does not compile. |
| std.regex.contains(pattern, text) | Maybe<Bool> | Alias-shaped one-shot search helper; returns null when the pattern does not compile. |
| std.regex.findIndex(pattern, text) | Maybe<usize> | Returns the first matching byte index, the input length when absent, or null when the pattern does not compile. |
| std.regex.find(pattern, text) | Maybe<Span<u8>> | Borrows the first matching span, or returns null when absent or invalid. |
| std.regex.findCount(pattern, text) | Maybe<usize> | Counts non-overlapping matches, or returns null when the pattern does not compile. |
| std.regex.findNth(pattern, text, index) | Maybe<Span<u8>> | Borrows the zero-based non-overlapping match at index, or returns null when absent or invalid. |
| std.regex.findNthIndex(pattern, text, index) | Maybe<usize> | Returns the byte index of the zero-based non-overlapping match, the input length when absent, or null when invalid. |
| std.regex.replace(buffer, pattern, text, replacement) | Maybe<Span<u8>> | Replaces non-overlapping matches with literal replacement bytes into caller storage. |
| std.regex.splitCount(pattern, text) | Maybe<usize> | Counts fields separated by non-empty regex matches, or returns null when the pattern does not compile. |
| std.regex.split(pattern, text, index) | Maybe<Span<u8>> | Borrows the zero-based field separated by non-empty regex matches, or returns null when absent or invalid. |
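The index helpers report three distinct outcomes through one return value: null for a pattern that does not compile, the input length when nothing matches, and a smaller byte index for a match. The following Python sketch models only that result shape. find_index and describe are hypothetical names, Python's re module stands in for the matcher and accepts constructs that std.regex rejects, and None stands in for null.

```python
import re
from typing import Optional


def find_index(pattern: str, text: bytes) -> Optional[int]:
    """Hypothetical model of the findIndex result shape over bytes."""
    try:
        compiled = re.compile(pattern.encode("utf-8"))
    except re.error:
        return None  # models "pattern does not compile"
    match = compiled.search(text)
    # The input length is the sentinel for "no match".
    return match.start() if match else len(text)


def describe(pattern: str, text: bytes) -> str:
    """Separate the three outcomes a caller has to handle."""
    index = find_index(pattern, text)
    if index is None:
        return "invalid pattern"
    if index == len(text):
        return "no match"
    return f"match at byte {index}"


print(describe(r"\d+", b"build-2048"))  # match at byte 6
print(describe(r"\d+", b"abc"))         # no match
print(describe("(", b"abc"))            # invalid pattern
```

A pattern that can match the empty string at the end of the input would be indistinguishable from "no match" under this model; the table above does not say how std.regex treats that case.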

Example

    pub fn main(world: World) -> Void raises {
        var storage: [512]u8 = [0; 512]
        let buffer: MutSpan<u8> = storage
        let compiled: Maybe<Span<u8>> = std.regex.compile(buffer, "^[a-z]+-\\d{2,4}$")
        if !compiled.has {
            return
        }
        let program: Span<u8> = compiled.value
        let quick: Maybe<Bool> = std.regex.matches("^(cat|dog)s?$", "dogs")
        let first: Maybe<Span<u8>> = std.regex.find("\\d+", "build-2048")
        let second: Maybe<Span<u8>> = std.regex.findNth("\\d+", "a1 b22 c333", 1)
        var replaced_storage: [16]u8 = [0; 16]
        let replaced: Maybe<Span<u8>> = std.regex.replace(replaced_storage, "\\d+", "a1 b22", "#")
        let fields: Maybe<usize> = std.regex.splitCount("[,;]", "red,green;blue")
        let middle: Maybe<Span<u8>> = std.regex.split("[,;]", "red,green;blue", 1)
        if std.regex.isMatch(program, "build-2048") && !std.regex.isMatch(program, "build-1") && (quick.has && quick.value) && first.has && std.mem.eql(first.value, "2048") && second.has && std.mem.eql(second.value, "22") && replaced.has && std.mem.eql(replaced.value, "a# b#") && fields.has && fields.value == 3 && middle.has && std.mem.eql(middle.value, "green") {
            check world.out.write("regex ok\n")
        }
    }

Diagnosing a rejected pattern:

    pub fn main(world: World) -> Void raises {
        var storage: [128]u8 = [0; 128]
        let buffer: MutSpan<u8> = storage
        let status: u32 = std.regex.compileStatus(buffer, "(?=lookahead)")
        let offset: Maybe<usize> = std.regex.compileErrorOffset(buffer, "(?=lookahead)")
        if status != 0 {
            check world.out.write(std.regex.statusName(status))
            check world.out.write("\n")
        }
    }

Effects: writes to caller-provided mutable storage for compile, compileStatus, compileErrorOffset, and replace; other helpers only borrow input spans or return scalar results.

Allocation behavior: compile, compileStatus, and compileErrorOffset write to the caller's program buffer. replace writes to the caller's output buffer. One-shot search, split, and match helpers use fixed internal program storage and allocate nothing on the heap.

Error behavior: compile returns null, compileStatus returns a status code naming the construct, and compileErrorOffset returns the byte offset for a failed compile. One-shot helpers return null for invalid patterns; isMatch returns false for malformed program spans or invalid UTF-8 text.

find, findNth, replace, split, and their index/count variants use the leftmost start and longest end for each match. split and splitCount use non-empty regex matches as separators. Zero-length matches are ignored as separators so callers get deterministic field traversal without a cursor object.
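The separator rule can be illustrated with a short Python model. This is a sketch of the documented behavior, not the std.regex implementation: split_fields is a hypothetical helper, Python's re module does the matching (it uses leftmost-first rather than leftmost-longest matching, which makes no difference for the patterns shown), and the fields come back as a list, whereas split borrows one field per index.

```python
import re
from typing import List


def split_fields(pattern: str, text: str) -> List[str]:
    """Hypothetical model: split on non-empty matches only."""
    fields = []
    field_start = 0
    for match in re.finditer(pattern, text):
        if match.start() == match.end():
            continue  # a zero-length match is never a separator
        fields.append(text[field_start:match.start()])
        field_start = match.end()
    fields.append(text[field_start:])  # the final field after the last separator
    return fields


fields = split_fields("[,;]", "red,green;blue")
print(len(fields), fields[1])     # 3 green
# x* matches only the empty string here, so the text stays one field.
print(split_fields("x*", "abc"))  # ['abc']
```

The first call mirrors the splitCount and split calls in the example above, which expect three fields with green at index 1.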

Target support: current compiler targets.