Releases: hellobertrand/zxc
v0.3.0
GNR v2, ARM64 Optimizations & Hardened Safety
This release introduces the v2 generation of the internal GNR (General Number Range) decoder, bringing performance improvements through branchless logic and SIMD vectorization. It also includes a comprehensive security hardening pass, adding rigorous bounds checking and validation to all decoding paths.
Highlights
GNR v2 Decoder Engine
The core decoding loop has been rewritten to maximize instruction-level parallelism:
- Branchless Design: Implemented branchless wild copies and match checks to minimize pipeline flushes.
- SIMD Acceleration: Added native NEON (ARM64) and AVX2/SSSE3 (x86) implementations for overlapping copy routines.
- Hybrid Decoding Strategy: Decompression uses a two-phase approach: careful bounds checking for the first 64KB, then an optimized unchecked path once all possible 16-bit offsets are mathematically guaranteed to be valid. This removes a branch per sequence for 75% of each chunk.
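The two-phase idea in the Hybrid Decoding Strategy item can be sketched as follows. This is a minimal illustration, not the zxc decoder: `match_seq`, `apply_matches`, and `SAFE_WINDOW` are hypothetical names, and the sketch assumes the stream stores `distance - 1` in 16 bits, so every distance lies in 1..65536. Under that assumption, once 64 KB of output exists, no distance can point before the start of the buffer, and the second loop drops the offset check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAFE_WINDOW ((size_t)65536) /* 64 KB: largest distance in this sketch */

typedef struct {
    uint16_t stored; /* hypothetical field: distance - 1 */
    uint32_t len;    /* match length in bytes */
} match_seq;

/* Byte-wise copy so that overlapping matches (distance < len) work. */
static void copy_bytes(uint8_t *dst, size_t pos, size_t distance, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[pos + i] = dst[pos + i - distance];
}

/* Applies n matches to dst starting at *pos.
 * Returns 0 on success, -1 on a malformed sequence. */
static int apply_matches(uint8_t *dst, size_t cap, size_t *pos,
                         const match_seq *seqs, size_t n)
{
    assert(*pos <= cap);
    size_t p = *pos, i = 0;

    /* Phase 1: fewer than 64 KB written, so validate every distance. */
    for (; i < n && p < SAFE_WINDOW; i++) {
        size_t distance = (size_t)seqs[i].stored + 1;
        if (distance > p || seqs[i].len > cap - p)
            return -1;
        copy_bytes(dst, p, distance, seqs[i].len);
        p += seqs[i].len;
    }
    /* Phase 2: p >= 64 KB >= any distance, so only the write bound is checked. */
    for (; i < n; i++) {
        if (seqs[i].len > cap - p)
            return -1;
        copy_bytes(dst, p, (size_t)seqs[i].stored + 1, seqs[i].len);
        p += seqs[i].len;
    }
    *pos = p;
    return 0;
}
```

The destination bound is still checked in both phases; only the per-sequence offset test disappears once the output position passes the window.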
Fuzzing-Driven Safety Hardening
Following extensive fuzzing, multiple layers of protection have been added to prevent malformed streams from causing crashes:
- Offset & Size Validation: Added rigorous checks for out-of-bounds reads during variable byte decoding and numeric referencing.
- Overflow Protection: Implemented detection for integer overflows in VByte reads and destination buffer writes.
- Infinite Loop Prevention: Added size limits to variable byte decoding sequences.
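The three protections above (bounds checks, overflow detection, and a size limit) can be combined in one variable-byte reader. The sketch below assumes a LEB128-style layout with 7 payload bits per byte and a continuation bit; the actual zxc VByte format may differ, and `read_vbyte32` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 7-bits-per-byte varint (assumed layout) from [*p, end).
 * Returns 0 and advances *p on success, -1 on truncated, overlong,
 * or overflowing input. */
static int read_vbyte32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
    assert(*p <= end);
    const uint8_t *cur = *p;
    uint32_t value = 0;

    for (unsigned shift = 0; shift < 35; shift += 7) { /* at most 5 bytes */
        if (cur == end)
            return -1;                /* would read past the input */
        uint8_t byte = *cur++;
        uint32_t bits = byte & 0x7Fu;
        if (shift == 28 && bits > 0x0Fu)
            return -1;                /* value does not fit in 32 bits */
        value |= bits << shift;
        if ((byte & 0x80u) == 0) {
            *p = cur;
            *out = value;
            return 0;
        }
    }
    return -1;                        /* continuation bit never cleared */
}
```

Because the loop runs at most five times, a stream of bytes with the continuation bit always set cannot keep the reader spinning.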
Encoder Optimizations
- Fibonacci Hashing: Switched to a faster Fibonacci hash function for better distribution and speed.
- Speculative Prefetching: Added memory prefetching for hash table entries to reduce cache miss latency.
- Branchless Match Finder: Refactored the encoder's match checker to use bitwise masks instead of conditional branches.
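As a rough illustration of the Fibonacci hashing and mask-based match check described above, the sketch below multiplies by a golden-ratio constant and keeps the top bits, then combines the match conditions with bitwise AND instead of `if` statements. The names `fib_hash`, `match_mask`, and `EXAMPLE_HASH_BITS` are hypothetical, and the table size and 64 KB distance limit are assumptions; the release notes do not state zxc's actual constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_HASH_BITS 15 /* assumed table size: 2^15 entries */

/* Fibonacci hashing: multiply by 2^32 / golden ratio, keep the top bits. */
static uint32_t fib_hash(uint32_t key)
{
    return (key * 0x9E3779B9u) >> (32 - EXAMPLE_HASH_BITS);
}

/* Unaligned-safe 4-byte load. */
static uint32_t read32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 0xFFFFFFFF if the 4 bytes at cur and cand are equal and the
 * distance is in 1..65536, else 0. The conditions are combined with
 * bitwise AND rather than separate if statements. */
static uint32_t match_mask(const uint8_t *base, size_t cur, size_t cand)
{
    assert(cand <= cur);
    uint32_t same = (uint32_t)(read32(base + cur) == read32(base + cand));
    size_t dist = cur - cand;
    /* dist == 0 wraps to SIZE_MAX, which fails the range test. */
    uint32_t in_range = (uint32_t)((dist - 1) < (size_t)65536);
    return 0u - (same & in_range);
}
```

A caller could then write `best_len = candidate_len & match_mask(...)`, so a rejected candidate contributes a length of zero without a conditional jump in the source.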
Special Thanks
Thanks to @tansy for rewriting the CLI client, including the addition of short options and standardizing compression level flags (-1 to -5).
- Add short options to help, version by @tansy in #15
- Use -1..-5 as compression levels by @tansy in #16
Performance
- Level -1: Added a new compression level, -1.
- GNR v2: Replaced legacy decoder with v2; utilizes single-sequence processing and streamlined loops.
- SIMD: Added 32-byte copy routines and NEON shuffle optimizations for small offsets (2-15 bytes).
- Prefetching: Implemented speculative prefetching for hash chain entries.
- Hashing: Replaced masking with ZXC_LZ_HASH_BITS-derived shifts (Fibonacci variant).
- Memory: Used the RESTRICT keyword on critical hot paths to aid compiler optimization.
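The small-offset case mentioned in the SIMD item (offsets shorter than a vector register) is usually handled by replicating the repeating pattern into a register-sized block. The portable sketch below shows the idea without intrinsics; `copy_small_offset` is a hypothetical name, and the code assumes the caller leaves at least 15 bytes of slack after the destination range, since each store writes a full 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies len bytes to dst from (dst - offset) for 1 <= offset <= 15.
 * May write up to 15 bytes past dst + len (assumed slack). */
static void copy_small_offset(uint8_t *dst, size_t offset, size_t len)
{
    assert(offset >= 1 && offset <= 15);
    const uint8_t *src = dst - offset;
    uint8_t pattern[16];

    /* Expand the short period into a full 16-byte block. */
    for (size_t i = 0; i < 16; i++)
        pattern[i] = src[i % offset];

    /* Advance by the largest multiple of offset that fits in 16 bytes,
     * so every store starts at phase 0 of the pattern. */
    size_t step = 16 - (16 % offset);
    for (size_t done = 0; done < len; done += step)
        memcpy(dst + done, pattern, 16);
}
```

A NEON or SSSE3 version would build `pattern` with a byte shuffle instead of the scalar loop, but the stepping logic stays the same.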
Safety & Integrity
- Validation: Added destination bounds checks before writing literals in generic number decoding.
- VByte: Added strict bounds checking and overflow detection to variable byte decoding.
- Stream: Validated stream sizes against sequence counts for early error detection.
- Sanitization: Fixed potential out-of-bounds reads in the fast path by falling back to the safe path when remaining data is small.
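The fast-path fallback in the Sanitization item can be pictured as follows. This is a minimal sketch with hypothetical names (`copy_literals`, `WILD`): the fast path copies in fixed 16-byte blocks and may touch bytes beyond `len`, so it is used only when both buffers have room for the rounded-up length; otherwise an exact copy is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WILD 16 /* block size of the over-copying fast path */

/* Copies len literal bytes from src to dst.
 * Returns 0 on success, -1 if either buffer is too small. */
static int copy_literals(uint8_t *dst, size_t dst_left,
                         const uint8_t *src, size_t src_left, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > dst_left || len > src_left)
        return -1;

    size_t rounded = (len + (WILD - 1)) & ~(size_t)(WILD - 1);
    if (rounded <= dst_left && rounded <= src_left) {
        /* Fast path: whole blocks, may read and write past len. */
        for (size_t i = 0; i < rounded; i += WILD)
            memcpy(dst + i, src + i, WILD);
    } else {
        /* Safe path: too close to the end of a buffer, copy exactly. */
        memcpy(dst, src, len);
    }
    return 0;
}
```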
Internals & Refactoring
- Cleanup: Removed dead code for the original v1 GNR decoder.
- Formatting: Renamed variables for clarity and updated internal documentation.
- CI: Made fuzzer build scripts dynamic and updated benchmark workflows.
Full Changelog: v0.2.0...v0.3.0
v0.2.0
This release brings significant security hardening, performance optimizations, and a major structural refactor of the public API.
Special Thanks
A huge shoutout to @tzcnt for their first public contribution! They spearheaded the restructuring of the public headers to provide a cleaner "sans-IO" API (#9), which makes integrating zxc into projects that manage their own I/O significantly easier. Thank you!
Security Hardening
This release includes comprehensive security improvements ensuring robustness against malformed or malicious inputs:
- Decompression Bounds Checking: Implemented strict bounds checking in the decompression fast paths to prevent input buffer over-reads and invalid offset access.
- VByte Hardening: Hardened variable-byte integer reading logic to prevent buffer overruns and potential infinite loops with malformed data.
- Memory Safety: Fixed a MemorySanitizer (MSan) warning by explicitly zero-initializing memory blocks in the stream engine, ensuring no uninitialized values leak into the output.
Performance Improvements
- Reduced Thread Contention: Optimized the stream engine to reduce lock contention, improving scalability on high-core-count systems.
- Short-Circuit Optimization: Optimized decompression safety checks to short-circuit expensive offset validation for valid large blocks (>64KB), recovering performance while maintaining safety.
- Memory Usage: Reduced memory footprint of the chain table.
- Buffer Management: Refactored buffer allocation strategies for better I/O performance.
API & Refactoring
- Sans-IO API: Public headers have been restructured to separate core compression logic from file I/O utilities.
- Bug Fixes: Various fixes for edge cases in raw block handling and fuzzing tests.
Full Changelog
- Restructure public headers to provide a "sans-IO" API (#9) (tzcnt)
- Initializes memory block after allocation (Fix MSan uninitialized bytes)
- Adds comprehensive checks to prevent buffer overflows in decompression
- Optimize hot path logic for decompression
- Raises capacity checks to avoid buffer overflows
- Fixes fuzzers names and updates fuzzing schedule
- Reduces memory usage of chain table
- Reduces thread contention in stream engine
- Updates atomic type definitions and I/O error handling
- Format code and cleanup unused docs
v0.1.2
Release Notes
Bug Fixes & Reliability
- Safety: Added input validation to prevent potential crashes. (6bdde9b)
- Safety: Fixed a potential null pointer dereference issue. (df0b7f3)
- Cleanup: Removed a redundant file pointer check to clean up the code. (64f7088)
Documentation
- Docs: Added missing parameter documentation in header files. (70291b8)
Testing
- Tests: Added additional unit tests to improve coverage. (628eb35)
v0.1.1
Release v0.1.1
v0.1.0
Initial Release - Asymmetric Compression