Measured results, not marketing claims. All benchmarks run on Linux with GCC 13.3, 100,000 iterations, CPU pinned.
Head-to-head comparison across core FIX operations.
How we went from 730ns to 246ns in four compounding phases.
Replace std::string copies with std::span<const char> views into the original buffer. A std::span is 16 bytes on a 64-bit target (pointer plus length). No heap, no copy, no destructor.
Replace std::map<int, std::string> with a pre-indexed array. Field access becomes a single indexed load (one mov), keyed by FIX tag number.
AVX2 vectorized SOH delimiter scanning compares 32 bytes per instruction. ~13x faster than byte-by-byte scanning.
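A minimal sketch of SOH scanning with AVX2 intrinsics, not the library's own scanner. The function name `find_soh` is hypothetical. The vector path is compiled only when AVX2 is enabled (for example with `-mavx2`); otherwise the same function falls back to the scalar loop.

```cpp
#include <bit>
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Returns the index of the first SOH (0x01) byte in [data, data + len),
// or len if there is none.
inline std::size_t find_soh(const char* data, std::size_t len) {
    std::size_t i = 0;
#if defined(__AVX2__)
    const __m256i soh = _mm256_set1_epi8(0x01);
    for (; i + 32 <= len; i += 32) {
        // Unaligned 32-byte load, then a byte-wise equality compare.
        const __m256i chunk =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        const __m256i eq = _mm256_cmpeq_epi8(chunk, soh);
        // One bit per byte; the lowest set bit is the first match.
        const auto mask = static_cast<std::uint32_t>(_mm256_movemask_epi8(eq));
        if (mask != 0) {
            return i + static_cast<std::size_t>(std::countr_zero(mask));
        }
    }
#endif
    // Scalar tail (and full fallback when AVX2 is not enabled).
    for (; i < len; ++i) {
        if (data[i] == '\x01') return i;
    }
    return len;
}

// Usage: the delimiter sits in the third 32-byte block of a 96-byte buffer.
inline void example_find_soh() {
    char buf[96];
    for (char& c : buf) c = 'A';
    buf[70] = '\x01';
    assert(find_soh(buf, sizeof buf) == 70);
}
```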
consteval field offset tables and 22 compile-time lookup tables eliminate ~300 runtime branches for enum/type conversion.
Processing a NewOrderSingle message on the hot path.
Design decisions that compound to 3x performance.
| Technique | QuickFIX | NexusFIX |
|---|---|---|
| Memory | Heap allocation per message | Zero-copy std::span views |
| Field Lookup | O(log n) std::map | O(1) direct array indexing |
| Parsing | Byte-by-byte scanning | AVX2 SIMD vectorized |
| Field Offsets | Runtime calculation | consteval compile-time |
| Enum Conversion | Runtime switch (~300 branches) | 22 compile-time lookup tables |
| Error Handling | Exceptions | std::expected (no throw) |
11 industry-leading libraries studied. What we learned, what we built, what we measured.
O(n) iterator lookup is suboptimal for dense FIX packets
consteval field offsets + O(1) direct indexing
Swiss Tables with SIMD probing and H2 fingerprints
absl::flat_hash_map for session store
Lock-free SPSC queue with deferred formatting
Quill as logging backend
Binary encoding + background thread for 7ns logging
DeferredProcessor<T> with static binary serialization
DEFER_TASKRUN eliminates kernel task wakeups
io_uring + registered buffers + multishot
Portable SIMD abstraction across instruction sets
Retained hand-tuned intrinsics for FIX patterns
Share-nothing reactor for high-concurrency I/O
Core-pinning + lock-free pipelining
Advanced memory fencing and lock-free primitives
Native SPSC queue + bit-masking validation
Cache-line padding eliminates false sharing
Native SPSCQueue with identical techniques
Generic SIMD wrappers for math operations
Direct Intel intrinsics for SOH scanning
Monotonic buffer enables arena allocation per message
std::pmr::monotonic_buffer_resource
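A rough sketch of per-message arena allocation with std::pmr::monotonic_buffer_resource. `MessageArena` and the 4096-byte size are assumptions for illustration. Passing std::pmr::null_memory_resource() as the upstream means an overflow throws std::bad_alloc instead of silently falling back to the heap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical per-message scratch area backed by a fixed buffer.
struct MessageArena {
    alignas(std::max_align_t) std::array<std::byte, 4096> storage;
    std::pmr::monotonic_buffer_resource resource{
        storage.data(), storage.size(), std::pmr::null_memory_resource()};
};

// Usage: a pmr vector whose elements live inside the arena's buffer.
inline std::size_t example_arena() {
    MessageArena arena;
    std::pmr::vector<int> tags(&arena.resource);
    for (int t : {8, 9, 35, 49, 56}) tags.push_back(t);

    const auto* p = reinterpret_cast<const std::byte*>(tags.data());
    assert(p >= arena.storage.data());
    assert(p < arena.storage.data() + arena.storage.size());
    return tags.size();
}
```

Calling `resource.release()` between messages resets the arena to its initial buffer, so the same storage is reused without any per-object deallocation.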
Three commands to build and run benchmarks yourself.
    $ git clone https://github.com/StratCraftsAI/NexusFix.git
    $ cd NexusFix
    $ ./start.sh build   # 2m18s · release
    $ ./start.sh bench
    running 100,000 iterations · cpu pinned · warm cache
    ExecutionReport parse    246 ns    p99 258 ns
    NewOrderSingle parse     229 ns    p99 241 ns
    field_access              11 ns
    throughput              4.17 M msg/s
    ✓ csv written to ./out/bench-2026-05-17.csv