Benchmarks · v1.8 · May 2026

Performance Benchmarks.

Measured results, not marketing claims. All benchmarks were run on Linux with GCC 13.3, using 100,000 iterations per test with the CPU pinned.

ExecutionReport Parse
246 ns
QuickFIX 730 ns · 3.0x faster
Throughput
4.17M msg/sec
QuickFIX 1.19M msg/sec · 3.5x higher
P99 Latency
258 ns
QuickFIX 784 ns · 3.0x lower
Heap Allocations / msg
0
QuickFIX ~12 (std::string, std::map nodes) · arena reuse

NexusFIX vs QuickFIX.

Head-to-head comparison across core FIX operations.

| Metric | QuickFIX | NexusFIX | Improvement |
| --- | --- | --- | --- |
| ExecutionReport Parse | 730 ns | 246 ns | 3.0x faster |
| NewOrderSingle Parse | 661 ns | 229 ns | 2.9x faster |
| Field Access (4 fields) | 31 ns | 11 ns | 2.9x faster |
| Throughput | 1.19M msg/sec | 4.17M msg/sec | 3.5x higher |
| P99 Latency | 784 ns | 258 ns | 3.0x lower |
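
Mean and P99 figures of this kind are typically gathered with a pinned, repeated timing loop. The following is a minimal sketch of that shape, not the NexusFIX benchmark harness: `pin_to_cpu`, `measure_ns`, and `percentile_ns` are hypothetical helper names, and the nearest-rank percentile definition is an assumption.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef __linux__
#include <sched.h>
#endif

// Hypothetical helper: pin the calling thread to one CPU (Linux only).
// Returns false on other platforms or if the kernel rejects the request.
inline bool pin_to_cpu(int cpu) {
#ifdef __linux__
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
    (void)cpu;
    return false;
#endif
}

// Hypothetical helper: time `fn` once per iteration, in nanoseconds.
template <typename Fn>
std::vector<std::uint64_t> measure_ns(Fn&& fn, std::size_t iterations) {
    std::vector<std::uint64_t> samples;
    samples.reserve(iterations);  // allocate before the timed region
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()));
    }
    return samples;
}

// Nearest-rank percentile: the smallest sample with at least `pct` percent
// of all samples at or below it. Takes a copy so the caller's data is untouched.
inline std::uint64_t percentile_ns(std::vector<std::uint64_t> samples, unsigned pct) {
    if (samples.empty()) return 0;
    std::sort(samples.begin(), samples.end());
    std::size_t rank = (samples.size() * pct + 99) / 100;  // integer ceil
    if (rank == 0) rank = 1;
    if (rank > samples.size()) rank = samples.size();
    return samples[rank - 1];
}
```

A run would call `pin_to_cpu(...)`, then `measure_ns(parse_one_message, 100000)`, then `percentile_ns(samples, 99)`, where `parse_one_message` stands for whatever operation is under test. Clock reads have their own cost, so at the few-hundred-nanosecond scale a real harness may time batches or use a cycle counter instead.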

Optimization Journey.

How we went from 730 ns to 246 ns in four compounding phases.

[Chart: ExecutionReport parse latency by phase. Baseline 730 ns; Phase 1 520 ns (−210); Phase 2 380 ns (−140); Phase 3 290 ns (−90); Phase 4 246 ns (−44). Total: −484 ns, 3.0x faster.]
Phase 1: Zero-Copy Parsing
730 ns → 520 ns

Replace std::string copies with std::span<const char> views into the original buffer. A std::span<const char> is 16 bytes on a 64-bit target (a pointer plus a length): no heap allocation, no copy, and a trivial destructor.

Phase 2: O(1) Field Lookup
520 ns → 380 ns

Replace std::map<int, std::string> with a pre-indexed array. Field access becomes a single indexed load keyed by the FIX tag number, instead of an O(log n) tree walk.
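
A tag-indexed table can be sketched as follows. This illustrates direct array indexing in general and is not the NexusFIX data structure: `FieldIndex`, `Slot`, and the `kMaxTag` bound of 1024 are assumptions made for the example, and it uses `std::string_view` for brevity.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Assumption: tags at or above kMaxTag are ignored by this sketch.
constexpr std::size_t kMaxTag = 1024;

// Where a field's value sits inside the message buffer.
struct Slot {
    std::uint32_t offset = 0;
    std::uint32_t length = 0;
    bool present = false;
};

class FieldIndex {
public:
    // Record the position of `tag`'s value; called once per field while parsing.
    void set(int tag, std::uint32_t offset, std::uint32_t length) {
        if (in_range(tag)) slots_[static_cast<std::size_t>(tag)] = Slot{offset, length, true};
    }

    // O(1): one array index, no tree walk and no hashing.
    // Returns an empty view if the tag is absent or out of range.
    std::string_view get(std::string_view buffer, int tag) const {
        if (!in_range(tag)) return {};
        const Slot& s = slots_[static_cast<std::size_t>(tag)];
        return s.present ? buffer.substr(s.offset, s.length) : std::string_view{};
    }

    // Forget all fields before indexing the next message.
    void clear() { slots_.fill(Slot{}); }

private:
    static bool in_range(int tag) {
        return tag > 0 && static_cast<std::size_t>(tag) < kMaxTag;
    }
    std::array<Slot, kMaxTag> slots_{};
};
```

The trade-off is memory for time: the table is sized by the largest supported tag rather than by the number of fields present, and it has to be reset between messages.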

Phase 3: SIMD Delimiter Scanning
380 ns → 290 ns

AVX2 vectorized SOH delimiter scanning compares 32 bytes per instruction, which is ~13x faster than byte-by-byte scanning.

Phase 4: Compile-Time Offsets
290 ns → 246 ns

consteval field offset tables and 22 compile-time lookup tables eliminate ~300 runtime branches for enum/type conversion.

Zero Allocation Proof.

Processing a NewOrderSingle message on the hot path.

| /order-flow hot path | QuickFIX | NexusFIX |
| --- | --- | --- |
| Heap Allocations | ~12 (std::string, std::map nodes) | 0 (arena reuse) |
| Field Storage | std::map<int, std::string> copies | std::span views into original buffer |
| Parsing Logic | Runtime map insertion | Compile-time offset table |
| Memory Footprint | Dynamic, unpredictable | Static, pre-allocated PMR pool |
| Destructor Overhead | ~12 std::string destructors | 0 (no owned memory) |
| Memory per message | Heap: 12 allocs, scattered (0x7f3a..0000) | Arena: 0 allocs, contiguous and reused (0x0001..a000) |
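
The arena-reuse pattern can be sketched with the standard `std::pmr` facilities. This is an illustration under assumptions, not the NexusFIX allocator: `MessageArena`, `count_fields`, and the 4 KiB buffer size are hypothetical. Passing `std::pmr::null_memory_resource()` as the upstream makes any fallback to the heap throw `std::bad_alloc`, so a clean run shows the buffer alone was enough.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string_view>
#include <vector>

// Hypothetical per-message arena backed by a fixed, pre-allocated buffer.
class MessageArena {
public:
    MessageArena()
        : pool_(buffer_.data(), buffer_.size(), std::pmr::null_memory_resource()) {}

    std::pmr::memory_resource* resource() { return &pool_; }

    // Rewind to the start of the buffer; the same bytes serve the next message.
    void reset() { pool_.release(); }

private:
    alignas(std::max_align_t) std::array<std::byte, 4096> buffer_;  // assumed size
    std::pmr::monotonic_buffer_resource pool_;  // declared after buffer_ on purpose
};

// Splits `msg` on SOH into views stored in an arena-backed vector and
// returns the field count. No call to operator new happens here.
inline std::size_t count_fields(MessageArena& arena, std::string_view msg) {
    std::pmr::vector<std::string_view> fields(arena.resource());
    fields.reserve(32);  // a single bump allocation out of the arena
    std::size_t start = 0;
    while (start < msg.size()) {
        std::size_t end = msg.find('\x01', start);
        if (end == std::string_view::npos) end = msg.size();
        fields.push_back(msg.substr(start, end - start));
        start = end + 1;
    }
    return fields.size();
}
```

Calling `arena.reset()` after each message is what keeps the footprint static. Without it, the monotonic resource only grows and would exhaust the fixed buffer after a handful of messages.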

Technique Comparison.

Design decisions that compound to 3x performance.

| Technique | QuickFIX | NexusFIX |
| --- | --- | --- |
| Memory | Heap allocation per message | Zero-copy std::span views |
| Field Lookup | O(log n) std::map | O(1) direct array indexing |
| Parsing | Byte-by-byte scanning | AVX2 SIMD vectorized |
| Field Offsets | Runtime calculation | consteval compile-time |
| Enum Conversion | Runtime switch (~300 branches) | 22 compile-time lookup tables |
| Error Handling | Exceptions | std::expected (no throw) |

Architecture Influences.

11 industry-leading libraries studied. What we learned, what we built, what we measured.

| Library | What We Learned | What We Built | Result |
| --- | --- | --- | --- |
| hffix | O(n) iterator lookup is suboptimal for dense FIX packets | consteval field offsets + O(1) direct indexing | 14 ns field access |
| Abseil | Swiss Tables with SIMD probing and H2 fingerprints | absl::flat_hash_map for session store | 31% faster lookups |
| Quill | Lock-free SPSC queue with deferred formatting | Quill as logging backend | 8 ns median log latency |
| NanoLog | Binary encoding + background thread for 7 ns logging | DeferredProcessor<T> with static binary serialization | 84% reduction (75 → 12 ns) |
| liburing | DEFER_TASKRUN eliminates kernel task wakeups | io_uring + registered buffers + multishot | 7-27% faster I/O |
| Highway | Portable SIMD abstraction across instruction sets | Retained hand-tuned intrinsics for FIX patterns | 13x throughput |
| Seastar | Share-nothing reactor for high-concurrency I/O | Core-pinning + lock-free pipelining | 8% P99 improvement |
| Folly | Advanced memory fencing and lock-free primitives | Native SPSC queue + bit-masking validation | Zero dependency |
| Rigtorp | Cache-line padding eliminates false sharing | Native SPSCQueue with identical techniques | 88M ops/sec, 11 ns |
| xsimd | Generic SIMD wrappers for math operations | Direct Intel intrinsics for SOH scanning | 2x faster than wrappers |
| Boost.PMR | Monotonic buffer enables arena allocation per message | std::pmr::monotonic_buffer_resource | Zero heap allocation |
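
Two of the ideas above, the single-producer single-consumer queue and cache-line padding, fit in a short sketch. This is a generic textbook-style ring buffer written for illustration. It is not the NexusFIX, Folly, or Rigtorp implementation: `SpscQueue` is a hypothetical name and the 64-byte cache line is an assumption that holds on typical x86-64 parts.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Bounded SPSC ring buffer: exactly one thread may push, one may pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns std::nullopt when the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    static constexpr std::size_t kCacheLine = 64;  // assumption: x86-64

    // Each index gets its own cache line so the producer's writes to head_
    // do not invalidate the line holding the consumer's tail_, and vice versa.
    alignas(kCacheLine) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(kCacheLine) std::atomic<std::size_t> tail_{0};  // written by consumer
    alignas(kCacheLine) std::array<T, Capacity> slots_{};
};
```

The power-of-two capacity lets the index wrap with a bit mask instead of a modulo, and the acquire/release pairs are the only synchronization: there are no locks and no compare-and-swap loops.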

Ready to Try NexusFIX?

Four commands to clone, build, and run the benchmarks yourself.

$ git clone https://github.com/StratCraftsAI/NexusFix.git
$ cd NexusFix
$ ./start.sh build # 2m18s · release
$ ./start.sh bench
  running 100,000 iterations · cpu pinned · warm cache
  ExecutionReport parse   246 ns   p99 258 ns
  NewOrderSingle parse    229 ns   p99 241 ns
  field_access            11 ns
  throughput              4.17 M msg/s
  ✓ csv written to ./out/bench-2026-05-17.csv