SIMD-accelerated string processing has moved from experiment to production at several high-frequency trading firms. Recent conference talks and open-source releases have shed light on what works, and what does not, when SIMD is applied to protocol parsing.
AVX2 vs AVX-512
Contrary to expectations, AVX-512 does not always outperform AVX2 for FIX message parsing. The key factor is message size distribution. For typical FIX messages (200-500 bytes), AVX2's 256-bit registers provide sufficient parallelism without the frequency throttling penalties that some processors impose on AVX-512 workloads.
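To make the message-size argument concrete, the sketch below counts how many vector-width chunks are needed to cover a message. The helper name vector_chunks is hypothetical, and the count covers loads only; it says nothing about clock frequency or per-chunk cost.

```rust
/// Number of vector-width chunks needed to cover `len` bytes,
/// counting a final partial chunk as one (ceiling division).
/// `vector_bytes` is 32 for AVX2 and 64 for AVX-512.
pub const fn vector_chunks(len: usize, vector_bytes: usize) -> usize {
    (len + vector_bytes - 1) / vector_bytes
}

// 200-byte message: vector_chunks(200, 32) == 7, vector_chunks(200, 64) == 4
// 500-byte message: vector_chunks(500, 32) == 16, vector_chunks(500, 64) == 8
```

For messages in this range, the wider registers save between roughly 3 and 8 chunks per message, which leaves little headroom to absorb any drop in clock frequency.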
Delimiter Scanning
The biggest SIMD win comes from delimiter scanning: finding the SOH (0x01) bytes that terminate each field in a FIX message. On AVX2, a VPBROADCASTB (executed once, outside the loop) followed by VPCMPEQB and VPMOVMSKB on each chunk tests 32 bytes per loop iteration, replacing the byte-by-byte scanning that dominated profiles in traditional implementations.
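The broadcast, compare, and movemask sequence can be sketched with Rust's std::arch intrinsics as follows. This is a minimal illustration, not any firm's production code: the function names are hypothetical, the positions are collected into a Vec for clarity rather than speed, and a scalar fallback is used when AVX2 is not available at runtime.

```rust
/// Scalar reference: indices of every SOH (0x01) byte in `buf`.
pub fn find_soh_scalar(buf: &[u8]) -> Vec<usize> {
    buf.iter()
        .enumerate()
        .filter(|&(_, &b)| b == 0x01)
        .map(|(i, _)| i)
        .collect()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn find_soh_avx2(buf: &[u8]) -> Vec<usize> {
    use std::arch::x86_64::*;

    let mut out = Vec::new();
    // VPBROADCASTB: copy the SOH byte into all 32 lanes, once, outside the loop.
    let needle = _mm256_set1_epi8(0x01);
    let mut i = 0;
    while i + 32 <= buf.len() {
        // Unaligned 32-byte load; the loop condition keeps it in bounds.
        let chunk = unsafe { _mm256_loadu_si256(buf.as_ptr().add(i) as *const __m256i) };
        // VPCMPEQB: each lane equal to SOH becomes 0xFF, all others 0x00.
        let eq = _mm256_cmpeq_epi8(chunk, needle);
        // VPMOVMSKB: gather the top bit of each lane into a 32-bit mask.
        let mut mask = _mm256_movemask_epi8(eq) as u32;
        while mask != 0 {
            out.push(i + mask.trailing_zeros() as usize);
            mask &= mask - 1; // clear the lowest set bit
        }
        i += 32;
    }
    // Fewer than 32 bytes remain: finish byte by byte.
    for (j, &b) in buf[i..].iter().enumerate() {
        if b == 0x01 {
            out.push(i + j);
        }
    }
    out
}

/// Uses the AVX2 path when the CPU supports it, the scalar path otherwise.
pub fn find_soh(buf: &[u8]) -> Vec<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because AVX2 support was verified on the line above.
            return unsafe { find_soh_avx2(buf) };
        }
    }
    find_soh_scalar(buf)
}
```

A real parser would more likely consume each mask in place, or write offsets into a preallocated buffer, instead of allocating a Vec per message.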
Field Lookup
For field lookup (finding tag=value pairs), the bottleneck shifts from raw scanning speed to branch prediction. A perfect hash function over the common tag numbers replaces a hard-to-predict chain of comparisons with a single table lookup, and provides more consistent gains than wider SIMD registers.
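A minimal sketch of the perfect-hash idea follows. The set of thirteen "hot" tags is a hypothetical choice for illustration, as are all the names. For this particular set the low five bits of the tag number happen to be unique, so the hash is a single AND; a different tag set would generally need a search for a suitable hash function. The table is built at compile time, and the build fails if the hash ever collides.

```rust
/// Hypothetical set of frequently seen FIX tags, chosen for illustration.
const HOT_TAGS: [u32; 13] = [8, 9, 10, 11, 34, 35, 38, 44, 49, 52, 54, 55, 56];
const SLOTS: usize = 32;
/// FIX tag numbers start at 1, so 0 can mark an empty slot.
const EMPTY: u32 = 0;

/// The hash: the low five bits of the tag number.
const fn slot(tag: u32) -> usize {
    (tag as usize) & (SLOTS - 1)
}

/// Builds the slot table at compile time; a collision aborts compilation.
const fn build_table() -> [u32; SLOTS] {
    let mut table = [EMPTY; SLOTS];
    let mut i = 0;
    while i < HOT_TAGS.len() {
        let tag = HOT_TAGS[i];
        assert!(table[slot(tag)] == EMPTY, "hash is not perfect for this tag set");
        table[slot(tag)] = tag;
        i += 1;
    }
    table
}

static TABLE: [u32; SLOTS] = build_table();

/// Returns the table slot for a hot tag, or None for any other tag.
/// One load and one comparison, regardless of which tag is queried.
pub fn hot_slot(tag: u32) -> Option<usize> {
    let s = slot(tag);
    if tag != EMPTY && TABLE[s] == tag {
        Some(s)
    } else {
        None
    }
}
```

The returned slot can index a fixed-size array of per-message field offsets, so that hot fields are stored and retrieved without a switch statement or a chain of tag comparisons. Tags outside the hot set fall through to a slower general path.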

