C++23 Backtest Engine

From 7 layers and 5 processes to 4 layers and 2 processes. A purpose-built execution engine for quantitative research velocity.

Architecture Evolution

Three generations of refinement: eliminating overhead at every layer.

V1: Multi-Process

7+ abstraction layers, 5 separate processes, IPC serialization overhead. Python-only execution with high architectural complexity.

7 layers, 5 processes

V2: Consolidation

Reduced process count, introduced C++ components. Transitional architecture proving the embedded execution concept.

5 layers, 3 processes

V3: Embedded Engine

4 layers, 2 processes, 1 protocol. C++ executor with pybind11-embedded Python. Zero-copy Apache Arrow to NumPy data flow.

4 layers, 2 processes

Performance Benchmarks

Measured with RDTSC timing, CPU affinity pinning, and P50/P90/P99/P999 percentile tracking.

Metric	Measured	Target
Per-bar latency	2.15 ns	< 1 µs
GIL acquire time	26.62 ns	< 10 µs
GIL hold time	203.30 ns	< 100 µs
SIMD speedup (AVX2)	2.48x	≥ 2x
Lock-free throughput	15.73 M ops/s	> 10 M ops/s
Overall vs Python	500-1000x	> 100x

Zero-Copy Data Pipeline

From market data to strategy execution with zero memory copies.

Parquet Ingestion

Market data is stored in Apache Parquet format with columnar compression. Memory-mapped file access avoids an extra read-buffer copy, though compressed column chunks still have to be decoded once on load.

Arrow In-Memory Format

Apache Arrow provides a language-agnostic columnar memory layout. C++ and Python share the same memory without any copy or conversion.

NumPy Zero-Copy Access

pybind11 exposes Arrow buffers as NumPy arrays in the same memory space. Strategy Python code reads C++-owned data with zero allocation.

Optimization Journey

From architecture to nanoseconds: a systematic approach to performance.

Phase 0: Architecture (500-1000x)

The V3 executor runs in a single process, which eliminated IPC serialization, process spawning, and cross-process data marshaling. The biggest gain came from architectural simplification.

Phase 1: Benchmark Framework (Baseline)

RDTSC timing with CPU affinity pinning. P50/P90/P99/P999 percentile tracking across 12 benchmark programs. Established reproducible measurement infrastructure.
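The percentile tracking described above can be sketched as follows. This is a minimal illustration, not the project's benchmark code: it swaps RDTSC for `std::chrono::steady_clock` so it stays portable, omits CPU affinity pinning, and the names `percentile_permille` and `collect_latencies_ns` are hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile. `permille` is the percentile in tenths of a percent,
// so 500 = P50, 900 = P90, 990 = P99, 999 = P999.
// Samples are taken by value so the caller's vector is left unsorted.
inline std::uint64_t percentile_permille(std::vector<std::uint64_t> samples,
                                         std::size_t permille) {
    if (samples.empty()) return 0;
    std::sort(samples.begin(), samples.end());
    // Integer ceiling of permille * n / 1000 avoids floating-point rounding.
    std::size_t rank = (permille * samples.size() + 999) / 1000;
    rank = std::clamp<std::size_t>(rank, 1, samples.size());
    return samples[rank - 1];
}

// Runs `fn` repeatedly and records each call's wall-clock duration in nanoseconds.
// steady_clock is used here instead of RDTSC (an assumption made for portability).
template <typename Fn>
std::vector<std::uint64_t> collect_latencies_ns(Fn&& fn, std::size_t iterations) {
    std::vector<std::uint64_t> out;
    out.reserve(iterations);  // allocate once, outside the timed region
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        out.push_back(static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()));
    }
    return out;
}
```

For example, `percentile_permille(collect_latencies_ns(step, 100000), 999)` would report the P999 latency of a hypothetical `step` callable. Tail percentiles are reported alongside the median because a mean hides rare slow iterations.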

Phase 2: Modern C++ (5-10%)

LTO, -march=native, xsimd SIMD abstraction, mimalloc allocator, Quill low-latency logging, Abseil containers. Compiler and library-level optimizations.

Phase 3: Memory Sovereignty (1.05x)

PMR infrastructure, cache-aligned allocators, zero hot-path allocations. Targets: L1 cache miss rate < 5% and TLB miss rate < 0.5%.
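A minimal sketch of the PMR approach, using only the standard `<memory_resource>` facilities. The `Bar` record, the 64-byte cache-line size, the 64 KiB arena, and the function name are assumptions made for illustration. They are not taken from the engine.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical bar record padded to one 64-byte cache line (assumed line size).
struct alignas(64) Bar {
    double open, high, low, close, volume;
};
static_assert(sizeof(Bar) == 64, "one Bar per cache line");

// Builds n bars inside a fixed stack arena and returns the sum of their closes.
// The upstream resource is null_memory_resource(), so exhausting the arena
// throws std::bad_alloc instead of silently falling back to the heap.
inline double sum_closes_in_arena(std::size_t n) {
    alignas(64) std::array<std::byte, 64 * 1024> arena;
    std::pmr::monotonic_buffer_resource pool(
        arena.data(), arena.size(), std::pmr::null_memory_resource());

    std::pmr::vector<Bar> bars(&pool);
    bars.reserve(n);  // single arena allocation; push_back below never reallocates
    for (std::size_t i = 0; i < n; ++i) {
        const double px = static_cast<double>(i);
        bars.push_back(Bar{px, px, px, px, 0.0});
    }

    double sum = 0.0;
    for (const Bar& b : bars) sum += b.close;
    return sum;
}
```

Using the null upstream turns "zero hot-path allocations" into something a test can check: any allocation that does not fit the arena fails loudly rather than reaching the global heap.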

Experience the Speed

Clone, build, and run your first backtest in minutes. The engine is included in the free tier.

git clone https://github.com/StratCraftsAI/StratCraft.git
cd StratCraft
pnpm install && pnpm dev:desktop
