StratCraft

C++23 Backtest Engine

From 7 layers and 5 processes to 4 layers and 2 processes. A purpose-built execution engine for quantitative research velocity.

Architecture Evolution

Three generations of refinement: eliminating overhead at every layer.

V1: Multi-Process

7+ abstraction layers and 5 separate processes, with IPC serialization overhead between them. Execution was Python-only, and architectural complexity was high.

V2: Consolidation

Reduced the architecture to 5 layers and 3 processes and introduced C++ components. A transitional design that proved the embedded execution concept.

V3: Embedded Engine

4 layers, 2 processes, 1 protocol. C++ executor with pybind11-embedded Python. Zero-copy Apache Arrow to NumPy data flow.

Performance Benchmarks

Measured with RDTSC timing, CPU affinity pinning, and P50/P90/P99/P999 percentile tracking.

Metric | Measured | Target
Per-bar latency | 2.15 ns | < 1 µs
GIL acquire time | 26.62 ns | < 10 µs
GIL hold time | 203.30 ns | < 100 µs
SIMD speedup (AVX2) | 2.48x | ≥ 2x
Lock-free throughput | 15.73 M ops/s | > 10 M ops/s
Overall vs Python | 500-1000x | > 100x

Zero-Copy Data Pipeline

From market data to strategy execution with zero memory copies.

1. Parquet Ingestion

Market data is stored in Apache Parquet format with columnar compression. Files are memory-mapped for reading, which avoids an extra buffered-read copy; compressed column pages still have to be decoded once on load.

2. Arrow In-Memory Format

Apache Arrow provides a language-agnostic columnar memory layout. C++ and Python share the same memory without any copy or conversion.

3. NumPy Zero-Copy Access

pybind11 exposes the Arrow buffers as NumPy arrays backed by the same memory. Python strategy code reads C++-owned data without allocating or copying it.
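
The zero-copy idea itself can be shown without pybind11 or Arrow. The sketch below is only an illustration under stated assumptions: `ColumnBuffer` stands in for a C++-owned Arrow column, `ColumnView` plays the role of the NumPy array handed to strategy code, and `view_of` and `mean_of` are hypothetical helper names. None of these come from the StratCraft codebase.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a C++-owned column of prices (e.g. an Arrow buffer).
struct ColumnBuffer {
    std::vector<double> values;
};

// Non-owning view: just a pointer and a length. Creating one allocates
// nothing and copies no element data; the owner must outlive the view.
struct ColumnView {
    const double* data;
    std::size_t size;
};

// Expose the column to a consumer without copying it.
inline ColumnView view_of(const ColumnBuffer& col) {
    return ColumnView{col.values.data(), col.values.size()};
}

// Example consumer: reads through the view the way strategy code reads an array.
inline double mean_of(ColumnView v) {
    if (v.size == 0) return 0.0;
    return std::accumulate(v.data, v.data + v.size, 0.0) /
           static_cast<double>(v.size);
}
```

The view and the owning buffer report the same address, which is the property that "zero-copy" refers to. The cost is a lifetime constraint: the view is only valid while the owning buffer is alive and not reallocated.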

Optimization Journey

From architecture to nanoseconds: a systematic approach to performance.

Phase 0: Architecture (500-1000x)

V3 single-process executor eliminated IPC serialization, process spawning, and cross-process data marshaling. The biggest gain came from architectural simplification.

Phase 1: Benchmark Framework (baseline)

RDTSC timing with CPU affinity pinning. P50/P90/P99/P999 percentile tracking across 12 benchmark programs. Established reproducible measurement infrastructure.
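
As a rough illustration of percentile tracking, here is a minimal sketch. It swaps in `std::chrono::steady_clock` as a portable stand-in for RDTSC and leaves out CPU affinity pinning, so it is not the measurement framework described above. The names `collect_samples`, `nearest_rank`, `summarize`, and `LatencySummary` are hypothetical, and the nearest-rank percentile definition is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Time `op` once per iteration and return the per-call latencies in nanoseconds.
template <typename F>
std::vector<std::uint64_t> collect_samples(F&& op, std::size_t iterations) {
    std::vector<std::uint64_t> out;
    out.reserve(iterations);  // allocate up front, not inside the timed loop
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        op();
        const auto t1 = std::chrono::steady_clock::now();
        out.push_back(static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()));
    }
    return out;
}

// Nearest-rank percentile on already-sorted samples.
// `permille` is the percentile in tenths of a percent (P50 = 500, P999 = 999);
// integer arithmetic avoids floating-point rounding at the rank boundary.
inline std::uint64_t nearest_rank(const std::vector<std::uint64_t>& sorted,
                                  std::uint64_t permille) {
    assert(!sorted.empty() && permille >= 1 && permille <= 1000);
    const std::uint64_t n = sorted.size();
    const std::uint64_t rank = (permille * n + 999) / 1000;  // ceil, 1-based
    return sorted[static_cast<std::size_t>(rank - 1)];
}

struct LatencySummary {
    std::uint64_t p50, p90, p99, p999;
};

// Sort a copy of the samples once, then read off the four percentiles.
inline LatencySummary summarize(std::vector<std::uint64_t> samples) {
    std::sort(samples.begin(), samples.end());
    return LatencySummary{nearest_rank(samples, 500), nearest_rank(samples, 900),
                          nearest_rank(samples, 990), nearest_rank(samples, 999)};
}
```

Tail percentiles need enough samples to be meaningful: with fewer than 1000 samples, P999 under this definition is simply the maximum.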

Phase 2: Modern C++ (5-10%)

LTO, -march=native, xsimd SIMD abstraction, mimalloc allocator, Quill low-latency logging, Abseil containers. Compiler and library-level optimizations.

Phase 3: Memory Sovereignty (1.05x)

PMR (polymorphic memory resource) infrastructure, cache-aligned allocators, and zero hot-path allocations. Targets: L1 cache miss rate < 5% and TLB miss rate < 0.5%.
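
One way to get zero hot-path heap allocations with standard C++17 PMR is sketched below. This is an assumption-laden illustration, not the engine's allocator: `Bar`, `BarArena`, `is_cache_aligned`, the 64-byte cache line, and the 64 KiB arena size are all hypothetical choices made for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <vector>

// Assumed cache-line size; 64 bytes is typical on x86-64 but not universal.
inline constexpr std::size_t kCacheLine = 64;

// Hypothetical per-bar record, padded and aligned to one cache line so a
// single record never straddles two lines.
struct alignas(kCacheLine) Bar {
    double open, high, low, close, volume;
};

// Fixed-size arena. The upstream resource is null_memory_resource(), so
// exhausting the arena throws std::bad_alloc instead of silently falling
// back to the heap.
class BarArena {
public:
    BarArena()
        : pool_(storage_.data(), storage_.size(),
                std::pmr::null_memory_resource()) {}

    std::pmr::memory_resource* resource() { return &pool_; }

private:
    alignas(kCacheLine) std::array<std::byte, 64 * 1024> storage_;
    std::pmr::monotonic_buffer_resource pool_;  // declared after storage_
};

// Check that a pointer sits on a cache-line boundary.
inline bool is_cache_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}
```

A `std::pmr::vector<Bar>` constructed with `arena.resource()` and reserved once during setup then serves every `push_back` within that capacity from the arena. The null upstream turns any accidental growth past the arena into a `std::bad_alloc` rather than a hidden heap allocation.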

Experience the Speed

Clone, build, and run your first backtest in minutes. The engine is included in the free tier.

git clone https://github.com/StratCraft/StratCraft.git
cd StratCraft
pnpm install && pnpm dev:desktop