
From 7 layers and 5 processes to 4 layers and 2 processes. A purpose-built execution engine for quantitative research velocity.
Three generations of refinement: eliminating overhead at every layer.
7+ abstraction layers, 5 separate processes, IPC serialization overhead. Python-only execution with high architectural complexity.
Reduced process count, introduced C++ components. Transitional architecture proving the embedded execution concept.
4 layers, 2 processes, 1 protocol. C++ executor with pybind11-embedded Python. Zero-copy Apache Arrow to NumPy data flow.
Measured with RDTSC timing, CPU affinity pinning, and P50/P90/P99/P999 percentile tracking.
| Metric | Measured | Target |
|---|---|---|
| Per-bar latency | 2.15 ns | < 1 µs |
| GIL acquire time | 26.62 ns | < 10 µs |
| GIL hold time | 203.30 ns | < 100 µs |
| SIMD speedup (AVX2) | 2.48x | ≥ 2x |
| Lock-free throughput | 15.73 M ops/s | > 10 M ops/s |
| Overall vs Python | 500-1000x | > 100x |
From market data to strategy execution with zero memory copies.
Market data stored in Apache Parquet format with columnar compression. Direct memory-mapped file access eliminates deserialization overhead.
Apache Arrow provides a language-agnostic columnar memory layout. C++ and Python share the same memory without any copy or conversion.
pybind11 exposes Arrow buffers as NumPy arrays in the same memory space. Strategy Python code reads C++-owned data with zero allocation.
From architecture to nanoseconds: a systematic approach to performance.
V3 single-process executor eliminated IPC serialization, process spawning, and cross-process data marshaling. The biggest gain came from architectural simplification.
RDTSC timing with CPU affinity pinning. P50/P90/P99/P999 percentile tracking across 12 benchmark programs. Established reproducible measurement infrastructure.
LTO, -march=native, xsimd SIMD abstraction, mimalloc allocator, Quill low-latency logging, Abseil containers. Compiler and library-level optimizations.
PMR infrastructure, cache-aligned allocators, zero hot-path allocations. Targeting L1 cache miss < 5% and TLB miss < 0.5%.
Clone, build, and run your first backtest in minutes. The engine is included in the free tier.
git clone https://github.com/StratCraft/StratCraft.git
cd StratCraft
pnpm install && pnpm dev:desktop