The promise of AI-generated trading strategies has attracted enormous attention. But what happens when these strategies face rigorous backtesting? The results are more nuanced than either enthusiasts or skeptics suggest.
The Experiment
We generated 500 trading strategies using various LLM providers (Claude, GPT-4, Gemini) and backtested each against 10 years of historical data across multiple asset classes. Strategies ranged from simple moving average crossovers to complex multi-factor models.
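To make the setup concrete, here is a minimal sketch of the kind of backtest harness involved, using the simplest strategy family mentioned (a moving-average crossover). The function name, the long/flat convention, and the 5 bps per-trade cost are illustrative assumptions, not the study's actual code.

```python
import numpy as np

def backtest_ma_crossover(prices, fast=20, slow=50, cost_bps=5):
    """Backtest a long/flat moving-average crossover strategy.

    Goes long when the fast MA is above the slow MA, flat otherwise.
    Deducts a per-trade transaction cost in basis points.
    Returns the strategy's daily return series (net of costs).
    """
    prices = np.asarray(prices, dtype=float)
    # Trailing moving averages; "valid" mode drops the warm-up window
    fast_ma = np.convolve(prices, np.ones(fast) / fast, mode="valid")
    slow_ma = np.convolve(prices, np.ones(slow) / slow, mode="valid")
    # Align both series on the same end dates
    fast_ma = fast_ma[len(fast_ma) - len(slow_ma):]
    position = (fast_ma > slow_ma).astype(float)          # 1 = long, 0 = flat
    daily_ret = np.diff(prices[slow - 1:]) / prices[slow - 1:-1]
    # Trade on the bar after the signal to avoid look-ahead bias
    strat_ret = position[:-1] * daily_ret
    trades = np.abs(np.diff(position, prepend=0.0))[:-1]
    strat_ret -= trades * cost_bps / 1e4                  # cost per position change
    return strat_ret
```

Running 500 such strategies is then just a loop over generated parameterizations, which is what makes the two-hour turnaround plausible.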
Key Findings
About 15% of LLM-generated strategies showed statistically significant alpha after transaction costs — comparable to the hit rate of human-generated strategy ideas. Where LLMs clearly won was iteration speed: generating and testing all 500 strategies took about two hours, versus weeks for manual development.
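A simple version of the significance screen implied here is a one-sample t-test on a strategy's net daily returns. This is a sketch under assumptions the article doesn't spell out (daily returns, a large-sample 5% threshold of |t| > 1.96), not the study's actual methodology.

```python
import numpy as np

def alpha_t_stat(net_returns):
    """One-sample t-statistic for the hypothesis that the mean daily
    net return (after costs) is zero. For large samples, |t| > ~1.96
    indicates significance at the 5% level."""
    r = np.asarray(net_returns, dtype=float)
    return r.mean() / (r.std(ddof=1) / np.sqrt(len(r)))
```

Note that screening 500 strategies this way invites multiple-testing bias: at a 5% threshold, roughly 25 strategies would pass by chance alone, so a 15% hit rate should be read against that baseline.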
Where LLMs Add Value
The real benefit isn't replacing human traders but accelerating the ideation phase. LLMs are particularly good at combining known factors in novel ways, adapting strategies across asset classes, and generating parameter sweep ranges that cover non-obvious configurations.
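The parameter-sweep point can be made concrete: once an LLM proposes ranges, enumerating every configuration is mechanical. A minimal sketch, with hypothetical parameter names:

```python
from itertools import product

def parameter_grid(**param_ranges):
    """Yield every combination of the supplied parameter ranges as a
    dict, suitable for feeding a backtest runner one config at a time."""
    keys = list(param_ranges)
    for values in product(*(param_ranges[k] for k in keys)):
        yield dict(zip(keys, values))

# Example sweep: lookback windows and rebalance frequencies
grid = list(parameter_grid(fast=[10, 20, 40],
                           slow=[50, 100, 200],
                           rebalance_days=[1, 5, 21]))
```

The interesting contribution of the LLM is choosing which ranges to sweep, including non-obvious ones; the enumeration itself is trivial.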
Pitfalls
LLM-generated strategies tend to recycle well-known patterns from the models' training data, and those patterns are already arbitraged away. Strategies based on textbook examples (MACD crossover, RSI divergence) showed the worst out-of-sample performance. The most successful strategies came from prompts that specified unusual constraints or novel market microstructure assumptions.
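A standard way to surface this kind of overfitting is to compare in-sample and out-of-sample Sharpe ratios on a held-out tail of the data. A minimal sketch, assuming daily returns and a 70/30 chronological split (both assumptions, not details from the study):

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a periodic return series (zero risk-free rate)."""
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def oos_degradation(strategy_returns, split=0.7):
    """Split a return series chronologically and return the in-sample and
    out-of-sample Sharpe ratios. A large drop between the two is a
    classic overfitting signature."""
    r = np.asarray(strategy_returns, dtype=float)
    cut = int(len(r) * split)
    return sharpe(r[:cut]), sharpe(r[cut:])
```

A textbook-pattern strategy in this framing would show a healthy in-sample Sharpe collapsing toward zero out of sample.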

