ProFiT: Program Search for Financial Trading - LLM-Driven Evolutionary Strategy Discovery

Executive Summary

ProFiT (Program Search for Financial Trading) introduces an LLM-driven evolutionary framework that autonomously discovers and improves algorithmic trading strategies. Unlike traditional approaches that tune parameters within fixed architectures, ProFiT evolves the actual Python source code of trading strategies, enabling structural adaptation to changing market conditions.

Across seven liquid futures assets and five seed strategies (35 strategy-asset combinations), evolved strategies outperform Buy-and-Hold in 77.1% of combinations, beat random baselines in 100% of cases, and improve over their seed strategies in 94.3% of runs. Relative to the seed strategies, the mean improvement is +44.21% in annualized return and +0.57 in Sharpe ratio.

ProFiT bridges the gap between evolutionary algorithms and large language models by using LLMs for semantically-informed mutations guided by backtest performance analysis, rather than random perturbations. This represents a paradigm shift from "parameter learning" to "strategy evolution" in quantitative trading.

ELI5: Self-Improving Trading Recipes

Imagine you have a cookbook of trading recipes. Traditional approaches are like tweaking ingredient amounts (parameters) in the same recipe. ProFiT is different—it's like having a chef (the LLM) who reads your recipe, tastes the result, and then rewrites entire cooking steps to make it better. The chef keeps a library of all recipes that worked (not just the best one), so it can try variations from different starting points. After many iterations, you end up with recipes that are fundamentally better, not just slightly adjusted versions of the original.

Part 1: The Problem with Automated Trading

Designing profitable trading strategies remains one of the most challenging problems in quantitative finance. Despite decades of progress in machine learning and reinforcement learning, most trading systems still rely on human intuition, manual feature engineering, and static optimization. These methods adapt poorly to evolving market regimes, in which patterns that were once profitable eventually decay.

Why Traditional Approaches Fail

Static architectures: Deep learning models optimize parameters within fixed structures, unable to modify underlying algorithmic logic
Regime shifts: Financial markets are inherently non-stationary—strategies that work in one period often degrade in another
Manual intervention: Discovering and improving trading algorithms typically requires constant human oversight
Overfitting: Models trained on historical data often fail under new market conditions

ProFiT addresses these challenges by evolving executable trading code rather than just adjusting model weights. The framework draws inspiration from the Darwin Gödel Machine, adapting self-referential improvement concepts for quantitative trading where empirical market feedback serves as the evolutionary fitness function.

Part 2: The ProFiT Framework

ProFiT operates as a meta-learning framework that explores the space of algorithmic trading strategies through an open-ended evolutionary loop guided by LLMs. Each trading strategy is represented as an independent Python program that undergoes mutation, evaluation, and selection.
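To make "a strategy is a Python program" concrete, here is a minimal sketch of what an EMA Crossover seed might look like as plain code. The function names `ema` and `ema_crossover_signals`, the 12/26 periods, and the +1/-1/0 signal convention are illustrative assumptions; the actual ProFiT seeds and their backtesting interface are not shown in this summary.

```python
from typing import List


def ema(values: List[float], period: int) -> List[float]:
    """Exponential moving average seeded with the first value."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def ema_crossover_signals(closes: List[float], fast: int = 12, slow: int = 26) -> List[int]:
    """Hypothetical seed logic: +1 (long) when the fast EMA is above the
    slow EMA, -1 (short) when below, 0 when they are equal."""
    fast_line = ema(closes, fast)
    slow_line = ema(closes, slow)
    return [1 if f > s else -1 if f < s else 0 for f, s in zip(fast_line, slow_line)]


if __name__ == "__main__":
    closes = [100.0 + 0.5 * i for i in range(60)]  # toy upward-trending prices
    print(ema_crossover_signals(closes)[-5:])
```

Because the whole strategy is source code, a mutation is free to change more than `fast` and `slow`: it can add filters, stops, or sizing rules, which is the structural change that parameter tuning cannot express.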

The Evolutionary Loop

Initialization: Start with a seed strategy from a fixed library (Bollinger, CCI, EMA Crossover, MACD, or Williams R)
Selection: Choose a parent strategy from the population using methods like roulette wheel, UCB-1, or uniform sampling
Analysis: LLM analyzes strategy code and backtest results, identifying weaknesses and proposing 2-3 concrete improvements
Mutation: LLM rewrites the Python code incorporating suggested enhancements
Evaluation: Backtest mutated strategy on historical market data using walk-forward validation
Archiving: If fitness ≥ Minimum Acceptable Score (MAS), add to population; otherwise discard
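The six steps above can be sketched as a short loop. This is a toy under stated assumptions, not the authors' implementation: `mutate` stands in for the two LLM calls (analysis and rewrite), `backtest` stands in for walk-forward evaluation, parent selection is uniform, and the seed is assumed to enter the archive unconditionally. The `Strategy` class and the "threshold=..." toy programs are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Strategy:
    source: str                    # strategy program text
    fitness: float                 # validation fitness from the backtest
    parent: Optional[int] = None   # archive index of the parent, None for the seed


def evolve(seed_source: str,
           mutate: Callable[[str, float], str],
           backtest: Callable[[str], float],
           mas: float,
           iterations: int,
           rng: random.Random) -> List[Strategy]:
    # Initialization: the seed is archived unconditionally (assumption).
    archive = [Strategy(seed_source, backtest(seed_source))]
    for _ in range(iterations):
        parent_idx = rng.randrange(len(archive))               # Selection (uniform)
        parent = archive[parent_idx]
        child_source = mutate(parent.source, parent.fitness)   # Analysis + Mutation
        child_fitness = backtest(child_source)                 # Evaluation
        if child_fitness >= mas:                               # Archiving
            archive.append(Strategy(child_source, child_fitness, parent_idx))
    return archive


# Toy stand-ins so the sketch runs without an LLM or market data.
def toy_backtest(source: str) -> float:
    threshold = float(source.split("=")[1])
    return -abs(threshold - 0.7)   # best possible fitness is 0 at threshold 0.7


def make_toy_mutate(rng: random.Random) -> Callable[[str, float], str]:
    def mutate(source: str, fitness: float) -> str:
        threshold = float(source.split("=")[1])
        return f"threshold={threshold + rng.uniform(-0.2, 0.2):.4f}"
    return mutate


if __name__ == "__main__":
    rng = random.Random(0)
    archive = evolve("threshold=0.1000", make_toy_mutate(rng), toy_backtest,
                     mas=-0.6, iterations=50, rng=rng)
    print(len(archive), max(s.fitness for s in archive))
```

The `parent` index is what makes the population a tree: every retained child records which archived strategy it was derived from.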

Evolutionary Tree Visualization

Each node represents a trading strategy variant with color encoding fitness (annualized return %). Edges indicate lineage from parent to child. Only strategies exceeding the Minimum Acceptable Score (MAS) are retained in the population.

Best fitness: +6.93% | Worst retained: -23.41%

Figure 1: Population tree resulting from one sample ProFiT run, showing evolutionary lineage of strategy variants.

LLM-Guided Mutation

Unlike traditional genetic programming that uses random mutations, ProFiT employs semantically-informed mutation guided by LLM reasoning. The LLM receives both the strategy code and performance metrics, enabling it to make targeted improvements based on actual backtest results.

Two-Stage LLM Process

Stage 1 - Analysis: The LLM identifies weaknesses, inefficiencies, and sources of poor performance. It proposes no more than 2-3 concrete, high-impact improvements that could realistically improve out-of-sample returns.

Stage 2 - Rewrite: Using the original code and improvement proposals, the LLM generates new Python source code that incorporates the requested enhancements while maintaining compatibility with the backtesting framework.

If the generated code fails to compile, a repair loop feeds the error traceback back to the LLM, for up to 10 attempts.
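A repair loop of this kind can be sketched in a few lines. The sketch assumes that "fails to compile" can be checked with Python's built-in `compile()`, which only catches syntax errors; runtime failures inside the backtest would need a separate check. The `repair` callable is a hypothetical stand-in for the LLM call that receives the code and the traceback.

```python
import traceback
from typing import Callable, Optional


def compile_with_repair(source: str,
                        repair: Callable[[str, str], str],
                        max_attempts: int = 10) -> Optional[str]:
    """Return source that compiles, or None if it still fails after
    max_attempts repairs. `repair(source, traceback_text)` returns new source."""
    for attempt in range(max_attempts + 1):
        try:
            compile(source, "<strategy>", "exec")   # syntax check only, nothing is executed
            return source
        except SyntaxError:
            if attempt == max_attempts:
                return None                          # repair budget exhausted
            source = repair(source, traceback.format_exc())
    return None


if __name__ == "__main__":
    broken = "def signal(:\n    return 1\n"

    def toy_repair(code: str, tb: str) -> str:
        # Stand-in for the LLM: patch the one known typo.
        return code.replace("signal(:", "signal():")

    print(compile_with_repair(broken, toy_repair) is not None)
```

Bounding the number of attempts keeps a single unfixable mutation from stalling the evolutionary loop; a candidate that never compiles is simply discarded.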

Walk-Forward Validation

To ensure robustness and avoid overfitting, ProFiT uses cross-validation with five temporal folds. Each fold consists of:

Period	Duration	Purpose
Training	2.5 years	Strategy optimization
Validation	6 months	Fitness evaluation for evolution
Test	6 months	Out-of-sample performance
Dormant Window	10 days	Buffer between validation/test to prevent lookahead

The dataset covers January 2008 to October 2025 with 1-hour timeframe data containing open, high, low, close prices and trading volume.

Part 3: Results Across Markets

ProFiT was evaluated on 35 experiments (7 assets × 5 seed strategies) with fitness growth typically tapering after approximately 15 iterations, suggesting convergence to local optima.
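Each of these experiments is scored on the walk-forward folds: 2.5 years of training, 6 months of validation, a 10-day dormant window, then 6 months of test. One way to lay out five such folds over the January 2008 to October 2025 range is sketched below. How the folds are actually staggered is not stated here, so the 42-month step between fold starts (`step_months`) is an assumption, as are the `Fold` class and helper names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, keeping the day of month.
    Only used here with days 1 and 11, which exist in every month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, d.day)


@dataclass
class Fold:
    train_start: date
    train_end: date     # also the validation start
    val_end: date
    test_start: date    # validation end plus the dormant window
    test_end: date


def make_folds(start: date, n_folds: int = 5, step_months: int = 42) -> List[Fold]:
    folds = []
    for k in range(n_folds):
        train_start = add_months(start, k * step_months)
        train_end = add_months(train_start, 30)          # 2.5 years of training
        val_end = add_months(train_end, 6)               # 6 months of validation
        test_start = val_end + timedelta(days=10)        # 10-day dormant window
        test_end = add_months(test_start, 6)             # 6 months of test
        folds.append(Fold(train_start, train_end, val_end, test_start, test_end))
    return folds


if __name__ == "__main__":
    for fold in make_folds(date(2008, 1, 1)):
        print(fold)
```

Evolution only ever sees the validation score; the test window sits behind the dormant gap so that nothing computed near the end of validation can leak into the out-of-sample result.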

Aggregate Performance vs. Baselines

Metric	vs. Random	vs. Seed (B0)	vs. Buy-and-Hold
Annualized Return	+86.47% ± 1.21	+44.21% ± 4.31	+3.41% ± 1.27
Sharpe Ratio	+0.62 ± 0.04	+0.57 ± 0.04	+2.69 ± 0.97
Expectancy	+0.94 ± 0.14	+0.75 ± 0.13	+2.59 ± 1.03
Win Rate	100%	94.3%	77.1%
Statistical Significance	83.3%	94.3%	54.3%

Win Rate = fraction of (strategy, asset) pairs with positive improvement. Statistical significance at p < 0.05 via one-sided Wilcoxon signed-rank test.
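The two summary statistics defined above can be computed as follows. This is a standard-library sketch: `win_rate` follows the definition directly, and `wilcoxon_one_sided_p` is an exact one-sided Wilcoxon signed-rank test that enumerates all sign assignments, which is only practical for small samples. What the paired samples are in the paper (folds, trades, or something else) is not stated here, so the example inputs are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def win_rate(improvements: Sequence[float]) -> float:
    """Fraction of entries with a strictly positive improvement."""
    return sum(1 for x in improvements if x > 0) / len(improvements)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_one_sided_p(diffs: Sequence[float]) -> float:
    """Exact p-value for H1: differences tend to be positive.
    Zero differences are dropped; cost grows as 2**n, so keep n small."""
    nonzero = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    hits = total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus:
            hits += 1
    return hits / total


if __name__ == "__main__":
    per_fold_improvement = [0.5, 1.2, 0.3, 0.9, 0.1]   # hypothetical values
    print(win_rate(per_fold_improvement), wilcoxon_one_sided_p(per_fold_improvement))
```

With five paired observations, the smallest achievable one-sided p-value is 1/32, so a result can only clear the 0.05 threshold when every difference is positive.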

Cross-Asset Performance

Strategy-Specific Highlights

Williams R Strategy: Strongest gains with +63% to +90% return improvements across NG, SB, and VX
CCI Strategy: Highest Sharpe improvement of +1.07 on E6
MACD Strategy: Best expectancy improvement of +3.73% on SB
Bollinger Mean Reversion: Consistent improvements of +0.72 to +0.91 Sharpe across most assets

Strategy Evolution Example

A typical evolution from seed MACD strategy to best-performing variant shows transformation from a simple 30-line crossover strategy to a sophisticated 100+ line system incorporating:

Component	Seed Strategy	Evolved Strategy
Core Logic	Simple MACD/Signal crossover	Histogram smoothing + confirmation
Regime Filter	None	200-period EMA trend filter
Volatility Filter	None	ATR% gating (0.4% - 20%)
Risk Management	None	ATR-based stop-loss and take-profit
Position Sizing	Fixed 100%	Risk-fraction based (1% risk per trade)
Performance	-54.04% annual return	+0.77% annual return

Part 4: Why It Works
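The risk-management components listed for the evolved MACD variant (ATR% gating between 0.4% and 20%, an ATR-based stop, and 1% risk per trade) can be made concrete with a short sketch. The evolved source code itself is not reproduced in this summary, so the function names and the stop distance of 2 × ATR (`stop_atr_multiple`) are assumptions chosen for illustration.

```python
def atr_percent(atr: float, close: float) -> float:
    """ATR expressed as a percentage of the closing price."""
    return 100.0 * atr / close


def passes_volatility_gate(atr: float, close: float,
                           low_pct: float = 0.4, high_pct: float = 20.0) -> bool:
    """Trade only when ATR% lies inside the 0.4% to 20% band."""
    return low_pct <= atr_percent(atr, close) <= high_pct


def position_size(equity: float, atr: float,
                  risk_fraction: float = 0.01,
                  stop_atr_multiple: float = 2.0) -> float:
    """Units to trade so that hitting the ATR-based stop loses about
    risk_fraction of equity. stop_atr_multiple is an assumed value."""
    stop_distance = stop_atr_multiple * atr
    if stop_distance <= 0:
        return 0.0
    return (equity * risk_fraction) / stop_distance


if __name__ == "__main__":
    if passes_volatility_gate(atr=2.0, close=100.0):
        print(position_size(equity=100_000.0, atr=2.0))
```

Sizing by risk rather than committing 100% of capital means position size shrinks automatically when volatility rises, which caps the loss on any single stopped-out trade.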

The Power of Program Evolution

ProFiT's success stems from three key innovations:

Semantic Mutation: LLMs understand code semantics, enabling targeted improvements rather than random changes
Empirical Grounding: Every mutation is validated against real market data through backtesting
Archive Diversity: Maintaining all viable strategies (not just the best) prevents premature convergence and enables exploration from diverse starting points
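Keeping the whole archive only helps if parents are drawn from across it. Two of the selection methods named earlier, roulette wheel and UCB-1, can be sketched as below. Several details are assumptions: fitness is shifted so that negative returns still receive a non-negative weight, and for UCB-1 the "reward" of an archived strategy is taken to be some running average of how well its children did, with `visit_counts` counting how often it has been chosen as a parent.

```python
import math
import random
from typing import Sequence


def roulette_select(fitnesses: Sequence[float], rng: random.Random) -> int:
    """Pick an index with probability proportional to shifted fitness.
    The shift keeps weights positive when returns are negative (assumption)."""
    low = min(fitnesses)
    weights = [f - low + 1e-9 for f in fitnesses]
    return rng.choices(range(len(fitnesses)), weights=weights, k=1)[0]


def ucb1_select(mean_rewards: Sequence[float],
                visit_counts: Sequence[int],
                c: float = math.sqrt(2)) -> int:
    """UCB-1: try every unvisited entry once, then maximize
    mean reward + c * sqrt(ln(total visits) / visits)."""
    for i, n in enumerate(visit_counts):
        if n == 0:
            return i
    total = sum(visit_counts)
    scores = [m + c * math.sqrt(math.log(total) / n)
              for m, n in zip(mean_rewards, visit_counts)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    print(roulette_select([-23.41, -5.0, 6.93], rng))
    print(ucb1_select([0.2, 0.5, 0.1], [3, 4, 1]))
```

Roulette selection favors strong strategies while still sampling weak ones occasionally; UCB-1 adds an explicit bonus for rarely tried parents, so the search keeps probing branches of the tree that have not yet had a fair chance.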

Comparison with Traditional Approaches

Approach	Search Space	Adaptation	Limitation
Deep RL	Fixed architecture weights	Parameter tuning	Cannot modify logic
Traditional GP	Syntactic tree structures	Random mutations	No semantic understanding
Static Codex	One-shot generation	None	No iterative improvement
ProFiT	Python programs	LLM-guided evolution	Compute intensive

Part 5: Limitations and Future Work

While ProFiT demonstrates significant improvements, several limitations warrant consideration. Among those evident from the results above: the approach is compute intensive, fitness growth typically tapers after about 15 iterations (suggesting convergence to local optima), and only 54.3% of comparisons against Buy-and-Hold reach statistical significance.

Future directions include prompt-level meta-optimization, real-time market adaptation, multi-asset strategy evolution, and more sophisticated diversity maintenance mechanisms.

Conclusion

ProFiT establishes a practical pathway toward open-ended, self-improving algorithmic trading systems. By coupling LLM-driven code evolution with empirical validation, the framework achieves:

77%+ win rate against Buy-and-Hold across diverse assets
100% win rate against random baselines
+44.21% mean improvement in annualized returns over seed strategies
71.4% of improvements achieving statistical significance (p < 0.05)

These results demonstrate that autonomous systems can iteratively refine real-world trading policies without manual tuning, suggesting that LLMs coupled with grounded feedback can serve as active facilitators in algorithmic discovery beyond just code generation.

Primary Source

ProFiT: Program Search for Financial Trading
Matthew Siper, Ahmed Khalifa, Lisa Soros, Muhammad Umair Nasir, Jay Azhang, Julian Togelius (Nof1), 2025

Related: Darwin Gödel Machine
The theoretical foundation inspiring ProFiT's self-improving approach
