ProFiT: Program Search for Financial Trading
LLM-Driven Evolutionary Discovery of Algorithmic Trading Strategies

Matthew Siper, Ahmed Khalifa, Lisa Soros, Muhammad Umair Nasir, Jay Azhang, Julian Togelius
Nof1, New York, NY, USA
2025

Executive Summary

ProFiT (Program Search for Financial Trading) introduces an LLM-driven evolutionary framework that autonomously discovers and improves algorithmic trading strategies. Unlike traditional approaches that tune parameters within fixed architectures, ProFiT evolves the actual Python source code of trading strategies, enabling structural adaptation to changing market conditions.

Across seven liquid futures assets, evolved strategies outperform Buy-and-Hold in 77.1% of (strategy, asset) combinations, beat random baselines in 100% of cases, and improve over their seed strategies in 94.3% of runs. Relative to the seed strategies, the mean improvement is +44.21% in annualized return and +0.57 in Sharpe ratio.

ProFiT bridges the gap between evolutionary algorithms and large language models by using LLMs for semantically-informed mutations guided by backtest performance analysis, rather than random perturbations. This represents a paradigm shift from "parameter learning" to "strategy evolution" in quantitative trading.

ELI5: Self-Improving Trading Recipes

Imagine you have a cookbook of trading recipes. Traditional approaches are like tweaking ingredient amounts (parameters) in the same recipe. ProFiT is different: it's like having a chef (the LLM) who reads your recipe, tastes the result, and then rewrites entire cooking steps to make it better. The chef keeps a library of all recipes that worked (not just the best one), so it can try variations from different starting points. After many iterations, you end up with recipes that are fundamentally better, not just slightly adjusted versions of the original.

Part 1: The Problem with Automated Trading

Designing profitable trading strategies remains one of the most challenging problems in quantitative finance. Despite decades of progress in machine learning and reinforcement learning, most trading systems still rely on human intuition, manual feature engineering, and static optimization. These methods consistently fail to adapt to evolving market regimes, in which patterns that were once profitable eventually decay.

Why Traditional Approaches Fail

ProFiT addresses these challenges by evolving executable trading code rather than just adjusting model weights. The framework draws inspiration from the Darwin Gödel Machine, adapting self-referential improvement concepts for quantitative trading where empirical market feedback serves as the evolutionary fitness function.

Part 2: The ProFiT Framework

ProFiT operates as a meta-learning framework that explores the space of algorithmic trading strategies through an open-ended evolutionary loop guided by LLMs. Each trading strategy is represented as an independent Python program that undergoes mutation, evaluation, and selection.

The Evolutionary Loop

  1. Initialization: Start with a seed strategy from a fixed library (Bollinger, CCI, EMA Crossover, MACD, or Williams R)
  2. Selection: Choose a parent strategy from the population using methods like roulette wheel, UCB-1, or uniform sampling
  3. Analysis: LLM analyzes strategy code and backtest results, identifying weaknesses and proposing 2-3 concrete improvements
  4. Mutation: LLM rewrites the Python code incorporating suggested enhancements
  5. Evaluation: Backtest mutated strategy on historical market data using walk-forward validation
  6. Archiving: If fitness ≥ Minimum Acceptable Score (MAS), add to population; otherwise discard
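The six steps above can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the `Strategy` record, the `evolve` function, and the `toy_*` stand-ins for the LLM and the backtester are all hypothetical names. It uses uniform parent sampling (one of the listed selection methods) and assumes the seed enters the population regardless of the MAS.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Strategy:
    code: str                            # Python source of the strategy
    fitness: float                       # validation fitness, e.g. annualized return in %
    parent: Optional["Strategy"] = None  # lineage edge for the population tree


def evolve(
    seed_code: str,
    backtest: Callable[[str], float],
    analyze: Callable[[str, float], List[str]],
    rewrite: Callable[[str, List[str]], str],
    mas: float,
    iterations: int,
    seed: int = 0,
) -> List[Strategy]:
    rng = random.Random(seed)
    # Step 1: initialization (assumption: the seed is kept regardless of MAS).
    population = [Strategy(seed_code, backtest(seed_code))]
    for _ in range(iterations):
        parent = rng.choice(population)                    # Step 2: uniform selection
        proposals = analyze(parent.code, parent.fitness)   # Step 3: analysis
        child_code = rewrite(parent.code, proposals)       # Step 4: mutation
        child_fitness = backtest(child_code)               # Step 5: evaluation
        if child_fitness >= mas:                           # Step 6: archiving
            population.append(Strategy(child_code, child_fitness, parent))
    return population


# Toy stand-ins so the sketch runs without an LLM or market data.
def toy_backtest(code: str) -> float:
    return float(code.count("filter"))


def toy_analyze(code: str, fitness: float) -> List[str]:
    return ["add a filter"]


def toy_rewrite(code: str, proposals: List[str]) -> str:
    return code + "\n# filter"


population = evolve("# seed strategy", toy_backtest, toy_analyze, toy_rewrite,
                    mas=1.0, iterations=5)
```

Because every accepted child keeps a reference to its parent, the returned list also encodes the lineage tree described in the next section.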

Evolutionary Tree Visualization

Each node represents a trading strategy variant, with color encoding fitness (annualized return %). Edges indicate lineage from parent to child. Only strategies that meet or exceed the Minimum Acceptable Score (MAS) are retained in the population.

Best fitness: +6.93% | Worst retained: -23.41%

Figure 1: Population tree resulting from one sample ProFiT run, showing evolutionary lineage of strategy variants.

LLM-Guided Mutation

Unlike traditional genetic programming that uses random mutations, ProFiT employs semantically-informed mutation guided by LLM reasoning. The LLM receives both the strategy code and performance metrics, enabling it to make targeted improvements based on actual backtest results.

Two-Stage LLM Process

Stage 1 - Analysis: The LLM identifies weaknesses, inefficiencies, and sources of poor performance. It proposes no more than 2-3 concrete, high-impact improvements that could realistically improve out-of-sample returns.

Stage 2 - Rewrite: Using the original code and improvement proposals, the LLM generates new Python source code that incorporates the requested enhancements while maintaining compatibility with the backtesting framework.

If the generated code fails to compile, a repair loop feeds the error traceback to the LLM and asks for a fix, for up to 10 attempts.
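A repair loop of this kind might look like the sketch below. The function name `compile_with_repair` and the `ask_llm_to_fix` callback are hypothetical; the only detail taken from the text is the cap of 10 attempts. Python's built-in `compile` catches syntax errors only, so a real system would presumably also need to run the strategy to surface runtime failures.

```python
import traceback
from typing import Callable, Optional

MAX_REPAIR_ATTEMPTS = 10  # cap stated in the text


def compile_with_repair(
    code: str,
    ask_llm_to_fix: Callable[[str, str], str],
    max_attempts: int = MAX_REPAIR_ATTEMPTS,
) -> Optional[str]:
    """Return code that compiles, or None if all repair attempts fail."""
    for attempt in range(max_attempts + 1):
        try:
            compile(code, "<strategy>", "exec")  # syntax check only
            return code
        except SyntaxError:
            if attempt == max_attempts:
                break  # repair budget exhausted
            # Hand the broken code and its traceback to the (hypothetical) LLM.
            code = ask_llm_to_fix(code, traceback.format_exc())
    return None


# Toy fixer standing in for the LLM call.
def toy_fixer(code: str, error_text: str) -> str:
    return code.replace("def f(:", "def f():")


repaired = compile_with_repair("def f(:\n    return 1\n", toy_fixer)
```

Returning `None` after the budget is exhausted lets the caller simply discard the mutation and move on to the next iteration.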

Walk-Forward Validation

To ensure robustness and avoid overfitting, ProFiT uses cross-validation with five temporal folds. Each fold consists of:

Period         | Duration  | Purpose
Training       | 2.5 years | Strategy optimization
Validation     | 6 months  | Fitness evaluation for evolution
Test           | 6 months  | Out-of-sample performance
Dormant Window | 10 days   | Buffer between validation/test to prevent lookahead

The dataset covers January 2008 to October 2025 with 1-hour timeframe data containing open, high, low, close prices and trading volume.
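To make the fold layout concrete, the sketch below generates five folds from the durations in the table. How the folds are positioned relative to each other is not stated here, so laying them end to end from January 2008 is an assumption, as are the names `Fold`, `add_months`, and `make_folds`.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    year, month_index = divmod(d.year * 12 + d.month - 1 + months, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass(frozen=True)
class Fold:
    train_start: date
    train_end: date   # exclusive; validation starts here
    val_end: date     # exclusive
    test_start: date  # val_end plus the dormant window
    test_end: date    # exclusive


def make_folds(first_start: date, n_folds: int = 5) -> List[Fold]:
    folds = []
    start = first_start
    for _ in range(n_folds):
        train_end = add_months(start, 30)             # 2.5 years of training
        val_end = add_months(train_end, 6)            # 6 months of validation
        test_start = val_end + timedelta(days=10)     # 10-day dormant window
        test_end = add_months(test_start, 6)          # 6 months of test
        folds.append(Fold(start, train_end, val_end, test_start, test_end))
        start = test_end  # assumption: the next fold begins where this one ends
    return folds


folds = make_folds(date(2008, 1, 1))
for f in folds:
    print(f.train_start, f.train_end, f.val_end, f.test_start, f.test_end)
```

Under this end-to-end assumption the fifth fold's test period ends on 2025-08-20, which fits inside the January 2008 to October 2025 dataset.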

Part 3: Results Across Markets

ProFiT was evaluated in 35 experiments (7 assets × 5 seed strategies). Fitness growth typically tapered after approximately 15 iterations, suggesting convergence to local optima.

Aggregate Performance vs. Baselines

Metric                   | vs. Random      | vs. Seed (B0)   | vs. Buy-and-Hold
Annualized Return        | +86.47% ± 1.21  | +44.21% ± 4.31  | +3.41% ± 1.27
Sharpe Ratio             | +0.62 ± 0.04    | +0.57 ± 0.04    | +2.69 ± 0.97
Expectancy               | +0.94 ± 0.14    | +0.75 ± 0.13    | +2.59 ± 1.03
Win Rate                 | 100%            | 94.3%           | 77.1%
Statistical Significance | 83.3%           | 94.3%           | 54.3%

Win Rate = fraction of (strategy, asset) pairs with positive improvement. Statistical significance at p < 0.05 via one-sided Wilcoxon signed-rank test.
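The two summary statistics can be illustrated on a handful of paired improvements. The sketch below computes the win rate as defined above and an exact one-sided Wilcoxon signed-rank p-value. The numbers in `diffs` are invented for illustration, the function names are hypothetical, and the exact method assumes no zero differences and no tied magnitudes (a library routine such as SciPy's `scipy.stats.wilcoxon` handles those cases). How the paper pairs its samples is not stated here.

```python
from typing import Sequence, Tuple


def win_rate(improvements: Sequence[float]) -> float:
    """Fraction of entries with strictly positive improvement."""
    return sum(d > 0 for d in improvements) / len(improvements)


def wilcoxon_greater_exact(diffs: Sequence[float]) -> Tuple[int, float]:
    """Return (W+, p) for H1: the median difference is greater than zero."""
    magnitudes = [abs(d) for d in diffs]
    if 0 in magnitudes or len(set(magnitudes)) != len(magnitudes):
        raise ValueError("sketch assumes no zero differences and no tied magnitudes")
    n = len(diffs)
    # Rank the differences by absolute value (rank 1 = smallest magnitude).
    order = sorted(range(n), key=lambda i: magnitudes[i])
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    # counts[s] = number of sign assignments whose positive ranks sum to s.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    # Under H0 every sign assignment is equally likely (2**n of them).
    p_value = sum(counts[w_plus:]) / 2 ** n
    return w_plus, p_value


# Invented improvements (evolved minus baseline) for six hypothetical pairs.
diffs = [1.2, 0.5, 2.1, -0.3, 0.9, 1.7]
rate = win_rate(diffs)
w_plus, p_value = wilcoxon_greater_exact(diffs)
```

For these invented values, five of six differences are positive, W+ is 20 out of a possible 21, and the exact p-value is 2/64 = 0.03125, which would pass the p < 0.05 threshold.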

Cross-Asset Performance

The seven liquid futures assets tested were:

Strategy-Specific Highlights

Strategy Evolution Example

A typical evolution from the seed MACD strategy to its best-performing variant transforms a simple 30-line crossover strategy into a 100+ line system incorporating:

Component         | Seed Strategy                | Evolved Strategy
Core Logic        | Simple MACD/Signal crossover | Histogram smoothing + confirmation
Regime Filter     | None                         | 200-period EMA trend filter
Volatility Filter | None                         | ATR% gating (0.4% - 20%)
Risk Management   | None                         | ATR-based stop-loss and take-profit
Position Sizing   | Fixed 100%                   | Risk-fraction based (1% risk per trade)
Performance       | -54.04% annual return        | +0.77% annual return
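Two of the evolved components reduce to simple arithmetic, sketched below. The ATR% bounds and the 1% risk fraction come from the table; the function names, the long-only stop placement, and the ATR multiple of 2.0 are illustrative assumptions, not details of the evolved strategy.

```python
def atr_pct_gate(atr: float, close: float, lo: float = 0.004, hi: float = 0.20) -> bool:
    """Allow trading only when ATR as a fraction of price is within [0.4%, 20%]."""
    return lo <= atr / close <= hi


def atr_stop(entry: float, atr: float, multiple: float = 2.0) -> float:
    """Stop-loss for a long position, placed `multiple` ATRs below entry.

    The multiple is a hypothetical choice for this sketch.
    """
    return entry - multiple * atr


def position_size(equity: float, entry: float, stop: float,
                  risk_fraction: float = 0.01) -> float:
    """Units to hold so that hitting the stop loses `risk_fraction` of equity."""
    risk_per_unit = abs(entry - stop)
    if risk_per_unit == 0:
        raise ValueError("entry and stop must differ")
    return equity * risk_fraction / risk_per_unit


# Worked example with invented numbers.
equity, entry, atr = 100_000.0, 50.0, 1.0
tradable = atr_pct_gate(atr, entry)          # ATR% = 2%, inside the band
stop = atr_stop(entry, atr)                  # 50 - 2 * 1 = 48
units = position_size(equity, entry, stop)   # 1,000 at risk / 2 per unit
```

In this example the position is 500 units, so a move from 50 to the stop at 48 loses 1,000, which is 1% of equity. Compare that with the seed's fixed 100% allocation, where the loss on an adverse move is not bounded by a stop at all.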

Part 4: Why It Works

The Power of Program Evolution

ProFiT's success stems from three key innovations:

  1. Semantic Mutation: LLMs understand code semantics, enabling targeted improvements rather than random changes
  2. Empirical Grounding: Every mutation is validated against real market data through backtesting
  3. Archive Diversity: Maintaining all viable strategies (not just the best) prevents premature convergence and enables exploration from diverse starting points
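Parent selection is where the archive's diversity is actually used. The sketch below shows one plausible form of two selection methods named in the evolutionary loop, roulette wheel and UCB-1. The shift applied to negative fitness values and the idea of treating each archived strategy as a bandit arm with a mean reward and visit count are assumptions of this sketch, and the function names are hypothetical.

```python
import math
import random
from typing import Sequence


def roulette_select(fitnesses: Sequence[float], rng: random.Random) -> int:
    """Pick an index with probability proportional to shifted fitness.

    Fitness can be negative (e.g. a retained strategy at -23.41%), so values
    are shifted to be strictly positive first; the shift is an assumption.
    """
    lowest = min(fitnesses)
    weights = [f - lowest + 1e-9 for f in fitnesses]
    return rng.choices(range(len(fitnesses)), weights=weights, k=1)[0]


def ucb1_select(mean_rewards: Sequence[float], visits: Sequence[int]) -> int:
    """Standard UCB-1: mean reward plus sqrt(2 ln N / n_i) exploration bonus."""
    # Any strategy never chosen as a parent is tried first.
    for i, n in enumerate(visits):
        if n == 0:
            return i
    total = sum(visits)
    scores = [m + math.sqrt(2.0 * math.log(total) / n)
              for m, n in zip(mean_rewards, visits)]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
picks = [roulette_select([-23.41, 6.93], rng) for _ in range(1000)]
rarely_visited = ucb1_select([0.6, 0.5], [100, 1])
```

Roulette selection concentrates on the fittest parents, while the UCB-1 bonus pulls the search back toward archived strategies that have rarely been mutated, which is one way to keep exploring from diverse starting points.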

Comparison with Traditional Approaches

Approach       | Search Space               | Adaptation           | Limitation
Deep RL        | Fixed architecture weights | Parameter tuning     | Cannot modify logic
Traditional GP | Syntactic tree structures  | Random mutations     | No semantic understanding
Static Codex   | One-shot generation        | None                 | No iterative improvement
ProFiT         | Python programs            | LLM-guided evolution | Compute intensive

Part 5: Limitations and Future Work

While ProFiT demonstrates significant improvements, several limitations warrant consideration:

Future directions include prompt-level meta-optimization, real-time market adaptation, multi-asset strategy evolution, and more sophisticated diversity maintenance mechanisms.

Conclusion

ProFiT establishes a practical pathway toward open-ended, self-improving algorithmic trading systems. By coupling LLM-driven code evolution with empirical validation, the framework achieves:

These results demonstrate that autonomous systems can iteratively refine real-world trading policies without manual tuning, suggesting that LLMs coupled with grounded feedback can serve as active facilitators in algorithmic discovery beyond just code generation.

Primary Source

ProFiT: Program Search for Financial Trading
Matthew Siper, Ahmed Khalifa, Lisa Soros, Muhammad Umair Nasir, Jay Azhang, Julian Togelius (Nof1), 2025

Related: Darwin Gödel Machine
The theoretical foundation inspiring ProFiT's self-improving approach