ProFiT (Program Search for Financial Trading) introduces an LLM-driven evolutionary framework that autonomously discovers and improves algorithmic trading strategies. Unlike traditional approaches that tune parameters within fixed architectures, ProFiT evolves the actual Python source code of trading strategies, enabling structural adaptation to changing market conditions.
Across seven liquid futures assets, evolved strategies outperform Buy-and-Hold in 77.1% of strategy-asset combinations, beat random baselines in 100% of cases, and improve over their seed strategies in 94.3% of runs. Relative to the seed strategies, the mean improvement in annualized return is +44.21%, with a mean Sharpe ratio improvement of +0.57.
ProFiT bridges the gap between evolutionary algorithms and large language models by using LLMs for semantically-informed mutations guided by backtest performance analysis, rather than random perturbations. This represents a paradigm shift from "parameter learning" to "strategy evolution" in quantitative trading.
Imagine you have a cookbook of trading recipes. Traditional approaches are like tweaking ingredient amounts (parameters) in the same recipe. ProFiT is different: it's like having a chef (the LLM) who reads your recipe, tastes the result, and then rewrites entire cooking steps to make it better. The chef keeps a library of all recipes that worked (not just the best one), so it can try variations from different starting points. After many iterations, you end up with recipes that are fundamentally better, not just slightly adjusted versions of the original.
Designing profitable trading strategies remains one of the most challenging problems in quantitative finance. Despite decades of progress in machine learning and reinforcement learning, most trading systems still rely on human intuition, manual feature engineering, and static optimization. These methods consistently fail to adapt to evolving market regimes, in which patterns that were once profitable eventually decay.
ProFiT addresses these challenges by evolving executable trading code rather than just adjusting model weights. The framework draws inspiration from the Darwin Gödel Machine, adapting self-referential improvement concepts for quantitative trading where empirical market feedback serves as the evolutionary fitness function.
ProFiT operates as a meta-learning framework that explores the space of algorithmic trading strategies through an open-ended evolutionary loop guided by LLMs. Each trading strategy is represented as an independent Python program that undergoes mutation, evaluation, and selection.
Figure (strategy lineage tree): Each node represents a trading strategy variant, with color encoding fitness (annualized return %). Edges indicate lineage from parent to child. Only strategies exceeding the Minimum Acceptable Score (MAS) are retained in the population. Best fitness: +6.93%. Worst retained: -23.41%.
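The mutate, evaluate, and select loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `mutate` and `evaluate` callables are hypothetical placeholders for the LLM rewrite and the validation backtest, parent selection is assumed to be uniform over the archive, the seed is assumed to be kept unconditionally, and the default of 15 iterations is only illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Strategy:
    code: str                      # Python source of the strategy
    fitness: float                 # e.g. validation annualized return (%)
    parent: Optional[int] = None   # archive index of the parent, None for the seed


def evolve(
    seed_code: str,
    mutate: Callable[[str, float], str],   # hypothetical: LLM-guided rewrite
    evaluate: Callable[[str], float],      # hypothetical: backtest on validation data
    min_acceptable_score: float,
    iterations: int = 15,
    rng: Optional[random.Random] = None,
) -> List[Strategy]:
    """Grow an archive of strategies; children must exceed the MAS to be kept."""
    rng = rng or random.Random(0)
    archive = [Strategy(seed_code, evaluate(seed_code))]  # assumption: seed always kept
    for _ in range(iterations):
        parent_idx = rng.randrange(len(archive))  # assumption: uniform parent sampling
        parent = archive[parent_idx]
        child_code = mutate(parent.code, parent.fitness)
        child_fitness = evaluate(child_code)
        if child_fitness > min_acceptable_score:
            archive.append(Strategy(child_code, child_fitness, parent_idx))
    return archive
```

Because every retained strategy stays in the archive, later mutations can branch from any earlier variant rather than only from the current best, which is what produces the tree of lineages in the figure.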
Unlike traditional genetic programming that uses random mutations, ProFiT employs semantically-informed mutation guided by LLM reasoning. The LLM receives both the strategy code and performance metrics, enabling it to make targeted improvements based on actual backtest results.
Stage 1 - Analysis: The LLM identifies weaknesses, inefficiencies, and sources of poor performance. It proposes at most two or three concrete, high-impact changes that could realistically improve out-of-sample returns.
Stage 2 - Rewrite: Using the original code and improvement proposals, the LLM generates new Python source code that incorporates the requested enhancements while maintaining compatibility with the backtesting framework.
If the generated code fails to compile, a repair loop feeds the error traceback back to the LLM, for up to 10 attempts.
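The two stages and the repair loop might look like the sketch below. The `llm` callable (prompt text in, response text out) and the prompt wording are hypothetical, and the check uses Python's built-in `compile`, which only catches syntax-level failures; a real pipeline would presumably also need to run the strategy inside the backtesting framework.

```python
from typing import Callable, Optional

MAX_REPAIR_ATTEMPTS = 10  # from the text: up to 10 repair attempts


def mutate_with_repair(
    code: str,
    metrics: dict,
    llm: Callable[[str], str],  # hypothetical interface: prompt in, text out
) -> Optional[str]:
    """Return rewritten source that compiles, or None if repair fails."""
    # Stage 1: analysis of weaknesses based on code and backtest metrics.
    analysis = llm(
        "Analyze this strategy and propose at most 3 improvements.\n"
        f"Metrics: {metrics}\nCode:\n{code}"
    )
    # Stage 2: rewrite the source using the proposed improvements.
    candidate = llm(
        "Rewrite the strategy applying these improvements.\n"
        f"Improvements:\n{analysis}\nCode:\n{code}"
    )
    # One initial check plus up to MAX_REPAIR_ATTEMPTS repaired versions.
    for attempt in range(MAX_REPAIR_ATTEMPTS + 1):
        try:
            compile(candidate, "<strategy>", "exec")  # syntax check only
            return candidate
        except (SyntaxError, ValueError) as err:
            if attempt == MAX_REPAIR_ATTEMPTS:
                break
            candidate = llm(
                "Fix this error and return the full corrected code.\n"
                f"Error: {err!r}\nCode:\n{candidate}"
            )
    return None
```

Returning `None` after the final failed repair lets the caller simply discard the mutation and sample another parent.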
To ensure robustness and avoid overfitting, ProFiT uses cross-validation with five temporal folds. Each fold consists of:
| Period | Duration | Purpose |
|---|---|---|
| Training | 2.5 years | Strategy optimization |
| Validation | 6 months | Fitness evaluation for evolution |
| Test | 6 months | Out-of-sample performance |
| Dormant Window | 10 days | Buffer between validation/test to prevent lookahead |
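One way to lay out the five folds from the table is sketched here. The durations come from the table; how the folds are positioned relative to one another is not stated, so this sketch assumes they run back to back starting in January 2008, and it treats every interval as half-open (`start` inclusive, `end` exclusive). The `Fold` and `make_folds` names are illustrative.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month length."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    day = min(d.day, calendar.monthrange(year, month_index + 1)[1])
    return date(year, month_index + 1, day)


@dataclass
class Fold:
    train_start: date
    val_start: date    # training covers [train_start, val_start)
    val_end: date      # validation covers [val_start, val_end)
    test_start: date   # dormant window covers [val_end, test_start)
    test_end: date     # test covers [test_start, test_end)


def make_folds(first_start: date, n_folds: int = 5) -> List[Fold]:
    folds = []
    train_start = first_start
    for _ in range(n_folds):
        val_start = add_months(train_start, 30)     # 2.5 years of training
        val_end = add_months(val_start, 6)          # 6 months of validation
        test_start = val_end + timedelta(days=10)   # 10-day dormant window
        test_end = add_months(test_start, 6)        # 6 months of test
        folds.append(Fold(train_start, val_start, val_end, test_start, test_end))
        train_start = test_end                      # assumption: folds are back to back
    return folds
```

Under these assumptions, `make_folds(date(2008, 1, 1))` yields five folds whose last test period ends in August 2025.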
The dataset covers January 2008 to October 2025 with 1-hour timeframe data containing open, high, low, close prices and trading volume.
ProFiT was evaluated on 35 experiments (7 assets × 5 seed strategies). Fitness growth typically tapered after approximately 15 iterations, suggesting convergence to local optima.
| Metric | vs. Random | vs. Seed (B0) | vs. Buy-and-Hold |
|---|---|---|---|
| Annualized Return | +86.47% ± 1.21 | +44.21% ± 4.31 | +3.41% ± 1.27 |
| Sharpe Ratio | +0.62 ± 0.04 | +0.57 ± 0.04 | +2.69 ± 0.97 |
| Expectancy | +0.94 ± 0.14 | +0.75 ± 0.13 | +2.59 ± 1.03 |
| Win Rate | 100% | 94.3% | 77.1% |
| Statistical Significance | 83.3% | 94.3% | 54.3% |
Win Rate = fraction of (strategy, asset) pairs with positive improvement. Statistical significance at p < 0.05 via one-sided Wilcoxon signed-rank test.
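For readers who want to see how these two summary rows are computed, here is a small pure-Python sketch. It is not the authors' evaluation code: it computes the win rate as defined above and an exact one-sided Wilcoxon signed-rank p-value, under the simplifying assumption that the improvements contain no zeros and no tied absolute values. In practice a library routine such as SciPy's `scipy.stats.wilcoxon` with `alternative="greater"` would normally be used instead.

```python
from typing import Sequence


def win_rate(improvements: Sequence[float]) -> float:
    """Fraction of entries with strictly positive improvement."""
    return sum(1 for d in improvements if d > 0) / len(improvements)


def wilcoxon_one_sided(improvements: Sequence[float]) -> float:
    """Exact p-value for the alternative 'improvements tend to be positive'.

    Assumption: no zero differences and no tied absolute values.
    """
    diffs = list(improvements)
    if any(d == 0 for d in diffs) or len({abs(d) for d in diffs}) != len(diffs):
        raise ValueError("this sketch does not handle zeros or ties")
    n = len(diffs)
    # Rank by absolute value; W+ is the sum of ranks of the positive differences.
    ordered = sorted(diffs, key=abs)
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    # counts[s] = number of subsets of ranks {1..n} whose sum equals s.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    # Under the null, each of the 2**n sign patterns is equally likely.
    return sum(counts[w_plus:]) / 2 ** n
```

With eight hypothetical improvements of which only the smallest in magnitude is negative, the win rate is 0.875 and the exact p-value is 2/256, below the 0.05 threshold.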
The seven liquid futures assets tested were:
A typical evolution from a seed MACD strategy to its best-performing variant shows a transformation from a simple 30-line crossover strategy into a more sophisticated system of 100+ lines incorporating:
| Component | Seed Strategy | Evolved Strategy |
|---|---|---|
| Core Logic | Simple MACD/Signal crossover | Histogram smoothing + confirmation |
| Regime Filter | None | 200-period EMA trend filter |
| Volatility Filter | None | ATR% gating (0.4% - 20%) |
| Risk Management | None | ATR-based stop-loss and take-profit |
| Position Sizing | Fixed 100% | Risk-fraction based (1% risk per trade) |
| Performance | -54.04% annual return | +0.77% annual return |
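The filters and sizing rule in the table can be illustrated with a few small functions. This is a sketch of the general techniques, not the evolved strategy's actual code: the function names, the recursive EMA seeded with the first value, and the stop distance of 2 × ATR are assumptions, and contract multipliers are ignored. Only the 200-period EMA, the 0.4%-20% ATR% band, and the 1% risk fraction come from the table.

```python
from typing import Sequence


def ema(values: Sequence[float], period: int) -> float:
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (period + 1)
    out = values[0]
    for v in values[1:]:
        out = alpha * v + (1.0 - alpha) * out
    return out


def trend_allows_long(closes: Sequence[float], period: int = 200) -> bool:
    """Regime filter: only take longs when price is above the long EMA."""
    return closes[-1] > ema(closes, period)


def passes_volatility_gate(
    atr: float, close: float, low_pct: float = 0.4, high_pct: float = 20.0
) -> bool:
    """ATR% gating: trade only when ATR as a percent of price is in range."""
    atr_pct = 100.0 * atr / close
    return low_pct <= atr_pct <= high_pct


def position_size(
    equity: float,
    atr: float,
    risk_fraction: float = 0.01,     # 1% of equity at risk per trade
    stop_atr_multiple: float = 2.0,  # hypothetical stop distance in ATRs
) -> float:
    """Units to trade so that hitting the ATR stop loses risk_fraction of equity."""
    stop_distance = stop_atr_multiple * atr
    return (equity * risk_fraction) / stop_distance
```

For example, with 100,000 of equity and an ATR of 1.0, the 1% risk budget is 1,000 and the assumed stop distance is 2.0, giving a position of 500 units, compared with the seed strategy's fixed 100% allocation.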
ProFiT's success stems from three key innovations:
| Approach | Search Space | Adaptation | Limitation |
|---|---|---|---|
| Deep RL | Fixed architecture weights | Parameter tuning | Cannot modify logic |
| Traditional GP | Syntactic tree structures | Random mutations | No semantic understanding |
| Static Codex | One-shot generation | None | No iterative improvement |
| ProFiT | Python programs | LLM-guided evolution | Compute intensive |
While ProFiT demonstrates significant improvements, several limitations warrant consideration:
Future directions include prompt-level meta-optimization, real-time market adaptation, multi-asset strategy evolution, and more sophisticated diversity maintenance mechanisms.
ProFiT establishes a practical pathway toward open-ended, self-improving algorithmic trading systems. By coupling LLM-driven code evolution with empirical validation, the framework achieves:
These results demonstrate that autonomous systems can iteratively refine real-world trading policies without manual tuning, suggesting that LLMs coupled with grounded feedback can serve as active facilitators in algorithmic discovery beyond just code generation.
ProFiT: Program Search for Financial Trading
Matthew Siper, Ahmed Khalifa, Lisa Soros, Muhammad Umair Nasir, Jay Azhang, Julian Togelius (Nof1), 2025
Related: Darwin Gödel Machine
The theoretical foundation inspiring ProFiT's self-improving approach