AlphaAgents introduces a role-based multi-agent framework for systematic stock selection and portfolio construction using large language models. The system deploys three specialized agents—Fundamental, Sentiment, and Valuation—that collaborate through structured debate using Microsoft AutoGen infrastructure. When analyses diverge, agents engage in discussion until consensus emerges, mimicking institutional investment committee reasoning.
Backtesting on 15 technology stocks over a four-month period (February-May 2024) demonstrated that the multi-agent portfolio outperformed single-agent approaches in risk-neutral scenarios, effectively balancing short-term signals with long-term fundamental insights. The framework produces transparent reasoning trails through logged discussions, enabling audit and validation of investment decisions.
Key findings reveal that collaborative debate mechanisms enhance analytical rigor when agents encounter conflicting signals, while explicit risk tolerance integration allows contextual interpretation rather than fixed thresholds. The modular architecture positions the system for integration with established portfolio optimization methods like Mean-Variance or Black-Litterman models.
Imagine you're deciding which toys to buy with your allowance. Instead of asking just one friend, you ask three different friends who each specialize in different things: one knows which toys are well-made and will last (Fundamental), one knows what toys are popular right now (Sentiment), and one knows if the price is fair (Valuation). When they disagree, they debate until they reach agreement. This way, you get a better decision than asking just one friend, because each brings unique knowledge to the table.
The AlphaAgents framework deploys three role-based agents, each with distinct analytical responsibilities and data sources. This specialization mirrors how institutional investment teams organize expertise across fundamental research, market sentiment analysis, and quantitative valuation.
| Agent | Focus Area | Data Sources | Key Tools |
|---|---|---|---|
| Fundamental Agent | Financial health, sector trends, projected performance | 10-K/10-Q filings, financial statements | API calls, RAG for report analysis |
| Sentiment Agent | Investor sentiment, market perception, price impact | Bloomberg news, analyst ratings, disclosures | LLM summarization with reflection prompting |
| Valuation Agent | Price reasonableness, volatility, return trends | Yahoo Finance historical pricing, volumes | Computational tools for annualized returns |
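
The "computational tools for annualized returns" in the Valuation Agent row can be pictured with a minimal sketch like the one below. The function names, the 252-trading-day convention, and the geometric annualization are assumptions made for illustration; the source does not specify how the paper's tools are implemented.

```python
import math
from statistics import stdev

TRADING_DAYS = 252  # assumed trading days per year (a common convention)


def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]


def annualized_return(prices):
    """Geometric annualized return over the whole price series."""
    n_periods = len(prices) - 1
    if n_periods < 1:
        raise ValueError("need at least two prices")
    total_growth = prices[-1] / prices[0]
    return total_growth ** (TRADING_DAYS / n_periods) - 1.0


def annualized_volatility(prices):
    """Sample standard deviation of daily returns, scaled to one year."""
    returns = daily_returns(prices)
    if len(returns) < 2:
        raise ValueError("need at least three prices")
    return stdev(returns) * math.sqrt(TRADING_DAYS)
```

A valuation-style tool would call these on historical closing prices and hand the two numbers to the LLM, rather than asking the model to do the arithmetic itself.
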
Each agent receives tailored system prompts defining their specific responsibilities. This role-based prompting ensures agents stay within their analytical domain while contributing unique perspectives to the collaborative analysis. The prompts also embed risk tolerance profiles, enabling contextual interpretation rather than fixed numerical thresholds.
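One way to picture role-based prompting with an embedded risk profile is the sketch below. The `AgentRole` class, the `build_system_prompt` helper, and all prompt wording are hypothetical; the role descriptions paraphrase the table above, and the paper's actual prompts are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    name: str
    focus: str
    data_sources: str


# Role descriptions paraphrase the agent table; the wording is illustrative.
ROLES = {
    "fundamental": AgentRole(
        "Fundamental Agent",
        "financial health, sector trends, and projected performance",
        "10-K/10-Q filings and financial statements",
    ),
    "sentiment": AgentRole(
        "Sentiment Agent",
        "investor sentiment, market perception, and price impact",
        "news, analyst ratings, and disclosures",
    ),
    "valuation": AgentRole(
        "Valuation Agent",
        "price reasonableness, volatility, and return trends",
        "historical prices and volumes",
    ),
}

# Hypothetical risk-profile text, NOT the paper's actual prompt wording.
RISK_PROFILES = {
    "risk-neutral": "Balance expected return against volatility.",
    "risk-averse": "Prioritize capital preservation over upside.",
}


def build_system_prompt(role_key: str, risk_profile: str) -> str:
    """Combine a role description and a risk profile into one system prompt."""
    if risk_profile not in RISK_PROFILES:
        raise ValueError(f"unknown risk profile: {risk_profile}")
    role = ROLES[role_key]
    return (
        f"You are the {role.name}. Analyze {role.focus} "
        f"using {role.data_sources}. Stay within this domain. "
        f"Investor profile ({risk_profile}): {RISK_PROFILES[risk_profile]} "
        "Conclude with a BUY or SELL recommendation."
    )
```

Because the risk profile is plain text inside the prompt, the model interprets it in context instead of comparing a metric against a hard-coded cutoff.
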
The core innovation of AlphaAgents lies in its structured debate mechanism. Built on Microsoft AutoGen infrastructure, the framework uses a coordinating group chat assistant that ensures all specialist agents contribute at least twice before analyses are consolidated.
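The coordination rule described here (every specialist speaks at least twice before consolidation) can be sketched in plain Python. This is not AutoGen code: `run_debate`, the agent callables, and the `max_rounds` cap are assumptions made for illustration, and real agents would call an LLM where the stubs return fixed strings.

```python
from typing import Callable, Dict, List, Optional, Tuple

Transcript = List[Tuple[str, str]]
# An agent takes (ticker, transcript so far) and returns (message, recommendation).
Agent = Callable[[str, Transcript], Tuple[str, str]]


def run_debate(
    ticker: str,
    agents: Dict[str, Agent],
    min_turns: int = 2,
    max_rounds: int = 6,
) -> Tuple[Optional[str], Transcript]:
    """Round-robin debate: stop once everyone has spoken at least
    `min_turns` times and all latest recommendations agree."""
    transcript: Transcript = []
    turns = {name: 0 for name in agents}
    latest: Dict[str, str] = {}
    for _ in range(max_rounds):
        for name, agent in agents.items():
            message, rec = agent(ticker, transcript)
            transcript.append((name, message))  # logged for later audit
            turns[name] += 1
            latest[name] = rec
        everyone_spoke = all(t >= min_turns for t in turns.values())
        consensus = len(set(latest.values())) == 1
        if everyone_spoke and consensus:
            return next(iter(latest.values())), transcript
    return None, transcript  # no consensus within the round cap


def fixed_agent(rec: str) -> Agent:
    """Stub agent that never changes its view."""
    def agent(ticker: str, transcript: Transcript) -> Tuple[str, str]:
        return f"I rate {ticker} a {rec}.", rec
    return agent


def persuadable_agent(initial: str, final: str) -> Agent:
    """Stub agent that switches its view after its first turn."""
    state = {"turns": 0}
    def agent(ticker: str, transcript: Transcript) -> Tuple[str, str]:
        state["turns"] += 1
        rec = initial if state["turns"] == 1 else final
        return f"I rate {ticker} a {rec}.", rec
    return agent
```

The returned transcript plays the role of the logged discussion: it records who said what, in order, so a reviewer can trace how the final recommendation was reached.
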
The multi-agent debate is intended to reduce the human cognitive biases studied in behavioral finance by routing each decision through several independent analytical perspectives. The consensus mechanism also mitigates AI-specific hallucinations: when one agent produces unreliable output, the others can challenge and correct it during the debate. The authors describe the result as a form of "collective intelligence" that, in the reported backtests, outperformed individual agents.
A distinguishing feature of AlphaAgents is the integration of explicit risk tolerance profiles through prompt engineering. Rather than using fixed numerical thresholds, investor traits are embedded directly into agent instructions, allowing contextual interpretation of risk.
| Risk Profile | Behavior | Market Context Performance |
|---|---|---|
| Risk-Neutral | Balanced approach, willing to accept volatility for returns | Outperformed benchmarks in bullish markets |
| Risk-Averse | Conservative selection, prioritizes capital preservation | Underperformed in bullish markets, but with lower drawdowns |
The research revealed that adjacent risk profiles (risk-neutral vs. risk-seeking) showed limited differentiation through prompt engineering alone. This suggests that more sophisticated mechanisms may be needed to capture fine-grained risk preferences—a key area for future development.
The framework was evaluated through backtesting on 15 randomly selected technology stocks over a four-month window from February to May 2024. Portfolios were constructed with equal weights based on agent BUY/SELL recommendations following debate.
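Equal-weight construction from post-debate recommendations amounts to a few lines. In this sketch the function name and the placeholder tickers are hypothetical, and it assumes that only BUY-rated stocks enter the portfolio.

```python
def equal_weight_portfolio(recommendations):
    """Give every BUY-rated ticker the same weight; SELL-rated tickers get none.

    `recommendations` maps ticker -> "BUY" or "SELL" (the post-debate decision).
    """
    buys = sorted(t for t, rec in recommendations.items() if rec.upper() == "BUY")
    if not buys:
        return {}  # nothing selected, so the portfolio is empty
    weight = 1.0 / len(buys)
    return {ticker: weight for ticker in buys}


# Placeholder tickers, not the stocks used in the paper.
portfolio = equal_weight_portfolio({"AAA": "BUY", "BBB": "SELL", "CCC": "BUY"})
```
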
The multi-agent system outperformed both single-agent portfolios in cumulative returns and rolling Sharpe ratio throughout the testing window, balancing short-term sentiment and valuation signals with long-term fundamental insights.
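For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from a series of daily portfolio returns. The window length, the zero risk-free rate, and the 252-day annualization are assumptions; the source does not state the paper's exact settings.

```python
import math
from statistics import mean, stdev


def cumulative_returns(returns):
    """Compound per-period returns into a running cumulative return."""
    out, growth = [], 1.0
    for r in returns:
        growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


def rolling_sharpe(returns, window, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio over each trailing window of `window` periods.

    `risk_free` is the per-period risk-free rate (assumed zero by default).
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    out = []
    for end in range(window, len(returns) + 1):
        excess = [r - risk_free for r in returns[end - window:end]]
        sd = stdev(excess)
        if sd == 0:
            out.append(math.nan)  # Sharpe is undefined when returns do not vary
        else:
            out.append(mean(excess) / sd * math.sqrt(periods_per_year))
    return out
```

Comparing these two series across the multi-agent and single-agent portfolios is what "outperformed in cumulative returns and rolling Sharpe ratio" refers to.
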
All risk-averse portfolios underperformed the benchmark, primarily because excluding higher-volatility technology stocks limited upside participation during the bullish testing period. However, the multi-agent approach performed better than the individual risk-averse strategies, with lower volatility and reduced drawdowns.
To ensure analytical reliability, AlphaAgents deploys Arize Phoenix for retrieval-augmented generation (RAG) evaluation. The system measures faithfulness and relevance scores for fundamental and sentiment agents, while mathematical tools monitor valuation agent performance.
The framework mirrors how institutional investment committees operate—divergent perspectives undergo structured reconciliation through debate. This positions AlphaAgents for human-in-the-loop deployment where portfolio managers can review transparent reasoning trails and override decisions when warranted. The logged discussions provide audit capabilities essential for regulatory compliance.
| Limitation | Impact | Proposed Solution |
|---|---|---|
| 15 technology stocks only | Limited sector diversification evidence | Expand to broader sector coverage |
| Stock selection only (no optimization) | No portfolio weight allocation | Integrate Mean-Variance or Black-Litterman |
| 4-month testing window | Limited long-term performance assessment | Extended multi-year backtesting |
| Adjacent risk profile conflation | Insufficient differentiation via prompts | Develop enhanced risk modeling |
| News coverage gaps | Sentiment agent data limitations | Expand data source integration |
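
As a point of reference for the proposed Mean-Variance integration, the standard textbook formulation replaces equal weights $w_i = 1/N$ with the solution of the problem below. The symbols are introduced here for illustration and are not taken from the paper: $w$ is the weight vector, $\mu$ the vector of expected returns, $\Sigma$ the return covariance matrix, and $\lambda > 0$ a risk-aversion parameter.

$$
\max_{w}\; w^{\top}\mu \;-\; \frac{\lambda}{2}\, w^{\top}\Sigma\, w
\quad \text{subject to} \quad \mathbf{1}^{\top} w = 1
$$

Under this formulation the agents would still decide which stocks are eligible, while the optimizer would decide how much of each to hold.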
AlphaAgents demonstrates that LLM-based multi-agent systems can effectively support equity research and portfolio construction.
While the current scope is limited to stock selection within the technology sector, the framework establishes a foundation for more comprehensive autonomous portfolio management systems that augment human investment expertise.
AlphaAgents: Large Language Model based Multi-Agents for Equity Portfolio Constructions
Zhao, Lyu, Jones, Garber, Pasquali, Mehta (BlackRock, Inc.), 2025