Title | Description |
---|---|
AgentFlow: In-the-Flow Agentic System Optimization for Effective Planning and Tool Use | Novel trainable agentic framework that coordinates four specialized modules (planner, executor, verifier, generator) through an evolving memory. Introduces Flow-GRPO, a training method that optimizes the planner directly within live multi-turn interactions. With a 7B backbone, it outperforms top-performing baselines by an average of 14.9% on search, 14.0% on agentic, and 14.5% on mathematical reasoning tasks, and even surpasses larger proprietary models such as GPT-4o. |
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory | Novel memory framework that lets AI agents learn from both successful and failed experiences by distilling them into generalizable reasoning strategies. Introduces Memory-aware Test-Time Scaling (MaTTS), in which scaled-up test-time exploration yields richer experience for the memory, and better memory in turn makes that scaling more effective. Demonstrates up to 34.2% relative improvement across web browsing and software engineering tasks, along with emergent self-evolving behaviors. |
Curse of Instructions: Large Language Models Cannot Follow Multiple Instructions at Once | Comprehensive analysis revealing fundamental limits on LLMs' ability to follow multiple instructions simultaneously. Introduces the ManyIFEval benchmark, which shows that the rate of following all instructions decays exponentially as the instruction count grows; evaluated models include GPT-4o and Claude-3.5. Also covers self-refinement as a mitigation strategy and discusses implications for production systems. |
AI Benchmark Critique: Evidence of Invalid 2026 Predictions | Critical analysis of the METR and GDPval benchmarks, identifying statistical flaws, baseline-inflation errors, and invalid extrapolation methods. |
Recursive Self-Aggregation: Deep Thinking and Test-Time Scaling for LLM Reasoning | Groundbreaking test-time scaling method that lets smaller models match larger reasoning models by iteratively aggregating reasoning chains. |
The OaK Architecture: A Paradigm Shift in Artificial General Intelligence | Rich Sutton's vision of experience-based superintelligence built on continual learning, hierarchical abstraction, and reward maximization. |
Research Papers
Curated collection of AI agent engineering research and analysis