General Agentic AI
Collections:
Our Work
*Equal Contribution and † Corresponding Author
Currently none.
Paper Reading
- 2025.11.17: Scaling Agent Learning via Experience Synthesis. paper | github #Model based Env
- 2025.11.14: Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments. paper | code #Benchmark
- 2025.11.11: Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey. paper | github #Survey
- 2025.11.07: MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers. paper | github #Benchmark
- 2025.10.30: The Berkeley Function Calling Leaderboard (BFCL): From Tool Use to Agentic Evaluation of Large Language Models. paper | github #Benchmark
- 2025.10.27: Towards General Agentic Intelligence via Environment Scaling. paper | github #Environment
- 2025.10.24: TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments. paper | github #Data-Synthesis
- 2025.10.23: $\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment. paper | github #Benchmark