General Agentic AI

Collections:

Our Work

*Equal Contribution and Corresponding Author
Currently none.

Paper Reading

  • 2025.11.17: Scaling Agent Learning via Experience Synthesis. paper | github #Model-Based-Env
  • 2025.11.14: Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments. paper | code #Benchmark
  • 2025.11.11: Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey. paper | github #Survey
  • 2025.11.07: MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers. paper | github #Benchmark
  • 2025.10.30: The Berkeley Function Calling Leaderboard (BFCL): From Tool Use to Agentic Evaluation of Large Language Models. paper | github #Benchmark
  • 2025.10.27: Towards General Agentic Intelligence via Environment Scaling. paper | github #Environment
  • 2025.10.24: TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments. paper | github #Data-Synthesis
  • 2025.10.23: $\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment. paper | github #Benchmark