General Agentic AI

Collections:

Our Work

*Equal Contribution and †Corresponding Author

Currently none.

2025.12.07: Acting Less is Reasoning More! Teaching Model to Act Efficiently. paper #Tool-RL
2025.11.17: Scaling Agent Learning via Experience Synthesis. paper | github #Model based Env
2025.11.14: Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments. paper | code #Benchmark
2025.11.11: Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey. paper | github #Survey
2025.11.07: MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers. paper | github #Benchmark
2025.10.30: The Berkeley Function Calling Leaderboard (BFCL): From Tool Use to Agentic Evaluation of Large Language Models. paper | github #Benchmark
2025.10.27: Towards General Agentic Intelligence via Environment Scaling. paper | github #Environment
2025.10.24: TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments. paper | github #Data-Synthesis
2025.10.23: $\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment. paper | github #Benchmark