Reinforcement Learning

Collections:

Our Work

* Equal Contribution; † Corresponding Author

Currently none.

2025.12.05: FlowRL: Matching Reward Distributions for LLM Reasoning. paper | github #Optimization
2025.12.01: Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models. paper | github #Optimization
2025.11.29: The Landscape of Agentic Reinforcement Learning for LLMs: A Survey. paper | github #Survey
2025.11.29: How to Explore to Scale RL Training of LLMs on Hard Problems? paper #Optimization
2025.10.12: Learning Ordinal Probabilistic Reward From Preferences. paper | github #Reward