Reinforcement Learning
Collections:
Our Work
*Equal Contribution and † Corresponding Author
Currently none.
Paper Reading
- 2025.12.05: FlowRL: Matching Reward Distributions for LLM Reasoning. paper | github #Optimization
- 2025.12.01: Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models. paper | github #Optimization
- 2025.11.29: The Landscape of Agentic Reinforcement Learning for LLMs: A Survey. paper | github #Survey
- 2025.11.29: How to Explore to Scale RL Training of LLMs on Hard Problems? paper #Optimization
- 2025.10.12: Learning Ordinal Probabilistic Reward From Preferences. paper | github #Reward