Reinforcement Learning

Collections:

Our Work

*Equal Contribution and Corresponding Author
Currently none.

Paper Reading

  • 2025.12.05: FlowRL: Matching Reward Distributions for LLM Reasoning. paper | github #Optimization
  • 2025.12.01: Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models. paper | github #Optimization
  • 2025.11.29: The Landscape of Agentic Reinforcement Learning for LLMs: A Survey. paper | github #Survey
  • 2025.11.29: How to Explore to Scale RL Training of LLMs on Hard Problems? paper #Optimization
  • 2025.10.12: Learning Ordinal Probabilistic Reward From Preferences. paper | github #Reward