📖 Qiyao Wang's Blog

Total Blogs

2026

  • GRPO: Group Relative Policy Optimization
    May 21, 2026 Chinese
    #Basics #RL
  • Relearning PPO
    May 21, 2026 Chinese
    #Basics #RL
  • RL in LLM: An Introduction
    Mar. 26, 2026 Chinese
    #Basics #RL
  • Prompt-OIRL: Query-Dependent Prompt Optimization with Offline Inverse RL
    Mar. 25, 2026 Chinese
    #RL #Paper
  • CS336 Assignment 1: Transformers Language Model Architecture
    Jan. 01, 2026 Chinese
    #Basics #CS336

2025

  • CS336 Assignment 1: BPE Tokenizer's Detailed Implementation
    Aug. 01, 2025 Chinese
    #Basics #CS336
  • Three Sampling Methods: Temperature, Top K and Top P
    Feb. 27, 2025 Chinese
    #Decoding
  • Greedy Search and Beam Search
    Feb. 26, 2025 Chinese
    #Decoding
  • Autoregressive Decoding: Basic Manner of Decoder-Only LLMs
    Feb. 25, 2025 Chinese
    #Decoding
  • Proximal Policy Optimization (PPO) and RLHF
    Feb. 24, 2025 Chinese
    #RL
  • Basic Knowledge of Reinforcement Learning before PPO
    Feb. 23, 2025 Chinese
    #RL
  • CMU DLSys Course Homework 1: Implementation and Reflection (Part 1)
    Jan. 08, 2025 Chinese
    #MLSys
  • Chain-of-Thought Reasoning without Prompting
    Jan. 07, 2025 Chinese
    #Reasoning #Paper
  • CMU DLSys Course Homework 0: Implementation and Reflection
    Jan. 01, 2025 Chinese
    #MLSys

2024

  • Pattern Recognition and Machine Learning
    Dec. 28, 2024 Chinese
    #ML