📖 Qiyao Wang's Blog
Total Blogs
2026
2025
- CS336 Assignment 1: BPE Tokenizer's Detailed Implementation
- Three Sampling Methods: Temperature, Top K and Top P
- Greedy Search and Beam Search
- Autoregressive Decoding: Basic Manner of Decoder-Only LLMs
- Proximal Policy Optimization (PPO) and RLHF
- Basic Knowledge of Reinforcement Learning before PPO
- CMU DLSys Course Homework 1: Implementation and Reflection (Part1)
- Chain-of-Thought Reasoning without Prompting
- CMU DLSys Course Homework 0: Implementation and Reflection