📖 Qiyao Wang's Blog
Total Blogs
2026
-
Workspace-Bench: A Benchmark for Evaluating AI Agent Workspace Learning with Large-Scale File Dependencies
-
LRAT: Learning to Retrieve from Agent Trajectories, a New Retrieval Training Paradigm for the Agentic Search Era
-
OpenSeeker: Fully Open-Source Training Data, an Academic Team Achieves a Frontier-Level Search Agent
-
Claw-Eval: A Benchmark with Full-Trajectory Auditing and Multi-Dimensional Scoring for Trustworthy Autonomous Agent Evaluation
-
ClawGym: Part II - ClawGym-Agents and Bench
-
GDPval: Measuring AI's Economic Impact
-
ClawGym: Part I - ClawGym-SynData
-
GRPO: Group Relative Policy Optimization
-
Relearning PPO
-
RL in LLM: An Introduction
-
Prompt-OIRL: Query-Dependent Prompt Optimization with Offline Inverse RL
-
CS336 Assignment 1: Transformers Language Model Architecture
2025
-
CS336 Assignment 1: BPE Tokenizer's Detailed Implementation
-
Three Sampling Methods: Temperature, Top K and Top P
-
Greedy Search and Beam Search
-
Autoregressive Decoding: Basic Manner of Decoder-Only LLMs
-
Proximal Policy Optimization (PPO) and RLHF
-
Basic Knowledge of Reinforcement Learning before PPO
-
CMU DLSys Course Homework 1: Implementation and Reflection (Part 1)
-
Chain-of-Thought Reasoning without Prompting
-
CMU DLSys Course Homework 0: Implementation and Reflection