Ji Hun Wang
Policy Optimization Algorithms: TRPO, PPO, DPO, KTO and GRPO