Policy Optimization Algorithms: TRPO, PPO, DPO, KTO and GRPO

February 17, 2026 · 16 min · Ji Hun Wang