Policy Optimization for RL and LLM Alignment — Theory

The optimization story behind aligning language models with human preferences

February 17, 2026 · 41 min · Ji Hun Wang