Policy Optimization for RL and LLM Alignment — Theory

The optimization story behind aligning language models with human preferences

February 17, 2026 · 41 min · Ji Hun Wang
