Policy Optimization for RL and LLM Alignment — Theory
The optimization story behind aligning language models with human preferences
On my favorite generative model of all time!