Policy Optimization for RL and LLM Alignment — Theory

The optimization story behind aligning language models with human preferences

On my favorite generative model of all time!