Policy Optimization for RL and LLM Alignment — Theory
The optimization story behind aligning language models with human preferences