Tag: alignment

Blog Post·2024-06-19·7 min read

GRPO vs PPO: Why Removing the Value Head Changes Everything

GRPO achieves competitive alignment results without a learned value function, using a group-relative baseline over sampled completions instead. Here's exactly what changes in the math and implementation, and why that matters for training efficiency and stability.

rl grpo ppo llm-training alignment

Blog Post·2024-06-19·8 min read

The LLM Alignment Pipeline: SFT, Reward Models, and RL End to End

Training a helpful, harmless, honest LLM requires three sequential stages that each build on the previous one. Here's how SFT, reward modeling, and RL fit together as a system — and where each stage can fail.

alignment rlhf sft reward-model llm-training

Blog Post·2024-06-19·5 min read

Loss Functions: What You Optimize Is What You Get

The loss function is the specification. Everything the model learns is in service of minimizing it. Here's the math behind the major losses used in LLM training and fine-tuning.

math loss-functions llm-training alignment

Blog Post·2024-06-19·7 min read

Learning from Feedback: RLHF, RLAIF, and Beyond

RLHF is three steps: supervised fine-tuning, reward model training, and policy optimization. Each step has a specific failure mode. Here's the full picture.

rl rlhf alignment llm-training

Blog Post·2024-06-19·7 min read

Synthetic Data for Alignment: Curation, Quality Filtering, and Self-Critique

Human annotation doesn't scale to the data volumes modern alignment requires. Synthetic data — generated by LLMs, filtered, and refined — has become the dominant approach. Here's how it's done and where it breaks down.

alignment synthetic-data sft data-curation llm-training