Tag: cheatsheet

Blog Post·2025-01-10·6 min read

Cheatsheet: Attention

Every equation in scaled dot-product attention and multi-head attention annotated term-by-term (the scaling, the softmax, the heads, RoPE, and the KV cache), with links to the posts explaining each design choice.

cheatsheet attention transformers architecture

Blog Post·2025-01-10·5 min read

Cheatsheet: GRPO

Every equation in GRPO annotated term-by-term, with links to the posts and visuals explaining the why behind each design choice.

cheatsheet grpo rl llm-training alignment

Blog Post·2025-01-10·5 min read

Cheatsheet: PPO

Every equation in PPO annotated term-by-term — the clipped surrogate, GAE, value loss, and entropy bonus — with links to the posts and visuals explaining each design choice.

cheatsheet ppo rl llm-training alignment