Every equation in scaled dot-product attention and multi-head attention annotated term-by-term — the scaling, the softmax, the heads, RoPE, and KV cache — with links to the posts explaining each design choice.
Every equation in PPO annotated term-by-term — the clipped surrogate, GAE, value loss, and entropy bonus — with links to the posts and visuals explaining each design choice.