Blog Post··6 min read
Optimization for LLMs: Gradient Descent to Adam
Training a neural network is an optimization problem: minimize a loss function over billions of parameters. The journey from vanilla gradient descent to Adam reveals why each step was necessary.