Tag: optimizer

Blog Post·2024-06-19·4 min read

Optimizers for LLMs: Adam, Weight Decay, and Why Learning Rate Matters More Than You Think

Adam is the default optimizer for language model training, but using it correctly — the right β values, weight decay, learning rate schedule — makes a larger difference than most people expect.

transformers optimizer llm-training adam