Tag: regularization

Blog Post · 2024-06-19 · 4 min read

Why Dropout Disappeared from Large Language Models

BERT applied dropout in every layer. LLaMA uses none. The reason isn't that regularization stopped mattering; it's that at trillion-token scale, data diversity IS the regularizer.

training regularization llm