Tag: llm

Blog Post·2024-06-19·4 min read

Why Dropout Disappeared from Large Language Models

BERT used dropout everywhere. LLaMA uses none. The reason isn't that regularization stopped mattering — it's that at trillion-token scale, data diversity IS the regularizer.

training regularization llm

Blog Post·2024-06-19·5 min read

Prefill and Decode: The Two Phases of LLM Inference

LLM inference has two fundamentally different compute phases. Prefill processes the prompt in parallel and is compute-bound. Decode generates tokens one at a time and is memory-bandwidth-bound. Knowing which phase you are in determines which optimizations actually help.

inference llm systems kv-cache

Blog Post·2024-06-19·5 min read

The Lifecycle of a KV Cache: From Prefill to Last Token

A request-level walkthrough of how the KV cache is populated, grown, and read during LLM inference — covering prefill, decode, memory layout, and why decode is memory-bandwidth-bound.

inference systems llm

Blog Post·2024-06-19·9 min read

Comparing Large Model Architectures: Attention, Normalization, and Scale

GPT-4, Gemini, LLaMA, Mistral, DeepSeek, Qwen — they all build on the same transformer skeleton. But the architectural choices diverge sharply. Here's a systematic comparison across model families.

architecture transformers llm comparison moe

Blog Post·2024-06-19·8 min read

Evaluation Metrics: Precision, Recall, Calibration, and Confidence

How do you measure whether a model is actually good? The answer is a set of metrics — precision, recall, F1, perplexity, calibration, confidence intervals — each measuring something different and failing in a different way.

math evaluation metrics statistics llm

Blog Post·2024-06-19·5 min read

Red-Teaming vs Automated Evals: Tradeoffs and When to Use Each

Human red-teaming finds attacks automated evals miss. Automated evals achieve scale humans can't. Here's how to combine them, and what each can and can't tell you.

responsible-ai red-teaming evaluation safety llm

Blog Post·2024-06-19·5 min read

Runtime Guardrails: Architecture Patterns for Production AI Safety

Training-time alignment is not enough. Production AI systems need runtime layers that detect, intercept, and respond to harmful inputs and outputs. Here's how to build them.

responsible-ai safety guardrails production llm

Blog Post·2024-06-19·8 min read

How Decoder-Only Transformers Evolved Since GPT-2

GPT-2 established the decoder-only transformer as the dominant paradigm. What followed was six years of systematic improvements — in scale, efficiency, alignment, and reasoning. Here's the arc.

transformers gpt architecture history llm