Tag: direct-logit-attribution

Blog Post · 2025-06-20 · 8 min read

Logit Lens: How Predictions Form Layer by Layer

How applying the unembedding matrix at intermediate layers reveals a transformer's prediction as it evolves, and what direct logit attribution tells us about which components matter.

mechanistic-interpretability logit-lens direct-logit-attribution interpretability transformers