Blog Post··11 min read
Representation Geometry: How Neural Networks Encode Meaning
The linear representation hypothesis, superposition, polysemanticity, and why transformer activations are more structured than they look.
The linear representation hypothesis, superposition, polysemanticity, and why transformer activations are more structured than they look.
Dictionary learning for neural networks — how sparse autoencoders recover monosemantic features from polysemantic activations, and what Anthropic's scaling monosemanticity work found in Claude.