Tag: superposition

Blog Post·2025-06-20·11 min read

Representation Geometry: How Neural Networks Encode Meaning

The linear representation hypothesis, superposition, polysemanticity, and why transformer activations are more structured than they look.

mechanistic-interpretability representation-learning superposition linear-representation-hypothesis transformers

Blog Post·2025-06-20·11 min read

Sparse Autoencoders: Decomposing Neural Networks into Interpretable Features

Dictionary learning for neural networks: how sparse autoencoders recover monosemantic features from polysemantic activations, and what Anthropic's Scaling Monosemanticity work found in Claude.

mechanistic-interpretability sparse-autoencoders features superposition dictionary-learning