Blog Post · 2025-06-20 · 12 min read

Open Problems in Mechanistic Interpretability

Faithfulness vs. plausibility, scaling to frontier models, the composition problem, automated interpretability, and what it would take to actually understand a large language model.

mechanistic-interpretability open-problems ai-safety interpretability