Blog Post · 6 min read
Inside the FFN: MoE, SwiGLU, and the Architectural Details That Scale
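The feed-forward network (FFN) block holds most of a transformer's parameters: roughly two-thirds of each layer in a standard dense model with a 4x hidden expansion, and more once experts are added. The choices made there (activation function, gating, expert routing) account for much of the quality gap between model families.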
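The parameter claim is easy to check with back-of-the-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes standard multi-head attention with four full-size projection matrices, no biases, and no grouped-query attention. Embeddings and normalization layers are ignored.

```python
def attention_params(d_model: int) -> int:
    # Assumption: Q, K, V, and output projections, each d_model x d_model, no biases.
    return 4 * d_model * d_model


def ffn_params(d_model: int, d_ff: int, gated: bool = False) -> int:
    # A plain FFN has an up and a down projection.
    # A gated FFN such as SwiGLU adds a third matrix for the gate.
    n_matrices = 3 if gated else 2
    return n_matrices * d_model * d_ff


def ffn_share(d_model: int, d_ff: int, gated: bool = False) -> float:
    # Fraction of one layer's attention + FFN weights that sit in the FFN.
    ffn = ffn_params(d_model, d_ff, gated)
    return ffn / (ffn + attention_params(d_model))


if __name__ == "__main__":
    # Dense layer with a 4x expansion: 8*d^2 of 12*d^2 weights are in the FFN.
    print(f"plain FFN, d_ff = 4 * d_model: {ffn_share(4096, 4 * 4096):.3f}")
    # Gated layer with a narrower hidden size chosen to keep the budget similar.
    print(f"gated FFN, d_ff = 11008:       {ffn_share(4096, 11008, gated=True):.3f}")
```

Under these assumptions the FFN share comes out near two-thirds in both cases, which is why gated variants usually shrink the hidden size: the third matrix would otherwise raise the parameter count by half.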