Matrices as Linear Maps
A matrix is not just a grid of numbers — it's a function that transforms space. This post builds the geometric intuition for matrix-vector multiplication as rotation, scaling, and shearing.
Every time a transformer model computes attention, it performs a sequence of matrix multiplications. The query, key, and value matrices ($W_Q$, $W_K$, $W_V$) transform token embeddings into new spaces where similarity can be measured. The attention weights are computed as:
$$\text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)$$
That is a matrix product. So is the projection that maps a token embedding into the query space. If you want to understand why transformers work the way they do, you need to understand what a matrix multiplication does to a vector — not just how to compute it, but what it means geometrically. A matrix is not a grid of numbers. It is a function that transforms space.
What Is a Linear Map?
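A linear map (also called a linear transformation) is a function $T: \mathbb{R}^n \to \mathbb{R}^m$ that satisfies two properties:
Additivity: $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$
Homogeneity: $T(c\mathbf{v}) = c\,T(\mathbf{v})$
These two rules can be combined into a single condition: for any scalars $a, b$ and vectors $\mathbf{u}, \mathbf{v}$,
$$T(a\mathbf{u} + b\mathbf{v}) = a\,T(\mathbf{u}) + b\,T(\mathbf{v})$$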
This is called superposition. It says that linear maps respect the structure of vector addition and scalar multiplication. They do not bend, curve, or shift the origin; they can only stretch, rotate, reflect, shear, or flatten space.
Some examples of linear maps:
- Rotating the plane by 30°
- Projecting vectors onto the $x$-axis
- Stretching every vector by a factor of 3
- Reflecting across the $y$-axis
Some things that are not linear maps:
- Translating every vector by a fixed offset (it moves the origin)
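- Squaring: $f(x) = x^2$ (fails additivity: $(1 + 1)^2 \neq 1^2 + 1^2$)
- Any function with $f(\mathbf{0}) \neq \mathbf{0}$ (homogeneity requires $f(\mathbf{0}) = \mathbf{0}$)
The remarkable fact is that every linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be written as a matrix multiplication. Matrices are the language of linear maps.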
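The superposition condition is easy to probe numerically. The sketch below is illustrative rather than a proof: the helper name `respects_superposition` and the three sample functions are assumptions made for this example, and passing on random samples is only evidence of linearity, while a single failure disproves it.

```python
import numpy as np

def respects_superposition(f, dim=2, trials=100, seed=0):
    """Check f(a*u + b*v) == a*f(u) + b*f(v) on random samples."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u, v = rng.normal(size=dim), rng.normal(size=dim)
        a, b = rng.normal(size=2)
        if not np.allclose(f(a * u + b * v), a * f(u) + b * f(v)):
            return False
    return True

theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def rotate(x):
    return R @ x                        # rotation by 30 degrees

def translate(x):
    return x + np.array([1.0, 0.0])     # shifts the origin

def square(x):
    return x ** 2                       # elementwise squaring

print(respects_superposition(rotate))     # True
print(respects_superposition(translate))  # False
print(respects_superposition(square))     # False
```

The rotation passes every trial, while the translation and the squaring function fail as soon as a random sample exposes them.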
Matrices as Linear Maps: The Column Picture
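Let $A$ be a $2 \times 2$ matrix:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
The product $A\mathbf{x}$ for $\mathbf{x} = (x_1, x_2)^\top$ is:
$$A\mathbf{x} = \begin{pmatrix} a x_1 + b x_2 \\ c x_1 + d x_2 \end{pmatrix} = x_1 \begin{pmatrix} a \\ c \end{pmatrix} + x_2 \begin{pmatrix} b \\ d \end{pmatrix}$$
This is a linear combination of the columns of $A$. The first column is exactly where the standard basis vector $\mathbf{e}_1 = (1, 0)^\top$ lands:
$$A\mathbf{e}_1 = \begin{pmatrix} a \\ c \end{pmatrix}$$
The second column is where $\mathbf{e}_2 = (0, 1)^\top$ lands:
$$A\mathbf{e}_2 = \begin{pmatrix} b \\ d \end{pmatrix}$$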
This is the key insight: the columns of a matrix tell you where the basis vectors go. Once you know where $\mathbf{e}_1$ and $\mathbf{e}_2$ land, you know where everything lands, because every vector is a linear combination of the basis vectors, and linear maps preserve linear combinations.
If $\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2$, then:
$$A\mathbf{x} = x_1\,A\mathbf{e}_1 + x_2\,A\mathbf{e}_2$$
The entire transformation is determined by where the basis lands.
[Interactive figure: the unit square under a $2 \times 2$ matrix. Dashed gray = original unit square. Purple = transformed square. Red/blue arrows show where the basis vectors land; these are exactly the columns of the matrix.]
As the matrix entries change, the columns of the matrix always match where the red and blue basis vectors land. The gray unit square deforms into a parallelogram, and that parallelogram is the image of the square under the linear map.
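The column picture can be checked in a few lines of NumPy. This is a minimal sketch; the matrix entries and the test vector are arbitrary values chosen for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])   # arbitrary example entries
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Each basis vector lands on the corresponding column of A.
assert np.allclose(A @ e1, A[:, 0])
assert np.allclose(A @ e2, A[:, 1])

# Any x = x1*e1 + x2*e2 lands on the same combination of the columns.
x = np.array([3.0, -2.0])
by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(A @ x, by_columns)
print(A @ x)   # the vector (4, -4.5)
```

Reading a matrix column by column like this is often quicker than multiplying it out row by row.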
Geometric Transformations in 2D
Different matrix structures produce recognizable geometric effects.
Rotation
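To rotate vectors counterclockwise by angle $\theta$:
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
For $\theta = 45°$:
$$R(45°) = \begin{pmatrix} \tfrac{\sqrt{2}}{2} & -\tfrac{\sqrt{2}}{2} \\ \tfrac{\sqrt{2}}{2} & \tfrac{\sqrt{2}}{2} \end{pmatrix}$$
The columns tell the story: $\mathbf{e}_1$ rotates to $\left(\tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}\right)$ (pointing northeast), and $\mathbf{e}_2$ rotates to $\left(-\tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}\right)$ (pointing northwest).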
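As a quick check of the rotation formula, here is a small sketch. The helper name `rotation` is made up for this example, and angles are assumed to be in radians.

```python
import numpy as np

def rotation(theta):
    """2D counterclockwise rotation matrix for an angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R45 = rotation(np.pi / 4)
h = np.sqrt(2) / 2

assert np.allclose(R45[:, 0], [h, h])     # image of e1: northeast
assert np.allclose(R45[:, 1], [-h, h])    # image of e2: northwest

# A rotation leaves lengths unchanged.
x = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(R45 @ x), 5.0)
```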
Scaling
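Scaling by factor $s_x$ in the $x$-direction and $s_y$ in the $y$-direction:
$$S = \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix}$$
Uniform scaling by factor 2: $S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$. Every vector doubles in length; the square becomes a larger square.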
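A scaling matrix is just a diagonal matrix, which the following sketch builds with `np.diag`. The helper name `scaling` and the sample values are illustrative assumptions.

```python
import numpy as np

def scaling(sx, sy):
    """Diagonal matrix that scales x by sx and y by sy."""
    return np.diag([sx, sy]).astype(float)

S = scaling(2, 2)                 # uniform scaling by 2
x = np.array([3.0, 4.0])
assert np.allclose(S @ x, 2 * x)
assert np.isclose(np.linalg.norm(S @ x), 2 * np.linalg.norm(x))

# Non-uniform scaling stretches each axis independently.
assert np.allclose(scaling(3, 0.5) @ x, [9.0, 2.0])
```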
Shearing
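A horizontal shear shifts the $x$-coordinate by $k$ times the $y$-coordinate:
$$H = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$$
For any shear factor $k$: $\mathbf{e}_1$ stays fixed at $(1, 0)$, but $\mathbf{e}_2$ moves to $(k, 1)$. The unit square tilts into a parallelogram. This is exactly what happens when you drag the top of a rectangle sideways while holding the bottom fixed.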
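To see the tilt concretely, the sketch below shears the four corners of the unit square. The shear factor of 1 is an illustrative choice for this example, not a value taken from the figure.

```python
import numpy as np

k = 1.0  # shear factor; an illustrative choice
H = np.array([[1.0, k],
              [0.0, 1.0]])

# Corners of the unit square, one per column:
# (0,0), (1,0), (1,1), (0,1)
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

sheared = H @ square

# The bottom edge (y = 0) does not move.
assert np.allclose(sheared[:, :2], square[:, :2])
# The top edge (y = 1) slides right by k.
assert np.allclose(sheared[:, 2:], square[:, 2:] + np.array([[k], [0.0]]))
```

Storing the corners as columns lets a single matrix product transform the whole shape at once.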
Reflection
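Reflection across the $y$-axis flips the sign of the $x$-coordinate:
$$F = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$$
$\mathbf{e}_1$ maps to $(-1, 0)$ (flipped left), $\mathbf{e}_2$ stays at $(0, 1)$.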
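A short sketch of the reflection across the $y$-axis follows. The name `F` and the sample vector are assumptions for illustration.

```python
import numpy as np

F = np.array([[-1.0, 0.0],
              [ 0.0, 1.0]])   # reflection across the y-axis

x = np.array([2.0, 3.0])
assert np.allclose(F @ x, [-2.0, 3.0])   # x flips sign, y is unchanged
assert np.allclose(F @ (F @ x), x)       # reflecting twice restores x
```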
[Interactive figure: the unit square animated under each transformation above. Dashed gray = original.]
Matrix-Matrix Multiplication as Function Composition
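If $A$ and $B$ are both linear maps, what is their composition $A \circ B$, the function that first applies $B$, then applies $A$?
The matrix that represents this composition is the matrix product $AB$:
$$(AB)\mathbf{x} = A(B\mathbf{x})$$
This is not just a notational convenience; it is the definition. Matrix multiplication is defined precisely so that it corresponds to function composition. To compute $AB$, the $j$-th column of $AB$ is $A$ applied to the $j$-th column of $B$:
$$(AB)_{:,j} = A\,\mathbf{b}_j$$
For two $2 \times 2$ matrices:
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}$$
Each entry $(AB)_{ij}$ is the dot product of the $i$-th row of $A$ with the $j$-th column of $B$. But the deeper meaning is composition: first apply $B$, then apply $A$.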
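The three views of the product (columns, entries, composition) can all be verified on a small example. The matrices below are arbitrary values chosen for this sketch.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [5.0, 6.0]])
AB = A @ B

# Column j of AB is A applied to column j of B.
for j in range(B.shape[1]):
    assert np.allclose(AB[:, j], A @ B[:, j])

# Entry (i, j) is the dot product of row i of A with column j of B.
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        assert np.isclose(AB[i, j], A[i, :] @ B[:, j])

# Composition: applying B, then A, matches applying AB once.
x = np.array([1.0, -1.0])
assert np.allclose(AB @ x, A @ (B @ x))
```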
Why Order Matters: $AB \neq BA$
Function composition is not commutative in general. Some pairs do commute: uniform scaling and rotation give the same result in either order. But "first rotate, then shear" is genuinely different from "first shear, then rotate."
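Let $A$ be a 90° rotation and $B$ be a horizontal shear with factor $k \neq 0$:
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$$
Then:
$$AB = \begin{pmatrix} 0 & -1 \\ 1 & k \end{pmatrix}, \qquad BA = \begin{pmatrix} k & -1 \\ 1 & 0 \end{pmatrix}$$
$AB \neq BA$. Applying shear then rotating is not the same as rotating then shearing. The unit square ends up in different positions depending on the order.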
[Figure: Left: $AB\mathbf{x} = A(B\mathbf{x})$, shear first, then rotate. Right: $BA\mathbf{x} = B(A\mathbf{x})$, rotate first, then shear.]
$AB \neq BA$: the two transformed squares land in different positions. Order of composition matters.
When writing $AB\mathbf{x}$, remember: $B$ acts first, $A$ acts second. The matrix on the right acts first. This right-to-left reading order trips up many newcomers; it is a consequence of function composition notation.
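The order dependence shows up immediately in code. In this sketch `A` is a 90° counterclockwise rotation and `B` is a horizontal shear; the shear factor of 1 is an illustrative assumption.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # rotation by 90 degrees counterclockwise
k = 1.0                       # shear factor; an illustrative choice
B = np.array([[1.0, k],
              [0.0, 1.0]])    # horizontal shear

AB = A @ B   # shear first, then rotate
BA = B @ A   # rotate first, then shear

assert not np.allclose(AB, BA)

# The two orders already disagree on where e1 lands.
e1 = np.array([1.0, 0.0])
print(AB @ e1)   # the vector (0, 1)
print(BA @ e1)   # the vector (1, 1)
```

Under `AB`, the shear leaves $\mathbf{e}_1$ alone and the rotation turns it upward. Under `BA`, the rotation first lifts $\mathbf{e}_1$ off the $x$-axis, so the shear then pushes it sideways.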
Special Matrices
The Identity Matrix
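The identity matrix $I$ is the linear map that changes nothing:
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
$I\mathbf{x} = \mathbf{x}$ for every $\mathbf{x}$. The columns are exactly the standard basis vectors: $\mathbf{e}_1$ stays at $\mathbf{e}_1$ and $\mathbf{e}_2$ stays at $\mathbf{e}_2$. For any matrix $A$, we have $AI = IA = A$. The identity is the matrix analogue of multiplying by 1.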
The Zero Matrix
The zero matrix $0$ sends every vector to the zero vector:
$$0 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \qquad 0\mathbf{x} = \mathbf{0}$$
Every column is zero: both basis vectors collapse to the origin. This is the most destructive linear map: it loses all information. For any matrix $A$, we have $A0 = 0A = 0$.
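Both special matrices are one-liners in NumPy. The sketch below checks the properties stated above; the example matrix `A` and vector `x` are arbitrary.

```python
import numpy as np

identity = np.eye(2)
zero = np.zeros((2, 2))
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])   # arbitrary example matrix
x = np.array([3.0, -2.0])

# The identity changes nothing, on vectors or on matrices.
assert np.allclose(identity @ x, x)
assert np.allclose(A @ identity, A) and np.allclose(identity @ A, A)

# The zero matrix collapses everything to the origin.
assert np.allclose(zero @ x, 0.0)
assert np.allclose(A @ zero, zero) and np.allclose(zero @ A, zero)
```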
Putting It Together
The vocabulary of this post gives you the tools to read modern ML papers at a deeper level:
- When a paper writes $W\mathbf{x}$, it means a linear map applied to $\mathbf{x}$: $W$ rotates, scales, and shears the input into a new space.
- When attention computes $QK^\top$, it is composing two linear maps to measure alignment between queries and keys.
- When a network stacks layers $W_1, W_2, \dots, W_L$, the learned weight matrices compose like $W_L \cdots W_2 W_1$ (setting aside the nonlinearities between layers): the rightmost acts first.
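The last point can be sketched with two purely linear layers. The names `W1` and `W2`, their shapes, and the random weights are hypothetical, and the sketch deliberately leaves out the nonlinearities a real network places between layers.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hypothetical first layer: R^3 -> R^4
W2 = rng.normal(size=(2, 4))   # hypothetical second layer: R^4 -> R^2
x = rng.normal(size=3)

layer_by_layer = W2 @ (W1 @ x)   # apply W1 first, then W2
collapsed = (W2 @ W1) @ x        # one combined map from R^3 to R^2

assert np.allclose(layer_by_layer, collapsed)
assert (W2 @ W1).shape == (2, 3)
```

Because the two linear layers collapse into a single matrix, stacking them adds no expressive power on its own. That is one reason real networks insert a nonlinearity between layers.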
In the next post, we will ask: when can a linear map be undone? That is the question of invertibility, and answering it will lead us to determinants.