Vectors: Direction, Magnitude, and the Geometry of Space
Vectors are the atoms of linear algebra — everything else is built on them. This post builds intuition for what a vector is, how addition and scaling work geometrically, and why norms give us a way to measure the world.
When a language model embeds the word "king" as a 768-dimensional vector, what does that actually mean? The model has assigned it a location in a high-dimensional space — a point whose coordinates encode something about meaning, context, and relationship to other words. When we say that "king" and "queen" are close in this space, we need a precise notion of close. Close by what measure? Straight-line distance? City-block distance? The maximum coordinate difference? These are not the same thing, and the choice shapes what the model learns.
To answer questions like these, you need linear algebra. And linear algebra begins with vectors.
What Is a Vector?
There are two ways to think about a vector, and both are true simultaneously.
The geometric view: a vector is an arrow in space. It has a direction (the way it points) and a magnitude (how long it is). The arrow's starting point does not matter; what matters is its displacement. Two arrows with the same length and direction represent the same vector, wherever they start. By convention, we anchor vectors at the origin.
The algebraic view: a vector is an ordered tuple of numbers. In two dimensions, $\mathbf{v} = [3, 4]$ means "3 units in the $x$-direction, 4 units in the $y$-direction." In $n$ dimensions, a vector is just a list of $n$ real numbers: $\mathbf{v} = [v_1, v_2, \ldots, v_n]$.
These views are the same thing. The arrow starts at the origin and ends at the point $(3, 4)$. The numbers are coordinates; the arrow is the geometric object they describe. When you work with 768-dimensional word embeddings, you cannot draw the arrow, but every operation you perform is geometrically meaningful. You are just working in a space you cannot see.
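To make the "starting point does not matter" idea concrete, here is a minimal sketch in plain Python. The helper name `displacement` and the sample points are illustrative assumptions, not part of the original post.

```python
def displacement(start, end):
    """Vector from `start` to `end`, computed coordinate by coordinate."""
    return [e - s for s, e in zip(start, end)]

# Two arrows with different starting points (illustrative values).
arrow_from_origin = displacement([0, 0], [3, 4])
arrow_elsewhere = displacement([1, 2], [4, 6])

print(arrow_from_origin)                      # [3, 4]
print(arrow_elsewhere)                        # [3, 4]
print(arrow_from_origin == arrow_elsewhere)   # True
```

Both arrows describe the same displacement, so they are the same vector, which is why anchoring everything at the origin loses no information.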
Vector Addition
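Given two vectors $\mathbf{a} = [a_1, a_2]$ and $\mathbf{b} = [b_1, b_2]$, their sum is:
$$\mathbf{a} + \mathbf{b} = [a_1 + b_1,\; a_2 + b_2]$$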
Geometrically, addition has a beautiful interpretation. Place $\mathbf{b}$'s tail at $\mathbf{a}$'s tip. The sum $\mathbf{a} + \mathbf{b}$ is the arrow from the origin to the new tip. This is the tip-to-tail rule. Equivalently, you can construct the parallelogram: draw $\mathbf{a}$ and $\mathbf{b}$ both from the origin, complete the parallelogram, and the diagonal from the origin to the far corner is $\mathbf{a} + \mathbf{b}$. Both constructions give the same result.
For example, if $\mathbf{a} = [3, 1]$ and $\mathbf{b} = [1, 2]$, then $\mathbf{a} + \mathbf{b} = [3 + 1,\; 1 + 2] = [4, 3]$. The sum points diagonally between the two original vectors: it is a compromise of directions, weighted by their magnitudes.
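Component-wise addition can be sketched in a few lines of plain Python. The function name `add` and the sample values are illustrative assumptions; in practice an array library would usually do this for you.

```python
def add(a, b):
    """Component-wise sum of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x + y for x, y in zip(a, b)]

a = [3, 1]
b = [1, 2]

print(add(a, b))               # [4, 3]
print(add(a, b) == add(b, a))  # True: addition is commutative
```

The same function works unchanged for 768-dimensional vectors, since nothing in it depends on the dimension being 2.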
[Interactive figure: Vector Addition. Drag the tips of a and b.]
Try dragging the tips of a and b in the visualization. Watch how the purple sum vector always occupies the far corner of the parallelogram, and notice that addition is commutative: swapping a and b gives the same result.
Scalar Multiplication
A scalar is just a number: no direction, only magnitude. When you multiply a vector $\mathbf{v} = [v_1, v_2]$ by a scalar $\lambda$, you scale it:
$$\lambda \mathbf{v} = [\lambda v_1,\; \lambda v_2]$$
Geometrically, $\lambda \mathbf{v}$ is a vector along the same line as $\mathbf{v}$, stretched or shrunk by a factor of $|\lambda|$. When $|\lambda| > 1$, the vector grows. When $|\lambda| < 1$, it shrinks. When $\lambda < 0$, the vector flips to point in the exact opposite direction. When $\lambda = 0$, the result is the zero vector: a vector with no direction and no magnitude, sitting at the origin.
For $\mathbf{v} = [2, 1]$ and $\lambda = 2.5$, we get $\lambda \mathbf{v} = [5, 2.5]$. Same direction, two-and-a-half times as long.
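A minimal sketch of scalar multiplication in plain Python, covering the stretch, flip, and zero cases described above. The function name `scale` and the vector `[2, 1]` are illustrative assumptions.

```python
def scale(lam, v):
    """Multiply every component of v by the scalar lam."""
    return [lam * x for x in v]

v = [2, 1]

print(scale(2.5, v))  # [5.0, 2.5]  stretched, same direction
print(scale(-1, v))   # [-2, -1]    flipped to the opposite direction
print(scale(0, v))    # [0, 0]      the zero vector
```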
[Interactive figure: Scalar Multiplication. Adjust λ.]
Notice in the visualization that the original vector $\mathbf{v}$ appears as a gray ghost for reference. As you drag $\lambda$ below zero, the vector flips: $\lambda \mathbf{v}$ now points into the third quadrant while $\mathbf{v}$ points into the first.
Together, vector addition and scalar multiplication are the two fundamental operations of a vector space. Every other concept in linear algebra — linear combinations, span, basis, linear maps — is built from these two operations.
Vector Norms
A norm is a way of measuring the "size" or "length" of a vector. There is not one unique answer; different applications call for different notions of length, and they are all valid as long as they satisfy certain mathematical properties: non-negativity, the triangle inequality $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$, compatibility with scaling $\|\lambda \mathbf{v}\| = |\lambda| \, \|\mathbf{v}\|$, and the requirement that only the zero vector has zero norm.
The L2 norm (Euclidean norm) is the one you learn in school:
$$\|\mathbf{v}\|_2 = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$
This is the straight-line distance from the origin to the tip of $\mathbf{v}$. For $\mathbf{v} = [3, 4]$, we get $\|\mathbf{v}\|_2 = \sqrt{3^2 + 4^2} = \sqrt{25} = 5$. This is the familiar Pythagorean theorem: the hypotenuse of a right triangle with legs 3 and 4 is 5. In ML, L2 is the default for measuring distances in embedding space, for computing gradient magnitudes, and for L2 regularization (weight decay), which penalizes large weights by their squared L2 norm.
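The L1 norm (Manhattan norm) counts distance as if you were navigating a city grid:
$$\|\mathbf{v}\|_1 = |v_1| + |v_2| + \cdots + |v_n|$$
For $\mathbf{v} = [3, 4]$, the L1 norm is $|3| + |4| = 7$. You cannot cut diagonally; you must travel along axes. The L1 norm famously promotes sparsity in ML. Lasso regression penalizes the L1 norm of weights, which tends to push many weights exactly to zero. It is a way of making models prefer simple, sparse explanations. Comparing the two penalties, L1 pushes every nonzero component toward zero with the same strength regardless of its size, while the squared L2 penalty pushes in proportion to the component's size, so it punishes large components much more harshly than small ones and rarely drives a weight exactly to zero.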
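One way to see the difference between the two penalties is to look at how hard each one pushes on a single weight. The sketch below is illustrative only: the function names and the weight values are assumptions, and it shows just the penalty (sub)gradients, not a full Lasso or weight-decay training loop.

```python
def l1_penalty_grad(w):
    """Subgradient of sum(|w_i|): the sign of each weight (0 where w_i == 0)."""
    return [(x > 0) - (x < 0) for x in w]

def l2_squared_penalty_grad(w):
    """Gradient of sum(w_i ** 2): twice each weight."""
    return [2 * x for x in w]

w = [0.01, -0.5, 4.0]  # a tiny, a medium, and a large weight

print(l1_penalty_grad(w))          # [1, -1, 1]
print(l2_squared_penalty_grad(w))  # [0.02, -1.0, 8.0]
```

The L1 push has the same size for the tiny weight as for the large one, which is what lets it carry small weights all the way to zero. The squared L2 push fades as a weight shrinks.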
The L∞ norm (Chebyshev norm) takes the worst-case coordinate:
$$\|\mathbf{v}\|_\infty = \max_i |v_i|$$
For $\mathbf{v} = [3, 4]$, this is $\max(|3|, |4|) = 4$. The L∞ norm asks: "what is the largest coordinate in absolute value?" It appears in robustness settings. For example, adversarial attacks on neural networks are often constrained by L∞ budgets, meaning no single pixel can change by more than $\epsilon$.
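The three norms are short enough to write out directly. This is a minimal sketch in plain Python using only the standard library; the function names are illustrative assumptions.

```python
import math

def l1_norm(v):
    """Sum of absolute values (Manhattan norm)."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """Square root of the sum of squares (Euclidean norm)."""
    return math.sqrt(sum(x * x for x in v))

def linf_norm(v):
    """Largest absolute coordinate (Chebyshev norm)."""
    return max(abs(x) for x in v)

v = [3, 4]

print(l1_norm(v))    # 7
print(l2_norm(v))    # 5.0
print(linf_norm(v))  # 4
```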
A useful way to understand all three norms simultaneously is through their unit balls: the set of all vectors with norm at most 1, whose boundary is the set of vectors at distance exactly 1 from the origin. In 2D, the L2 unit ball is the familiar disk bounded by the unit circle. The L1 unit ball is a diamond (a square rotated 45°). The L∞ unit ball is an axis-aligned square.
[Interactive figure: Unit Ball & Norms. Drag the point and toggle the norm. Initial point: p = [1.20, 0.80].]
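Drag the point and watch how its L1, L2, and L∞ values change independently. Toggle between norms to see how each unit ball is shaped. Notice that the L∞ ball is the largest, the L2 ball sits inside it, and the L1 ball is smallest. This is a consequence of the inequality $\|\mathbf{v}\|_\infty \le \|\mathbf{v}\|_2 \le \|\mathbf{v}\|_1$: the larger a norm's value for a given vector, the sooner that vector leaves the norm's unit ball. The gaps are bounded too (in 2D, $\|\mathbf{v}\|_1 \le \sqrt{2}\,\|\mathbf{v}\|_2 \le 2\,\|\mathbf{v}\|_\infty$).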
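The ordering of the three norms can be spot-checked numerically. The sketch below is a sanity check on a handful of 2D points, not a proof; the helper name `norms` and the sample points (other than the figure's point [1.20, 0.80]) are illustrative assumptions.

```python
import math

def norms(v):
    """Return the (L1, L2, L-infinity) norms of v."""
    l1 = sum(abs(x) for x in v)
    l2 = math.sqrt(sum(x * x for x in v))
    linf = max(abs(x) for x in v)
    return l1, l2, linf

points = [[1.2, 0.8], [1.0, 1.0], [1.0, 0.0], [-3.0, 4.0], [0.1, -0.7]]
tol = 1e-12  # small slack for floating-point rounding

for p in points:
    l1, l2, linf = norms(p)
    # L-infinity <= L2 <= L1 holds in any dimension.
    assert linf <= l2 + tol and l2 <= l1 + tol
    # In 2D the norms differ by at most a factor of sqrt(2) at each step.
    assert l1 <= math.sqrt(2) * l2 + tol
    assert l2 <= math.sqrt(2) * linf + tol

print("all inequalities hold on the sample points")
```

The point [1.0, 1.0] sits on a diagonal, where L1 is as large as it can be relative to L2, and [1.0, 0.0] sits on an axis, where all three norms agree.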
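Once you have a norm, you can also define a unit vector: a vector of length exactly 1 pointing in the same direction as a nonzero vector $\mathbf{v}$:
$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|_2}$$
Dividing by the L2 norm strips out the magnitude and leaves only the direction. For $\mathbf{v} = [3, 4]$, $\hat{\mathbf{v}} = [3/5, 4/5] = [0.6, 0.8]$, and indeed $\|\hat{\mathbf{v}}\|_2 = \sqrt{0.36 + 0.64} = 1$.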
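Normalization is a one-liner once the L2 norm is available. This sketch uses only the standard library; the name `normalize` is an illustrative assumption, and raising an error for the zero vector is one possible design choice, since the zero vector has no direction to keep.

```python
import math

def normalize(v):
    """Return the unit vector pointing in the same direction as v."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("the zero vector has no direction to normalize")
    return [x / length for x in v]

v_hat = normalize([3, 4])
v_hat_length = math.sqrt(sum(x * x for x in v_hat))

print(v_hat)                             # [0.6, 0.8]
print(math.isclose(v_hat_length, 1.0))   # True
```

The length is compared with `math.isclose` rather than `==` because floating-point rounding can leave it a hair away from 1.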
Distance Between Vectors
Once you have a norm, you get distance for free: the distance between two vectors $\mathbf{a}$ and $\mathbf{b}$ is the norm of their difference:
$$d(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|$$
Each norm gives a different distance. L2 distance is straight-line distance. L1 distance is the sum of the absolute coordinate differences: how far you would travel on a grid. L∞ distance is the largest absolute coordinate difference between the two vectors.
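Since a distance is just a norm applied to a difference, one small function covers all three cases. This is a sketch in plain Python; the function names and the sample vectors are illustrative assumptions.

```python
import math

def l1_norm(v):
    return sum(abs(x) for x in v)

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def linf_norm(v):
    return max(abs(x) for x in v)

def distance(a, b, norm):
    """Distance between a and b under the given norm function."""
    difference = [x - y for x, y in zip(a, b)]
    return norm(difference)

a = [1, 2]
b = [4, 6]

print(distance(a, b, l1_norm))    # 7    grid distance
print(distance(a, b, l2_norm))    # 5.0  straight-line distance
print(distance(a, b, linf_norm))  # 4    largest coordinate gap
```

Passing the norm as an argument mirrors the text: the choice of norm is the only thing that changes between the three distances.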
This is where our opening question about word embeddings gets a real answer. If "king" is embedded at $\mathbf{k}$ and "queen" at $\mathbf{q}$, then $\|\mathbf{k} - \mathbf{q}\|_2$ tells you how far apart they are in Euclidean space. But there is a subtlety: for embeddings, direction often matters more than magnitude. Two embeddings might have very different L2 norms simply because one word appears more frequently than another, inflating its magnitude. A common practice is to measure cosine similarity, essentially the angle between the two vectors, which ignores magnitude entirely.
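Cosine similarity is the subject of the next post in this series, where we will need the dot product. But you can already see the shape of the idea: cosine similarity compares unit vectors, so it is the L2 distance between the normalized embeddings $\hat{\mathbf{k}}$ and $\hat{\mathbf{q}}$ in disguise. Precisely, if $\theta$ is the angle between them, then $\|\hat{\mathbf{k}} - \hat{\mathbf{q}}\|_2^2 = 2 - 2\cos\theta$, so a higher cosine similarity always means a smaller L2 distance between the unit vectors.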
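The link between cosine similarity and L2 distance on unit vectors can be checked numerically. The sketch below borrows the dot product ahead of its formal introduction, using the standard formula for cosine similarity (dot product divided by the product of the L2 norms); the function names and the two sample vectors are illustrative assumptions, not real embeddings.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    length = l2_norm(v)
    return [x / length for x in v]

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))

a = [3, 4]
b = [8, 6]  # different magnitude from a; only the direction matters

a_hat, b_hat = normalize(a), normalize(b)
squared_distance = sum((x - y) ** 2 for x, y in zip(a_hat, b_hat))
cos = cosine_similarity(a, b)

# For unit vectors: squared L2 distance = 2 - 2 * cosine similarity.
print(math.isclose(squared_distance, 2 - 2 * cos))  # True
```

Because the relationship is monotonic, ranking neighbors by cosine similarity and ranking them by L2 distance between normalized vectors give the same order.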
Wrapping Up
When a language model tells you that "king" and "queen" are close in embedding space, it means: the 768-dimensional vectors representing these words are nearby under some notion of distance, usually cosine similarity but sometimes L2. The words exist as points in $\mathbb{R}^{768}$, and all of vector addition, scalar multiplication, and norms operate in that space exactly as they do in 2D. You just cannot draw them.
That is what makes linear algebra the right language for ML. The geometry is real even when it is invisible.
In the next post, we will add the dot product to our toolkit — the operation that measures how much two vectors point in the same direction, and the foundation of cosine similarity, attention mechanisms, and nearly everything else.