S. Roy

Blog Post

The Spectral Theorem

Symmetric matrices can always be diagonalized by an orthogonal matrix — their eigenvectors form a natural coordinate system for the data. This is the spectral theorem, and it underlies PCA, kernel methods, and graph Laplacians.

Views: 4 min readCite

In Post 5 we found that symmetric matrices have a special property: their eigenvalues are always real and their eigenvectors are always orthogonal. This isn't a coincidence — it's the consequence of one of the deepest results in linear algebra.

The Spectral Theorem says that every real symmetric matrix can be decomposed as:

A=QΛQA = Q \Lambda Q^\top

where QQ is an orthogonal matrix (its columns are the eigenvectors) and Λ\Lambda is a diagonal matrix of eigenvalues.

This factorization is called the eigendecomposition or spectral decomposition, and it unlocks a clean geometric story.


The Three-Step Geometry

Think of multiplying a vector x\mathbf{x} by AA:

Ax=QΛQxA\mathbf{x} = Q \Lambda Q^\top \mathbf{x}

This happens in three steps:

  1. QxQ^\top \mathbf{x} — rotate x\mathbf{x} into the eigenbasis (the coordinate system defined by eigenvectors)
  2. Λ(Qx)\Lambda (Q^\top \mathbf{x}) — scale each coordinate independently by the corresponding eigenvalue
  3. Q()Q(\cdots) — rotate back to the original frame

The eigenvectors are the natural axes of the transformation. In that coordinate system, AA is just a scaling operation — diagonal. No rotation, no mixing.

Spectral Decomposition: A = QΛQᵀ

Start: unit circleλ₁ = 3.62λ₂ = 1.38

Formal Statement

Theorem (Spectral Theorem for real symmetric matrices). Let ARn×nA \in \mathbb{R}^{n \times n} with A=AA = A^\top. Then there exists an orthogonal matrix QRn×nQ \in \mathbb{R}^{n \times n} and a diagonal matrix Λ=diag(λ1,,λn)\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n) such that:

A=QΛQA = Q \Lambda Q^\top

The columns q1,,qn\mathbf{q}_1, \ldots, \mathbf{q}_n of QQ are orthonormal eigenvectors of AA, and λi\lambda_i is the eigenvalue for qi\mathbf{q}_i.

Since QQ is orthogonal, Q=Q1Q^\top = Q^{-1}, so this is equivalent to:

QAQ=ΛQ^\top A Q = \Lambda

That is, QQ diagonalizes AA.


The Eigenbasis

Any vector x\mathbf{x} can be expressed in terms of the eigenvectors:

x=c1q1+c2q2++cnqn\mathbf{x} = c_1 \mathbf{q}_1 + c_2 \mathbf{q}_2 + \cdots + c_n \mathbf{q}_n

The coordinates ci=qixc_i = \mathbf{q}_i^\top \mathbf{x} are just dot products (projections), because the eigenvectors are orthonormal.

Then:

Ax=c1λ1q1+c2λ2q2++cnλnqnA\mathbf{x} = c_1 \lambda_1 \mathbf{q}_1 + c_2 \lambda_2 \mathbf{q}_2 + \cdots + c_n \lambda_n \mathbf{q}_n

Each eigenvector direction is scaled independently. This is the key insight: in the eigenbasis, AA is trivially diagonal.

Eigenbasis — Drag the vector

q₁ (λ=3.6)q₂ (λ=1.4)
Standard basis
x = (1.20, 0.80)
Eigenbasis coords
c₁ = 1.44, c₂ = 0.05

Matrix Powers Made Easy

One of the most practical consequences is that repeated matrix multiplication becomes trivial.

From the spectral decomposition:

A2=(QΛQ)(QΛQ)=QΛ2QA^2 = (Q \Lambda Q^\top)(Q \Lambda Q^\top) = Q \Lambda^2 Q^\top

because QQ=IQ^\top Q = I. By induction:

An=QΛnQA^n = Q \Lambda^n Q^\top

And Λn\Lambda^n is just a diagonal matrix with entries λin\lambda_i^n — raising each eigenvalue to the nn-th power.

This means: the long-term behavior of AnxA^n \mathbf{x} is dominated by the largest eigenvalue. The eigenvector with λ1|\lambda_1| largest grows fastest; directions with λi<1|\lambda_i| < 1 shrink to zero.

Matrix Powers: Aⁿ = QΛⁿQᵀ (λ₁=2, λ₂=0.5)

Dominant direction (q₁) grows by 2×; q₂ shrinks by 0.500×

Applications

PCA (Principal Component Analysis)

Given a data matrix XRm×nX \in \mathbb{R}^{m \times n}, the covariance matrix is:

C=1mXXC = \frac{1}{m} X^\top X

CC is symmetric positive semidefinite. The spectral theorem gives C=QΛQC = Q \Lambda Q^\top. The columns of QQ are the principal components — the natural axes of variance. The eigenvalues λi\lambda_i are the variance along each axis.

Graph Laplacians

Given an undirected graph GG with adjacency matrix WW, the graph Laplacian is L=DWL = D - W where DD is the degree matrix. LL is symmetric positive semidefinite.

Its eigenvectors are the graph Fourier basis — the natural frequencies of signals on the graph. The smallest non-zero eigenvalue (the Fiedler value) measures how well-connected the graph is.

Kernel Matrices

In kernel methods (like SVMs), the kernel matrix Kij=k(xi,xj)K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) is symmetric positive semidefinite. Its spectral decomposition gives the feature space implicitly defined by the kernel.


Positive Definite Matrices

A symmetric matrix is positive definite (PD) if all its eigenvalues are strictly positive:

λi>0i\lambda_i > 0 \quad \forall i

Equivalently, xAx>0\mathbf{x}^\top A \mathbf{x} > 0 for all x0\mathbf{x} \neq \mathbf{0}. Covariance matrices are positive semidefinite (λi0\lambda_i \geq 0); kernel matrices are typically positive definite. We'll explore these in detail in Post 11.


Summary

The spectral theorem is a complete characterization of symmetric matrices:

PropertyResult
EigenvaluesAll real
EigenvectorsOrthogonal (orthonormal if normalized)
DecompositionA=QΛQA = Q\Lambda Q^\top
InverseA1=QΛ1QA^{-1} = Q\Lambda^{-1}Q^\top (if λi0\lambda_i \neq 0)
Matrix powerAn=QΛnQA^n = Q\Lambda^n Q^\top

Every symmetric matrix is, in its own coordinate system, just a scaling operation. The spectral theorem tells us what that coordinate system is.

In Post 11 we'll focus on the positive definite case and see why it's the natural setting for optimization problems.

Cite this work

Generated from article front matter.

Roy, Swastik. (2026). The Spectral Theorem. S. Roy. https://swastikroy.me/blog/linear-algebra-spectral-theorem

Export PDF opens your browser’s print dialog — choose “Save as PDF” for a Zenodo-ready file.