Singular Value Decomposition (SVD)
Subtopic: Decompositions
Topic: Linear algebra
The Ultimate Factorization
The Singular Value Decomposition (SVD) factors ANY matrix — not just square, not just invertible — into three special matrices, A = UΣVᵀ:
U is m×m orthogonal (columns are left singular vectors)
Σ is m×n diagonal (singular values σ₁ ≥ σ₂ ≥ ... ≥ 0 on diagonal)
V is n×n orthogonal (columns are right singular vectors)
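A quick NumPy sketch of these shapes and properties, using an arbitrary 3×2 matrix chosen only for illustration. Note that `np.linalg.svd` returns Vᵀ (here `Vt`), not V:

```python
import numpy as np

# Arbitrary 3x2 example matrix, just to illustrate the shapes
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
m, n = A.shape

U, s, Vt = np.linalg.svd(A)              # full SVD: U is m x m, Vt is n x n
Sigma = np.zeros((m, n))                 # Sigma is m x n with s on its diagonal
Sigma[:len(s), :len(s)] = np.diag(s)

assert U.shape == (m, m) and Vt.shape == (n, n)
assert np.allclose(U @ U.T, np.eye(m))   # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(n)) # V orthogonal
assert np.allclose(U @ Sigma @ Vt, A)    # A = U Sigma V^T
assert s[0] >= s[1] >= 0                 # singular values in descending order
```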
Geometric Interpretation
The SVD reveals the geometry of any linear transformation. Every matrix A does three things:
Vᵀ rotates/reflects the input space to align with special directions
Σ stretches/compresses along coordinate axes (by singular values)
U rotates/reflects to the final output orientation
The unit sphere maps to an ellipsoid. Singular values are the semi-axis lengths of this ellipsoid.
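The ellipsoid picture can be checked numerically: map points of the unit circle through a 2×2 matrix (the matrix below is made up for illustration) and verify that the image radii range between σ₂ and σ₁:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])        # arbitrary 2x2 example
s = np.linalg.svd(A, compute_uv=False)        # singular values, descending

# Sample the unit circle densely and push it through A
theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])
radii = np.linalg.norm(A @ circle, axis=0)

# The image is an ellipse with semi-axes sigma_1 and sigma_2
assert np.isclose(radii.max(), s[0], atol=1e-3)
assert np.isclose(radii.min(), s[1], atol=1e-3)
```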
Finding the SVD
The singular values are the square roots of eigenvalues of AᵀA (or AAᵀ):
The columns of V are eigenvectors of AᵀA. The columns of U are eigenvectors of AAᵀ.
Alternatively: Av_i = σ_i u_i connects the singular vectors.
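Both facts are easy to verify numerically on an arbitrary example matrix: the singular values match √(eigenvalues of AᵀA), and each right singular vector satisfies Avᵢ = σᵢuᵢ:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]])   # arbitrary 3x2 example

evals = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of A^T A, ascending
sigmas = np.sqrt(evals[::-1])           # square roots, reordered descending

U, s, Vt = np.linalg.svd(A)
assert np.allclose(s, sigmas)           # sigma_i = sqrt(lambda_i)

for i in range(len(s)):
    # rows of Vt are the right singular vectors v_i
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])    # A v_i = sigma_i u_i
```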
Example
Consider the diagonal matrix A = [[3, 0], [0, 2]]. This is already diagonal, so the SVD is immediate:
Singular values: σ₁ = 3, σ₂ = 2. U and V are identity matrices (columns are standard basis).
The Four Fundamental Subspaces
The SVD reveals all four fundamental subspaces of A:
Column space: first r columns of U (where r = rank)
Row space: first r columns of V
Null space: last n-r columns of V
Left null space: last m-r columns of U
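A sketch of reading the subspaces off the SVD, using a made-up rank-1 matrix and a small threshold (1e-10) to decide which singular values count as nonzero:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # 3x2, rank 1 by construction
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                            # numerical rank
assert r == 1

col_space  = U[:, :r]       # first r columns of U
row_space  = Vt[:r].T       # first r columns of V
null_space = Vt[r:].T       # last n - r columns of V
left_null  = U[:, r:]       # last m - r columns of U

# A annihilates its null space; A^T annihilates the left null space
assert np.allclose(A @ null_space, 0)
assert np.allclose(A.T @ left_null, 0)
```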
Low-Rank Approximation
Truncate the SVD by keeping only the largest k singular values: A_k = Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ.
The Eckart-Young theorem: A_k is THE best rank-k approximation to A (minimizes ||A - A_k|| in both Frobenius and spectral norms).
This is the mathematical foundation of data compression, dimensionality reduction, and noise removal.
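A numerical check of the Eckart-Young error formulas on a random matrix (seed and sizes are arbitrary): truncating at rank k leaves spectral error exactly σₖ₊₁ and Frobenius error √(σₖ₊₁² + ···):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # keep only the top k singular values

# Spectral-norm error is the first discarded singular value
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
# Frobenius error is the root-sum-of-squares of the discarded singular values
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:]**2)))
```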
The Pseudoinverse
The Moore-Penrose pseudoinverse A⁺ is defined via the SVD as A⁺ = VΣ⁺Uᵀ, where Σ⁺ is obtained by transposing Σ and inverting its nonzero singular values. This generalizes the matrix inverse to non-square and singular matrices.
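A sketch of building A⁺ this way for a made-up singular, non-square matrix, checked against NumPy's `np.linalg.pinv`:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])   # 3x2, not invertible
U, s, Vt = np.linalg.svd(A)

# Invert only the nonzero singular values (threshold is arbitrary here)
s_plus = np.array([1.0 / x if x > 1e-12 else 0.0 for x in s])
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))      # n x m, transposed shape
Sigma_plus[:len(s), :len(s)] = np.diag(s_plus)

A_pinv = Vt.T @ Sigma_plus @ U.T                     # A+ = V Sigma+ U^T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```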
Applications
Principal Component Analysis (PCA): SVD of centered data gives principal components
Image compression: Keep top singular values, discard the rest
Recommender systems: Matrix factorization for collaborative filtering
Natural language processing: Latent semantic analysis uses truncated SVD
Numerical stability: Computing rank, solving least squares, determining matrix condition
Signal processing: Separating signal from noise
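As one example from this list, here is a minimal PCA-via-SVD sketch on synthetic data (the data, scales, and seed are all made up): center the data, take the SVD, and read the principal directions off the rows of Vᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 3 features with very different spreads
X = rng.standard_normal((200, 3)) * np.array([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                  # PCA requires centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal components; s^2/(n-1) are explained variances
variances = s**2 / (len(X) - 1)
assert variances[0] > variances[1] > variances[2]

scores = Xc @ Vt.T                       # data in principal-component coordinates
assert scores.shape == (200, 3)
```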
Mathematical Foundation
Theorem (Singular Value Decomposition): Every m×n matrix A can be factored as A = UΣVᵀ where:
• U is an m×m orthogonal matrix (UᵀU = UUᵀ = I)
• Σ is an m×n diagonal matrix with non-negative entries σ₁ ≥ σ₂ ≥ ... ≥ σᵣ > 0 = σᵣ₊₁ = ... where r = rank(A)
• V is an n×n orthogonal matrix (VᵀV = VVᵀ = I)
Existence Proof
The matrix AᵀA is n×n, symmetric, and positive semi-definite. Therefore it has n non-negative eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λₙ ≥ 0 with orthonormal eigenvectors v₁, v₂, ..., vₙ.
Define the singular values as σᵢ = √λᵢ. For each σᵢ > 0, define uᵢ = Avᵢ/σᵢ.
These uᵢ are orthonormal. To see this, compute uᵢᵀuⱼ = (Avᵢ)ᵀ(Avⱼ)/(σᵢσⱼ) = vᵢᵀ(AᵀAvⱼ)/(σᵢσⱼ) = (λⱼ/(σᵢσⱼ)) vᵢᵀvⱼ, which equals 1 when i = j and 0 otherwise.
Extend {u₁, ..., uᵣ} to an orthonormal basis {u₁, ..., uₘ} of ℝᵐ. Now verify Avᵢ = σᵢuᵢ for i ≤ r and Avᵢ = 0 for i > r, which confirms A = UΣVᵀ.
Geometric Intuition: What SVD Reveals
The SVD decomposes any linear transformation into three simple steps:
Vᵀ rotates the input space to align with the "principal axes" of the transformation
Σ stretches along each axis by the corresponding singular value
U rotates the stretched result to the final output orientation
The image of the unit sphere under A is an ellipsoid. The singular values are the lengths of the semi-axes, and the left/right singular vectors give the orientations.
The largest singular value σ₁ equals ||A||₂ = max{||Ax|| : ||x|| = 1} — the maximum "stretching factor" of A.
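This "maximum stretch" characterization is easy to test numerically on a random matrix (seed arbitrary): the spectral norm equals σ₁, and no sampled unit vector is stretched by more than σ₁:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)

# The spectral norm is the largest singular value
assert np.isclose(np.linalg.norm(A, 2), s[0])

# Sample random unit vectors: none is stretched by more than sigma_1
xs = rng.standard_normal((3, 1000))
xs /= np.linalg.norm(xs, axis=0)
assert np.linalg.norm(A @ xs, axis=0).max() <= s[0] + 1e-9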
The Eckart-Young Theorem
Theorem: The best rank-k approximation to A (in both Frobenius and spectral norms) is obtained by keeping only the top k singular values: Aₖ = Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ.
The approximation error is ||A - Aₖ||₂ = σₖ₊₁ and ||A - Aₖ||_F = √(σₖ₊₁² + ··· + σᵣ²).
This is remarkable: among ALL rank-k matrices, Aₖ is the closest to A. No other factorization or approximation can do better.
Proof idea: Any rank-k matrix has at most k nonzero singular values. The error ||A - B||₂ ≥ σₖ₊₁(A) for any rank-k matrix B.
Computing the SVD
Method 1 (Eigenvalue approach): Compute the eigendecomposition of AᵀA = VΛVᵀ. Then σᵢ = √λᵢ and uᵢ = Avᵢ/σᵢ. This works, but squaring the matrix squares the condition number.
Method 2 (Golub-Kahan): First bidiagonalize A = UBVᵀ using Householder reflections from both sides. Then apply QR iterations to the bidiagonal matrix B. This is the standard algorithm.
Method 3 (Divide and conquer): Recursively split the bidiagonal matrix. Fastest for large matrices with many singular values. Used in LAPACK.
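One reason Method 1 is avoided in practice: forming AᵀA squares the condition number, so information in the small singular values is lost. A quick check (random matrix, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

# cond(A^T A) = cond(A)^2, which is why production SVD algorithms
# bidiagonalize A directly instead of forming A^T A
assert np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A)**2)
```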
Complete Worked Example
Find the SVD of:
Step 1: Compute AᵀA = [[17, 8], [8, 17]].
Step 2: Find eigenvalues of AᵀA. det(AᵀA - λI) = (17-λ)² - 64 = 0, so λ = 25 or λ = 9.
Step 3: Find eigenvectors of AᵀA. For λ = 25: v₁ = (1, 1)ᵀ/√2. For λ = 9: v₂ = (1, -1)ᵀ/√2.
Step 4: Compute the left singular vectors uᵢ = Avᵢ/σᵢ, i.e. u₁ = Av₁/5 and u₂ = Av₂/3.
Step 5: Extend to an orthonormal basis of ℝ³ by finding u₃ with u₃ᵀu₁ = 0 and u₃ᵀu₂ = 0: u₃ = (1, -1, 0)ᵀ/√2, which spans the null space of Aᵀ.
The complete SVD is A = UΣVᵀ with U = [u₁ u₂ u₃], Σ the 3×2 matrix with diagonal entries σ₁ = 5 and σ₂ = 3, and V = [v₁ v₂].
Connection to Eigendecomposition
For a symmetric matrix A = Aᵀ with eigendecomposition A = QΛQᵀ, the SVD and eigendecomposition are closely related:
If all eigenvalues are non-negative, then U = V = Q and Σ = Λ.
If some eigenvalues are negative, their singular values are |λᵢ|, and the corresponding columns of U differ from those of Q by a sign flip.
Key differences:
• Eigendecomposition: Only for square matrices, may not exist (defective matrices)
• SVD: Exists for ALL matrices (any size), always has real non-negative singular values
• For symmetric positive definite matrices: σᵢ = λᵢ
• For general matrices: σᵢ² = eigenvalues of AᵀA (or AAᵀ)
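A small check of the sign-flip relationship, using a made-up symmetric matrix with one negative eigenvalue:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])    # symmetric; eigenvalues are 3 and -1
evals = np.linalg.eigvalsh(A)              # ascending: [-1, 3]
s = np.linalg.svd(A, compute_uv=False)     # descending: [3, 1]

# Singular values are the absolute values of the eigenvalues
assert np.allclose(s, np.sort(np.abs(evals))[::-1])
```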
SVD Intuition: Three Perspectives
Perspective 1: The Outer Product Sum. Every matrix is a weighted sum of rank-1 matrices: A = Σᵢ₌₁ʳ σᵢuᵢvᵢᵀ.
Each term σᵢuᵢvᵢᵀ is a rank-1 "layer." The first layer captures the most important pattern (largest σ), subsequent layers add refinements. Image compression keeps only the first few layers.
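A sketch of rebuilding a random matrix (arbitrary seed) layer by layer, showing that each added layer shrinks the remaining error and that the full sum recovers A exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(A)

# Accumulate A one rank-1 "layer" sigma_i * u_i v_i^T at a time
partial = np.zeros_like(A)
errors = []
for i in range(4):
    partial += s[i] * np.outer(U[:, i], Vt[i])
    errors.append(np.linalg.norm(A - partial, 2))

assert np.allclose(partial, A)                       # all layers reconstruct A
# Errors shrink as layers are added (each residual norm is sigma_{i+1})
assert all(e1 >= e2 - 1e-12 for e1, e2 in zip(errors, errors[1:]))
```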
Perspective 2: Orthonormal Bases. The SVD provides optimal orthonormal bases for both the domain and codomain:
• V columns: orthonormal basis for the row space + null space of A
• U columns: orthonormal basis for the column space + left null space of A
The matrix A maps vᵢ → σᵢuᵢ for i ≤ r and vᵢ → 0 for i > r.
Perspective 3: Maximum Stretching. σ₁ is the maximum factor by which A stretches any unit vector: σ₁ = max{||Ax|| : ||x|| = 1}.
The maximizer is v₁, and Av₁ = σ₁u₁ points in the most-stretched direction.
Similarly, σ₂ is the maximum stretching perpendicular to v₁, achieved at v₂. Each singular value is the maximum stretch in the subspace orthogonal to all previous vᵢ.
Visualizing the SVD
For a 2×2 matrix, visualize how A transforms the unit circle:
The unit circle in the domain is aligned with v₁, v₂
A stretches by σ₁ along v₁ and σ₂ along v₂
The result is an ellipse with semi-axes σ₁ and σ₂ aligned with u₁, u₂
If σ₂ = 0, the ellipse collapses to a line segment — the matrix is rank 1 and maps the entire plane to a line.
The ratio σ₁/σₙ is the condition number, measuring how elongated the ellipse is. A large condition number means the matrix nearly collapses some directions — it is ill-conditioned.
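The condition number can be read directly off the singular values; here is a made-up matrix that nearly collapses one direction:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1e-6]])   # stretches x by 1, nearly collapses y
s = np.linalg.svd(A, compute_uv=False)

# Condition number = sigma_1 / sigma_n: here an extremely elongated ellipse
assert np.isclose(np.linalg.cond(A), s[0] / s[-1])
assert np.isclose(np.linalg.cond(A), 1e6)
```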
Practice Problems
Problem 1: Find the SVD of A = [[2, 0], [0, 3]]. What are the singular values? What are the left and right singular vectors?
Solution: A is already diagonal, but the convention requires σ₁ ≥ σ₂, so σ₁ = 3, σ₂ = 2. Reorder with permutation matrices: U = V = [[0,1],[1,0]] and Σ = [[3,0],[0,2]], which gives UΣVᵀ = [[2,0],[0,3]] = A. The left and right singular vectors are the standard basis vectors, reordered so the larger singular value comes first.
Problem 2: Compute the SVD of A = [[1, 1], [1, 1]]. What is the rank? Find the best rank-1 approximation.
Solution: AᵀA = [[2,2],[2,2]] with eigenvalues λ₁ = 4, λ₂ = 0, so σ₁ = 2, σ₂ = 0 and rank = 1. The eigenvector for λ₁ = 4 is v₁ = (1,1)ᵀ/√2, and u₁ = Av₁/2 = (1,1)ᵀ/√2. Hence A = σ₁u₁v₁ᵀ = [[1,1],[1,1]]. Since rank = 1, A is already its own best rank-1 approximation.
Problem 3: If A has singular values σ₁ = 5, σ₂ = 3, σ₃ = 1, what is ||A||₂? What is ||A||_F? What is the best rank-2 approximation error in Frobenius norm?
Solution: ||A||₂ = σ₁ = 5 (spectral norm = largest singular value). ||A||_F = √(25+9+1) = √35 ≈ 5.92. Best rank-2 error: ||A - A₂||_F = σ₃ = 1.
Problem 4: Prove that for any matrix A, the nonzero singular values of A and Aᵀ are identical.
Solution: The singular values of A are √(eigenvalues of AᵀA). The singular values of Aᵀ are √(eigenvalues of AAᵀ). But AᵀA and AAᵀ have the same nonzero eigenvalues (proof: if AᵀAv = λv with λ ≠ 0, then AAᵀ(Av) = A(AᵀAv) = λ(Av), so λ is also an eigenvalue of AAᵀ).
Problem 5: A 100×50 matrix has rank 10. How many nonzero singular values does it have? What are the dimensions of U, Σ, V in the full SVD? In the reduced SVD?
Solution: Exactly 10 nonzero singular values. Full SVD: U is 100×100, Σ is 100×50, V is 50×50. Reduced SVD: U is 100×10, Σ is 10×10, V is 50×10 (or Vᵀ is 10×50).
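The shapes in Problem 5 can be checked numerically. Note that NumPy's `full_matrices=False` "thin" SVD keeps min(m, n) = 50 columns; truncating further to the rank r = 10 gives the reduced SVD described in the solution (the random construction below is just one way to make a rank-10 matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
# A 100x50 matrix of rank 10, built as a product of random factors
A = rng.standard_normal((100, 10)) @ rng.standard_normal((10, 50))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD
assert U.shape == (100, 50) and Vt.shape == (50, 50)
assert np.sum(s > 1e-8 * s[0]) == 10               # exactly 10 nonzero singular values

# Truncating to the rank gives the reduced SVD: 100x10, 10x10, 10x50
r = 10
assert np.allclose(U[:, :r] @ np.diag(s[:r]) @ Vt[:r], A)
```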
Proof of the Eckart-Young Theorem
Theorem: Among all rank-k matrices, Aₖ = Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ minimizes ||A - B||₂ and ||A - B||_F.
Proof (spectral norm): Let B be any rank-k matrix. We show ||A - B||₂ ≥ σₖ₊₁.
The null space of B has dimension at least n - k (by rank-nullity). The span of {v₁, v₂, ..., vₖ₊₁} has dimension k + 1.
Since (n - k) + (k + 1) = n + 1 > n, these subspaces must intersect non-trivially. Let w ≠ 0 be in both, with ||w|| = 1.
Since w ∈ Null(B), we have Bw = 0, so (A - B)w = Aw.
Since w ∈ span{v₁,...,vₖ₊₁}, write w = Σᵢ₌₁ᵏ⁺¹ cᵢvᵢ where Σcᵢ² = 1. Then ||Aw||² = Σᵢ₌₁ᵏ⁺¹ cᵢ²σᵢ² ≥ σₖ₊₁² Σcᵢ² = σₖ₊₁², using the orthonormality of the uᵢ.
Thus ||A - B||₂ ≥ ||(A - B)w|| = ||Aw|| ≥ σₖ₊₁. Equality holds for B = Aₖ since (A - Aₖ)vₖ₊₁ = σₖ₊₁uₖ₊₁. ∎
Proof (Frobenius norm): For any matrix M, ||M||²_F = Σσᵢ(M)². The singular values of A - B can be bounded using Weyl inequalities, giving ||A - B||²_F ≥ Σᵢ₌ₖ₊₁ʳ σᵢ². Equality holds for B = Aₖ since A - Aₖ = Σᵢ₌ₖ₊₁ʳ σᵢuᵢvᵢᵀ has exactly those singular values. ∎
More Practice Problems
Test your understanding of SVD with these exercises.
Problem 1
Find the singular values of A = [[3, 0], [0, 2]]
Problem 2
For A = [[1, 1], [0, 0]], find the full SVD.
Problem 3
If A has singular values σ₁ = 5, σ₂ = 3, σ₃ = 0, what is rank(A)?
Problem 4
Find the best rank-1 approximation of A = [[3, 0], [0, 2]] using SVD.
Problem 5
Compute AᵀA for A = [[1, 2], [3, 4]] and find its eigenvalues.
Problem 6
Explain why singular values are always non-negative.
Answers
Problem 1: σ₁ = 3, σ₂ = 2. For diagonal matrices, singular values are the absolute values of the diagonal entries.
Problem 2: U = [[1, 0], [0, 1]], Σ = [[√2, 0], [0, 0]], Vᵀ = [[1/√2, 1/√2], [1/√2, -1/√2]]. AᵀA = [[1, 1], [1, 1]] has eigenvalues 2 and 0, so σ₁ = √2.
Problem 3: rank(A) = 2. Rank equals the number of non-zero singular values.
Problem 4: A₁ = [[3, 0], [0, 0]]. Keep only σ₁ = 3 with its u₁ and v₁. The result captures the largest singular value direction.
Problem 5: AᵀA = [[10, 14], [14, 20]], eigenvalues ≈ 29.87 and 0.13. Singular values of A are √29.87 ≈ 5.47 and √0.13 ≈ 0.36.
Problem 6: Singular values are square roots of eigenvalues of AᵀA. Since AᵀA is positive semi-definite, its eigenvalues are ≥ 0, so their square roots are real and non-negative.