Singular Value Decomposition (SVD)
Subtopic: Decompositions
Topic: Linear algebra
The Ultimate Factorization
The Singular Value Decomposition (SVD) factors ANY matrix — not just square, not just invertible — into three special matrices, A = UΣVᵀ:
U is m×m orthogonal (columns are left singular vectors)
Σ is m×n diagonal (singular values σ₁ ≥ σ₂ ≥ ... ≥ 0 on diagonal)
V is n×n orthogonal (columns are right singular vectors)
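A quick NumPy sketch of these shapes and properties, using an arbitrary 3×2 matrix chosen only for illustration. Note that `np.linalg.svd` returns Vᵀ (here `Vt`), not V:

```python
import numpy as np

# Arbitrary 3x2 example matrix, just to illustrate the shapes
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
m, n = A.shape

U, s, Vt = np.linalg.svd(A)              # full SVD: U is m x m, Vt is n x n
Sigma = np.zeros((m, n))                 # Sigma is m x n with s on its diagonal
Sigma[:len(s), :len(s)] = np.diag(s)

assert U.shape == (m, m) and Vt.shape == (n, n)
assert np.allclose(U @ U.T, np.eye(m))   # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(n)) # V orthogonal
assert np.allclose(U @ Sigma @ Vt, A)    # A = U Sigma V^T
assert s[0] >= s[1] >= 0                 # singular values in descending order
```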
Geometric Interpretation
The SVD reveals the geometry of any linear transformation. Every matrix A does three things:
Vᵀ rotates/reflects the input space to align with special directions
Σ stretches/compresses along coordinate axes (by singular values)
U rotates/reflects to the final output orientation
The unit sphere maps to an ellipsoid. Singular values are the semi-axis lengths of this ellipsoid.
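The ellipsoid picture can be checked numerically: map points of the unit circle through a 2×2 matrix (the matrix below is made up for illustration) and verify that the image radii range between σ₂ and σ₁:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])        # arbitrary 2x2 example
s = np.linalg.svd(A, compute_uv=False)        # singular values, descending

# Sample the unit circle densely and push it through A
theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])
radii = np.linalg.norm(A @ circle, axis=0)

# The image is an ellipse with semi-axes sigma_1 and sigma_2
assert np.isclose(radii.max(), s[0], atol=1e-3)
assert np.isclose(radii.min(), s[1], atol=1e-3)
```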
Finding the SVD
The singular values are the square roots of eigenvalues of AᵀA (or AAᵀ):
The columns of V are eigenvectors of AᵀA. The columns of U are eigenvectors of AAᵀ.
Alternatively: Av_i = σ_i u_i connects the singular vectors.
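Both facts are easy to verify numerically on an arbitrary example matrix: the singular values match √(eigenvalues of AᵀA), and each right singular vector satisfies Avᵢ = σᵢuᵢ:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]])   # arbitrary 3x2 example

evals = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of A^T A, ascending
sigmas = np.sqrt(evals[::-1])           # square roots, reordered descending

U, s, Vt = np.linalg.svd(A)
assert np.allclose(s, sigmas)           # sigma_i = sqrt(lambda_i)

for i in range(len(s)):
    # rows of Vt are the right singular vectors v_i
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])    # A v_i = sigma_i u_i
```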
Example
Consider the diagonal matrix A = [[3, 0], [0, 2]]. This is already diagonal, so the SVD is immediate:
Singular values: σ₁ = 3, σ₂ = 2. U and V are identity matrices (columns are standard basis).
The Four Fundamental Subspaces
The SVD reveals all four fundamental subspaces of A:
Column space: first r columns of U (where r = rank)
Row space: first r columns of V
Null space: last n-r columns of V
Left null space: last m-r columns of U
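A sketch of reading the subspaces off the SVD, using a made-up rank-1 matrix and a small threshold (1e-10) to decide which singular values count as nonzero:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # 3x2, rank 1 by construction
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                            # numerical rank
assert r == 1

col_space  = U[:, :r]       # first r columns of U
row_space  = Vt[:r].T       # first r columns of V
null_space = Vt[r:].T       # last n - r columns of V
left_null  = U[:, r:]       # last m - r columns of U

# A annihilates its null space; A^T annihilates the left null space
assert np.allclose(A @ null_space, 0)
assert np.allclose(A.T @ left_null, 0)
```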
Low-Rank Approximation
Truncate the SVD by keeping only the largest k singular values: A_k = Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ.
The Eckart-Young theorem: A_k is THE best rank-k approximation to A (minimizes ||A - A_k|| in both Frobenius and spectral norms).
This is the mathematical foundation of data compression, dimensionality reduction, and noise removal.
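A numerical check of the Eckart-Young error formulas on a random matrix (seed and sizes are arbitrary): truncating at rank k leaves spectral error exactly σₖ₊₁ and Frobenius error √(σₖ₊₁² + ···):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # keep only the top k singular values

# Spectral-norm error is the first discarded singular value
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
# Frobenius error is the root-sum-of-squares of the discarded singular values
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:]**2)))
```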
The Pseudoinverse
The Moore-Penrose pseudoinverse A⁺ is defined via the SVD as A⁺ = VΣ⁺Uᵀ, where Σ⁺ is obtained by transposing Σ and inverting its nonzero singular values. This generalizes the matrix inverse to non-square and singular matrices.
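A sketch of building A⁺ this way for a made-up singular, non-square matrix, checked against NumPy's `np.linalg.pinv`:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])   # 3x2, not invertible
U, s, Vt = np.linalg.svd(A)

# Invert only the nonzero singular values (threshold is arbitrary here)
s_plus = np.array([1.0 / x if x > 1e-12 else 0.0 for x in s])
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))      # n x m, transposed shape
Sigma_plus[:len(s), :len(s)] = np.diag(s_plus)

A_pinv = Vt.T @ Sigma_plus @ U.T                     # A+ = V Sigma+ U^T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```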
Applications
Principal Component Analysis (PCA): SVD of centered data gives principal components
Image compression: Keep top singular values, discard the rest
Recommender systems: Matrix factorization for collaborative filtering
Natural language processing: Latent semantic analysis uses truncated SVD
Numerical stability: Computing rank, solving least squares, determining matrix condition
Signal processing: Separating signal from noise
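As one example from this list, here is a minimal PCA-via-SVD sketch on synthetic data (the data, scales, and seed are all made up): center the data, take the SVD, and read the principal directions off the rows of Vᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 3 features with very different spreads
X = rng.standard_normal((200, 3)) * np.array([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                  # PCA requires centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal components; s^2/(n-1) are explained variances
variances = s**2 / (len(X) - 1)
assert variances[0] > variances[1] > variances[2]

scores = Xc @ Vt.T                       # data in principal-component coordinates
assert scores.shape == (200, 3)
```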
Mathematical Foundation
Theorem (Singular Value Decomposition): Every m×n matrix A can be factored as A = UΣVᵀ where:
• U is an m×m orthogonal matrix (UᵀU = UUᵀ = I)
• Σ is an m×n diagonal matrix with non-negative entries σ₁ ≥ σ₂ ≥ ... ≥ σᵣ > 0 = σᵣ₊₁ = ... where r = rank(A)
• V is an n×n orthogonal matrix (VᵀV = VVᵀ = I)
Existence Proof
The matrix AᵀA is n×n, symmetric, and positive semi-definite. Therefore it has n non-negative eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λₙ ≥ 0 with orthonormal eigenvectors v₁, v₂, ..., vₙ.
Define the singular values as σᵢ = √λᵢ. For each σᵢ > 0, define uᵢ = Avᵢ/σᵢ.
These uᵢ are orthonormal. To see this, compute uᵢᵀuⱼ = (Avᵢ)ᵀ(Avⱼ)/(σᵢσⱼ) = vᵢᵀ(AᵀAvⱼ)/(σᵢσⱼ) = (λⱼ/(σᵢσⱼ)) vᵢᵀvⱼ, which equals 1 when i = j and 0 otherwise.
Extend {u₁, ..., uᵣ} to an orthonormal basis {u₁, ..., uₘ} of ℝᵐ. Now verify Avᵢ = σᵢuᵢ for i ≤ r and Avᵢ = 0 for i > r, which confirms A = UΣVᵀ.
Geometric Intuition: What SVD Reveals
The SVD decomposes any linear transformation into three simple steps:
Vᵀ rotates the input space to align with the "principal axes" of the transformation
Σ stretches along each axis by the corresponding singular value
U rotates the stretched result to the final output orientation
The image of the unit sphere under A is an ellipsoid. The singular values are the lengths of the semi-axes, and the left/right singular vectors give the orientations.
The largest singular value σ₁ equals ||A||₂ = max{||Ax|| : ||x|| = 1} — the maximum "stretching factor" of A.
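This "maximum stretch" characterization is easy to test numerically on a random matrix (seed arbitrary): the spectral norm equals σ₁, and no sampled unit vector is stretched by more than σ₁:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)

# The spectral norm is the largest singular value
assert np.isclose(np.linalg.norm(A, 2), s[0])

# Sample random unit vectors: none is stretched by more than sigma_1
xs = rng.standard_normal((3, 1000))
xs /= np.linalg.norm(xs, axis=0)
assert np.linalg.norm(A @ xs, axis=0).max() <= s[0] + 1e-9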
The Eckart-Young Theorem
Theorem: The best rank-k approximation to A (in both Frobenius and spectral norms) is obtained by keeping only the top k singular values: Aₖ = Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ.
The approximation error is ||A - Aₖ||₂ = σₖ₊₁ and ||A - Aₖ||_F = √(σₖ₊₁² + ··· + σᵣ²).
This is remarkable: among ALL rank-k matrices, Aₖ is the closest to A. No other factorization or approximation can do better.
Proof idea: Any rank-k matrix has at most k nonzero singular values. The error ||A - B||₂ ≥ σₖ₊₁(A) for any rank-k matrix B.
Computing the SVD
Method 1 (Eigenvalue approach): Compute the eigendecomposition of AᵀA = VΛVᵀ. Then σᵢ = √λᵢ and uᵢ = Avᵢ/σᵢ. This works, but squaring the matrix squares the condition number.
Method 2 (Golub-Kahan): First bidiagonalize A = UBVᵀ using Householder reflections from both sides. Then apply QR iterations to the bidiagonal matrix B. This is the standard algorithm.
Method 3 (Divide and conquer): Recursively split the bidiagonal matrix. Fastest for large matrices with many singular values. Used in LAPACK.
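One reason Method 1 is avoided in practice: forming AᵀA squares the condition number, so information in the small singular values is lost. A quick check (random matrix, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

# cond(A^T A) = cond(A)^2, which is why production SVD algorithms
# bidiagonalize A directly instead of forming A^T A
assert np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A)**2)
```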
Complete Worked Example
Find the SVD of:
Step 1: Compute AᵀA = [[17, 8], [8, 17]].
Step 2: Find eigenvalues of AᵀA. det(AᵀA - λI) = (17-λ)² - 64 = 0, so λ = 25 or λ = 9.
Step 3: Find eigenvectors of AᵀA. For λ = 25: v₁ = (1, 1)ᵀ/√2. For λ = 9: v₂ = (1, -1)ᵀ/√2.
Step 4: Compute the left singular vectors uᵢ = Avᵢ/σᵢ, i.e. u₁ = Av₁/5 and u₂ = Av₂/3.
Step 5: Extend to an orthonormal basis of ℝ³ by finding u₃ with u₃ᵀu₁ = 0 and u₃ᵀu₂ = 0: u₃ = (1, -1, 0)ᵀ/√2, which spans the null space of Aᵀ.
The complete SVD is A = UΣVᵀ with U = [u₁ u₂ u₃], Σ the 3×2 matrix with diagonal entries σ₁ = 5 and σ₂ = 3, and V = [v₁ v₂].
Connection to Eigendecomposition
For a symmetric matrix A = Aᵀ with eigendecomposition A = QΛQᵀ, the SVD and eigendecomposition are closely related:
If all eigenvalues are non-negative, then U = V = Q and Σ = Λ.
If some eigenvalues are negative, their singular values are |λᵢ|, and the corresponding columns of U differ from those of Q by a sign flip.
Key differences:
• Eigendecomposition: Only for square matrices, may not exist (defective matrices)
• SVD: Exists for ALL matrices (any size), always has real non-negative singular values
• For symmetric positive definite matrices: σᵢ = λᵢ
• For general matrices: σᵢ² = eigenvalues of AᵀA (or AAᵀ)
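A small check of the sign-flip relationship, using a made-up symmetric matrix with one negative eigenvalue:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])    # symmetric; eigenvalues are 3 and -1
evals = np.linalg.eigvalsh(A)              # ascending: [-1, 3]
s = np.linalg.svd(A, compute_uv=False)     # descending: [3, 1]

# Singular values are the absolute values of the eigenvalues
assert np.allclose(s, np.sort(np.abs(evals))[::-1])
```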
SVD Intuition: Three Perspectives
Perspective 1: The Outer Product Sum. Every matrix is a weighted sum of rank-1 matrices: A = Σᵢ₌₁ʳ σᵢuᵢvᵢᵀ.
Each term σᵢuᵢvᵢᵀ is a rank-1 "layer." The first layer captures the most important pattern (largest σ), subsequent layers add refinements. Image compression keeps only the first few layers.
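A sketch of rebuilding a random matrix (arbitrary seed) layer by layer, showing that each added layer shrinks the remaining error and that the full sum recovers A exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(A)

# Accumulate A one rank-1 "layer" sigma_i * u_i v_i^T at a time
partial = np.zeros_like(A)
errors = []
for i in range(4):
    partial += s[i] * np.outer(U[:, i], Vt[i])
    errors.append(np.linalg.norm(A - partial, 2))

assert np.allclose(partial, A)                       # all layers reconstruct A
# Errors shrink as layers are added (each residual norm is sigma_{i+1})
assert all(e1 >= e2 - 1e-12 for e1, e2 in zip(errors, errors[1:]))
```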
Perspective 2: Orthonormal Bases. The SVD provides optimal orthonormal bases for both the domain and codomain:
• V columns: orthonormal basis for the row space + null space of A
• U columns: orthonormal basis for the column space + left null space of A
The matrix A maps vᵢ → σᵢuᵢ for i ≤ r and vᵢ → 0 for i > r.
Perspective 3: Maximum Stretching. σ₁ is the maximum factor by which A stretches any unit vector: σ₁ = max{||Ax|| : ||x|| = 1}.
The maximizer is v₁, and Av₁ = σ₁u₁ points in the most-stretched direction.
Similarly, σ₂ is the maximum stretching perpendicular to v₁, achieved at v₂. Each singular value is the maximum stretch in the subspace orthogonal to all previous vᵢ.
Visualizing the SVD
For a 2×2 matrix, visualize how A transforms the unit circle:
The unit circle in the domain is aligned with v₁, v₂
A stretches by σ₁ along v₁ and σ₂ along v₂
The result is an ellipse with semi-axes σ₁ and σ₂ aligned with u₁, u₂
If σ₂ = 0, the ellipse collapses to a line segment — the matrix is rank 1 and maps the entire plane to a line.
The ratio σ₁/σₙ is the condition number, measuring how elongated the ellipse is. A large condition number means the matrix nearly collapses some directions — it is ill-conditioned.
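The condition number can be read directly off the singular values; here is a made-up matrix that nearly collapses one direction:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1e-6]])   # stretches x by 1, nearly collapses y
s = np.linalg.svd(A, compute_uv=False)

# Condition number = sigma_1 / sigma_n: here an extremely elongated ellipse
assert np.isclose(np.linalg.cond(A), s[0] / s[-1])
assert np.isclose(np.linalg.cond(A), 1e6)
```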
Practice Problems
Problem 1: Find the SVD of A = [[2, 0], [0, 3]]. What are the singular values? What are the left and right singular vectors?
Solution: A is already diagonal, but the convention requires σ₁ ≥ σ₂, so σ₁ = 3, σ₂ = 2. Reorder with permutation matrices: U = V = [[0,1],[1,0]] and Σ = [[3,0],[0,2]], which gives UΣVᵀ = [[2,0],[0,3]] = A. The left and right singular vectors are the standard basis vectors, reordered so the larger singular value comes first.
Problem 2: Compute the SVD of A = [[1, 1], [1, 1]]. What is the rank? Find the best rank-1 approximation.
Solution: AᵀA = [[2,2],[2,2]] with eigenvalues λ₁ = 4, λ₂ = 0, so σ₁ = 2, σ₂ = 0 and rank = 1. The eigenvector for λ₁ = 4 is v₁ = (1,1)ᵀ/√2, and u₁ = Av₁/2 = (1,1)ᵀ/√2. Hence A = σ₁u₁v₁ᵀ = [[1,1],[1,1]]. Since rank = 1, A is already its own best rank-1 approximation.
Problem 3: If A has singular values σ₁ = 5, σ₂ = 3, σ₃ = 1, what is ||A||₂? What is ||A||_F? What is the best rank-2 approximation error in Frobenius norm?
Solution: ||A||₂ = σ₁ = 5 (spectral norm = largest singular value). ||A||_F = √(25+9+1) = √35 ≈ 5.92. Best rank-2 error: ||A - A₂||_F = σ₃ = 1.
Problem 4: Prove that for any matrix A, the nonzero singular values of A and Aᵀ are identical.
Solution: The singular values of A are √(eigenvalues of AᵀA). The singular values of Aᵀ are √(eigenvalues of AAᵀ). But AᵀA and AAᵀ have the same nonzero eigenvalues (proof: if AᵀAv = λv with λ ≠ 0, then AAᵀ(Av) = A(AᵀAv) = λ(Av), so λ is also an eigenvalue of AAᵀ).
Problem 5: A 100×50 matrix has rank 10. How many nonzero singular values does it have? What are the dimensions of U, Σ, V in the full SVD? In the reduced SVD?
Solution: Exactly 10 nonzero singular values. Full SVD: U is 100×100, Σ is 100×50, V is 50×50. Reduced SVD: U is 100×10, Σ is 10×10, V is 50×10 (or Vᵀ is 10×50).
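The shapes in Problem 5 can be checked numerically. Note that NumPy's `full_matrices=False` "thin" SVD keeps min(m, n) = 50 columns; truncating further to the rank r = 10 gives the reduced SVD described in the solution (the random construction below is just one way to make a rank-10 matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
# A 100x50 matrix of rank 10, built as a product of random factors
A = rng.standard_normal((100, 10)) @ rng.standard_normal((10, 50))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD
assert U.shape == (100, 50) and Vt.shape == (50, 50)
assert np.sum(s > 1e-8 * s[0]) == 10               # exactly 10 nonzero singular values

# Truncating to the rank gives the reduced SVD: 100x10, 10x10, 10x50
r = 10
assert np.allclose(U[:, :r] @ np.diag(s[:r]) @ Vt[:r], A)
```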
Proof of the Eckart-Young Theorem
Theorem: Among all rank-k matrices, Aₖ = Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ minimizes ||A - B||₂ and ||A - B||_F.
Proof (spectral norm): Let B be any rank-k matrix. We show ||A - B||₂ ≥ σₖ₊₁.
The null space of B has dimension at least n - k (by rank-nullity). The span of {v₁, v₂, ..., vₖ₊₁} has dimension k + 1.
Since (n - k) + (k + 1) = n + 1 > n, these subspaces must intersect non-trivially. Let w ≠ 0 be in both, with ||w|| = 1.
Since w ∈ Null(B), we have Bw = 0, so (A - B)w = Aw.
Since w ∈ span{v₁,...,vₖ₊₁}, write w = Σᵢ₌₁ᵏ⁺¹ cᵢvᵢ where Σcᵢ² = 1. Then ||Aw||² = Σᵢ₌₁ᵏ⁺¹ cᵢ²σᵢ² ≥ σₖ₊₁² Σcᵢ² = σₖ₊₁², using the orthonormality of the uᵢ.
Thus ||A - B||₂ ≥ ||(A - B)w|| = ||Aw|| ≥ σₖ₊₁. Equality holds for B = Aₖ since (A - Aₖ)vₖ₊₁ = σₖ₊₁uₖ₊₁. ∎
Proof (Frobenius norm): For any matrix M, ||M||²_F = Σσᵢ(M)². The singular values of A - B can be bounded using Weyl inequalities, giving ||A - B||²_F ≥ Σᵢ₌ₖ₊₁ʳ σᵢ². Equality holds for B = Aₖ since A - Aₖ = Σᵢ₌ₖ₊₁ʳ σᵢuᵢvᵢᵀ has exactly those singular values. ∎
More Practice Problems
Test your understanding of SVD with these exercises.
Problem 1
Find the singular values of A = [[3, 0], [0, 2]]
Problem 2
For A = [[1, 1], [0, 0]], find the full SVD.
Problem 3
If A has singular values σ₁ = 5, σ₂ = 3, σ₃ = 0, what is rank(A)?
Problem 4
Find the best rank-1 approximation of A = [[3, 0], [0, 2]] using SVD.
Problem 5
Compute AᵀA for A = [[1, 2], [3, 4]] and find its eigenvalues.
Problem 6
Explain why singular values are always non-negative.
Answers
Problem 1: σ₁ = 3, σ₂ = 2. For diagonal matrices, singular values are the absolute values of the diagonal entries.
Problem 2: U = [[1, 0], [0, 1]], Σ = [[√2, 0], [0, 0]], Vᵀ = [[1/√2, 1/√2], [1/√2, -1/√2]]. AᵀA = [[1, 1], [1, 1]] has eigenvalues 2 and 0, so σ₁ = √2.
Problem 3: rank(A) = 2. Rank equals the number of non-zero singular values.
Problem 4: A₁ = [[3, 0], [0, 0]]. Keep only σ₁ = 3 with its u₁ and v₁. The result captures the largest singular value direction.
Problem 5: AᵀA = [[10, 14], [14, 20]], eigenvalues ≈ 29.87 and 0.13. Singular values of A are √29.87 ≈ 5.47 and √0.13 ≈ 0.36.
Problem 6: Singular values are square roots of eigenvalues of AᵀA. Since AᵀA is positive semi-definite, its eigenvalues are ≥ 0, so their square roots are real and non-negative.