Convex Functions
subtopic: convexity
Introduction
Imagine stretching a rubber band between any two points on a curve. If the rubber band always lies above or on the curve, never dipping below it, that curve represents a convex function. This simple geometric intuition captures one of the most powerful concepts in mathematical optimization.
Convex functions possess remarkable properties that make them the cornerstone of optimization theory. When minimizing a convex function, every local minimum is automatically a global minimum. There are no deceptive valleys or misleading plateaus—the landscape guides us inevitably toward the optimal solution.
This property has profound practical implications. Machine learning algorithms, economic models, signal processing techniques, and countless engineering applications rely on convexity to guarantee that optimization procedures find the best possible solutions efficiently. Understanding convex functions is therefore essential for anyone working with optimization in any quantitative field.
In this page, we develop the theory of convex functions rigorously, starting from the definition and building toward the powerful characterization theorems that make convexity so useful in practice.
Convex Sets
Before defining convex functions, we must understand convex sets, as the domain of a convex function must itself be convex.
Definition of a Convex Set
A set C ⊆ ℝⁿ is convex if, for any two points x, y ∈ C and any θ ∈ [0, 1], the point θx + (1 − θ)y also lies in C.
The expression θx + (1 − θ)y is called a convex combination of x and y. As θ ranges over [0, 1], it traces out the line segment joining x and y, so convexity of C means that C contains the segment between any two of its points.
Examples of Convex Sets
The entire space ℝⁿ is convex, as is any line, ray, or line segment. Every halfspace {x : aᵀx ≤ b} is convex.
The intersection of any collection of convex sets is convex. This implies that polyhedra, defined as intersections of halfspaces, are convex. Balls and ellipsoids are also convex sets.
Affine sets (solution sets of systems of linear equations) are convex. The nonnegative orthant ℝⁿ₊ = {x ∈ ℝⁿ : xᵢ ≥ 0 for all i} is convex as well.
Non-Convex Sets
Not all sets are convex. A set with a hole in it fails to be convex because line segments can pass through the hole. The union of two disjoint intervals is non-convex. Any set that is not "filled in" or has indentations will typically fail the convexity test.
Definition of Convex Functions
The Fundamental Definition
A function f : C → ℝ defined on a convex set C ⊆ ℝⁿ is convex if for all x, y ∈ C and all θ ∈ [0, 1], f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).
This inequality states that the function value at any point on the line segment between x and y is at most the corresponding weighted average of f(x) and f(y). Geometrically, the chord joining (x, f(x)) and (y, f(y)) lies on or above the graph.
A function f is strictly convex if the inequality holds strictly whenever x ≠ y and θ ∈ (0, 1).
A function f is concave if −f is convex; equivalently, the defining inequality is reversed.
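The defining inequality is easy to spot-check numerically. Below is a minimal sketch that samples random points and mixing weights and verifies the chord inequality for the illustrative convex function f(x) = x²; the function choice and tolerance are assumptions for the demonstration, not part of the text.

```python
import random

def f(x):
    # An illustrative convex function; any convex f would pass the same test.
    return x * x

def chord_inequality_holds(f, x, y, theta, tol=1e-12):
    # Checks f(θx + (1-θ)y) <= θ f(x) + (1-θ) f(y), with a small
    # tolerance to absorb floating-point rounding.
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return lhs <= rhs + tol

random.seed(0)
checks = [
    chord_inequality_holds(f, random.uniform(-5, 5),
                           random.uniform(-5, 5),
                           random.uniform(0, 1))
    for _ in range(1000)
]
print(all(checks))  # True: x^2 passes the chord test at every sampled triple
```

A failed check at even one sampled triple would certify non-convexity; passing many random checks is of course only evidence, not a proof.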
Epigraph Characterization
There is an elegant connection between convex functions and convex sets through the notion of an epigraph. The epigraph of a function f : C → ℝ is the set of points lying on or above its graph:
epi f = {(x, t) ∈ ℝⁿ × ℝ : x ∈ C, f(x) ≤ t}.
A function is convex if and only if its epigraph is a convex set. This characterization provides a powerful link between the study of convex functions and the geometry of convex sets, allowing techniques from one domain to be applied in the other.
Jensen's Inequality
The definition of convexity extends naturally to convex combinations of more than two points. If f is convex, x₁, …, x_k ∈ C, and θ₁, …, θ_k ≥ 0 with θ₁ + ⋯ + θ_k = 1, then f(θ₁x₁ + ⋯ + θ_kx_k) ≤ θ₁f(x₁) + ⋯ + θ_kf(x_k).
This is Jensen's inequality in its finite form. It generalizes further to expectations: if X is a random variable taking values in the domain of a convex function f, then f(E[X]) ≤ E[f(X)].
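The finite form of Jensen's inequality can likewise be sampled numerically. This sketch uses f(x) = eˣ with random points and random normalized weights; the specific function, ranges, and trial count are illustrative assumptions.

```python
import math
import random

def jensen_holds(f, xs, thetas, tol=1e-12):
    # Checks f(θ1 x1 + ... + θk xk) <= θ1 f(x1) + ... + θk f(xk).
    lhs = f(sum(t * x for t, x in zip(thetas, xs)))
    rhs = sum(t * f(x) for t, x in zip(thetas, xs))
    return lhs <= rhs + tol

random.seed(1)
for _ in range(100):
    k = random.randint(2, 6)
    xs = [random.uniform(-2, 2) for _ in range(k)]
    raw = [random.uniform(0, 1) for _ in range(k)]
    total = sum(raw)
    thetas = [w / total for w in raw]  # nonnegative weights summing to 1
    assert jensen_holds(math.exp, xs, thetas)
print("Jensen's inequality held in all sampled trials")
```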
First-Order Conditions
When a convex function is differentiable, its convexity can be characterized in terms of its gradient. This first-order characterization provides both computational tools and geometric insight.
The First-Order Condition
Suppose f is differentiable on an open convex domain. Then f is convex if and only if f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y in the domain.
This inequality states that the first-order Taylor approximation of f at any point x is a global underestimator of f: the tangent hyperplane at x lies entirely below the graph.
Geometrically, if you stand at any point on the graph of a convex function and look along the tangent plane, the entire function lies above you. This supporting hyperplane property is fundamental to convex analysis.
Implications for Optimization
The first-order condition immediately implies that if ∇f(x*) = 0 at some point x*, then f(y) ≥ f(x*) + ∇f(x*)ᵀ(y − x*) = f(x*) for every y in the domain.
Therefore x* is a global minimizer: for convex functions, a vanishing gradient is not just necessary but sufficient for global optimality.
Monotonicity of the Gradient
The first-order condition can be rewritten in terms of gradient monotonicity.
A differentiable function f is convex if and only if its gradient is monotone: (∇f(x) − ∇f(y))ᵀ(x − y) ≥ 0 for all x, y in the domain.
This says that the angle between the difference of gradients and the difference of points is at most 90 degrees: the gradient never points "backward" along the direction of displacement.
Second-Order Conditions
For twice-differentiable functions, convexity can be characterized in terms of the Hessian matrix, providing a practical computational test.
The Second-Order Condition
Suppose f is twice differentiable on an open convex domain. Then f is convex if and only if its Hessian is positive semidefinite at every point: ∇²f(x) ⪰ 0 for all x in the domain.
The notation ∇²f(x) ⪰ 0 means that the Hessian matrix is positive semidefinite: zᵀ∇²f(x)z ≥ 0 for every vector z ∈ ℝⁿ.
In one dimension, this reduces to the familiar condition f″(x) ≥ 0: the function curves upward (or is flat) everywhere.
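In practice, positive semidefiniteness is tested via eigenvalues. The sketch below applies the second-order condition to quadratics, where the Hessian is the constant matrix P; the two example matrices are illustrative assumptions.

```python
import numpy as np

def is_psd(H, tol=1e-10):
    # A symmetric matrix is positive semidefinite iff its smallest
    # eigenvalue is nonnegative (up to a numerical tolerance).
    H = np.asarray(H, dtype=float)
    eigvals = np.linalg.eigvalsh((H + H.T) / 2)  # symmetrize, then eigenvalues
    return eigvals.min() >= -tol

# For f(x) = (1/2) x^T P x + q^T x + r, the Hessian is P everywhere,
# so f is convex exactly when P is positive semidefinite.
P_convex = np.array([[2.0, 0.5], [0.5, 1.0]])      # both eigenvalues positive
P_indefinite = np.array([[1.0, 0.0], [0.0, -1.0]]) # saddle: not convex
print(is_psd(P_convex))      # True
print(is_psd(P_indefinite))  # False
```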
Strict Convexity
If the Hessian is positive definite everywhere (∇²f(x) ≻ 0 for all x), then f is strictly convex. The converse fails: f(x) = x⁴ is strictly convex even though its second derivative vanishes at x = 0.
Strict convexity guarantees that if a minimizer exists, it is unique. The function has at most one point where the gradient vanishes.
Strong Convexity
A function f is strongly convex with parameter m > 0 if f(x) − (m/2)‖x‖² is convex. For twice-differentiable f, this is equivalent to ∇²f(x) ⪰ mI for all x.
Strong convexity is strictly stronger than strict convexity. It guarantees that the function grows at least quadratically away from its minimum, which has important implications for the convergence rate of optimization algorithms.
For strongly convex functions, gradient descent converges linearly (exponentially fast) to the unique minimizer, with the rate depending on the strong convexity parameter m and the smoothness of the function.
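The linear convergence claim can be observed directly. The sketch below runs gradient descent on a strongly convex quadratic f(x) = (1/2)xᵀPx (minimizer x* = 0) with the standard step size 1/L, where L is the largest Hessian eigenvalue; the matrix P, starting point, and iteration count are illustrative assumptions.

```python
import numpy as np

# Strongly convex quadratic: Hessian P ≻ 0, so m = λ_min(P) > 0.
P = np.array([[3.0, 1.0], [1.0, 2.0]])
L = np.linalg.eigvalsh(P).max()  # smoothness constant (largest eigenvalue)
step = 1.0 / L                   # a standard safe step size

x = np.array([5.0, -3.0])
errors = []
for _ in range(50):
    grad = P @ x                 # ∇f(x) = Px for f(x) = (1/2) xᵀPx
    x = x - step * grad
    errors.append(np.linalg.norm(x))  # distance to the minimizer x* = 0

# Linear convergence: each iteration shrinks the error by a roughly
# constant factor bounded away from 1.
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(max(ratios) < 1.0)  # True: geometric decay toward the unique minimizer
```

For this P the contraction factor settles near 1 − m/L, matching the dependence on the strong convexity parameter m described above.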
Examples of Convex Functions
Many familiar functions are convex. Recognizing convexity in applications is a crucial skill.
Linear and Affine Functions
Every affine function f(x) = aᵀx + b is both convex and concave; it satisfies the defining inequality with equality. In fact, the affine functions are the only functions that are simultaneously convex and concave.
Quadratic Functions
A quadratic function f(x) = (1/2)xᵀPx + qᵀx + r, with P symmetric, is convex if and only if P ⪰ 0.
The Hessian is ∇²f(x) = P for every x, so the second-order condition applies directly.
Norms
Every norm on ℝⁿ is convex. This follows from the triangle inequality together with absolute homogeneity: ‖θx + (1 − θ)y‖ ≤ θ‖x‖ + (1 − θ)‖y‖.
Exponential Function
The function f(x) = eˣ is convex on ℝ, since f″(x) = eˣ > 0.
The function f(x) = e^{ax} is likewise convex on ℝ for any real a, since f″(x) = a²e^{ax} ≥ 0.
Powers
The function f(x) = xᵖ is convex on (0, ∞) when p ≥ 1 or p ≤ 0, and concave when 0 ≤ p ≤ 1.
Negative Entropy
The function f(x) = x log x is convex on (0, ∞), since f″(x) = 1/x > 0; its negative, the entropy function, is concave.
Operations Preserving Convexity
Complex convex functions can often be built from simpler ones using operations that preserve convexity. Mastering these rules allows you to establish convexity without explicit calculation.
Nonnegative Weighted Sums
If f₁, …, f_k are convex and w₁, …, w_k ≥ 0, then the weighted sum w₁f₁ + ⋯ + w_kf_k is convex.
Pointwise Maximum
If f₁, …, f_k are convex, then their pointwise maximum f(x) = max{f₁(x), …, f_k(x)} is convex.
This rule explains why piecewise-linear convex functions arise naturally. The function f(x) = maxᵢ (aᵢᵀx + bᵢ), a pointwise maximum of affine functions, is convex, and every piecewise-linear convex function can be written in this form.
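The pointwise-maximum rule can be sampled numerically for a one-dimensional piecewise-linear example. The slopes and intercepts below are illustrative assumptions.

```python
import random

# f(x) = max_i (a_i x + b_i): a pointwise maximum of affine functions,
# hence convex by the rule above.
pieces = [(-2.0, 1.0), (0.5, 0.0), (3.0, -4.0)]  # (slope a_i, intercept b_i)

def f(x):
    return max(a * x + b for a, b in pieces)

random.seed(2)
ok = True
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    theta = random.uniform(0, 1)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    ok = ok and lhs <= rhs + 1e-9  # chord inequality, with rounding slack
print(ok)  # True
```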
Composition with Affine Functions
If f is convex, then g(x) = f(Ax + b) is convex for any matrix A and vector b of compatible dimensions.
Scalar Composition
If h is convex and nondecreasing and g is convex, then the composition f(x) = h(g(x)) is convex.
For example, if g is convex, then e^{g(x)} is convex, since the exponential is convex and nondecreasing.
Perspective
If f is convex, then its perspective g(x, t) = t f(x/t) is convex on its domain {(x, t) : x/t ∈ dom f, t > 0}.
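The perspective operation can also be spot-checked. For the illustrative choice f(x) = x², the perspective is the quadratic-over-linear function g(x, t) = x²/t on t > 0; the sampling ranges are assumptions for the demonstration.

```python
import random

def g(x, t):
    # Perspective of f(x) = x^2: g(x, t) = t * (x/t)^2 = x^2 / t, t > 0.
    return x * x / t

random.seed(3)
ok = True
for _ in range(1000):
    x1, t1 = random.uniform(-5, 5), random.uniform(0.1, 5)
    x2, t2 = random.uniform(-5, 5), random.uniform(0.1, 5)
    th = random.uniform(0, 1)
    # Joint chord inequality in the pair (x, t).
    lhs = g(th * x1 + (1 - th) * x2, th * t1 + (1 - th) * t2)
    rhs = th * g(x1, t1) + (1 - th) * g(x2, t2)
    ok = ok and lhs <= rhs + 1e-9
print(ok)  # True
```

Note the inequality is checked jointly in (x, t): convexity of the perspective is a statement about the pair, not about each variable separately.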
Connection to Optimization
The importance of convex functions in optimization cannot be overstated. Convexity transforms optimization from a generally intractable problem into a structured problem with strong theoretical guarantees.
Local Minima are Global Minima
For a convex function, every local minimum is a global minimum. If x* were a local minimum with f(y) < f(x*) for some y, then by convexity f(θy + (1 − θ)x*) ≤ θf(y) + (1 − θ)f(x*) < f(x*) for every θ ∈ (0, 1]. Taking θ small yields points arbitrarily close to x* with strictly smaller values, contradicting local minimality.
This property eliminates the major difficulty of nonconvex optimization: the possibility of getting trapped in a suboptimal local minimum. Optimization algorithms for convex functions are guaranteed to find the global optimum if they find any local optimum.
Optimality Conditions
For differentiable convex functions, the condition ∇f(x*) = 0 is both necessary and sufficient for x* to be a global minimizer, so finding a global minimum reduces to solving the equation ∇f(x) = 0.
For nondifferentiable convex functions, the optimality condition is 0 ∈ ∂f(x*),
where ∂f(x*) denotes the subdifferential of f at x*: the set of vectors g satisfying f(y) ≥ f(x*) + gᵀ(y − x*) for all y.
Efficient Algorithms
Convex optimization problems can be solved efficiently using a variety of algorithms. Gradient descent, with appropriate step sizes, converges to the global minimum. For smooth strongly convex functions, convergence is linear. Interior-point methods solve convex problems in polynomial time.
The entire field of convex optimization—which underpins machine learning, signal processing, control theory, and operations research—exists because convex functions have these favorable properties.
Summary
A function f defined on a convex set C is convex if the line segment between any two points on its graph lies on or above the graph: f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) for all x, y ∈ C and θ ∈ [0, 1].
For differentiable functions, convexity is equivalent to the first-order condition f(y) ≥ f(x) + ∇f(x)ᵀ(y − x); for twice-differentiable functions, it is equivalent to the second-order condition ∇²f(x) ⪰ 0.
Convexity is preserved by nonnegative weighted sums, pointwise maxima, and composition with affine functions. These rules allow complex convex functions to be built from simpler pieces.
The fundamental importance of convex functions lies in optimization: every local minimum of a convex function is a global minimum, and the optimality condition ∇f(x*) = 0 (or 0 ∈ ∂f(x*) in the nondifferentiable case) completely characterizes the solutions.