Convex Functions
subtopic: convexity
Introduction
Imagine stretching a rubber band between any two points on a curve. If the rubber band always lies above or on the curve, never dipping below it, that curve represents a convex function. This simple geometric intuition captures one of the most powerful concepts in mathematical optimization.
Convex functions possess remarkable properties that make them the cornerstone of optimization theory. When minimizing a convex function, every local minimum is automatically a global minimum. There are no deceptive valleys or misleading plateaus—the landscape guides us inevitably toward the optimal solution.
This property has profound practical implications. Machine learning algorithms, economic models, signal processing techniques, and countless engineering applications rely on convexity to guarantee that optimization procedures find the best possible solutions efficiently. Understanding convex functions is therefore essential for anyone working with optimization in any quantitative field.
In this page, we develop the theory of convex functions rigorously, starting from the definition and building toward the powerful characterization theorems that make convexity so useful in practice.
Convex Sets
Before defining convex functions, we must understand convex sets, as the domain of a convex function must itself be convex.
Definition of a Convex Set
A set C ⊆ ℝⁿ is convex if, for any two points x, y ∈ C and any θ ∈ [0, 1], the point θx + (1 − θ)y also lies in C.
The expression θx + (1 − θ)y is called a convex combination of x and y. As θ ranges over [0, 1], it traces out the line segment joining x and y, so convexity of C means that C contains the segment between any two of its points.
Examples of Convex Sets
The entire space ℝⁿ is convex, as is any line, ray, or line segment. Every halfspace {x : aᵀx ≤ b} is convex.
The intersection of any collection of convex sets is convex. This implies that polyhedra, defined as intersections of halfspaces, are convex. Balls and ellipsoids are also convex sets.
Affine sets (solution sets of systems of linear equations) are convex. The nonnegative orthant ℝⁿ₊ = {x ∈ ℝⁿ : xᵢ ≥ 0 for all i} is convex as well.
Non-Convex Sets
Not all sets are convex. A set with a hole in it fails to be convex because line segments can pass through the hole. The union of two disjoint intervals is non-convex. Any set that is not "filled in" or has indentations will typically fail the convexity test.
Definition of Convex Functions
The Fundamental Definition
A function f : C → ℝ defined on a convex set C ⊆ ℝⁿ is convex if for all x, y ∈ C and all θ ∈ [0, 1], f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).
This inequality states that the function value at any point on the line segment between x and y is at most the corresponding weighted average of f(x) and f(y). Geometrically, the chord joining (x, f(x)) and (y, f(y)) lies on or above the graph.
A function f is strictly convex if the inequality holds strictly whenever x ≠ y and θ ∈ (0, 1).
A function f is concave if −f is convex; equivalently, the defining inequality is reversed.
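The defining inequality is easy to spot-check numerically. Below is a minimal sketch that samples random points and mixing weights and verifies the chord inequality for the illustrative convex function f(x) = x²; the function choice and tolerance are assumptions for the demonstration, not part of the text.

```python
import random

def f(x):
    # An illustrative convex function; any convex f would pass the same test.
    return x * x

def chord_inequality_holds(f, x, y, theta, tol=1e-12):
    # Checks f(θx + (1-θ)y) <= θ f(x) + (1-θ) f(y), with a small
    # tolerance to absorb floating-point rounding.
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return lhs <= rhs + tol

random.seed(0)
checks = [
    chord_inequality_holds(f, random.uniform(-5, 5),
                           random.uniform(-5, 5),
                           random.uniform(0, 1))
    for _ in range(1000)
]
print(all(checks))  # True: x^2 passes the chord test at every sampled triple
```

A failed check at even one sampled triple would certify non-convexity; passing many random checks is of course only evidence, not a proof.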
Epigraph Characterization
There is an elegant connection between convex functions and convex sets through the notion of an epigraph. The epigraph of a function f : C → ℝ is the set of points lying on or above its graph:
epi f = {(x, t) ∈ ℝⁿ × ℝ : x ∈ C, f(x) ≤ t}.
A function is convex if and only if its epigraph is a convex set. This characterization provides a powerful link between the study of convex functions and the geometry of convex sets, allowing techniques from one domain to be applied in the other.
Jensen's Inequality
The definition of convexity extends naturally to convex combinations of more than two points. If f is convex, x₁, …, x_k ∈ C, and θ₁, …, θ_k ≥ 0 with θ₁ + ⋯ + θ_k = 1, then f(θ₁x₁ + ⋯ + θ_kx_k) ≤ θ₁f(x₁) + ⋯ + θ_kf(x_k).
This is Jensen's inequality in its finite form. It generalizes further to expectations: if X is a random variable taking values in the domain of a convex function f, then f(E[X]) ≤ E[f(X)].
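The finite form of Jensen's inequality can likewise be sampled numerically. This sketch uses f(x) = eˣ with random points and random normalized weights; the specific function, ranges, and trial count are illustrative assumptions.

```python
import math
import random

def jensen_holds(f, xs, thetas, tol=1e-12):
    # Checks f(θ1 x1 + ... + θk xk) <= θ1 f(x1) + ... + θk f(xk).
    lhs = f(sum(t * x for t, x in zip(thetas, xs)))
    rhs = sum(t * f(x) for t, x in zip(thetas, xs))
    return lhs <= rhs + tol

random.seed(1)
for _ in range(100):
    k = random.randint(2, 6)
    xs = [random.uniform(-2, 2) for _ in range(k)]
    raw = [random.uniform(0, 1) for _ in range(k)]
    total = sum(raw)
    thetas = [w / total for w in raw]  # nonnegative weights summing to 1
    assert jensen_holds(math.exp, xs, thetas)
print("Jensen's inequality held in all sampled trials")
```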
First-Order Conditions
When a convex function is differentiable, its convexity can be characterized in terms of its gradient. This first-order characterization provides both computational tools and geometric insight.
The First-Order Condition
Suppose f is differentiable on an open convex domain. Then f is convex if and only if f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y in the domain.
This inequality states that the first-order Taylor approximation of f at any point x is a global underestimator of f: the tangent hyperplane at x lies entirely below the graph.
Geometrically, if you stand at any point on the graph of a convex function and look along the tangent plane, the entire function lies above you. This supporting hyperplane property is fundamental to convex analysis.
Implications for Optimization
The first-order condition immediately implies that if ∇f(x*) = 0 at some point x*, then f(y) ≥ f(x*) + ∇f(x*)ᵀ(y − x*) = f(x*) for every y in the domain.
Therefore x* is a global minimizer: for convex functions, a vanishing gradient is not just necessary but sufficient for global optimality.
Monotonicity of the Gradient
The first-order condition can be rewritten in terms of gradient monotonicity.
A differentiable function f is convex if and only if its gradient is monotone: (∇f(x) − ∇f(y))ᵀ(x − y) ≥ 0 for all x, y in the domain.
This says that the angle between the difference of gradients and the difference of points is at most 90 degrees: the gradient never points "backward" along the direction of displacement.
Second-Order Conditions
For twice-differentiable functions, convexity can be characterized in terms of the Hessian matrix, providing a practical computational test.
The Second-Order Condition
Suppose f is twice differentiable on an open convex domain. Then f is convex if and only if its Hessian is positive semidefinite at every point: ∇²f(x) ⪰ 0 for all x in the domain.
The notation ∇²f(x) ⪰ 0 means that the Hessian matrix is positive semidefinite: zᵀ∇²f(x)z ≥ 0 for every vector z ∈ ℝⁿ.
In one dimension, this reduces to the familiar condition f″(x) ≥ 0: the function curves upward (or is flat) everywhere.
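In practice, positive semidefiniteness is tested via eigenvalues. The sketch below applies the second-order condition to quadratics, where the Hessian is the constant matrix P; the two example matrices are illustrative assumptions.

```python
import numpy as np

def is_psd(H, tol=1e-10):
    # A symmetric matrix is positive semidefinite iff its smallest
    # eigenvalue is nonnegative (up to a numerical tolerance).
    H = np.asarray(H, dtype=float)
    eigvals = np.linalg.eigvalsh((H + H.T) / 2)  # symmetrize, then eigenvalues
    return eigvals.min() >= -tol

# For f(x) = (1/2) x^T P x + q^T x + r, the Hessian is P everywhere,
# so f is convex exactly when P is positive semidefinite.
P_convex = np.array([[2.0, 0.5], [0.5, 1.0]])      # both eigenvalues positive
P_indefinite = np.array([[1.0, 0.0], [0.0, -1.0]]) # saddle: not convex
print(is_psd(P_convex))      # True
print(is_psd(P_indefinite))  # False
```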
Strict Convexity
If the Hessian is positive definite everywhere (∇²f(x) ≻ 0 for all x), then f is strictly convex. The converse fails: f(x) = x⁴ is strictly convex even though its second derivative vanishes at x = 0.
Strict convexity guarantees that if a minimizer exists, it is unique. The function has at most one point where the gradient vanishes.
Strong Convexity
A function f is strongly convex with parameter m > 0 if f(x) − (m/2)‖x‖² is convex. For twice-differentiable f, this is equivalent to ∇²f(x) ⪰ mI for all x.
Strong convexity is strictly stronger than strict convexity. It guarantees that the function grows at least quadratically away from its minimum, which has important implications for the convergence rate of optimization algorithms.
For strongly convex functions, gradient descent converges linearly (exponentially fast) to the unique minimizer, with the rate depending on the strong convexity parameter m and the smoothness of the function.
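The linear convergence claim can be observed directly. The sketch below runs gradient descent on a strongly convex quadratic f(x) = (1/2)xᵀPx (minimizer x* = 0) with the standard step size 1/L, where L is the largest Hessian eigenvalue; the matrix P, starting point, and iteration count are illustrative assumptions.

```python
import numpy as np

# Strongly convex quadratic: Hessian P ≻ 0, so m = λ_min(P) > 0.
P = np.array([[3.0, 1.0], [1.0, 2.0]])
L = np.linalg.eigvalsh(P).max()  # smoothness constant (largest eigenvalue)
step = 1.0 / L                   # a standard safe step size

x = np.array([5.0, -3.0])
errors = []
for _ in range(50):
    grad = P @ x                 # ∇f(x) = Px for f(x) = (1/2) xᵀPx
    x = x - step * grad
    errors.append(np.linalg.norm(x))  # distance to the minimizer x* = 0

# Linear convergence: each iteration shrinks the error by a roughly
# constant factor bounded away from 1.
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(max(ratios) < 1.0)  # True: geometric decay toward the unique minimizer
```

For this P the contraction factor settles near 1 − m/L, matching the dependence on the strong convexity parameter m described above.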
Examples of Convex Functions
Many familiar functions are convex. Recognizing convexity in applications is a crucial skill.
Linear and Affine Functions
Every affine function f(x) = aᵀx + b is both convex and concave; it satisfies the defining inequality with equality. In fact, the affine functions are the only functions that are simultaneously convex and concave.
Quadratic Functions
A quadratic function f(x) = (1/2)xᵀPx + qᵀx + r, with P symmetric, is convex if and only if P ⪰ 0.
The Hessian is ∇²f(x) = P for every x, so the second-order condition applies directly.
Norms
Every norm on ℝⁿ is convex. This follows from the triangle inequality together with absolute homogeneity: ‖θx + (1 − θ)y‖ ≤ θ‖x‖ + (1 − θ)‖y‖.
Exponential Function
The function f(x) = eˣ is convex on ℝ, since f″(x) = eˣ > 0.
The function f(x) = e^{ax} is likewise convex on ℝ for any real a, since f″(x) = a²e^{ax} ≥ 0.
Powers
The function f(x) = xᵖ is convex on (0, ∞) when p ≥ 1 or p ≤ 0, and concave when 0 ≤ p ≤ 1.
Negative Entropy
The function f(x) = x log x is convex on (0, ∞), since f″(x) = 1/x > 0; its negative, the entropy function, is concave.
Operations Preserving Convexity
Complex convex functions can often be built from simpler ones using operations that preserve convexity. Mastering these rules allows you to establish convexity without explicit calculation.
Nonnegative Weighted Sums
If f₁, …, f_k are convex and w₁, …, w_k ≥ 0, then the weighted sum w₁f₁ + ⋯ + w_kf_k is convex.
Pointwise Maximum
If f₁, …, f_k are convex, then their pointwise maximum f(x) = max{f₁(x), …, f_k(x)} is convex.
This rule explains why piecewise-linear convex functions arise naturally. The function f(x) = maxᵢ (aᵢᵀx + bᵢ), a pointwise maximum of affine functions, is convex, and every piecewise-linear convex function can be written in this form.
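The pointwise-maximum rule can be sampled numerically for a one-dimensional piecewise-linear example. The slopes and intercepts below are illustrative assumptions.

```python
import random

# f(x) = max_i (a_i x + b_i): a pointwise maximum of affine functions,
# hence convex by the rule above.
pieces = [(-2.0, 1.0), (0.5, 0.0), (3.0, -4.0)]  # (slope a_i, intercept b_i)

def f(x):
    return max(a * x + b for a, b in pieces)

random.seed(2)
ok = True
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    theta = random.uniform(0, 1)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    ok = ok and lhs <= rhs + 1e-9  # chord inequality, with rounding slack
print(ok)  # True
```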
Composition with Affine Functions
If f is convex, then g(x) = f(Ax + b) is convex for any matrix A and vector b of compatible dimensions.
Scalar Composition
If h is convex and nondecreasing and g is convex, then the composition f(x) = h(g(x)) is convex.
For example, if g is convex, then e^{g(x)} is convex, since the exponential is convex and nondecreasing.
Perspective
If f is convex, then its perspective g(x, t) = t f(x/t) is convex on its domain {(x, t) : x/t ∈ dom f, t > 0}.
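The perspective operation can also be spot-checked. For the illustrative choice f(x) = x², the perspective is the quadratic-over-linear function g(x, t) = x²/t on t > 0; the sampling ranges are assumptions for the demonstration.

```python
import random

def g(x, t):
    # Perspective of f(x) = x^2: g(x, t) = t * (x/t)^2 = x^2 / t, t > 0.
    return x * x / t

random.seed(3)
ok = True
for _ in range(1000):
    x1, t1 = random.uniform(-5, 5), random.uniform(0.1, 5)
    x2, t2 = random.uniform(-5, 5), random.uniform(0.1, 5)
    th = random.uniform(0, 1)
    # Joint chord inequality in the pair (x, t).
    lhs = g(th * x1 + (1 - th) * x2, th * t1 + (1 - th) * t2)
    rhs = th * g(x1, t1) + (1 - th) * g(x2, t2)
    ok = ok and lhs <= rhs + 1e-9
print(ok)  # True
```

Note the inequality is checked jointly in (x, t): convexity of the perspective is a statement about the pair, not about each variable separately.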
Connection to Optimization
The importance of convex functions in optimization cannot be overstated. Convexity transforms optimization from a generally intractable problem into a structured problem with strong theoretical guarantees.
Local Minima are Global Minima
For a convex function, every local minimum is a global minimum. If x* were a local minimum with f(y) < f(x*) for some y, then by convexity f(θy + (1 − θ)x*) ≤ θf(y) + (1 − θ)f(x*) < f(x*) for every θ ∈ (0, 1]. Taking θ small yields points arbitrarily close to x* with strictly smaller values, contradicting local minimality.
This property eliminates the major difficulty of nonconvex optimization: the possibility of getting trapped in a suboptimal local minimum. Optimization algorithms for convex functions are guaranteed to find the global optimum if they find any local optimum.
Optimality Conditions
For differentiable convex functions, the condition ∇f(x*) = 0 is both necessary and sufficient for x* to be a global minimizer, so finding a global minimum reduces to solving the equation ∇f(x) = 0.
For nondifferentiable convex functions, the optimality condition is 0 ∈ ∂f(x*),
where ∂f(x*) denotes the subdifferential of f at x*: the set of vectors g satisfying f(y) ≥ f(x*) + gᵀ(y − x*) for all y.
Efficient Algorithms
Convex optimization problems can be solved efficiently using a variety of algorithms. Gradient descent, with appropriate step sizes, converges to the global minimum. For smooth strongly convex functions, convergence is linear. Interior-point methods solve convex problems in polynomial time.
The entire field of convex optimization—which underpins machine learning, signal processing, control theory, and operations research—exists because convex functions have these favorable properties.
Summary
A function f defined on a convex set C is convex if the line segment between any two points on its graph lies on or above the graph: f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) for all x, y ∈ C and θ ∈ [0, 1].
For differentiable functions, convexity is equivalent to the first-order condition f(y) ≥ f(x) + ∇f(x)ᵀ(y − x); for twice-differentiable functions, it is equivalent to the second-order condition ∇²f(x) ⪰ 0.
Convexity is preserved by nonnegative weighted sums, pointwise maxima, and composition with affine functions. These rules allow complex convex functions to be built from simpler pieces.
The fundamental importance of convex functions lies in optimization: every local minimum of a convex function is a global minimum, and the optimality condition ∇f(x*) = 0 (or 0 ∈ ∂f(x*) in the nondifferentiable case) completely characterizes the solutions.