Trees and Spanning Trees
Introduction
Trees are among the most important structures in mathematics and computer science. A tree is a connected graph with no cycles, a structure that appears in family genealogies, organizational hierarchies, file systems, decision processes, and countless algorithms.
The simplicity of trees belies their power. Because they have no cycles, trees have unique paths between vertices, making them ideal for organizing data and solving optimization problems. Every connected graph contains a spanning tree that preserves connectivity while eliminating redundant edges.
Trees form the backbone of data structures like binary search trees, heaps, and B-trees. Algorithms from depth-first search to minimum spanning trees exploit the special properties of trees. Understanding trees is essential for both theoretical computer science and practical programming.
This page develops the theory of trees from basic definitions through characterization theorems, counting formulas, and algorithms for finding minimum spanning trees.
Definition and Basic Properties
Definition
A tree is a connected graph with no cycles. Equivalently, a tree is a connected acyclic graph. A forest is a graph with no cycles (a disjoint union of trees).
For a graph
G is connected and has no cyclesG is connected and has exactlyn−1 edgesG has no cycles and has exactlyn−1 edgesThere is exactly one path between any two vertices
G is connected, but removing any edge disconnects itG has no cycles, but adding any edge creates exactly one cycle
Leaves
A leaf is a vertex of degree
Removing a leaf from a tree leaves a smaller tree (or a single vertex). This recursive structure is often exploited in proofs and algorithms on trees.
The Edge-Vertex Relationship
A tree with
The sum of degrees in any tree is
Rooted Trees
Definition
A rooted tree is a tree with one designated vertex called the root. The root imposes a natural direction: edges point away from the root toward the leaves. This creates parent-child relationships between adjacent vertices.
Every vertex except the root has exactly one parent (the neighbor closer to the root). The root has no parent. Vertices with no children are leaves. The depth of a vertex is its distance from the root.
Binary Trees
A binary tree is a rooted tree where each vertex has at most two children, designated as left and right. Binary trees are fundamental in computer science for implementing search structures, expression parsing, and divide-and-conquer algorithms.
A full binary tree has every non-leaf vertex with exactly two children. A complete binary tree has all levels fully filled except possibly the last, which is filled from left to right.
Tree Traversals
The three main traversal orders for binary trees are preorder (root, left, right), inorder (left, root, right), and postorder (left, right, root). These traversals arise naturally in applications: inorder traversal of a binary search tree visits nodes in sorted order.
Spanning Trees
Definition
A spanning tree of a connected graph
Every connected graph has at least one spanning tree. If the graph is already a tree, the spanning tree is the graph itself. If the graph has cycles, we can obtain a spanning tree by removing edges from cycles until none remain.
Counting Spanning Trees
The number of spanning trees of a graph is given by Kirchhoff's theorem (the matrix-tree theorem). For the complete graph
For example,
Minimum Spanning Trees
Definition
Given a connected weighted graph
The MST problem arises in network design: connecting cities with minimum total cable length, designing circuit boards with minimum wire, or clustering data points.
The Cut Property
A cut in a graph is a partition of vertices into two non-empty sets. The cut property states: for any cut, if an edge e has the minimum weight among all edges crossing the cut, then e belongs to some MST.
This property is the foundation of both Kruskal and Prim algorithms. It guarantees that greedy choices (taking the minimum-weight edge satisfying certain conditions) lead to an optimal solution.
Kruskal Algorithm
Kruskal's algorithm builds the MST by processing edges in order of increasing weight. Start with no edges. For each edge in sorted order, add it to the tree if it does not create a cycle.
The algorithm maintains a forest that gradually becomes connected. Using a union-find data structure to track connected components, cycle detection takes nearly constant time per edge. The overall complexity is
Kruskal's algorithm is particularly efficient for sparse graphs where the number of edges is close to the number of vertices.
Prim Algorithm
Prim's algorithm grows a single tree from a starting vertex. At each step, add the minimum-weight edge that connects a vertex in the tree to a vertex outside the tree.
Using a priority queue to track the minimum-weight edge to each vertex not yet in the tree, Prim's algorithm runs in
Prim's algorithm is often preferred for dense graphs. It naturally produces the tree by growing from a single component, which can be useful when the starting vertex matters.
Prufer Sequences
A Prüfer sequence provides a bijection between labeled trees on
To encode a tree as a Prüfer sequence: repeatedly remove the leaf with the smallest label and record its neighbor. Stop when two vertices remain.
To decode a Prüfer sequence back to a tree: the sequence determines which edges to add. A vertex appears in the sequence exactly
Applications
Network Design
When designing communication networks, power grids, or transportation systems, we often want to connect all nodes with minimum total cost. The MST provides the optimal solution when direct connections are possible between any pair of nodes.
Clustering
Single-linkage clustering uses MSTs: build the MST of a complete graph where edge weights are distances between data points, then remove the k-1 heaviest edges to obtain k clusters.
Approximation Algorithms
The MST provides a
Data Structures
Binary search trees, AVL trees, red-black trees, B-trees, and heaps are all tree-based data structures. They exploit the unique path property of trees to enable efficient search, insertion, and deletion operations.
Tree Isomorphism
Two trees are isomorphic if there is a bijection between their vertices that preserves adjacency. For rooted trees, the isomorphism must also preserve the root.
Testing tree isomorphism can be done in linear time using canonical forms. Each subtree is assigned a code, and trees with the same code are isomorphic. This is much faster than general graph isomorphism, which has unknown complexity.
Special Types of Trees
A path is a tree where every vertex has degree at most
A balanced binary tree has all leaves at the same depth or within one level. Balanced trees are crucial for efficient data structures because they guarantee
Summary
A tree is a connected acyclic graph. Trees with
Rooted trees have a designated root vertex, inducing parent-child relationships. Binary trees are rooted trees where each vertex has at most two children. Tree traversals (preorder, inorder, postorder) visit vertices in systematic orders.
A spanning tree of a connected graph includes all vertices with the minimum number of edges. The number of labeled trees on
Minimum spanning trees minimize total edge weight among all spanning trees. Kruskal's algorithm adds edges in order of weight, avoiding cycles. Prim's algorithm grows a tree from a starting vertex by adding minimum-weight edges to new vertices.
Trees are fundamental structures in both mathematics and computer science. Their absence of cycles creates unique paths between vertices, enabling efficient algorithms for search, sorting, and optimization problems.
The cut property guarantees that greedy algorithms find optimal minimum spanning trees. Both Kruskal's and Prim's algorithms run in time
Understanding trees is essential for algorithm design, data structure implementation, and solving optimization problems in networks, clustering, and approximation algorithms.
The Matrix–Tree theorem connects spanning tree enumeration to linear algebra: the number of spanning trees equals any cofactor of the Laplacian matrix. This provides both a computational method and deep theoretical connections.
Tree decomposition and treewidth measure how tree-like a graph is. Many NP-hard problems become tractable on graphs with bounded treewidth, making tree structure a key concept in computational complexity.