Corca

Collaborative Math Editor

Trees and Spanning Trees

Introduction

Trees are among the most important structures in mathematics and computer science. A tree is a connected graph with no cycles, a structure that appears in family genealogies, organizational hierarchies, file systems, decision processes, and countless algorithms.

The simplicity of trees belies their power. Because they have no cycles, trees have unique paths between vertices, making them ideal for organizing data and solving optimization problems. Every connected graph contains a spanning tree that preserves connectivity while eliminating redundant edges.

Trees form the backbone of data structures like binary search trees, heaps, and B-trees. Algorithms from depth-first search to minimum spanning trees exploit the special properties of trees. Understanding trees is essential for both theoretical computer science and practical programming.

This page develops the theory of trees from basic definitions through characterization theorems, counting formulas, and algorithms for finding minimum spanning trees.

Definition and Basic Properties

Definition

A tree is a connected graph with no cycles, that is, a connected acyclic graph. A forest is a graph with no cycles; each connected component of a forest is a tree, so a forest is a disjoint union of trees.

For a graph G with n vertices and m edges, the following are equivalent characterizations of a tree:

G is connected and has no cycles
G is connected and has exactly n−1 edges
G has no cycles and has exactly n−1 edges
There is exactly one path between any two vertices
G is connected, but removing any edge disconnects it
G has no cycles, but adding an edge between any two non-adjacent vertices creates exactly one cycle
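
As a rough illustration, the second characterization (connected with exactly n−1 edges) gives a simple test. The sketch below assumes vertices are labeled 0 to n−1 and edges are given as pairs; the function name `is_tree` is hypothetical.

```python
from collections import deque

def is_tree(n, edges):
    """Return True if the undirected graph on vertices 0..n-1 is a tree."""
    # A tree on n vertices must have exactly n-1 edges.
    if n == 0 or len(edges) != n - 1:
        return False
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Breadth-first search from vertex 0 to test connectivity.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n
```

Checking the edge count first means the connectivity search only runs on graphs that could still be trees.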

Leaves

A leaf is a vertex of degree 1, that is, a vertex incident to exactly one edge. Every tree with at least two vertices has at least two leaves. This can be proven by considering a longest path in the tree: its endpoints must be leaves, since an endpoint with a second neighbor would either extend the path or close a cycle.
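
The longest-path argument can be checked on small examples. The sketch below uses two breadth-first searches, a standard way to find a longest path in a tree; the adjacency-dictionary input format and the function names are assumptions made for illustration.

```python
from collections import deque

def farthest(adj, start):
    """BFS from start; return (a vertex at maximum distance, parent map)."""
    parent = {start: None}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()  # BFS pops vertices in order of distance
        for v in adj[last]:
            if v not in parent:
                parent[v] = last
                queue.append(v)
    return last, parent

def longest_path(adj):
    """Return the vertices of a longest path in a tree given as {v: [neighbors]}."""
    a, _ = farthest(adj, next(iter(adj)))
    b, parent = farthest(adj, a)
    path = []
    while b is not None:
        path.append(b)
        b = parent[b]
    return path

tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
path = longest_path(tree)
# Both endpoints of the path have degree 1, so they are leaves.
ends_are_leaves = len(tree[path[0]]) == 1 and len(tree[path[-1]]) == 1
```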

Removing a leaf from a tree leaves a smaller tree (or a single vertex). This recursive structure is often exploited in proofs and algorithms on trees.

The Edge-Vertex Relationship

A tree with n vertices has exactly n−1 edges. This fundamental property follows by induction on n: a single vertex has no edges, and any tree with n≥2 vertices has a leaf. Removing that leaf leaves a tree with n−1 vertices, which has n−2 edges by the induction hypothesis, and restoring the leaf adds back exactly one edge.

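The sum of degrees in any tree is 2(n−1) = 2n−2, because each of the n−1 edges contributes 2 to the sum. In a tree with n≥2 every vertex has degree at least 1. If at most one vertex had degree 1, the other n−1 vertices would each have degree at least 2, and the sum would be at least 1 + 2(n−1) = 2n−1, which exceeds 2n−2. Hence there must be at least two vertices of degree 1.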

Rooted Trees

Definition

A rooted tree is a tree with one designated vertex called the root. The root imposes a natural direction: edges point away from the root toward the leaves. This creates parent-child relationships between adjacent vertices.

Every vertex except the root has exactly one parent (the neighbor closer to the root). The root has no parent. Vertices with no children are leaves. The depth of a vertex is its distance from the root.
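
A minimal sketch of how a choice of root induces parents and depths, assuming the tree is given as an adjacency dictionary (the name `root_tree` is hypothetical):

```python
from collections import deque

def root_tree(adj, root):
    """Return (parent, depth) dictionaries for the tree rooted at root."""
    parent = {root: None}  # the root has no parent
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:  # v is a child of u
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent, depth

tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
parent, depth = root_tree(tree, 1)
```

Choosing a different root on the same tree changes the parent and depth of every vertex, but not the underlying edges.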

Binary Trees

A binary tree is a rooted tree where each vertex has at most two children, designated as left and right. Binary trees are fundamental in computer science for implementing search structures, expression parsing, and divide-and-conquer algorithms.

In a full binary tree, every non-leaf vertex has exactly two children. In a complete binary tree, every level is completely filled except possibly the last, which is filled from left to right.

Tree Traversals

The three main traversal orders for binary trees are preorder (root, left, right), inorder (left, root, right), and postorder (left, right, root). These traversals arise naturally in applications: inorder traversal of a binary search tree visits nodes in sorted order.
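
The three orders differ only in where the root is visited relative to its subtrees. The sketch below assumes a small hypothetical `Node` class with `key`, `left`, and `right` fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def preorder(node):
    """Root, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)

def inorder(node):
    """Left subtree, then root, then right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

def postorder(node):
    """Left subtree, then right subtree, then root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]

# A small binary search tree with root 4.
bst = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(preorder(bst))   # [4, 2, 1, 3, 6, 5, 7]
print(inorder(bst))    # [1, 2, 3, 4, 5, 6, 7]
print(postorder(bst))  # [1, 3, 2, 5, 7, 6, 4]
```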

Spanning Trees

Definition

A spanning tree of a connected graph G is a subgraph that is a tree and includes all vertices of G. A spanning tree preserves connectivity while using the minimum number of edges: exactly n−1 edges for a graph with n vertices.

Every connected graph has at least one spanning tree. If the graph is already a tree, the spanning tree is the graph itself. If the graph has cycles, we can obtain a spanning tree by removing edges from cycles until none remain.
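
In practice a spanning tree is usually built up rather than carved out. The sketch below does not remove cycle edges as described above; instead it runs a graph search from a start vertex and keeps the edge by which each vertex is first reached. The adjacency-dictionary format and the name `spanning_tree` are assumptions.

```python
def spanning_tree(adj, start):
    """Return the edges of a spanning tree of the connected graph adj."""
    visited = {start}
    tree_edges = []
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in visited:
                # First time v is reached: the edge (u, v) joins the tree.
                visited.add(v)
                tree_edges.append((u, v))
                stack.append(v)
    return tree_edges

# A 4-cycle: any spanning tree drops exactly one of its four edges.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
edges = spanning_tree(cycle, 0)
```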

Counting Spanning Trees

The number of spanning trees of a graph is given by Kirchhoff's theorem (the matrix-tree theorem). For the complete graph K_n, Cayley's formula states that the number of labeled spanning trees is:

$$\tau(K_n) = n^{n-2}$$

For example, K_3 (a triangle) has 3 spanning trees, K_4 has 16 spanning trees, and K_5 has 125 spanning trees. This formula has numerous proofs, including bijective proofs using Prüfer sequences.
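
The counts quoted for small complete graphs can be reproduced with the matrix-tree theorem: delete one row and the matching column of the Laplacian and take the determinant. The sketch below uses exact rational arithmetic and plain Gaussian elimination; the function name and the edge-list input are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def count_spanning_trees(n, edges):
    """Count spanning trees of a graph on vertices 0..n-1 via a Laplacian cofactor."""
    # Laplacian: degrees on the diagonal, -1 for each edge off the diagonal.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then compute the determinant.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap flips the sign
        det *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= factor * m[c][k]
    return int(det)

def complete_graph(n):
    """Edge list of K_n on vertices 0..n-1."""
    return list(combinations(range(n), 2))

for n in (3, 4, 5):
    print(n, count_spanning_trees(n, complete_graph(n)))  # 3, 16, 125
```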

Minimum Spanning Trees

Definition

Given a connected weighted graph G where each edge e has a weight w(e), a minimum spanning tree (MST) is a spanning tree with the smallest possible total edge weight:

$$\mathrm{weight}(T) = \sum_{e \in T} w(e)$$

The MST problem arises in network design: connecting cities with minimum total cable length, designing circuit boards with minimum wire, or clustering data points.

The Cut Property

A cut in a graph is a partition of vertices into two non-empty sets. The cut property states: for any cut, if an edge e has the minimum weight among all edges crossing the cut, then e belongs to some MST.

This property is the foundation of both Kruskal's and Prim's algorithms. It guarantees that greedy choices (taking the minimum-weight edge satisfying certain conditions) lead to an optimal solution.

Kruskal's Algorithm

Kruskal's algorithm builds the MST by processing edges in order of increasing weight. Start with no edges. For each edge in sorted order, add it to the tree if it does not create a cycle.

The algorithm maintains a forest that gradually becomes connected. Using a union-find data structure to track connected components, each cycle check takes nearly constant amortized time. The overall complexity is O(m log m) for sorting plus O(m α(n)) for the union-find operations, where α is the very slowly growing inverse Ackermann function.

Kruskal's algorithm is particularly efficient for sparse graphs where the number of edges is close to the number of vertices.
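
A minimal sketch of Kruskal's algorithm with a union-find structure, assuming vertices 0 to n−1 and edges given as (weight, u, v) tuples. The function name and input format are illustrative choices, not a fixed interface.

```python
def kruskal(n, edges):
    """Return (total weight, tree edges) of an MST; edges are (weight, u, v)."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the components of a and b; return False if already joined.
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]
        return True

    total = 0
    tree = []
    for w, u, v in sorted(edges):  # increasing weight
        if union(u, v):            # skip edges that would close a cycle
            tree.append((u, v, w))
            total += w
    return total, tree

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))  # (6, [(0, 1, 1), (1, 3, 2), (1, 2, 3)])
```

If the input graph is disconnected, this sketch returns a minimum spanning forest with fewer than n−1 edges.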

Prim's Algorithm

Prim's algorithm grows a single tree from a starting vertex. At each step, add the minimum-weight edge that connects a vertex in the tree to a vertex outside the tree.

Using a priority queue to track the minimum-weight edge to each vertex not yet in the tree, Prim's algorithm runs in O(m log n) time with a binary heap, or O(m + n log n) with a Fibonacci heap.

Prim's algorithm is often preferred for dense graphs. It naturally produces the tree by growing from a single component, which can be useful when the starting vertex matters.
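
A sketch of Prim's algorithm using Python's binary heap. This is the "lazy" variant, which keeps candidate edges in the heap and discards stale ones when popped, rather than one key per vertex; it still runs in O(m log n). The adjacency format {u: [(weight, v), ...]} and the function name are assumptions.

```python
import heapq

def prim(adj, start):
    """Return (total weight, tree edges) of an MST of a connected graph."""
    in_tree = {start}
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    total = 0
    tree = []
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was reached by a cheaper edge
        in_tree.add(v)
        tree.append((u, v, w))
        total += w
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return total, tree

graph = {
    0: [(1, 1), (4, 2)],
    1: [(1, 0), (3, 2), (2, 3)],
    2: [(4, 0), (3, 1), (5, 3)],
    3: [(2, 1), (5, 2)],
}
print(prim(graph, 0))  # (6, [(0, 1, 1), (1, 3, 2), (1, 2, 3)])
```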

Prüfer Sequences

A Prüfer sequence provides a bijection between labeled trees on n vertices and sequences of length n−2 with entries from {1,2,…,n}. This bijection proves Cayley's formula.

To encode a tree as a Prüfer sequence: repeatedly remove the leaf with the smallest label and record its neighbor. Stop when two vertices remain.

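To decode a Prüfer sequence back to a tree: first recover the degrees, since a vertex appears in the sequence exactly (degree−1) times. Then, for each entry of the sequence in order, join it to the smallest-labeled vertex that currently has remaining degree 1, and reduce the remaining degree of both by one. Finally, join the two vertices that are left with remaining degree 1.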
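
The encoding and decoding procedures can be sketched as follows, assuming labels 1 to n with n≥2 and a min-heap to find the smallest leaf. The function names are hypothetical.

```python
import heapq

def prufer_encode(n, edges):
    """Prüfer sequence of a labeled tree on vertices 1..n (n >= 2)."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)      # smallest-labeled leaf
        neighbor = adj[leaf].pop()        # its unique neighbor
        seq.append(neighbor)
        adj[neighbor].remove(leaf)
        if len(adj[neighbor]) == 1:       # neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return seq

def prufer_decode(seq):
    """Edges of the labeled tree on 1..len(seq)+2 with the given Prüfer sequence."""
    n = len(seq) + 2
    degree = {v: 1 for v in range(1, n + 1)}
    for v in seq:
        degree[v] += 1                    # appearances = degree - 1
    leaves = [v for v in degree if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two vertices remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

tree = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
print(prufer_encode(6, tree))       # [4, 4, 4, 5]
print(prufer_decode([4, 4, 4, 5]))  # [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
```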

Applications

Network Design

When designing communication networks, power grids, or transportation systems, we often want to connect all nodes with minimum total cost. The MST provides the optimal solution when direct connections are possible between any pair of nodes.

Clustering

Single-linkage clustering uses MSTs: build the MST of a complete graph where edge weights are distances between data points, then remove the k−1 heaviest edges to obtain k clusters.
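
One way to realize this is to run Kruskal's algorithm on the complete distance graph and stop once k components remain, which has the same effect as dropping the k−1 heaviest MST edges (up to ties). The sketch below assumes points are coordinate tuples and Euclidean distance; the function name is hypothetical.

```python
from itertools import combinations
from math import dist

def single_linkage(points, k):
    """Group point indices into k clusters by single-linkage."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise distances, shortest first.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    components = n
    for _, i, j in edges:
        if components == k:
            break  # the remaining MST edges are the ones to leave out
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            components -= 1
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
print(single_linkage(pts, 3))  # [[0, 1, 2], [3, 4], [5]]
```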

Approximation Algorithms

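The MST provides a 2-approximation for the traveling salesman problem in metric spaces. Doubling the MST edges, finding an Eulerian circuit, and skipping vertices that have already been visited gives a tour of length at most twice optimal; by the triangle inequality, the shortcuts never increase the length.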
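
A common way to implement this is to visit the MST in preorder, which is the order in which an Eulerian circuit of the doubled tree first meets each vertex. The sketch below assumes points in the plane with Euclidean distance and uses a simple O(n²) form of Prim's algorithm; all names are illustrative.

```python
from math import dist

def mst_tour(points):
    """Return a tour (list of point indices) of length at most twice optimal."""
    n = len(points)
    if n == 0:
        return []
    # Dense Prim's algorithm, rooted at point 0.
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    # Preorder walk of the MST gives the shortcut tour.
    tour = []
    stack = [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

def tour_length(points, tour):
    """Length of the closed tour, returning to the first point."""
    return sum(
        dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

square = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour = mst_tour(square)
```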

Data Structures

Binary search trees, AVL trees, red-black trees, B-trees, and heaps are all tree-based data structures. They exploit the unique path property of trees to enable efficient search, insertion, and deletion operations.

Tree Isomorphism

Two trees are isomorphic if there is a bijection between their vertices that preserves adjacency. For rooted trees, the isomorphism must also preserve the root.

Testing tree isomorphism can be done in linear time using canonical forms. Each subtree is assigned a code, and trees with the same code are isomorphic. This is much faster than anything known for general graph isomorphism, which has no known polynomial-time algorithm and is not known to be NP-complete.
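
For rooted trees, a simple canonical code is a parenthesis string built from the sorted codes of the children. The sketch below illustrates the idea only: sorting strings makes it slower than the linear-time method, and it assumes the tree is an adjacency dictionary with a given root. The function name is hypothetical.

```python
def canonical_form(adj, root, parent=None):
    """Parenthesis code of the rooted tree; equal codes mean isomorphic trees."""
    codes = sorted(
        canonical_form(adj, child, root)
        for child in adj[root]
        if child != parent
    )
    return "(" + "".join(codes) + ")"

# Two labelings of the same rooted shape, and a path rooted at one end.
t1 = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
t2 = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

same = canonical_form(t1, 0) == canonical_form(t2, "a")
different = canonical_form(t1, 0) != canonical_form(path, 0)
```

For unrooted trees, one standard approach is to root each tree at its center (one or two vertices) before comparing codes.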

Special Types of Trees

A path is a tree where every vertex has degree at most 2. A star is a tree with one central vertex connected to all others. A caterpillar is a tree where removing all leaves yields a path.

A balanced binary tree has all leaves at the same depth or within one level. Balanced trees are crucial for efficient data structures because they guarantee O(log n) height.

Summary

A tree is a connected acyclic graph. Trees with n vertices have exactly n−1 edges, and between any two vertices there is exactly one path. Every tree with at least two vertices has at least two leaves.

Rooted trees have a designated root vertex, inducing parent-child relationships. Binary trees are rooted trees where each vertex has at most two children. Tree traversals (preorder, inorder, postorder) visit vertices in systematic orders.

A spanning tree of a connected graph includes all vertices with the minimum number of edges. The number of labeled trees on n vertices is $n^{n-2}$ by Cayley's formula.

Minimum spanning trees minimize total edge weight among all spanning trees. Kruskal's algorithm adds edges in order of weight, avoiding cycles. Prim's algorithm grows a tree from a starting vertex by adding minimum-weight edges to new vertices.

Trees are fundamental structures in both mathematics and computer science. Their absence of cycles creates unique paths between vertices, enabling efficient algorithms for search, sorting, and optimization problems.

The cut property guarantees that greedy algorithms find optimal minimum spanning trees. Both Kruskal's and Prim's algorithms run in time O(m log n) or better, making MST computation practical even for very large graphs.

Understanding trees is essential for algorithm design, data structure implementation, and solving optimization problems in networks, clustering, and approximation algorithms.

The Matrix–Tree theorem connects spanning tree enumeration to linear algebra: the number of spanning trees equals any cofactor of the Laplacian matrix. This provides both a computational method and deep theoretical connections.

Tree decomposition and treewidth measure how tree-like a graph is. Many NP-hard problems become tractable on graphs with bounded treewidth, making tree structure a key concept in computational complexity.
