# Kruskal for maximum spanning forest

(Redirected from Kruskal)
Chapters
1. [00:00] Einführung
2. [01:59] Der Algorithmus von Kruskal anhand eines Beispiels
3. [06:56] Und wie ist das bei mehreren Zusammenhangskomponenten?
4. [07:29] Welches Problem löst der Algorithmus von Kruskal genau?
5. [08:05] Wie lautet die Invariante?
6. [08:33] Warum ist der Algorithmus korrekt?
7. [08:59] Wie wird die Invariante sichergestellt?
8. [10:25] Was ist die asymptotische Komplexität des Algorithmus?

## General information

Algorithmic problem: Maximum spanning forest

Prerequisites:

Type of algorithm: loop

Auxiliary data:

1. A sorted sequence $S$ whose items are the edges of $E$. The edges are sorted in descending order, that is, the edge $e \in E$ with the largest value of $\ell(e)$ is the first element of the sequence (ties arbitrarily broken). Below, $e_i$ denotes the $i$-th edge in $S$.
2. A union-find data structure $U F$ on $V$.

## Abstract view

Invariant: After $i \ge 0$ iterations:

1. The undirected graph $F_i = (V,E_i)$, where $E_i$ is the set of all selected edges immediately after the $i$-th iteration, is cycle-free; in particular, it is maximally cycle-free in the sense that, for $j \in \{1,...,i\}$, adding an edge $e_J \notin F_i$ would close a cycle in $F_i$.
2. For the undirected graph $F_i = (V,E_i)$, there is a maximum spanning forest $F_i' = (V,E'_i)$ of $G$ such that $E_i \subseteq E'_i$.
3. The sets of $U F$ are the connected components of $F_i$.

Variant: $i$ increases by $1$.

Break condition: $i = \vert E \vert$.

## Induction basis

Abstract view:

1. $E_0 := \emptyset$
2. $U F$ is initialized by $V$, so each node $v \in V$ is a singleton in $U F$.

Implementation: Obvious.

Proof: Nothing to show.

## Induction step

Abstract view: If inserting $e_i$ would close a cycle, reject $e_i$; otherwise, insert $e_i$.

Implementation:

1. Let $v,w \in V$ denote the endnodes of $e_i$, that is, $e_i = \{v,w\}$.
2. If $v$ and $w$ belong to different sets in $U F$:
1. Insert $e_i$ in the result.
2. Unite the sets of $v$ and $w$ in $U F$.

Correctness: The first and third invariants are obvious. So we focus on the second invariant. We may safely set $F'_i := F'_{i-1}$ if

1. either $e_i \in E'_{i-1}$ and $e_i$ is inserted in the $i$-th iteration
2. or $e_i \notin E'_{i-1}$ and $e_i$ is rejected in the -th iteration.

Note that the case that $e_i \in E'_{i-1}$ and $e_i$ is rejected cannot occur. In fact, $e_i \in E'_{i-1}$ implies that $(V,E_{i-1} \cup{\{e_i\}})$ is cycle-free, but rejecting $e_i$ implies that $(V,E_{i-1} \cup{\{e_i\}})$ contains a cycle. So it remains to consider the case that $e_i$ is accepted but $e_i \notin E'_{i-1}$.

As $\ell(e_i) \gt 0$ and $F'_{i-1}$ is maximal, $(V,E'_{i-1} \cup{\{e_i\}})$ contains a cycle. Since $F_{i-1}$ is cycle-free, there must be an edge $e \in E'_{i-1} \setminus E_{i-1}$ on that cycle. By assumption, $(V,E_{i-1} \cup{\{e_i\}})$ is cycle-free as well. Therefore, $e$ is none of the edges considered so far but one of $e_{i+1},...,e_{\vert E \vert }$, which implies $\ell(e) \leq \ell(e_i)$ and, hence, maximality of the spanning forest $F'_i := (V,E'_{i-1} \setminus \{e\} \cup{\{ e_i \}})$. Clearly, this $F'_i$ includes $F_i$, so the second invariant is maintained again.

## Complexity

Statement: The asymptotic complexity is in $\mathcal{O}(n \cdot T(n) + m\cdot\log m )$ in the worst case, where $n = \vert V \vert$ and $m = \vert E \vert$, and $T(n)$ is the complexity of a single unite operation.

Proof: The summand $\mathcal{O}(m\cdot \log m)$ results from sorting all edges in the preprocessing. The main loop has $m$ iteration. There are $n-1$ unite operations.