Kruskal for maximum spanning forest

From Algowiki
(Redirected from Kruskal)
Jump to: navigation, search
  1. [00:00] Einführung
  2. [01:59] Der Algorithmus von Kruskal anhand eines Beispiels
  3. [06:56] Und wie ist das bei mehreren Zusammenhangskomponenten?
  4. [07:29] Welches Problem löst der Algorithmus von Kruskal genau?
  5. [08:05] Wie lautet die Invariante?
  6. [08:33] Warum ist der Algorithmus korrekt?
  7. [08:59] Wie wird die Invariante sichergestellt?
  8. [10:25] Was ist die asymptotische Komplexität des Algorithmus?

General information

Algorithmic problem: Maximum spanning forest


Type of algorithm: loop

Auxiliary data:

  1. A sorted sequence [math]S[/math] whose items are the edges of [math]E[/math]. The edges are sorted in descending order, that is, the edge [math]e \in E[/math] with the largest value of [math]\ell(e)[/math] is the first element of the sequence (ties arbitrarily broken). Below, [math]e_i[/math] denotes the [math]i[/math]-th edge in [math]S[/math].
  2. A union-find data structure [math]U F[/math] on [math]V[/math].

Abstract view

Invariant: After [math]i \ge 0[/math] iterations:

  1. The undirected graph [math]F_i = (V,E_i)[/math], where [math]E_i[/math] is the set of all selected edges immediately after the [math]i[/math]-th iteration, is cycle-free; in particular, it is maximally cycle-free in the sense that, for [math]j \in \{1,...,i\}[/math], adding an edge [math]e_J \notin F_i[/math] would close a cycle in [math]F_i[/math].
  2. For the undirected graph [math]F_i = (V,E_i)[/math], there is a maximum spanning forest [math]F_i' = (V,E'_i)[/math] of [math]G[/math] such that [math]E_i \subseteq E'_i[/math].
  3. The sets of [math]U F[/math] are the connected components of [math]F_i[/math].

Variant: [math]i[/math] increases by [math]1[/math].

Break condition: [math]i = \vert E \vert[/math].

Induction basis

Abstract view:

  1. [math]E_0 := \emptyset[/math]
  2. [math]U F[/math] is initialized by [math]V[/math], so each node [math]v \in V[/math] is a singleton in [math]U F[/math].

Implementation: Obvious.

Proof: Nothing to show.

Induction step

Abstract view: If inserting [math]e_i[/math] would close a cycle, reject [math]e_i[/math]; otherwise, insert [math]e_i[/math].


  1. Let [math]v,w \in V[/math] denote the endnodes of [math]e_i[/math], that is, [math]e_i = \{v,w\}[/math].
  2. If [math]v[/math] and [math]w[/math] belong to different sets in [math]U F[/math]:
    1. Insert [math]e_i[/math] in the result.
    2. Unite the sets of [math]v[/math] and [math]w[/math] in [math]U F[/math].

Correctness: The first and third invariants are obvious. So we focus on the second invariant. We may safely set [math]F'_i := F'_{i-1}[/math] if

  1. either [math]e_i \in E'_{i-1}[/math] and [math]e_i[/math] is inserted in the [math]i[/math]-th iteration
  2. or [math]e_i \notin E'_{i-1}[/math] and [math]e_i[/math] is rejected in the -th iteration.

Note that the case that [math]e_i \in E'_{i-1}[/math] and [math]e_i[/math] is rejected cannot occur. In fact, [math]e_i \in E'_{i-1}[/math] implies that [math](V,E_{i-1} \cup{\{e_i\}})[/math] is cycle-free, but rejecting [math]e_i[/math] implies that [math](V,E_{i-1} \cup{\{e_i\}})[/math] contains a cycle. So it remains to consider the case that [math]e_i[/math] is accepted but [math]e_i \notin E'_{i-1}[/math].

As [math]\ell(e_i) \gt 0[/math] and [math]F'_{i-1}[/math] is maximal, [math](V,E'_{i-1} \cup{\{e_i\}})[/math] contains a cycle. Since [math]F_{i-1}[/math] is cycle-free, there must be an edge [math]e \in E'_{i-1} \setminus E_{i-1}[/math] on that cycle. By assumption, [math](V,E_{i-1} \cup{\{e_i\}})[/math] is cycle-free as well. Therefore, [math]e[/math] is none of the edges considered so far but one of [math]e_{i+1},...,e_{\vert E \vert }[/math], which implies [math]\ell(e) \leq \ell(e_i)[/math] and, hence, maximality of the spanning forest [math]F'_i := (V,E'_{i-1} \setminus \{e\} \cup{\{ e_i \}})[/math]. Clearly, this [math]F'_i[/math] includes [math]F_i[/math], so the second invariant is maintained again.


Statement: The asymptotic complexity is in [math]\mathcal{O}(n \cdot T(n) + m\cdot\log m )[/math] in the worst case, where [math]n = \vert V \vert[/math] and [math]m = \vert E \vert[/math], and [math]T(n)[/math] is the complexity of a single unite operation.

Proof: The summand [math]\mathcal{O}(m\cdot \log m)[/math] results from sorting all edges in the preprocessing. The main loop has [math]m[/math] iteration. There are [math]n-1[/math] unite operations.