Kruskal for maximum spanning forest: Difference between revisions

From Algowiki
Jump to navigation Jump to search
(Created page with "<div class="plainlinks" style="float:right;margin:0 0 5px 5px; border:1px solid #AAAAAA; width:auto; padding:1em; margin: 0px 0px 1em 1em;"> <div style="font-size: 1.8em;font-...")
 
 
(24 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<div class="plainlinks" style="float:right;margin:0 0 5px 5px; border:1px solid #AAAAAA; width:auto; padding:1em; margin: 0px 0px 1em 1em;">
[[Category:Videos]]
<div style="font-size: 1.8em;font-weight:bold;text-align: center;margin:0.2em 0 1em 0">Analytics</div>
{{#ev:youtube|https://www.youtube.com/watch?v=Pz6x3BB86YA|500|right|Chapters
#[00:00] Einführung
#[01:59] Der Algorithmus von Kruskal anhand eines Beispiels
#[06:56] Und wie ist das bei mehreren Zusammenhangskomponenten?
#[07:29] Welches Problem löst der Algorithmus von Kruskal genau?
#[08:05] Wie lautet die Invariante?
#[08:33] Warum ist der Algorithmus korrekt?
#[08:59] Wie wird die Invariante sichergestellt?
#[10:25] Was ist die asymptotische Komplexität des Algorithmus?
|frame}}


<div style="font-size: 1.2em; margin:.5em 0 1em 0; text-align:center">whatever</div>
== General information ==


<div style="font-size: 1.2em; margin:.5em 0 .5em 0;text-align:center">[https://openlearnware.tu-darmstadt.de/#!/resource/kruskal-2430 Openlearnware]</div>
'''Algorithmic problem:''' [[Maximum spanning forest]]
</div>
 
'''Prerequisites:'''
 
'''Type of algorithm:''' loop
 
'''Auxiliary data:'''
 
#A [[sorted sequence]] <math>S</math> whose items are the edges of <math>E</math>. The edges are sorted in descending order, that is, the edge <math>e \in E</math> with the largest value of <math>\ell(e)</math> is the first element of the sequence (ties arbitrarily broken). Below, <math>e_i</math> denotes the <math>i</math>-th edge in <math>S</math>.
#A [[union-find]] data structure <math>U F</math> on <math>V</math>.
 
==Abstract view==
'''Invariant:''' After <math>i \ge 0</math> iterations:
#The [[undirected graph]] <math>F_i = (V,E_i)</math>, where <math>E_i</math> is the set of all selected edges immediately after the <math>i</math>-th iteration, is cycle-free; in particular, it is maximally cycle-free in the sense that, for <math>j \in \{1,...,i\}</math>, adding an edge <math>e_J \notin F_i</math> would close a cycle in <math>F_i</math>.
#For the [[undirected graph]] <math>F_i = (V,E_i)</math>, there is a maximum spanning forest <math>F_i' = (V,E'_i)</math> of <math>G</math> such that <math>E_i \subseteq E'_i</math>.
#The sets of <math>U F</math> are the connected components of <math>F_i</math>.
 
'''Variant:'''  <math>i</math> increases by <math>1</math>.
 
'''Break condition:'''  <math>i = \vert E \vert</math>.
 
==Induction basis==
'''Abstract view:'''
# <math>E_0 := \emptyset</math>
# <math>U F</math> is initialized by <math>V</math>, so each node <math>v \in V</math> is a singleton in <math>U F</math>.
'''Implementation:''' Obvious.
 
'''Proof:''' Nothing to show.
 
==Induction step==
'''Abstract view:''' If inserting <math>e_i</math> would close a cycle, reject <math>e_i</math>; otherwise, insert <math>e_i</math>.
 
'''Implementation:'''
#Let <math>v,w \in V</math> denote the endnodes of <math>e_i</math>, that is, <math>e_i = \{v,w\}</math>.
#If <math>v</math> and <math>w</math> belong to different sets in <math>U F</math>:
##Insert <math>e_i</math> in the result.
##Unite the sets of <math>v</math> and <math>w</math> in <math>U F</math>.
 
'''Correctness:''' The first and third invariants are obvious. So we focus on the second invariant. We may safely set <math>F'_i := F'_{i-1}</math> if
#either <math>e_i \in E'_{i-1}</math> and <math>e_i</math> is inserted in the <math>i</math>-th iteration
#or <math>e_i \notin E'_{i-1}</math> and <math>e_i</math> is rejected in the -th iteration.
Note that the case that <math>e_i \in E'_{i-1}</math> and <math>e_i</math> is rejected cannot occur. In fact, <math>e_i \in E'_{i-1}</math> implies that <math>(V,E_{i-1} \cup{\{e_i\}})</math> is cycle-free, but rejecting <math>e_i</math> implies that <math>(V,E_{i-1} \cup{\{e_i\}})</math> contains a cycle. So it remains to consider the case that <math>e_i</math> is accepted but <math>e_i \notin E'_{i-1}</math>.
As <math>\ell(e_i) > 0</math> and <math>F'_{i-1}</math> is maximal, <math>(V,E'_{i-1} \cup{\{e_i\}})</math> contains a cycle. Since <math>F_{i-1}</math> is cycle-free, there must be an edge <math>e \in E'_{i-1} \setminus E_{i-1}</math> on that cycle. By assumption, <math>(V,E_{i-1} \cup{\{e_i\}})</math> is cycle-free as well. Therefore, <math>e</math> is none of the edges considered so far but one of <math>e_{i+1},...,e_{\vert E \vert }</math>, which implies  <math>\ell(e) \leq \ell(e_i)</math> and, hence, maximality of the spanning forest <math>F'_i := (V,E'_{i-1} \setminus \{e\} \cup{\{ e_i \}})</math>. Clearly, this <math>F'_i</math> includes <math>F_i</math>, so the second invariant is maintained again.
 
==Complexity==
'''Statement:''' The asymptotic complexity is in <math>\mathcal{O}(n \cdot T(n) + m\cdot\log m )</math> in the worst case, where <math>n = \vert V \vert</math> and <math>m = \vert E \vert</math>, and <math>T(n)</math> is the complexity of a single unite operation.
 
'''Proof:''' The summand <math>\mathcal{O}(m\cdot \log m)</math> results from sorting all edges in the preprocessing. The main loop has <math>m</math> iteration. There are <math>n-1</math> unite operations.

Latest revision as of 18:31, 20 September 2015

Chapters
  1. [00:00] Einführung
  2. [01:59] Der Algorithmus von Kruskal anhand eines Beispiels
  3. [06:56] Und wie ist das bei mehreren Zusammenhangskomponenten?
  4. [07:29] Welches Problem löst der Algorithmus von Kruskal genau?
  5. [08:05] Wie lautet die Invariante?
  6. [08:33] Warum ist der Algorithmus korrekt?
  7. [08:59] Wie wird die Invariante sichergestellt?
  8. [10:25] Was ist die asymptotische Komplexität des Algorithmus?

General information

Algorithmic problem: Maximum spanning forest

Prerequisites:

Type of algorithm: loop

Auxiliary data:

  1. A sorted sequence [math]\displaystyle{ S }[/math] whose items are the edges of [math]\displaystyle{ E }[/math]. The edges are sorted in descending order, that is, the edge [math]\displaystyle{ e \in E }[/math] with the largest value of [math]\displaystyle{ \ell(e) }[/math] is the first element of the sequence (ties arbitrarily broken). Below, [math]\displaystyle{ e_i }[/math] denotes the [math]\displaystyle{ i }[/math]-th edge in [math]\displaystyle{ S }[/math].
  2. A union-find data structure [math]\displaystyle{ U F }[/math] on [math]\displaystyle{ V }[/math].

Abstract view

Invariant: After [math]\displaystyle{ i \ge 0 }[/math] iterations:

  1. The undirected graph [math]\displaystyle{ F_i = (V,E_i) }[/math], where [math]\displaystyle{ E_i }[/math] is the set of all selected edges immediately after the [math]\displaystyle{ i }[/math]-th iteration, is cycle-free; in particular, it is maximally cycle-free in the sense that, for [math]\displaystyle{ j \in \{1,...,i\} }[/math], adding an edge [math]\displaystyle{ e_J \notin F_i }[/math] would close a cycle in [math]\displaystyle{ F_i }[/math].
  2. For the undirected graph [math]\displaystyle{ F_i = (V,E_i) }[/math], there is a maximum spanning forest [math]\displaystyle{ F_i' = (V,E'_i) }[/math] of [math]\displaystyle{ G }[/math] such that [math]\displaystyle{ E_i \subseteq E'_i }[/math].
  3. The sets of [math]\displaystyle{ U F }[/math] are the connected components of [math]\displaystyle{ F_i }[/math].

Variant: [math]\displaystyle{ i }[/math] increases by [math]\displaystyle{ 1 }[/math].

Break condition: [math]\displaystyle{ i = \vert E \vert }[/math].

Induction basis

Abstract view:

  1. [math]\displaystyle{ E_0 := \emptyset }[/math]
  2. [math]\displaystyle{ U F }[/math] is initialized by [math]\displaystyle{ V }[/math], so each node [math]\displaystyle{ v \in V }[/math] is a singleton in [math]\displaystyle{ U F }[/math].

Implementation: Obvious.

Proof: Nothing to show.

Induction step

Abstract view: If inserting [math]\displaystyle{ e_i }[/math] would close a cycle, reject [math]\displaystyle{ e_i }[/math]; otherwise, insert [math]\displaystyle{ e_i }[/math].

Implementation:

  1. Let [math]\displaystyle{ v,w \in V }[/math] denote the endnodes of [math]\displaystyle{ e_i }[/math], that is, [math]\displaystyle{ e_i = \{v,w\} }[/math].
  2. If [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math] belong to different sets in [math]\displaystyle{ U F }[/math]:
    1. Insert [math]\displaystyle{ e_i }[/math] in the result.
    2. Unite the sets of [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math] in [math]\displaystyle{ U F }[/math].

Correctness: The first and third invariants are obvious. So we focus on the second invariant. We may safely set [math]\displaystyle{ F'_i := F'_{i-1} }[/math] if

  1. either [math]\displaystyle{ e_i \in E'_{i-1} }[/math] and [math]\displaystyle{ e_i }[/math] is inserted in the [math]\displaystyle{ i }[/math]-th iteration
  2. or [math]\displaystyle{ e_i \notin E'_{i-1} }[/math] and [math]\displaystyle{ e_i }[/math] is rejected in the -th iteration.

Note that the case that [math]\displaystyle{ e_i \in E'_{i-1} }[/math] and [math]\displaystyle{ e_i }[/math] is rejected cannot occur. In fact, [math]\displaystyle{ e_i \in E'_{i-1} }[/math] implies that [math]\displaystyle{ (V,E_{i-1} \cup{\{e_i\}}) }[/math] is cycle-free, but rejecting [math]\displaystyle{ e_i }[/math] implies that [math]\displaystyle{ (V,E_{i-1} \cup{\{e_i\}}) }[/math] contains a cycle. So it remains to consider the case that [math]\displaystyle{ e_i }[/math] is accepted but [math]\displaystyle{ e_i \notin E'_{i-1} }[/math].

As [math]\displaystyle{ \ell(e_i) \gt 0 }[/math] and [math]\displaystyle{ F'_{i-1} }[/math] is maximal, [math]\displaystyle{ (V,E'_{i-1} \cup{\{e_i\}}) }[/math] contains a cycle. Since [math]\displaystyle{ F_{i-1} }[/math] is cycle-free, there must be an edge [math]\displaystyle{ e \in E'_{i-1} \setminus E_{i-1} }[/math] on that cycle. By assumption, [math]\displaystyle{ (V,E_{i-1} \cup{\{e_i\}}) }[/math] is cycle-free as well. Therefore, [math]\displaystyle{ e }[/math] is none of the edges considered so far but one of [math]\displaystyle{ e_{i+1},...,e_{\vert E \vert } }[/math], which implies [math]\displaystyle{ \ell(e) \leq \ell(e_i) }[/math] and, hence, maximality of the spanning forest [math]\displaystyle{ F'_i := (V,E'_{i-1} \setminus \{e\} \cup{\{ e_i \}}) }[/math]. Clearly, this [math]\displaystyle{ F'_i }[/math] includes [math]\displaystyle{ F_i }[/math], so the second invariant is maintained again.

Complexity

Statement: The asymptotic complexity is in [math]\displaystyle{ \mathcal{O}(n \cdot T(n) + m\cdot\log m ) }[/math] in the worst case, where [math]\displaystyle{ n = \vert V \vert }[/math] and [math]\displaystyle{ m = \vert E \vert }[/math], and [math]\displaystyle{ T(n) }[/math] is the complexity of a single unite operation.

Proof: The summand [math]\displaystyle{ \mathcal{O}(m\cdot \log m) }[/math] results from sorting all edges in the preprocessing. The main loop has [math]\displaystyle{ m }[/math] iteration. There are [math]\displaystyle{ n-1 }[/math] unite operations.