Kruskal for maximum spanning forest: Difference between revisions
No edit summary |
|||
Line 39: | Line 39: | ||
#either <math>e_i \in E'_{i-1}</math> and <math>e_i</math> is inserted in the <math>i</math>-th iteration | #either <math>e_i \in E'_{i-1}</math> and <math>e_i</math> is inserted in the <math>i</math>-th iteration | ||
#or <math>e_i \notin E'_{i-1}</math> and <math>e_i</math> is rejected in the -th iteration. | #or <math>e_i \notin E'_{i-1}</math> and <math>e_i</math> is rejected in the -th iteration. | ||
Note that the case that <math>e_i \in E'_{i-1}</math> and <math>e_i</math> is rejected cannot occur. In fact, <math>e_i \in E'_{i-1}</math> implies that <math>(V,E_{i-1} \cup_{\{e_i\}}</math> is cycle-free, but rejecting <math>e_i</math> implies that <math>(V,E_{i-1} \cup_{\{e_i\}}</math> contains a cycle. So it remains to consider the case that <math>e_i</math> is accepted but <math>e_i \notin E'_{i-1}</math>. | Note that the case that <math>e_i \in E'_{i-1}</math> and <math>e_i</math> is rejected cannot occur. In fact, <math>e_i \in E'_{i-1}</math> implies that <math>(V,E_{i-1} \cup_{\{e_i\}}</math> is cycle-free, but rejecting <math>e_i</math> implies that <math>(V,E_{i-1} \cup_{\{e_i\}}</math> contains a cycle. So it remains to consider the case that <math>e_i</math> is accepted but <math>e_i \notin E'_{i-1}</math>. | ||
As <math>l(e_i) > 0</math> and <math>F'_{i-1}</math> is maximal, <math>(V,E'_{i-1} \cup_{\{e_i\}}</math> contains a cycle. Since <math>F_{i-1}</math> is cycle-free, there must be an edge <math>e \in E'_{i-1} \setminus E_{i-1}</math> on that cycle. By assumption, <math>(V,E_{i-1} \cup_{\{e_i\}}</math> is cycle-free as well. Therefore, <math>e</math> is none of the edges considered so far but one of <math>e_{i+1},...,e_{\vert E \vert }</math>, which implies <math>l(e) \leq l(e_i)</math>and, hence, maximality of the spanning forest <math>F'_i := (V,E'_{i-1} \setminus \{e\} \cup_{\{ e_i \}})</math>. Clearly, this <math>F'_i</math> includes <math>F_i</math>, so the second invariant is maintained again. | As <math>l(e_i) > 0</math> and <math>F'_{i-1}</math> is maximal, <math>(V,E'_{i-1} \cup_{\{e_i\}}</math> contains a cycle. Since <math>F_{i-1}</math> is cycle-free, there must be an edge <math>e \in E'_{i-1} \setminus E_{i-1}</math> on that cycle. By assumption, <math>(V,E_{i-1} \cup_{\{e_i\}}</math> is cycle-free as well. Therefore, <math>e</math> is none of the edges considered so far but one of <math>e_{i+1},...,e_{\vert E \vert }</math>, which implies <math>l(e) \leq l(e_i)</math>and, hence, maximality of the spanning forest <math>F'_i := (V,E'_{i-1} \setminus \{e\} \cup_{\{ e_i \}})</math>. Clearly, this <math>F'_i</math> includes <math>F_i</math>, so the second invariant is maintained again. | ||
Revision as of 23:40, 16 June 2015
Algorithmic problem: Maximum spanning forest
Prerequisites:
Type of algorithm: loop
Auxiliary data:
- A sorted sequence [math]\displaystyle{ S }[/math] whose items are the edges of [math]\displaystyle{ E }[/math]. The edges are sorted in descending order, that is, the edge [math]\displaystyle{ e \in E }[/math] with the largest value of [math]\displaystyle{ l(e) }[/math] is the first element of the sequence (ties arbitrarily broken). Below, [math]\displaystyle{ e_i }[/math] denotes the [math]\displaystyle{ i }[/math]-th edge in [math]\displaystyle{ S }[/math].
- A union-find data structure [math]\displaystyle{ U F }[/math] on [math]\displaystyle{ V }[/math].
Abstract view
Invariant: After [math]\displaystyle{ i \ge 0 }[/math] iterations:
- The undirected graph [math]\displaystyle{ F_i = (V,E_i) }[/math], where [math]\displaystyle{ E_i }[/math] is the set of all selected edges immediately after the [math]\displaystyle{ i }[/math]-th iteration, is cycle-free; in particular, it is maximally cycle-free in the sense that, for [math]\displaystyle{ j \in \{1,...,i\} }[/math], adding an edge [math]\displaystyle{ e_J \notin F_i }[/math] would close a cycle in [math]\displaystyle{ F_i }[/math].
- For the undirected graph [math]\displaystyle{ F_i = (V,E_i) }[/math], there is a maximum spanning forest [math]\displaystyle{ F_i' = (V,E'_i) }[/math] of [math]\displaystyle{ G }[/math] such that [math]\displaystyle{ E_i \subseteq E'_i }[/math].
- The sets of [math]\displaystyle{ U F }[/math] are the connected components of [math]\displaystyle{ F_i }[/math].
Variant: [math]\displaystyle{ i }[/math] increases by [math]\displaystyle{ 1 }[/math].
Break condition: [math]\displaystyle{ i = \vert E \vert }[/math].
Induction basis
Abstract view:
- [math]\displaystyle{ E_0 := \emptyset }[/math]
- [math]\displaystyle{ U F }[/math] is initialized by [math]\displaystyle{ V }[/math], so each node [math]\displaystyle{ v \in V }[/math] is a singleton in [math]\displaystyle{ U F }[/math].
Implementation: Obvious. Proof: Nothing to show.
Induction step
Abstract view: If inserting [math]\displaystyle{ e_i }[/math] would close a cycle, reject [math]\displaystyle{ e_i }[/math]; otherwise, insert [math]\displaystyle{ e_i }[/math].
Implementation:
- Let [math]\displaystyle{ v,w \in V }[/math] denote the endnodes of [math]\displaystyle{ e_i }[/math], that is, [math]\displaystyle{ e_i = \{v,w\} }[/math].
- If [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math] belong to different sets in [math]\displaystyle{ U F }[/math]:
- Insert [math]\displaystyle{ e_i }[/math] in the result.
- Unite the sets of [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math] in [math]\displaystyle{ U F }[/math].
Correctness: The first and third invariants are obvious. So we focus on the second invariant. We may safely set [math]\displaystyle{ F'_i := F'_{i-1} }[/math] if
- either [math]\displaystyle{ e_i \in E'_{i-1} }[/math] and [math]\displaystyle{ e_i }[/math] is inserted in the [math]\displaystyle{ i }[/math]-th iteration
- or [math]\displaystyle{ e_i \notin E'_{i-1} }[/math] and [math]\displaystyle{ e_i }[/math] is rejected in the -th iteration.
Note that the case that [math]\displaystyle{ e_i \in E'_{i-1} }[/math] and [math]\displaystyle{ e_i }[/math] is rejected cannot occur. In fact, [math]\displaystyle{ e_i \in E'_{i-1} }[/math] implies that [math]\displaystyle{ (V,E_{i-1} \cup_{\{e_i\}} }[/math] is cycle-free, but rejecting [math]\displaystyle{ e_i }[/math] implies that [math]\displaystyle{ (V,E_{i-1} \cup_{\{e_i\}} }[/math] contains a cycle. So it remains to consider the case that [math]\displaystyle{ e_i }[/math] is accepted but [math]\displaystyle{ e_i \notin E'_{i-1} }[/math].
As [math]\displaystyle{ l(e_i) \gt 0 }[/math] and [math]\displaystyle{ F'_{i-1} }[/math] is maximal, [math]\displaystyle{ (V,E'_{i-1} \cup_{\{e_i\}} }[/math] contains a cycle. Since [math]\displaystyle{ F_{i-1} }[/math] is cycle-free, there must be an edge [math]\displaystyle{ e \in E'_{i-1} \setminus E_{i-1} }[/math] on that cycle. By assumption, [math]\displaystyle{ (V,E_{i-1} \cup_{\{e_i\}} }[/math] is cycle-free as well. Therefore, [math]\displaystyle{ e }[/math] is none of the edges considered so far but one of [math]\displaystyle{ e_{i+1},...,e_{\vert E \vert } }[/math], which implies [math]\displaystyle{ l(e) \leq l(e_i) }[/math]and, hence, maximality of the spanning forest [math]\displaystyle{ F'_i := (V,E'_{i-1} \setminus \{e\} \cup_{\{ e_i \}}) }[/math]. Clearly, this [math]\displaystyle{ F'_i }[/math] includes [math]\displaystyle{ F_i }[/math], so the second invariant is maintained again.
Complexity
Statement: The asymptotic complexity is in [math]\displaystyle{ \mathcal{O}(m \cdot T(n) + m\ log\ m) }[/math] in the worst case, where [math]\displaystyle{ n = \vert V \vert }[/math] and [math]\displaystyle{ m = \vert E \vert }[/math], and [math]\displaystyle{ T(n) }[/math] is the complexity of a single union-find operation.
Proof: The summand [math]\displaystyle{ \mathcal{O}(m\ log\ m) }[/math] results from sorting all edges in the preprocessing. The main loop has iterations, and each iteration takes [math]\displaystyle{ \mathcal{O}(T(n)) }[/math] time.