Hopcroft-Karp: Difference between revisions
(35 intermediate revisions by the same user not shown) | |||
Line 6: | Line 6: | ||
'''Type of algorithm:''' loop. | '''Type of algorithm:''' loop. | ||
'''Auxiliary data | '''Auxiliary data:''' | ||
a [[Sets and sequences|set]] <math>S</math> of [[Breadth-first search|BFS]] runs as iterators (cf. the [[Graph traversal#Remarks| | a [[Sets and sequences|set]] <math>S</math> of [[Breadth-first search|BFS]] runs as iterators (cf. the [[Graph traversal#Remarks|remark on iterators]]). The [[Matchings in graphs#Finding augmenting paths|modification to find augmenting paths]] is applied. | ||
'''Invariant:''' | '''Invariant:''' After <math>i\geq 0</math> iterations: | ||
# <math>M</math> is a matching in <math>G</math>. | # <math>M</math> is a matching in <math>G</math>. | ||
# All exposed nodes that are end nodes of augmenting paths, are in <math>S</math>. | # All exposed nodes that are end nodes of augmenting paths, are start nodes of BFS iterators in <math>S</math> (there may be start nodes in <math>S</math> that are ''not'' end nodes of augmenting paths). | ||
# Each node in <math>S</math> is the root of an arborescence: | |||
## These arborescences are pairwise node-disjoint. | |||
## Each leaf of each arborescence has distance <math>i</math> form the root of this arborescence (both in <math>G</math> and in the arborescence). | |||
'''Variant:''' | '''Variant:''' | ||
Line 23: | Line 26: | ||
'''Abstract view:''' | '''Abstract view:''' | ||
# Start with an arbitrary matching, for example, the empty matching. | # Start with an arbitrary matching, for example, the empty matching. | ||
# Create and initialize one BFS run for each exposed node. | # Create and initialize one BFS run for each exposed node, and insert it in <math>S</math>. | ||
== Induction step == | == Induction step == | ||
Line 31: | Line 34: | ||
# Process all nodes of distance <math>i-1</math> in the queues of all BFS runs. | # Process all nodes of distance <math>i-1</math> in the queues of all BFS runs. | ||
# For each BFS run: If it had no nodes to process in step 1, remove it from <math>S</math>. | # For each BFS run: If it had no nodes to process in step 1, remove it from <math>S</math>. | ||
# | # For each edge <math>\{v,w\}</math> that is seen by one BFS run from <math>v</math> and by one BFS run from <math>w</math>: | ||
## Choose two such BFS runs and such an edge <math>\{v,w\}</math>, say. | ## Choose two such BFS runs and such an edge <math>\{v,w\}</math>, say. | ||
## [[Matchings in graphs#Alternating and augmenting paths|Augment]] <math>M</math> along the concatenation of the paths in the arborescences of these two BFS runs from the start nodes to <math>v</math> and <math>w</math>, respectively, and of <math>\{v,w\}</math>. | ## [[Matchings in graphs#Alternating and augmenting paths|Augment]] <math>M</math> along the concatenation of the paths in the arborescences of these two BFS runs from the start nodes to <math>v</math> and <math>w</math>, respectively, and of <math>\{v,w\}</math>. | ||
Line 37: | Line 40: | ||
'''Proof:''' | '''Proof:''' | ||
The | The invariant is obviously maintained. To prove the variant, consider an execution of step 3.2. Let <math>p</math> denote the path along which the augmentation takes place. Let <math>p_1</math> and <math>p_2</math> be the paths in the two arborescences from the two start nodes to <math>v</math> and <math>w</math>, respectively. So, it is <math>p=p_1+\{v,w\}+p_2</math>. Suppose augmenting along <math>p</math> has created a new augmenting path <math>q</math>. We have to show that <math>q</math> is longer than <math>p</math>. | ||
In the orientation of <math>p</math> from the start node of <math>p_1</math> to the start node of <math>p_2</math>, let <math>v'</math> be the first node of <math>p</math> that also belongs to <math>q</math>, and let <math>w'</math> be the last node of <math>p</math> that also belongs to <math>q</math>. Since <math>q</math> has become augmenting by the augmentation along <math>p</math>, <math>q</math> must share at least one edge with <math>p</math>, which means <math>v'\neq w'</math>. Let <math>q_1</math> and <math>q_2</math> denote the ''loose ends'' of <math>q</math> up to <math>v</math> and <math>w</math>, respectively. If <math>v'</math> belongs to <math>p_1</math>, it must be <math>v'=v</math> because, otherwise, the concatenation <math>p_1+q_1</math> would yield an augmenting path that is shorter than <math>p</math>, which contradicts the induction hypothesis. Mirror-symmetrically, if <math>w'</math> belongs to <math>p_2</math>, it must be <math>w'=w</math>. In summary, <math>v'\neq w'</math> implies that <math>v'=v</math> and <math>w'=w</math> is the only possible scenario. | |||
Note that the edges of <math>q_1</math> and <math>q_2</math> incident to <math>v'</math> and <math>w'</math>, respectively, are unmatched because <math>v'</math> and <math>w'</math> are matched by edges of <math>p</math>. Immediately before the current iteration, the edges over which the BFS runs entered <math>v'</math> and <math>w'</math>, respectively, were matched. Therefore, the parities of the lengths of <math>p_1</math> and <math>q_1</math> are different. In particular, the lengths themselves are different. Again, if <math>q_1</math> were shorter than <math>p_1</math>, <math>p_1+q_1</math> were shorter than <math>p</math>. However, <math>p_1+q_1</math> was augmenting immediately before the current iteration, which contradicts the induction hypothesis. For the same reasons, <math>q_2</math> is longer than <math>p_2</math>. In summary, <math>q</math> is longer than <math>p</math>. | |||
'''Remark:''' | |||
The union of all arborescences at termination is called a '''Hungarian forest''' in the literature. The algorithm proves, constructively, that a matching in a bipartite graph is cardinality-maximal if, and only if, a Hungarian forest exists. | |||
== Correctness == | == Correctness == | ||
Due to the invariant, the matching is cardinality-maximal if the main loop terminates. | Due to the invariant, the matching is cardinality-maximal if the main loop terminates. Termination of the main loop results from the following complexity considerations. | ||
== Complexity == | |||
'''Statement:''' | |||
The asymptotic complexity is in <math>\mathcal{O}(m\!\cdot\!\sqrt{n})</math>, where <math>n=|V|</math> and <math>m=|E|</math>. | |||
'''Proof:''' | |||
Each edge is touched at most twice in each iteration, once from either incident node. Hence, it suffices to prove that the number of iterations is in <math>\mathcal{O}(\sqrt{n})</math>. | |||
After <math>\lceil\sqrt{n}\rceil</math> iterations, any augmenting path has a length of at least <math>\lceil\sqrt{n}\rceil</math>. Consequently, there cannot be more than <math>\lfloor\sqrt{n}\rfloor</math> node-disjoint augmenting paths at this moment. The remark on the proof of [[Cardinality-maximal matching#Berge's theorem|Berge's theorem]] implies that the maximal cardinality of a matching is at most <math>|M|+\lfloor\sqrt{n}\rfloor</math>. Therefore, <math>|M|</math> is maximal after another <math>\lfloor\sqrt{n}\rfloor</math> iterations. The claim follows. |
Latest revision as of 15:00, 6 December 2014
Abstract view
Algorithmic problem: Cardinality-maximal matching in bipartite graphs.
Type of algorithm: loop.
Auxiliary data: a set [math]\displaystyle{ S }[/math] of BFS runs as iterators (cf. the remark on iterators). The modification to find augmenting paths is applied.
Invariant: After [math]\displaystyle{ i\geq 0 }[/math] iterations:
- [math]\displaystyle{ M }[/math] is a matching in [math]\displaystyle{ G }[/math].
- All exposed nodes that are end nodes of augmenting paths, are start nodes of BFS iterators in [math]\displaystyle{ S }[/math] (there may be start nodes in [math]\displaystyle{ S }[/math] that are not end nodes of augmenting paths).
- Each node in [math]\displaystyle{ S }[/math] is the root of an arborescence:
- These arborescences are pairwise node-disjoint.
- Each leaf of each arborescence has distance [math]\displaystyle{ i }[/math] form the root of this arborescence (both in [math]\displaystyle{ G }[/math] and in the arborescence).
Variant: The shortest length of an augmenting path in terms of the number of edges is increased.
Break condition: [math]\displaystyle{ S=\emptyset }[/math].
Induction basis
Abstract view:
- Start with an arbitrary matching, for example, the empty matching.
- Create and initialize one BFS run for each exposed node, and insert it in [math]\displaystyle{ S }[/math].
Induction step
Abstract view: In the [math]\displaystyle{ i }[/math]-th iteration:
- Process all nodes of distance [math]\displaystyle{ i-1 }[/math] in the queues of all BFS runs.
- For each BFS run: If it had no nodes to process in step 1, remove it from [math]\displaystyle{ S }[/math].
- For each edge [math]\displaystyle{ \{v,w\} }[/math] that is seen by one BFS run from [math]\displaystyle{ v }[/math] and by one BFS run from [math]\displaystyle{ w }[/math]:
- Choose two such BFS runs and such an edge [math]\displaystyle{ \{v,w\} }[/math], say.
- Augment [math]\displaystyle{ M }[/math] along the concatenation of the paths in the arborescences of these two BFS runs from the start nodes to [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math], respectively, and of [math]\displaystyle{ \{v,w\} }[/math].
- Remove these two BFS runs from the set of all BFS runs.
Proof: The invariant is obviously maintained. To prove the variant, consider an execution of step 3.2. Let [math]\displaystyle{ p }[/math] denote the path along which the augmentation takes place. Let [math]\displaystyle{ p_1 }[/math] and [math]\displaystyle{ p_2 }[/math] be the paths in the two arborescences from the two start nodes to [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math], respectively. So, it is [math]\displaystyle{ p=p_1+\{v,w\}+p_2 }[/math]. Suppose augmenting along [math]\displaystyle{ p }[/math] has created a new augmenting path [math]\displaystyle{ q }[/math]. We have to show that [math]\displaystyle{ q }[/math] is longer than [math]\displaystyle{ p }[/math].
In the orientation of [math]\displaystyle{ p }[/math] from the start node of [math]\displaystyle{ p_1 }[/math] to the start node of [math]\displaystyle{ p_2 }[/math], let [math]\displaystyle{ v' }[/math] be the first node of [math]\displaystyle{ p }[/math] that also belongs to [math]\displaystyle{ q }[/math], and let [math]\displaystyle{ w' }[/math] be the last node of [math]\displaystyle{ p }[/math] that also belongs to [math]\displaystyle{ q }[/math]. Since [math]\displaystyle{ q }[/math] has become augmenting by the augmentation along [math]\displaystyle{ p }[/math], [math]\displaystyle{ q }[/math] must share at least one edge with [math]\displaystyle{ p }[/math], which means [math]\displaystyle{ v'\neq w' }[/math]. Let [math]\displaystyle{ q_1 }[/math] and [math]\displaystyle{ q_2 }[/math] denote the loose ends of [math]\displaystyle{ q }[/math] up to [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math], respectively. If [math]\displaystyle{ v' }[/math] belongs to [math]\displaystyle{ p_1 }[/math], it must be [math]\displaystyle{ v'=v }[/math] because, otherwise, the concatenation [math]\displaystyle{ p_1+q_1 }[/math] would yield an augmenting path that is shorter than [math]\displaystyle{ p }[/math], which contradicts the induction hypothesis. Mirror-symmetrically, if [math]\displaystyle{ w' }[/math] belongs to [math]\displaystyle{ p_2 }[/math], it must be [math]\displaystyle{ w'=w }[/math]. In summary, [math]\displaystyle{ v'\neq w' }[/math] implies that [math]\displaystyle{ v'=v }[/math] and [math]\displaystyle{ w'=w }[/math] is the only possible scenario.
Note that the edges of [math]\displaystyle{ q_1 }[/math] and [math]\displaystyle{ q_2 }[/math] incident to [math]\displaystyle{ v' }[/math] and [math]\displaystyle{ w' }[/math], respectively, are unmatched because [math]\displaystyle{ v' }[/math] and [math]\displaystyle{ w' }[/math] are matched by edges of [math]\displaystyle{ p }[/math]. Immediately before the current iteration, the edges over which the BFS runs entered [math]\displaystyle{ v' }[/math] and [math]\displaystyle{ w' }[/math], respectively, were matched. Therefore, the parities of the lengths of [math]\displaystyle{ p_1 }[/math] and [math]\displaystyle{ q_1 }[/math] are different. In particular, the lengths themselves are different. Again, if [math]\displaystyle{ q_1 }[/math] were shorter than [math]\displaystyle{ p_1 }[/math], [math]\displaystyle{ p_1+q_1 }[/math] were shorter than [math]\displaystyle{ p }[/math]. However, [math]\displaystyle{ p_1+q_1 }[/math] was augmenting immediately before the current iteration, which contradicts the induction hypothesis. For the same reasons, [math]\displaystyle{ q_2 }[/math] is longer than [math]\displaystyle{ p_2 }[/math]. In summary, [math]\displaystyle{ q }[/math] is longer than [math]\displaystyle{ p }[/math].
Remark: The union of all arborescences at termination is called a Hungarian forest in the literature. The algorithm proves, constructively, that a matching in a bipartite graph is cardinality-maximal if, and only if, a Hungarian forest exists.
Correctness
Due to the invariant, the matching is cardinality-maximal if the main loop terminates. Termination of the main loop results from the following complexity considerations.
Complexity
Statement: The asymptotic complexity is in [math]\displaystyle{ \mathcal{O}(m\!\cdot\!\sqrt{n}) }[/math], where [math]\displaystyle{ n=|V| }[/math] and [math]\displaystyle{ m=|E| }[/math].
Proof: Each edge is touched at most twice in each iteration, once from either incident node. Hence, it suffices to prove that the number of iterations is in [math]\displaystyle{ \mathcal{O}(\sqrt{n}) }[/math].
After [math]\displaystyle{ \lceil\sqrt{n}\rceil }[/math] iterations, any augmenting path has a length of at least [math]\displaystyle{ \lceil\sqrt{n}\rceil }[/math]. Consequently, there cannot be more than [math]\displaystyle{ \lfloor\sqrt{n}\rfloor }[/math] node-disjoint augmenting paths at this moment. The remark on the proof of Berge's theorem implies that the maximal cardinality of a matching is at most [math]\displaystyle{ |M|+\lfloor\sqrt{n}\rfloor }[/math]. Therefore, [math]\displaystyle{ |M| }[/math] is maximal after another [math]\displaystyle{ \lfloor\sqrt{n}\rfloor }[/math] iterations. The claim follows.