Hopcroft-Karp: Difference between revisions

From Algowiki
Jump to navigation Jump to search
 
(19 intermediate revisions by the same user not shown)
Line 6: Line 6:
'''Type of algorithm:''' loop.
'''Type of algorithm:''' loop.


'''Auxiliary data structures:'''
'''Auxiliary data:'''
a [[Sets and sequences|set]] <math>S</math> of [[Breadth-first search|BFS]] runs as iterators (cf. the [[Graph traversal#Remarks|corresponding remark]]). The [[Matchings in graphs#Finding augmenting paths|modification to find augmenting paths]] is applied.
a [[Sets and sequences|set]] <math>S</math> of [[Breadth-first search|BFS]] runs as iterators (cf. the [[Graph traversal#Remarks|remark on iterators]]). The [[Matchings in graphs#Finding augmenting paths|modification to find augmenting paths]] is applied.


'''Invariant:''' After <math>i\geq 0</math> iterations:
'''Invariant:''' After <math>i\geq 0</math> iterations:
# <math>M</math> is a matching in <math>G</math>.
# <math>M</math> is a matching in <math>G</math>.
# All exposed nodes that are end nodes of augmenting paths, are in <math>S</math>.
# All exposed nodes that are end nodes of augmenting paths, are start nodes of BFS iterators in <math>S</math> (there may be start nodes in <math>S</math> that are ''not'' end nodes of augmenting paths).
# Each node in <math>S</math> is the root of an arborescence:
# Each node in <math>S</math> is the root of an arborescence:
## These arborescences are pairwise node-disjoint.
## These arborescences are pairwise node-disjoint.
Line 26: Line 26:
'''Abstract view:'''
'''Abstract view:'''
# Start with an arbitrary matching, for example, the empty matching.
# Start with an arbitrary matching, for example, the empty matching.
# Create and initialize one BFS run for each exposed node.
# Create and initialize one BFS run for each exposed node, and insert it in <math>S</math>.


== Induction step ==
== Induction step ==
Line 34: Line 34:
# Process all nodes of distance <math>i-1</math> in the queues of all BFS runs.
# Process all nodes of distance <math>i-1</math> in the queues of all BFS runs.
# For each BFS run: If it had no nodes to process in step 1, remove it from <math>S</math>.
# For each BFS run: If it had no nodes to process in step 1, remove it from <math>S</math>.
# For each edge <matH>\{v,w\}</math> that is seen by more than one:
# For each edge <math>\{v,w\}</math> that is seen by one BFS run from <math>v</math> and by one BFS run from <math>w</math>:
## Choose two such BFS runs and such an edge <math>\{v,w\}</math>, say.
## Choose two such BFS runs and such an edge <math>\{v,w\}</math>, say.
## [[Matchings in graphs#Alternating and augmenting paths|Augment]] <math>M</math> along the concatenation of the paths in the arborescences of these two BFS runs from the start nodes to <math>v</math> and <math>w</math>, respectively, and of <math>\{v,w\}</math>.
## [[Matchings in graphs#Alternating and augmenting paths|Augment]] <math>M</math> along the concatenation of the paths in the arborescences of these two BFS runs from the start nodes to <math>v</math> and <math>w</math>, respectively, and of <math>\{v,w\}</math>.
Line 40: Line 40:


'''Proof:'''
'''Proof:'''
The invariant is obviously maintained. To prove the variant, consider an execution of step 3.2. Let <math>p</math> denote the path along which the augmentation takes place. Let <math>p_1</math> and <math>p'_2</math> be the paths in the arborescences from the start nodes to <math>v</math> and <math>w</math>, respectively. So, it is <math>p=p_1+\{v,w\}+p_2</math>. Suppose augmenting along <math>p</math> has created a new augmenting path <math>q</math>. We have to show that <math>q</math> is longer than <math>p</math>.
The invariant is obviously maintained. To prove the variant, consider an execution of step 3.2. Let <math>p</math> denote the path along which the augmentation takes place. Let <math>p_1</math> and <math>p_2</math> be the paths in the two arborescences from the two start nodes to <math>v</math> and <math>w</math>, respectively. So, it is <math>p=p_1+\{v,w\}+p_2</math>. Suppose augmenting along <math>p</math> has created a new augmenting path <math>q</math>. We have to show that <math>q</math> is longer than <math>p</math>.


In either orientation of <math>p</math>, let <math>v'</math> be the first node of <math>p</math> that also belongs to <math>q</math>, and let <math>w'</math> be the last node of <math>p</math> that also belongs to <math>q</math>. Since <math>\{v,w\}</math> belongs to <math>q</math>, we may assume, without loss of generality, that <math>v</math> is on <math>p_1</math> and <math>w</math> is on <math>p_2</math>. Let <math>q_1</math> and <math>q_2</math> denote the ''loose ends'' of <math>q</math> up to <math>v</math> and <math>w</math>, respectively.
In the orientation of <math>p</math> from the start node of <math>p_1</math> to the start node of <math>p_2</math>, let <math>v'</math> be the first node of <math>p</math> that also belongs to <math>q</math>, and let <math>w'</math> be the last node of <math>p</math> that also belongs to <math>q</math>. Since <math>q</math> has become augmenting by the augmentation along <math>p</math>, <math>q</math> must share at least one edge with <math>p</math>, which  means <math>v'\neq w'</math>. Let <math>q_1</math> and <math>q_2</math> denote the ''loose ends'' of <math>q</math> up to <math>v</math> and <math>w</math>, respectively. If <math>v'</math> belongs to <math>p_1</math>, it must be <math>v'=v</math> because, otherwise, the concatenation <math>p_1+q_1</math> would yield an augmenting path that is shorter than <math>p</math>, which contradicts the induction hypothesis. Mirror-symmetrically, if <math>w'</math> belongs to <math>p_2</math>, it must be <math>w'=w</math>. In summary, <math>v'\neq w'</math> implies that <math>v'=v</math> and <math>w'=w</math> is the only possible scenario.


Note that <math>v'=v</math> must hold because, otherwise, the concatenation <math>p_1+q_1</math> would yield an augmenting path that is shorter than <math>p</math>. This would contradict the induction hypothesis. Analogously, we may conclude <math>w'=w</math>. Note that each edge of <math>q</math> that is incident to exactly one node on <math>p</math> must be unmatched. Therefore, the parities of the length of <math>p_1</math> and <math>q_1</math> are different. Again, if <math>q_1</math> were shorter than <math>p_1</math>, <math>p_1+q_1</math> were an augmenting path shorter than <math>p</math>. For the same reason, <math>q_2</math> is longer than <math>p_2</math>. In summary, <math>q</math> is longer than <math>p</math>.
Note that the edges of <math>q_1</math> and <math>q_2</math> incident to <math>v'</math> and <math>w'</math>, respectively, are unmatched because <math>v'</math> and <math>w'</math> are matched by edges of <math>p</math>. Immediately before the current iteration, the edges over which the BFS runs entered <math>v'</math> and <math>w'</math>, respectively, were matched. Therefore, the parities of the lengths of <math>p_1</math> and <math>q_1</math> are different. In particular, the lengths themselves are different. Again, if <math>q_1</math> were shorter than <math>p_1</math>, <math>p_1+q_1</math> were shorter than <math>p</math>. However, <math>p_1+q_1</math> was augmenting immediately before the current iteration, which contradicts the induction hypothesis. For the same reasons, <math>q_2</math> is longer than <math>p_2</math>. In summary, <math>q</math> is longer than <math>p</math>.


'''Remark:'''
'''Remark:'''
Line 56: Line 56:


'''Statement:'''
'''Statement:'''
The asymptotic complexity is in <math>\mathcal{O}(m\!\cdot\!\sqrt{n})</math>, where <math>n=|V|</math> and <math>m=E|</math>.
The asymptotic complexity is in <math>\mathcal{O}(m\!\cdot\!\sqrt{n})</math>, where <math>n=|V|</math> and <math>m=|E|</math>.


'''Proof:'''
'''Proof:'''
Each edge is touched at most twice in each iteration
Each edge is touched at most twice in each iteration, once from either incident node. Hence, it suffices to prove that the number of iterations is in <math>\mathcal{O}(\sqrt{n})</math>.
After <math>\lfloor\sqrt{n}\rfloor</math> iterations, any augmenting path has a length of at least <math>\lfloor\sqrt{n}\rfloor</math>. It suffices to show that all
 
After <math>\lceil\sqrt{n}\rceil</math> iterations, any augmenting path has a length of at least <math>\lceil\sqrt{n}\rceil</math>. Consequently, there cannot be more than <math>\lfloor\sqrt{n}\rfloor</math> node-disjoint augmenting paths at this moment. The remark on the proof of [[Cardinality-maximal matching#Berge's theorem|Berge's theorem]] implies that the maximal cardinality of a matching is at most <math>|M|+\lfloor\sqrt{n}\rfloor</math>. Therefore, <math>|M|</math> is maximal after another <math>\lfloor\sqrt{n}\rfloor</math> iterations. The claim follows.

Latest revision as of 15:00, 6 December 2014

Abstract view

Algorithmic problem: Cardinality-maximal matching in bipartite graphs.

Type of algorithm: loop.

Auxiliary data: a set [math]\displaystyle{ S }[/math] of BFS runs as iterators (cf. the remark on iterators). The modification to find augmenting paths is applied.

Invariant: After [math]\displaystyle{ i\geq 0 }[/math] iterations:

  1. [math]\displaystyle{ M }[/math] is a matching in [math]\displaystyle{ G }[/math].
  2. All exposed nodes that are end nodes of augmenting paths, are start nodes of BFS iterators in [math]\displaystyle{ S }[/math] (there may be start nodes in [math]\displaystyle{ S }[/math] that are not end nodes of augmenting paths).
  3. Each node in [math]\displaystyle{ S }[/math] is the root of an arborescence:
    1. These arborescences are pairwise node-disjoint.
    2. Each leaf of each arborescence has distance [math]\displaystyle{ i }[/math] form the root of this arborescence (both in [math]\displaystyle{ G }[/math] and in the arborescence).

Variant: The shortest length of an augmenting path in terms of the number of edges is increased.

Break condition: [math]\displaystyle{ S=\emptyset }[/math].

Induction basis

Abstract view:

  1. Start with an arbitrary matching, for example, the empty matching.
  2. Create and initialize one BFS run for each exposed node, and insert it in [math]\displaystyle{ S }[/math].

Induction step

Abstract view: In the [math]\displaystyle{ i }[/math]-th iteration:

  1. Process all nodes of distance [math]\displaystyle{ i-1 }[/math] in the queues of all BFS runs.
  2. For each BFS run: If it had no nodes to process in step 1, remove it from [math]\displaystyle{ S }[/math].
  3. For each edge [math]\displaystyle{ \{v,w\} }[/math] that is seen by one BFS run from [math]\displaystyle{ v }[/math] and by one BFS run from [math]\displaystyle{ w }[/math]:
    1. Choose two such BFS runs and such an edge [math]\displaystyle{ \{v,w\} }[/math], say.
    2. Augment [math]\displaystyle{ M }[/math] along the concatenation of the paths in the arborescences of these two BFS runs from the start nodes to [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math], respectively, and of [math]\displaystyle{ \{v,w\} }[/math].
    3. Remove these two BFS runs from the set of all BFS runs.

Proof: The invariant is obviously maintained. To prove the variant, consider an execution of step 3.2. Let [math]\displaystyle{ p }[/math] denote the path along which the augmentation takes place. Let [math]\displaystyle{ p_1 }[/math] and [math]\displaystyle{ p_2 }[/math] be the paths in the two arborescences from the two start nodes to [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math], respectively. So, it is [math]\displaystyle{ p=p_1+\{v,w\}+p_2 }[/math]. Suppose augmenting along [math]\displaystyle{ p }[/math] has created a new augmenting path [math]\displaystyle{ q }[/math]. We have to show that [math]\displaystyle{ q }[/math] is longer than [math]\displaystyle{ p }[/math].

In the orientation of [math]\displaystyle{ p }[/math] from the start node of [math]\displaystyle{ p_1 }[/math] to the start node of [math]\displaystyle{ p_2 }[/math], let [math]\displaystyle{ v' }[/math] be the first node of [math]\displaystyle{ p }[/math] that also belongs to [math]\displaystyle{ q }[/math], and let [math]\displaystyle{ w' }[/math] be the last node of [math]\displaystyle{ p }[/math] that also belongs to [math]\displaystyle{ q }[/math]. Since [math]\displaystyle{ q }[/math] has become augmenting by the augmentation along [math]\displaystyle{ p }[/math], [math]\displaystyle{ q }[/math] must share at least one edge with [math]\displaystyle{ p }[/math], which means [math]\displaystyle{ v'\neq w' }[/math]. Let [math]\displaystyle{ q_1 }[/math] and [math]\displaystyle{ q_2 }[/math] denote the loose ends of [math]\displaystyle{ q }[/math] up to [math]\displaystyle{ v }[/math] and [math]\displaystyle{ w }[/math], respectively. If [math]\displaystyle{ v' }[/math] belongs to [math]\displaystyle{ p_1 }[/math], it must be [math]\displaystyle{ v'=v }[/math] because, otherwise, the concatenation [math]\displaystyle{ p_1+q_1 }[/math] would yield an augmenting path that is shorter than [math]\displaystyle{ p }[/math], which contradicts the induction hypothesis. Mirror-symmetrically, if [math]\displaystyle{ w' }[/math] belongs to [math]\displaystyle{ p_2 }[/math], it must be [math]\displaystyle{ w'=w }[/math]. In summary, [math]\displaystyle{ v'\neq w' }[/math] implies that [math]\displaystyle{ v'=v }[/math] and [math]\displaystyle{ w'=w }[/math] is the only possible scenario.

Note that the edges of [math]\displaystyle{ q_1 }[/math] and [math]\displaystyle{ q_2 }[/math] incident to [math]\displaystyle{ v' }[/math] and [math]\displaystyle{ w' }[/math], respectively, are unmatched because [math]\displaystyle{ v' }[/math] and [math]\displaystyle{ w' }[/math] are matched by edges of [math]\displaystyle{ p }[/math]. Immediately before the current iteration, the edges over which the BFS runs entered [math]\displaystyle{ v' }[/math] and [math]\displaystyle{ w' }[/math], respectively, were matched. Therefore, the parities of the lengths of [math]\displaystyle{ p_1 }[/math] and [math]\displaystyle{ q_1 }[/math] are different. In particular, the lengths themselves are different. Again, if [math]\displaystyle{ q_1 }[/math] were shorter than [math]\displaystyle{ p_1 }[/math], [math]\displaystyle{ p_1+q_1 }[/math] were shorter than [math]\displaystyle{ p }[/math]. However, [math]\displaystyle{ p_1+q_1 }[/math] was augmenting immediately before the current iteration, which contradicts the induction hypothesis. For the same reasons, [math]\displaystyle{ q_2 }[/math] is longer than [math]\displaystyle{ p_2 }[/math]. In summary, [math]\displaystyle{ q }[/math] is longer than [math]\displaystyle{ p }[/math].

Remark: The union of all arborescences at termination is called a Hungarian forest in the literature. The algorithm proves, constructively, that a matching in a bipartite graph is cardinality-maximal if, and only if, a Hungarian forest exists.

Correctness

Due to the invariant, the matching is cardinality-maximal if the main loop terminates. Termination of the main loop results from the following complexity considerations.

Complexity

Statement: The asymptotic complexity is in [math]\displaystyle{ \mathcal{O}(m\!\cdot\!\sqrt{n}) }[/math], where [math]\displaystyle{ n=|V| }[/math] and [math]\displaystyle{ m=|E| }[/math].

Proof: Each edge is touched at most twice in each iteration, once from either incident node. Hence, it suffices to prove that the number of iterations is in [math]\displaystyle{ \mathcal{O}(\sqrt{n}) }[/math].

After [math]\displaystyle{ \lceil\sqrt{n}\rceil }[/math] iterations, any augmenting path has a length of at least [math]\displaystyle{ \lceil\sqrt{n}\rceil }[/math]. Consequently, there cannot be more than [math]\displaystyle{ \lfloor\sqrt{n}\rfloor }[/math] node-disjoint augmenting paths at this moment. The remark on the proof of Berge's theorem implies that the maximal cardinality of a matching is at most [math]\displaystyle{ |M|+\lfloor\sqrt{n}\rfloor }[/math]. Therefore, [math]\displaystyle{ |M| }[/math] is maximal after another [math]\displaystyle{ \lfloor\sqrt{n}\rfloor }[/math] iterations. The claim follows.