Breadth-first search: Difference between revisions

From Algowiki
Jump to navigation Jump to search
No edit summary
 
(50 intermediate revisions by 3 users not shown)
Line 12: Line 12:
== Abstract view ==
== Abstract view ==


'''Definition:‘‘‘
'''Definition:'''
On this page, the '''distance''' of a node <math>v\in V</math> is the minimal number of arcs on a path from the start node <math>s</math> to <math>v</math>.
On this page, the '''distance''' of a node <math>v\in V</math> is the minimal number of arcs on a path from the start node <math>s</math> to <math>v</math>.
'''Additional output:''' An [[Basic graph definitions#Forests, trees, branchings, arborescences|arborescence]] <math>A=(V',A')</math> rooted at <math>s</math> such that <math>V'\subseteq V</math> is the set of all nodes reachable from <math>s</math> (including <math>s</math>). For each node <math>v\in V'</math>, the path from <math>s</math> to <math>v</math> in <math>A</math> is a shortest <math>(s,v)</math>-path in <math>G</math> with respect to that notion of distance.


'''Specific characteristic:'''
'''Specific characteristic:'''
The nodes are finished in the order of increasing distance.
The nodes are finished in the order of increasing distance (which is not unique, in general).


'''Auxiliary data:'''
'''Auxiliary data:'''
A queue <math>Q</math> whose elements are nodes in <math>V</math>.
A [[Sets and sequences#Stacks and queues|FIFO queue]] <math>Q</math> whose elements are nodes in <math>V</math>.


'''Invariant:'''  
'''Invariant:'''  
Before and after each iteration:
Before and after each iteration:
# There is a '''current distance''' <math>k\in\mathbb{N}</math>.
# There is a '''current distance''' <math>k\in\mathbb{N}</math>.
# Let <math>n</math> denote the current size of <math>Q</math>. There is <math>\ell\in\{1,\ldots,n\}</math> such that the first <math>\ell</math> elements of <math>Q</math> have distance <math>k</math> and the last <math>n-\ell</math> elements have distance <math>k+1</math>.
# All nodes with distance at most <math>k</math> are already seen.
# Let <math>n</math> denote the current size of <math>Q</math>. There is <math>\ell\in\{1,\ldots,n\}</math> such that the first <math>\ell</math> elements of <math>Q</math> have distance <math>k</math> and the last <math>n-\ell</math> elements have distance <math>k+1</math> (in particular, ''all'' elements have distance <math>k</math> if, and only if, it is <math>\ell=n</math>).


'''Variant:''' One node is removed from <math>Q</math>.
'''Variant:''' One node is removed from <math>Q</math>.


'''Break condition:''' <math>S=\emptyset</math>.
'''Break condition:''' <math>Q=\emptyset</math>.


== Induction basis ==
== Induction basis ==


'''Abstract view:''' No node is finished. The start node is seen, no other node is seen. The start node is the only element of <math>S</math>. The current arc of the start node is its first outgoing arc. The arborescence <math>T</math> is initialized so as to contain all nodes and no arcs.
'''Abstract view:''' The start node is seen, no other node is seen. The start node is the only element of <math>Q</math>. The output sequence is empty and the arborescence <math>A</math> contains the start node <math>s</math> and nothing else.


'''Implementation:''' Obvious.
'''Implementation:''' Obvious.
Line 41: Line 44:


'''Abstract view:'''
'''Abstract view:'''
# Let <math>v</math> be the last node of <math>p</math> (=the top element of <math>S</math>).
# Extract the first element <math>v</math> from <math>Q</math>.
# While the current arc of <math>v</math> is not void and the head of the current arc is labeled as seen, move the current arc one step forward.
# Append <math>v</math> to the output sequence.
# If the current arc of <math>v</math> is not void, add the current arc to <math>T</math>, push the head of the current arc on <math>S</math> and label this node as seen.
# For each outgoing arc <math>(v,w)</math> of <math>v</math> such that <math>w</math> is not yet seen:
# Otherwise, remove <math>v</math> from <math>S</math> and label <math>v</math> as finished.
## Insert <math>w</math> and <math>(v,w)</math> in <math>A</math>.
## Label <math>w</math> as seen.
## Append <math>w</math> to <math>Q</math>.


'''Implementation:''' Obvious.
'''Implementation:''' Obvious.


'''Proof:'''
'''Proof:'''
The loop variant is obviously fulfilled.
The variant is obviously fulfilled. The first point of the invariant is only a notation, so nothing is to show for that.


The first point of the invariant is obviously fulfilled. The second point follows from the fact that the current arc of a node is initialized accordingly and only changed after the node is labele as seen. Point 2.1 follows from the fact that the current arc never skips an arc to an unseen node.
Let <math>w\in V</math> not yet seen before the current iteration, and let <math>(v,w)\in A</math>. The induction hypothesis (second point of the invariant) implies that the distance of <math>w</math> is greater than <math>k</math>. However, <math>v</math> has distance <math>k</math>, so the distance of <math>w</math> cannot be greater than <math>k+1</math>. Im summary, this proves the third point of the invariant.


For a contradiction to 2.2, note that any lexicographically smaller path <math>p'</math> has at least one node that is lexicographically smaller than <math>p</math>. By induction hypothesis (point 4), these nodes are finished. Let <math>w</math> denote the last finished node on <math>p'</math>. Since <math>v</math> is not finished immediately before the iteration, <math>w</math> has a successor on <math>p'</math>, which is unfinished. However, obviously, a nodes is marked finished not before all of its immediate successors are finished.
The critical iterations for the second invariant are those where <math>k</math> increases. These are exactly the iterations in which <math>\ell=1</math> immediately before (and, consequently, <math>\ell=n</math> immediately after) that iteration. All nodes with old distance <math>k</math> have been seen and no such node is in <math>Q</math> anymore after the current iteration. Therefore, all nodes <math>w</math> such that <math>(v,w)\in A</math> for some <math>v</math> with distance <math>k</math> have been seen as well. Clearly, this includes all nodes with old distance <math>k+1</math>, in other words, the nodes with new distance <math>k</math>.
 
When a node is pushed on <math>S</math>, it is neither seen nor finished immediately before that iteration and labaled as seen in that iteration. This proves the first sentence. The other statements of point 4 follow from the observation that <math>p</math> increases lexicographically in each iteration.


== Correctness ==
== Correctness ==


It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Immediately before the last iteration, <math>p</math> consists of the start node <math>s</math> only, and the current arc of <math>s</math> is void. Therefore, ''all'' nodes reachable from <math>s</math> except for <math>s</math> itself are lexicographically smaller than <math>p</math>. Due to point 4 of the invariant, all of these nodes are finished. In the last iteration, <math>s</math> is finished as well.
It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Based on a straightforward induction on the iterations, the third invariant ensures that the nodes leave <math>Q</math> in ascending order of their distances. Thus, it remains to show that ''all'' nodes that are reachable from <math>s</math> have been seen.


Suppose for a contradictin that the specific characteristic is not fulfilled by <math>v,w\in V</math>. That is, <math>v</math> is seen and finished before <math>w</math>. Consider the situation immediately after the iteration in which <math>v</math> is finished. So, in this moment, <math>w</math> is not yet finished. By assumption, there is a path from <math>v</math> to <math>w</math> in <math>G</math>. There are two successive nodes on this path, <math>x</math> and <math>y</math>, such that <math>x</math> is finished and <math>y</math> is not at that moment. However, obviously, <math>x</math> cannot be finished prior to <math>y</math>.
Suppose some node <math>v</math> is reachable from <math>s</math> but has not been seen. Let <math>p</math> be an <math>(s,v)</math>-path. There is an arc <math>(x,y)</math> on <math>p</math> such that <math>x</math> is seen and <math>y</math> is not. However, since <math>x</math> is seen, <math>x</math> has been inserted in <math>Q</math> in some iteration and, consequently, removed from <math>Q</math> in a later iteration. However, in that later iteration, <math>y</math> had been seen.


== Complexity ==
== Complexity ==


'''Statement:''' The asymptotic complexity is in <math>\Theta(|V|+|A|)</math> in the best and worst case.
'''Statement:''' The asymptotic complexity is in <math>\Theta(|V|+|A|)</math> in the worst case.
 
'''Proof:'''
The complexity of each iteration is linear in the number of arcs leaving the current node <math>v</math>. Therefore, the complexity of the entire loop is in <math>\Theta(|A|)</math>.  The complexity of the initialization is dominated by the initialization of the seen labels of all nodes, which is clearly in <math>\Theta(|V|)</math>.
 
== Remark ==


'''Proof:''' From every node, the algorithm goes forward at most once for each of its outgoing arcs. And from each node, th algorithm goes backwards only once. Obviously, each of these steps requires a constant number of operations.
The nodes in <math>Q</math> form kind of a "frontier line", which separates the nodes that already left <math>Q</math> from the nodes that have not yet entered <math>Q</math> so far.
More specifically, each path from some seen node to some unseen node contains at least one node that is currently stored in <math>Q</math>.


To see that, we apply a straightforward induction on the iterations. Let <math>p</math> be a path from some seen node outside <math>Q</math> to some node that is not yet seen immediately after the current iteration. If <math>p</math> does not contain <math>v</math>, the claim follows from the induction hypothesis. So assume <math>p</math> contains <math>v</math>. Let <math>x</math> be the last node on <math>p</math> that is already seen immediately after the current iteration. If <math>x=v</math>, the claim follows from the fact that all unseen nodes <math>w\in V</math> with <math>(v,w)\in A</math> were put in <math>Q</math>, so the immediate successor of <math>v</math> on <math>p</math> is in <math>Q</math> now. Otherwise, the claim follows from the induction hypothesis again.


== Pseudocode ==  
== Pseudocode ==  

Latest revision as of 07:50, 26 October 2015


General information

Algorithmic problem: Graph traversal

Type of algorithm: loop

Abstract view

Definition: On this page, the distance of a node [math]\displaystyle{ v\in V }[/math] is the minimal number of arcs on a path from the start node [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math].

Additional output: An arborescence [math]\displaystyle{ A=(V',A') }[/math] rooted at [math]\displaystyle{ s }[/math] such that [math]\displaystyle{ V'\subseteq V }[/math] is the set of all nodes reachable from [math]\displaystyle{ s }[/math] (including [math]\displaystyle{ s }[/math]). For each node [math]\displaystyle{ v\in V' }[/math], the path from [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math] in [math]\displaystyle{ A }[/math] is a shortest [math]\displaystyle{ (s,v) }[/math]-path in [math]\displaystyle{ G }[/math] with respect to that notion of distance.

Specific characteristic: The nodes are finished in the order of increasing distance (which is not unique, in general).

Auxiliary data: A FIFO queue [math]\displaystyle{ Q }[/math] whose elements are nodes in [math]\displaystyle{ V }[/math].

Invariant: Before and after each iteration:

  1. There is a current distance [math]\displaystyle{ k\in\mathbb{N} }[/math].
  2. All nodes with distance at most [math]\displaystyle{ k }[/math] are already seen.
  3. Let [math]\displaystyle{ n }[/math] denote the current size of [math]\displaystyle{ Q }[/math]. There is [math]\displaystyle{ \ell\in\{1,\ldots,n\} }[/math] such that the first [math]\displaystyle{ \ell }[/math] elements of [math]\displaystyle{ Q }[/math] have distance [math]\displaystyle{ k }[/math] and the last [math]\displaystyle{ n-\ell }[/math] elements have distance [math]\displaystyle{ k+1 }[/math] (in particular, all elements have distance [math]\displaystyle{ k }[/math] if, and only if, it is [math]\displaystyle{ \ell=n }[/math]).

Variant: One node is removed from [math]\displaystyle{ Q }[/math].

Break condition: [math]\displaystyle{ Q=\emptyset }[/math].

Induction basis

Abstract view: The start node is seen, no other node is seen. The start node is the only element of [math]\displaystyle{ Q }[/math]. The output sequence is empty and the arborescence [math]\displaystyle{ A }[/math] contains the start node [math]\displaystyle{ s }[/math] and nothing else.

Implementation: Obvious.

Proof: Obvious.

Induction step

Abstract view:

  1. Extract the first element [math]\displaystyle{ v }[/math] from [math]\displaystyle{ Q }[/math].
  2. Append [math]\displaystyle{ v }[/math] to the output sequence.
  3. For each outgoing arc [math]\displaystyle{ (v,w) }[/math] of [math]\displaystyle{ v }[/math] such that [math]\displaystyle{ w }[/math] is not yet seen:
    1. Insert [math]\displaystyle{ w }[/math] and [math]\displaystyle{ (v,w) }[/math] in [math]\displaystyle{ A }[/math].
    2. Label [math]\displaystyle{ w }[/math] as seen.
    3. Append [math]\displaystyle{ w }[/math] to [math]\displaystyle{ Q }[/math].

Implementation: Obvious.

Proof: The variant is obviously fulfilled. The first point of the invariant is only a notation, so nothing is to show for that.

Let [math]\displaystyle{ w\in V }[/math] not yet seen before the current iteration, and let [math]\displaystyle{ (v,w)\in A }[/math]. The induction hypothesis (second point of the invariant) implies that the distance of [math]\displaystyle{ w }[/math] is greater than [math]\displaystyle{ k }[/math]. However, [math]\displaystyle{ v }[/math] has distance [math]\displaystyle{ k }[/math], so the distance of [math]\displaystyle{ w }[/math] cannot be greater than [math]\displaystyle{ k+1 }[/math]. Im summary, this proves the third point of the invariant.

The critical iterations for the second invariant are those where [math]\displaystyle{ k }[/math] increases. These are exactly the iterations in which [math]\displaystyle{ \ell=1 }[/math] immediately before (and, consequently, [math]\displaystyle{ \ell=n }[/math] immediately after) that iteration. All nodes with old distance [math]\displaystyle{ k }[/math] have been seen and no such node is in [math]\displaystyle{ Q }[/math] anymore after the current iteration. Therefore, all nodes [math]\displaystyle{ w }[/math] such that [math]\displaystyle{ (v,w)\in A }[/math] for some [math]\displaystyle{ v }[/math] with distance [math]\displaystyle{ k }[/math] have been seen as well. Clearly, this includes all nodes with old distance [math]\displaystyle{ k+1 }[/math], in other words, the nodes with new distance [math]\displaystyle{ k }[/math].

Correctness

It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Based on a straightforward induction on the iterations, the third invariant ensures that the nodes leave [math]\displaystyle{ Q }[/math] in ascending order of their distances. Thus, it remains to show that all nodes that are reachable from [math]\displaystyle{ s }[/math] have been seen.

Suppose some node [math]\displaystyle{ v }[/math] is reachable from [math]\displaystyle{ s }[/math] but has not been seen. Let [math]\displaystyle{ p }[/math] be an [math]\displaystyle{ (s,v) }[/math]-path. There is an arc [math]\displaystyle{ (x,y) }[/math] on [math]\displaystyle{ p }[/math] such that [math]\displaystyle{ x }[/math] is seen and [math]\displaystyle{ y }[/math] is not. However, since [math]\displaystyle{ x }[/math] is seen, [math]\displaystyle{ x }[/math] has been inserted in [math]\displaystyle{ Q }[/math] in some iteration and, consequently, removed from [math]\displaystyle{ Q }[/math] in a later iteration. However, in that later iteration, [math]\displaystyle{ y }[/math] had been seen.

Complexity

Statement: The asymptotic complexity is in [math]\displaystyle{ \Theta(|V|+|A|) }[/math] in the worst case.

Proof: The complexity of each iteration is linear in the number of arcs leaving the current node [math]\displaystyle{ v }[/math]. Therefore, the complexity of the entire loop is in [math]\displaystyle{ \Theta(|A|) }[/math]. The complexity of the initialization is dominated by the initialization of the seen labels of all nodes, which is clearly in [math]\displaystyle{ \Theta(|V|) }[/math].

Remark

The nodes in [math]\displaystyle{ Q }[/math] form kind of a "frontier line", which separates the nodes that already left [math]\displaystyle{ Q }[/math] from the nodes that have not yet entered [math]\displaystyle{ Q }[/math] so far. More specifically, each path from some seen node to some unseen node contains at least one node that is currently stored in [math]\displaystyle{ Q }[/math].

To see that, we apply a straightforward induction on the iterations. Let [math]\displaystyle{ p }[/math] be a path from some seen node outside [math]\displaystyle{ Q }[/math] to some node that is not yet seen immediately after the current iteration. If [math]\displaystyle{ p }[/math] does not contain [math]\displaystyle{ v }[/math], the claim follows from the induction hypothesis. So assume [math]\displaystyle{ p }[/math] contains [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ x }[/math] be the last node on [math]\displaystyle{ p }[/math] that is already seen immediately after the current iteration. If [math]\displaystyle{ x=v }[/math], the claim follows from the fact that all unseen nodes [math]\displaystyle{ w\in V }[/math] with [math]\displaystyle{ (v,w)\in A }[/math] were put in [math]\displaystyle{ Q }[/math], so the immediate successor of [math]\displaystyle{ v }[/math] on [math]\displaystyle{ p }[/math] is in [math]\displaystyle{ Q }[/math] now. Otherwise, the claim follows from the induction hypothesis again.

Pseudocode

BFS(G,s)
 1  for each vertex uG.V - {s}
 2       u.color = WHITE
 3       u.d = ∞
 4       u.π = NIL
 5  s.color = GRAY
 6  s.d = 0
 7  s.π = NIL
 8  Q = Ø
 9  ENQUE(Q, s)
10  while Q ≠ Ø
11      u = DEQUEUE(Q)
12      for each vG.Adj[u]
13              if v.color == WHITE
14                       v.color = GRAY
15                       v.d = u.d + 1
16                       v.π = u
17                       ENQUEUE(Q, v)
18      u.color = BLACK