Breadth-first search: Difference between revisions
(56 intermediate revisions by 5 users not shown) | |||
Line 1: | Line 1: | ||
[[Category:Algorithms]] | |||
[[Category:Search Algorithms]] | |||
[[Category:Tree Algorithms]] | |||
[[Category:Graph Traversal]] | |||
== General information == | |||
'''Algorithmic problem:''' [[Graph traversal]] | |||
'''Type of algorithm:''' loop | |||
== Abstract view == | |||
'''Definition:''' | |||
On this page, the '''distance''' of a node <math>v\in V</math> is the minimal number of arcs on a path from the start node <math>s</math> to <math>v</math>. | |||
'''Additional output:''' An [[Basic graph definitions#Forests, trees, branchings, arborescences|arborescence]] <math>A=(V',A')</math> rooted at <math>s</math> such that <math>V'\subseteq V</math> is the set of all nodes reachable from <math>s</math> (including <math>s</math>). For each node <math>v\in V'</math>, the path from <math>s</math> to <math>v</math> in <math>A</math> is a shortest <math>(s,v)</math>-path in <math>G</math> with respect to that notion of distance. | |||
'''Specific characteristic:''' | |||
The nodes are finished in the order of increasing distance (which is not unique, in general). | |||
'''Auxiliary data:''' | |||
A [[Sets and sequences#Stacks and queues|FIFO queue]] <math>Q</math> whose elements are nodes in <math>V</math>. | |||
'''Invariant:''' | |||
Before and after each iteration: | |||
# There is a '''current distance''' <math>k\in\mathbb{N}</math>. | |||
# All nodes with distance at most <math>k</math> are already seen. | |||
# Let <math>n</math> denote the current size of <math>Q</math>. There is <math>\ell\in\{1,\ldots,n\}</math> such that the first <math>\ell</math> elements of <math>Q</math> have distance <math>k</math> and the last <math>n-\ell</math> elements have distance <math>k+1</math> (in particular, ''all'' elements have distance <math>k</math> if, and only if, it is <math>\ell=n</math>). | |||
'''Variant:''' One node is removed from <math>Q</math>. | |||
'''Break condition:''' <math>Q=\emptyset</math>. | |||
== Induction basis == | |||
'''Abstract view:''' The start node is seen, no other node is seen. The start node is the only element of <math>Q</math>. The output sequence is empty and the arborescence <math>A</math> contains the start node <math>s</math> and nothing else. | |||
'''Implementation:''' Obvious. | |||
'''Proof:''' Obvious. | |||
== Induction step == | |||
'''Abstract view:''' | |||
# Extract the first element <math>v</math> from <math>Q</math>. | |||
# Append <math>v</math> to the output sequence. | |||
# For each outgoing arc <math>(v,w)</math> of <math>v</math> such that <math>w</math> is not yet seen: | |||
## Insert <math>w</math> and <math>(v,w)</math> in <math>A</math>. | |||
## Label <math>w</math> as seen. | |||
## Append <math>w</math> to <math>Q</math>. | |||
'''Implementation:''' Obvious. | |||
'''Proof:''' | |||
The variant is obviously fulfilled. The first point of the invariant is only a notation, so nothing is to show for that. | |||
Let <math>w\in V</math> not yet seen before the current iteration, and let <math>(v,w)\in A</math>. The induction hypothesis (second point of the invariant) implies that the distance of <math>w</math> is greater than <math>k</math>. However, <math>v</math> has distance <math>k</math>, so the distance of <math>w</math> cannot be greater than <math>k+1</math>. Im summary, this proves the third point of the invariant. | |||
The critical iterations for the second invariant are those where <math>k</math> increases. These are exactly the iterations in which <math>\ell=1</math> immediately before (and, consequently, <math>\ell=n</math> immediately after) that iteration. All nodes with old distance <math>k</math> have been seen and no such node is in <math>Q</math> anymore after the current iteration. Therefore, all nodes <math>w</math> such that <math>(v,w)\in A</math> for some <math>v</math> with distance <math>k</math> have been seen as well. Clearly, this includes all nodes with old distance <math>k+1</math>, in other words, the nodes with new distance <math>k</math>. | |||
== Correctness == | |||
It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Based on a straightforward induction on the iterations, the third invariant ensures that the nodes leave <math>Q</math> in ascending order of their distances. Thus, it remains to show that ''all'' nodes that are reachable from <math>s</math> have been seen. | |||
Suppose some node <math>v</math> is reachable from <math>s</math> but has not been seen. Let <math>p</math> be an <math>(s,v)</math>-path. There is an arc <math>(x,y)</math> on <math>p</math> such that <math>x</math> is seen and <math>y</math> is not. However, since <math>x</math> is seen, <math>x</math> has been inserted in <math>Q</math> in some iteration and, consequently, removed from <math>Q</math> in a later iteration. However, in that later iteration, <math>y</math> had been seen. | |||
== Complexity == | |||
'''Statement:''' The asymptotic complexity is in <math>\Theta(|V|+|A|)</math> in the worst case. | |||
'''Proof:''' | |||
The complexity of each iteration is linear in the number of arcs leaving the current node <math>v</math>. Therefore, the complexity of the entire loop is in <math>\Theta(|A|)</math>. The complexity of the initialization is dominated by the initialization of the seen labels of all nodes, which is clearly in <math>\Theta(|V|)</math>. | |||
== Remark == | |||
The nodes in <math>Q</math> form kind of a "frontier line", which separates the nodes that already left <math>Q</math> from the nodes that have not yet entered <math>Q</math> so far. | |||
More specifically, each path from some seen node to some unseen node contains at least one node that is currently stored in <math>Q</math>. | |||
To see that, we apply a straightforward induction on the iterations. Let <math>p</math> be a path from some seen node outside <math>Q</math> to some node that is not yet seen immediately after the current iteration. If <math>p</math> does not contain <math>v</math>, the claim follows from the induction hypothesis. So assume <math>p</math> contains <math>v</math>. Let <math>x</math> be the last node on <math>p</math> that is already seen immediately after the current iteration. If <math>x=v</math>, the claim follows from the fact that all unseen nodes <math>w\in V</math> with <math>(v,w)\in A</math> were put in <math>Q</math>, so the immediate successor of <math>v</math> on <math>p</math> is in <math>Q</math> now. Otherwise, the claim follows from the induction hypothesis again. | |||
== Pseudocode == | == Pseudocode == | ||
<code> | <code> | ||
BFS(''G, s'') | BFS(''G,s'') | ||
1 '''for''' each vertex ''u'' ∈ ''G.V'' - {''s''} | 1 '''for''' each vertex ''u'' ∈ ''G.V'' - {''s''} | ||
2 ''u.color'' = WHITE | 2 ''u.color'' = WHITE | ||
Line 12: | Line 93: | ||
8 ''Q'' = Ø | 8 ''Q'' = Ø | ||
9 ENQUE(''Q, s'') | 9 ENQUE(''Q, s'') | ||
10 '''while''' ''Q'' ≠ Ø | 10 '''while''' ''Q'' ≠ Ø | ||
11 ''u'' = DEQUEUE(''Q'') | 11 ''u'' = DEQUEUE(''Q'') | ||
12 '''for''' each ''v'' ∈ ''G.Adj''[''u''] | 12 '''for''' each ''v'' ∈ ''G.Adj''[''u''] |
Latest revision as of 07:50, 26 October 2015
General information
Algorithmic problem: Graph traversal
Type of algorithm: loop
Abstract view
Definition: On this page, the distance of a node [math]\displaystyle{ v\in V }[/math] is the minimal number of arcs on a path from the start node [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math].
Additional output: An arborescence [math]\displaystyle{ A=(V',A') }[/math] rooted at [math]\displaystyle{ s }[/math] such that [math]\displaystyle{ V'\subseteq V }[/math] is the set of all nodes reachable from [math]\displaystyle{ s }[/math] (including [math]\displaystyle{ s }[/math]). For each node [math]\displaystyle{ v\in V' }[/math], the path from [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math] in [math]\displaystyle{ A }[/math] is a shortest [math]\displaystyle{ (s,v) }[/math]-path in [math]\displaystyle{ G }[/math] with respect to that notion of distance.
Specific characteristic: The nodes are finished in the order of increasing distance (which is not unique, in general).
Auxiliary data: A FIFO queue [math]\displaystyle{ Q }[/math] whose elements are nodes in [math]\displaystyle{ V }[/math].
Invariant: Before and after each iteration:
- There is a current distance [math]\displaystyle{ k\in\mathbb{N} }[/math].
- All nodes with distance at most [math]\displaystyle{ k }[/math] are already seen.
- Let [math]\displaystyle{ n }[/math] denote the current size of [math]\displaystyle{ Q }[/math]. There is [math]\displaystyle{ \ell\in\{1,\ldots,n\} }[/math] such that the first [math]\displaystyle{ \ell }[/math] elements of [math]\displaystyle{ Q }[/math] have distance [math]\displaystyle{ k }[/math] and the last [math]\displaystyle{ n-\ell }[/math] elements have distance [math]\displaystyle{ k+1 }[/math] (in particular, all elements have distance [math]\displaystyle{ k }[/math] if, and only if, it is [math]\displaystyle{ \ell=n }[/math]).
Variant: One node is removed from [math]\displaystyle{ Q }[/math].
Break condition: [math]\displaystyle{ Q=\emptyset }[/math].
Induction basis
Abstract view: The start node is seen, no other node is seen. The start node is the only element of [math]\displaystyle{ Q }[/math]. The output sequence is empty and the arborescence [math]\displaystyle{ A }[/math] contains the start node [math]\displaystyle{ s }[/math] and nothing else.
Implementation: Obvious.
Proof: Obvious.
Induction step
Abstract view:
- Extract the first element [math]\displaystyle{ v }[/math] from [math]\displaystyle{ Q }[/math].
- Append [math]\displaystyle{ v }[/math] to the output sequence.
- For each outgoing arc [math]\displaystyle{ (v,w) }[/math] of [math]\displaystyle{ v }[/math] such that [math]\displaystyle{ w }[/math] is not yet seen:
- Insert [math]\displaystyle{ w }[/math] and [math]\displaystyle{ (v,w) }[/math] in [math]\displaystyle{ A }[/math].
- Label [math]\displaystyle{ w }[/math] as seen.
- Append [math]\displaystyle{ w }[/math] to [math]\displaystyle{ Q }[/math].
Implementation: Obvious.
Proof: The variant is obviously fulfilled. The first point of the invariant is only a notation, so nothing is to show for that.
Let [math]\displaystyle{ w\in V }[/math] not yet seen before the current iteration, and let [math]\displaystyle{ (v,w)\in A }[/math]. The induction hypothesis (second point of the invariant) implies that the distance of [math]\displaystyle{ w }[/math] is greater than [math]\displaystyle{ k }[/math]. However, [math]\displaystyle{ v }[/math] has distance [math]\displaystyle{ k }[/math], so the distance of [math]\displaystyle{ w }[/math] cannot be greater than [math]\displaystyle{ k+1 }[/math]. Im summary, this proves the third point of the invariant.
The critical iterations for the second invariant are those where [math]\displaystyle{ k }[/math] increases. These are exactly the iterations in which [math]\displaystyle{ \ell=1 }[/math] immediately before (and, consequently, [math]\displaystyle{ \ell=n }[/math] immediately after) that iteration. All nodes with old distance [math]\displaystyle{ k }[/math] have been seen and no such node is in [math]\displaystyle{ Q }[/math] anymore after the current iteration. Therefore, all nodes [math]\displaystyle{ w }[/math] such that [math]\displaystyle{ (v,w)\in A }[/math] for some [math]\displaystyle{ v }[/math] with distance [math]\displaystyle{ k }[/math] have been seen as well. Clearly, this includes all nodes with old distance [math]\displaystyle{ k+1 }[/math], in other words, the nodes with new distance [math]\displaystyle{ k }[/math].
Correctness
It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Based on a straightforward induction on the iterations, the third invariant ensures that the nodes leave [math]\displaystyle{ Q }[/math] in ascending order of their distances. Thus, it remains to show that all nodes that are reachable from [math]\displaystyle{ s }[/math] have been seen.
Suppose some node [math]\displaystyle{ v }[/math] is reachable from [math]\displaystyle{ s }[/math] but has not been seen. Let [math]\displaystyle{ p }[/math] be an [math]\displaystyle{ (s,v) }[/math]-path. There is an arc [math]\displaystyle{ (x,y) }[/math] on [math]\displaystyle{ p }[/math] such that [math]\displaystyle{ x }[/math] is seen and [math]\displaystyle{ y }[/math] is not. However, since [math]\displaystyle{ x }[/math] is seen, [math]\displaystyle{ x }[/math] has been inserted in [math]\displaystyle{ Q }[/math] in some iteration and, consequently, removed from [math]\displaystyle{ Q }[/math] in a later iteration. However, in that later iteration, [math]\displaystyle{ y }[/math] had been seen.
Complexity
Statement: The asymptotic complexity is in [math]\displaystyle{ \Theta(|V|+|A|) }[/math] in the worst case.
Proof: The complexity of each iteration is linear in the number of arcs leaving the current node [math]\displaystyle{ v }[/math]. Therefore, the complexity of the entire loop is in [math]\displaystyle{ \Theta(|A|) }[/math]. The complexity of the initialization is dominated by the initialization of the seen labels of all nodes, which is clearly in [math]\displaystyle{ \Theta(|V|) }[/math].
Remark
The nodes in [math]\displaystyle{ Q }[/math] form kind of a "frontier line", which separates the nodes that already left [math]\displaystyle{ Q }[/math] from the nodes that have not yet entered [math]\displaystyle{ Q }[/math] so far. More specifically, each path from some seen node to some unseen node contains at least one node that is currently stored in [math]\displaystyle{ Q }[/math].
To see that, we apply a straightforward induction on the iterations. Let [math]\displaystyle{ p }[/math] be a path from some seen node outside [math]\displaystyle{ Q }[/math] to some node that is not yet seen immediately after the current iteration. If [math]\displaystyle{ p }[/math] does not contain [math]\displaystyle{ v }[/math], the claim follows from the induction hypothesis. So assume [math]\displaystyle{ p }[/math] contains [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ x }[/math] be the last node on [math]\displaystyle{ p }[/math] that is already seen immediately after the current iteration. If [math]\displaystyle{ x=v }[/math], the claim follows from the fact that all unseen nodes [math]\displaystyle{ w\in V }[/math] with [math]\displaystyle{ (v,w)\in A }[/math] were put in [math]\displaystyle{ Q }[/math], so the immediate successor of [math]\displaystyle{ v }[/math] on [math]\displaystyle{ p }[/math] is in [math]\displaystyle{ Q }[/math] now. Otherwise, the claim follows from the induction hypothesis again.
Pseudocode
BFS(G,s)
1 for each vertex u ∈ G.V - {s}
2 u.color = WHITE
3 u.d = ∞
4 u.π = NIL
5 s.color = GRAY
6 s.d = 0
7 s.π = NIL
8 Q = Ø
9 ENQUE(Q, s)
10 while Q ≠ Ø
11 u = DEQUEUE(Q)
12 for each v ∈ G.Adj[u]
13 if v.color == WHITE
14 v.color = GRAY
15 v.d = u.d + 1
16 v.π = u
17 ENQUEUE(Q, v)
18 u.color = BLACK