Breadth-first search

From Algowiki
Revision as of 07:50, 26 October 2015 by Weihe (talk | contribs) (→‎Abstract view)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search


General information

Algorithmic problem: Graph traversal

Type of algorithm: loop

Abstract view

Definition: On this page, the distance of a node [math]\displaystyle{ v\in V }[/math] is the minimal number of arcs on a path from the start node [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math].

Additional output: An arborescence [math]\displaystyle{ A=(V',A') }[/math] rooted at [math]\displaystyle{ s }[/math] such that [math]\displaystyle{ V'\subseteq V }[/math] is the set of all nodes reachable from [math]\displaystyle{ s }[/math] (including [math]\displaystyle{ s }[/math]). For each node [math]\displaystyle{ v\in V' }[/math], the path from [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math] in [math]\displaystyle{ A }[/math] is a shortest [math]\displaystyle{ (s,v) }[/math]-path in [math]\displaystyle{ G }[/math] with respect to that notion of distance.

Specific characteristic: The nodes are finished in the order of increasing distance (which is not unique, in general).

Auxiliary data: A FIFO queue [math]\displaystyle{ Q }[/math] whose elements are nodes in [math]\displaystyle{ V }[/math].

Invariant: Before and after each iteration:

  1. There is a current distance [math]\displaystyle{ k\in\mathbb{N} }[/math].
  2. All nodes with distance at most [math]\displaystyle{ k }[/math] are already seen.
  3. Let [math]\displaystyle{ n }[/math] denote the current size of [math]\displaystyle{ Q }[/math]. There is [math]\displaystyle{ \ell\in\{1,\ldots,n\} }[/math] such that the first [math]\displaystyle{ \ell }[/math] elements of [math]\displaystyle{ Q }[/math] have distance [math]\displaystyle{ k }[/math] and the last [math]\displaystyle{ n-\ell }[/math] elements have distance [math]\displaystyle{ k+1 }[/math] (in particular, all elements have distance [math]\displaystyle{ k }[/math] if, and only if, it is [math]\displaystyle{ \ell=n }[/math]).

Variant: One node is removed from [math]\displaystyle{ Q }[/math].

Break condition: [math]\displaystyle{ Q=\emptyset }[/math].

Induction basis

Abstract view: The start node is seen, no other node is seen. The start node is the only element of [math]\displaystyle{ Q }[/math]. The output sequence is empty and the arborescence [math]\displaystyle{ A }[/math] contains the start node [math]\displaystyle{ s }[/math] and nothing else.

Implementation: Obvious.

Proof: Obvious.

Induction step

Abstract view:

  1. Extract the first element [math]\displaystyle{ v }[/math] from [math]\displaystyle{ Q }[/math].
  2. Append [math]\displaystyle{ v }[/math] to the output sequence.
  3. For each outgoing arc [math]\displaystyle{ (v,w) }[/math] of [math]\displaystyle{ v }[/math] such that [math]\displaystyle{ w }[/math] is not yet seen:
    1. Insert [math]\displaystyle{ w }[/math] and [math]\displaystyle{ (v,w) }[/math] in [math]\displaystyle{ A }[/math].
    2. Label [math]\displaystyle{ w }[/math] as seen.
    3. Append [math]\displaystyle{ w }[/math] to [math]\displaystyle{ Q }[/math].

Implementation: Obvious.

Proof: The variant is obviously fulfilled. The first point of the invariant is only a notation, so nothing is to show for that.

Let [math]\displaystyle{ w\in V }[/math] not yet seen before the current iteration, and let [math]\displaystyle{ (v,w)\in A }[/math]. The induction hypothesis (second point of the invariant) implies that the distance of [math]\displaystyle{ w }[/math] is greater than [math]\displaystyle{ k }[/math]. However, [math]\displaystyle{ v }[/math] has distance [math]\displaystyle{ k }[/math], so the distance of [math]\displaystyle{ w }[/math] cannot be greater than [math]\displaystyle{ k+1 }[/math]. Im summary, this proves the third point of the invariant.

The critical iterations for the second invariant are those where [math]\displaystyle{ k }[/math] increases. These are exactly the iterations in which [math]\displaystyle{ \ell=1 }[/math] immediately before (and, consequently, [math]\displaystyle{ \ell=n }[/math] immediately after) that iteration. All nodes with old distance [math]\displaystyle{ k }[/math] have been seen and no such node is in [math]\displaystyle{ Q }[/math] anymore after the current iteration. Therefore, all nodes [math]\displaystyle{ w }[/math] such that [math]\displaystyle{ (v,w)\in A }[/math] for some [math]\displaystyle{ v }[/math] with distance [math]\displaystyle{ k }[/math] have been seen as well. Clearly, this includes all nodes with old distance [math]\displaystyle{ k+1 }[/math], in other words, the nodes with new distance [math]\displaystyle{ k }[/math].

Correctness

It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Based on a straightforward induction on the iterations, the third invariant ensures that the nodes leave [math]\displaystyle{ Q }[/math] in ascending order of their distances. Thus, it remains to show that all nodes that are reachable from [math]\displaystyle{ s }[/math] have been seen.

Suppose some node [math]\displaystyle{ v }[/math] is reachable from [math]\displaystyle{ s }[/math] but has not been seen. Let [math]\displaystyle{ p }[/math] be an [math]\displaystyle{ (s,v) }[/math]-path. There is an arc [math]\displaystyle{ (x,y) }[/math] on [math]\displaystyle{ p }[/math] such that [math]\displaystyle{ x }[/math] is seen and [math]\displaystyle{ y }[/math] is not. However, since [math]\displaystyle{ x }[/math] is seen, [math]\displaystyle{ x }[/math] has been inserted in [math]\displaystyle{ Q }[/math] in some iteration and, consequently, removed from [math]\displaystyle{ Q }[/math] in a later iteration. However, in that later iteration, [math]\displaystyle{ y }[/math] had been seen.

Complexity

Statement: The asymptotic complexity is in [math]\displaystyle{ \Theta(|V|+|A|) }[/math] in the worst case.

Proof: The complexity of each iteration is linear in the number of arcs leaving the current node [math]\displaystyle{ v }[/math]. Therefore, the complexity of the entire loop is in [math]\displaystyle{ \Theta(|A|) }[/math]. The complexity of the initialization is dominated by the initialization of the seen labels of all nodes, which is clearly in [math]\displaystyle{ \Theta(|V|) }[/math].

Remark

The nodes in [math]\displaystyle{ Q }[/math] form kind of a "frontier line", which separates the nodes that already left [math]\displaystyle{ Q }[/math] from the nodes that have not yet entered [math]\displaystyle{ Q }[/math] so far. More specifically, each path from some seen node to some unseen node contains at least one node that is currently stored in [math]\displaystyle{ Q }[/math].

To see that, we apply a straightforward induction on the iterations. Let [math]\displaystyle{ p }[/math] be a path from some seen node outside [math]\displaystyle{ Q }[/math] to some node that is not yet seen immediately after the current iteration. If [math]\displaystyle{ p }[/math] does not contain [math]\displaystyle{ v }[/math], the claim follows from the induction hypothesis. So assume [math]\displaystyle{ p }[/math] contains [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ x }[/math] be the last node on [math]\displaystyle{ p }[/math] that is already seen immediately after the current iteration. If [math]\displaystyle{ x=v }[/math], the claim follows from the fact that all unseen nodes [math]\displaystyle{ w\in V }[/math] with [math]\displaystyle{ (v,w)\in A }[/math] were put in [math]\displaystyle{ Q }[/math], so the immediate successor of [math]\displaystyle{ v }[/math] on [math]\displaystyle{ p }[/math] is in [math]\displaystyle{ Q }[/math] now. Otherwise, the claim follows from the induction hypothesis again.

Pseudocode

BFS(G,s)
 1  for each vertex uG.V - {s}
 2       u.color = WHITE
 3       u.d = ∞
 4       u.π = NIL
 5  s.color = GRAY
 6  s.d = 0
 7  s.π = NIL
 8  Q = Ø
 9  ENQUE(Q, s)
10  while Q ≠ Ø
11      u = DEQUEUE(Q)
12      for each vG.Adj[u]
13              if v.color == WHITE
14                       v.color = GRAY
15                       v.d = u.d + 1
16                       v.π = u
17                       ENQUEUE(Q, v)
18      u.color = BLACK