## General information

Algorithmic problem: Graph traversal

Type of algorithm: loop

## Abstract view

Definition: On this page, the distance of a node $v\in V$ is the minimal number of arcs on a path from the start node $s$ to $v$.

Additional output: An arborescence $A=(V',A')$ rooted at $s$ such that $V'\subseteq V$ is the set of all nodes reachable from $s$ (including $s$). For each node $v\in V'$, the path from $s$ to $v$ in $A$ is a shortest $(s,v)$-path in $G$ with respect to that notion of distance.

Specific characteristic: The nodes are finished in the order of increasing distance (which is not unique, in general).

Auxiliary data: A FIFO queue $Q$ whose elements are nodes in $V$.

Invariant: Before and after each iteration:

1. There is a current distance $k\in\mathbb{N}$.
2. All nodes with distance at most $k$ are already seen.
3. Let $n$ denote the current size of $Q$. There is $\ell\in\{1,\ldots,n\}$ such that the first $\ell$ elements of $Q$ have distance $k$ and the last $n-\ell$ elements have distance $k+1$ (in particular, all elements have distance $k$ if, and only if, it is $\ell=n$).

Variant: One node is removed from $Q$.

Break condition: $Q=\emptyset$.

## Induction basis

Abstract view: The start node is seen, no other node is seen. The start node is the only element of $Q$. The output sequence is empty and the arborescence $A$ contains the start node $s$ and nothing else.

Implementation: Obvious.

Proof: Obvious.

## Induction step

Abstract view:

1. Extract the first element $v$ from $Q$.
2. Append $v$ to the output sequence.
3. For each outgoing arc $(v,w)$ of $v$ such that $w$ is not yet seen:
1. Insert $w$ and $(v,w)$ in $A$.
2. Label $w$ as seen.
3. Append $w$ to $Q$.

Implementation: Obvious.

Proof: The variant is obviously fulfilled. The first point of the invariant is only a notation, so nothing is to show for that.

Let $w\in V$ not yet seen before the current iteration, and let $(v,w)\in A$. The induction hypothesis (second point of the invariant) implies that the distance of $w$ is greater than $k$. However, $v$ has distance $k$, so the distance of $w$ cannot be greater than $k+1$. Im summary, this proves the third point of the invariant.

The critical iterations for the second invariant are those where $k$ increases. These are exactly the iterations in which $\ell=1$ immediately before (and, consequently, $\ell=n$ immediately after) that iteration. All nodes with old distance $k$ have been seen and no such node is in $Q$ anymore after the current iteration. Therefore, all nodes $w$ such that $(v,w)\in A$ for some $v$ with distance $k$ have been seen as well. Clearly, this includes all nodes with old distance $k+1$, in other words, the nodes with new distance $k$.

## Correctness

It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Based on a straightforward induction on the iterations, the third invariant ensures that the nodes leave $Q$ in ascending order of their distances. Thus, it remains to show that all nodes that are reachable from $s$ have been seen.

Suppose some node $v$ is reachable from $s$ but has not been seen. Let $p$ be an $(s,v)$-path. There is an arc $(x,y)$ on $p$ such that $x$ is seen and $y$ is not. However, since $x$ is seen, $x$ has been inserted in $Q$ in some iteration and, consequently, removed from $Q$ in a later iteration. However, in that later iteration, $y$ had been seen.

## Complexity

Statement: The asymptotic complexity is in $\Theta(|V|+|A|)$ in the worst case.

Proof: The complexity of each iteration is linear in the number of arcs leaving the current node $v$. Therefore, the complexity of the entire loop is in $\Theta(|A|)$. The complexity of the initialization is dominated by the initialization of the seen labels of all nodes, which is clearly in $\Theta(|V|)$.

## Remark

The nodes in $Q$ form kind of a "frontier line", which separates the nodes that already left $Q$ from the nodes that have not yet entered $Q$ so far. More specifically, each path from some seen node to some unseen node contains at least one node that is currently stored in $Q$.

To see that, we apply a straightforward induction on the iterations. Let $p$ be a path from some seen node outside $Q$ to some node that is not yet seen immediately after the current iteration. If $p$ does not contain $v$, the claim follows from the induction hypothesis. So assume $p$ contains $v$. Let $x$ be the last node on $p$ that is already seen immediately after the current iteration. If $x=v$, the claim follows from the fact that all unseen nodes $w\in V$ with $(v,w)\in A$ were put in $Q$, so the immediate successor of $v$ on $p$ is in $Q$ now. Otherwise, the claim follows from the induction hypothesis again.

## Pseudocode

 

 BFS(G,s) 1 for each vertex u ∈ G.V - {s} 2 u.color = WHITE 3 u.d = ∞ 4 u.π = NIL 5 s.color = GRAY 6 s.d = 0 7 s.π = NIL 8 Q = Ø 9 ENQUE(Q, s) 10 while Q ≠ Ø 11 u = DEQUEUE(Q) 12 for each v ∈ G.Adj[u] 13 if v.color == WHITE 14 v.color = GRAY 15 v.d = u.d + 1 16 v.π = u 17 ENQUEUE(Q, v) 18 u.color = BLACK