# Breadth-first search

## Contents

## General information

**Algorithmic problem:** Graph traversal

**Type of algorithm:** loop

## Abstract view

**Definition:**
On this page, the **distance** of a node is the minimal number of arcs on a path from the start node to .

**Additional output:** An arborescence rooted at such that is the set of all nodes reachable from (including ). For each node , the path from to in is a shortest -path in with respect to that notion of distance.

**Specific characteristic:**
The nodes are finished in the order of increasing distance (which is not unique, in general).

**Auxiliary data:**
A FIFO queue whose elements are nodes in .

**Invariant:**
Before and after each iteration:

- There is a
**current distance**. - All nodes with distance at most are already seen.
- Let denote the current size of . There is such that the first elements of have distance and the last elements have distance (in particular,
*all*elements have distance if, and only if, it is ).

**Variant:** One node is removed from .

**Break condition:** .

## Induction basis

**Abstract view:** The start node is seen, no other node is seen. The start node is the only element of . The output sequence is empty and the arborescence contains the start node and nothing else.

**Implementation:** Obvious.

**Proof:** Obvious.

## Induction step

**Abstract view:**

- Extract the first element from .
- Append to the output sequence.
- For each outgoing arc of such that is not yet seen:
- Insert and in .
- Label as seen.
- Append to .

**Implementation:** Obvious.

**Proof:**
The variant is obviously fulfilled. The first point of the invariant is only a notation, so nothing is to show for that.

Let not yet seen before the current iteration, and let . The induction hypothesis (second point of the invariant) implies that the distance of is greater than . However, has distance , so the distance of cannot be greater than . Im summary, this proves the third point of the invariant.

The critical iterations for the second invariant are those where increases. These are exactly the iterations in which immediately before (and, consequently, immediately after) that iteration. All nodes with old distance have been seen and no such node is in anymore after the current iteration. Therefore, all nodes such that for some with distance have been seen as well. Clearly, this includes all nodes with old distance , in other words, the nodes with new distance .

## Correctness

It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Based on a straightforward induction on the iterations, the third invariant ensures that the nodes leave in ascending order of their distances. Thus, it remains to show that *all* nodes that are reachable from have been seen.

Suppose some node is reachable from but has not been seen. Let be an -path. There is an arc on such that is seen and is not. However, since is seen, has been inserted in in some iteration and, consequently, removed from in a later iteration. However, in that later iteration, had been seen.

## Complexity

**Statement:** The asymptotic complexity is in in the worst case.

**Proof:**
The complexity of each iteration is linear in the number of arcs leaving the current node . Therefore, the complexity of the entire loop is in . The complexity of the initialization is dominated by the initialization of the seen labels of all nodes, which is clearly in .

## Remark

The nodes in form kind of a "frontier line", which separates the nodes that already left from the nodes that have not yet entered so far. More specifically, each path from some seen node to some unseen node contains at least one node that is currently stored in .

To see that, we apply a straightforward induction on the iterations. Let be a path from some seen node outside to some node that is not yet seen immediately after the current iteration. If does not contain , the claim follows from the induction hypothesis. So assume contains . Let be the last node on that is already seen immediately after the current iteration. If , the claim follows from the fact that all unseen nodes with were put in , so the immediate successor of on is in now. Otherwise, the claim follows from the induction hypothesis again.

## Pseudocode

```
```

```
```BFS(*G,s*)
1 **for** each vertex *u* ∈ *G.V* - {*s*}
2 *u.color* = WHITE
3 *u.d* = ∞
4 *u.*π = NIL
5 *s.color* = GRAY
6 *s.d* = 0
7 *s.π* = NIL
8 *Q* = Ø
9 ENQUE(*Q, s*)
10 **while** *Q* ≠ Ø
11 *u* = DEQUEUE(*Q*)
12 **for** each *v* ∈ *G.Adj*[*u*]
13 **if** *v.color* == WHITE
14 *v.color* = GRAY
15 *v.d* = *u.d* + 1
16 *v*.π = *u*
17 ENQUEUE(*Q, v*)
18 *u.color* = BLACK

```
```