Depth-first search

From Algowiki
Revision as of 14:59, 17 October 2014 by Weihe (talk | contribs) (→‎Correctness)


General information

Algorithmic problem: Graph traversal

Type of algorithm: loop

Abstract view

Definitions:

  1. For each node, an arbitrary but fixed ordering of the outgoing arcs is assumed. An arc [math]\displaystyle{ (v,w) }[/math] preceding an arc [math]\displaystyle{ (v,w') }[/math] in this ordering is called lexicographically smaller than [math]\displaystyle{ (v,w') }[/math].
  2. Let [math]\displaystyle{ p }[/math] and [math]\displaystyle{ p' }[/math] be two paths that start from the same node [math]\displaystyle{ v\in V }[/math], but may or may not have the same endnode. Let [math]\displaystyle{ w }[/math] be the last common node such that the subpaths of [math]\displaystyle{ p }[/math] and [math]\displaystyle{ p' }[/math] from [math]\displaystyle{ v }[/math] up to [math]\displaystyle{ w }[/math] are identical (possibly [math]\displaystyle{ v=w }[/math]). If the next arc of [math]\displaystyle{ p }[/math] is lexicographically smaller than the next arc of [math]\displaystyle{ p' }[/math], [math]\displaystyle{ p }[/math] is said to be lexicographically smaller than [math]\displaystyle{ p' }[/math].
  3. Note that the lexicographically smallest path from [math]\displaystyle{ v\in V }[/math] to [math]\displaystyle{ w\in V }[/math] is well defined and unique. With respect to a starting node [math]\displaystyle{ s\in V }[/math], a node [math]\displaystyle{ v\in V }[/math] is lexicographically smaller than [math]\displaystyle{ w\in V }[/math] if the lexicographically smallest path from [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math] is lexicographically smaller than the lexicographically smallest path from [math]\displaystyle{ s }[/math] to [math]\displaystyle{ w }[/math].
  4. In all of the above cases, the reverse relation is called lexicographically larger.
  5. A node [math]\displaystyle{ v\in V }[/math] is lexicographically smaller (resp., lexicographically larger) than a path [math]\displaystyle{ p }[/math] if the lexicographically smallest path from the start node of [math]\displaystyle{ p }[/math] to [math]\displaystyle{ v }[/math] is lexicographically smaller (resp., larger) than [math]\displaystyle{ p }[/math]. (Note the asymmetry: In both cases, the lexicographically smallest path to [math]\displaystyle{ v }[/math] is used.)
  6. In all cases, we also say precedes and succeeds, respectively, instead of "is lexicographically smaller/larger".
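
Definition 2 can be illustrated with a small sketch. The representation is an assumption made only for this example: the graph is a dictionary of adjacency lists whose list order plays the role of the fixed arc ordering, there are no parallel arcs, and a path is given as a list of nodes. The function name `path_precedes` is hypothetical. The definition leaves open the case in which one path is a prefix of the other, so the sketch returns `None` there.

```python
from typing import Dict, List, Optional

# Adjacency lists; the order within each list is the fixed arc ordering.
Graph = Dict[str, List[str]]


def path_precedes(graph: Graph, p: List[str], q: List[str]) -> Optional[bool]:
    """Return True if path p is lexicographically smaller than path q.

    Returns None if the paths never diverge (one is a prefix of the other),
    because the definition says nothing about that case.
    """
    if not p or not q or p[0] != q[0]:
        raise ValueError("both paths must start at the same node")
    # Advance i to the last node of the common prefix.
    i = 0
    while i + 1 < len(p) and i + 1 < len(q) and p[i + 1] == q[i + 1]:
        i += 1
    if i + 1 >= len(p) or i + 1 >= len(q):
        return None  # no diverging arc exists
    order = graph[p[i]]  # outgoing arcs of the last common node
    return order.index(p[i + 1]) < order.index(q[i + 1])


example = {"s": ["a", "b"], "a": ["b"], "b": []}
print(path_precedes(example, ["s", "a", "b"], ["s", "b"]))  # True
```

In the example, both paths share only the node s, and the arc (s,a) precedes the arc (s,b), so the first path is the smaller one.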

Additional output:

  1. Each node has two Boolean labels with the semantics "is seen" and "is finished".
  2. An arborescence [math]\displaystyle{ A=(V',A') }[/math] rooted at [math]\displaystyle{ s }[/math] such that [math]\displaystyle{ V'\subseteq V }[/math] is the set of all nodes reachable from [math]\displaystyle{ s }[/math] (including [math]\displaystyle{ s }[/math]). For each node [math]\displaystyle{ v\in V' }[/math], the path in [math]\displaystyle{ A }[/math] is the lexicographically smallest [math]\displaystyle{ (s,v) }[/math]-path in [math]\displaystyle{ G }[/math].

Specific characteristic: The nodes may be returned either in lexicographic order or (alternatively or simultaneously) in an order that fulfills the following property: Let [math]\displaystyle{ v,w\in V }[/math] such that [math]\displaystyle{ v }[/math] is seen before [math]\displaystyle{ w }[/math]. If there is a path from [math]\displaystyle{ v }[/math] to [math]\displaystyle{ w }[/math], [math]\displaystyle{ w }[/math] is finished prior to [math]\displaystyle{ v }[/math].

Auxiliary data:

  1. A stack [math]\displaystyle{ S }[/math] whose elements are nodes in [math]\displaystyle{ V }[/math].
  2. Each node [math]\displaystyle{ v }[/math] has a current arc [math]\displaystyle{ a_v\in A }[/math], which is either void or an outgoing arc [math]\displaystyle{ a_v=(v,w) }[/math] of [math]\displaystyle{ v }[/math].

Invariant: Before and after each iteration:

  1. [math]\displaystyle{ S }[/math] forms a path [math]\displaystyle{ p }[/math] from the start node [math]\displaystyle{ s }[/math] to some other node, that is, the order of the nodes on [math]\displaystyle{ p }[/math] is the order in which they appear in [math]\displaystyle{ S }[/math] (start node [math]\displaystyle{ s }[/math] at the bottom of [math]\displaystyle{ S }[/math]).
  2. For each node not yet seen, the current arc is the first arc (or void if the node has no outgoing arcs).
  3. For each node [math]\displaystyle{ v }[/math] on [math]\displaystyle{ p }[/math]:
    1. If there are arcs [math]\displaystyle{ (v,w)\in A }[/math] such that [math]\displaystyle{ w }[/math] is not yet seen, the current arc equals or precedes the first such arc.
    2. The subpath of [math]\displaystyle{ p }[/math] from the start node [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math] is the lexicographically first [math]\displaystyle{ (s,v) }[/math]-path.
  4. The nodes on [math]\displaystyle{ p }[/math] are seen but not finished. Let [math]\displaystyle{ p+a }[/math] denote the concatenation of [math]\displaystyle{ p }[/math] with the current arc [math]\displaystyle{ a }[/math] of the last node of [math]\displaystyle{ p }[/math]. The nodes that are lexicographically smaller than [math]\displaystyle{ p+a }[/math] are seen and finished, and the nodes that lexicographically succeed [math]\displaystyle{ p+a }[/math] are neither seen nor finished. (Note that nothing is said about the head of [math]\displaystyle{ a }[/math]).

Variant: Either one node is finished or the current arc of one node is moved forward.

Break condition: [math]\displaystyle{ S=\emptyset }[/math].

Induction basis

Abstract view: No node is finished. The start node [math]\displaystyle{ s }[/math] is seen, no other node is seen. The start node is the only element of [math]\displaystyle{ S }[/math]. The current arc of each node is its first outgoing arc (or void if the node has no outgoing arcs). If the nodes are to be returned in lexicographic order, the start node [math]\displaystyle{ s }[/math] is, initially, the only member of the output sequence; otherwise, the initial output sequence is empty. Arborescence [math]\displaystyle{ A }[/math] is initialized so as to contain [math]\displaystyle{ s }[/math] and nothing else.

Implementation: Obvious.

Proof: Obvious.

Induction step

Abstract view:

  1. Let [math]\displaystyle{ v }[/math] be the last node of [math]\displaystyle{ p }[/math] (=the top element of [math]\displaystyle{ S }[/math]).
  2. While the current arc of [math]\displaystyle{ v }[/math] is not void and while the head of the current arc is labeled as seen: Move the current arc one step forward.
  3. If the current arc of [math]\displaystyle{ v }[/math] is not void, say, [math]\displaystyle{ a_v=(v,w) }[/math]:
    1. Insert [math]\displaystyle{ w }[/math] and [math]\displaystyle{ (v,w) }[/math] in [math]\displaystyle{ A }[/math].
    2. Push [math]\displaystyle{ w }[/math] on [math]\displaystyle{ S }[/math].
    3. Label [math]\displaystyle{ w }[/math] as seen.
    4. If the output order is the lexicographic one: Append [math]\displaystyle{ w }[/math] to the output sequence.
  4. Otherwise:
    1. Remove [math]\displaystyle{ v }[/math] from [math]\displaystyle{ S }[/math].
    2. Label [math]\displaystyle{ v }[/math] as finished.
    3. If the output order is not the lexicographic one: Put [math]\displaystyle{ v }[/math] in the output sequence.

Implementation: Obvious.
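
For concreteness, the induction basis and the induction step might be sketched in Python as follows. This is one possible reading, not a reference implementation. It assumes that the graph is a dictionary of adjacency lists in which every node appears as a key, that the list order is the fixed arc ordering, and that the current arc of a node is stored as an index into its list (an index past the end stands for "void"). The name `dfs` and the returned triple are choices made for this sketch; both output orders are produced simultaneously.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def dfs(graph: Graph, s: str) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Return (order in which nodes are seen, order in which they are
    finished, arborescence as a map from each node to its parent)."""
    seen = {s}
    current = {v: 0 for v in graph}   # index of the current arc of each node
    stack = [s]                       # the path p, start node at the bottom
    seen_order = [s]                  # lexicographic output order
    finish_order: List[str] = []      # alternative output order
    parent: Dict[str, str] = {}       # arc (v, w) stored as parent[w] = v
    while stack:                      # break condition: S is empty
        v = stack[-1]
        arcs = graph[v]
        # Step 2: move the current arc forward past heads already seen.
        while current[v] < len(arcs) and arcs[current[v]] in seen:
            current[v] += 1
        if current[v] < len(arcs):    # step 3: the current arc is (v, w)
            w = arcs[current[v]]
            parent[w] = v
            stack.append(w)
            seen.add(w)
            seen_order.append(w)
        else:                         # step 4: v is finished
            stack.pop()
            finish_order.append(v)
    return seen_order, finish_order, parent


example = {"s": ["a", "b"], "a": ["b", "s"], "b": [], "c": ["s"]}
print(dfs(example, "s"))
# (['s', 'a', 'b'], ['b', 'a', 's'], {'a': 's', 'b': 'a'})
```

Node c is not reachable from s in the example, so it appears in neither output sequence.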

Proof: The loop variant is obviously fulfilled.

The first point of the invariant is obviously fulfilled, too. The second point follows from the fact that the current arc of a node is initialized to be the node's very first outgoing arc and only changed after the node is labeled as seen. Point 3.1 follows from the fact that the current arc never skips an arc that points to an unseen node.

For point 3.2, note that any lexicographically smaller path [math]\displaystyle{ p' }[/math] from [math]\displaystyle{ s }[/math] to [math]\displaystyle{ v }[/math] has at least one node that is lexicographically smaller than [math]\displaystyle{ p }[/math]. By the induction hypothesis (point 4), these nodes are finished. Let [math]\displaystyle{ w }[/math] denote the last finished node on [math]\displaystyle{ p' }[/math]. Since [math]\displaystyle{ v }[/math] is not yet finished immediately before the iteration, we have [math]\displaystyle{ v\neq w }[/math], so [math]\displaystyle{ w }[/math] has a successor on [math]\displaystyle{ p' }[/math]; this successor is unfinished due to the specific choice of [math]\displaystyle{ w }[/math]. However, a node is not marked as finished before all of its immediate successors are finished, which is a contradiction. This proves point 3.2.

When a node is pushed on [math]\displaystyle{ S }[/math], it is neither seen nor finished immediately before that iteration and then labeled as seen in that iteration. The node is finished when it leaves [math]\displaystyle{ S }[/math]. Both facts together give the first sentence of point 4. The other statements of point 4 follow from the observation that the concatenation of [math]\displaystyle{ p }[/math] with the current arc of the endnode of [math]\displaystyle{ p }[/math] increases lexicographically in each iteration.

Correctness

It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Immediately before the last iteration, [math]\displaystyle{ p }[/math] consists of the start node [math]\displaystyle{ s }[/math] only, and the current arc of [math]\displaystyle{ s }[/math] is void. Therefore, all nodes reachable from [math]\displaystyle{ s }[/math] except for [math]\displaystyle{ s }[/math] itself are lexicographically smaller than [math]\displaystyle{ p }[/math] at that moment. Due to point 4 of the invariant, all of these nodes are finished. In the last iteration, [math]\displaystyle{ s }[/math] is finished as well.

Finally, we have to show that the specific characteristic is fulfilled in both cases. Due to point 3.2 of the invariant, a node is seen for the first time via its lexicographically smallest path from [math]\displaystyle{ s }[/math]. Since the current path increases lexicographically in each iteration, the nodes are labeled as seen in lexicographic order. So consider the second case. Let [math]\displaystyle{ v }[/math] be seen before [math]\displaystyle{ w }[/math] and assume there is a path from [math]\displaystyle{ v }[/math] to [math]\displaystyle{ w }[/math]. We have to show that [math]\displaystyle{ w }[/math] is finished prior to [math]\displaystyle{ v }[/math]. Consider the situation immediately after the iteration in which [math]\displaystyle{ v }[/math] is finished. If [math]\displaystyle{ w }[/math] were not finished at this moment, there would be two successive nodes on the assumed path, [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], such that [math]\displaystyle{ x }[/math] is finished and [math]\displaystyle{ y }[/math] is not finished at that moment. However, obviously, [math]\displaystyle{ x }[/math] cannot be finished prior to [math]\displaystyle{ y }[/math].

Complexity

Statement: The asymptotic complexity is in [math]\displaystyle{ \Theta(|V|+|A|) }[/math] in the worst case.

Proof: For every node reachable from [math]\displaystyle{ s }[/math] (including [math]\displaystyle{ s }[/math]), the algorithm processes each of its outgoing arcs exactly once. And from each of these nodes, the algorithm goes backwards exactly once. Obviously, each of these steps requires a constant number of operations.

Remark

Alternatively, DFS could be implemented as a recursive procedure. However, this excludes the option to implement DFS as an iterator, which means to turn the loop inside-out (cf. remarks on graph traversal).
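
To illustrate what turning the loop inside-out could look like, the following sketch keeps the stack and the current arcs as object state, so that each call to `next` runs the loop only until the next node is seen. The class name `DFSIterator` and the adjacency-list representation (a dictionary whose list order is the arc ordering, with every reachable node as a key) are assumptions of this example. Only the lexicographic output order is produced.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


class DFSIterator:
    """Iterates over the nodes reachable from s in the order they are seen."""

    def __init__(self, graph: Graph, s: str) -> None:
        self._graph = graph
        self._seen = {s}
        self._current = {s: 0}                # current arc index per seen node
        self._stack = [s]
        self._pending: Optional[str] = s      # start node is returned first

    def __iter__(self) -> "DFSIterator":
        return self

    def __next__(self) -> str:
        if self._pending is not None:
            node, self._pending = self._pending, None
            return node
        while self._stack:
            v = self._stack[-1]
            arcs = self._graph[v]
            i = self._current[v]
            # Skip arcs whose heads are already seen.
            while i < len(arcs) and arcs[i] in self._seen:
                i += 1
            self._current[v] = i
            if i < len(arcs):
                w = arcs[i]
                self._seen.add(w)
                self._current[w] = 0
                self._stack.append(w)
                return w                      # hand control back to the caller
            self._stack.pop()                 # v is finished
        raise StopIteration


example = {"s": ["a", "b"], "a": ["b", "s"], "b": [], "c": ["s"]}
print(list(DFSIterator(example, "s")))  # ['s', 'a', 'b']
```

The caller may stop after any number of `next` calls; the remaining part of the traversal is then simply never executed.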

Pseudocode recursive implementation

DFS(G)

for each vertex u ∈ V[G]
    do color[u] ← WHITE
       π[u] ← NIL
time ← 0
for each vertex u ∈ V[G]
    do if color[u] = WHITE
        then DFS-VISIT(u)


DFS-VISIT(u)

color[u] ← GRAY
time ← time + 1
d[u] ← time
for each v ∈ Adj[u]
    do if color[v] = WHITE
        then π[v] ← u
             DFS-VISIT(v)
color[u] ← BLACK
f[u] ← time ← time + 1
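
The recursive pseudocode above might be translated into Python roughly as follows. The function name `dfs_timestamps`, the dictionary-based graph representation, and the returned triple are assumptions of this sketch; `d`, `f` and `pi` correspond to the discovery times, finishing times and predecessors of the pseudocode.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[str]]

WHITE, GRAY, BLACK = 0, 1, 2


def dfs_timestamps(
    graph: Graph,
) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, Optional[str]]]:
    """Return discovery times d, finishing times f and predecessors pi."""
    color = {u: WHITE for u in graph}
    pi: Dict[str, Optional[str]] = {u: None for u in graph}
    d: Dict[str, int] = {}
    f: Dict[str, int] = {}
    time = 0

    def visit(u: str) -> None:
        nonlocal time
        color[u] = GRAY          # u is seen but not finished
        time += 1
        d[u] = time
        for v in graph[u]:
            if color[v] == WHITE:
                pi[v] = u
                visit(v)
        color[u] = BLACK         # u is finished
        time += 1
        f[u] = time

    # Start a new traversal from every node that is still unseen.
    for u in graph:
        if color[u] == WHITE:
            visit(u)
    return d, f, pi


example = {"s": ["a", "b"], "a": ["b", "s"], "b": [], "c": ["s"]}
d, f, pi = dfs_timestamps(example)
print(d)  # {'s': 1, 'a': 2, 'b': 3, 'c': 7}
print(f)  # {'b': 4, 'a': 5, 's': 6, 'c': 8}
```

Note that Python limits the recursion depth, so on graphs with very long paths the stack-based formulation is likely the safer choice.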