Simple string matching algorithm

From Algowiki
Jump to navigation Jump to search

Algorithmic problem: One-dimensional string matching

Prerequisites:

Type of algorithm: loop

Auxiliary data A FIFO queue [math]\displaystyle{ I }[/math] of natural numbers, which always contains the start indexes of all occurrences of [math]\displaystyle{ T }[/math] in [math]\displaystyle{ S }[/math] that have been seen only partially thus far.

Abstract view

Invariant: After [math]\displaystyle{ i\ge 0 }[/math] iterations:

  1. In ascending order, [math]\displaystyle{ R }[/math] contains exactly the start indexes of all occurrences of [math]\displaystyle{ T }[/math] in [math]\displaystyle{ S }[/math] that lie completely in the substring [math]\displaystyle{ (S[1],...,S[i]) }[/math] of [math]\displaystyle{ S }[/math]. In other words, [math]\displaystyle{ R }[/math] contains the start indexes in the range [math]\displaystyle{ S[1],...,S[i-m+1] }[/math], where [math]\displaystyle{ m }[/math] is the length of [math]\displaystyle{ T }[/math].
  2. In ascending order, [math]\displaystyle{ I }[/math] contains exactly the start indexes of all occurrences of [math]\displaystyle{ T }[/math] in [math]\displaystyle{ S }[/math] that lie partially in the substring [math]\displaystyle{ (S[1],...,S[i]) }[/math] of [math]\displaystyle{ S }[/math]. In other words, [math]\displaystyle{ I }[/math] contains the start indexes in the range [math]\displaystyle{ S[i-m+2],...,S[i] }[/math].

Variant: [math]\displaystyle{ i }[/math] increases by [math]\displaystyle{ 1 }[/math].

Break condition: [math]\displaystyle{ i=n }[/math].

Induction basis

Induction step

Complexity

Further information