String matching based on finite automaton: Difference between revisions

From Algowiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 4: Line 4:
'''Algorithmic problem:''' [[One-dimensional string matching]]
'''Algorithmic problem:''' [[One-dimensional string matching]]


'''Type of algorithm:'''
'''Prerequisites:'''
 
'''Type of algorithm:''' loop


'''Auxiliary data:'''
'''Auxiliary data:'''
# A current '''maximum prefix length''' <math>q\in \{ 0,...,m\}</math>.
# A look-up table <math>\Delta</math> with rows <math>\{ 0,...,m\}</math> and one column for each character in the alphabet <math>\Sigma</math>. Intended semantics: The value <math>\Delta [j,c]</math> is the length of the longest [[Strings|prefix]] of <math>T</math> that is also a [[Strings|suffix]] of <math>(T[1],...,T[j],c)</math>.


==Abstract view==
==Abstract view==
'''Invariant:''' After <math>i \ge 0</math>:
# In ascending order, <math>R</math> contains the start indexes of all occurrences of <math>T</math> in <math>S</math> that lie completely in <math>(S[1],...,S[i])</math>; in other words, the start indexes in the range <math>(1,...,i-m+1])</math>.
# The value of <math>q</math> is the largest value <math>k</math> such that <math>(S[i-k+1],...,S[i])=(T[1],...,T[k])</math>.
To understand the rationale of the second invariant, note that <math>(S[i-k+1],...,S[i])=(T[1],...,T[k])</math> is equivalent to the informal statement that <math>(S[i-k+1],...,S[i])</math> is a candidate for an occurrence of <math>T</math>. In particular, <math>i-q+1</math> is always the "most advanced" candidate (and <math>q=0</math> means that, at the moment, there is no candidate at all).
'''Variant:''' <math>i</math> increases by <math>1</math>.
'''Break condition:''' <math>i=n</math>.


==Induction basis==
==Induction basis==

Revision as of 13:23, 1 October 2014

Algorithmic problem: One-dimensional string matching

Prerequisites:

Type of algorithm: loop

Auxiliary data:

  1. A current maximum prefix length [math]\displaystyle{ q\in \{ 0,...,m\} }[/math].
  2. A look-up table [math]\displaystyle{ \Delta }[/math] with rows [math]\displaystyle{ \{ 0,...,m\} }[/math] and one column for each character in the alphabet [math]\displaystyle{ \Sigma }[/math]. Intended semantics: The value [math]\displaystyle{ \Delta [j,c] }[/math] is the length of the longest prefix of [math]\displaystyle{ T }[/math] that is also a suffix of [math]\displaystyle{ (T[1],...,T[j],c) }[/math].

Abstract view

Invariant: After [math]\displaystyle{ i \ge 0 }[/math]:

  1. In ascending order, [math]\displaystyle{ R }[/math] contains the start indexes of all occurrences of [math]\displaystyle{ T }[/math] in [math]\displaystyle{ S }[/math] that lie completely in [math]\displaystyle{ (S[1],...,S[i]) }[/math]; in other words, the start indexes in the range [math]\displaystyle{ (1,...,i-m+1]) }[/math].
  2. The value of [math]\displaystyle{ q }[/math] is the largest value [math]\displaystyle{ k }[/math] such that [math]\displaystyle{ (S[i-k+1],...,S[i])=(T[1],...,T[k]) }[/math].

To understand the rationale of the second invariant, note that [math]\displaystyle{ (S[i-k+1],...,S[i])=(T[1],...,T[k]) }[/math] is equivalent to the informal statement that [math]\displaystyle{ (S[i-k+1],...,S[i]) }[/math] is a candidate for an occurrence of [math]\displaystyle{ T }[/math]. In particular, [math]\displaystyle{ i-q+1 }[/math] is always the "most advanced" candidate (and [math]\displaystyle{ q=0 }[/math] means that, at the moment, there is no candidate at all).

Variant: [math]\displaystyle{ i }[/math] increases by [math]\displaystyle{ 1 }[/math].

Break condition: [math]\displaystyle{ i=n }[/math].

Induction basis

Induction step

Complexity

Further information