String matching based on finite automaton: Difference between revisions
No edit summary |
No edit summary |
||
Line 4: | Line 4: | ||
'''Algorithmic problem:''' [[One-dimensional string matching]] | '''Algorithmic problem:''' [[One-dimensional string matching]] | ||
'''Type of algorithm:''' | '''Prerequisites:''' | ||
'''Type of algorithm:''' loop | |||
'''Auxiliary data:''' | '''Auxiliary data:''' | ||
# A current '''maximum prefix length''' <math>q\in \{ 0,...,m\}</math>. | |||
# A look-up table <math>\Delta</math> with rows <math>\{ 0,...,m\}</math> and one column for each character in the alphabet <math>\Sigma</math>. Intended semantics: The value <math>\Delta [j,c]</math> is the length of the longest [[Strings|prefix]] of <math>T</math> that is also a [[Strings|suffix]] of <math>(T[1],...,T[j],c)</math>. | |||
==Abstract view== | ==Abstract view== | ||
'''Invariant:''' After <math>i \ge 0</math>: | |||
# In ascending order, <math>R</math> contains the start indexes of all occurrences of <math>T</math> in <math>S</math> that lie completely in <math>(S[1],...,S[i])</math>; in other words, the start indexes in the range <math>(1,...,i-m+1])</math>. | |||
# The value of <math>q</math> is the largest value <math>k</math> such that <math>(S[i-k+1],...,S[i])=(T[1],...,T[k])</math>. | |||
To understand the rationale of the second invariant, note that <math>(S[i-k+1],...,S[i])=(T[1],...,T[k])</math> is equivalent to the informal statement that <math>(S[i-k+1],...,S[i])</math> is a candidate for an occurrence of <math>T</math>. In particular, <math>i-q+1</math> is always the "most advanced" candidate (and <math>q=0</math> means that, at the moment, there is no candidate at all). | |||
'''Variant:''' <math>i</math> increases by <math>1</math>. | |||
'''Break condition:''' <math>i=n</math>. | |||
==Induction basis== | ==Induction basis== |
Revision as of 13:23, 1 October 2014
Algorithmic problem: One-dimensional string matching
Prerequisites:
Type of algorithm: loop
Auxiliary data:
- A current maximum prefix length [math]\displaystyle{ q\in \{ 0,...,m\} }[/math].
- A look-up table [math]\displaystyle{ \Delta }[/math] with rows [math]\displaystyle{ \{ 0,...,m\} }[/math] and one column for each character in the alphabet [math]\displaystyle{ \Sigma }[/math]. Intended semantics: The value [math]\displaystyle{ \Delta [j,c] }[/math] is the length of the longest prefix of [math]\displaystyle{ T }[/math] that is also a suffix of [math]\displaystyle{ (T[1],...,T[j],c) }[/math].
Abstract view
Invariant: After [math]\displaystyle{ i \ge 0 }[/math]:
- In ascending order, [math]\displaystyle{ R }[/math] contains the start indexes of all occurrences of [math]\displaystyle{ T }[/math] in [math]\displaystyle{ S }[/math] that lie completely in [math]\displaystyle{ (S[1],...,S[i]) }[/math]; in other words, the start indexes in the range [math]\displaystyle{ (1,...,i-m+1]) }[/math].
- The value of [math]\displaystyle{ q }[/math] is the largest value [math]\displaystyle{ k }[/math] such that [math]\displaystyle{ (S[i-k+1],...,S[i])=(T[1],...,T[k]) }[/math].
To understand the rationale of the second invariant, note that [math]\displaystyle{ (S[i-k+1],...,S[i])=(T[1],...,T[k]) }[/math] is equivalent to the informal statement that [math]\displaystyle{ (S[i-k+1],...,S[i]) }[/math] is a candidate for an occurrence of [math]\displaystyle{ T }[/math]. In particular, [math]\displaystyle{ i-q+1 }[/math] is always the "most advanced" candidate (and [math]\displaystyle{ q=0 }[/math] means that, at the moment, there is no candidate at all).
Variant: [math]\displaystyle{ i }[/math] increases by [math]\displaystyle{ 1 }[/math].
Break condition: [math]\displaystyle{ i=n }[/math].