Bucketsort: Difference between revisions
Jump to navigation
Jump to search
Line 21: | Line 21: | ||
== Abstract view == | == Abstract view == | ||
'''Invariant:''' After <math>i \geq 0</math> iterations: | |||
# For <math>j \in \{1,\dots,N-i\}</math>, <math>A[j]</math> contains all input strings of length <math>j</math>. | |||
# <math>S'</math> contains all other input strings, that is, the ones with length at least <math>N - i + 1</math>. The sequence <math>S'</math> is sorted according to the following definition of comparison: <math>str1 < str2</math> (resp. <math>str1 \leq str2</math>) means that the substring of <math>str1</math> starting at position <math>N - i + 1</math> is lexicographically smaller (resp., smaller or equal) than the substring of <math>str2</math> starting at position <math>N - i + 1</math>. | |||
# All buckets are empty. | |||
'''Variant:''' <math>i</math> increases by <math>1</math> | |||
'''Break condition: <math>i = N</math>. | |||
== Induction basis == | == Induction basis == |
Revision as of 21:44, 2 October 2014
General information
Algorithmic problem: Sorting sequences of strings
Type of algorithm: loop
Axiliary data:
- An ordered sequence [math]\displaystyle{ S' }[/math] of strings, which will eventually hold the overall result of the algorithm.
- The buckets, that is, an array [math]\displaystyle{ B }[/math] whose index range contains the ID range of [math]\displaystyle{ \Sigma }[/math] (e.g. 26 for the alphabet) and whose components are ordered sequences of strings.
- Let [math]\displaystyle{ N }[/math] denote the maximum length of an input string.
- An array [math]\displaystyle{ A }[/math] with index range [math]\displaystyle{ [1,\dots,N] }[/math] and multisets of strings as components.
Abstract view
Invariant: After [math]\displaystyle{ i \geq 0 }[/math] iterations:
- For [math]\displaystyle{ j \in \{1,\dots,N-i\} }[/math], [math]\displaystyle{ A[j] }[/math] contains all input strings of length [math]\displaystyle{ j }[/math].
- [math]\displaystyle{ S' }[/math] contains all other input strings, that is, the ones with length at least [math]\displaystyle{ N - i + 1 }[/math]. The sequence [math]\displaystyle{ S' }[/math] is sorted according to the following definition of comparison: [math]\displaystyle{ str1 \lt str2 }[/math] (resp. [math]\displaystyle{ str1 \leq str2 }[/math]) means that the substring of [math]\displaystyle{ str1 }[/math] starting at position [math]\displaystyle{ N - i + 1 }[/math] is lexicographically smaller (resp., smaller or equal) than the substring of [math]\displaystyle{ str2 }[/math] starting at position [math]\displaystyle{ N - i + 1 }[/math].
- All buckets are empty.
Variant: [math]\displaystyle{ i }[/math] increases by [math]\displaystyle{ 1 }[/math]
Break condition: [math]\displaystyle{ i = N }[/math].
Induction basis
Induction step
Complexity
Statement: Let [math]\displaystyle{ M }[/math] denote the total sum of all input string lengths. Then the asymptotic complexity is in [math]\displaystyle{ \Theta(M) }[/math] in the best and the worst case.
Proof: Obviously, the preprocessing takes <mathO(M)</math> time. In the main loop, each character of each string is read exactly once. Obviously, no operation is applied more often than the reading of single characters.