Bucketsort

Bucket Sort

whatever

Openlearnware

General information

Algorithmic problem: Sorting sequences of strings

Type of algorithm: loop

Axiliary data:

An ordered sequence [math]\displaystyle{ S' }[/math] of strings, which will eventually hold the overall result of the algorithm.
The buckets, that is, an array [math]\displaystyle{ B }[/math] whose index range contains the ID range of [math]\displaystyle{ \Sigma }[/math] (e.g. 26 for the alphabet) and whose components are ordered sequences of strings.
Let [math]\displaystyle{ N }[/math] denote the maximum length of an input string.
An array [math]\displaystyle{ A }[/math] with index range [math]\displaystyle{ [1,\dots,N] }[/math] and multisets of strings as components.

Abstract view

Invariant: After [math]\displaystyle{ i \geq 0 }[/math] iterations:

For [math]\displaystyle{ j \in \{1,\dots,N-i\} }[/math], [math]\displaystyle{ A[j] }[/math] contains all input strings of length [math]\displaystyle{ j }[/math].
[math]\displaystyle{ S' }[/math] contains all other input strings, that is, the ones with length at least [math]\displaystyle{ N - i + 1 }[/math]. The sequence [math]\displaystyle{ S' }[/math] is sorted according to the following definition of comparison: [math]\displaystyle{ str1 \lt str2 }[/math] (resp. [math]\displaystyle{ str1 \leq str2 }[/math]) means that the substring of [math]\displaystyle{ str1 }[/math] starting at position [math]\displaystyle{ N - i + 1 }[/math] is lexicographically smaller (resp., smaller or equal) than the substring of [math]\displaystyle{ str2 }[/math] starting at position [math]\displaystyle{ N - i + 1 }[/math].
All buckets are empty.

Variant: [math]\displaystyle{ i }[/math] increases by [math]\displaystyle{ 1 }[/math]

Break condition: [math]\displaystyle{ i = N }[/math].

Induction basis

Abstract view: The auxiliary data must be initialized.

Implementation:

[math]\displaystyle{ S' := \emptyset }[/math]
For each [math]\displaystyle{ c \in \Sigma }[/math], set [math]\displaystyle{ B[c] := \emptyset }[/math]
For each [math]\displaystyle{ i \in \{1,\dots,N\}, set \lt math\gt A[i] := \emptyset }[/math]
Move each string [math]\displaystyle{ str }[/math] in [math]\displaystyle{ S }[/math] to [math]\displaystyle{ A[l(str)] }[/math] where [math]\displaystyle{ l(str) }[/math] denotes the length of [math]\displaystyle{ str }[/math].

Proof: Obvious.

Induction step

Complexity

Statement: Let [math]\displaystyle{ M }[/math] denote the total sum of all input string lengths. Then the asymptotic complexity is in [math]\displaystyle{ \Theta(M) }[/math] in the best and the worst case.

Proof: Obviously, the preprocessing takes <mathO(M)</math> time. In the main loop, each character of each string is read exactly once. Obviously, no operation is applied more often than the reading of single characters.

Bucketsort

Contents

General information

Abstract view

Induction basis

Induction step

Complexity

Navigation menu

Bucketsort

General information

Abstract view

Induction basis

Induction step

Complexity

Navigation menu

Search