Bucketsort: Difference between revisions

From Algowiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 1: Line 1:
[[Category:Sorting Algorithms]]
[[Category:Sorting Algorithms]]
{{#ev:youtube|https://www.youtube.com/watch?v=6nSc8ojXZ1A|500|right|Bucketsort|frame}}
{{#ev:youtube|https://www.youtube.com/watch?v=-POIDU_ew98|500|right|Bucketsort|frame}}
== General information ==
== General information ==



Revision as of 17:17, 17 June 2015

Bucketsort

General information

Algorithmic problem: Sorting Sequences of Strings

Type of algorithm: loop

Auxiliary data:

  1. An ordered sequence S of strings, which will eventually hold the overall result of the algorithm.
  2. The buckets, that is, an array B whose index range includes the ID range of Σ (e.g. 0...127 suffices for all alphabets covered by ASCII) and whose components are ordered sequences of strings.
  3. Let N denote the maximum length of an input string.
  4. An array A with index range [1,,N] and multisets of strings as components.

Abstract view

Invariant: After i0 iterations:

  1. For j{1,,Ni}, A[j] contains all input strings of length j.
  2. S contains all other input strings, that is, the ones with length at least Ni+1. The sequence S is sorted according to the following definition of comparison: str1str2 (resp. str1str2) means that the substring of str1 starting at position Ni+1 is lexicographically smaller (resp., smaller or equal) than the substring of str2 starting at position Ni+1.
  3. All buckets are empty.

Variant: i increases by 1.

Break condition: N iterations are completed.

Induction basis

Abstract view: The auxiliary data must be initialized.

Implementation:

  1. S:=.
  2. For each cΣ, set B[c]:=.
  3. For each i{1,,N}, set A[i]:=.
  4. Move each string str in S to A[l(str)] where l(str) denotes the length of str.

Proof: Obvious.

Induction step

Abstract view:

  1. Move each string str in A[Ni+1] to B[str[Ni+1]].
  2. Afterwards: Append each string str in S at the tail of B[str[Ni+1]]. It is essential that the strings are considered in the order in which they are stored in S. (Now S is empty.)
  3. For each cΣ in ascending order: append B[c] at the tail of S and make B[c] empty.

Implementation: Obvious.

Correctness: We have to show that the relative order of each pair of strings, str1 and str2, in the new sequence S is correct. For that, we distinguish four cases:

  1. If str1[Ni+1]str2[Ni+1], str1 and str2 are placed in different buckets in Step 3 of the i-th iteration, so they are correctly ordered according to the character at position Ni+1.
  2. If str1[Ni+1]=str2[Ni+1] and both strings have length Ni+1, their relative order is irrelevant at this stage, so nothing is to show.
  3. Next, consider the case that str1[Ni+1]=str2[Ni+1], one string (say, str1) has length Ni+1 and the other one (str2) has a length strictly greater than Ni+1. Since the strings of length Ni+1 are placed in their buckets prior to the longer strings (cf. Steps 1 and 2), str2 appears after str1, which is correct.
  4. Finally, consider the case that str1[Ni+1]=str2[Ni+1], and the lengths of both strings are larger than Ni+1. Then the induction hypothesis guarantees the correct relative order due to the specific order in which the strings are considered in Step 2.

Complexity

Statement: Let M denote the total sum of all input string lengths. Then the asymptotic complexity is in Θ(M) in the best and the worst case.

Proof: Obviously, the preprocessing takes O(M) time. In the main loop, each character of each string is read exactly once. Obviously, no operation is applied more often than the reading of single characters.