Bucketsort: Difference between revisions

From Algowiki
Jump to navigation Jump to search
No edit summary
 
(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Category:Videos]]
[[Category:Sorting Algorithms]]
[[Category:Sorting Algorithms]]
<div class="plainlinks" style="float:right;margin:0 0 5px 5px; border:1px solid #AAAAAA; width:auto; padding:1em; margin: 0px 0px 1em 1em;">
{{#ev:youtube|https://www.youtube.com/watch?v=-POIDU_ew98|500|right|Bucketsort|frame}}
<div style="font-size: 1.8em;font-weight:bold;text-align: center;margin:0.2em 0 1em 0">Bucket Sort</div>
 
<div style="font-size: 1.2em; margin:.5em 0 1em 0; text-align:center">whatever</div>
 
<div style="font-size: 1.2em; margin:.5em 0 .5em 0;text-align:center">[[File:olw_logo1.png|20px]][https://openlearnware.tu-darmstadt.de/#!/resource/bucket-sort-1941 Openlearnware]</div>
</div>
 
== General information ==
== General information ==


Line 36: Line 30:


'''Implementation:'''
'''Implementation:'''
# <math>S' := \emptyset</math>
# <math>S' := \emptyset</math>.
# For each <math>c \in \Sigma</math>, set <math>B[c] := \emptyset</math>
# For each <math>c \in \Sigma</math>, set <math>B[c] := \emptyset</math>.
# For each <math>i \in \{1,\dots,N\}</math>, set <math>A[i] := \emptyset</math>
# For each <math>i \in \{1,\dots,N\}</math>, set <math>A[i] := \emptyset</math>.
# Move each string <math>str</math> in <math>S</math> to <math>A[l(str)]</math> where <math>l(str)</math> denotes the length of <math>str</math>.
# Move each string <math>str</math> in <math>S</math> to <math>A[l(str)]</math> where <math>l(str)</math> denotes the length of <math>str</math>.


Line 45: Line 39:
== Induction step ==
== Induction step ==
'''Abstract view:'''
'''Abstract view:'''
# Move each string <math>str</math> in <math>A[N-i+1]</math> to <math>B[str[N-i+1]]</math>
# Move each string <math>str</math> in <math>A[N-i+1]</math> to <math>B[str[N-i+1]]</math>.
# Afterwards: Append each string <math>str</math> in <math>S'</math> at the tail of <math>B[str[N - i + 1]]</math>. It is essential that the strings are considered in the order in which they are stored in <math>S'</math>. (Now <math>S'</math> is empty.)
# Afterwards: Append each string <math>str</math> in <math>S'</math> at the tail of <math>B[str[N - i + 1]]</math>. It is essential that the strings are considered in the order in which they are stored in <math>S'</math>. (Now <math>S'</math> is empty.)
# For each <math>c \in \Sigma</math> in ascending order: append <math>B[c]</math> at the tail of <math>S'</math> and make <math>B[c]</math> empty.
# For each <math>c \in \Sigma</math> in ascending order: append <math>B[c]</math> at the tail of <math>S'</math> and make <math>B[c]</math> empty.


'''Implementation:''' Obvious
'''Implementation:''' Obvious.


'''Correctness:''' We have to show that the relative order of each pair of strings, <math>str1</math> and <math>str2</math>, in the new sequence <math>S'</math> are correct. For that, we distinguish between four cases:
'''Correctness:''' We have to show that the relative order of each pair of strings, <math>str1</math> and <math>str2</math>, in the new sequence <math>S'</math> is correct. For that, we distinguish four cases:
# If <math>str1[N-i+1] \neq str2[N-i+1]</math>, <math>str1</math> and <math>str2</math> are placed in different buckets in Step 3 of the <math>i</math>-th iteration, so they are correctly ordered according to the character at position <math>N - i + 1</math>.
# If <math>str1[N-i+1] \neq str2[N-i+1]</math>, <math>str1</math> and <math>str2</math> are placed in different buckets in Step 3 of the <math>i</math>-th iteration, so they are correctly ordered according to the character at position <math>N - i + 1</math>.
# If <math>str1[N-i+1] = str2[N-i+1]</math> and both strings have length <math>N - i + 1</math>, their relative order is irrelevant, so nothing is to show.
# If <math>str1[N-i+1] = str2[N-i+1]</math> and both strings have length <math>N - i + 1</math>, their relative order is irrelevant at this stage, so nothing is to show.
# Next, consider the case that <math>str1[N-i+1] = str2[N-i+1]</math>, one string (say, <math>str1</math>) has length <math>N - i + 1</math> and the other one (<math>str2</math>) has a length strictly greater than <math>N - i + 1</math>. Since the strings of length <math>N - i + 1</math> are placed in their buckets prior to the longer strings (cf. Steps 1 and 2), <math>str2</math> appears after <math>str1</math>, which is correct.
# Next, consider the case that <math>str1[N-i+1] = str2[N-i+1]</math>, one string (say, <math>str1</math>) has length <math>N - i + 1</math> and the other one (<math>str2</math>) has a length strictly greater than <math>N - i + 1</math>. Since the strings of length <math>N - i + 1</math> are placed in their buckets prior to the longer strings (cf. Steps 1 and 2), <math>str2</math> appears after <math>str1</math>, which is correct.
# Finally, consider the case that <math>str1[N-i+1] = str2[N-i+1]</math>, and the lengths of both strings are larger than <math>N - i + 1</math>. Then the induction hypothesis guarantees the correct relative order due to the specific order in which the strings are considered in Step 2.
# Finally, consider the case that <math>str1[N-i+1] = str2[N-i+1]</math>, and the lengths of both strings are larger than <math>N - i + 1</math>. Then the induction hypothesis guarantees the correct relative order due to the specific order in which the strings are considered in Step 2.

Latest revision as of 22:51, 19 June 2015

Bucketsort

General information

Algorithmic problem: Sorting Sequences of Strings

Type of algorithm: loop

Auxiliary data:

  1. An ordered sequence [math]\displaystyle{ S' }[/math] of strings, which will eventually hold the overall result of the algorithm.
  2. The buckets, that is, an array [math]\displaystyle{ B }[/math] whose index range includes the ID range of [math]\displaystyle{ \Sigma }[/math] (e.g. 0...127 suffices for all alphabets covered by ASCII) and whose components are ordered sequences of strings.
  3. Let [math]\displaystyle{ N }[/math] denote the maximum length of an input string.
  4. An array [math]\displaystyle{ A }[/math] with index range [math]\displaystyle{ [1,\dots,N] }[/math] and multisets of strings as components.

Abstract view

Invariant: After [math]\displaystyle{ i \geq 0 }[/math] iterations:

  1. For [math]\displaystyle{ j \in \{1,\dots,N-i\} }[/math], [math]\displaystyle{ A[j] }[/math] contains all input strings of length [math]\displaystyle{ j }[/math].
  2. [math]\displaystyle{ S' }[/math] contains all other input strings, that is, the ones with length at least [math]\displaystyle{ N - i + 1 }[/math]. The sequence [math]\displaystyle{ S' }[/math] is sorted according to the following definition of comparison: [math]\displaystyle{ str1 \prec str2 }[/math] (resp. [math]\displaystyle{ str1 \preceq str2 }[/math]) means that the substring of [math]\displaystyle{ str1 }[/math] starting at position [math]\displaystyle{ N - i + 1 }[/math] is lexicographically smaller (resp., smaller or equal) than the substring of [math]\displaystyle{ str2 }[/math] starting at position [math]\displaystyle{ N - i + 1 }[/math].
  3. All buckets are empty.

Variant: [math]\displaystyle{ i }[/math] increases by [math]\displaystyle{ 1 }[/math].

Break condition: [math]\displaystyle{ N }[/math] iterations are completed.

Induction basis

Abstract view: The auxiliary data must be initialized.

Implementation:

  1. [math]\displaystyle{ S' := \emptyset }[/math].
  2. For each [math]\displaystyle{ c \in \Sigma }[/math], set [math]\displaystyle{ B[c] := \emptyset }[/math].
  3. For each [math]\displaystyle{ i \in \{1,\dots,N\} }[/math], set [math]\displaystyle{ A[i] := \emptyset }[/math].
  4. Move each string [math]\displaystyle{ str }[/math] in [math]\displaystyle{ S }[/math] to [math]\displaystyle{ A[l(str)] }[/math] where [math]\displaystyle{ l(str) }[/math] denotes the length of [math]\displaystyle{ str }[/math].

Proof: Obvious.

Induction step

Abstract view:

  1. Move each string [math]\displaystyle{ str }[/math] in [math]\displaystyle{ A[N-i+1] }[/math] to [math]\displaystyle{ B[str[N-i+1]] }[/math].
  2. Afterwards: Append each string [math]\displaystyle{ str }[/math] in [math]\displaystyle{ S' }[/math] at the tail of [math]\displaystyle{ B[str[N - i + 1]] }[/math]. It is essential that the strings are considered in the order in which they are stored in [math]\displaystyle{ S' }[/math]. (Now [math]\displaystyle{ S' }[/math] is empty.)
  3. For each [math]\displaystyle{ c \in \Sigma }[/math] in ascending order: append [math]\displaystyle{ B[c] }[/math] at the tail of [math]\displaystyle{ S' }[/math] and make [math]\displaystyle{ B[c] }[/math] empty.

Implementation: Obvious.

Correctness: We have to show that the relative order of each pair of strings, [math]\displaystyle{ str1 }[/math] and [math]\displaystyle{ str2 }[/math], in the new sequence [math]\displaystyle{ S' }[/math] is correct. For that, we distinguish four cases:

  1. If [math]\displaystyle{ str1[N-i+1] \neq str2[N-i+1] }[/math], [math]\displaystyle{ str1 }[/math] and [math]\displaystyle{ str2 }[/math] are placed in different buckets in Step 3 of the [math]\displaystyle{ i }[/math]-th iteration, so they are correctly ordered according to the character at position [math]\displaystyle{ N - i + 1 }[/math].
  2. If [math]\displaystyle{ str1[N-i+1] = str2[N-i+1] }[/math] and both strings have length [math]\displaystyle{ N - i + 1 }[/math], their relative order is irrelevant at this stage, so nothing is to show.
  3. Next, consider the case that [math]\displaystyle{ str1[N-i+1] = str2[N-i+1] }[/math], one string (say, [math]\displaystyle{ str1 }[/math]) has length [math]\displaystyle{ N - i + 1 }[/math] and the other one ([math]\displaystyle{ str2 }[/math]) has a length strictly greater than [math]\displaystyle{ N - i + 1 }[/math]. Since the strings of length [math]\displaystyle{ N - i + 1 }[/math] are placed in their buckets prior to the longer strings (cf. Steps 1 and 2), [math]\displaystyle{ str2 }[/math] appears after [math]\displaystyle{ str1 }[/math], which is correct.
  4. Finally, consider the case that [math]\displaystyle{ str1[N-i+1] = str2[N-i+1] }[/math], and the lengths of both strings are larger than [math]\displaystyle{ N - i + 1 }[/math]. Then the induction hypothesis guarantees the correct relative order due to the specific order in which the strings are considered in Step 2.

Complexity

Statement: Let [math]\displaystyle{ M }[/math] denote the total sum of all input string lengths. Then the asymptotic complexity is in [math]\displaystyle{ \Theta(M) }[/math] in the best and the worst case.

Proof: Obviously, the preprocessing takes [math]\displaystyle{ O(M) }[/math] time. In the main loop, each character of each string is read exactly once. Obviously, no operation is applied more often than the reading of single characters.