Bucketsort: Difference between revisions

Revision as of 17:17, 17 June 2015

Bucketsort

General information

Algorithmic problem: Sorting Sequences of Strings

Type of algorithm: loop

Auxiliary data:

An ordered sequence $S^{'}$ of strings, which will eventually hold the overall result of the algorithm.
The buckets, that is, an array $B$ whose index range includes the ID range of $\Sigma$ (e.g. $0, \dots, 127$ suffices for all alphabets covered by ASCII) and whose components are ordered sequences of strings.
Let $N$ denote the maximum length of an input string.
An array $A$ with index range $[1, \dots, N]$ and multisets of strings as components.

Abstract view

Invariant: After $i \geq 0$ iterations:

For $j \in \{1, \dots, N - i\}$, $A[j]$ contains all input strings of length $j$.
$S^{'}$ contains all other input strings, that is, the ones with length at least $N - i + 1$. The sequence $S^{'}$ is sorted according to the following definition of comparison: $str_1 \prec str_2$ (resp. $str_1 \preceq str_2$) means that the substring of $str_1$ starting at position $N - i + 1$ is lexicographically smaller than (resp. smaller than or equal to) the substring of $str_2$ starting at position $N - i + 1$.
All buckets are empty.

Variant: $i$ increases by $1$ .

Break condition: $N$ iterations are completed.
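
The invariant above can be turned into an executable check, which may help when testing an implementation. The following is a minimal sketch under stated assumptions, not part of the original algorithm description: the function name `invariant_holds` and the parameter `inputs` are hypothetical, $A$ is stored as a Python list whose index 0 is unused, and Python's built-in string comparison (by code point) is assumed to agree with the order of the character IDs in $\Sigma$.

```python
from collections import Counter


def invariant_holds(inputs, A, S_prime, B, N, i):
    """Return True if the loop invariant holds after i iterations.

    inputs  -- the original input strings (hypothetical parameter name)
    A       -- list of lists, A[j] for j = 1..N (A[0] is unused)
    S_prime -- the current sequence S'
    B       -- the list of buckets
    """
    # 1. A[j] still holds exactly the input strings of length j, j = 1..N-i.
    for j in range(1, N - i + 1):
        if Counter(A[j]) != Counter(s for s in inputs if len(s) == j):
            return False
    # 2. S' holds exactly the input strings of length at least N-i+1 ...
    if Counter(S_prime) != Counter(s for s in inputs if len(s) >= N - i + 1):
        return False
    # ... ordered by the suffix starting at 1-based position N-i+1
    # (0-based index N-i).
    suffixes = [s[N - i:] for s in S_prime]
    if suffixes != sorted(suffixes):
        return False
    # 3. All buckets are empty.
    return all(len(bucket) == 0 for bucket in B)
```

For the inputs `["ab", "b", "ca"]` with $N = 2$, the state after one iteration has `A[1] == ["b"]` and `S_prime == ["ca", "ab"]`, because only the characters at position 2 have been compared so far.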

Induction basis

Abstract view: The auxiliary data must be initialized.

Implementation:

$S^{'} := \emptyset$ .
For each $c \in \Sigma$, set $B[c] := \emptyset$.
For each $i \in \{1, \dots, N\}$, set $A[i] := \emptyset$.
Move each string $str$ in $S$ to $A[l(str)]$, where $l(str)$ denotes the length of $str$.
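
The initialization steps above can be sketched in Python as follows. This is an illustration under assumptions, not a definitive implementation: the function name `initialize` and the parameter `alphabet_size` are hypothetical, every input string is assumed to be non-empty and to use only characters with `ord(c) < alphabet_size`, and the input list is copied rather than emptied.

```python
def initialize(S, alphabet_size=128):
    """Set up S', B, and A as in the induction basis.

    Assumes every string is non-empty and uses only characters with
    ord(c) < alphabet_size (both are assumptions of this sketch).
    """
    N = max((len(s) for s in S), default=0)   # maximum input string length
    S_prime = []                              # S' := empty sequence
    B = [[] for _ in range(alphabet_size)]    # B[c] := empty for each c
    A = [[] for _ in range(N + 1)]            # A[1..N]; A[0] stays unused
    for s in S:
        A[len(s)].append(s)                   # put str into A[l(str)]
    return S_prime, B, A, N
```

Allocating $N + 1$ entries for `A` keeps the 1-based indexing of the text at the cost of one unused slot.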

Proof: Obvious.

Induction step

Abstract view:

Move each string $str$ in $A[N - i + 1]$ to $B[str[N - i + 1]]$.
Afterwards: Append each string $str$ in $S^{'}$ at the tail of $B[str[N - i + 1]]$. It is essential that the strings are considered in the order in which they are stored in $S^{'}$. (Now $S^{'}$ is empty.)
For each $c \in \Sigma$ in ascending order: append $B[c]$ at the tail of $S^{'}$ and make $B[c]$ empty.
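
The three steps above can be sketched as a single Python function, together with a small driver that runs all $N$ iterations. The names `induction_step`, `bucketsort`, and `alphabet_size` are hypothetical, and the sketch assumes non-empty strings whose characters satisfy `ord(c) < alphabet_size`. Since Python indexes strings from 0, the 1-based position $N - i + 1$ becomes index `N - i`.

```python
def induction_step(S_prime, B, A, N, i):
    """Perform the i-th iteration (1 <= i <= N) and return the new S'."""
    pos = N - i  # 0-based index of the 1-based position N - i + 1
    # Step 1: strings of length N - i + 1 go into their buckets first.
    for s in A[N - i + 1]:
        B[ord(s[pos])].append(s)
    A[N - i + 1] = []
    # Step 2: then the strings of S', in the order in which S' stores them.
    for s in S_prime:
        B[ord(s[pos])].append(s)
    # Step 3: concatenate the buckets in ascending character order
    # and empty each bucket again.
    new_S_prime = []
    for bucket in B:
        new_S_prime.extend(bucket)
        bucket.clear()
    return new_S_prime


def bucketsort(S, alphabet_size=128):
    """Sort the strings in S lexicographically (illustrative driver)."""
    N = max((len(s) for s in S), default=0)
    A = [[] for _ in range(N + 1)]          # A[1..N]; A[0] stays unused
    for s in S:
        A[len(s)].append(s)
    B = [[] for _ in range(alphabet_size)]
    S_prime = []
    for i in range(1, N + 1):
        S_prime = induction_step(S_prime, B, A, N, i)
    return S_prime
```

Processing `A[N - i + 1]` before `S_prime` is what places a string ahead of every longer string that shares its remaining characters, so a prefix ends up before its extensions.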

Implementation: Obvious.

Correctness: We have to show that the relative order of each pair of strings, $str_1$ and $str_2$, in the new sequence $S^{'}$ is correct. For that, we distinguish four cases:

If $str_1[N - i + 1] \neq str_2[N - i + 1]$, then $str_1$ and $str_2$ are placed in different buckets in Steps 1 and 2 of the $i$-th iteration. Since Step 3 concatenates the buckets in ascending order of their characters, the two strings are correctly ordered according to the character at position $N - i + 1$.
If $str_1[N - i + 1] = str_2[N - i + 1]$ and both strings have length $N - i + 1$, their relative order is irrelevant at this stage, so there is nothing to show.
Next, consider the case that $str_1[N - i + 1] = str_2[N - i + 1]$, one string (say, $str_1$) has length $N - i + 1$, and the other one ($str_2$) has a length strictly greater than $N - i + 1$. Since the strings of length $N - i + 1$ are placed in their buckets prior to the longer strings (cf. Steps 1 and 2), $str_2$ appears after $str_1$, which is correct.
Finally, consider the case that $str_1[N - i + 1] = str_2[N - i + 1]$ and the lengths of both strings are larger than $N - i + 1$. Then the induction hypothesis guarantees the correct relative order due to the specific order in which the strings are considered in Step 2.

Complexity

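Statement: Let $M$ denote the total sum of all input string lengths. Then the asymptotic complexity is in $\Theta(M)$ in the best and the worst case, provided that the alphabet size $|\Sigma|$ is regarded as a constant.

Proof: The preprocessing takes $O(M)$ time. In the main loop, each character of each string is read exactly once: a string of length $l$ takes part in exactly the last $l$ iterations, with one character read per iteration. Apart from the scan over the buckets in Step 3, which costs $O(|\Sigma|)$ per iteration, no operation is applied more often than the reading of single characters.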
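
A short derivation may make the counting argument more explicit. The symbols $n_j$ (the number of input strings of length $j$) and $|\Sigma|$ (the number of buckets) are notation introduced here for illustration only. In iteration $i$, Steps 1 and 2 touch every string of length at least $N - i + 1$ once, and Step 3 visits every bucket once. A string of length $j$ is therefore touched in exactly $j$ iterations, which gives

$$\sum_{i=1}^{N} \Bigl( \sum_{j = N - i + 1}^{N} n_j + |\Sigma| \Bigr) = \sum_{j=1}^{N} j \cdot n_j + N \cdot |\Sigma| = M + N \cdot |\Sigma| .$$

Since $N \leq M$, the right-hand side is in $\Theta(M)$ whenever $|\Sigma|$ is treated as a constant.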

@@ Line 1: / Line 1: @@
 [[Category:Sorting Algorithms]]
-{{#ev:youtube|https://www.youtube.com/watch?v=6nSc8ojXZ1A|500|right|Bucketsort|frame}}
+{{#ev:youtube|https://www.youtube.com/watch?v=-POIDU_ew98|500|right|Bucketsort|frame}}
 == General information ==
