# Quicksort

## General information

Algorithmic problem: Sorting based on pairwise comparison

Type of algorithm: recursion

## Abstract view

Invariant: After a recursive call, the input sequence of this recursive call is sorted.

Variant: In each recursive call, the sequence of the callee is strictly shorter than that of the caller.

Break condition: The sequence is empty or a singleton.

Remark: For a particular recursive call $C$, we may, for example, choose the height of the recursion subtree with root $C$ as the induction parameter. For conciseness, the induction parameter is omitted in the following.

## Induction basis

Abstract view: Nothing to do on an empty sequence or a singleton.

Implementation: Ditto.

Proof: Empty sequences and singletons are trivially sorted.

## Induction step

### Abstract view:

1. Choose a pivot value $p \in [min\{x|x \in S\},\dots,max\{x|x \in S\}]$ (note that $p$ is not required to be an element of $S$.
2. Partition $S$ into sequences, $S_1$, $S_2$ and $S_3$, such that $x \lt p$ for all $x \in S_1$, $x = p$ for all $x \in S_2$, and $x \gt p$ for all $x \in S_3$.
3. Sort $S_1$ and $S_3$ recursively.
4. The concatenation of all three lists, $S_1 + S_2 + S_3$, is the result of the algorithm.

### Implementation:

1. Choose $p \in [min\{x|x \in S\},\dots,max\{x|x \in S\}]$ according to some pivoting rule.
2. $S_1 := S_2 := S_3 := \emptyset$.
3. For all $x \in S$, append $x$ to
1. $S_1$ if $x \lt p$,
2. $S_2$ if $x = p$,
3. $S_3$ if $x \gt p$.
4. Call Quicksort on $S_1$ giving $S_1'$
5. Call Quicksort on $S_3$ giving $S_3'$
6. Return $S_1' + S_2' + S_3'$.

### Correctness:

By induction hypothesis, $S_1'$ and $S_3'$ are sorted permutations of $S_1$ and $S_3$, respectively. In particular $S_1' + S_2 + S_3'$ is a permutation of $S$. To see that this permutation is sorted, let $x$ and $y$ be two members of $S$ such that $y$ immediately succeeds $x$ in the resulting sequence $S_1' + S_2 + S_3'$. We have to show $x \leq y$.

1. If $x,y \in S_1'$ or $x,y \in S_3'$, $x \leq y$ resultes from the induction hypothesis.
2. On the other hand, if $x,y \in S_2$. It is $x = y = p$, which trivially implies $x \leq y$.
3. Finally, for the following cases, $x \leq y$ is implied by the specific way of partitioning $S$ into $S_1'$, $S_2$ and $S_3'$:
1. $x \in S_1'$ and $y \in S_2$
2. $x \in S_2$ and $y \in S_3'$
3. $x \in S_1'$ and $y \in S_3'$

Obviously, this case distinction covers all potential cases, so the claim is proved.

## Complexity

### Statement: recursion depth of quick sort : [A] best case, [B] avg. case, [C] worst case

In the worst case, the complexity is $\Theta(T\cdot n^2)$, where $T$ is the complexity of the comparison.

If the pivoting rule ensures for some $\alpha \lt 1$ that the lengths of $S_1$ and $S_3$ are at most $\alpha$ times the size of $S$, then it is even $\Theta(T\cdot n \log n)$ in the worst case.

If each pivot value is chosen uniformly randomly from members of the respective sequence and if all selections of pivot values are stochastically independent, the average-case complexity is $\Theta(T\cdot n \log n)$.

### Proof:

First note that the complexity for a single recursive call on $S$ (excluding the complexity for the recursive descents on $S_1$ and $S_3$) is in $\Theta(T\cdot|S|)$. On each recursive level, all calls are on distinct subsets of $S$. Therefore, the number of recursive calls with non-empty sequences on one recursive level is in $\Theta(n)$. The number of calls with empty sequences on one level is at most twice the total number of calls with non-empty sequences on the previous level. Hence, the number of calls with empty sequences on one recursive level is in $O(n)$ as well. In summary, the total complexity on a recursive level is $O(T\cdot n)$. So, for the total complexity, it remains to estimate the number of recursive levels.

Now consider the first statement. The recursion variant implies that the deepest recursive level is $O(n)$. On the other hand, $\Omega(n)$ in the very worst case is obvious. This gives the claimed $\Theta(T\cdot n^2)$ in the worst case.

Next assume there is a fixed $\alpha \lt 1$ such that $|S_1| \leq \alpha \cdot|S|$ and $|S_3| \leq \alpha \cdot|S|$ is guaranteed in each recursive call. Then the length of any sequence on recursive level $\#i$ is at most $\alpha ^ i \cdot |S|$. Therefore, the maximal recursive depth is $\lceil \log_{a^{-1}}(n)\rceil$. Since $\alpha^{-1} \gt 1$, the total complexity is in $O(T\cdot n \log n)$ in the worst case.

For the last statement, the average-case analysis, first note that the number of comparisons alone has the same asymptotic complexity as the algorithm as a whole. Next note that any $x,y \in S$ are compared at most once throughout the entire algorithm if, and only if, $x$ or $y$ is chosen as the pivot value for a subsequence to which both elements belong. For $x,y \in S$, let $Pr(x,y)$ denote the probability that $x$ and $y$ are indeed compared. Since comparison events are distinct and $Pr(x,y )\in \{0,1\}$ for all $x,y \in S$, the expected number of comparisons is

$\sum_{x,y \in S, x \neq y} Pr(x,y)$

Let $n := |S|$, and for $i,j \in \{1,\dots,n\}$, let $Pr(i,j)$ denote the probability that the $i$-th and the $j$-th elemet of the eventual sorted sequence are compared throughout the algorithm. Using this notation, we may rewrite the above summation as follows:

$\sum_{x,y \in S, x \neq y} Pr(x,y) = \sum_{i=1}^{n-1} \sum_{j = i+1}^{n} Pr(i,j)$.

For $i,j \in \{1,\dots,n\}$, $i \lt j$, let $S_{ij}$ denote the subsequence of the eventual sorted sequence that starts with $i$ and ends with $j$. The elements $\#i$ and $\#j$ are compared if, and only if, $\#i$ or $\#j$ is the very first element of $S_{ij}$ to be chosen as a pivot. The probability of this event is $\frac{2}{|S_{ij}|} = \frac{2}{j - i + 1}$, so we obtain

$\sum_{i=1}^{n-1} \sum_{j = i+1}^{n} \frac{2}{j-i+1}$.

Substituting $k := j-i$, this gives

$\sum_{i=1}^{n-1} \sum_{j = i+1}^{n} \frac{2}{j-i+1} = \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k+1} \leq 2(n - 1)$.

$\sum_{k=1}^n \frac{1}{k+1} \leq 2(n-1)$.

$\sum_{k+1}^n \frac{1}{k}$.

The asymptotic behavior of the harmonic series is $\Theta(\log n)$, so the last expression is $\Theta(T\cdot n \log n)$.

## Further information

If $S$ is an array, $S$ cannot be decomposed into subsequences. We would liko to avoid the need for additional arrays and copy operations. Instead, the array should be sorted in-place, that is, by swap operations on pairs of elements. The auxiliary procedure, Pivot partitioning by scanning, is designed exactly for that: it permutes the array such that each of $S_1$ and $S_2$ is a subarray. Then each recursive call of Quicksort operates on a subarray of the input array, which is specified by two index pointers.