Mergesort
General Information
Algorithmic problem: Sorting based on pairwise comparison
Type of algorithm: recursion
Abstract View
Invariant: After a recursive call, the input sequence of this recursive call is sorted.
Variant: For a recursive call on a subsequence [math]\displaystyle{ S' }[/math] of [math]\displaystyle{ S }[/math], let [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math] denote the subsequences of [math]\displaystyle{ S' }[/math] with which Mergesort is called recursively from that call. Then it is [math]\displaystyle{ |S_1'| \leq \lceil|S'| /2\rceil }[/math] and [math]\displaystyle{ |S_2'| \leq \lceil|S'| /2\rceil }[/math].
Break condition: The current subsequence of the recursive call is empty or a singleton.
Induction Basis
Abstract view: Nothing to do on an empty sequence or a singleton.
Implementation: Ditto.
Proof: An empty set or singleton is trivially sorted.
Induction Step
Abstract view: The sequence is divided into two subsequences of approximately half size, it does not matter at all in which way this is done. Both subsequences are sorted recursively using Mergesort. The sorted subsequences are "merged" to one using algorithm Merge.
Implementation: Obvious.
Correctness: By induction hypothesis, the recursive calls sort and correctly. Algorithm Merge unites two sorted sequences into one sorted sequence.
Complexity
Statement: The complexity is in [math]\displaystyle{ O(n \log n) }[/math] in the best and worst case.
Proof: Obviously, the variant is correct. So, the lengths of [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math] are at most [math]\displaystyle{ \lceil1/2\rceil }[/math] of the length of [math]\displaystyle{ S' }[/math]. Consequently, the lengths of [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math] are at least [math]\displaystyle{ \lfloor1/2\rfloor }[/math] of the length of [math]\displaystyle{ S' }[/math]. In summary, the overall recursion depth is in [math]\displaystyle{ \Theta(\log n) }[/math] in the best and worst case. Next consider the run time of a single recursive call, which receives some [math]\displaystyle{ S' }[/math] as input and calls Mergesort recursively with two sub-sequences [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math]. The run time of this recursive call (disregarding the run times of the recursive calls with [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math]) is linear in the length of [math]\displaystyle{ S' }[/math]. Since all recursive calls on the same recursion level operate on pairwise disjoint sub-sequences, the total run time of all calls on the same recursive level is linear in the length of the original sequence.
