Mergesort: Difference between revisions

From Algowiki
Jump to navigation Jump to search
m (→‎Complexity: Tippfehler beseitigt)
 
(15 intermediate revisions by 3 users not shown)
Line 1: Line 1:
[[Category:Videos]]
[[Category:Sorting Algorithms]]
[[Category:Sorting Algorithms]]
[[Category:Divide and Conquer]]
[[Category:Divide and Conquer]]
 
{{#ev:youtube|https://www.youtube.com/watch?v=7kdQwh-WvhA|500|right|Chapters
<div class="plainlinks" style="float:right;margin:0 0 5px 5px; border:1px solid #AAAAAA; width:auto; padding:1em; margin: 0px 0px 1em 1em;">
#[00:00] Mergesort
<div style="font-size: 1.8em;font-weight:bold;text-align: center;margin:0.2em 0 1em 0">Merge Sort</div>
#[02:36] Fragen
 
#[02:44] Wie funktioniert der Algorithmus?
<div style="font-size: 1.2em; margin:.5em 0 1em 0; text-align:center">whatever</div>
#[03:04] Was ist die asymptotische Komplexität des Algorithmus?
 
#[03:23] Was macht Merge?
<div style="font-size: 1.2em; margin:.5em 0 .5em 0;text-align:center">[[File:olw_logo1.png|20px]][https://openlearnware.tu-darmstadt.de/#!/resource/merge-sort-1944 Openlearnware]</div>
#[03:34] Wie lautet die Invariante?
</div>
#[03:58] Warum ist der Algorithmus korrekt?
#[04:10] Wie wird die Invariante sichergestellt?
#[04:31] Was ist die asymptotische Komplexität des Algorithmus?
|frame}}
== General Information ==
== General Information ==
'''Algorithmic problem:''' Sorting based on pairwise comparison
'''Algorithmic problem:''' [[Sorting based on pairwise comparison]]


'''Type of algorithm:''' recursion
'''Type of algorithm:''' recursion
Line 19: Line 23:
'''Variant:''' For a recursive call on a subsequence <math>S'</math> of <math>S</math>, let <math>S_1'</math> and <math>S_2'</math> denote the subsequences of <math>S'</math> with which Mergesort is called recursively from that call. Then it is <math>|S_1'| \leq \lceil|S'| /2\rceil</math> and <math>|S_2'| \leq \lceil|S'| /2\rceil</math>.
'''Variant:''' For a recursive call on a subsequence <math>S'</math> of <math>S</math>, let <math>S_1'</math> and <math>S_2'</math> denote the subsequences of <math>S'</math> with which Mergesort is called recursively from that call. Then it is <math>|S_1'| \leq \lceil|S'| /2\rceil</math> and <math>|S_2'| \leq \lceil|S'| /2\rceil</math>.


'''Break condition:''' The current subsequence of the recursive call is empty or a singleton.
'''Break condition:''' The current subsequence of the recursive call is a [[Sets and sequences#Singleton, pair, triple, quadruple|singleton]].
 
'''Remark:''' For a particular recursive call <math>C</math>, we may, for example, choose the height of the recursion subtree with root <math>C</math> as the induction parameter. For conciseness, the induction parameter is omitted in the following.


== Induction Basis ==
== Induction Basis ==
'''Abstract view:''' Nothing to do on an empty sequence or a singleton.
'''Abstract view:''' Nothing to do on a singleton.


'''Implementation:''' Ditto.
'''Implementation:''' Ditto.


'''Proof:''' An empty set or singleton is trivially sorted.
'''Proof:''' A singleton is trivially sorted.


== Induction Step ==
== Induction Step ==
'''Abstract view:''' The sequence is divided into two subsequences of approximately half size, it does not matter at all in which way this is done. Both subsequences are sorted recursively using Mergesort. The sorted subsequences are "merged" to one using algorithm [[Merge]].
'''Abstract view:''' The sequence is divided into two subsequences of approximately half size, it does not matter at all in which way this is done. Both subsequences are sorted recursively using Mergesort. The sorted subsequences are "merged" into one using algorithm [[Merge]].


'''Implementation:''' Obvious.
'''Implementation:''' Obvious.


'''Correctness:''' By induction hypothesis, the recursive calls sort correctly. Algorithm [[Merge]] unites two sorted sequences into one sorted sequence.
'''Correctness:''' By induction hypothesis, the recursive calls sort correctly. So, correctness of [[Merge]] implies correctness of Mergesort.


== Complexity ==
== Complexity ==
'''Statement:''' The complexity is in <math>O(n \log n)</math> in the best and worst case.
'''Statement:''' The complexity is in <math>O(T\cdot n \log n)</math> in the best and worst case, where <math>T</math> is the complexity of the comparison.
 
'''Proof:''' Obviously, the variant is correct. So, the lengths of <math>S_1'</math> and <math>S_2'</math> are at most <math>\lceil1/2\rceil</math> of the length of <math>S'</math>. Consequently, the lengths of <math>S_1'</math> and <math>S_2'</math> are at least <math>\lfloor1/2\rfloor</math> of the length of <math>S'</math>. In summary, the overall recursion depth is in <math>\Theta(\log n)</math> in the best and worst case. Next consider the run time of a single recursive call, which receives some <math>S'</math> as input and calls Mergesort recursively with two subsequences <math>S_1'</math> and <math>S_2'</math>. The run time of this recursive call (excluding the run times of the recursive calls with <math>S_1'</math> and <math>S_2'</math>) is linear in the length of <math>S'</math>. Since all recursive calls on the same recursion level operate on pairwise disjoint subsequences, the total run time of all calls on the same recursive level is linear in the length of the original sequence.
 
==Example implementations==
===Java===
<syntaxhighlight lang="java">
 
 
public static <T> void mergesort(List<T> liste, Comparator<T> cmp) {
if (liste.size() <= 1)
return;
LinkedList<T> teilliste1 = new LinkedList<T>(); // leer
LinkedList<T> teilliste2 = new LinkedList<T>();
zerlegeInTeillisten(liste, teilliste1, teilliste2);
mergesort(teilliste1, cmp);
mergesort(teilliste2, cmp);
liste.clear();
merge(teilliste1, teilliste2, liste, cmp);
}


'''Proof:''' Obviously, the variant is correct. So, the lengths of <math>S_1'</math> and <math>S_2'</math> are at most <math>\lceil1/2\rceil</math> of the length of <math>S'</math>. Consequently, the lengths of <math>S_1'</math> and <math>S_2'</math> are at least <math>\lfloor1/2\rfloor</math> of the length of <math>S'</math>. In summary, the overall recursion depth is in <math>\Theta(\log n)</math> in the best and worst case. Next consider the run time of a single recursive call, which receives some <math>S'</math> as input and calls Mergesort recursively with two subsequences <math>S_1'</math> and <math>S_2'</math>. The run time of this recursive call (disregarding the run times of the recursive calls with <math>S_1'</math> and <math>S_2'</math>) is linear in the length of <math>S'</math>. Since all recursive calls on the same recursion level operate on pairwise disjoint subsequences, the total run time of all calls on the same recursive level is linear in the length of the original sequence.
// -----------------
public static <T> void zerlegeInTeillisten(List<T> liste,
List<T> teilliste1, List<T> teilliste2) {
ListIterator<T> it = liste.listIterator();
for (int i = 0; i < liste.size(); i++) {
T elem = it.next();
if (i <= liste.size() / 2)
teilliste1.add(elem); // Haengt elem hinten an teilliste1 an
else
teilliste2.add(elem); // Dito teilliste2
}
}
</syntaxhighlight>

Latest revision as of 13:36, 3 March 2017

Chapters
  1. [00:00] Mergesort
  2. [02:36] Fragen
  3. [02:44] Wie funktioniert der Algorithmus?
  4. [03:04] Was ist die asymptotische Komplexität des Algorithmus?
  5. [03:23] Was macht Merge?
  6. [03:34] Wie lautet die Invariante?
  7. [03:58] Warum ist der Algorithmus korrekt?
  8. [04:10] Wie wird die Invariante sichergestellt?
  9. [04:31] Was ist die asymptotische Komplexität des Algorithmus?

General Information

Algorithmic problem: Sorting based on pairwise comparison

Type of algorithm: recursion

Abstract View

Invariant: After a recursive call, the input sequence of this recursive call is sorted.

Variant: For a recursive call on a subsequence [math]\displaystyle{ S' }[/math] of [math]\displaystyle{ S }[/math], let [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math] denote the subsequences of [math]\displaystyle{ S' }[/math] with which Mergesort is called recursively from that call. Then it is [math]\displaystyle{ |S_1'| \leq \lceil|S'| /2\rceil }[/math] and [math]\displaystyle{ |S_2'| \leq \lceil|S'| /2\rceil }[/math].

Break condition: The current subsequence of the recursive call is a singleton.

Remark: For a particular recursive call [math]\displaystyle{ C }[/math], we may, for example, choose the height of the recursion subtree with root [math]\displaystyle{ C }[/math] as the induction parameter. For conciseness, the induction parameter is omitted in the following.

Induction Basis

Abstract view: Nothing to do on a singleton.

Implementation: Ditto.

Proof: A singleton is trivially sorted.

Induction Step

Abstract view: The sequence is divided into two subsequences of approximately half size, it does not matter at all in which way this is done. Both subsequences are sorted recursively using Mergesort. The sorted subsequences are "merged" into one using algorithm Merge.

Implementation: Obvious.

Correctness: By induction hypothesis, the recursive calls sort correctly. So, correctness of Merge implies correctness of Mergesort.

Complexity

Statement: The complexity is in [math]\displaystyle{ O(T\cdot n \log n) }[/math] in the best and worst case, where [math]\displaystyle{ T }[/math] is the complexity of the comparison.

Proof: Obviously, the variant is correct. So, the lengths of [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math] are at most [math]\displaystyle{ \lceil1/2\rceil }[/math] of the length of [math]\displaystyle{ S' }[/math]. Consequently, the lengths of [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math] are at least [math]\displaystyle{ \lfloor1/2\rfloor }[/math] of the length of [math]\displaystyle{ S' }[/math]. In summary, the overall recursion depth is in [math]\displaystyle{ \Theta(\log n) }[/math] in the best and worst case. Next consider the run time of a single recursive call, which receives some [math]\displaystyle{ S' }[/math] as input and calls Mergesort recursively with two subsequences [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math]. The run time of this recursive call (excluding the run times of the recursive calls with [math]\displaystyle{ S_1' }[/math] and [math]\displaystyle{ S_2' }[/math]) is linear in the length of [math]\displaystyle{ S' }[/math]. Since all recursive calls on the same recursion level operate on pairwise disjoint subsequences, the total run time of all calls on the same recursive level is linear in the length of the original sequence.

Example implementations

Java

	public static <T> void mergesort(List<T> liste, Comparator<T> cmp) {
		if (liste.size() <= 1)
			return;
		LinkedList<T> teilliste1 = new LinkedList<T>(); // leer
		LinkedList<T> teilliste2 = new LinkedList<T>();
		zerlegeInTeillisten(liste, teilliste1, teilliste2);
		mergesort(teilliste1, cmp);
		mergesort(teilliste2, cmp);
		liste.clear();
		merge(teilliste1, teilliste2, liste, cmp);
	}

	// -----------------
	public static <T> void zerlegeInTeillisten(List<T> liste,
			List<T> teilliste1, List<T> teilliste2) {
		ListIterator<T> it = liste.listIterator();
		for (int i = 0; i < liste.size(); i++) {
			T elem = it.next();
			if (i <= liste.size() / 2)
				teilliste1.add(elem); // Haengt elem hinten an teilliste1 an
			else
				teilliste2.add(elem); // Dito teilliste2
		}
	}