B-tree: insert: Difference between revisions

From Algowiki
Jump to navigation Jump to search
 
(19 intermediate revisions by 3 users not shown)
Line 1: Line 1:
[[Category:Videos]]
{{#ev:youtube|https://www.youtube.com/watch?v=vbRZ8h6ROYc|500|right||frame}}
[[Category: B-Tree]]
[[Category: B-Tree]]
[[Category: Algorithm]]
[[Category: Algorithm]]
[[Category: Checkup]]


== General Information ==
== General Information ==
Line 7: Line 11:
'''Type of algorithm:''' loop
'''Type of algorithm:''' loop


'''Auxiliary data:''' Pointers '''''p''''' and '''''p'''''' of type "pointer to a B-tree node".
'''Auxiliary data:''' Pointers <math>p</math> and <math>p'</math> of type "pointer to a B-tree node of key type <math>\mathcal{K}</math>."


== Abstract View ==
== Abstract View ==
'''Invariant:''' Before and after each iteration:
'''Invariant:''' After <math>i\geq 0</math> iterations:


# '''''p''''' points to some node '''''N''''' of the B-tree such that <math>n < 2M -1</math>.
# Pointer <math>p</math> points to some node of the B-tree on height level <math>i</math> such that <math>n < 2M -1</math> for this node.
# The key to be inserted is in the [[Directed tree|range]] of '''''N'''''.
# The key to be inserted is in the [[Directed Tree#Ranges of Search Tree Nodes|range]] of that node


'''Variant:''' '''''p''''' is redirected from the current node '''''N''''' to some child of '''''N'''''.
'''Variant:''' <math>i</math> is increased by <math>1</math>.


Break condition: '''''p''''' points to a leaf.
Break condition: <math>p</math> points to a leaf.
 
'''Remark:''' For example, the height of the subtree rooted at the node to which <math>p</math> points may be chosen as the induction parameter. For conciseness, the induction parameter is omitted in the following.


== Induction Basis ==
== Induction Basis ==
'''Abstract view:'''
# If the tree is empty, create a new root with <math>K</math> the only key and terminate the algorithm.
# Otherwise, '''''p''''' points to the root.
# If the root is full:
## A new root is created, and the old root is dhe unique child of the new root.
## The old root is '''split''' into two siblings.
## '''''p''''' points to the new root.
'''Implementation:'''
# If the tree is empty:
## Create a new, empty node and let <math>p</math> point to it.
## Set <math>p.keys[1] := K</math>.
## Terminate the algorithm.
# Let <math>p</math> point to the root of the tree.
#If <math>p.n = 2M - 1</math>:
## Create a new, empty node and let <math>p'</math> point to it.
## Cat <math>p'</math>.children[0] <math>:= p</math>.
## Call [[B-Trees|B-tree: split node into two siblings]] with pointer <math>p'</math> and index <math>0</math>.
## Set <math> p := p'</math>
'''Proof:'''
Basically, we have to verify that the [[B-Trees|implementation invariants of the B-tree data structure]] and the above-mentioned loop invariants of the insert procedure are fulfilled after the preprocessing.
If the tree is empty or, otherwise, the root is not full, all invariants are trivially fulfilled by Step 1 and Step 2, respectively. So consider the case that the tree is not empty and the root is full, that is, Step 3 is executed.
Implementation invariants #1, #2, and #8 are trivially. Maintenance of the implementation invariants #3-#7 is guaranteed by the split procedure.
Finally, the loop invariants result from Step 3.4.


== Induction Step ==
== Induction Step ==


== Complexity ==
=== Abstract view: ===
 
# If the current node '''''N''''' is a leaf, insert the new key in '''''N''''' and terminate the algorithm.
# Otherwise, let '''''N'''''' be the child of '''''N''''' such that the key to be inserted is in the [[Directed Tree#Ranges of Search Tree Nodes|range]] of that child (ties arbitrarily broken).
# If '''''N'''''' is full, '''''N'''''' is '''split''' into two siblings.
# The new current node is the child of '''''N''''' such that the key to be inserted is in the [[Directed Tree#Ranges of Search Tree Nodes|range]] of that child (one of these two siblings, in fact; ties arbitrarily broken).
 
=== Implementation: ===
# If <math>p.children[0] = void</math> (that is, <math>N</math> is a leaf):
## If <math> K \geq p.keys[p.n]</math>, insert the key at position <math>p.n + 1</math>; otherwise:
### Let <math>k \in \{1,\dots,p.n\}</math> be the minimal position such that <math>K < p.keys[k]</math>.
### For <math>j = p.n,\dots,k</math> (in that order), set <math>p.keys[j+1] := p.keys[k]</math>.
### Insert the new key at position <math>k</math>
## Set <math>p.n := p.n + 1</math>.
## Terminate the algorithm.
# If <math>K < p.keys[1]</math> set <math>k := 0</math>; otherwise, let <math>k \in \{1,\dots,p.n\}</math> be maximal such that <math>K \geq p.keys[k]</math>.
# If <math>p.children[k].n = 2M - 1</math>: call [[B-Trees|B-tree: split node into two siblings]] with pointer <math>p</math> and index <math>k</math>.
# If <math>K < p.keys[1]</math> set <math>p := p.children[0]</math>; otherwise, set <math>p := p.children[k]</math> where <math>k \in \{1,\dots,p.n\}</math> is maximal such that <math>K \geq p.keys[k]</math>.
 
=== Correctness: ===
Analogously to the induction basis, we have to verify that the [[B-Trees|implementation invariants of the B-tree data structure]] and the above-mentioned loop invariants of the insert procedure are maintained through an iteration of the main loop.
 
Again, the implementation invariants #1, #2 and #8 are trivially maintained. If the node is not full, the other implementation invariants and the above-mentioned loop invariants of the insert procedure are maintained as well. So consider the case that the node is full.
 
Analogously to the induction basis, the implementation invariants #3-#7 are the guaranteed by the split operation.


Finally, the loop invariants of the insert procedure result from Step 4


== Pseudocode ==
== Pseudocode ==
Line 66: Line 125:
  17        B-TREE-INSERT-NONFULL(''x.c<sub>i</sub>, k)   
  17        B-TREE-INSERT-NONFULL(''x.c<sub>i</sub>, k)   
</code>
</code>
== Complexity ==
'''Statement:''' The asymptotic complexity is in <math>\Theta(T\cdot\log n)</math> in the worst case, where <math>T</math> is the complexity of the comparison.
'''Proof:''' Follows immediately from the facts that
# <math>M</math> is assumed to be fixed, and
# the height of a B-tree with <math>n</math> nodes is in <math>\Theta(\log n)</math> (cf. the remark clause of the [[B-Trees|B-tree]] page).


== Further Information ==
== Further Information ==
In the above specification, each full node is split into two when visited. This is done for precautionary reasons only. With a slight modification, this can be avoided: when the leaf is reached, go back along the traversed path and split full nodes until the first non-full node is reached. Evidently, this reduces the number of splits. However, chances are high that these nodes have to be split sooner or later, so the true benefit is not clear. The version presented here has primarily been selected because its loop invariant is simpler and more intuitive.
In the above specification, each full node is split into two when visited. This is done for precautionary reasons only. With a slight modification, this can be avoided: when the leaf is reached, go back along the traversed path and split full nodes until the first non-full node is reached. Evidently, this reduces the number of splits. However, chances are high that these nodes have to be split sooner or later, so the true benefit is not clear. The version presented here has primarily been selected because its loop invariant is simpler and more intuitive.

Latest revision as of 13:47, 3 March 2017

General Information

Algorithmic problem: Sorted sequence: insert

Type of algorithm: loop

Auxiliary data: Pointers [math]\displaystyle{ p }[/math] and [math]\displaystyle{ p' }[/math] of type "pointer to a B-tree node of key type [math]\displaystyle{ \mathcal{K} }[/math]."

Abstract View

Invariant: After [math]\displaystyle{ i\geq 0 }[/math] iterations:

  1. Pointer [math]\displaystyle{ p }[/math] points to some node of the B-tree on height level [math]\displaystyle{ i }[/math] such that [math]\displaystyle{ n \lt 2M -1 }[/math] for this node.
  2. The key to be inserted is in the range of that node

Variant: [math]\displaystyle{ i }[/math] is increased by [math]\displaystyle{ 1 }[/math].

Break condition: [math]\displaystyle{ p }[/math] points to a leaf.

Remark: For example, the height of the subtree rooted at the node to which [math]\displaystyle{ p }[/math] points may be chosen as the induction parameter. For conciseness, the induction parameter is omitted in the following.

Induction Basis

Abstract view:

  1. If the tree is empty, create a new root with [math]\displaystyle{ K }[/math] the only key and terminate the algorithm.
  2. Otherwise, p points to the root.
  3. If the root is full:
    1. A new root is created, and the old root is dhe unique child of the new root.
    2. The old root is split into two siblings.
    3. p points to the new root.

Implementation:

  1. If the tree is empty:
    1. Create a new, empty node and let [math]\displaystyle{ p }[/math] point to it.
    2. Set [math]\displaystyle{ p.keys[1] := K }[/math].
    3. Terminate the algorithm.
  2. Let [math]\displaystyle{ p }[/math] point to the root of the tree.
  3. If [math]\displaystyle{ p.n = 2M - 1 }[/math]:
    1. Create a new, empty node and let [math]\displaystyle{ p' }[/math] point to it.
    2. Cat [math]\displaystyle{ p' }[/math].children[0] [math]\displaystyle{ := p }[/math].
    3. Call B-tree: split node into two siblings with pointer [math]\displaystyle{ p' }[/math] and index [math]\displaystyle{ 0 }[/math].
    4. Set [math]\displaystyle{ p := p' }[/math]

Proof: Basically, we have to verify that the implementation invariants of the B-tree data structure and the above-mentioned loop invariants of the insert procedure are fulfilled after the preprocessing.

If the tree is empty or, otherwise, the root is not full, all invariants are trivially fulfilled by Step 1 and Step 2, respectively. So consider the case that the tree is not empty and the root is full, that is, Step 3 is executed.

Implementation invariants #1, #2, and #8 are trivially. Maintenance of the implementation invariants #3-#7 is guaranteed by the split procedure. Finally, the loop invariants result from Step 3.4.

Induction Step

Abstract view:

  1. If the current node N is a leaf, insert the new key in N and terminate the algorithm.
  2. Otherwise, let N' be the child of N such that the key to be inserted is in the range of that child (ties arbitrarily broken).
  3. If N' is full, N' is split into two siblings.
  4. The new current node is the child of N such that the key to be inserted is in the range of that child (one of these two siblings, in fact; ties arbitrarily broken).

Implementation:

  1. If [math]\displaystyle{ p.children[0] = void }[/math] (that is, [math]\displaystyle{ N }[/math] is a leaf):
    1. If [math]\displaystyle{ K \geq p.keys[p.n] }[/math], insert the key at position [math]\displaystyle{ p.n + 1 }[/math]; otherwise:
      1. Let [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] be the minimal position such that [math]\displaystyle{ K \lt p.keys[k] }[/math].
      2. For [math]\displaystyle{ j = p.n,\dots,k }[/math] (in that order), set [math]\displaystyle{ p.keys[j+1] := p.keys[k] }[/math].
      3. Insert the new key at position [math]\displaystyle{ k }[/math]
    2. Set [math]\displaystyle{ p.n := p.n + 1 }[/math].
    3. Terminate the algorithm.
  2. If [math]\displaystyle{ K \lt p.keys[1] }[/math] set [math]\displaystyle{ k := 0 }[/math]; otherwise, let [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] be maximal such that [math]\displaystyle{ K \geq p.keys[k] }[/math].
  3. If [math]\displaystyle{ p.children[k].n = 2M - 1 }[/math]: call B-tree: split node into two siblings with pointer [math]\displaystyle{ p }[/math] and index [math]\displaystyle{ k }[/math].
  4. If [math]\displaystyle{ K \lt p.keys[1] }[/math] set [math]\displaystyle{ p := p.children[0] }[/math]; otherwise, set [math]\displaystyle{ p := p.children[k] }[/math] where [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] is maximal such that [math]\displaystyle{ K \geq p.keys[k] }[/math].

Correctness:

Analogously to the induction basis, we have to verify that the implementation invariants of the B-tree data structure and the above-mentioned loop invariants of the insert procedure are maintained through an iteration of the main loop.

Again, the implementation invariants #1, #2 and #8 are trivially maintained. If the node is not full, the other implementation invariants and the above-mentioned loop invariants of the insert procedure are maintained as well. So consider the case that the node is full.

Analogously to the induction basis, the implementation invariants #3-#7 are the guaranteed by the split operation.

Finally, the loop invariants of the insert procedure result from Step 4

Pseudocode

B-TREE-INSERT(T,k)

B-TREE-INSERT(T,k)
 1 r = T.root
 2 if r.n == 2t - 1
 3      s = ALLOCATE-NODE()
 4      T.root = s
 5      s.leaf = FALSE
 6      s.n = 0
 7      s.c1 = r
 8      B-TREE-SPLIT-CHILD(s, 1)
 9      B-TREE-INSERT-NONFULL(s, k)
10 else B-TREE-INSERT-NONFULL(r, k)   

B-TREE-INSERT-NONFULL(x,k)

B-TREE-INSERT-NONFULL(x,k)
 1 i = x.n
 2 if x.leaf
 3      while i ≥ 1 and k < x.keyi
 4              x.keyi+1= x.keyi
 5              i = i - 1
 6      x.keyi+111 = k
 7      x.n = x.n + 1
 8      DISK-WRITE(x)
 9 else while i ≥ 1 and k < x.keyi
10             i = i - 1
11        i = i + 1
12        DISK-READ(x.ci)
13        if x.ci.n == 2t - 1
14               B-TREE-SPLIT-CHILD(x, i)
15               if k > x.keyi
16                       i = i + 1
17         B-TREE-INSERT-NONFULL(x.ci, k)   

Complexity

Statement: The asymptotic complexity is in [math]\displaystyle{ \Theta(T\cdot\log n) }[/math] in the worst case, where [math]\displaystyle{ T }[/math] is the complexity of the comparison.

Proof: Follows immediately from the facts that

  1. [math]\displaystyle{ M }[/math] is assumed to be fixed, and
  2. the height of a B-tree with [math]\displaystyle{ n }[/math] nodes is in [math]\displaystyle{ \Theta(\log n) }[/math] (cf. the remark clause of the B-tree page).

Further Information

In the above specification, each full node is split into two when visited. This is done for precautionary reasons only. With a slight modification, this can be avoided: when the leaf is reached, go back along the traversed path and split full nodes until the first non-full node is reached. Evidently, this reduces the number of splits. However, chances are high that these nodes have to be split sooner or later, so the true benefit is not clear. The version presented here has primarily been selected because its loop invariant is simpler and more intuitive.