B-tree: insert

From Algowiki
Jump to: navigation, search

General Information

Algorithmic problem: Sorted sequence: insert

Type of algorithm: loop

Auxiliary data: Pointers [math]p[/math] and [math]p'[/math] of type "pointer to a B-tree node of key type [math]\mathcal{K}[/math]."

Abstract View

Invariant: After [math]i\geq 0[/math] iterations:

  1. Pointer [math]p[/math] points to some node of the B-tree on height level [math]i[/math] such that [math]n \lt 2M -1[/math] for this node.
  2. The key to be inserted is in the range of that node

Variant: [math]i[/math] is increased by [math]1[/math].

Break condition: [math]p[/math] points to a leaf.

Remark: For example, the height of the subtree rooted at the node to which [math]p[/math] points may be chosen as the induction parameter. For conciseness, the induction parameter is omitted in the following.

Induction Basis

Abstract view:

  1. If the tree is empty, create a new root with [math]K[/math] the only key and terminate the algorithm.
  2. Otherwise, p points to the root.
  3. If the root is full:
    1. A new root is created, and the old root is dhe unique child of the new root.
    2. The old root is split into two siblings.
    3. p points to the new root.

Implementation:

  1. If the tree is empty:
    1. Create a new, empty node and let [math]p[/math] point to it.
    2. Set [math]p.keys[1] := K[/math].
    3. Terminate the algorithm.
  2. Let [math]p[/math] point to the root of the tree.
  3. If [math]p.n = 2M - 1[/math]:
    1. Create a new, empty node and let [math]p'[/math] point to it.
    2. Cat [math]p'[/math].children[0] [math]:= p[/math].
    3. Call B-tree: split node into two siblings with pointer [math]p'[/math] and index [math]0[/math].
    4. Set [math] p := p'[/math]

Proof: Basically, we have to verify that the implementation invariants of the B-tree data structure and the above-mentioned loop invariants of the insert procedure are fulfilled after the preprocessing.

If the tree is empty or, otherwise, the root is not full, all invariants are trivially fulfilled by Step 1 and Step 2, respectively. So consider the case that the tree is not empty and the root is full, that is, Step 3 is executed.

Implementation invariants #1, #2, and #8 are trivially. Maintenance of the implementation invariants #3-#7 is guaranteed by the split procedure. Finally, the loop invariants result from Step 3.4.

Induction Step

Abstract view:

  1. If the current node N is a leaf, insert the new key in N and terminate the algorithm.
  2. Otherwise, let N' be the child of N such that the key to be inserted is in the range of that child (ties arbitrarily broken).
  3. If N' is full, N' is split into two siblings.
  4. The new current node is the child of N such that the key to be inserted is in the range of that child (one of these two siblings, in fact; ties arbitrarily broken).

Implementation:

  1. If [math]p.children[0] = void[/math] (that is, [math]N[/math] is a leaf):
    1. If [math] K \geq p.keys[p.n][/math], insert the key at position [math]p.n + 1[/math]; otherwise:
      1. Let [math]k \in \{1,\dots,p.n\}[/math] be the minimal position such that [math]K \lt p.keys[k][/math].
      2. For [math]j = p.n,\dots,k[/math] (in that order), set [math]p.keys[j+1] := p.keys[k][/math].
      3. Insert the new key at position [math]k[/math]
    2. Set [math]p.n := p.n + 1[/math].
    3. Terminate the algorithm.
  2. If [math]K \lt p.keys[1][/math] set [math]k := 0[/math]; otherwise, let [math]k \in \{1,\dots,p.n\}[/math] be maximal such that [math]K \geq p.keys[k][/math].
  3. If [math]p.children[k].n = 2M - 1[/math]: call B-tree: split node into two siblings with pointer [math]p[/math] and index [math]k[/math].
  4. If [math]K \lt p.keys[1][/math] set [math]p := p.children[0][/math]; otherwise, set [math]p := p.children[k][/math] where [math]k \in \{1,\dots,p.n\}[/math] is maximal such that [math]K \geq p.keys[k][/math].

Correctness:

Analogously to the induction basis, we have to verify that the implementation invariants of the B-tree data structure and the above-mentioned loop invariants of the insert procedure are maintained through an iteration of the main loop.

Again, the implementation invariants #1, #2 and #8 are trivially maintained. If the node is not full, the other implementation invariants and the above-mentioned loop invariants of the insert procedure are maintained as well. So consider the case that the node is full.

Analogously to the induction basis, the implementation invariants #3-#7 are the guaranteed by the split operation.

Finally, the loop invariants of the insert procedure result from Step 4

Pseudocode

B-TREE-INSERT(T,k)

B-TREE-INSERT(T,k)
 1 r = T.root
 2 if r.n == 2t - 1
 3      s = ALLOCATE-NODE()
 4      T.root = s
 5      s.leaf = FALSE
 6      s.n = 0
 7      s.c1 = r
 8      B-TREE-SPLIT-CHILD(s, 1)
 9      B-TREE-INSERT-NONFULL(s, k)
10 else B-TREE-INSERT-NONFULL(r, k)   

B-TREE-INSERT-NONFULL(x,k)

B-TREE-INSERT-NONFULL(x,k)
 1 i = x.n
 2 if x.leaf
 3      while i ≥ 1 and k < x.keyi
 4              x.keyi+1= x.keyi
 5              i = i - 1
 6      x.keyi+111 = k
 7      x.n = x.n + 1
 8      DISK-WRITE(x)
 9 else while i ≥ 1 and k < x.keyi
10             i = i - 1
11        i = i + 1
12        DISK-READ(x.ci)
13        if x.ci.n == 2t - 1
14               B-TREE-SPLIT-CHILD(x, i)
15               if k > x.keyi
16                       i = i + 1
17         B-TREE-INSERT-NONFULL(x.ci, k)   

Complexity

Statement: The asymptotic complexity is in [math]\Theta(T\cdot\log n)[/math] in the worst case, where [math]T[/math] is the complexity of the comparison.

Proof: Follows immediately from the facts that

  1. [math]M[/math] is assumed to be fixed, and
  2. the height of a B-tree with [math]n[/math] nodes is in [math]\Theta(\log n)[/math] (cf. the remark clause of the B-tree page).

Further Information

In the above specification, each full node is split into two when visited. This is done for precautionary reasons only. With a slight modification, this can be avoided: when the leaf is reached, go back along the traversed path and split full nodes until the first non-full node is reached. Evidently, this reduces the number of splits. However, chances are high that these nodes have to be split sooner or later, so the true benefit is not clear. The version presented here has primarily been selected because its loop invariant is simpler and more intuitive.