B-tree: insert

From Algowiki
Jump to navigation Jump to search


General Information

Algorithmic problem: Sorted sequence: insert

Type of algorithm: loop

Auxiliary data: Pointers p and p' of type "pointer to a B-tree node".

Abstract View

Invariant: Before and after each iteration:

  1. p points to some node N of the B-tree such that [math]\displaystyle{ n \lt 2M -1 }[/math].
  2. The key to be inserted is in the range of N.

Variant: p is redirected from the current node N to some child of N.

Break condition: p points to a leaf.

Induction Basis

Abstract view:

  1. If the tree is empty, create a new root with K the only key and terminate the algorithm.
  2. Otherwise, p points to the root.
  3. If the root is full:
    1. A new root is created, and the old root is dhe unique child of the new root.
    2. The old root is split into two siblings.
    3. p points to the new root.

Implementation:

  1. If the tree is empty:
    1. Create a new, empty node and let [math]\displaystyle{ p }[/math] point to it.
    2. Set [math]\displaystyle{ p.keys[1] := K }[/math].
    3. Terminate the algorithm.
  2. Let [math]\displaystyle{ p }[/math] point to the root of the tree.
  3. If [math]\displaystyle{ p.n = 2M = 1 }[/math]:
    1. Create a new, empty node and let [math]\displaystyle{ p' }[/math] point to it.
    2. Cat [math]\displaystyle{ p'.children[0] := p }[/math].
    3. Call B-tree: split node into two siblings with pointer [math]\displaystyle{ p' }[/math] and index [math]\displaystyle{ 0 }[/math].
    4. Set [math]\displaystyle{ p := p' }[/math]

Proof:

Basically, we have to verify that the implementation invariants of the B-tree data structure and the above-mentioned loop invariants of the insert procedure are fulfilled after the preprocessing.

If the tree is empty or, otherwise, the root is not full, all invariants are trivially fulfilled by Step 1 and Step 2, respectively. So consider the case that the tree is not empty and the root is full, that is, Step 3 is executed.

Implementation invariants #1, #2, and #8 are trivially. Maintenance of the implementation invariants #3-#7 is guaranteed by the split procedure. Finally, the loop invariants result from Step 3.4.

Induction Step

Abstract view:

  1. If the current node N is a leaf, insert the new key in N and terminate the algorithm.
  2. Otherwise, let N' be the child of N such that the key to be inserted is in the range of that child (ties arbitrarily broken).
  3. If N' is full, N' is split into two siblings.
  4. The new current node is the child of N such that the key to be inserted is in the range of that child (one of these two siblings, in fact; ties arbitrarily broken).

Implementation:

  1. If [math]\displaystyle{ p.children[0] = void }[/math] (that is, [math]\displaystyle{ N }[/math] is a leaf):
    1. If [math]\displaystyle{ K \geq p.keys[p.n] }[/math], insert the key at position [math]\displaystyle{ p.n + 1 }[/math]; otherwise:
      1. Let [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] be the minimal position such that [math]\displaystyle{ K \lt p.keys[k] }[/math].
      2. For [math]\displaystyle{ j = p.n,\dots,k }[/math] (in that order), set [math]\displaystyle{ p.keys[j+1] := p.keys[k] }[/math].
      3. Insert the new key at position [math]\displaystyle{ k }[/math]
    2. Set [math]\displaystyle{ p.n := p.n + 1 }[/math].
    3. Terminate the algorithm.
  2. If [math]\displaystyle{ K \lt p.keys[1] }[/math] set [math]\displaystyle{ k := 0 }[/math]; otherwise, let [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] be maximal such that [math]\displaystyle{ K \geq p.keys[k] }[/math].
  3. If [math]\displaystyle{ p.children[k].n = 2M - 1 }[/math]: call B-tree: split node into two siblings with pointer [math]\displaystyle{ p }[/math] and index [math]\displaystyle{ k }[/math].
  4. If [math]\displaystyle{ K \lt p.keys[1] }[/math] set [math]\displaystyle{ p := p.children[0] }[/math]; otherwise, set [math]\displaystyle{ p := p.children[k] }[/math] where [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] is maximal such that [math]\displaystyle{ K \geq p.keys[k] }[/math].

Complexity

Statement: The asymptotic complexity is in [math]\displaystyle{ \Theta(\log n) }[/math] in the worst case.

Proof: Follows immediately from the facts that

  1. [math]\displaystyle{ M }[/math] is assumed to be fixed, and
  2. the height of a B-tree with [math]\displaystyle{ n }[/math] nodes is in [math]\displaystyle{ \Theta(\log n) }[/math] (cf. the remark clause of the B-tree page).

Pseudocode

B-TREE-INSERT(T,k)

B-TREE-INSERT(T,k)
 1 r = T.root
 2 if r.n == 2t - 1
 3      s = ALLOCATE-NODE()
 4      T.root = s
 5      s.leaf = FALSE
 6      s.n = 0
 7      s.c1 = r
 8      B-TREE-SPLIT-CHILD(s, 1)
 9      B-TREE-INSERT-NONFULL(s, k)
10 else B-TREE-INSERT-NONFULL(r, k)   

B-TREE-INSERT-NONFULL(x,k)

B-TREE-INSERT-NONFULL(x,k)
 1 i = x.n
 2 if x.leaf
 3      while i ≥ 1 and k < x.keyi
 4              x.keyi+1= x.keyi
 5              i = i - 1
 6      x.keyi+111 = k
 7      x.n = x.n + 1
 8      DISK-WRITE(x)
 9 else while i ≥ 1 and k < x.keyi
10             i = i - 1
11        i = i + 1
12        DISK-READ(x.ci)
13        if x.ci.n == 2t - 1
14               B-TREE-SPLIT-CHILD(x, i)
15               if k > x.keyi
16                       i = i + 1
17         B-TREE-INSERT-NONFULL(x.ci, k)   

Further Information

In the above specification, each full node is split into two when visited. This is done for precautionary reasons only. With a slight modification, this can be avoided: when the leaf is reached, go back along the traversed path and split full nodes until the first non-full node is reached. Evidently, this reduces the number of splits. However, chances are high that these nodes have to be split sooner or later, so the true benefit is not clear. The version presented here has primarily been selected because its loop invariant is simpler and more intuitive.