B-tree: insert
General Information
Algorithmic problem: Sorted sequence: insert
Type of algorithm: loop
Auxiliary data: Pointers p and p' of type "pointer to a B-tree node".
Abstract View
Invariant: Before and after each iteration:
- p points to some node N of the B-tree such that [math]\displaystyle{ n \lt 2M -1 }[/math].
- The key to be inserted is in the range of N.
Variant: p is redirected from the current node N to some child of N.
Break condition: p points to a leaf.
Induction Basis
Induction Step
Complexity
Statement: The asymptotic complexity is in [math]\displaystyle{ \Theta(\log n) }[/math] in the worst case.
Proof: Follows immediately from the facts that
- [math]\displaystyle{ M }[/math] is assumed to be fixed, and
- the height of a B-tree with [math]\displaystyle{ n }[/math] nodes is in [math]\displaystyle{ \Theta(\log n) }[/math] (cf. the remark clause of the B-tree page).
Pseudocode
B-TREE-INSERT(T,k)
B-TREE-INSERT(T,k)
1 r = T.root
2 if r.n == 2t - 1
3 s = ALLOCATE-NODE()
4 T.root = s
5 s.leaf = FALSE
6 s.n = 0
7 s.c1 = r
8 B-TREE-SPLIT-CHILD(s, 1)
9 B-TREE-INSERT-NONFULL(s, k)
10 else B-TREE-INSERT-NONFULL(r, k)
B-TREE-INSERT-NONFULL(x,k)
B-TREE-INSERT-NONFULL(x,k)
1 i = x.n
2 if x.leaf
3 while i ≥ 1 and k < x.keyi
4 x.keyi+1= x.keyi
5 i = i - 1
6 x.keyi+111 = k
7 x.n = x.n + 1
8 DISK-WRITE(x)
9 else while i ≥ 1 and k < x.keyi
10 i = i - 1
11 i = i + 1
12 DISK-READ(x.ci)
13 if x.ci.n == 2t - 1
14 B-TREE-SPLIT-CHILD(x, i)
15 if k > x.keyi
16 i = i + 1
17 B-TREE-INSERT-NONFULL(x.ci, k)
Further Information
In the above specification, each full node is split into two when visited. This is done for precautionary reasons only. With a slight modification, this can be avoided: when the leaf is reached, go back along the traversed path and split full nodes until the first non-full node is reached. Evidently, this reduces the number of splits. However, chances are high that these nodes have to be split sooner or later, so the true benefit is not clear. The version presented here has primarily been selected because its loop invariant is simpler and more intuitive.