B-tree: insert
General Information
Algorithmic problem: Sorted sequence: insert
Type of algorithm: loop
Auxiliary data: Pointers [math]\displaystyle{ p }[/math] and [math]\displaystyle{ p' }[/math] of type "pointer to a B-tree node of key type [math]\displaystyle{ \mathcal{K} }[/math]."
Abstract View
Invariant: After [math]\displaystyle{ i\geq 0 }[/math] iterations:
- Pointer [math]\displaystyle{ p }[/math] points to some node of the B-tree on height level [math]\displaystyle{ i }[/math] such that [math]\displaystyle{ n \lt 2M -1 }[/math] for this node.
- The key to be inserted is in the range of that node
Variant: [math]\displaystyle{ i }[/math] is increased by [math]\displaystyle{ 1 }[/math].
Break condition: [math]\displaystyle{ p }[/math] points to a leaf.
Induction Basis
Abstract view:
- If the tree is empty, create a new root with K the only key and terminate the algorithm.
- Otherwise, p points to the root.
- If the root is full:
- A new root is created, and the old root is dhe unique child of the new root.
- The old root is split into two siblings.
- p points to the new root.
Implementation:
- If the tree is empty:
- Create a new, empty node and let [math]\displaystyle{ p }[/math] point to it.
- Set [math]\displaystyle{ p.keys[1] := K }[/math].
- Terminate the algorithm.
- Let [math]\displaystyle{ p }[/math] point to the root of the tree.
- If [math]\displaystyle{ p.n = 2M = 1 }[/math]:
- Create a new, empty node and let [math]\displaystyle{ p' }[/math] point to it.
- Cat [math]\displaystyle{ p'.children[0] := p }[/math].
- Call B-tree: split node into two siblings with pointer [math]\displaystyle{ p' }[/math] and index [math]\displaystyle{ 0 }[/math].
- Set [math]\displaystyle{ p := p' }[/math]
Proof:
Basically, we have to verify that the implementation invariants of the B-tree data structure and the above-mentioned loop invariants of the insert procedure are fulfilled after the preprocessing.
If the tree is empty or, otherwise, the root is not full, all invariants are trivially fulfilled by Step 1 and Step 2, respectively. So consider the case that the tree is not empty and the root is full, that is, Step 3 is executed.
Implementation invariants #1, #2, and #8 are trivially. Maintenance of the implementation invariants #3-#7 is guaranteed by the split procedure. Finally, the loop invariants result from Step 3.4.
Induction Step
Abstract view:
- If the current node N is a leaf, insert the new key in N and terminate the algorithm.
- Otherwise, let N' be the child of N such that the key to be inserted is in the range of that child (ties arbitrarily broken).
- If N' is full, N' is split into two siblings.
- The new current node is the child of N such that the key to be inserted is in the range of that child (one of these two siblings, in fact; ties arbitrarily broken).
Implementation:
- If [math]\displaystyle{ p.children[0] = void }[/math] (that is, [math]\displaystyle{ N }[/math] is a leaf):
- If [math]\displaystyle{ K \geq p.keys[p.n] }[/math], insert the key at position [math]\displaystyle{ p.n + 1 }[/math]; otherwise:
- Let [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] be the minimal position such that [math]\displaystyle{ K \lt p.keys[k] }[/math].
- For [math]\displaystyle{ j = p.n,\dots,k }[/math] (in that order), set [math]\displaystyle{ p.keys[j+1] := p.keys[k] }[/math].
- Insert the new key at position [math]\displaystyle{ k }[/math]
- Set [math]\displaystyle{ p.n := p.n + 1 }[/math].
- Terminate the algorithm.
- If [math]\displaystyle{ K \geq p.keys[p.n] }[/math], insert the key at position [math]\displaystyle{ p.n + 1 }[/math]; otherwise:
- If [math]\displaystyle{ K \lt p.keys[1] }[/math] set [math]\displaystyle{ k := 0 }[/math]; otherwise, let [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] be maximal such that [math]\displaystyle{ K \geq p.keys[k] }[/math].
- If [math]\displaystyle{ p.children[k].n = 2M - 1 }[/math]: call B-tree: split node into two siblings with pointer [math]\displaystyle{ p }[/math] and index [math]\displaystyle{ k }[/math].
- If [math]\displaystyle{ K \lt p.keys[1] }[/math] set [math]\displaystyle{ p := p.children[0] }[/math]; otherwise, set [math]\displaystyle{ p := p.children[k] }[/math] where [math]\displaystyle{ k \in \{1,\dots,p.n\} }[/math] is maximal such that [math]\displaystyle{ K \geq p.keys[k] }[/math].
Correctness:
Analogously to the induction basis, we have to verify that the implementation invariants of the B-tree data structure and the above-mentioned loop invariants of the insert procedure are maintained through an iteration of the main loop.
Again, the implementation invariants #1, #2 and #8 are trivially maintained. If the node is not full, the other implementation invariants and the above-mentioned loop invariants of the insert procedure are maintained as well. So consider the case that the node is full.
Analogously to the induction basis, the implementation invariants #3-#7 are the guaranteed by the split operation.
Finally, the loop invariants of the insert procedure result from Step 4
Pseudocode
B-TREE-INSERT(T,k)
B-TREE-INSERT(T,k)
1 r = T.root
2 if r.n == 2t - 1
3 s = ALLOCATE-NODE()
4 T.root = s
5 s.leaf = FALSE
6 s.n = 0
7 s.c1 = r
8 B-TREE-SPLIT-CHILD(s, 1)
9 B-TREE-INSERT-NONFULL(s, k)
10 else B-TREE-INSERT-NONFULL(r, k)
B-TREE-INSERT-NONFULL(x,k)
B-TREE-INSERT-NONFULL(x,k)
1 i = x.n
2 if x.leaf
3 while i ≥ 1 and k < x.keyi
4 x.keyi+1= x.keyi
5 i = i - 1
6 x.keyi+111 = k
7 x.n = x.n + 1
8 DISK-WRITE(x)
9 else while i ≥ 1 and k < x.keyi
10 i = i - 1
11 i = i + 1
12 DISK-READ(x.ci)
13 if x.ci.n == 2t - 1
14 B-TREE-SPLIT-CHILD(x, i)
15 if k > x.keyi
16 i = i + 1
17 B-TREE-INSERT-NONFULL(x.ci, k)
Complexity
Statement: The asymptotic complexity is in [math]\displaystyle{ \Theta(T\cdot\log n) }[/math] in the worst case, where [math]\displaystyle{ T }[/math] is the complexity of the comparison.
Proof: Follows immediately from the facts that
- [math]\displaystyle{ M }[/math] is assumed to be fixed, and
- the height of a B-tree with [math]\displaystyle{ n }[/math] nodes is in [math]\displaystyle{ \Theta(\log n) }[/math] (cf. the remark clause of the B-tree page).
Further Information
In the above specification, each full node is split into two when visited. This is done for precautionary reasons only. With a slight modification, this can be avoided: when the leaf is reached, go back along the traversed path and split full nodes until the first non-full node is reached. Evidently, this reduces the number of splits. However, chances are high that these nodes have to be split sooner or later, so the true benefit is not clear. The version presented here has primarily been selected because its loop invariant is simpler and more intuitive.