I will start by saying this implementation was based on the approach taken with B-Trees in the book:
Introduction to Algorithms, 3rd Edition - Cormen, 2011
The implementation stores all the b-tree nodes in a binary file, and constantly writes and reads from the said file in order to add new ones and to update the nodes information.
The file created is organized and treated like an array, the tree keeps track of the numbers of nodes it has and where each one is, so, when a new key/item is added a new position in the file/array is given/added.
To access directly to the node in the file fseek() is used to "jump" directly to the node location, witch gives the file "the array like properties".
Note: the print method prints the tree starting from the root and to do so it requieres a queue, that i will also provide for testing purposes.
This implementation lacks the delete method and other small but usefull methods, like get min/max item, etc... because before i tackle those i want to know if the current ones are good.
Methods Implementaded:
- Crete B-Tree
- Insertion
- Search
- Print tree
The code is highly commented, so will abstain from explainning how the algorithm of the functions works, but if requested, i will update the post.
From the tests i ran the implementation worked as expected, but i must say i didnt do an extensive testing.
I am sorry if the question got "too big", the essential code to be reviewed is the btree.c one
btree.c
#include "btree.h"
//#############################################################################
// HELPER METHODS
//#############################################################################
btNode disk_read(int disk, int order, FILE *fp){
btNode read_node;
// calculate the no of bytes a node has
int size_of_btNode = (sizeof(int) * 3) + (sizeof(element) * order-1) + (sizeof(int) * order);
int offset = size_of_btNode * disk; // calculate the position of the node in the file
fseek(fp, offset, SEEK_SET); // set the file pointer there
fread(&read_node.numKeys, sizeof(read_node.numKeys), 1, fp); // read the information from the file
fread(&read_node.isLeaf, sizeof(read_node.isLeaf), 1, fp);
fread(&read_node.pos_in_disk, sizeof(read_node.pos_in_disk), 1, fp);
read_node.keys = malloc(sizeof(element) * order-1);
fread(read_node.keys, sizeof(element), order-1, fp);
read_node.kids = malloc(sizeof(int) * order);
fread(read_node.kids, sizeof(int), order, fp);
return read_node;
}
void disk_write(btNode node, int order, FILE *fp){
// calculate the no of bytes a node has
int size_of_btNode = (sizeof(int) * 3) + (sizeof(element) * order-1) + (sizeof(int) * order);
int offset = size_of_btNode * node.pos_in_disk; // calculate the position of the node in the file
fseek(fp, offset, SEEK_SET); // set the file pointer there
fwrite(&node.numKeys, sizeof(node.numKeys), 1, fp); // write the information to the file
fwrite(&node.isLeaf, sizeof(node.isLeaf), 1, fp);
fwrite(&node.pos_in_disk, sizeof(node.pos_in_disk), 1, fp);
fwrite(node.keys, sizeof(element), order-1, fp);
fwrite(node.kids, sizeof(int), order, fp);
}
btNode new_node(int order, int is_leaf) {
btNode n;
n.numKeys = 0; // set no of keys to 0
n.isLeaf = is_leaf;
n.keys = malloc((order-1) * sizeof(element)); // allocate space for the array of keys
for(int i=0; i < order-1; i++){ // initialize the keys in the array
n.keys[i].key = -1;
n.keys[i].data = -1;
}
n.kids = malloc((order) * sizeof(int)); // allocate space for the array of keys
for(int i=0; i < order; i++){ // initialize the kids in the array
n.kids[i] = -1;
}
return n;
}
void bt_split_child(btNode x, int pos, bTree *tree, FILE *fp, int split_root){
btNode y = disk_read(x.kids[pos], tree->order, fp); // node to split (pos-th child)
if(split_root == 1){ // special case when splitting the root of the tree
tree->node_count++; // increment no of total nodes
y.pos_in_disk = tree->node_count; // attribute a new location in the file
}
btNode z = new_node(tree->order, y.isLeaf); // new (pos+1)-th child
tree->node_count++; // increment no of total nodes
z.pos_in_disk = tree->node_count; // attribute a new location in the file
int t = (tree->order / 2); // calculate minimum ramification degree
if(tree->order % 2 == 0){
t--;
}
z.numKeys = t; // no of keys the new node will receive
if(tree->order % 2 != 0){
t--;
}
for(int j = 0; j <= t && (j+t+1)<= y.numKeys-1; j++){ // move elements to new node
z.keys[j] = y.keys[j+t+1];
y.keys[j+t+1].key = -1; // erase the element from the previous node
y.keys[j+t+1].data = -1;
}
if(y.isLeaf == 0){ // if y is not a leaf
for(int j = 0; j <= t; j++){ // move children as well
z.kids[j] = y.kids[j+t+1];
y.kids[j+t+1] = -1; // erase the element from the previous node
}
}
y.numKeys = t; // update the no of keys the node has after split
if(split_root == 1){ // special case when splitting the root of the tree
x.kids[pos] = y.pos_in_disk;
x.kids[pos+1] = z.pos_in_disk;
}else{
int j, i, r;
for(j = 0; j < tree->order;j++){ // make room for x`s new child
if(x.kids[j] == y.pos_in_disk){
for(i = j+1; i < tree->order;i+=2){
if(i+1 < tree->order)
x.kids[i+1] = x.kids[i];
}
r = j;
}
}
x.kids[r+1] = z.pos_in_disk;
}
for(int j = pos; j < tree->order-2; j+=2){ // make room for the element
x.keys[j+1] = x.keys[j]; // that will be promoted
}
x.keys[pos] = y.keys[t]; // promote element
y.keys[t].key = -1; // erase the updated element from the previous node
y.keys[t].data = -1;
x.numKeys++; // increment the no of keys the root node has
disk_write(x, tree->order, fp); // update the information in the file
disk_write(y, tree->order, fp); // update the information in the file
disk_write(z, tree->order, fp); // update the information in the file
}
btNode bt_insert_nonfull(btNode node, element key, bTree *tree, FILE *fp){
int pos = node.numKeys;
if(node.isLeaf == 1){ // if in a leaf insert the new element
int i = pos-1;
while(i >= 0 && key.key < node.keys[i].key){ // find the correct position
node.keys[i+1] = node.keys[i];
node.keys[i].key = -1;
node.keys[i].data = -1;
i--;
}
if(i+1 != pos){
node.keys[i+1] = key;
}else{
node.keys[pos] = key;
}
node.numKeys++;
disk_write(node, tree->order, fp);
return node;
}else{ // otherwise, descend to the appropriate child
int n_pd = node.pos_in_disk;
int i = pos-1;
while (key.key < node.keys[i].key && i >= 0) { // get the correct child of the node
i--;
pos--;
}
btNode x = disk_read(node.kids[pos], tree->order, fp); // get the child node
if(x.numKeys == tree->order-1){ // is this child full?
bt_split_child(node, pos, tree, fp, 0); // split the child
btNode x1 = disk_read(n_pd, tree->order, fp); // get the updated node
if(key.key > x1.keys[pos].key) // adjust the position if needed
pos++;
}
btNode x1 = disk_read(n_pd, tree->order, fp); // get the updated node
btNode x2 = disk_read(x1.kids[pos], tree->order, fp); // get the child node
bt_insert_nonfull(x2, key, tree, fp);
}
}
//#############################################################################
// METHODS
//#############################################################################
bTree *btCreate(int order){
bTree *tree; // creates the "header" of the B-Tree
if((tree = malloc(sizeof(bTree))) == NULL) // allocate space for the new tree
return NULL;
btNode root = new_node(order, true); // creates the root of the new B-Tree
root.pos_in_disk = 0; // give the root a position in the file
tree->order = order; // give the tree it`s order
tree->root = root; // give the tree it`s root
tree->node_count = 0; // set the tree`s node count to 0
return tree;
}
void btInsert(bTree *tree, element key, FILE *fp){
if(tree->node_count > 0)
tree->root = disk_read(0, tree->order, fp); // update the root of the tree
btNode root = tree->root;
if(root.numKeys == tree->order-1){ // if the root is full
btNode s = new_node(tree->order, 0); // create a new root node
s.kids[0] = root.pos_in_disk; // root becomes the first child
bt_split_child(s, 0, tree, fp, 1); // split the root
s = disk_read(0, tree->order, fp); // get the new root
tree->root = s; // make it the new root after the split
bt_insert_nonfull(s, key, tree, fp); // now insert the new element
}else{
tree->root = bt_insert_nonfull(root, key, tree, fp); // insert the new element in a non-full node
}
}
int btSearch(btNode node, int order, element key, FILE *fp){
int pos = 0;
while(pos < node.numKeys && key.key > node.keys[pos].key){ // find the correct position
pos++;
}
if(pos <= node.numKeys && key.key == node.keys[pos].key){ // is the item one of the key`s of this node?
return node.pos_in_disk;
}else if(node.isLeaf == 1){ // if a leaf was hit and no item was found
return -1;
}else{
btNode x = disk_read(node.kids[pos], order, fp); // go deeper in the tree
return btSearch(x, order, key, fp);
}
}
void print_node_keys(btNode node, int order){
printf("[");
for(int i = 0; i < order-1; i++){
if(node.keys[i].key != -1)
printf("key: %d, ", node.keys[i].key);
}
printf("] ");
}
void btPrintTree(bTree *tree, queue *q,FILE *fp){
btNode end = { .numKeys = -1}; // marker to know when a level of the tree ends
insert(q, tree->root); // insert the root in the queue
int item_count= 1; // real item/node counter
while(!isEmpty(q)){
btNode current = removeData(q); // remove the first item in the queue and return that node
if(current.numKeys == -1){ // was a marker found?
printf("\n");
insert(q, end);
if(item_count == 0) // to avoid and endless loop of markers
break; // when the tree is already printed
}else{
item_count--;
print_node_keys(current, tree->order);
if(current.pos_in_disk == 0) // special case for the root
insert(q, end);
for(int i = 0; i < tree->order; i++){ // insert all the kids os the next node in the queue
if(current.kids[i] != -1){
btNode x = disk_read(current.kids[i], tree->order, fp); // get the kid
insert(q, x);
item_count++;
}
}
}
}
}
btree.h
#ifndef BTREE_H
# define BTREE_H
#include <stdio.h>
#include <malloc.h>
#include "queue.h"
//#############################################################################
// STRUCTS
//#############################################################################
typedef struct element{
int key; // the key of the element
int data; // that data that each element contains
}element;
typedef struct btNode{
int numKeys; // no of keys the node has
int isLeaf; // is this a leaf node? 1 = true, 0 = false
int pos_in_disk; // position of the node in the file
element *keys; // holds the keys of the node
int *kids; // holds the children of the node
}btNode;
typedef struct bTree {
int order; // order of the B-Tree
btNode root; // root of the B-Tree
int node_count; // total no of nodes the tree has
} bTree;
typedef struct queue queue;
//#############################################################################
// METHODS
//#############################################################################
// create a new empty tree
bTree *btCreate(int order);
// return nonzero if key is present in tree
int btSearch(btNode node, int order, element key, FILE *fp);
// insert a new element into a tree
void btInsert(bTree *tree, element key, FILE *fp);
// print all keys of the tree from the root
void btPrintTree(bTree *tree, queue *q,FILE *fp);
// read and returns a node from the file
btNode disk_read(int disk, int order, FILE *fp);
#endif
queue.c
#include "queue.h"
queue *createQueue(int size) {
queue *q;
if((q = malloc(sizeof(queue))) == NULL)
return NULL;
if((q->list = malloc(sizeof(btNode) * size)) == NULL)
return NULL;
q->size = size;
q->front = 0;
q->rear = -1;
q->itemCount = 0;
return q;
}
btNode peek(queue *q) {
return q->list[q->front];
}
bool isEmpty(queue *q) {
return q->itemCount == 0;
}
bool isFull(queue *q) {
return q->itemCount == q->size;
}
int size(queue *q) {
return q->itemCount;
}
void insert(queue *q ,btNode data) {
if(!isFull(q)) {
if(q->rear == q->size-1) {
q->rear = -1;
}
q->list[++q->rear] = data;
q->itemCount++;
}
}
btNode removeData(queue *q) {
btNode data = q->list[q->front++];
if(q->front == q->size) {
q->front = 0;
}
q->itemCount--;
return data;
}
queue.h
#ifndef QUEUE_H
# define QUEUE_H
#include "btree.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
//#############################################################################
// STRUCTS USED
//#############################################################################
typedef struct element element;
typedef struct btNode btNode;
typedef struct bTree bTree;
typedef struct queue{
int size;
int front;
int rear;
int itemCount;
btNode *list;
}queue;
//#############################################################################
// METHODS
//#############################################################################
queue *createQueue(int size);
btNode peek(queue *q);
bool isEmpty(queue *q);
bool isFull(queue *q);
int size(queue *q);
void insert(queue *q ,btNode data);
btNode removeData(queue *q);
#endif
Small testing program:
test.c
#include "btree.h"
#include "queue.h"
int main(){
element n1 = {.key = 20};
element n2 = {.key = 30};
element n3 = {.key = 10};
element n4 = {.key = 40};
element n5 = {.key = 15};
element n6 = {.key = 17};
element n7 = {.key = 18};
element n8 = {.key = 50};
element n9 = {.key = 60};
element n10 = {.key = 70};
FILE *fp;
fp = fopen("file.bin", "wb+");
bTree *tree = btCreate(4);
btInsert(tree, n2, fp);
btInsert(tree, n1, fp);
btInsert(tree, n3, fp);
btInsert(tree, n4, fp);
btInsert(tree, n5, fp);
btInsert(tree, n6, fp);
btInsert(tree, n7, fp);
btInsert(tree, n8, fp);
btInsert(tree, n9, fp);
btInsert(tree, n10, fp);
queue *q = createQueue(15);
btPrintTree(tree, q, fp);
int pos = btSearch(tree->root, tree->order, n10, fp);
if(pos != -1) {
btNode x = disk_read(pos, tree->order, fp);
printf("node has: %d keys", x.numKeys);
}else
printf("item doesnt exist!");
return 0;
}
1 Answer 1
Reading and Writing from the disk
You have the same code in disk_read
and disk_write
to calculate size_of_btNode
. This can be moved into a helper function to avoid the code duplication. In this calculation you use sizeof(int) * 3
, while for the actual read or write you're using sizeof(read_node.numKeys)
etc. You should use the sizeof the 3 variables you're reading rather than assuming they are always an int
.
Similarly, the calculation of offset
is repeated.
These two functions are very similar. It is possible to merge them into one function which would avoid the duplication of the I/O code.
There is no error checking or handling in the disk operations.
Node creation
new_node
leaves the pos_in_disk
field uninitialized. It also makes assumptions about the types it is allocating memory for. Using sizeof(*n.keys)
rather than sizeof(element)
would get rid of that. (Same for kids
.)
Node insertion
In bt_insert_nonfull
the condition
if(i+1 != pos){
node.keys[i+1] = key;
}else{
node.keys[pos] = key;
}
can be reduced to one line, since the code in both branches does the same thing (since in the else
branch, pos
will equal i+1
).
numKeys
meaning
The way it is used in the code, sometimes it seems that numKeys
is the number of keys, other times that it is the index of the last valid key (which is one less than the number of keys). Comparisons with tree->order-1
or pos<=node.numKeys
point to the latter, while other code for creating new nodes or inserting looks like the former. These need to be checked and a consistent usage applied throughout.
Miscellaneous
btCreate
and createQueue
have the same issue with sizeof
using a specific type as other uses of sizeof
.
There are numerous btNode
structures passed by value to functions. In some cases these can be replaced with passing by const btNode *
to avoid the copy.
-
\$\begingroup\$ Thanks, i will take your notes and improve the code \$\endgroup\$MiguelD– MiguelD2018年07月05日 12:18:23 +00:00Commented Jul 5, 2018 at 12:18