4
\$\begingroup\$

I will start by saying this implementation was based on the approach taken with B-Trees in the book:

Introduction to Algorithms, 3rd Edition - Cormen, 2011

The implementation stores all the b-tree nodes in a binary file, and constantly writes and reads from the said file in order to add new ones and to update the nodes information.

The file created is organized and treated like an array, the tree keeps track of the numbers of nodes it has and where each one is, so, when a new key/item is added a new position in the file/array is given/added.

To access directly to the node in the file fseek() is used to "jump" directly to the node location, witch gives the file "the array like properties".

Note: the print method prints the tree starting from the root and to do so it requieres a queue, that i will also provide for testing purposes.

This implementation lacks the delete method and other small but usefull methods, like get min/max item, etc... because before i tackle those i want to know if the current ones are good.

Methods Implementaded:

  • Crete B-Tree
  • Insertion
  • Search
  • Print tree

The code is highly commented, so will abstain from explainning how the algorithm of the functions works, but if requested, i will update the post.

From the tests i ran the implementation worked as expected, but i must say i didnt do an extensive testing.

I am sorry if the question got "too big", the essential code to be reviewed is the btree.c one

btree.c

#include "btree.h"
//#############################################################################
// HELPER METHODS
//#############################################################################
btNode disk_read(int disk, int order, FILE *fp){
 btNode read_node;
 // calculate the no of bytes a node has
 int size_of_btNode = (sizeof(int) * 3) + (sizeof(element) * order-1) + (sizeof(int) * order);
 int offset = size_of_btNode * disk; // calculate the position of the node in the file
 fseek(fp, offset, SEEK_SET); // set the file pointer there
 fread(&read_node.numKeys, sizeof(read_node.numKeys), 1, fp); // read the information from the file
 fread(&read_node.isLeaf, sizeof(read_node.isLeaf), 1, fp);
 fread(&read_node.pos_in_disk, sizeof(read_node.pos_in_disk), 1, fp);
 read_node.keys = malloc(sizeof(element) * order-1);
 fread(read_node.keys, sizeof(element), order-1, fp);
 read_node.kids = malloc(sizeof(int) * order);
 fread(read_node.kids, sizeof(int), order, fp);
 return read_node;
}
void disk_write(btNode node, int order, FILE *fp){
 // calculate the no of bytes a node has
 int size_of_btNode = (sizeof(int) * 3) + (sizeof(element) * order-1) + (sizeof(int) * order);
 int offset = size_of_btNode * node.pos_in_disk; // calculate the position of the node in the file
 fseek(fp, offset, SEEK_SET); // set the file pointer there
 fwrite(&node.numKeys, sizeof(node.numKeys), 1, fp); // write the information to the file
 fwrite(&node.isLeaf, sizeof(node.isLeaf), 1, fp);
 fwrite(&node.pos_in_disk, sizeof(node.pos_in_disk), 1, fp);
 fwrite(node.keys, sizeof(element), order-1, fp);
 fwrite(node.kids, sizeof(int), order, fp);
}
btNode new_node(int order, int is_leaf) {
 btNode n;
 n.numKeys = 0; // set no of keys to 0
 n.isLeaf = is_leaf;
 n.keys = malloc((order-1) * sizeof(element)); // allocate space for the array of keys
 for(int i=0; i < order-1; i++){ // initialize the keys in the array
 n.keys[i].key = -1;
 n.keys[i].data = -1;
 }
 n.kids = malloc((order) * sizeof(int)); // allocate space for the array of keys
 for(int i=0; i < order; i++){ // initialize the kids in the array
 n.kids[i] = -1;
 }
 return n;
}
void bt_split_child(btNode x, int pos, bTree *tree, FILE *fp, int split_root){
 btNode y = disk_read(x.kids[pos], tree->order, fp); // node to split (pos-th child)
 if(split_root == 1){ // special case when splitting the root of the tree
 tree->node_count++; // increment no of total nodes
 y.pos_in_disk = tree->node_count; // attribute a new location in the file
 }
 btNode z = new_node(tree->order, y.isLeaf); // new (pos+1)-th child
 tree->node_count++; // increment no of total nodes
 z.pos_in_disk = tree->node_count; // attribute a new location in the file
 int t = (tree->order / 2); // calculate minimum ramification degree
 if(tree->order % 2 == 0){
 t--;
 }
 z.numKeys = t; // no of keys the new node will receive
 if(tree->order % 2 != 0){
 t--;
 }
 for(int j = 0; j <= t && (j+t+1)<= y.numKeys-1; j++){ // move elements to new node
 z.keys[j] = y.keys[j+t+1];
 y.keys[j+t+1].key = -1; // erase the element from the previous node
 y.keys[j+t+1].data = -1;
 }
 if(y.isLeaf == 0){ // if y is not a leaf
 for(int j = 0; j <= t; j++){ // move children as well
 z.kids[j] = y.kids[j+t+1];
 y.kids[j+t+1] = -1; // erase the element from the previous node
 }
 }
 y.numKeys = t; // update the no of keys the node has after split
 if(split_root == 1){ // special case when splitting the root of the tree
 x.kids[pos] = y.pos_in_disk;
 x.kids[pos+1] = z.pos_in_disk;
 }else{
 int j, i, r;
 for(j = 0; j < tree->order;j++){ // make room for x`s new child
 if(x.kids[j] == y.pos_in_disk){
 for(i = j+1; i < tree->order;i+=2){
 if(i+1 < tree->order)
 x.kids[i+1] = x.kids[i];
 }
 r = j;
 }
 }
 x.kids[r+1] = z.pos_in_disk;
 }
 for(int j = pos; j < tree->order-2; j+=2){ // make room for the element
 x.keys[j+1] = x.keys[j]; // that will be promoted
 }
 x.keys[pos] = y.keys[t]; // promote element
 y.keys[t].key = -1; // erase the updated element from the previous node
 y.keys[t].data = -1;
 x.numKeys++; // increment the no of keys the root node has
 disk_write(x, tree->order, fp); // update the information in the file
 disk_write(y, tree->order, fp); // update the information in the file
 disk_write(z, tree->order, fp); // update the information in the file
}
btNode bt_insert_nonfull(btNode node, element key, bTree *tree, FILE *fp){
 int pos = node.numKeys;
 if(node.isLeaf == 1){ // if in a leaf insert the new element
 int i = pos-1;
 while(i >= 0 && key.key < node.keys[i].key){ // find the correct position
 node.keys[i+1] = node.keys[i];
 node.keys[i].key = -1;
 node.keys[i].data = -1;
 i--;
 }
 if(i+1 != pos){
 node.keys[i+1] = key;
 }else{
 node.keys[pos] = key;
 }
 node.numKeys++;
 disk_write(node, tree->order, fp);
 return node;
 }else{ // otherwise, descend to the appropriate child
 int n_pd = node.pos_in_disk;
 int i = pos-1;
 while (key.key < node.keys[i].key && i >= 0) { // get the correct child of the node
 i--;
 pos--;
 }
 btNode x = disk_read(node.kids[pos], tree->order, fp); // get the child node
 if(x.numKeys == tree->order-1){ // is this child full?
 bt_split_child(node, pos, tree, fp, 0); // split the child
 btNode x1 = disk_read(n_pd, tree->order, fp); // get the updated node
 if(key.key > x1.keys[pos].key) // adjust the position if needed
 pos++;
 }
 btNode x1 = disk_read(n_pd, tree->order, fp); // get the updated node
 btNode x2 = disk_read(x1.kids[pos], tree->order, fp); // get the child node
 bt_insert_nonfull(x2, key, tree, fp);
 }
}
//#############################################################################
// METHODS
//#############################################################################
bTree *btCreate(int order){
 bTree *tree; // creates the "header" of the B-Tree
 if((tree = malloc(sizeof(bTree))) == NULL) // allocate space for the new tree
 return NULL;
 btNode root = new_node(order, true); // creates the root of the new B-Tree
 root.pos_in_disk = 0; // give the root a position in the file
 tree->order = order; // give the tree it`s order
 tree->root = root; // give the tree it`s root
 tree->node_count = 0; // set the tree`s node count to 0
 return tree;
}
void btInsert(bTree *tree, element key, FILE *fp){
 if(tree->node_count > 0)
 tree->root = disk_read(0, tree->order, fp); // update the root of the tree
 btNode root = tree->root;
 if(root.numKeys == tree->order-1){ // if the root is full
 btNode s = new_node(tree->order, 0); // create a new root node
 s.kids[0] = root.pos_in_disk; // root becomes the first child
 bt_split_child(s, 0, tree, fp, 1); // split the root
 s = disk_read(0, tree->order, fp); // get the new root
 tree->root = s; // make it the new root after the split
 bt_insert_nonfull(s, key, tree, fp); // now insert the new element
 }else{
 tree->root = bt_insert_nonfull(root, key, tree, fp); // insert the new element in a non-full node
 }
}
int btSearch(btNode node, int order, element key, FILE *fp){
 int pos = 0;
 while(pos < node.numKeys && key.key > node.keys[pos].key){ // find the correct position
 pos++;
 }
 if(pos <= node.numKeys && key.key == node.keys[pos].key){ // is the item one of the key`s of this node?
 return node.pos_in_disk;
 }else if(node.isLeaf == 1){ // if a leaf was hit and no item was found
 return -1;
 }else{
 btNode x = disk_read(node.kids[pos], order, fp); // go deeper in the tree
 return btSearch(x, order, key, fp);
 }
}
void print_node_keys(btNode node, int order){
 printf("[");
 for(int i = 0; i < order-1; i++){
 if(node.keys[i].key != -1)
 printf("key: %d, ", node.keys[i].key);
 }
 printf("] ");
}
void btPrintTree(bTree *tree, queue *q,FILE *fp){
 btNode end = { .numKeys = -1}; // marker to know when a level of the tree ends
 insert(q, tree->root); // insert the root in the queue
 int item_count= 1; // real item/node counter
 while(!isEmpty(q)){
 btNode current = removeData(q); // remove the first item in the queue and return that node
 if(current.numKeys == -1){ // was a marker found?
 printf("\n");
 insert(q, end);
 if(item_count == 0) // to avoid and endless loop of markers
 break; // when the tree is already printed
 }else{
 item_count--;
 print_node_keys(current, tree->order);
 if(current.pos_in_disk == 0) // special case for the root
 insert(q, end);
 for(int i = 0; i < tree->order; i++){ // insert all the kids os the next node in the queue
 if(current.kids[i] != -1){
 btNode x = disk_read(current.kids[i], tree->order, fp); // get the kid
 insert(q, x);
 item_count++;
 }
 }
 }
 }
}

btree.h

#ifndef BTREE_H
# define BTREE_H
#include <stdio.h>
#include <malloc.h>
#include "queue.h"
//#############################################################################
// STRUCTS
//#############################################################################
typedef struct element{
 int key; // the key of the element
 int data; // that data that each element contains
}element;
typedef struct btNode{
 int numKeys; // no of keys the node has
 int isLeaf; // is this a leaf node? 1 = true, 0 = false
 int pos_in_disk; // position of the node in the file
 element *keys; // holds the keys of the node
 int *kids; // holds the children of the node
}btNode;
typedef struct bTree {
 int order; // order of the B-Tree
 btNode root; // root of the B-Tree
 int node_count; // total no of nodes the tree has
} bTree;
typedef struct queue queue;
//#############################################################################
// METHODS
//#############################################################################
// create a new empty tree
bTree *btCreate(int order);
// return nonzero if key is present in tree
int btSearch(btNode node, int order, element key, FILE *fp);
// insert a new element into a tree
void btInsert(bTree *tree, element key, FILE *fp);
// print all keys of the tree from the root
void btPrintTree(bTree *tree, queue *q,FILE *fp);
// read and returns a node from the file
btNode disk_read(int disk, int order, FILE *fp);
#endif

queue.c

#include "queue.h"
queue *createQueue(int size) {
 queue *q;
 if((q = malloc(sizeof(queue))) == NULL)
 return NULL;
 if((q->list = malloc(sizeof(btNode) * size)) == NULL)
 return NULL;
 q->size = size;
 q->front = 0;
 q->rear = -1;
 q->itemCount = 0;
 return q;
}
btNode peek(queue *q) {
 return q->list[q->front];
}
bool isEmpty(queue *q) {
 return q->itemCount == 0;
}
bool isFull(queue *q) {
 return q->itemCount == q->size;
}
int size(queue *q) {
 return q->itemCount;
}
void insert(queue *q ,btNode data) {
 if(!isFull(q)) {
 if(q->rear == q->size-1) {
 q->rear = -1;
 }
 q->list[++q->rear] = data;
 q->itemCount++;
 }
}
btNode removeData(queue *q) {
 btNode data = q->list[q->front++];
 if(q->front == q->size) {
 q->front = 0;
 }
 q->itemCount--;
 return data;
}

queue.h

#ifndef QUEUE_H
# define QUEUE_H
#include "btree.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
//#############################################################################
// STRUCTS USED
//#############################################################################
typedef struct element element;
typedef struct btNode btNode;
typedef struct bTree bTree;
typedef struct queue{
 int size;
 int front;
 int rear;
 int itemCount;
 btNode *list;
}queue;
//#############################################################################
// METHODS
//#############################################################################
queue *createQueue(int size);
btNode peek(queue *q);
bool isEmpty(queue *q);
bool isFull(queue *q);
int size(queue *q);
void insert(queue *q ,btNode data);
btNode removeData(queue *q);
#endif

Small testing program:

test.c

#include "btree.h"
#include "queue.h"
int main(){
 element n1 = {.key = 20};
 element n2 = {.key = 30};
 element n3 = {.key = 10};
 element n4 = {.key = 40};
 element n5 = {.key = 15};
 element n6 = {.key = 17};
 element n7 = {.key = 18};
 element n8 = {.key = 50};
 element n9 = {.key = 60};
 element n10 = {.key = 70};
 FILE *fp;
 fp = fopen("file.bin", "wb+");
 bTree *tree = btCreate(4);
 btInsert(tree, n2, fp);
 btInsert(tree, n1, fp);
 btInsert(tree, n3, fp);
 btInsert(tree, n4, fp);
 btInsert(tree, n5, fp);
 btInsert(tree, n6, fp);
 btInsert(tree, n7, fp);
 btInsert(tree, n8, fp);
 btInsert(tree, n9, fp);
 btInsert(tree, n10, fp);
 queue *q = createQueue(15);
 btPrintTree(tree, q, fp);
 int pos = btSearch(tree->root, tree->order, n10, fp);
 if(pos != -1) {
 btNode x = disk_read(pos, tree->order, fp);
 printf("node has: %d keys", x.numKeys);
 }else
 printf("item doesnt exist!");
 return 0;
}
user1118321
11.9k1 gold badge20 silver badges46 bronze badges
asked Jul 4, 2018 at 23:16
\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

Reading and Writing from the disk

You have the same code in disk_read and disk_write to calculate size_of_btNode. This can be moved into a helper function to avoid the code duplication. In this calculation you use sizeof(int) * 3, while for the actual read or write you're using sizeof(read_node.numKeys) etc. You should use the sizeof the 3 variables you're reading rather than assuming they are always an int.

Similarly, the calculation of offset is repeated.

These two functions are very similar. It is possible to merge them into one function which would avoid the duplication of the I/O code.

There is no error checking or handling in the disk operations.

Node creation

new_node leaves the pos_in_disk field uninitialized. It also makes assumptions about the types it is allocating memory for. Using sizeof(*n.keys) rather than sizeof(element) would get rid of that. (Same for kids.)

Node insertion

In bt_insert_nonfull the condition

 if(i+1 != pos){
 node.keys[i+1] = key;
 }else{
 node.keys[pos] = key;
 }

can be reduced to one line, since the code in both branches does the same thing (since in the else branch, pos will equal i+1).

numKeys meaning

The way it is used in the code, sometimes it seems that numKeys is the number of keys, other times that it is the index of the last valid key (which is one less than the number of keys). Comparisons with tree->order-1 or pos<=node.numKeys point to the latter, while other code for creating new nodes or inserting looks like the former. These need to be checked and a consistent usage applied throughout.

Miscellaneous

btCreate and createQueue have the same issue with sizeof using a specific type as other uses of sizeof.

There are numerous btNode structures passed by value to functions. In some cases these can be replaced with passing by const btNode * to avoid the copy.

answered Jul 5, 2018 at 3:54
\$\endgroup\$
1
  • \$\begingroup\$ Thanks, i will take your notes and improve the code \$\endgroup\$ Commented Jul 5, 2018 at 12:18

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.