I wanted to store some dictionary data (key/value pairs of strings) in C, and decided to use a trie.
I found what looks like a good implementation here, along with some nice notes.
In this implementation, each node in the trie is a trie itself. Each node has next
, prev
, child
, and parent
properties, pointing to other nodes or NULL
.
I've modified the code as follows:
Node values are of type
void *
instead of typeint
.Keys are kept in alphabetical order as new nodes are added. This isn't much trouble, since tries to a lot of the work naturally. This should improve search time, since you can stop looking for a character once you find a character in the same position with a greater value.
This might slow down adding new keys, except that adding new keys involves searching for existing keys, so adding sorted keys could possibly be faster in some cases.
Added a "dump" function.
Formatted and refactored.
Public functions for manipulating tries
Trie_create()
creates a new trie and returns a pointer to it.Trie_dump()
dumps all keys and values in the trie tostdout
.Trie_get()
gets the value at a given key.Trie_put()
puts a value in the given key.Trie_remove()
removes the given key from the trie.
trie.h
typedef struct Trie_t Trie;
Trie *Trie_create();
const char *Trie_get(Trie *root, const char *key);
void Trie_put(Trie *root, const char *key, void *data);
void Trie_remove(Trie *root, const char *key);
void Trie_dump(Trie *t);
trie.c
#include <stdio.h>
#include "trie.h"
#include <stdlib.h>
struct Trie_t {
char key;
void *value;
Trie *next;
Trie *prev;
Trie *child;
Trie *parent;
Trie *marker;
};
Trie *Trie_createNode(const char key, void *data) {
Trie *t = NULL;
t = (Trie *)malloc(sizeof(Trie));
if (!t) {
printf("Malloc failed\n");
return t;
}
t->key = key;
t->value = data;
t->next = NULL;
t->child = NULL;
t->parent = NULL;
t->prev = NULL;
t->marker = NULL;
return t;
}
void Trie_unlink(Trie *t) {
if (t->next)
t->next->prev = t->prev;
if (t->prev)
t->prev->next = t->next;
else if (t->parent)
t->parent->child = t->next;
t->prev = 0;
t->next = 0;
t->parent = 0;
}
void Trie_insertAfter(Trie *oldTrie, Trie *newTrie) {
Trie_unlink(newTrie);
newTrie->parent = oldTrie->parent;
newTrie->next = oldTrie->next;
newTrie->prev = oldTrie;
if (newTrie->next)
newTrie->next->prev = newTrie;
oldTrie->next = newTrie;
}
void Trie_insertBefore(Trie *oldTrie, Trie *newTrie) {
Trie_insertAfter(oldTrie, newTrie);
Trie_insertAfter(newTrie, oldTrie);
}
Trie *Trie_search(Trie *root, const char *key) {
while (1) {
Trie *t;
int match = 0;
for (t = root; t && t->key <= *key; t = t->next) {
if (t->key == *key) {
match = 1;
break;
}
}
if (!match)
return NULL;
if (*key == '0円')
return t;
root = t->child;
key++;
}
}
void Trie_putPart(Trie *root, const char *key, void *data){
Trie *t;
for (t = root; *key; t = t->child) {
t->child = Trie_createNode(*key, NULL);
t->child->parent = t;
key++;
}
t->child = Trie_createNode('0円', data);
t->child->parent = t;
}
void Trie_put(Trie *root, const char *key, void *data) {
Trie *t = NULL;
Trie *current = NULL;
Trie *tmp = NULL;
if (!root)
return;
t = root->child;
current = Trie_search(t, key);
if (current) {
current->value = data;
return;
}
if (!t) {
Trie_putPart(root, key, data);
return;
}
while (*key != '0円') {
if (*key != t->key)
break;
key++;
t = t->child;
}
while (t->next && (t->key < *key)) {
if (*key == t->next->key) {
key++;
Trie_put(t->next, key, data);
return;
}
t = t->next;
}
tmp = Trie_createNode(*key, NULL);
if (t->key > tmp->key) {
Trie_insertBefore(t, tmp);
} else {
Trie_insertAfter(t, tmp);
}
key++;
Trie_putPart(tmp, key, data);
return;
}
void Trie_remove(Trie *root, const char *key) {
Trie *t = NULL;
if (!root || !key)
return;
for (t = Trie_search(root->child, key); t; t = t->parent) {
if (t->prev || t->next) {
Trie_unlink(t);
free(t);
return;
}
}
}
Trie *Trie_create() {
return Trie_createNode('0円', 0);
}
void *Trie_getRaw(Trie *root, const char *key) {
Trie *t = Trie_search(root->child, key);
return t ? t->value : 0;
}
const char *Trie_get(Trie *root, const char *key) {
return Trie_getRaw(root, key);
}
void Trie_dumpf(Trie *root, const char * keysep, const char * valsep) {
Trie *t = root;
Trie *tmp;
while (t) {
if (t->value) {
tmp = root;
while (tmp) {
printf("%c", tmp->key);
tmp = tmp->marker;
}
printf("%s%s%s", keysep, (char *)t->value, valsep);
while (t && !t->next)
t = t->parent;
if (t)
t = t->next;
if (t && t->parent)
t->parent->marker = t;
} else {
t = t->child;
t->parent->marker = t;
}
}
}
void Trie_dump(Trie *t) {
Trie_dumpf(t, ": ", "\n");
}
example.c
int main() {
Trie *t = Trie_create();
Trie_put(t, "name", "Testman");
Trie_put(t, "address", "123 main st");
Trie_put(t, "max-y", "200");
Trie_put(t, "max-z", "300");
Trie_put(t, "country", "USA");
Trie_put(t, "city", "Nowhere");
Trie_put(t, "max-x", "100");
Trie_remove(t, "max-y");
Trie_remove(t, "badkey");
Trie_dump(t);
printf("max xyz = %s,%s,%s\n",
Trie_get(t, "max-x"), Trie_get(t, "max-y"), Trie_get(t, "max-z"));
return 0;
}
output
address: 123 main st
city: Nowhere
country: USA
max-x: 100
max-z: 300
name: Testman
max xyz = 100,(null),300
Questions
Some things I'm wondering about:
In the original code,
TrieAdd
andTrieRemove
take a pointer to a pointer to a trie, instead of a pointer to a trie. I couldn't see any reason for doing this, so I changed it. Is this alright? What reason could the author have had for doing this?I hacked up
Trie_remove
pretty badly. I was confused about this code at first, but I think I understand it now. Stilll, if anyone wants to comment on this, I'm interested. Eventually I want to writeTrie_destroy()
to complementTrie_create()
.Any thoughts on the sorting approach and any potential cost/benefit in terms of performance? Should I consider doing a binary search to improve search time even further?
Tries want void pointers as data, but in practice I'm giving them strings. I did this to keep things flexible in case I want to store complex data later. Is this too weird?
Is there any redundancy to eliminate or obvious enhancements that should be made?
Any new bugs I created that weren't in the original code?
Any old bugs I overlooked that are still in my code?
Any comments on
Trie_dump
?
Original code
Varun Gupta's trie implementation in c, included here for reference.
/*trie.h*/
typedef int trieVal_t;
typedef struct trieNode {
char key;
trieVal_t value;
struct trieNode *next;
struct trieNode *prev;
struct trieNode *children;
struct trieNode *parent;
} trieNode_t;
trieNode_t* TrieSearch(trieNode_t *root, const char *key);
/*trie.c*/
#include <stdio.h>
#include "trie.h"
#include <stdlib.h>
#define DEBUG
trieNode_t *TrieCreateNode(char key, int data)
{
trieNode_t *node = NULL;
node = (trieNode_t *)malloc(sizeof(trieNode_t));
if(NULL == node)
{
printf("Malloc failed\n");
return node;
}
node->key = key;
node->next = NULL;
node->children = NULL;
node->value = data;
node->parent= NULL;
node->prev= NULL;
return node;
}
void TrieAdd(trieNode_t **root, char *key, int data)
{
trieNode_t *pTrav = NULL;
if(NULL == *root)
{
printf("NULL tree\n");
return;
}
#ifdef DEBUG
printf("\nInserting key %s: \n",key);
#endif
pTrav = (*root)->children;
if(TrieSearch(pTrav, key))
{
printf("Duplicate!\n");
return;
}
if(pTrav == NULL)
{
/*First Node*/
for(pTrav = *root; *key; pTrav = pTrav->children)
{
pTrav->children = TrieCreateNode(*key, 0xffffffff);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: %c\n",pTrav->children->key);
#endif
key++;
}
pTrav->children = TrieCreateNode('0円', data);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: %c\n",pTrav->children->key);
#endif
return;
}
while(*key != '0円')
{
if(*key == pTrav->key)
{
key++;
#ifdef DEBUG
printf("Traversing child: %c\n",pTrav->children->key);
#endif
pTrav = pTrav->children;
}
else
break;
}
while(pTrav->next)
{
if(*key == pTrav->next->key)
{
key++;
TrieAdd(&(pTrav->next), key, data);
return;
}
pTrav = pTrav->next;
}
pTrav->next = TrieCreateNode(*key, 0xffffffff);
pTrav->next->parent = pTrav->parent;
pTrav->next->prev = pTrav;
#ifdef DEBUG
printf("Inserting %c as neighbour of %c \n",pTrav->next->key, pTrav->key);
#endif
key++;
for(pTrav = pTrav->next; *key; pTrav = pTrav->children)
{
pTrav->children = TrieCreateNode(*key, 0xffffffff);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: %c\n",pTrav->children->key);
#endif
key++;
}
pTrav->children = TrieCreateNode('0円', data);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: %c\n",pTrav->children->key);
#endif
return;
}
trieNode_t* TrieSearch(trieNode_t *root, const char *key)
{
trieNode_t *level = root;
trieNode_t *pPtr = NULL;
int lvl=0;
while(1)
{
trieNode_t *found = NULL;
trieNode_t *curr;
for (curr = level; curr != NULL; curr = curr->next)
{
if (curr->key == *key)
{
found = curr;
lvl++;
break;
}
}
if (found == NULL)
return NULL;
if (*key == '0円')
{
pPtr = curr;
return pPtr;
}
level = found->children;
key++;
}
}
void TrieRemove(trieNode_t **root, char *key)
{
trieNode_t *tPtr = NULL;
trieNode_t *tmp = NULL;
if(NULL == *root || NULL == key)
return;
tPtr = TrieSearch((*root)->children, key);
if(NULL == tPtr)
{
printf("Key not found in trie\n");
return;
}
while(1)
{
if( tPtr->prev && tPtr->next)
{
tmp = tPtr;
tPtr->next->prev = tPtr->prev;
tPtr->prev->next = tPtr->next;
free(tmp);
break;
}
else if(tPtr->prev && !(tPtr->next))
{
tmp = tPtr;
tPtr->prev->next = NULL;
free(tmp);
break;
}
else if(!(tPtr->prev) && tPtr->next)
{
tmp = tPtr;
tPtr->parent->children = tPtr->next;
free(tmp);
break;
}
else
{
tmp = tPtr;
tPtr = tPtr->parent;
free(tmp);
}
}
}
1 Answer 1
If I look at the Wikipedia page on Trie, https://en.wikipedia.org/wiki/Trie I see that each node has one parent and an arbitrary number of children; if the key is a char the limit would be 255 children (or maybe less if you restrict yourself to printable chars). Yet your Trie
structure has a very different arrangement. As your structure doesn't match my naive expectation, and you have no comments indicating how the tree is actually arranged I didn't review the functionality of the code, but instead just picked as few nits:
- put standard headers before your own
- start functions with the
{
in the first column (some tools rely on this - or used to anyway) - make local functions static. And static functions don't need the
Trie_
prefix. - use const in function parameters and variables wherever possible
- the
key
parameter toTrie_createNode
need not beconst
(although it does no harm) - use of braces on single-line conditions etc is generally preferred (although noisy)
- use
perror
to print error messages (egperror("malloc");
) don't cast malloc. eg. this
Trie *t = NULL; t = (Trie *)malloc(sizeof(Trie));
should be:
Trie *t = malloc(sizeof(Trie)); /* or malloc(sizeof *t) */
nested loops are best avoided. For example in
Trie_search
, the for-loop could be extracted to a function (egfind_key
).the
while(1)
inTrie_search
would be better as a for-loopTrie *Trie_search(const Trie *root, const char *key) { for (const Trie *t = root; (t = find_key(t, *key)) != NULL; ++key, t = t->child) { if (*key == '0円') { return t; } } return NULL; }
Note that this loop (and your original) allow the search to start even if
key
is an empty string. I also get the impression that the '0円' is entered as a node in the tree, which seems very odd, but maybe I misunderstood the code.the two while loops in
Trie_put
could probably be extracted into functions
-
\$\begingroup\$ Thanks, your comments are very helpful, I'll update my answer in a bit with changes and an explanation of the children thing (but yes, the
0円
is entered... only the0円
entries have values. Each node's key is a single character). \$\endgroup\$Dagg– Dagg2013年04月20日 00:30:18 +00:00Commented Apr 20, 2013 at 0:30 -