Dictrie: Extending the Python dictionary into a Trie

Question 1

I implemented a trie in Python2.7 as a dictionary of dictionaries by extending the UserDict class to allow native dictionary access syntax and iteration.

For example, trie['stack'] produces a subtrie of all words that begin with 'stack', 'stack' in trie checks for containment in the trie, and for word in trie.get_words('stack'): iterates over all words that start with 'stack', among other features (github.com/sufyanAbbasi/dictrie):

from UserDict import UserDict
from collections import deque
class Dictrie(UserDict, object):
 def __init__(self, *wordslists, **kwargs):
 init_trie = kwargs.get('dict', {})
 super(Dictrie, self).__init__(init_trie)
 for words in wordslists:
 self.build_trie(words)
 # returns if word is a valid word in the dictionary
 def is_word(self, word):
 return word in self and ' ' in self[word]
 # returns a generator to produce all words in the trie beginning with
 # the root string from shortest to longest
 def get_words(self, root):
 queue = deque([root])
 while queue:
 curr_str = queue.popleft()
 if not self[curr_str]:
 yield curr_str.strip()
 else:
 queue.extend(curr_str + key for key in sorted(self[curr_str].iterkeys()))
 # builds the trie given an iterator of words
 def build_trie(self, words):
 words = list(words)
 for word in words:
 self[' '] = word
 #iterates over all the words in the dictionary
 def __iter__(self):
 queue = deque(sorted(self.iterkeys()))
 while queue:
 curr_str = queue.popleft()
 if not self[curr_str]:
 yield curr_str.strip()
 else:
 queue.extend(curr_str + key for key in sorted(self[curr_str].iterkeys()))
 def __contains__(self, key):
 sub_trie = self.data
 for char in key:
 if char in sub_trie:
 sub_trie = sub_trie[char]
 else:
 return False
 return True
 def __getitem__(self, key):
 sub_trie = self.data
 for char in key:
 if char in sub_trie:
 sub_trie = sub_trie[char]
 else:
 raise KeyError(key)
 return sub_trie
 def __setitem__(self, key, item):
 sub_trie = self.data
 for char in item.strip() + ' ':
 sub_trie = sub_trie.setdefault(char, {})

There are a few gripes that I have with my code that I'd like suggestions on, in order of priority:

Performance

The get_word and __iter__ generators use a BFS algorithm that repeats a lot of computation, however it has the advantage of being simple to read. As the trie is parsed in breadth-first style, the entire word is appended to the end of the queue instead of caching the current dictionary and starting there, so it repeats the computation. Is the efficient strategy to append a tuple containing the parsed word and a cached dictionary? lru_cache decorator? Recursion, maybe? Here are the methods in question:
```
def get_words(self, root):
 queue = deque([root])
 while queue:
 curr_str = queue.popleft()
 if not self[curr_str]:
 yield curr_str.strip()
 else:
 queue.extend(curr_str + key for key in sorted(self[curr_str].iterkeys()))
def __iter__(self):
 queue = deque(sorted(self.iterkeys()))
 while queue:
 curr_str = queue.popleft()
 if not self[curr_str]:
 yield curr_str.strip()
 else:
 queue.extend(curr_str + key for key in sorted(self[curr_str].iterkeys()))
```

Design

The get_word and __iter__ methods, are exactly the same BFS algorithm, except for the first line in which the initial queue in get_word is the root string where __iter__ starts the queue with the first keys in the trie (so all words are iterated over). There's an opportunity for code reduction by setting one equal to the other, but I can't seem to come up with a clever way to do so, given that one takes in a parameter and the other does not.
Accessing a subtrie such as trie['stack'] returns a nested dictionary instead of a Dictrie object, is that a design flaw? As it is, it allows the user to employ their own trie traversal algorithms and create a Dictrie object if they need to, however it creates unexpected behavior when performing iteration on it such as for words in trie['stack']. Should indexing into the Dictrie class return a Dictrie or a dictionary of dictionaries? The weird thing is, they are almost exactly the same!
What other useful features should a trie implementation have?

Question 2

For the redundancy between get_words and __iter__, you could add another option to get_words that allows directly passing a queue (and requiring that exactly one of the two options is passed):

def get_words(self, root=None, queue=None):
 # cannot pass both `root` and `queue`, but have to pass one of them
 assert (root is not None) ^ (queue is not None)
 if queue is None and root is not None:
 queue = deque([root])
 while queue:
 curr_str = queue.popleft()
 if not self[curr_str]:
 yield curr_str.strip()
 else:
 queue.extend(curr_str + key
 for key in sorted(self[curr_str].iterkeys()))
def __iter__(self):
 for word in self.get_words(queue=deque(sorted(self.iterkeys()))):
 yield word

Question 3

I didn't think about XOR, sweet! This is interesting, I might go with this solution as it resolves the quandary. One edit I might add to this solution is to iterate on all words in the dictionary when no input is given to get_words(), which would mimic the __iter__ function. Thanks so much! I will mark this answered tomorrow if there are no more bites.

Question 4

@sufjan Than you would not even need the second argument and XOR anymore and could even declare __iter__ = get_words.

Question 5

In my effort to avoid using the keyword argument, I overlooked the simplicity that it would introduce. Thanks!

Graipher Graipher 41.7k7 gold badges70 silver badges134 bronze badges · Accepted Answer · 2018-01-09 11:38:01Z

For the redundancy between get_words and __iter__, you could add another option to get_words that allows directly passing a queue (and requiring that exactly one of the two options is passed):

def get_words(self, root=None, queue=None):
 # cannot pass both `root` and `queue`, but have to pass one of them
 assert (root is not None) ^ (queue is not None)
 if queue is None and root is not None:
 queue = deque([root])
 while queue:
 curr_str = queue.popleft()
 if not self[curr_str]:
 yield curr_str.strip()
 else:
 queue.extend(curr_str + key
 for key in sorted(self[curr_str].iterkeys()))
def __iter__(self):
 for word in self.get_words(queue=deque(sorted(self.iterkeys()))):
 yield word

I didn't think about XOR, sweet! This is interesting, I might go with this solution as it resolves the quandary. One edit I might add to this solution is to iterate on all words in the dictionary when no input is given to get_words(), which would mimic the __iter__ function. Thanks so much! I will mark this answered tomorrow if there are no more bites.
@sufjan Than you would not even need the second argument and XOR anymore and could even declare __iter__ = get_words.
In my effort to avoid using the keyword argument, I overlooked the simplicity that it would introduce. Thanks!

Stack Exchange Network

Dictrie: Extending the Python dictionary into a Trie

Performance

Design

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Dictrie: Extending the Python dictionary into a Trie

Performance

Design

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions