This is a follow-up of Wikipedia Trie pseudocode in Python
Code quality improvements
find
has been fixed and a regression test ("bana" vs "banana") has been added so that it will never again be broken that way.Used ordinary instance methods when sensible (previous code suffered from staticmethods overuse).
Significant simplification from the use of Python
defaultdict
Operator overloading to provide easy membership test and printing (
__repr__
,__contains__
)Removed the
value
argument as I saw no use for it.Minor doc-style language adjustments.
The printing is not used in testing anymore, as it should not be, because it is arbitrary (as Python's
dict
s are)
Functionality improvements
The printing is now as it should be, not reversed.
There is no more a weird
inspect
method, you can simplyprint
the trie.
Example output
A trie from the words ('banning', 'banned', 'banana', 'bad', 'cooking', 'cought', 'count')
is printed as:
c o o k i n g # u g h t # n t # b a n a n a # n i n g # e d # d #
#
signals the end of a word.
The code
import collections
import doctest
class Trie:
"""
Implements a Trie (also known as 'digital tree', 'radix tree' or 'prefix tree').
Where common starting letters of many words are stored just once.
"""
def __init__(self):
self.child = collections.defaultdict(Trie)
def insert(self, string):
"""
Add a given string to the trie, modifying it **in place**.
>>> t = Trie()
>>> t.insert('hello')
>>> t.insert('hi')
>>> list(sorted(t.child.keys()))
['h']
>>> first_node = t.child['h']
>>> list(sorted(first_node.child.keys()))
['e', 'i']
As you can see, the same `h` was written only once,
and it is connected with both `i` and `e`.
"""
node = self
for char in string:
node = node.child[char]
node = node.child[None]
def __contains__(self, word):
"""
>>> t = Trie()
>>> t.insert('example')
>>> 'example' in t
True
>>> 'exemplum' in t
False
>>> t.insert('bana')
>>> 'banana' in t
False
>>> t.insert('banning')
>>> t.insert('banned')
"""
trie = self
for char in word:
if char in trie.child:
trie = trie.child[char]
else:
return False
return True
def __str__(self, depth = 0):
"""
Shows a nicely formatted and indented Trie.
Cannot be tested as equivalent representations
are arbitrarly chosen from (`dict`s are not ordered).
"""
s = []
for i in self.child:
s.append( '{}{} {}'.format(
' ' * depth, i or '#', '\n' + self.child[i].__str__(depth + 1)))
return ''.join(s)
if __name__ == '__main__':
doctest.testmod()
trie = Trie()
for word in ('banning', 'banned', 'banana', 'bad', 'cooking', 'cought', 'count'):
trie.insert(word)
print(trie)
1 Answer 1
You could easily extent __init__
to take a list of strings to add to your Trie
from the get go.
def __init__(self, *args):
self.child = collections.defaultdict(Trie)
for arg in args:
self.insert(arg)
Now you could pass an arbitrary number of strings to Trie
and they'll all be added in initialisation.
Also, if you wanted to you could still make the string representations equivalent by sorting the list before you return it.
def __str__(self, depth = 0):
s = []
for i in self.child:
s.append( '{}{} {}'.format(
' ' * depth, i or '#', '\n' + self.child[i].__str__(depth + 1)))
s.sort()
return ''.join(s)
If you're concerned with running time, you can make it flaggable and off by default. Speaking of defaults, you should use depth=0
(no spaces, PEP8 reference) as that's the commonly accepted syntax.
-
1\$\begingroup\$ I was thinking about writing a method,
from_strings
that would generate a Trie from a list of strings, is giving__init__
more power better than that? \$\endgroup\$Caridorc– Caridorc2015年10月05日 17:03:44 +00:00Commented Oct 5, 2015 at 17:03 -
\$\begingroup\$ About performance, printing is by far slower than sorting, so I see no performance issues with sorting (
sorted
seems better thansort
to me in this instance, but it is subjective) \$\endgroup\$Caridorc– Caridorc2015年10月05日 17:07:52 +00:00Commented Oct 5, 2015 at 17:07 -
\$\begingroup\$ @Caridorc Your mileage may vary on what's better. It largely depends on usage I think. If you were thinking about that though, why not allow
insert
to take*args
? That way you could do both anyway. Also I mostly usedsort
to make it easier to separate withif flag: s.sort()
. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年10月05日 17:09:20 +00:00Commented Oct 5, 2015 at 17:09 -
\$\begingroup\$ Yes, the sort thing is pretty subjective
''.join(sorted(s) if should_sort else s)
may be considered more or less readable than yours depending on taste. You mean makinginsert
capable of inserting arbitrarly many words? \$\endgroup\$Caridorc– Caridorc2015年10月05日 17:13:45 +00:00Commented Oct 5, 2015 at 17:13 -
\$\begingroup\$ Yeah. Insert could take *args and just insert every word that's passed. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年10月05日 17:16:51 +00:00Commented Oct 5, 2015 at 17:16
t = Trie(); t.insert('banana'); 'bana' in t
would returnTrue
. Is this really the desired behavior? \$\endgroup\$b
-ba
-ban
-bana
-banan
, it is calledprefix tree
for this reason \$\endgroup\$value
- with it, you could easily have a special sentinel value that isn't normally available to clients that says "this is not an end node of any word in the trie". This would be necessary if you allowed removing items from the trie, because without knowing which words have been explicitly added, you can't tell whethertrie.remove('banana')
should also remove'bana'
. \$\endgroup\$insert
has actually changed anything? Nothing different happens whether I'm adding a new value or not. \$\endgroup\$