As an interesting exercise in Python 3.3, I implemented a Trie (also known as a Prefix Tree).
Example usages
Create from list of key-value-tuples
mapping = (('she', 1), ('sells', 5), ('sea', 10), ('shells', 19), ('today', 5))
trie = Trie.from_mapping(mapping)
some_prefix = 'sh'
some_key = 'today'
print('Keys:', *trie)
print('Items: ', *trie.items())
print('Keys prefixed "{}":'.format(some_prefix), *trie.keys(prefix=some_prefix))
trie[some_key] = -55
print('Value of "{}":'.format(some_key), trie[some_key])
print(some_key, 'is in' if some_key in trie else 'is not in', 'the Trie')
Output
Keys: today sells sea she shells Items: ('today', 5) ('sells', 5) ('sea', 10) ('she', 1) ('shells', 19) Keys prefixed "sh": she shells Value of "today": -55 today is in the Trie
Find out lengths of words in a string
words = 'She sells sea shells today she sells no shells she sells'.split()
trie = Trie.from_keys(words, value_selector=len)
print(*trie.items())
Output
('sells', 5) ('sea', 3) ('she', 3) ('shells', 6) ('today', 5) ('no', 2) ('She', 3)
Implementation
For simplicity, each entry is represented by a dict
mapping characters to child entries – if this is problematic, feel free to remark on that.
class _Entry:
def __init__(self, value=None):
self.value = value
self.children = {}
def __iter__(self):
return iter(self.children.items())
I made it iterable so I don't have to constantly write things like entry.children.items()
.
Here's the Trie
itself – please feel free to point out any or all issues you can spot. I started taking a look at Python two weeks ago, so I'd also appreciate your comments on how Pythonic (or not) this is, as well as any potential issues of performance.
class Trie:
@staticmethod
def from_mapping(mapping):
"""
Constructs a Trie from an iterable of two-item tuples.
:param mapping: An iterable of key value pairs (each key must itself
be an iterable)
:return: A new Trie containing the keys and values in the mapping
"""
trie = Trie()
for key, value in mapping:
if not key:
continue
trie[key] = value
return trie
@staticmethod
def from_keys(keys, value_selector):
"""
Constructs a Trie from an iterable of keys, mapping each key to the
value returned by the value selector function, which is passed each key.
:param keys: an iterable of keys (each key must itself be an iterable)
:return: a new Trie containing the keys in the mapping
"""
trie = Trie()
for key in keys:
if not key:
continue
trie[key] = value_selector(key)
return trie
def __init__(self):
"""
Creates an empty Trie.
"""
self._root = _Entry()
self._length = 0
def __getitem__(self, key):
"""
Gets the value corresponding to the specified key.
:param key: an iterable
:return: the value corresponding to the key if it is in the Trie,
otherwise None
"""
entry = self._find(key)
if entry is not None:
return entry.value
def __setitem__(self, key, value):
"""
Sets or inserts the specified value, mapping the specified key to it
:param key: an iterable to update or insert
:param value: the object to associate with the key
"""
self.update(key, lambda old_value: value)
def __delitem__(self, key):
"""
Removes the value of the specified key without removing it from the Trie
:param key: the iterable whose value is to be removed
"""
self[key] = None
self._length -= 1
def __contains__(self, key):
"""
Determines if the key is in the Trie and has a value associated with it
:param key: the iterable to search for
:return: True if the Trie contains the key, otherwise False
"""
entry = self._find(key)
return entry is not None and entry.value is not None
def __len__(self):
return self._length
def __bool__(self):
return len(self) > 0
def __iter__(self):
return iter(self.keys())
def keys(self, prefix=''):
"""
Searches for keys beginning with the specified prefix
:param prefix: an iterable
:return: A generator yielding the keys one by one, as they are found
"""
yield from (key for key, value in self.items(prefix))
def items(self, prefix=''):
"""
Searches for key value pairs where the keys begin with the specified
prefix
:param prefix: an iterable
:return: A generator yielding tuples of keys and values as they are
found
"""
start = self._find(prefix)
if start is None:
return ()
stack = [(prefix, start)]
while stack:
current_prefix, entry = stack.pop()
if entry.value is not None:
yield current_prefix, entry.value
for char, children in entry:
stack.append((current_prefix + char, children))
def update(self, key, value_selector):
"""
Updates key with the value obtained from the value selector function,
to which the old value corresponding to the key is passed.
:param key: the key whose value is to be updated
:param value_selector: a function that takes the old value as an
argument and returns a new value
"""
entry = self._root
for char in key:
if char not in entry.children:
entry.children[char] = _Entry()
entry = entry.children[char]
new_value = value_selector(entry.value)
if entry.value != new_value:
entry.value = new_value
self._length += 1
def _find(self, key):
entry = self._root
for char in key:
if char not in entry.children:
return None
entry = entry.children[char]
return entry
3 Answers 3
1. Bug
Sometimes the
_length
attribute is updated wrongly. First, when deleting an item, the length is decremented regardless of whether the item was actually present in the trie:>>> trie = Trie() >>> del trie['key'] >>> len(trie) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: __len__() should return >= 0
Second, when updating an item, the length is incremented regardless of whether the item was actually new:
>>> trie = Trie() >>> trie['key'] = 1 >>> trie['key'] = 2 >>> print(*trie) key >>> len(trie) 2
2. Other comments
You have docstrings for all your methods: awesome! You might consider adding a docstring for the
Trie
class (look at the output ofhelp(Trie)
and consider what else is needed). This is also a good class to consider adding doctests.Since the class
_Entry
is only used inTrie
, you could make it an attribute ofTrie
.In the
_Entry
class, you useNone
as a placeholder to mean that there is no value associated with this entry. This prevents anyone from storingNone
as a value in aTrie
. Instead, you could leave thevalue
attribute unset to indicate no value (this can be detected with thehasattr
function), and writedel entry.value
to delete it.You could have
_Entry
inherit fromdict
and then you wouldn't need to have an attributechildren
.By using
@staticmethod
you prevent anyone from subclassingTrie
, because when someone callsSubTrie.from_mapping
, the object they get back will be aTrie
rather than aSubTrie
. You should use@classmethod
instead:@classmethod def from_mapping(class_, mapping): trie = class_() # ...
It's not clear why you have the lines:
if not key: continue
in
from_mapping
andfrom_keys
. Why can't I add the empty string to aTrie
? (If you have a good reason, it would be worth mentioning it in the documentation, and also worth raising an error here, instead of silently discarding these keys.)The
__getitem__
method returnsNone
to indicate failure. In Python it's normal to raise an exception to indicate failure rather than returning an exceptional type. In this case it would be appropriate to raiseKeyError
(just as the built-indict.__getitem__
does).You might consider whether
Trie
should inherit fromMutableMapping
in thecollections.abc
module: this would give youvalues
,get
,__eq__
,__ne__
,pop
,popitem
,clear
,update
andsetdefault
methods for free.Since your
Trie
class presents a mapping-like interface, Python programmers will expect itsupdate
method to present a similar interface to theupdate
method on ordinary dicationary objects. (This is easy if you inherit fromMutableMapping
since theupdate
method is defined in that abstract base class.)(Your original
update
would then need a new name.)Similarly, since your
Trie
class presents a mapping-like interface, Python programmers will expect its constructor to present a similar interface to thedict
constructor. (Again, this is easy if you inherit fromMutableMapping
since you can just pass the positional and keyword arguments to theupdate
method.)It seems perverse that the
value_selector
argument to theupdate
method is different from thevalue_selector
argument to thefrom_keys
method. (The former takes the old value as its argument but the latter takes the key as the argument.) This can only be confusing to users.Also, if your
update
method is called, but there's no value in the trie forkey
, thevalue_selector
function gets called anyway (with the argumentNone
). You might consider whether this method should take a third argument, the value to use in the case where there is no value in the trie. (Or maybe you should raiseKeyError
in this case.)I have a feeling that your
from_keys
andupdate
methods are not very Pythonic. Instead offrom_keys
, callers would be happy to write:Trie((key, f(key)) for key in keys)
and instead of
trie.update(key, lambda old_value: old_value + 1)
caller would be happy to writetrie[key] += 1
My guess is that your goal in writing the
update
method is to avoid the double lookup in the above. So it's a reasonable thing to provide, it just needs some thought about the name and interface.You could simplify some of your logic if your
_find
method was able to create new_Entry
objects. See below for how.If your function body consists only of
yield from <generator>
, you might as well writereturn <generator>
.
3. Revised code
This is how I would write the class, addressing all the points above.
from collections.abc import MutableMapping
class Trie(MutableMapping):
"""
A prefix tree.
>>> mapping = dict(she=1, sells=5, sea=10, shells=19, today=5)
>>> trie = Trie(mapping)
>>> some_prefix = 'sh'
>>> some_key = 'today'
>>> print(*sorted(trie))
sea sells she shells today
>>> print(*sorted(trie.items()))
('sea', 10) ('sells', 5) ('she', 1) ('shells', 19) ('today', 5)
>>> print(*trie.keys(prefix='sh'))
she shells
>>> trie['today'] = -55
>>> trie['today']
-55
>>> len(trie)
5
>>> 'shells' in trie, 'shore' in trie
(True, False)
>>> del trie['shells']
>>> len(trie)
4
"""
# A dictionary with an optional :value: attribute.
class _Entry(dict):
def has_value(self):
return hasattr(self, 'value')
def __init__(self, *args, **kwargs):
"""
Return a new True initialized from an optional positional
argument and a possibly empty set of keyword arguments, as for
the built-in type :dict:
"""
self._root = self._Entry()
self._length = 0
self.update(*args, **kwargs)
def __getitem__(self, key):
"""
Gets the value corresponding to the specified key.
:param key: an iterable
:return: the value corresponding to the key if it is in the Trie,
otherwise raise KeyError.
"""
entry = self._find(key)
if entry.has_value():
return entry.value
else:
raise KeyError(key)
def __setitem__(self, key, value):
"""
Sets or inserts the specified value, mapping the specified key to it
:param key: an iterable to update or insert
:param value: the object to associate with the key
"""
entry = self._find(key, create=True)
if not entry.has_value():
self._length += 1
entry.value = value
def __delitem__(self, key):
"""
Removes the value of the specified key from the Trie
:param key: the iterable whose value is to be removed
"""
entry = self._find(key)
if entry.has_value():
del entry.value
self._length -= 1
else:
raise KeyError(key)
def __len__(self):
return self._length
def __iter__(self):
return iter(self.keys())
def keys(self, prefix=''):
"""
Searches for keys beginning with the specified prefix
:param prefix: an iterable
:return: A generator yielding the keys one by one, as they are found
"""
return (key for key, value in self.items(prefix))
def items(self, prefix=''):
"""
Searches for key value pairs where the keys begin with the specified
prefix
:param prefix: an iterable
:return: A generator yielding tuples of keys and values as they are
found
"""
try:
start = self._find(prefix)
except KeyError:
raise StopIteration
stack = [(prefix, start)]
while stack:
current_prefix, entry = stack.pop()
if entry.has_value():
yield current_prefix, entry.value
for char, children in entry.items():
stack.append((current_prefix + char, children))
def _find(self, key, create=False):
entry = self._root
for char in key:
if char in entry:
entry = entry[char]
elif create:
new_entry = self._Entry()
entry[char] = new_entry
entry = new_entry
else:
raise KeyError(key)
return entry
4. Algorithmic concerns
You create
_Entry
objects for all elements in each key, even if there are no other keys sharing any of those entries. For example, if I create an emptyTrie
and then add a value for the keysupercalifragilisticexpialidocious
, 34_Entry
objects will be constructed, and when I look up this key, all 34 of them will have to be traversed. You might consider storing the tail of a key differently when it doesn't need to share space with other keys, and creating_Entry
objects dynamically as necessary when you add new keys that share a prefix with existing keys.When you delete keys, you don't remove
_Entry
objects that have become empty as a result. This means that later lookups may need to descend deeper into the trie before discovering that the key is not present.
-
1\$\begingroup\$ Awesome! My only issue is that I sternly disapprove of the suggesting to subclass
dict
for the_Entry
class. \$\endgroup\$Winston Ewert– Winston Ewert2013年04月01日 21:24:40 +00:00Commented Apr 1, 2013 at 21:24 -
\$\begingroup\$ @WinstonEwert: Can you explain your concern? \$\endgroup\$Gareth Rees– Gareth Rees2013年04月01日 21:28:57 +00:00Commented Apr 1, 2013 at 21:28
-
\$\begingroup\$ Gareth, thank you for your outstanding review. I particularly like the way you got rid of the duplicated logic by introducing the
create
parameter in_find
, though I think I might make it keyword-only. \$\endgroup\$Adam– Adam2013年04月01日 21:31:07 +00:00Commented Apr 1, 2013 at 21:31 -
\$\begingroup\$ It only makes sense to subclass
dict
if you are making some kind of specialdict
. Here you are creating a class which uses adict
, and I think thats much more naturally represented using composition rather then inheritance. I'll admit, its a bit a purest thing here. \$\endgroup\$Winston Ewert– Winston Ewert2013年04月01日 21:34:34 +00:00Commented Apr 1, 2013 at 21:34
A note on terminology: a mapping is, according to Python glossary,
A container object that supports arbitrary key lookups and implements the methods specified in the Mapping or MutableMapping abstract base classes. Examples include
dict
,collections.defaultdict
,collections.OrderedDict
andcollections.Counter
.
Your from_mapping
assumes a different type of argument.
-
\$\begingroup\$ Thanks, that's a very good point. In fact, I was confused by the
dict
constructor documentation but that's because I didn't read it carefully enough. Any suggestions for a better name? I'm thinking ofkey_value_pairs
... \$\endgroup\$Adam– Adam2013年04月01日 20:09:38 +00:00Commented Apr 1, 2013 at 20:09 -
\$\begingroup\$ @codesparkle, I'd go from
from_pairs
myself. \$\endgroup\$Winston Ewert– Winston Ewert2013年04月01日 21:19:31 +00:00Commented Apr 1, 2013 at 21:19 -
\$\begingroup\$ @codesparkle I'd follow Gareth's suggestion and have the constructor call
update
. \$\endgroup\$Janne Karila– Janne Karila2013年04月02日 06:01:33 +00:00Commented Apr 2, 2013 at 6:01
__getitem__
returns None if a key cannot be found, to follow python conventions it should raise KeyError.
Simliar with __delitem__
, I'd expected a KeyError if deleting a non-existent element.
I wonder if it would be better to have your _Entry
object support a .items() method which could then be implemented recursively.
Given that it act a lot like a mapping, I'd expect the update
method to act like dict.update
but it doesn't.
if entry.value != new_value:
entry.value = new_value
self._length += 1
Its not clear to me that this is the correct logic for incrementing the length.
-
\$\begingroup\$ Thank you for your excellent points. You're right about
length
: it is indeed a bug. \$\endgroup\$Adam– Adam2013年04月01日 21:22:48 +00:00Commented Apr 1, 2013 at 21:22