I implemented a Trie data structure in Java, to test my understanding of the concept. I tried (pun intended :) ) to follow TDD steps along the way (i.e., add first a failing test case, and implement functionality after ensuring that it really fails). I'm posting here the end result. Besides the implementation itself, I'm also interested in the quality of the tests.
I'm mostly interested in the following aspects:
- Is this a correct implementation of the Trie data structure? If not, then what are the mistakes?
- Can the performance or the effectiveness (both in speed, as well as in readability) of the implementation be improved?
- Can you think of any test case, which would cause an issue in this implementation, besides the trivial NPE issues, if
null
is passed to the methods (I intentionally left out theif (word == null) ...
checks to make the code shorter)? (In other words, a test that exposes a bug.) - Would you improve anything in the unit tests?
Of course, any other suggestion is also welcome.
What I already know could be improved:
I intentionally left out the
isLeaf
part for now. (Which tells, for each node, if it is the end of a complete word. E.g., if we had a "path" forhello
, then both theo
at the end, as well as the lastl
could have theisLeaf
flag set to true.)string2Coll
method could be written in a more elegant way, using Java 8 (but for some reason my Eclipse can not set Java 8 code generation, although I have JDK for Java 8 installed -- and at this point, I'm not too interested in that aspect)
For the record, I checked in this code into my github repository, and probably I'll improve it based on the answers from here.
Trie.java
package trie;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
public class Trie<E> {
private static class Node<E> {
private Map<E, Node<E>> children = new HashMap<>();
Node<E> addChild(E child) {
children.putIfAbsent(child, new Node<E>());
return children.get(child);
}
Node<E> getChild(E child) {
return children.get(child);
}
Node<E> removeChild(E child) {
return children.remove(child);
}
boolean hasAtMostOneChild() {
return children.size() < 2;
}
}
private Node<E> root = new Node<>();
public boolean contains(Collection<E> word) {
Node<E> node = root;
for (E ch: word) {
Node<E> next = node.getChild(ch);
if (next == null) {
return false;
} else {
node = next;
}
}
return true;
}
public void add(Collection<E> word) {
Node<E> node = root;
for (E ch: word) {
Node<E> next = node.getChild(ch);
if (next == null) {
next = node.addChild(ch);
}
node = next;
}
}
public void remove(Collection<E> word) {
Node<E> node = root;
Node<E> lastFork = root;
E childAtLastFork = word.iterator().hasNext() ? word.iterator().next() : null;
for (E ch: word) {
if (!node.hasAtMostOneChild()) {
lastFork = node;
childAtLastFork = ch;
}
node = node.getChild(ch);
if (node == null) {
break;
}
}
lastFork.removeChild(childAtLastFork);
}
}
TrieTest.java
package trie;
import static org.hamcrest.Matchers.is;
import static org.junit.Assert.assertThat;
import java.util.ArrayList;
import java.util.Collection;
import org.junit.Test;
public class TrieTest {
@Test
public void emptyTrieShouldContainEmptyWord() {
Trie<Character> trie = new Trie<>();
assertThat(trie.contains(string2Coll("")), is(true));
}
@Test
public void emptyTrieShouldNotContainWord() {
Trie<Character> trie = new Trie<>();
assertThat(trie.contains(string2Coll("hello")), is(false));
}
@Test
public void addZeroLengthWordShouldNotCauseException() {
Trie<Character> trie = new Trie<>();
trie.add(string2Coll(""));
}
@Test
public void wordShouldBeContainedIfAdded() {
Trie<Character> trie = new Trie<>();
trie.add(string2Coll("hello"));
assertThat(trie.contains(string2Coll("hello")), is(true));
}
@Test
public void wordShouldNotBeContainedIfRemoved() {
Trie<Character> trie = new Trie<>();
trie.add(string2Coll("hello"));
trie.remove(string2Coll("hello"));
assertThat(trie.contains(string2Coll("hello")), is(false));
}
@Test
public void removingNonExistingWordShouldNotCauseException() {
Trie<Character> trie = new Trie<>();
trie.remove(string2Coll("hello"));
assertThat(trie.contains(string2Coll("hello")), is(false));
}
@Test
public void removingOneWordShouldNotRemoveAnother() {
Trie<Character> trie = new Trie<Character>();
trie.add(string2Coll("hello"));
trie.add(string2Coll("help"));
trie.remove(string2Coll("help"));
assertThat(trie.contains(string2Coll("help")), is(false));
assertThat(trie.contains(string2Coll("hello")), is(true));
}
@Test
public void removingZeroLengthWordShouldNotCauseException() {
Trie<Character> trie = new Trie<>();
trie.remove(string2Coll(""));
assertThat(trie.contains(string2Coll("")), is(true));
}
private static Collection<Character> string2Coll(String str) {
Collection<Character> ret = new ArrayList<>(str.length());
for (char ch: str.toCharArray()) {
ret.add(ch);
}
return ret;
}
}
1 Answer 1
Is this a correct implementation of the Trie data structure?
Apart from what you intentionally left out, yes. My big problem with it, is that a Trie
is actually a Set
, but when you leave isLeaf
out, then it can't work.
Actually, a Trie
can also be a Map
associating leaves with some values. IMHO, this is the way to go as the Set
can be obtained trivially using Collections.newSetFromMap`.
Concerning the keys, I'd replace Collection
by List
. While all you need is some fixed iteration order, I don't think anyone wants ever call trie.add(someSet)
. If you really want to make it general, then you could use Iterable
instead.
Can the performance or the effectiveness (both in speed, as well as in readability) of the implementation be improved?
It can be make more space-efficient by using some tricks in Node
as there's just one child on the average. This would surely come at the cost of more code, so your solution is fine.
I could be probably made much more space and time efficient for the most common case using char
keys, but that obviously wasn't your goal.
besides the trivial NPE issues
These are no issues. Passing null
where it doesn't belong to should always throw; anything else is an invitation for later bugs.
Would you improve anything in the unit tests?
The remove method should be tested much more. Does removing the empty collection remove everything? What happens if you remove a substring of a present string?
Interactions of multiple calls should be tested. What if you add a word twice?
I guess, you should add equals
and hashCode
, which would assist you in the tests. When you go for implementing Map
, then don't forget that you must follow its (stupid) hashCode
contract.
I intentionally left out the
isLeaf
part for now. (Which tells, for each node, if it is the end of a complete word.
Unfortunately, without isLeaf
, you have just a funny data structure, which is hard to grasp and hard to use. With it, you get a Set
, which is something we all know well. But as the first step, it's alright. Anyway, I'd recommend to forget isLeaf
and go for the Map
instead.
Concerning the code itself, there's nothing to say, maybe except for word.iterator().hasNext()
being the same as !word.isEmpty()
, but given that you use next()
, I can understand if this was intentional. However, you may want to special case the removal of the empty word (whatever it's meant to do).
removingZeroLengthWordShouldNotCauseException
- is this really all you want?
-
\$\begingroup\$ Thanks, I did most of the improvements you suggested. I don't understand one thing, though: do you suggest to add
equals
andhashCode
to theTrie
or theNode
class? And how would that help with the tests? \$\endgroup\$Attilio– Attilio2017年12月05日 19:03:26 +00:00Commented Dec 5, 2017 at 19:03 -
\$\begingroup\$ @Attilio To
Trie
. It'd allow you to do tests liket1 = new Trie(); t2 = new Trie(); assertEquals(t1.add("a"), t1.add("a").add("a"))
easily. There are tons of such possible tests. I don't claim you need them, but when tests are very very simple to write, then I prefer to write a bunch of them without much thought, so I know they cover most possible problems, no matter how the implementation looks like. You'll also needequals
when implementing theMap
. \$\endgroup\$maaartinus– maaartinus2017年12月06日 01:46:09 +00:00Commented Dec 6, 2017 at 1:46