Generic trie implementation in Java

Question 1

I implemented a Trie data structure in Java, to test my understanding of the concept. I tried (pun intended :) ) to follow TDD steps along the way (i.e., add first a failing test case, and implement functionality after ensuring that it really fails). I'm posting here the end result. Besides the implementation itself, I'm also interested in the quality of the tests.

I'm mostly interested in the following aspects:

Is this a correct implementation of the Trie data structure? If not, then what are the mistakes?
Can the performance or the effectiveness (both in speed, as well as in readability) of the implementation be improved?
Can you think of any test case, which would cause an issue in this implementation, besides the trivial NPE issues, if null is passed to the methods (I intentionally left out the if (word == null) ... checks to make the code shorter)? (In other words, a test that exposes a bug.)
Would you improve anything in the unit tests?

Of course, any other suggestion is also welcome.

What I already know could be improved:

I intentionally left out the isLeaf part for now. (Which tells, for each node, if it is the end of a complete word. E.g., if we had a "path" for hello, then both the o at the end, as well as the last l could have the isLeaf flag set to true.)
string2Coll method could be written in a more elegant way, using Java 8 (but for some reason my Eclipse can not set Java 8 code generation, although I have JDK for Java 8 installed -- and at this point, I'm not too interested in that aspect)

For the record, I checked in this code into my github repository, and probably I'll improve it based on the answers from here.

Trie.java

package trie;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
public class Trie<E> {
 private static class Node<E> {
 private Map<E, Node<E>> children = new HashMap<>();
 Node<E> addChild(E child) {
 children.putIfAbsent(child, new Node<E>());
 return children.get(child);
 }
 Node<E> getChild(E child) {
 return children.get(child);
 }
 Node<E> removeChild(E child) {
 return children.remove(child);
 }
 boolean hasAtMostOneChild() {
 return children.size() < 2;
 }
 }
 private Node<E> root = new Node<>();
 public boolean contains(Collection<E> word) {
 Node<E> node = root;
 for (E ch: word) {
 Node<E> next = node.getChild(ch);
 if (next == null) {
 return false;
 } else {
 node = next;
 }
 }
 return true;
 }
 public void add(Collection<E> word) {
 Node<E> node = root;
 for (E ch: word) {
 Node<E> next = node.getChild(ch);
 if (next == null) {
 next = node.addChild(ch);
 }
 node = next;
 }
 }
 public void remove(Collection<E> word) {
 Node<E> node = root;
 Node<E> lastFork = root;
 E childAtLastFork = word.iterator().hasNext() ? word.iterator().next() : null;
 for (E ch: word) {
 if (!node.hasAtMostOneChild()) {
 lastFork = node;
 childAtLastFork = ch;
 }
 node = node.getChild(ch);
 if (node == null) {
 break;
 }
 }
 lastFork.removeChild(childAtLastFork);
 }
}

TrieTest.java

package trie;
import static org.hamcrest.Matchers.is;
import static org.junit.Assert.assertThat;
import java.util.ArrayList;
import java.util.Collection;
import org.junit.Test;
public class TrieTest {
 @Test
 public void emptyTrieShouldContainEmptyWord() {
 Trie<Character> trie = new Trie<>();
 assertThat(trie.contains(string2Coll("")), is(true));
 } 
 @Test
 public void emptyTrieShouldNotContainWord() {
 Trie<Character> trie = new Trie<>();
 assertThat(trie.contains(string2Coll("hello")), is(false));
 }
 @Test
 public void addZeroLengthWordShouldNotCauseException() {
 Trie<Character> trie = new Trie<>();
 trie.add(string2Coll(""));
 }
 @Test
 public void wordShouldBeContainedIfAdded() {
 Trie<Character> trie = new Trie<>();
 trie.add(string2Coll("hello"));
 assertThat(trie.contains(string2Coll("hello")), is(true));
 }
 @Test
 public void wordShouldNotBeContainedIfRemoved() {
 Trie<Character> trie = new Trie<>();
 trie.add(string2Coll("hello"));
 trie.remove(string2Coll("hello"));
 assertThat(trie.contains(string2Coll("hello")), is(false));
 }
 @Test
 public void removingNonExistingWordShouldNotCauseException() {
 Trie<Character> trie = new Trie<>();
 trie.remove(string2Coll("hello"));
 assertThat(trie.contains(string2Coll("hello")), is(false));
 }
 @Test
 public void removingOneWordShouldNotRemoveAnother() {
 Trie<Character> trie = new Trie<Character>();
 trie.add(string2Coll("hello"));
 trie.add(string2Coll("help"));
 trie.remove(string2Coll("help"));
 assertThat(trie.contains(string2Coll("help")), is(false));
 assertThat(trie.contains(string2Coll("hello")), is(true));
 }
 @Test
 public void removingZeroLengthWordShouldNotCauseException() {
 Trie<Character> trie = new Trie<>();
 trie.remove(string2Coll(""));
 assertThat(trie.contains(string2Coll("")), is(true));
 } 
 private static Collection<Character> string2Coll(String str) {
 Collection<Character> ret = new ArrayList<>(str.length());
 for (char ch: str.toCharArray()) {
 ret.add(ch);
 }
 return ret;
 }
}

Question 2

Is this a correct implementation of the Trie data structure?

Apart from what you intentionally left out, yes. My big problem with it, is that a Trie is actually a Set, but when you leave isLeaf out, then it can't work.

Actually, a Trie can also be a Map associating leaves with some values. IMHO, this is the way to go as the Set can be obtained trivially using Collections.newSetFromMap`.

Concerning the keys, I'd replace Collection by List. While all you need is some fixed iteration order, I don't think anyone wants ever call trie.add(someSet). If you really want to make it general, then you could use Iterable instead.

Can the performance or the effectiveness (both in speed, as well as in readability) of the implementation be improved?

It can be make more space-efficient by using some tricks in Node as there's just one child on the average. This would surely come at the cost of more code, so your solution is fine.

I could be probably made much more space and time efficient for the most common case using char keys, but that obviously wasn't your goal.

besides the trivial NPE issues

These are no issues. Passing null where it doesn't belong to should always throw; anything else is an invitation for later bugs.

Would you improve anything in the unit tests?

The remove method should be tested much more. Does removing the empty collection remove everything? What happens if you remove a substring of a present string?

Interactions of multiple calls should be tested. What if you add a word twice?

I guess, you should add equals and hashCode, which would assist you in the tests. When you go for implementing Map, then don't forget that you must follow its (stupid) hashCode contract.

I intentionally left out the isLeaf part for now. (Which tells, for each node, if it is the end of a complete word.

Unfortunately, without isLeaf, you have just a funny data structure, which is hard to grasp and hard to use. With it, you get a Set, which is something we all know well. But as the first step, it's alright. Anyway, I'd recommend to forget isLeaf and go for the Map instead.

Concerning the code itself, there's nothing to say, maybe except for word.iterator().hasNext() being the same as !word.isEmpty(), but given that you use next(), I can understand if this was intentional. However, you may want to special case the removal of the empty word (whatever it's meant to do).

removingZeroLengthWordShouldNotCauseException - is this really all you want?

Question 3

Thanks, I did most of the improvements you suggested. I don't understand one thing, though: do you suggest to add equals and hashCode to the Trie or the Node class? And how would that help with the tests?

Question 4

@Attilio To Trie. It'd allow you to do tests like t1 = new Trie(); t2 = new Trie(); assertEquals(t1.add("a"), t1.add("a").add("a")) easily. There are tons of such possible tests. I don't claim you need them, but when tests are very very simple to write, then I prefer to write a bunch of them without much thought, so I know they cover most possible problems, no matter how the implementation looks like. You'll also need equals when implementing the Map.

maaartinus maaartinus 13.6k1 gold badge35 silver badges73 bronze badges · Accepted Answer · 2017-12-05 11:00:45Z

Is this a correct implementation of the Trie data structure?

Apart from what you intentionally left out, yes. My big problem with it, is that a Trie is actually a Set, but when you leave isLeaf out, then it can't work.

Actually, a Trie can also be a Map associating leaves with some values. IMHO, this is the way to go as the Set can be obtained trivially using Collections.newSetFromMap`.

Concerning the keys, I'd replace Collection by List. While all you need is some fixed iteration order, I don't think anyone wants ever call trie.add(someSet). If you really want to make it general, then you could use Iterable instead.

Can the performance or the effectiveness (both in speed, as well as in readability) of the implementation be improved?

It can be make more space-efficient by using some tricks in Node as there's just one child on the average. This would surely come at the cost of more code, so your solution is fine.

I could be probably made much more space and time efficient for the most common case using char keys, but that obviously wasn't your goal.

besides the trivial NPE issues

These are no issues. Passing null where it doesn't belong to should always throw; anything else is an invitation for later bugs.

Would you improve anything in the unit tests?

The remove method should be tested much more. Does removing the empty collection remove everything? What happens if you remove a substring of a present string?

Interactions of multiple calls should be tested. What if you add a word twice?

I guess, you should add equals and hashCode, which would assist you in the tests. When you go for implementing Map, then don't forget that you must follow its (stupid) hashCode contract.

I intentionally left out the isLeaf part for now. (Which tells, for each node, if it is the end of a complete word.

Unfortunately, without isLeaf, you have just a funny data structure, which is hard to grasp and hard to use. With it, you get a Set, which is something we all know well. But as the first step, it's alright. Anyway, I'd recommend to forget isLeaf and go for the Map instead.

Concerning the code itself, there's nothing to say, maybe except for word.iterator().hasNext() being the same as !word.isEmpty(), but given that you use next(), I can understand if this was intentional. However, you may want to special case the removal of the empty word (whatever it's meant to do).

removingZeroLengthWordShouldNotCauseException - is this really all you want?

Thanks, I did most of the improvements you suggested. I don't understand one thing, though: do you suggest to add equals and hashCode to the Trie or the Node class? And how would that help with the tests?
@Attilio To Trie. It'd allow you to do tests like t1 = new Trie(); t2 = new Trie(); assertEquals(t1.add("a"), t1.add("a").add("a")) easily. There are tons of such possible tests. I don't claim you need them, but when tests are very very simple to write, then I prefer to write a bunch of them without much thought, so I know they cover most possible problems, no matter how the implementation looks like. You'll also need equals when implementing the Map.

Stack Exchange Network

Generic trie implementation in Java

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Generic trie implementation in Java

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions