I have implemented a trie in Java and I would request you to review it and suggest improvements or new features.
The code is split into 3 files.
TrieNode
(individual nodes in the trie)
import java.util.HashMap;
class TrieNode {
private char value;
private HashMap<Character, TrieNode> children ;
public TrieNode() {
// TODO Auto-generated constructor stub
children = new HashMap<>();
}
public TrieNode(char value) {
this.value = value;
children = new HashMap<>();
}
public char getValue() {
return value;
}
public void setValue(char value) {
this.value = value;
}
public HashMap<Character, TrieNode> getChildren() {
return children;
}
public void setChildren(HashMap<Character, TrieNode> children) {
this.children = children;
}
}
Trie
(the trie itself)
import java.util.Set;
public class Trie {
private TrieNode root;
public Trie() {
// TODO Auto-generated constructor stub
root = new TrieNode();
root.setValue('$'); //arbitary node value for root
}
/**
* This method will build the trie with the given set of words
* @param wordSet
*/
public void buildTrie(Set<String> wordSet){
for (String word : wordSet){
int wordLength = word.length();
TrieNode crawl = root;
for (int i=0;i<wordLength ;++i){
crawl = traverseTrie(crawl, word.charAt(i));
}
}
}
/**
* This method will traverse the trie and find out the exact position to
* insert a particular character
* @param node
* @param input
* @return
*/
public TrieNode traverseTrie(TrieNode node,char input){
TrieNode newNode = null;
//if no such node exists then create one
if (node.getChildren().get(input)==null){
node.getChildren().put(input, new TrieNode(input));
newNode = node.getChildren().get(input);
}else{
//if exist then just traverse
newNode = node.getChildren().get(input);
}
return newNode;
}
/**
* This method searches the trie for the given word and returns a boolean value
* indicating it's presence or absence
* @param word
* @return {@link boolean} value indicating it's presence or absence
*/
public boolean searchWord(String word) {
// TODO Auto-generated method stub
boolean result = true;
int wordLength = word.length();
TrieNode crawler = root;
for (int i= 0 ; i< wordLength ; ++i){
char character = word.charAt(i);
if (crawler.getChildren().get(character) == null){
result = false ;
break;
}else{
crawler = crawler.getChildren().get(character);
}
}
return result;
}
//getters and setters
public TrieNode getRoot() {
return root;
}
public void setRoot(TrieNode root) {
this.root = root;
}
}
TrieDriver
(the driver program to run the trie)
import java.util.HashSet;
import java.util.Set;
public class TrieDriver {
public static void main(String[] args) {
// TODO Auto-generated method stub
Trie trie = new Trie();
Set<String> dictionary = new HashSet<>();
dictionary.add("CAT");
dictionary.add("CATWOMAN");
dictionary.add("CATERER");
dictionary.add("DOT");
dictionary.add("DOG");
dictionary.add("DOC");
//build the trie
trie.buildTrie(dictionary);
String wordToSearch = "CAX";
//search the trie
System.out.println("The word "+wordToSearch + " is present or not???"+
(boolean)trie.searchWord(wordToSearch));
}
}
2 Answers 2
Using a concrete class where an abstract one or an interface can be used is usually a bad practice. That is, the
getChildren
method of theTrieNode
class should return aMap<Character, Integer>
, not aHashMap
. There is no need to get bound to a concrete implementation.The
TrieNode
class has redundant fields and methods. Thevalue
field is not used anywhere, so there is no need to keep it. Getters and setters for it are not necessary, too. Moreover, this classes exposes its internal representation. The clients of this class(in this it is theTrie
class) do not really need to know how the children of this node are represented. Instead of returning aMap
and letting the client work with it, I would create two methods:getChildForCharacter
andgetChildForCharacterOrCreateIfAbsent
. Here is my version of this class:class TrieNode { private Map<Character, Integer> children = new HashMap<>(); /** * Returns the child of this node for the given character or null * if it does not exist. */ public TrieNode getChildForCharacter(char c) { return children.get(c); } /** * Returns the child of this node for the given character. If it does * not exist, a new child is created. */ public TrieNode getChildForCharacterOrCreateIfAbsent(char c) { if (getChildForCharacter(c) == null) { children.put(c, new TrieNode()); } return children.get(c); } };
Now this class does not expose its implementation and we can easily change it in the future if it is necessary. You could also add a
isTerminal
andsetTerminal
methods to this class if you want to search for the entire words, not just their prefixes.The
Trie
class, again, in my opinion, contains a couple of methods that should not be there.- I think that the
getRoot
andsetRoot
method should be simply deleted. Why? Setting a new root breaks the trie(in a sense that previously added strings might turn out not to be there any) and the purpose of a getter is not clear(why would I need a trie's root?). Moreover, they expose the internal representation, which is not good. I'd just remove them. The clients of theTrie
do not need to know that it has a root in the first place because it is an implementation detail. traverseTree
should beprivate
(or at leastprotected
). It is a helper method. There is no need to expose it to the clients. They don't even know what aTrieNode
is(because this class is not public).buildTrie
is not a very good name. It implies that this method should be called only once(then the question is: why isn't it a constructor?). But it makes perfect sense to add new words to a trie. That's why I'd rename it toaddAll
or something like this. I would also add a methodadd
which takes oneString
argument(to avoid making the clients of this class create aSet
even if they want to add only oneString
.
- I think that the
Spacing and indentation. It is inconsistent in your code. Sometimes whitespaces around binary operators are absent, sometimes there are arbitrary whitespaces before a semicolon. You should keep you style consistent and adhere to established guidelines. It is easy to fix it using almost any modern Java IDE.
Testing. If do not use unit-tests, start using them. Testing your code in a consistent, systematic manner is a very good practice.
Summary
Avoid exposing the internal representation of objects. Do not create getters and setters for the sake of having them. It is a bad practice.
Keep your code style consistent and follow established guidelines unless you have very good reasons not to.
Keep a consistent level of abstraction in your API. Try not to mix concepts from different levels(like a node in a trie and a
HashMap
).
Excessive duplication
In this code I see node.getChildren().get(input)
3 times:
TrieNode newNode; if (node.getChildren().get(input) == null) { node.getChildren().put(input, new TrieNode(input)); newNode = node.getChildren().get(input); } else { newNode = node.getChildren().get(input); } return newNode;
Such chained expressions are a mental burden,
hard to type, and some of the lookups are inefficient.
The name newNode
is also misleading, as it is not always a new node in this snippet.
The snippet can be written simpler and better as:
TrieNode nextNode = node.getChildren().get(input);
if (nextNode == null) {
nextNode = new TrieNode(input);
node.getChildren().put(input, nextNode);
}
return nextNode;
Don't add a setter if you don't really need it
Is there a good reason for Trie.setRoot
to exist?
If you don't need it, then omit it.
Then you can make Trie.root
a final
member.
final
members a great because they are predictable.
In fact, it seems to me that TrieNode
doesn't need setChildren
and setValue
.
I think it's good to strive for minimalistic implementations. Every statement in your code is a potential bug. The simpler the code the better. Omit all unnecessary elements.
Don't expose implementation details
Unless other classes need it,
the TrieNode
would be best as a static class
declared inside Trie
.
Naming
You can drop "Trie" from these methods:
buildTrie
traverseTrie
The name of the class already implies that you're working with a Trie.
In searchWord
the term "Word" seems also pointless.
Other possible simplifications
This can be simplified:
boolean result = true; int wordLength = word.length(); TrieNode crawler = root; for (int i = 0; i < wordLength; ++i) { char character = word.charAt(i); if (crawler.getChildren().get(character) == null) { result = false; break; } else { crawler = crawler.getChildren().get(character); } } return result;
There's no need for the flag result
,
you can return immediately in the false
case:
int wordLength = word.length();
TrieNode crawler = root;
for (int i = 0; i < wordLength; ++i) {
char character = word.charAt(i);
crawler = crawler.getChildren().get(character);
if (crawler == null) {
return false;
}
}
return true;
Excessive comments
I don't understand why you include in a code review comments like this:
// TODO Auto-generated constructor stub
These comments are also clearly pointless:
//build the trie trie.buildTrie(dictionary); String wordToSearch = "CAX"; //search the trie System.out.println("The word "+wordToSearch + " is present or not???"+ (boolean)trie.searchWord(wordToSearch));
Suggested implementation
With the above suggestions applied, your code becomes a lot simpler and easier on the eyes:
class Trie {
private static class TrieNode {
private final char value;
private final Map<Character, TrieNode> children;
public TrieNode(char value) {
this.value = value;
this.children = new HashMap<>();
}
public char getValue() {
return value;
}
public Map<Character, TrieNode> getChildren() {
return children;
}
}
private final TrieNode root = new TrieNode('$');
public void build(Set<String> wordSet) {
for (String word : wordSet) {
int wordLength = word.length();
TrieNode crawl = root;
for (int i = 0; i < wordLength; ++i) {
crawl = traverse(crawl, word.charAt(i));
}
}
}
public TrieNode traverse(TrieNode node, char input) {
TrieNode nextNode = node.getChildren().get(input);
if (nextNode == null) {
nextNode = new TrieNode(input);
node.getChildren().put(input, nextNode);
}
return nextNode;
}
public boolean search(String word) {
int wordLength = word.length();
TrieNode crawler = root;
for (int i = 0; i < wordLength; ++i) {
char character = word.charAt(i);
crawler = crawler.getChildren().get(character);
if (crawler == null) {
return false;
}
}
return true;
}
}
Unit testing
Instead using that main
method and reading its output,
I suggest to start using proper unit tests, for example:
public class TrieTest {
private Trie trie;
public TrieTest() {
trie = new Trie();
trie.build(new HashSet<>(Arrays.asList("CAT", "CATWOMAN", "CATERER", "DOT", "DOG", "DOC")));
}
@Test
public void test_CAX() {
assertFalse(trie.search("CAX"));
}
@Test
public void test_CATER() {
assertTrue(trie.search("CATER"));
}
@Test
public void test_ELVIS() {
assertFalse(trie.search("ELVIS"));
}
}