I am trying to write clean code in Java for reading text of any size and printing the word count in ascending order without using Java collection framework.
For example, given the following input text "What is the your name of the name?", it should print:
your 1
of 1
is 1
What 1
name 2
the 2
I have written the below code, but I am not sure it is the right answer. Please evaluate it.
public class WordArrayList {
private WordData[] words;
public void setWords(WordData[] words) {
this.words = words;
}
public WordData[] getWords() {
return words;
}
private int size;
public WordData getWord(int index) {
return words[index];
}
public int getSize() {
return size;
}
private static final int DEFAULT_WORD_ARRAY_SIZE = 10;
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
public WordArrayList() {
words = new WordData[DEFAULT_WORD_ARRAY_SIZE];
}
public WordArrayList(int initialCapacity) {
super();
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal Capacity: "
+ initialCapacity);
words = new WordData[initialCapacity];
}
public int insertWord(final String word) {
if (checkAndAddExistingWord(word)) {
return 0;
}
checkAndIncreaseCapacity();
words[size] = new WordData(word);
size++;
return 1;
}
private boolean checkAndAddExistingWord(String word) {
int position = 0;
for (position = 0; position < size; position++) {
if (words[position].getWord().equals(word)) {
words[position].incrementWordCount();
return true;
}
}
return false;
}
private void checkAndIncreaseCapacity() {
if (isWordArrayFull(size))
increaseCapacity(size);
}
private void increaseCapacity(int size) {
int newCapacity = size * 2;
if (newCapacity > MAX_ARRAY_SIZE) {
newCapacity = Integer.MAX_VALUE;
}
WordData[] newWords = new WordData[newCapacity];
System.arraycopy(words, 0, newWords, 0, size);
}
private boolean isWordArrayFull(int wordCount) {
return (wordCount - words.length > 0);
}
public void setWord(int i, WordData word) {
words[i] = word;
}
public class TextAnalyzer {
private static final String REG_SPLIT_EXP = "\\s+";
private String text;
private String textSplitExp;
private WordArrayList wordArrayList;
public TextAnalyzer(String text, String textSplitExp) {
this.textSplitExp = textSplitExp;
this.text = text;
wordArrayList = new WordArrayList();
}
public TextAnalyzer(String text) {
this.textSplitExp = REG_SPLIT_EXP;
this.text = text;
wordArrayList = new WordArrayList();
}
private String[] convertToArray() {
return text.split(textSplitExp);
}
public void generateWordArrayList() {
String[] splitTextArray = convertToArray();
for (int i = 0; i < splitTextArray.length; i++) {
wordArrayList.insertWord(splitTextArray[i]);
}
}
public void printWordArrayList() {
for (int i = 0; i < wordArrayList.getSize(); i++) {
System.out.println("=" + wordArrayList.getWord(i).getWord() + "=="
+ wordArrayList.getWord(i).getWordCount());
}
}
public void sortWordArrayList() {
TextAnalyzerUtil.sort(wordArrayList);
}
}
public class WordData {
public WordData(final String vWord) {
this.word = vWord;
this.wordCount=1;
}
private String word;
public int getWordCount() {
return wordCount;
}
public void setWordCount(int wordCount) {
this.wordCount = wordCount;
}
public void setWord(String word) {
this.word = word;
}
private int wordCount;
private int position;
public String getWord() {
return word;
}
public void incrementWordCount() {
this.wordCount++;
}
public int getPosition() {
return position;
}
public void setPosition(int position) {
this.position = position;
}
}
For testing:
analyzer.generateWordArrayList();
analyzer.sortWordArrayList();
analyzer.printWordArrayList();
2 Answers 2
There are a couple of things you should think about or fix:
Indentation. You're not really consistent in indenting your code. Please follow the Java conventions.
No need to call
super()
inside your constructor, the default empty super constructor is automatically called.Use "constructor chaining" in
WordArrayList
, andTextAnalyzer
. That is, call one constructor from another. For example,public TextAnalyzer(String text) { this(text, REG_SPLIT_EXP); }
You have a whole lot of unnecessary methods:
setWords
,getWords
,setWordCount
,setWord
. You don't use them and I really don't think that you need them.Use
private final
for fields that should be initialized once and then never change, such asprivate final String word;
in theWordData
class. Apply this wherever possible.private int position;
is not used inWordData
, and I don't think you need to use it either. Select and press the Delete button on your keyboardI suggest naming the class
WordList
instead. The fact that it uses an array is not important. It is a List of words, how the internal list works is not needed for the user to knowCalling
generateWordArrayList
multiple times would produce strange results.convertToArray
is private, contains one line, and is only called from one location: Does it really need to be it's own method?Use a "for-each" loop to improve readability when iterating:
for (String str : splitTextArray)
You can store the number of times each word occurs with a
HashMap<String, Integer>
instead. To initialize the HashMap, you can use this:Map<String, Integer> words = new HashMap<String, Integer>(); for (String str : splitTextArray) { if (words.containsKey(str)) words.put(str, words.get(str) + 1); else words.put(str, 1); }
Note that you can use
TreeMap
instead ofHashMap
to keep the keys in a sorted order, so that if you iterate overwords.entrySet()
, you will always iterate in sorted order. (You can also provide a customComparator
to the TreeMap if you are not satisfied with the built-in sorting)
Overall the current classes you have seems to do what they are supposed to do, and the result is correct. Good job.
-
\$\begingroup\$ thanks for your prompt response sorry for not adding sorting code but it was just quick sort. Can the above code be done in better way. I am asking in better design prospect. \$\endgroup\$Nitin– Nitin2014年01月07日 17:29:58 +00:00Commented Jan 7, 2014 at 17:29
-
\$\begingroup\$ @Nitin Remove the
WordData
class, and replace theWordArrayList
class with aMap<String, Integer>
. That's the simplest and in my opinion best way to get the job done. \$\endgroup\$Simon Forsberg– Simon Forsberg2014年01月07日 17:33:03 +00:00Commented Jan 7, 2014 at 17:33 -
\$\begingroup\$ Hi Simon, sorry i need to write code without collection so no hashmap \$\endgroup\$Nitin– Nitin2014年01月07日 17:35:21 +00:00Commented Jan 7, 2014 at 17:35
-
\$\begingroup\$ @Nitin In that case, just clean up your code by following the other points I have pointed out for you. \$\endgroup\$Simon Forsberg– Simon Forsberg2014年01月07日 17:39:32 +00:00Commented Jan 7, 2014 at 17:39
-
3\$\begingroup\$ When putting a count as a value in a map, I tend to use a mutable value (e.g.
AtomicInteger
) so updating the count can be done without doing aput
every time. \$\endgroup\$bowmore– bowmore2014年01月07日 18:40:32 +00:00Commented Jan 7, 2014 at 18:40
Bug
In increaseCapacity()
, you create a new array and copy the old contents to it, but never replace the old array.
Efficiency
The running time will be O(n2), where n is the number of words, since you could potentially run through the entire existing array when adding a new word. Using a HashMap
would be O(n), and a TreeMap
would be O(n log n). You are probably doing this the "hard" way for fun, so that's OK, as long as you are aware of the potential performance problem.
Interface
You should pare down the number of exposed methods, not just to reduce the amount of code, but to improve the interface. If you put getters and setters on all fields, your class would be too promiscuous. At that point, you might as well make all the fields public
. (In fact, the way your current code exposes all its state, it is about as unmaintainable as code that uses global variables.) By choosing what methods to expose, you give the class a specific purpose and prevent it from being misused. Also, limiting the interface helps you reserve the capability to change the implementation, should you come up with a better idea later.
For WordArrayList
:
- You should rename it to
WordCounter
.WordArrayList
sounds like a generic, passive data structure.WordCounter
has a clear purpose. insertWord()
is a misnomer, it implies that it will always result in an additional entry, and that the caller has some control over the order of insertion. I thinkaddWord()
would be more appropriate.- I would also offer
addWords(String[] words)
, since it's convenient for callers, and it might present opportunities for optimization if you add a whole batch of words at once. - Instead of exposing the guts of the object, I suggest presenting a method to retrieve an
Enumeration
of the results, based on the general advice above. (When the Java Collections Framework was introduced in 1.2,Enumeration
was semi-deprecated in favor ofIterator
. I don't know if you considerIterator
to be off-limits as a result, so I chose the antiquatedEnumeration
interface instead.) - I would consolidate your private helper methods a bit for simplicity.
- I've added a
reset()
method to clear all the counts, which is quite optional. It makes up for the removal ofWordCount.setWordCount()
.
For WordData
:
- I would expose just
getWord()
andgetWordCount()
(probably better renamed togetCount()
). - The position should not be stored part of
WordData
, and in fact you never use it. - You made the constructor take a
final
argument. That's OK, but the better place to put thefinal
keyword is in theprivate final String word
declaration. - I choose to make it an inner class of
WordCounter
, but that's just my preference.
Note that with these changes, users of WordCounter
have no way to muck with its internal state except through the approved interfaces. WordCounter
can offer some guarantees of self consistency in a way that the original WordArrayList
couldn't. If it's used in a more complex system, those guarantees make it possible for you to debug the code using reasoning.
Constructor chaining
Whenever you offer more than one constructor, you should chain them, such that the one with more default parameters calls the one with explicit parameters.
Code formatting
Your code formatting is all over the place, in terms of indentation. You should also leave a blank line between methods for readability.
import java.util.Arrays;
import java.util.Comparator;
import java.util.Enumeration;
import java.util.NoSuchElementException;
public class WordCounter {
//////////////////////////////////////////////////////////////////////
public static class WordData {
private final String word;
private int wordCount;
WordData(String vWord) {
this.word = vWord;
this.wordCount = 1;
}
public String getWord() {
return word;
}
public int getWordCount() {
return wordCount;
}
}
//////////////////////////////////////////////////////////////////////
private static class WordFrequencyComparator implements Comparator<WordData> {
@Override
public int compare(WordData a, WordData b) {
int wcDiff = b.getWordCount() - a.getWordCount();
return wcDiff != 0 ? wcDiff : a.getWord().compareTo(b.getWord());
}
}
//////////////////////////////////////////////////////////////////////
private WordData[] words;
private int size;
private static final int DEFAULT_WORD_ARRAY_SIZE = 10;
private static final Comparator<WordData> DESCENDING_WORD_FREQUENCY_COMPARATOR = new WordFrequencyComparator();
public WordCounter() {
this(DEFAULT_WORD_ARRAY_SIZE);
}
public WordCounter(int initialCapacity) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal Capacity: "
+ initialCapacity);
words = new WordData[initialCapacity];
}
/**
* Clears the word count.
*/
public void reset() {
for (int i = 0; i < size; i++) {
words[i] = null;
}
}
/**
* Adds word to the word count.
*
* @return 1 if the word was previously unseen, 0 otherwise.
*/
public int addWord(String word) {
checkAndIncreaseCapacity(size + 1);
if (checkAndAddExistingWord(word)) {
return 0;
}
words[size++] = new WordData(word);
return 1;
}
/**
* Adds words to the word count.
*
* @return the number of previously unseen words.
*/
public int addWords(String[] words) {
checkAndIncreaseCapacity(size + words.length);
int newWordsAdded = 0;
for (String w : words) {
newWordsAdded += addWord(w);
}
return newWordsAdded;
}
/**
* Returns an Enumeration of the WordData in descending order of word
* frequency. Behavior is undefined if you add words or reset the
* counter while enumerating.
*/
public Enumeration<WordData> elements() {
Arrays.sort(words, 0, size, DESCENDING_WORD_FREQUENCY_COMPARATOR);
return new Enumeration<WordData>() {
private int i = 0;
@Override
public boolean hasMoreElements() {
return i < size;
}
@Override
public WordData nextElement() {
try {
return words[i++];
} catch (ArrayIndexOutOfBoundsException e) {
throw new NoSuchElementException();
}
}
};
}
private boolean checkAndAddExistingWord(String word) {
for (WordData wd : words) {
if (wd.getWord().equals(word)) {
wd.wordCount++;
return true;
}
}
return false;
}
private void checkAndIncreaseCapacity(int sizeHint) {
if (size >= words.length) {
int newCapacity = size * 2;
if (newCapacity < sizeHint) {
newCapacity = sizeHint;
}
if (newCapacity < 0) {
newCapacity = Integer.MAX_VALUE;
}
WordData[] newWords = new WordData[newCapacity];
System.arraycopy(words, 0, newWords, 0, size);
words = newWords;
}
}
}
With the changes above, TextAnalyzer
can become a lot simpler:
import java.util.Enumeration;
public class TextAnalyzer {
private static final String REG_SPLIT_EXP = "\\s+";
private WordCounter wordCounter;
public TextAnalyzer(String text, String textSplitExp) {
wordCounter = new WordCounter();
wordCounter.addWords(text.split(textSplitExp));
}
public TextAnalyzer(String text) {
this(text, REG_SPLIT_EXP);
}
public void printWords() {
Enumeration<WordCounter.WordData> words = wordCounter.elements();
while (words.hasMoreElements()) {
WordCounter.WordData wordData = words.nextElement();
System.out.println("=" + wordData.getWord() + "=="
+ wordData.getWordCount());
}
}
}
-
\$\begingroup\$ thanks @200_success. I have couple of doubts 1. Can we scale this application for large number of text e.g. 1 miilion words. 2. Arrays is part of collection framework and use legacy sort which in terns clone the whole can we use sort without using Arrays?. \$\endgroup\$Nitin– Nitin2014年01月09日 07:04:32 +00:00Commented Jan 9, 2014 at 7:04
-
1\$\begingroup\$ 1. See my remarks about efficiency in the review. 2. You're right on both counts.
Arrays
was introduced with Java Collections in Java 1.2, andArrays.sort(Object[])
uses mergesort, which does not operate in place. In place ofArrays.sort()
, you could substitute whatever sorting method you were using before. \$\endgroup\$200_success– 200_success2014年01月10日 06:57:20 +00:00Commented Jan 10, 2014 at 6:57
TextAnalyzerUtil.sort
method in your code? That way we could compile the code ourselves and therefore make a better review. \$\endgroup\$