I am creating an application which needs to validated user input to ensure the words being entered are real words. What improvements if any can I make?
class MyDictionary {
Vector<String> words = new Vector<>();
public MyDictionary() {
URL url;
try {
url = new URL("http://www.example.com/hugewordlist.txt");
URLConnection uc = url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(uc.getInputStream()));
char[] buffer = new char[1024];
int i = 0;
StringBuffer b = new StringBuffer();
int readBytes = 0;
while ((readBytes = br.read(buffer, i, 1024)) != -1) {
i += 1024;
b.append(buffer, 0, readBytes);
}
for (String w : b.toString().split("\n")) {
words.add(w);
}
br.close();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
public boolean hasWord(String word) {
return words.contains(word);
}
public int size() {
return words.size();
}
}
3 Answers 3
- What do you want to do when there are no words? The current solution isn't satisfactory at all. You're simply failing silently, and hasWord will always return false. By the way, if there's an exception in the middle of of the loop, will your file be closed?
- Searching in a
Vector
takes O(n) time. This means that if you double the size of the vector, it will take twice as long. Other containers such as TreeSet have O(log n) access time: doubling their size will only slow the process by one iteration.HashSet
has O(1) access time. Your lookups will be way faster. - Do you know that in a standard text unknown words account for ~10% of your text?
-
\$\begingroup\$ "hasWord will always return false" ? I'm wondering why \$\endgroup\$Nettogrof– Nettogrof2012年02月24日 16:30:54 +00:00Commented Feb 24, 2012 at 16:30
-
\$\begingroup\$ @Nettogrof What makes you think hasWord always returns false? \$\endgroup\$user11335– user113352012年02月24日 17:38:59 +00:00Commented Feb 24, 2012 at 17:38
-
\$\begingroup\$ Oops my bad, I misread. If the reading fail, then the hasWord will always return false. \$\endgroup\$Nettogrof– Nettogrof2012年02月24日 17:51:21 +00:00Commented Feb 24, 2012 at 17:51
There are a number of suggestions I can think of. Hope these are helpful. Well done for asking for feedback.
Consider clarifying in your question exactly what kind of improvements you're asking for. Do you want the class to be "reliable", "maintainable", "optimised for performance", "written in as few lines as possible", etc. etc. otherwise you will get very general feedback.
Hard coded URL prevents reuse
Move the URL resource into at least a string constant or better still into your application configuration (perhaps a web.config file). You may want to consider passing in the URL as a parameter. This would make your class more reusable and maybe even more easy to test.
What does the class do?
The fact that the class loads a dictionary of words from a URL is currently "hidden" away in your constructor. The name "MyDictionary" is not a meaningful class name as it doesn't describe what the class is for or what it does. Consider options like WordVector.
Low level detail not hidden away
I'm confused about the way you're loading the file using 1024 byte increments. What goal are you hoping to accomplish by doing the load in this way? This is a low level detail which could be hidden away, perhaps even in a separate class - something like "LargeURLLoader" which the WordVector class could use to load its list. The advantage of pulling the load out into another class would allow you to load files in different ways, and also to focus on the details of the actual load, and would allow you to load such files from other classes instead of only within the dictionary.
Good luck!
Depending on how and where you are (re-)using your class, you may consider abstracting the class into a hierarchy of interfaces and sub-classes.
For example, it could make sense to define an interface
just containing the hasWord
method (and maybe the size
method), then create an abstract base class, that contains the hasWord
functionality, but doesn't load the data and then a concrete class that reads the data from an InputStream
and finally a last class that provides that InputStream
from a URL. This would include not hard-coding the URL into the class, but provide it as a parameter.
-
\$\begingroup\$ Instance of PrematureAbstraction? \$\endgroup\$Quentin Pradet– Quentin Pradet2012年02月24日 13:27:50 +00:00Commented Feb 24, 2012 at 13:27
-
1\$\begingroup\$ @Cygal Well, that is a good point :-) even if I did write "depending on how you (re-)use your class". I probably overdid it here, but the OP's code doesn't really show the slightest attempt of actually wielding OOP, and I wanted to hint at what is possible. \$\endgroup\$RoToRa– RoToRa2012年02月24日 13:40:12 +00:00Commented Feb 24, 2012 at 13:40
Vector
which is synchronized, even though you don't appear to be using it in a threaded context. Beyond that, there's no good way to use a different word list (because you've hardcoded the URL). \$\endgroup\$