6
\$\begingroup\$

I am creating an application which needs to validated user input to ensure the words being entered are real words. What improvements if any can I make?

class MyDictionary {
 Vector<String> words = new Vector<>();
 public MyDictionary() {
 URL url;
 try {
 url = new URL("http://www.example.com/hugewordlist.txt");
 URLConnection uc = url.openConnection();
 BufferedReader br = new BufferedReader(new InputStreamReader(uc.getInputStream()));
 char[] buffer = new char[1024];
 int i = 0;
 StringBuffer b = new StringBuffer();
 int readBytes = 0;
 while ((readBytes = br.read(buffer, i, 1024)) != -1) {
 i += 1024;
 b.append(buffer, 0, readBytes);
 }
 for (String w : b.toString().split("\n")) {
 words.add(w);
 }
 br.close();
 } catch (MalformedURLException e) {
 e.printStackTrace();
 } catch (IOException e) {
 e.printStackTrace();
 }
 }
 public boolean hasWord(String word) {
 return words.contains(word);
 }
 public int size() {
 return words.size();
 }
}
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Feb 24, 2012 at 12:14
\$\endgroup\$
1
  • 3
    \$\begingroup\$ You're using Vector which is synchronized, even though you don't appear to be using it in a threaded context. Beyond that, there's no good way to use a different word list (because you've hardcoded the URL). \$\endgroup\$ Commented Feb 24, 2012 at 22:24

3 Answers 3

7
\$\begingroup\$
  1. What do you want to do when there are no words? The current solution isn't satisfactory at all. You're simply failing silently, and hasWord will always return false. By the way, if there's an exception in the middle of of the loop, will your file be closed?
  2. Searching in a Vector takes O(n) time. This means that if you double the size of the vector, it will take twice as long. Other containers such as TreeSet have O(log n) access time: doubling their size will only slow the process by one iteration. HashSet has O(1) access time. Your lookups will be way faster.
  3. Do you know that in a standard text unknown words account for ~10% of your text?
answered Feb 24, 2012 at 12:25
\$\endgroup\$
3
  • \$\begingroup\$ "hasWord will always return false" ? I'm wondering why \$\endgroup\$ Commented Feb 24, 2012 at 16:30
  • \$\begingroup\$ @Nettogrof What makes you think hasWord always returns false? \$\endgroup\$ Commented Feb 24, 2012 at 17:38
  • \$\begingroup\$ Oops my bad, I misread. If the reading fail, then the hasWord will always return false. \$\endgroup\$ Commented Feb 24, 2012 at 17:51
4
\$\begingroup\$

There are a number of suggestions I can think of. Hope these are helpful. Well done for asking for feedback.

Consider clarifying in your question exactly what kind of improvements you're asking for. Do you want the class to be "reliable", "maintainable", "optimised for performance", "written in as few lines as possible", etc. etc. otherwise you will get very general feedback.

Hard coded URL prevents reuse

Move the URL resource into at least a string constant or better still into your application configuration (perhaps a web.config file). You may want to consider passing in the URL as a parameter. This would make your class more reusable and maybe even more easy to test.

What does the class do?

The fact that the class loads a dictionary of words from a URL is currently "hidden" away in your constructor. The name "MyDictionary" is not a meaningful class name as it doesn't describe what the class is for or what it does. Consider options like WordVector.

Low level detail not hidden away

I'm confused about the way you're loading the file using 1024 byte increments. What goal are you hoping to accomplish by doing the load in this way? This is a low level detail which could be hidden away, perhaps even in a separate class - something like "LargeURLLoader" which the WordVector class could use to load its list. The advantage of pulling the load out into another class would allow you to load files in different ways, and also to focus on the details of the actual load, and would allow you to load such files from other classes instead of only within the dictionary.

Good luck!

answered Feb 27, 2012 at 18:56
\$\endgroup\$
2
\$\begingroup\$

Depending on how and where you are (re-)using your class, you may consider abstracting the class into a hierarchy of interfaces and sub-classes.

For example, it could make sense to define an interface just containing the hasWord method (and maybe the size method), then create an abstract base class, that contains the hasWord functionality, but doesn't load the data and then a concrete class that reads the data from an InputStream and finally a last class that provides that InputStream from a URL. This would include not hard-coding the URL into the class, but provide it as a parameter.

answered Feb 24, 2012 at 13:19
\$\endgroup\$
2
  • \$\begingroup\$ Instance of PrematureAbstraction? \$\endgroup\$ Commented Feb 24, 2012 at 13:27
  • 1
    \$\begingroup\$ @Cygal Well, that is a good point :-) even if I did write "depending on how you (re-)use your class". I probably overdid it here, but the OP's code doesn't really show the slightest attempt of actually wielding OOP, and I wanted to hint at what is possible. \$\endgroup\$ Commented Feb 24, 2012 at 13:40

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.