As a new Java learner, I would like to improve my coding style. I wrote a program that takes an input file name as a program argument and then prompts for two output file names.
The first output file includes all the words from the input file with their number of occurrences.
The second output file includes the words in lexicographic order.
Some of the code is written twice. I think I can fix that by extracting that part into a new method, but I'm not sure how to do this properly.
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.*;
import java.io.*;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Scanner;
import java.util.Set;
public class TextAnalyzer {
private static HashMap<String, Integer> wordCount = new HashMap<String, Integer>();
public static StringBuffer stringBuffer;
public static void main(String[] args) throws IOException {
if (args.length == 0) {
System.out.println("Please specify input file as program argument");
return;
}
getWordsCount(args[0]);
String outputFileName;
Scanner scanner = new Scanner(System.in);
System.out.println("Please enter output file name:");
outputFileName = scanner.nextLine();
while (outputFileName.isEmpty()) {
System.out
.println("Invalid file name ! Please re-enter output file name:\n");
outputFileName = scanner.nextLine();
}
stringBuffer = new StringBuffer();
for (Entry<String, Integer> word : wordCount.entrySet()) {
stringBuffer.append(String.format("%s\t%d%n", word.getKey(),
word.getValue()));
}
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(outputFileName));
bufferedWriter.write(stringBuffer.toString());
bufferedWriter.close();
Set<String> s = wordCount.keySet();
List<String> l = new ArrayList<String>(s);
Collections.sort(l);
String s2 = "";
for (int i = 0; i < l.size(); i++) {
s2 = s2 + l.get(i) + "\n";
}
System.out.println("Please enter output file name:");
outputFileName = scanner.nextLine();
while (outputFileName.isEmpty()) {
System.out
.println("Invalid file name ! Please re-enter output file name:\n");
outputFileName = scanner.nextLine();
}
bufferedWriter = new BufferedWriter(new FileWriter(outputFileName));
bufferedWriter.write(s2);
bufferedWriter.close();
scanner.close();
}
/**
* Updates the Hashmap wordCount by inserting a word as a key and its number
* of occurrences as a value.
*
* @param filename
* the file to read from
* @throws IOException
*/
private static void getWordsCount(String filename) throws IOException {
Scanner scanner = new Scanner(new File(filename));
while (scanner.hasNext() == true) {
String token = scanner.next();
Integer tokenCount = wordCount.get(token);
if (tokenCount == null) {
wordCount.put(token, 1);
} else
wordCount.put(token, tokenCount + 1);
}
scanner.close();
}// ENDD
}
2 Answers
Single responsibility principle
The main method does most of the job.
It would be better to split it up into smaller methods,
so that each method has a single responsibility.
You will end up with something more modular, reusable, and testable.
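As a rough sketch of what such a split could look like, the prompting loop and the file writing that appear twice in main could each move into their own method. The class name OutputHelpers and the method names promptForFileName and writeToFile are hypothetical, chosen only for illustration:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper class; the names are illustrative, not from the original code.
class OutputHelpers {

    // Keeps asking until the user enters a non-empty file name.
    static String promptForFileName(Scanner scanner) {
        System.out.println("Please enter output file name:");
        String fileName = scanner.nextLine();
        while (fileName.isEmpty()) {
            System.out.println("Invalid file name! Please re-enter output file name:");
            fileName = scanner.nextLine();
        }
        return fileName;
    }

    // Writes the given content to the named file; try-with-resources
    // closes the writer even if write() throws.
    static void writeToFile(String fileName, String content) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName))) {
            writer.write(content);
        }
    }
}
```

With helpers like these, main could call promptForFileName(scanner) and writeToFile(...) once per output file instead of repeating both blocks.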
Make variables local when possible
stringBuffer is only used within the main method,
so it doesn't need to be a member variable.
Declare it right before you need it;
in fact, it's best to declare it at the point of initialization:
StringBuffer stringBuffer = new StringBuffer();
Avoid static variables when possible
None of the static variables were needed. In general, avoid static variables as much as possible, especially if they are mutable.
wordCount is not as simple to eliminate as stringBuffer,
but still easy enough:
make getWordsCount return a Map<String, Integer>.
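A minimal sketch of that change, assuming the counting logic stays as in the question. The class name WordCounter and the helper countTokens are hypothetical; the helper is split out only so the counting can be exercised without a file:

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical class name, for illustration only.
class WordCounter {

    // Returns the word counts instead of updating a static field.
    static Map<String, Integer> getWordsCount(String filename) throws IOException {
        try (Scanner scanner = new Scanner(new File(filename))) {
            return countTokens(scanner);
        }
    }

    // Hypothetical helper: counts whitespace-separated tokens from any Scanner.
    static Map<String, Integer> countTokens(Scanner scanner) {
        Map<String, Integer> wordCount = new HashMap<>();
        while (scanner.hasNext()) {
            String token = scanner.next();
            Integer tokenCount = wordCount.get(token);
            if (tokenCount == null) {
                wordCount.put(token, 1);
            } else {
                wordCount.put(token, tokenCount + 1);
            }
        }
        return wordCount;
    }
}
```

The caller would then hold the result in a local variable, for example Map<String, Integer> wordCount = WordCounter.getWordsCount(args[0]);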
Use the enhanced for-each loop
Instead of this:
for (int i = 0; i < l.size(); i++) {
    s2 = s2 + l.get(i) + "\n";
}
Since you don't need the index, it's just clutter. The recommended style is this:
for (String word : l) {
    s2 = s2 + word + "\n";
}
This form has another advantage: the loop works efficiently not only with an ArrayList but with a LinkedList too,
because elements are no longer accessed by index.
Use boolean expressions directly
Instead of this:
while (scanner.hasNext() == true) {
You can and should write this:
while (scanner.hasNext()) {
Refer to types by interfaces
Instead of this:
private static HashMap<String, Integer> wordCount = new HashMap<String, Integer>();
This would be better:
private static Map<String, Integer> wordCount = new HashMap<String, Integer>();
Use Java 7
I'm guessing you're using an older version because of things like this:
List<String> l = new ArrayList<String>(s);
Starting with Java 7, you don't need to repeat the type parameters on the right-hand side of such expressions. The compiler can infer them most of the time, so you can use the diamond operator <> like this:
List<String> l = new ArrayList<>(s);
Unused imports
You have some unused imports:
import java.io.FileNotFoundException;
import java.util.Map;
I'm surprised your IDE doesn't warn about them. In fact, it probably does. Strive to keep warnings to a minimum, ideally zero.
Extremely poor names
In this code,
the variable names s, l, and s2 border on obfuscation.
Set<String> s = wordCount.keySet();
List<String> l = new ArrayList<>(s);
Collections.sort(l);
String s2 = "";
for (int i = 0; i < l.size(); i++) {
    s2 = s2 + l.get(i) + "\n";
}
Surely you can come up with better names for these, which will make the code much more readable.
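For instance, one possible renaming might look like the sketch below. The names SortedWordsSketch, formatSortedWords, sortedWords, and sortedWordsText are only suggestions, and the sketch also uses a for-each loop and a StringBuilder rather than the original indexed loop and string concatenation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical class and method names, for illustration only.
class SortedWordsSketch {

    // Returns the map's keys in lexicographic order, one per line.
    static String formatSortedWords(Map<String, Integer> wordCount) {
        List<String> sortedWords = new ArrayList<>(wordCount.keySet());
        Collections.sort(sortedWords);

        StringBuilder sortedWordsText = new StringBuilder();
        for (String word : sortedWords) {
            sortedWordsText.append(word).append("\n");
        }
        return sortedWordsText.toString();
    }
}
```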
Prefer StringBuilder over StringBuffer
StringBuffer synchronizes internally to make it thread-safe.
This is rarely needed in practice,
and in this example it's completely unnecessary.
Use StringBuilder instead.
By the way, is there any benefit to accumulating the output in memory before writing it? Probably not. You could write the output directly, without building up a long string.
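A minimal sketch of writing the counts directly, assuming the same tab-separated line format as the question. The class name DirectOutput and the method writeCounts are hypothetical, and the method takes a Writer so the caller decides where the output goes:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

// Hypothetical class and method names, for illustration only.
class DirectOutput {

    // Writes one "word<TAB>count" line per entry straight to the writer,
    // without first collecting everything into one long string.
    static void writeCounts(Map<String, Integer> wordCount, Writer out) throws IOException {
        BufferedWriter writer = new BufferedWriter(out);
        for (Map.Entry<String, Integer> entry : wordCount.entrySet()) {
            writer.write(entry.getKey() + "\t" + entry.getValue());
            writer.newLine(); // platform line separator, like %n in String.format
        }
        writer.flush(); // the caller remains responsible for closing 'out'
    }
}
```

A caller could pass a FileWriter opened in a try-with-resources statement, so the file is closed even if writing fails.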
Put braces consistently
In this code, you put braces around the if branch, but not around the else branch:
if (tokenCount == null) {
    wordCount.put(token, 1);
} else
    wordCount.put(token, tokenCount + 1);
At a minimum, be consistent. The recommendation is to always use braces.
Don't concatenate strings in loops:
String s2 = "";
for (int i = 0; i < l.size(); i++) {
s2 = s2 + l.get(i) + "\n"; // <- SLOW!!!!!!
}
A nicer style would be
s2 += l.get(i) + "\n"; // <- Equally slow!
But in a loop, either form creates a new String on each iteration, and that String gets longer and longer, so the loop gets slower and slower. Use this instead:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < l.size(); i++) {
sb.append(l.get(i)).append("\n");
}
String s2 = sb.toString();
And then improve the names (I simply used sb and s2 as I have no idea what it's all about).