(This post is the continuation of Get histogram of bytes in any set of files in Java - take II.)
This time my code looks like as follows:
com.github.coderodde.file.util.ByteHistogramApp.java:
package com.github.coderodde.file.util;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
/**
* This class implements a program for counting byte histograms in files.
*
* @version 1.0.2 (Nov 17, 2024)
* @since 1.0.0 (Nov 13, 2024)
*/
public final class ByteHistogramApp {
private static final Logger LOGGER =
Logger.getLogger(ByteHistogram.class.getSimpleName());
public static void main(String[] args) {
final ByteHistogram byteHistogram = new ByteHistogram();
if (args.length == 0) {
try {
loadByteHistogramFromStdin(byteHistogram);
} catch (IOException ex) {
LOGGER.log(
Level.SEVERE,
"Could not load the histogram from " +
"standard input stream.");
}
} else {
for (final String fileName : args) {
try {
loadByteHistogramFromFile(fileName, byteHistogram);
} catch (IOException ex) {
LOGGER.log(
Level.SEVERE,
"Could not load the histogram from file \"{0}\".",
fileName);
}
}
}
System.out.println(byteHistogram);
}
/**
* Loads the byte histogram data only from standard input.
*
* @param byteHistogram the byte histogram for holding the statistics.
*
* @throws IOException if I/O fails.
*/
private static void loadByteHistogramFromStdin(
final ByteHistogram byteHistogram) throws IOException {
loadImpl(new BufferedInputStream(System.in),
byteHistogram);
}
/**
* Loads the byte histogram data only from the file named {@code fileName}.
*
* @param fileName the name of the file to load from.
* @param byteHistogram the byte histogram for holding the statistics.
*
* @throws FileNotFoundException if file named {@code fileName} is not
* found.
* @throws IOException if I/O fails.
*/
private static void loadByteHistogramFromFile(
final String fileName,
final ByteHistogram byteHistogram)
throws FileNotFoundException, IOException {
final InputStream is =
new BufferedInputStream(
new FileInputStream(fileName));
loadImpl(is, byteHistogram);
}
/**
* Performs the actual loading of the data to the input byte histogram.
*
* @param is the input stream from which to load the data.
* @param byteHistogram the target byte histogram.
*
* @throws IOException if I/O fails.
*/
private static void loadImpl(final InputStream is,
final ByteHistogram byteHistogram)
throws IOException {
int ch;
while ((ch = is.read()) != -1) {
byteHistogram.insert(ch);
}
}
}
Critique request
Am I getting anywhere? Please tell me anything what comes to mind.
3 Answers 3
if (args.length == 0) {
try {
loadByteHistogramFromStdin(byteHistogram);
I fail to see why handling stdin is a special case compared to files. Technically all you need is to read from a stream, what that stream is does not matter. Simplified it could be:
if (args.length == 0) {
loadIntoByteHistogram(System.in, byteHistogram);
} else {
for (String argument : args) {
loadIntoByteHistogram(new FileInputStream(argument), byteHistogram);
}
}
With loadIntoByteHistogram
closing the stream after it is done in this example.
On a sidenote, traditionally many UNIX-like tools allow to pass -
as parameter to designate to read from stdin at that point:
yourtool file1 file2 - file3
Which might or might not be a functionality you want to support.
System.out.println(byteHistogram);
I'm not a fan of having UI logic inside toString()
, as it is also often used for debugging purposes (for example by IDEs). Also the object itself (ByteHistogram
) would contain logic for the UI this way, which might or might not be desirable. Having another method in your main CLI class which does the formatting seems like a good solution instead.
showHistogram(byteHistogram);
private static void loadImpl(final InputStream is,
final ByteHistogram byteHistogram)
Every time you need to use the Impl
postfix something is wrong. Either the method structuring is off, or the name could be better. loadFromInputStreamInto
seems like a good candidate in this case.
loadFromStdInInto(ByteHistogram)
loadFromFileInto(Path, ByteHistogram)
loadFromInputStreamInto(InputStream, ByteHistogram)
final InputStream is =
new BufferedInputStream(
new FileInputStream(fileName));
loadImpl(is, byteHistogram);
That stream is never closed, which isn't really an issue in this application, but you should always consider closing the stream you're opening to avoid leaking anything. Either have loadImpl
close the streams it is passed (which might or might not be desirable) or use a try-with-resources block:
try (final InputStream is =
new BufferedInputStream(
new FileInputStream(fileName))) {
loadImpl(is, byteHistogram);
}
That will close the stream automatically when leaving the block.
That said, why do you need a BufferedInputStream
here? You're never using the additional functionality down the line.
int ch;
while ((ch = is.read()) != -1) {
byteHistogram.insert(ch);
}
Reading a stream byte by byte can be slow. Consider reading from it in blocks instead, 16k, 32k or 64k depending on the usecase.
byte[] buffer = new byte[16 * 1024];
int read = 0;
while ((read = (stream.read(buffer)) >= 0) {
byteHistogram.add(buffer, 0, read);
}
Also ch
and is
aren't really good names, especially as the ch
seems to imply char
but it's really an int
.
/**
* Loads the byte histogram data only from standard input.
*
* @param byteHistogram the byte histogram for holding the statistics.
*
* @throws IOException if I/O fails.
*/
While it is a controversial topic, I'm not a fan of documentation that tells me nothing beyond what I could deduce from the function signature. It seems like wasted effort most of the time.
The next block is a little bit better:
/**
* Loads the byte histogram data only from the file named {@code fileName}.
*
* @param fileName the name of the file to load from.
* @param byteHistogram the byte histogram for holding the statistics.
*
* @throws FileNotFoundException if file named {@code fileName} is not
* found.
* @throws IOException if I/O fails.
*/
But raises a few points: First, the wording of "Loads the byte histogram data only from" implies that there are methods which do load from multiple sources or are not side-effect free. Second, "if I/O fails" is not that helpful again. A more explicit version would look like this:
/**
* Loads the data from the file under the given {@code fileName} into the given {@link ByteHistogram}.
*
* @param fileName The name of the file from which to load the data into the given {@link ByteHistogram}.
* @param byteHistogram The {@link ByteHistogram} into which to load the data from the given file.
* @throws FileNotFoundException If the given {@code fileName} does not denote an existing file.
* @throws IOException If reading from the given {@code fileName} failed.
*/
Considering that many IDEs only display parts of the Javadoc based on the current context, repeating yourself is not a bad idea.
} catch (IOException ex) {
LOGGER.log(
Level.SEVERE,
"Could not load the histogram from " +
"standard input stream.");
You're swallowing the exception and its details here. That means that it will not be possible to tell what error/exception actually happened. Consider handing the exception itself to the logger so that it can be printed along with the error message.
} catch (IOException ex) {
LOGGER.log(
Level.SEVERE,
"Could not load the histogram from " +
"standard input stream.",
ex);
-
\$\begingroup\$ About - on a command line in *nix: what should happen in case the tool is invoked with no arguments, read from stdin too? \$\endgroup\$coderodde– coderodde2024年11月19日 08:02:58 +00:00Commented Nov 19, 2024 at 8:02
-
1\$\begingroup\$ Most tools read from stdin in that case, yes. \$\endgroup\$Bobby– Bobby2024年11月19日 08:25:17 +00:00Commented Nov 19, 2024 at 8:25
-
1\$\begingroup\$ "why do you need a
BufferedInputStream
here? ... Reading a stream byte by byte can be slow." - I have to point out that you're wrong, reading bytes fromBufferedInputStream
is not slow, becauseint read()
returns abyte
from the internal 8 Kb buffer, which populated from the underlying stream as necessary. The size of the internal buffer can be tweaked while constructingBufferedInputStream
. In this case, there's no advantage in providing an external buffer and then reading from it manually, only more boilerplate code. \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年11月19日 09:13:57 +00:00Commented Nov 19, 2024 at 9:13 -
\$\begingroup\$ Also, if the command line parameters are
tool a.txt - b.txt -
should it process a.txt then stdin then b.txt and finally stdin again? \$\endgroup\$coderodde– coderodde2024年11月19日 11:26:35 +00:00Commented Nov 19, 2024 at 11:26 -
\$\begingroup\$ @AlexanderIvanchenko The function itself is not guaranteed to get a
BufferedInputStream
, it only accepts anInputStream
. If the functionality of theBufferedInputStream
is "required" or assumed to be there, then it either should wrap the incoming stream or limit the parameter type accordingly. But I missed that interaction, yes. \$\endgroup\$Bobby– Bobby2024年11月19日 22:34:16 +00:00Commented Nov 19, 2024 at 22:34
Closing resources
Again, you forgot to close InputStream
reading from a file (refer to this answer, where I've demonstrated how to utilize try-with-resources for that purpose).
Also note that while a stream interacting with the file system should be closed as soon as you're done with it no matter what, a stream reading from the System.in
is not mandatory to close. Because no other system processes can interact with System.in
(i.e. there's no threat of resource leak), and when main()
exits, JVM will take off System.in
, as well as System.out
and System.error
.
And if you close a stream wrapped around the standard input, you'll not be able to use System.in
anymore.
Code Structure
Focus on structuring your code by identifying responsibilities and introducing elements representing them.
This piece of code has many levels of nesting and bears a lot of cognitive load. You can make it cleaner by applying extract method refactoring technique on both branches.
if (args.length == 0) {
try {
loadByteHistogramFromStdin(byteHistogram);
} catch (IOException ex) {
LOGGER.log(
Level.SEVERE,
"Could not load the histogram from " +
"standard input stream.");
}
} else {
for (final String fileName : args) {
try {
loadByteHistogramFromFile(fileName, byteHistogram);
} catch (IOException ex) {
LOGGER.log(
Level.SEVERE,
"Could not load the histogram from file \"{0}\".",
fileName);
}
}
}
And I don't see any logic that will allow your application to terminate while reading from the standard input.
It looks like you were too focused on the code reuse rather than addressing distinct use cases.
-
\$\begingroup\$ Damn it! Forgot again to close the streams. :( \$\endgroup\$coderodde– coderodde2024年11月17日 12:15:14 +00:00Commented Nov 17, 2024 at 12:15
-
\$\begingroup\$ @coderodde How do you expect the program to terminate while reading from the standard input? Looks like you have a bug \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年11月17日 12:31:11 +00:00Commented Nov 17, 2024 at 12:31
you use System.out
to print the byte histogram.
- you already use logging library, so it should be used to write to standard output. the advantage of logging library is that output can be directed to a file or somewhere else via configuration
- you did not share code of
ByteHistogram
. does it implementtoString()
? because the default implementation is not very helpful