Java file analyzer byte-by-byte

Question 1

This is part of this topic.
Problem:
I need to read a file with BigIntegers and make some analysis "on-the-fly" with each read number (get prime numbers count, get armstrong numbers count). Right now I have this code, but it works very slow:

package ee.raintree.test.numbers;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.math.BigInteger;
import ee.raintree.test.numbers.utils.MathUtils;
public class FileAnalyzer {
 private static final char SEPARATOR = ' ';
 private File file;
 private int armstrongNumbersCount;
 private int primeNumbersCount;
 public FileAnalyzer(File file) {
 this.file = file;
 }
 public void analyze() throws FileNotFoundException, IOException {
 countNumbers();
 }
 public int getArmstrongNumbersCount() {
 return armstrongNumbersCount;
 }
 public int getPrimeNumbersCount() {
 return primeNumbersCount;
 }
 private void countNumbers() throws FileNotFoundException, IOException {
 StringBuilder numberSb = new StringBuilder();
 try (FileInputStream fis = new FileInputStream(file)) {
 char currentChar;
 while (fis.available() > 0) {
 currentChar = (char) fis.read();
 if (currentChar == SEPARATOR) {
 analyzeNumber(new BigInteger(numberSb.toString()));
 numberSb = new StringBuilder();
 continue;
 }
 numberSb.append(currentChar);
 }
 if (numberSb.length() > 0) {
 analyzeNumber(new BigInteger(numberSb.toString()));
 }
 }
 }
 private void analyzeNumber(BigInteger number) {
 if(MathUtils.isArmstrongNumber(number)) {
 armstrongNumbersCount++;
 }
 if(MathUtils.isPrime(number)) {
 primeNumbersCount++;
 }
 }
}

How can I speed up this process? As I was informed - it is lack of buffering.

Question 2

Replace

try (FileInputStream fis = new FileInputStream(file)) {

with

try (InputStream fis = new BufferedInputStream(new FileInputStream(file))) {

See also Why is using BufferedInputStream to read a file byte by byte faster than using FileInputStream?

Question 3

In the second try with resources shouldn't the FileInputStream also created before buffered stream, so both can be closed?

Question 4

Apart from the performce issue, you should not call available at all, as it makes your code unnecessarily complicated. Instead, read into an int variable until the value gets -1, just like everyone else is doing this. You then need to add a (char) type cast to the append call.

It's not necessary to create a new StringBuilder each time, you can just call sbNumber.setLength(0).

Question 5

Some more general things to consider. Your class is called FileAnalyzer, but it advertises public properties like getPrimeNumbersCount, which I'd consider to be part of the analysis result (rather than the analyser itself). Do they really belong there?

There's a similar issue around the way that you're storing the file to be analysed in a field of the analyser. If a field isn't needed by at least two public member functions, I'd consider if it really needs to be a field or not.

FileNotFoundException extends IOException, so you don't really need to declare that a method throws both of them.

Does the analyzer really need to be tied to an input file, or could it actually take in an InputStream as the source to analyse? This approach makes the analyser more flexible (it could be used to process numbers from an API call, or could simply process numbers from a String for testing purposes), and pushes the decision about what type of stream to construct up to the caller, where there may be more context about the information's source.

Given those points, I'd expect the analyze function to have a signature more like:

NumericAnalysis analyze(InputStream streamToAnalyse) throws IOException

As it stands, your public analyze method immediately delegates out to a private countNumbers. You don't perform any signature adaption or anything else as part of this, so there doesn't really seem to be a reason for creating this extra layer of complexity. Why not just put the code straight into the analyze method?

There's no validity checking when constructing a BigInteger when a SEPERATOR is encountered. This might be OK in your scenario because you're confident about the source of the data, however it's good to consider what you would expect the behaviour to be if for example there were two spaces between a pair of numbers / if a file started with a space before the first number.

score 3 · Accepted Answer · 2020-01-08 07:17:07Z

3

\$\begingroup\$

Replace

try (FileInputStream fis = new FileInputStream(file)) {

with

try (InputStream fis = new BufferedInputStream(new FileInputStream(file))) {

See also Why is using BufferedInputStream to read a file byte by byte faster than using FileInputStream?

Share

edited Jan 8, 2020 at 7:24

answered Jan 8, 2020 at 7:17

Gaslight Deceive Subvert's user avatar

Gaslight Deceive Subvert Gaslight Deceive Subvert

1,39410 silver badges18 bronze badges

\$\endgroup\$

1

\$\begingroup\$ In the second try with resources shouldn't the FileInputStream also created before buffered stream, so both can be closed? \$\endgroup\$

JaDogg
– JaDogg

2020年01月08日 16:35:45 +00:00
Commented Jan 8, 2020 at 16:35

Add a comment |

Stack Exchange Network

Java file analyzer byte-by-byte

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Java file analyzer byte-by-byte

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions