4
\$\begingroup\$

The idiomatic way of reading lines from a file doesn't work for my purposes because the files I'm dealing with might have lines that are escaped. I came up with the below (which seems to meet my needs) and am wondering if anyone has ideas on how I can improve things:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
public final class LineReader implements AutoCloseable, Iterable<String> {
 private final char[] m_buffer;
 private final char m_escapeChar;
 private final Object m_lock;
 private final InputStreamReader m_reader;
 private int m_bufferOffset = 0;
 private int m_bufferPosition = 0;
 private String m_lineValue = "";
 private int m_numCharsRead = 0;
 public String getCurrent() { return m_lineValue; }
 public LineReader (final InputStreamReader reader, final char escapeChar, final int bufferLength) {
 m_buffer = new char[bufferLength];
 m_escapeChar = escapeChar;
 m_lock = this;
 m_reader = reader;
 }
 public LineReader(final InputStreamReader reader) {
 this(reader, '"', 1024);
 }
 public LineReader(final String filePath, final String encodingName) throws FileNotFoundException, UnsupportedEncodingException {
 this (new InputStreamReader(new FileInputStream(filePath), encodingName));
 }
 public void close() throws IOException {
 m_reader.close();
 }
 public Iterator<String> iterator() {
 return new Iterator<String>() {
 @Override
 public boolean hasNext() {
 try {
 return read();
 }
 catch (final IOException e) {
 throw new UncheckedIOException(e);
 }
 }
 @Override
 public String next() {
 return getCurrent();
 }
 };
 }
 public boolean read() throws IOException {
 final char[] buffer = m_buffer;
 final char escapeChar = m_escapeChar;
 final Object lock = m_lock;
 boolean isEscaping = false;
 synchronized(lock) {
 m_lineValue = "";
 do {
 while (m_bufferPosition < m_numCharsRead) {
 final char currentChar = buffer[m_bufferPosition++];
 if (currentChar == escapeChar) {
 if (!nextCharEquals(escapeChar)) {
 isEscaping = !isEscaping;
 }
 else {
 m_bufferPosition++;
 }
 }
 if (!isEscaping && ((currentChar == '\r') || (currentChar == '\n'))) {
 if (m_bufferOffset < m_bufferPosition) {
 m_lineValue += new String(buffer, m_bufferOffset, (m_bufferPosition - m_bufferOffset - 1));
 }
 if ((currentChar == '\r') && nextCharEquals('\n')) {
 m_bufferPosition++;
 }
 m_bufferOffset = m_bufferPosition;
 return true;
 }
 }
 } while (fillBuffer());
 return (m_lineValue.equals("") ? false : true);
 }
 }
 public Stream<String> stream() {
 return StreamSupport.stream(Spliterators.spliteratorUnknownSize(iterator(), (Spliterator.NONNULL | Spliterator.ORDERED)), false);
 }
 private boolean fillBuffer() throws IOException {
 final char[] buffer = m_buffer;
 final int offset = m_bufferOffset;
 final int position = m_bufferPosition;
 final InputStreamReader reader = m_reader;
 if (offset < position) {
 m_lineValue += new String(buffer, offset, (position - offset));
 }
 m_bufferOffset = 0;
 m_bufferPosition = 0;
 m_numCharsRead = reader.read(buffer);
 return (0 < m_numCharsRead);
 }
 private boolean nextCharEquals(final char value) throws IOException {
 return (((m_bufferPosition < m_numCharsRead) || fillBuffer()) && (m_buffer[m_bufferPosition] == value));
 }
}
200_success
145k22 gold badges190 silver badges478 bronze badges
asked Jul 31, 2018 at 14:56
\$\endgroup\$

2 Answers 2

2
\$\begingroup\$

General

  • In Java, don’t prefix variable names.
  • lineValue should be a StringBuilder, not a String. Similarly, use Arrays.copyOfRange() rather than making new String instances.
  • Never, ever lock on yourself. Make a new object or use a Lock. Otherwise if some external class synchronizes on one of your instances, it stops working until they release it.
  • You can use Reader instead of InputStreamReader.
  • getCurrent() should be defined below the constructors, and method bodies should almost never be forced onto one line.
  • adding local variables pointing to instance variables with slightly different names is confusing, and serves no purpose. Just refer to the instance variables.
  • There should be one line of whitespace between the end of each method and the beginning of the next.
  • I agree with @Malachi that you should simplify your uses of the ternary operator where possible.
  • You need to document your methods, and probably your instance variables. The names could also be more clear.

read()

  • Should this method be private? Should it just throw an UncheckedIOException?
  • it’s easier to read positive checks than negative checks - switch the if and else when checking nextCharEquals.
  • declare variables in as low a scope as possible. isEscaping belongs inside the synchronized block.
  • you can get rid of the outer loop by using while ((this.bufferPosition < this.numCharsRead) || this.fillBuffer()) { read - you don’t need the special case for the next character. It’s just additional complexity.
  • the process you’re describing isn’t "escaping". Escaping indicates that the next character contains an instruction or some other special character. What you want is to ignore CR-LF bounded by quotes, or some other special quoting character.

If you applied all these changes, your code might look something like:

import java.io.Closeable;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
public final class LineReader implements AutoCloseable, Closeable, Iterable<String> {
 private final char[] buffer;
 private final char escapeChar;
 private final Object lock = new Object();
 private final Reader reader;
 private int bufferOffset;
 private int bufferPosition;
 private final StringBuilder currentLine = new StringBuilder();
 private int numCharsRead;
 public LineReader(final InputStreamReader reader) {
 this(reader, '"', 1024);
 }
 public LineReader(final InputStreamReader reader, final char escapeChar, final int bufferLength) {
 this.buffer = new char[bufferLength];
 this.escapeChar = escapeChar;
 this.reader = reader;
 }
 public LineReader(final String filePath, final String encodingName)
 throws FileNotFoundException, UnsupportedEncodingException {
 this(new InputStreamReader(new FileInputStream(filePath), encodingName));
 }
 @Override
 public void close()
 throws IOException {
 this.reader.close();
 }
 @Override
 public Iterator<String> iterator() {
 return new Iterator<String>() {
 @Override
 public boolean hasNext() {
 try {
 return LineReader.this.read();
 } catch (final IOException e) {
 throw new UncheckedIOException(e);
 }
 }
 @Override
 public String next() {
 return LineReader.this.getCurrent();
 }
 };
 }
 public String getCurrent() {
 return this.currentLine.toString();
 }
 public boolean read() throws IOException {
 synchronized (this.lock) {
 this.currentLine.setLength(0);
 boolean inQuoteBlock = false;
 while ((this.bufferPosition < this.numCharsRead) || this.fillBuffer()) {
 final char currentChar = this.buffer[this.bufferPosition++];
 if (currentChar == this.escapeChar) {
 inQuoteBlock = !inQuoteBlock;
 }
 if (inQuoteBlock) {
 continue;
 }
 if ((currentChar == '\r') || (currentChar == '\n')) {
 this.appendBufferToCurrentLine();
 if ((currentChar == '\r') && this.nextCharEquals('\n')) {
 this.bufferPosition++;
 }
 this.bufferOffset = this.bufferPosition;
 return true;
 }
 }
 return this.currentLine.length() > 0;
 }
 }
 public Stream<String> stream() {
 return StreamSupport.stream(
 Spliterators.spliteratorUnknownSize(this.iterator(), (Spliterator.NONNULL | Spliterator.ORDERED)),
 false);
 }
 private boolean fillBuffer() throws IOException {
 this.appendBufferToCurrentLine();
 this.bufferOffset = 0;
 this.bufferPosition = 0;
 this.numCharsRead = this.reader.read(this.buffer);
 return (0 < this.numCharsRead);
 }
 private void appendBufferToCurrentLine() {
 if (this.bufferOffset < this.bufferPosition) {
 this.currentLine.append(
 Arrays.copyOfRange(this.buffer, this.bufferOffset, (this.bufferPosition - this.bufferOffset)));
 }
 }
 private boolean nextCharEquals(final char value) throws IOException {
 return (((this.bufferPosition < this.numCharsRead) || this.fillBuffer())
 && (this.buffer[this.bufferPosition] == value));
 }
}
answered Jul 31, 2018 at 19:15
\$\endgroup\$
6
  • \$\begingroup\$ There is lots of great advice in here but I feel like things get muddied a bit by ridiculous claims like "In Java, don't prefix variables names." Sure, prefixing fields with m_ might not be idiomatic but it is an extremely helpful convention that I personally follow regardless of language. Superficial concerns like how much white space is used, what an acceptable line width is, and the order of class members are all things that can and should be left to automation. The most important thing is to be consistent and any library/repository that wants to be so should enforce it via tooling. \$\endgroup\$ Commented Jul 31, 2018 at 19:46
  • \$\begingroup\$ Also, the local/final variables are another useful convention that, if followed, can help prevent modification of class members that might be mutable in the class scope but shouldn't necessarily be so in the local scope. In .NET land this practice can sometimes help the compiler(s) to make more informed decisions that lead to better optimization but I don't at all know if this applies in Java land. \$\endgroup\$ Commented Jul 31, 2018 at 19:53
  • \$\begingroup\$ Can you explain the usage of Arrays.copyOfRange: it seems like the more efficient choice would be to use the overload of .append that accepts an existing array along the offset/length; am I missing something? \$\endgroup\$ Commented Jul 31, 2018 at 20:03
  • \$\begingroup\$ Idiomatic code is important because your code is going to be read, on average, ten times. Unless you're working on a project by yourself and you personally follow it through to end-of-life, your code will be read by other people. While you may be making your code easier for you to read, you're making it harder for everybody who comes after you. Consistent use of whitespace, reasonable maximum line widths, and putting constructors first in classes all make the code easier to read. Local/final variables are likewise non-idiomatic and add to cognitive load because now the reader has \$\endgroup\$ Commented Jul 31, 2018 at 22:38
  • \$\begingroup\$ twice as many variables to track. Please don't assume that .NET conventions apply in Java, or vice versa. \$\endgroup\$ Commented Jul 31, 2018 at 22:40
3
\$\begingroup\$

I noticed that you return a ternary statement for a boolean value, this is not necessary.

return (m_lineValue.equals("") ? false : true);

you should simply return:

return !m_lineValue.equals("");

This is more straightforward in my opinion.

another thing to note is your nested if statement here

if (currentChar == escapeChar) {
 if (!nextCharEquals(escapeChar)) {
 isEscaping = !isEscaping;
 }
 else {
 m_bufferPosition++;
 }
}

The inside if statement should be switched around so that the conditional does not need to be negated, something like this

if (currentChar == escapeChar) {
 if (nextCharEquals(escapeChar)) {
 m_bufferPosition++;
 }
 else {
 isEscaping = !isEscaping;
 }
}
answered Jul 31, 2018 at 15:23
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.