I wrote a function for iterating over a large file with performance in mind. It takes an InputStream
and reads until it reaches the end of the file. Whenever it detects an end of line, it calls UnsafeConsumer#accept
with that line.
UnsafeConsumer is the same as Consumer<T>, but its accept method has a throws declaration.
The stream has to be closed outside the function.
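The post doesn't show UnsafeConsumer itself. Going only by the description above and by the type `UnsafeConsumer<String,IOException>` used in the signature below, it presumably looks something like the following. This is a hypothetical reconstruction; the type-parameter names and the `UnsafeConsumerDemo` helper are illustrative assumptions.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical reconstruction of the question's UnsafeConsumer (not shown in the post).
@FunctionalInterface
interface UnsafeConsumer<T, E extends Exception> {
    // Like java.util.function.Consumer#accept, but allowed to throw E.
    void accept(T t) throws E;
}

// Hypothetical helper, only to show the interface in use.
class UnsafeConsumerDemo {
    // Feeds each item to the consumer; an IOException thrown by the consumer
    // propagates to the caller instead of having to be wrapped inside the lambda.
    static void forEachItem(List<String> items, UnsafeConsumer<String, IOException> cons) throws IOException {
        for (String item : items) {
            cons.accept(item);
        }
    }
}
```

The point of the extra type parameter is that a lambda like `s -> { throw new IOException(s); }` compiles, which it would not for a plain `Consumer<String>`.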
I am not sure if this is the fastest way to do this. Is there anything I could do differently?
public static void fileLines(InputStream stream, UnsafeConsumer<String, IOException> cons) throws IOException {
    StringBuilder lineBuild = new StringBuilder();
    int character;
    while (true) { // read file
        while (true) { // read line
            character = stream.read();
            if (character == -1) {
                // end of stream: emit whatever is buffered as the last line
                cons.accept(lineBuild.toString());
                lineBuild.setLength(0);
                return;
            }
            if (character == '\n') {
                cons.accept(lineBuild.toString());
                lineBuild.setLength(0);
                // read one more byte; a '\r' or '\n' directly after the '\n' is dropped
                character = stream.read();
                if (character == -1) return;
                else if (character != '\r' && character != '\n') lineBuild.append((char) character);
                break;
            }
            lineBuild.append((char) character);
        }
    }
}
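One way to check what the loop above actually emits is to run it on in-memory input. A minimal harness, assuming UnsafeConsumer has the shape the notes describe (its source isn't in the post) and treating the input as ISO-8859-1 text; `FileLinesHarness` and `run` are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the question's UnsafeConsumer (not shown in the post).
@FunctionalInterface
interface UnsafeConsumer<T, E extends Exception> {
    void accept(T t) throws E;
}

class FileLinesHarness {
    // The question's fileLines, reproduced unchanged apart from formatting.
    static void fileLines(InputStream stream, UnsafeConsumer<String, IOException> cons) throws IOException {
        StringBuilder lineBuild = new StringBuilder();
        int character;
        while (true) {
            while (true) {
                character = stream.read();
                if (character == -1) {
                    cons.accept(lineBuild.toString());
                    lineBuild.setLength(0);
                    return;
                }
                if (character == '\n') {
                    cons.accept(lineBuild.toString());
                    lineBuild.setLength(0);
                    character = stream.read();
                    if (character == -1) return;
                    else if (character != '\r' && character != '\n') lineBuild.append((char) character);
                    break;
                }
                lineBuild.append((char) character);
            }
        }
    }

    // Runs fileLines over an in-memory string and collects every line it emits.
    static List<String> run(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1));
        fileLines(in, lines::add);
        return lines;
    }
}
```

Tracing it this way, `run("a\r\nb")` yields `["a\r", "b"]` (the '\r' of a Windows line ending stays on the line) and `run("a\n\nb")` yields `["a", "b"]` (the blank line is skipped), which is worth knowing before comparing it against alternatives.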
Comment: Did you benchmark? What other variations did you try? (Emily L., Aug 29, 2017 at 23:24)
2 Answers
You are conflating bytes with characters. An InputStream produces bytes, but you implicitly interpret each byte as a character when you do character=stream.read() followed by the cast lineBuild.append((char)character). To be clear:
- Strings consist of chars, not bytes.
- InputStreams and OutputStreams deal with bytes.
- Readers and Writers deal with characters.
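A small illustration of the difference, using "é" as the example character. `ByteCastDemo` and its methods are hypothetical names; the first method mimics the question's per-byte `(char)` cast, the second decodes with an explicit charset the way a Reader would:

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
    // Mimics the question's approach: each byte is turned into a char on its own.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // InputStream.read() returns 0..255, so mask the signed byte to match that range.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the same bytes as UTF-8, as an InputStreamReader with that charset would.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

In UTF-8, "é" (U+00E9) is the two bytes 0xC3 0xA9. `castEachByte` turns them into the two characters "Ã©", which is exactly what ISO-8859-1 decoding gives, while `decodeUtf8` returns the single character "é". The per-byte cast is therefore only correct for single-byte encodings.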
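To read lines as strings, first convert stream to a Reader using InputStreamReader. Decoding as ISO-8859-1 maps each byte to the char with the same numeric value, so it reproduces what the (char) cast in your code does.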
BufferedReader br = new BufferedReader(new InputStreamReader(stream, "ISO-8859-1"));
Then, BufferedReader offers a .readLine() method. Or, since you seem to be interested in a stream-based approach, call .lines(), which produces a Stream<String>.
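Put together, the question's method could be rewritten on top of BufferedReader roughly as follows. This is a sketch, not a benchmarked replacement: it assumes the same UnsafeConsumer shape as the question (redeclared here so the example compiles), keeps the question's contract that the caller closes the stream, and `ReaderLines`, `allLines` and `of` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Assumed shape of the question's UnsafeConsumer (not shown in the post).
@FunctionalInterface
interface UnsafeConsumer<T, E extends Exception> {
    void accept(T t) throws E;
}

class ReaderLines {
    // Same contract as the question's method: the caller owns and closes the stream,
    // so the BufferedReader wrapping it is deliberately not closed here.
    static void fileLines(InputStream stream, UnsafeConsumer<String, IOException> cons) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(stream, StandardCharsets.ISO_8859_1));
        String line;
        // readLine() treats "\n", "\r" or "\r\n" as a terminator and returns null at end of stream.
        while ((line = br.readLine()) != null) {
            cons.accept(line);
        }
    }

    // Stream-based variant via BufferedReader.lines(); I/O errors surface as UncheckedIOException.
    static List<String> allLines(InputStream stream) {
        BufferedReader br = new BufferedReader(new InputStreamReader(stream, StandardCharsets.ISO_8859_1));
        return br.lines().collect(Collectors.toList());
    }

    // Helper for trying either method on in-memory text.
    static InputStream of(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Note that the behavior is not identical to the original loop: readLine() keeps blank lines, strips a full "\r\n", and emits nothing at all for an empty stream, whereas the original emits one empty string in that case.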
You can use the LineIterator from the Apache Commons IO library.
For your case, it could look like this:
LineIterator it = FileUtils.lineIterator(file, "ISO-8859-1");
try {
    while (it.hasNext()) {
        cons.accept(it.nextLine());
    }
} finally {
    it.close();
}
UPDATE (following up on the comments)
If you don't have a File instance, use IOUtils:
LineIterator it = IOUtils.lineIterator(stream, "ISO-8859-1");
Comment: Where does file come from? And why did you choose UTF-8, breaking compatibility with the original code? (200_success, Aug 30, 2017 at 0:13)
Comment: You are right, file is missing from the question, but the author mentioned a large file, so I assume a File instance is available as well. "UTF-8" is just an example; there is also a method without this parameter. (Dmytro Maslenko, Aug 30, 2017 at 0:17)