The following method reads specially formatted text (TOML) from a file, parses it to extract a few properties, and uses those properties to create an object.
The profiler tells me that this method is the bottleneck in my program: while parsing ~5000 files, it eats up over 95% of the total running time. For comparison, another method that writes the data parsed by this method back out to text files handles the same 5000+ files in under 4 seconds, whereas this method takes around 15 seconds on average.
/**
* Creates an appropriate instance of a Parsable implementation depending
* upon the header of the file.
*
* @param file the path of the file from which to create a Parsable.
* @return the created Parsable.
*/
private Parsable createParsable(Path file) {
    Toml toml = new Toml();
    try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
        StringBuilder header = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null && !line.equals(HEADER_DELIMITER)) {
            header.append(line).append("\n");
        }
        toml.parse(header.toString());
        String title = toml.getString("title");
        author = toml.getString("author") != null ? toml.getString("author") : author;
        String date = toml.getString("date");
        String slug = toml.getString("slug");
        LocalDate publishDate = LocalDate.parse(date, DateTimeFormatter.ofPattern(config.getDateFormat()));
        String layout = toml.getString("layout");
        List<String> tag = toml.getList("tags");
        StringBuilder content = new StringBuilder();
        while ((line = br.readLine()) != null) {
            content.append(line).append("\n");
        }
        if (layout.equals("post")) {
            return new Post(title, author, publishDate, file, content.toString(), slug, layout, tag);
        } else {
            return new Page(title, author, file, content.toString(), slug, layout, tag);
        }
    } catch (IOException ex) {
        Logger.getLogger(DirectoryCrawler.class.getName()).log(Level.SEVERE, null, ex);
    }
    return null;
}
It is important to note that this method gets called over 5000 times in my test. I have tried analyzing the separate parts of the method for the performance problem, but haven't been able to identify a culprit. How can I write this better?
The TOML library is from here: https://github.com/mwanji/toml4j
And the constructor implementations look like this:
/**
 * Creates a Post with the given parameters.
 *
 * @param titl the post title
 * @param auth the post author
 * @param dat the post date
 * @param loc the post's Path
 * @param cont the post's content
 * @param slu the post slug
 * @param lay the layout
 * @param tag the list of tags
 */
public Post(String titl, String auth, LocalDate dat,
        Path loc, String cont, String slu, String lay, List<String> tag) {
    title = titl;
    author = auth;
    //TODO add summary option
    //this.summary = summ;
    date = dat;
    location = loc;
    content = cont;
    slug = slu;
    layout = lay;
    tags = tag;
}
1 Answer
I see that the processing you're doing here does not depend on the order you're processing in. This means you can parallelize the processing heavily.
Additionally, you're doing line-by-line processing, which lets you use one of the new features of Java 8, namely Files.lines.
This greatly simplifies the code to the following outline:
try (Stream<String> lines = Files.lines(path).parallel()) {
    // do line by line processing
} catch (IOException e) {
    // sensible handling
}
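Since the per-file results are independent, another way to apply the same idea is to parallelize across files rather than within one file. Here is a minimal, self-contained sketch; parse is a hypothetical stand-in that just reads the whole file, whereas the real work would be the question's createParsable:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelParseDemo {

    // Hypothetical stand-in for the question's createParsable: it just reads
    // the file's contents; a real version would parse the TOML header.
    static String parse(Path file) {
        try {
            return new String(Files.readAllBytes(file));
        } catch (IOException e) {
            return null; // mirror the question's null-on-failure behaviour
        }
    }

    public static void main(String[] args) throws IOException {
        // Create 100 small sample files in a temporary directory.
        Path dir = Files.createTempDirectory("posts");
        for (int i = 0; i < 100; i++) {
            Files.write(dir.resolve("post" + i + ".md"), ("title" + i).getBytes());
        }

        List<String> results;
        try (Stream<Path> paths = Files.list(dir)) {
            results = paths.parallel()                    // fan per-file work across cores
                           .map(ParallelParseDemo::parse)
                           .filter(Objects::nonNull)      // drop files that failed to read
                           .collect(Collectors.toList());
        }
        System.out.println(results.size()); // one result per input file
    }
}
```

Because the stream is parallel, the per-file I/O and parsing overlap; the collect step gathers results in whatever order the workers finish.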
Also, it might be faster to keep the parsables in "one" file; this reduces the channel and OS overhead of open/close operations when doing I/O.
- Thanks for the Files.lines hint. It never crossed my mind to use Streams here. I'll try to parallelize the process. However, I can't keep parsables in one file, as they are separate files which will be created by the end user. – Pawan, Jul 2, 2015 at 10:26
- Doesn't the while ((line = br.readLine()) != null && !line.equals(HEADER_DELIMITER)) block prevent easy parallelism? – Veedrac, Jul 2, 2015 at 12:58
- @Veedrac that's just a simple .filter() IIUC... basically the br.readLine() happens inside the stream and you just have to filter out nulls and HEADERs – Vogel612, Jul 2, 2015 at 13:03
- @Vogel612 It looks to me like a takeWhile, not a filter. – Veedrac, Jul 2, 2015 at 13:58
- … new Toml()?
- The Post and Page constructors will be useful too, if they are non-trivial.