The following method reads specially formatted text (TOML) from a file, parses it to extract a few properties, and uses those properties to create an object.
The profiler tells me that this method is the bottleneck in my program: while parsing ~5000 files, it eats up over 95% of the total running time. For comparison, another method that writes the data parsed by this method back out to text files handles the same 5000+ files in under 4 seconds, whereas this method takes around 15 seconds on average.
/**
* Creates an appropriate instance of a Parsable implementation depending
* upon the header of the file.
*
* @param file the path of the file from which to create a Parsable.
* @return the created Parsable.
*/
private Parsable createParsable(Path file) {
    Toml toml = new Toml();
    try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
        StringBuilder header = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null && !line.equals(HEADER_DELIMITER)) {
            header.append(line).append("\n");
        }
        toml.parse(header.toString());
        String title = toml.getString("title");
        author = toml.getString("author") != null ? toml.getString("author") : author;
        String date = toml.getString("date");
        String slug = toml.getString("slug");
        LocalDate publishDate = LocalDate.parse(date, DateTimeFormatter.ofPattern(config.getDateFormat()));
        String layout = toml.getString("layout");
        List<String> tag = toml.getList("tags");
        StringBuilder content = new StringBuilder();
        while ((line = br.readLine()) != null) {
            content.append(line).append("\n");
        }
        if (layout.equals("post")) {
            return new Post(title, author, publishDate, file, content.toString(), slug, layout, tag);
        } else {
            return new Page(title, author, file, content.toString(), slug, layout, tag);
        }
    } catch (IOException ex) {
        Logger.getLogger(DirectoryCrawler.class.getName()).log(Level.SEVERE, null, ex);
    }
    return null;
}
It is important to note that this method gets called over 5000 times in my test. I have tried analyzing the separate parts of the method for the performance problem, but haven't been able to identify a culprit. How can I write this better?
The TOML library is from here: https://github.com/mwanji/toml4j
And the constructor implementations look like this:
/**
 * Creates a Post with the given parameters.
 *
 * @param titl the post title
 * @param auth the post author
 * @param dat the post date
 * @param loc the post's Path
 * @param cont the post's content
 * @param slu the post slug
 * @param lay the layout
 * @param tag the list of tags
 */
public Post(String titl, String auth, LocalDate dat,
        Path loc, String cont, String slu, String lay, List<String> tag) {
    title = titl;
    author = auth;
    //TODO add summary option
    //this.summary = summ;
    date = dat;
    location = loc;
    content = cont;
    slug = slu;
    layout = lay;
    tags = tag;
}
1 Answer
I see that the processing you're doing here does not depend on the order you're processing in. This means you can parallelize the processing heavily.
Additionally, you're doing line-by-line processing, which lets you use one of the new features of Java 8, namely Files.lines.
This greatly simplifies the code to the following outline:
try (Stream<String> lines = Files.lines(path).parallel()) {
    // do line by line processing
} catch (IOException e) {
    // sensible handling
}
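Since the per-file results are independent, another way to apply the same idea is to parallelize across files rather than within one file. Here is a minimal, self-contained sketch; parse is a hypothetical stand-in that just reads the whole file, whereas the real work would be the question's createParsable:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelParseDemo {

    // Hypothetical stand-in for the question's createParsable: it just reads
    // the file's contents; a real version would parse the TOML header.
    static String parse(Path file) {
        try {
            return new String(Files.readAllBytes(file));
        } catch (IOException e) {
            return null; // mirror the question's null-on-failure behaviour
        }
    }

    public static void main(String[] args) throws IOException {
        // Create 100 small sample files in a temporary directory.
        Path dir = Files.createTempDirectory("posts");
        for (int i = 0; i < 100; i++) {
            Files.write(dir.resolve("post" + i + ".md"), ("title" + i).getBytes());
        }

        List<String> results;
        try (Stream<Path> paths = Files.list(dir)) {
            results = paths.parallel()                    // fan per-file work across cores
                           .map(ParallelParseDemo::parse)
                           .filter(Objects::nonNull)      // drop files that failed to read
                           .collect(Collectors.toList());
        }
        System.out.println(results.size()); // one result per input file
    }
}
```

Because the stream is parallel, the per-file I/O and parsing overlap; the collect step gathers results in whatever order the workers finish.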
Also, it might be faster to keep the parsables in "one" file; this reduces the channel and OS overhead of open/close operations when doing I/O.
- Thanks for the Files.lines hint. It never crossed my mind to use Streams here. I'll try to parallelize the process. However, I can't keep parsables in one file, as they are separate files which will be created by the end user. – Pawan, Jul 2, 2015 at 10:26
- Doesn't the while ((line = br.readLine()) != null && !line.equals(HEADER_DELIMITER)) block prevent easy parallelism? – Veedrac, Jul 2, 2015 at 12:58
- @Veedrac that's just a simple .filter() IIUC... basically the br.readLine() happens inside the stream and you just have to filter out nulls and HEADERs – Vogel612, Jul 2, 2015 at 13:03
- @Vogel612 It looks to me like a takeWhile, not a filter. – Veedrac, Jul 2, 2015 at 13:58
- … new Toml()?
- The Post and Page constructors will be useful too, if they are non-trivial.