
This is a follow-up from: Reversing a domain name string in Java

There are a number of ways of splitting a String using a delimiter in Java, especially now with Java 8:

  1. StringTokenizer (not recommended)
  2. String.split (recommended replacement)
  3. Scanner with delimiter
  4. Scanner with Pattern
  5. Pattern.splitAsStream

With the methods that take a Pattern, the Pattern can be pre-compiled and stored or created dynamically.
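For readers who want to see the options listed above side by side, here is a minimal sketch applied to a short domain name rather than a whole novel. The class name `SplitDemo`, the method names and the constant `DOT` are illustrative choices and not part of the benchmark; the sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitDemo {
    // Compiled once and reused (the "pre-compiled and stored" variant).
    // The name DOT is hypothetical, chosen for this sketch only.
    private static final Pattern DOT = Pattern.compile("\\.");

    // 1. StringTokenizer: the delimiter is a set of characters, not a regex.
    static List<String> viaStringTokenizer(String input) {
        List<String> parts = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, ".");
        while (tokenizer.hasMoreTokens()) {
            parts.add(tokenizer.nextToken());
        }
        return parts;
    }

    // 2. String.split: the argument is a regex, so the dot must be escaped.
    static List<String> viaSplit(String input) {
        return Arrays.asList(input.split("\\."));
    }

    // 3. Scanner with a String delimiter (interpreted as a regex).
    static List<String> viaScannerString(String input) {
        List<String> parts = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter("\\.")) {
            while (scanner.hasNext()) {
                parts.add(scanner.next());
            }
        }
        return parts;
    }

    // 4. Scanner with a Pattern delimiter, here the stored one.
    static List<String> viaScannerPattern(String input) {
        List<String> parts = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(DOT)) {
            while (scanner.hasNext()) {
                parts.add(scanner.next());
            }
        }
        return parts;
    }

    // 5. Pattern.splitAsStream (Java 8+), collected into a list.
    static List<String> viaSplitAsStream(String input) {
        return DOT.splitAsStream(input).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String domain = "www.example.com";
        // Each line prints [www, example, com]
        System.out.println(viaStringTokenizer(domain));
        System.out.println(viaSplit(domain));
        System.out.println(viaScannerString(domain));
        System.out.println(viaScannerPattern(domain));
        System.out.println(viaSplitAsStream(domain));
    }
}
```

All five agree on this input, but they are not interchangeable in general: for example, `StringTokenizer` never returns empty tokens, whereas `String.split` keeps empty strings between consecutive delimiters and drops only trailing ones.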

StringTokenizer is not recommended, but it is apparently still in use. The other methods all look fairly similar to my eye. Some obviously suit certain situations better than others, but with so many options it is hard to choose, so I decided to use JMH to see which one is fastest.

I created a project from the Archetype as recommended, and added the following benchmarks:

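import static java.util.stream.Collectors.joining;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.StringTokenizer;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Thread)
public class MyBenchmark {

    private Pattern pattern;
    private String warandpeace;

    // Compile the shared pattern and load the whole text into one String before measuring.
    @Setup
    public void prepare() {
        pattern = Pattern.compile("\\.");
        try (final InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("com/boris/benchmark/warandpeace.txt");
             final BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            // Lines are concatenated without a separator, so line breaks are dropped.
            warandpeace = reader.lines().collect(joining());
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @Benchmark
    public void testSplit(final Blackhole blackhole) {
        for (final String s : warandpeace.split("\\.")) {
            blackhole.consume(s);
        }
    }

    @Benchmark
    public void testStringTokenizer(final Blackhole blackhole) {
        final StringTokenizer stringTokenizer = new StringTokenizer(warandpeace, ".");
        while (stringTokenizer.hasMoreTokens()) {
            blackhole.consume(stringTokenizer.nextToken());
        }
    }

    @Benchmark
    public void testScannerString(final Blackhole blackhole) {
        try (final Scanner scanner = new Scanner(warandpeace).useDelimiter("\\.")) {
            while (scanner.hasNext()) {
                blackhole.consume(scanner.next());
            }
        }
    }

    // Compiles the delimiter pattern on every invocation.
    @Benchmark
    public void testScannerRegex(final Blackhole blackhole) {
        try (final Scanner scanner = new Scanner(warandpeace).useDelimiter(Pattern.compile("\\."))) {
            while (scanner.hasNext()) {
                blackhole.consume(scanner.next());
            }
        }
    }

    // Reuses the pattern compiled once in prepare().
    @Benchmark
    public void testScannerPrecompiledRegex(final Blackhole blackhole) {
        try (final Scanner scanner = new Scanner(warandpeace).useDelimiter(pattern)) {
            while (scanner.hasNext()) {
                blackhole.consume(scanner.next());
            }
        }
    }

    // Compiles the pattern on every invocation.
    @Benchmark
    public void testSplitAsStream(final Blackhole blackhole) {
        Pattern.compile("\\.").splitAsStream(warandpeace).forEach(blackhole::consume);
    }

    // Reuses the pattern compiled once in prepare().
    @Benchmark
    public void testPrecompiledSplitAsStream(final Blackhole blackhole) {
        pattern.splitAsStream(warandpeace).forEach(blackhole::consume);
    }
}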

What I would like reviewed is my usage of JMH. I don't have much experience with it, and this will hopefully get me on the right path to writing correct benchmarks, so nitpicks are welcome too!

com/boris/benchmark/warandpeace.txt is the full text of War and Peace as made available by Project Gutenberg.

I'm not going to post the full output of the benchmarks here, unless otherwise requested, but here is the result:

# Run complete. Total time: 00:47:13

Benchmark                                  Mode  Cnt    Score   Error  Units
MyBenchmark.testPrecompiledSplitAsStream  thrpt  200  355.893 ± 6.618  ops/s
MyBenchmark.testScannerPrecompiledRegex   thrpt  200   75.811 ± 1.311  ops/s
MyBenchmark.testScannerRegex              thrpt  200   76.246 ± 1.427  ops/s
MyBenchmark.testScannerString             thrpt  200   75.690 ± 1.279  ops/s
MyBenchmark.testSplit                     thrpt  200  358.356 ± 4.745  ops/s
MyBenchmark.testSplitAsStream             thrpt  200  348.294 ± 7.435  ops/s
MyBenchmark.testStringTokenizer           thrpt  200  145.767 ± 2.585  ops/s

  • CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
  • RAM: 16GB DDR3 1333 MHz

My first post had a stupid mistake in the benchmarking code (see the revision history for details); please disregard the earlier results, which put Scanner with a String at ≅1 op/s.

asked Dec 7, 2015 at 11:35

1 Answer


Your benchmarks look fine.

Instead of using the Blackhole, you could probably have simply returned the number of sentences from each benchmark method. That would, however, have opened an (unlikely) window for optimisations, so using the Blackhole is safer.
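As a rough illustration of the alternative mentioned above, the sketch below counts the tokens and returns the total instead of consuming each one. It is plain Java so that it runs without JMH on the classpath; the class and method names are hypothetical. In the real benchmark class the method would carry `@Benchmark` and read the text from the state field, and JMH would consume the returned value itself.

```java
public class SentenceCount {

    // In a JMH benchmark this method would be annotated with @Benchmark.
    // Only the count escapes the method, so the individual strings are
    // never explicitly consumed; that is the (unlikely) optimisation window.
    static int countSentences(String text) {
        int count = 0;
        for (String sentence : text.split("\\.")) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Prints 3: split drops the trailing empty string after the last dot.
        System.out.println(countSentences("One. Two. Three."));
    }
}
```

Because only the total is observable, the JIT is in principle free to skip work on the individual tokens, which is why passing each one to `Blackhole.consume` is the more conservative choice.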

answered Dec 22, 2015 at 11:32
