The method for the shorter strings uses prime number multiplication and division and, if I am not mistaken, lives entirely on the stack.
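To make the idea concrete, here is a minimal sketch of how a prime-based check might look. It is not the original `Anagram.isAnagramUsingPrimes`, which is not shown here; the class name, the restriction to lowercase `a`-`z`, and the length limit are assumptions made for illustration. Each letter maps to one of the first 26 primes, the primes of the first string are multiplied into a `long`, and the primes of the second string are divided out again. With 101 as the largest prime, a product of up to 9 factors still fits into a `long`, which is why the sketch only accepts short strings.

```java
final class PrimeAnagramSketch {

    // First 26 primes, one per lowercase letter 'a'..'z' (assumed alphabet).
    private static final int[] PRIMES = {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
    };

    // 101^9 still fits into a long, 101^10 does not.
    static final int MAX_LENGTH = 9;

    static boolean isAnagram(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }
        if (s1.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("string too long for a long product");
        }
        // Multiply the primes of the first string ...
        long product = 1L;
        for (int i = 0; i < s1.length(); i++) {
            product *= primeFor(s1.charAt(i));
        }
        // ... and divide out the primes of the second string.
        for (int i = 0; i < s2.length(); i++) {
            int prime = primeFor(s2.charAt(i));
            if (product % prime != 0) {
                return false; // s2 has a letter that s1 lacks (or has fewer of)
            }
            product /= prime;
        }
        return product == 1L;
    }

    private static int primeFor(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("only a-z supported: " + c);
        }
        return PRIMES[c - 'a'];
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "silent"));
        System.out.println(isAnagram("aab", "abb"));
    }
}
```

Apart from the shared, statically allocated prime table, the method only touches local primitives, so it should not allocate anything per call.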
In the interview I came up with a Map/histogram solution (a bit polished now), and in retrospect I implemented another solution that focuses on reducing space complexity.
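For reference, a Map/histogram check along these lines could be sketched as follows. This is an illustration under my own assumptions, not the interview code itself: the class name is hypothetical, and counting is done per UTF-16 `char`. The first string fills a histogram, the second string drains it, and the strings are anagrams if the histogram ends up empty.

```java
import java.util.HashMap;
import java.util.Map;

final class HistogramAnagramSketch {

    static boolean isAnagram(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }
        // Count how often each character occurs in the first string.
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s1.length(); i++) {
            counts.merge(s1.charAt(i), 1, Integer::sum);
        }
        // Remove one occurrence per character of the second string.
        for (int i = 0; i < s2.length(); i++) {
            char c = s2.charAt(i);
            Integer remaining = counts.get(c);
            if (remaining == null) {
                return false; // character missing or already used up
            }
            if (remaining == 1) {
                counts.remove(c);
            } else {
                counts.put(c, remaining - 1);
            }
        }
        return counts.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "silent"));
        System.out.println(isAnagram("aab", "abb"));
    }
}
```

The map and the boxed keys and values are what drive the space usage here, which is the part the later variants try to reduce.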
BENCHMARK
It looks like the array variant beats the prime variant in terms of ns/op in my benchmark (about 49 ns/op versus about 125 ns/op). My out-of-the-blue guess is that the prime variant performs a lot of multiplication/division/modulo operations, while the array variant simply increments and decrements the values.
The allocations, as expected, look much better with the prime numbers variant (effectively zero versus 120 B/op for the array variant). Here is the benchmark:
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
@State(Scope.Benchmark)
public class AnagramBenchmark {

    private static final int WORDS_ARRAY_SIZE = 40727;

    private String[] words;

    @Setup
    public void setup() {
        try (InputStream is = getClass().getResourceAsStream("/9-letter-words.txt");
             InputStreamReader isr = new InputStreamReader(is);
             BufferedReader br = new BufferedReader(isr)) {
            words = new String[WORDS_ARRAY_SIZE];
            for (int i = 0; ; i++) {
                String line = br.readLine();
                if (line == null) {
                    break;
                }
                words[i] = line;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Benchmark
    @OperationsPerInvocation(WORDS_ARRAY_SIZE - 1)
    public void primsAnagram(Blackhole bh) {
        for (int i = 0; i < (WORDS_ARRAY_SIZE - 1); i++) {
            String s1 = words[i];
            String s2 = words[i + 1];
            bh.consume(Anagram.isAnagramUsingPrimes(s1, s2));
        }
    }

    @Benchmark
    @OperationsPerInvocation(WORDS_ARRAY_SIZE - 1)
    public void arrayAnagram(Blackhole bh) {
        for (int i = 0; i < (WORDS_ARRAY_SIZE - 1); i++) {
            String s1 = words[i];
            String s2 = words[i + 1];
            bh.consume(Anagram.isAnagramUsingArray(s1, s2));
        }
    }
}
Here is the command to run it from the console, including a profiler that measures allocations as well:
mvn clean install && java -jar target/benchmarks.jar AnagramBenchmark -prof gc
# Run complete. Total time: 00:02:35

Benchmark                                                       Mode  Cnt      Score     Error  Units
AnagramBenchmark.arrayAnagram                                   avgt   25     48.758 ±   0.995  ns/op
AnagramBenchmark.arrayAnagram:·gc.alloc.rate                    avgt   25   1564.522 ±  32.580  MB/sec
AnagramBenchmark.arrayAnagram:·gc.alloc.rate.norm               avgt   25    120.000 ±   0.001  B/op
AnagramBenchmark.arrayAnagram:·gc.churn.PS_Eden_Space           avgt   25   1580.981 ± 157.861  MB/sec
AnagramBenchmark.arrayAnagram:·gc.churn.PS_Eden_Space.norm      avgt   25    121.312 ±  12.255  B/op
AnagramBenchmark.arrayAnagram:·gc.churn.PS_Survivor_Space       avgt   25      0.087 ±   0.019  MB/sec
AnagramBenchmark.arrayAnagram:·gc.churn.PS_Survivor_Space.norm  avgt   25      0.007 ±   0.001  B/op
AnagramBenchmark.arrayAnagram:·gc.count                         avgt   25     81.000            counts
AnagramBenchmark.arrayAnagram:·gc.time                          avgt   25     47.000            ms
AnagramBenchmark.primsAnagram                                   avgt   25    124.970 ±   3.350  ns/op
AnagramBenchmark.primsAnagram:·gc.alloc.rate                    avgt   25     ≈ 10⁻⁴            MB/sec
AnagramBenchmark.primsAnagram:·gc.alloc.rate.norm               avgt   25     ≈ 10⁻⁴            B/op
AnagramBenchmark.primsAnagram:·gc.count                         avgt   25        ≈ 0            counts
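
To put the allocation numbers into context, here is a minimal sketch of what an array-based check might look like. The real `Anagram.isAnagramUsingArray` is not shown here, so the class name, the restriction to lowercase `a`-`z`, and the single `int[26]` counter are assumptions. On a typical 64-bit HotSpot JVM with compressed class pointers, an `int[26]` takes 16 bytes of header plus 26 × 4 bytes of data, i.e. 120 bytes, which would be consistent with the 120 B/op reported above; that is an inference, not something the profiler output states.

```java
final class ArrayAnagramSketch {

    static boolean isAnagram(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }
        // One counter per lowercase letter (assumed alphabet a-z).
        // This array is the only allocation per call.
        int[] counts = new int[26];
        for (int i = 0; i < s1.length(); i++) {
            counts[indexOf(s1.charAt(i))]++; // letter seen in s1
            counts[indexOf(s2.charAt(i))]--; // letter seen in s2
        }
        // Anagrams leave every counter at zero.
        for (int count : counts) {
            if (count != 0) {
                return false;
            }
        }
        return true;
    }

    private static int indexOf(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("only a-z supported: " + c);
        }
        return c - 'a';
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "silent"));
        System.out.println(isAnagram("aab", "abb"));
    }
}
```

The loop body consists of two array increments or decrements per character and no division or modulo, which fits the guess above about why this variant comes out faster.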