I'm learning Scala, and I wrote a program to estimate the value of Pi. When I'm using multiple threads, it takes 5-6 times longer to calculate the same iterations. I wrote a similar program in Java, with plain old threads, and with ExecutorService
, and the results are the same.
Why is this the case?
(I have 4 cores in my processor, and during the multithreading calculation, the processor usage goes up to 100%.)
import java.util.Random
import java.util.concurrent.Executors
import java.util.concurrent.Callable
object Main {
def main(args: Array[String]): Unit = {
println("Calculating PI using the Monte-Carlo method")
val n = 10000000
evaluatePiCalculator(new SimplePiCalculator(n))
evaluatePiCalculator(new ThreadedPiCalculator(n))
}
def evaluatePiCalculator(calc: PiCalculator) = {
val pi = time { calc.calculate() }
println(pi)
val d = Math.abs(Math.PI - pi)
println(d)
}
def time[R](block: => R): R = {
val t0 = System.nanoTime()
val result = block // call-by-name
val t1 = System.nanoTime()
val tdiff = t1 - t0
println("Elapsed time: " + tdiff + "ns" + " " + (tdiff / 1000000) + "ms")
result
}
}
object Circle {
class Point(val x: Double, val y: Double) {
def isInside: Boolean = Math.sqrt(x * x + y * y) < 1.0
}
val random = new Random()
private def randomPoint(): Point = {
def generate() = random.nextDouble() * 2 - 1
new Point(generate(), generate())
}
def tryToHit(): Boolean = randomPoint().isInside
}
abstract class PiCalculator() {
def calculate(): Double
}
class SimplePiCalculator(n: Int) extends PiCalculator with Callable[Int] {
override def call(): Int = calculateHits()
def calculateHits(): Int = {
var hits = 0
for (i <- 1 to n) {
if (Circle.tryToHit()) hits += 1
}
hits
}
def calculate(): Double = 4.0 * calculateHits / n
}
class ThreadedPiCalculator(n: Int) extends PiCalculator {
val threads = 4
val pool = Executors.newFixedThreadPool(threads)
def calculate(): Double = {
val f1 = pool.submit(new SimplePiCalculator(n / threads))
val f2 = pool.submit(new SimplePiCalculator(n / threads))
val f3 = pool.submit(new SimplePiCalculator(n / threads))
val f4 = pool.submit(new SimplePiCalculator(n / threads))
val (h1, h2, h3, h4) = (f1.get, f2.get, f3.get, f4.get)
pool.shutdown()
4.0 * (h1 + h2 + h3 + h4) / n
}
}
Here are the results of the run:
Calculating PI using the Monte-Carlo method Elapsed time: 473156580ns 473ms 3.1405192 0.0010734535897931607 Elapsed time: 3413799647ns 3413ms 3.1424036 8.109464102070696E-4
Update:
If I change the Circle
object
to a class
, and create an instance inside of SimplePiCalculator
, and use that instance to calculate the hits, the expected performance discrepancy disappears. The calculation takes half the time of the single threaded implementation.
So the cause of this was using a common object from multiple threads.
However, when I tried to apply the same with the Java solution (i.e. change the tryToHit
method from static to instance call), it did not work.
Can I ask, how to debug multithreaded code? What are the tools, methods, guidelines? How to determine the cause of slowdowns?
-
\$\begingroup\$ I think you could probably do more than 4 threads. You have 4 processors and that usually means you have more threads... \$\endgroup\$Dair– Dair2016年09月03日 08:23:16 +00:00Commented Sep 3, 2016 at 8:23
-
\$\begingroup\$ Scratch that, you may not... But I would consider experimenting... \$\endgroup\$Dair– Dair2016年09月03日 08:26:25 +00:00Commented Sep 3, 2016 at 8:26
2 Answers 2
The cause of the long execution times in the multithreaded case were the use of static methods, fields, objects. In the scala code code above, if I change the Circle class to object:
class Circle {
class Point(val x: Double, val y: Double) {
def isInside: Boolean = Math.sqrt(x * x + y * y) < 1.0
}
val random = new Random()
private def randomPoint(): Point = {
def generate() = random.nextDouble() * 2 - 1
new Point(generate(), generate())
}
def tryToHit(): Boolean = randomPoint().isInside
}
And then create a Circle instance inside of the PiCalculator class:
class SimplePiCalculator(n: Int) extends PiCalculator with Callable[Int] {
override def call(): Int = calculateHits()
val circle = new Circle
def calculateHits(): Int = {
var hits = 0
for (i <- 1 to n) {
if (circle.tryToHit()) hits += 1
}
hits
}
def calculate(): Double = 4.0 * calculateHits / n
}
Then execution times are as follows:
Calculating PI using the Monte-Carlo method Elapsed time: 481745137ns 481ms 3.142086 4.933464102068186E-4 Elapsed time: 219407150ns 219ms 3.1410116 5.810535897929903E-4
Also in the Java solution, I've found, that the Circle class was using a public static final Random RANDOM
static variable to generate random points, and this was the cause of the slowdown.
So it looks like, when writing multithreaded code, one must avoid using common objects, and static methods/variables.
If you wanted to reduce the run time even further the following changes could be made:
class SimplePiCalculator(n: Int) extends PiCalculator with Callable[Int] {
val random = new Random()
override def call(): Int = {
var hits = 0
var i = 0
while (i < n) {
val x = random.nextDouble()
val y = random.nextDouble()
if (x * x + y * y < 1.0) hits += 1
i += 1
}
hits
}
def calculate(): Double = 4.0 * call / n
}
class ThreadedPiCalculator(n: Int) extends PiCalculator {
val threads = /* users choice */
val pool = Executors.newFixedThreadPool(threads)
def calculate(): Double = {
val res =
(for (i <- 0 until threads) yield {
pool.submit(new SimplePiCalculator(n / threads))
}).map(_.get).sum
pool.shutdown()
4.0 * res / n
}
}
With the above code if I set threads
to 10
my run time dropped to ~70 ms (naturally this number will vary on different machines). This code is a bit more bare-bones than what you have and may not fit your style but it is quite a bit quicker than what you have posted.
The significant changes are
- The removal of the
Circle
class which for a task as simple as this just added cognitive overhead for me. - The removal of
sqrt
and a*
(this may not hold outside of unit circle, but the unit circle is all we need in this case :P) - Extending the functionality of
ThreadPiCalculator(...)
so users can experiment with different values forthreads
without have to change a lot of code.
Explore related questions
See similar questions with these tags.