Can you give me some performance advice on how to optimize the time of execution of the following calculation of E?
package calculatee;
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
public class CalculateE {
static long start1 = System.nanoTime();
public static BigDecimal factorial(int x) {
BigDecimal prod = new BigDecimal("1");
for (int i = x; i > 1; i--) {
prod = prod.multiply(new BigDecimal(i));
}
return prod;
}
public static void main(String[] args) {
BigDecimal e = BigDecimal.ONE;
for (int i = 1; i < 1000; i++) {
e = e.add(BigDecimal.valueOf(Math.pow(3 * i, 2) + 1).divide(factorial(3 * i),new MathContext(10000, RoundingMode.HALF_UP)));
}
System.out.println("e = " + e);
long stop = System.nanoTime();
long diff = stop - start1;
System.out.println(diff + " ns");
}
}
The time of execution is about 3055588556 ns.
3 Answers 3
you can split up the calculation in your for
loop into threads and sum up the results afterwards: here an example for 4 Threads: Use as many Threads as you have CPU
s. It's important to add the result inside a synchronized
block, otherwise you result could be corrupted by race conditions. I get a speed up of 50% with 4 threads compared to a single core execution.
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
public class CalculateE {
static long start1 = System.currentTimeMillis();
static BigDecimal result = BigDecimal.ONE;
public static BigDecimal factorial(int x) {
BigDecimal prod = new BigDecimal("1");
for (int i = x; i > 1; i--) {
prod = prod.multiply(new BigDecimal(i));
}
return prod;
}
public static Runnable getRunner(final int from, final int to) {
Runnable runner = new Runnable() {
@Override
public void run() {
BigDecimal e = BigDecimal.ZERO;
for (int i = from; i < to; i++) {
e = e.add(BigDecimal
.valueOf(Math.pow(3 * i, 2) + 1)
.divide(factorial(3 * i),
new MathContext(10000, RoundingMode.HALF_UP)));
}
addResult(e);
}
};
return runner;
}
public static synchronized void addResult(BigDecimal e){
result = result.add(e);
}
public static void main(String[] args) throws InterruptedException {
Runnable r1 = getRunner(1, 251);
Runnable r2 = getRunner(251, 501);
Runnable r3 = getRunner(501, 751);
Runnable r4 = getRunner(751, 1000);
Thread t1 = new Thread(r1);
t1.start();
Thread t2 = new Thread(r2);
t2.start();
Thread t3 = new Thread(r3);
t3.start();
Thread t4 = new Thread(r4);
t4.start();
t1.join();
t2.join();
t3.join();
t4.join();
System.out.println("e = " + result);
long stop = System.currentTimeMillis();
long diff = stop - start1;
System.out.println(diff + " ms");
}
}
-
1\$\begingroup\$ Nice that you say use as many processors as you have. Pitty you don't make it generic by asking the pc his available processors by using :
Runtime.getRuntime().availableProcessors();
and using one AtomicInteger what you use in the 4 threads for counting to the end, so if one thread crash, the other ones shall continue to do the whole work. \$\endgroup\$chillworld– chillworld2014年05月06日 11:52:18 +00:00Commented May 6, 2014 at 11:52 -
\$\begingroup\$ @chillworld why would one thread crash, and why is it ok to continue if it did? \$\endgroup\$Peter Lawrey– Peter Lawrey2014年05月06日 12:43:21 +00:00Commented May 6, 2014 at 12:43
-
\$\begingroup\$ In this example normally there shall not be a crash, but this is a system you can use for every task splitting. The advantage is also that your x threads finish simultane, what is maybe not true when you split it in x equal divisions. \$\endgroup\$chillworld– chillworld2014年05月06日 12:47:56 +00:00Commented May 6, 2014 at 12:47
-
\$\begingroup\$ @chillworld Good point. It is unlikely that the work is the same for all threads. \$\endgroup\$Peter Lawrey– Peter Lawrey2014年05月06日 12:49:14 +00:00Commented May 6, 2014 at 12:49
-
\$\begingroup\$ Sometimes optimising the code gives you more improvement than using more threads. \$\endgroup\$Peter Lawrey– Peter Lawrey2014年05月06日 12:50:49 +00:00Commented May 6, 2014 at 12:50
This is 4.5x faster than the original code.
On my machine, the single threaded version was about the same speed as the multi-threaded version I wrote. The single thread version was much simpler.
import java.math.BigDecimal;
public class CalculateE {
public static void main(String... ignored) {
long start = System.nanoTime();
BigDecimal e = BigDecimal.ONE;
BigDecimal factorial = BigDecimal.ONE;
for (int i = 1; i < 1000; i++) {
long x = i * 3;
factorial = factorial.multiply(BigDecimal.valueOf(x * (x - 1) * (x - 2)));
e = e.add(BigDecimal.valueOf(9L * i * i + 1).divide(factorial, 9200, BigDecimal.ROUND_HALF_UP));
}
long time = System.nanoTime() - start;
System.out.println("e = " + e);
System.out.println("Took " + time / 1e9 + " seconds to calculate");
}
}
prints
e = 2.7.....
Took 0.796560819 seconds to calculate
Note: running the multi-threaded example above reported
2626 ms
A standalone factorial()
function is always a big red flag. Same goes for pow()
. When calculating power series most often they just not needed.
Consider a Horner schema instead (pseudocode below):
In a most straightforward way, without using 3i
trick, the calculation would looks like
e = 1.0
for i = N, i > 0, i -= 1
e = 1.0 + e/i
No factorials, no powers.
If using the trick, I'd first split the formula into two sums:
e = sum(1 / (3i)!) + sum((3i)**2 / (3i)!)
calculate them independently, and add the results. The first schema is a trivial adaptation of the above one; the second one is just a bit more involved:
result = 1.0
for i = 3*N, i > 0, i -= 3
result = i*i * (1.0 + result/(i*(i-1)*(i-2)))
No factorials, no powers. No need for big integers as well.
PS: the biggest problem with the Horner schema is estimating the iteration count for the desired accuracy. Sometimes a heavy math is necessary. In such a simple case like e
it is almost trivial.