Better performance in calculating E

Question 1

Can you give me some performance advice on how to optimize the time of execution of the following calculation of E?

package calculatee;
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
public class CalculateE {
 static long start1 = System.nanoTime();
 public static BigDecimal factorial(int x) {
 BigDecimal prod = new BigDecimal("1");
 for (int i = x; i > 1; i--) {
 prod = prod.multiply(new BigDecimal(i));
 }
 return prod;
 }
 public static void main(String[] args) {
 BigDecimal e = BigDecimal.ONE;
 for (int i = 1; i < 1000; i++) {
 e = e.add(BigDecimal.valueOf(Math.pow(3 * i, 2) + 1).divide(factorial(3 * i),new MathContext(10000, RoundingMode.HALF_UP)));
 }
 System.out.println("e = " + e);
 long stop = System.nanoTime();
 long diff = stop - start1;
 System.out.println(diff + " ns");
 }
}

The time of execution is about 3055588556 ns.

Question 2

you can split up the calculation in your for loop into threads and sum up the results afterwards: here an example for 4 Threads: Use as many Threads as you have CPUs. It's important to add the result inside a synchronized block, otherwise you result could be corrupted by race conditions. I get a speed up of 50% with 4 threads compared to a single core execution.

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
public class CalculateE {
 static long start1 = System.currentTimeMillis();
 static BigDecimal result = BigDecimal.ONE;
 public static BigDecimal factorial(int x) {
 BigDecimal prod = new BigDecimal("1");
 for (int i = x; i > 1; i--) {
 prod = prod.multiply(new BigDecimal(i));
 }
 return prod;
 }
 public static Runnable getRunner(final int from, final int to) {
 Runnable runner = new Runnable() {
 @Override
 public void run() {
 BigDecimal e = BigDecimal.ZERO;
 for (int i = from; i < to; i++) {
 e = e.add(BigDecimal
 .valueOf(Math.pow(3 * i, 2) + 1)
 .divide(factorial(3 * i),
 new MathContext(10000, RoundingMode.HALF_UP)));
 }
 addResult(e);
 }
 };
 return runner;
 }
 public static synchronized void addResult(BigDecimal e){
 result = result.add(e);
 }
 public static void main(String[] args) throws InterruptedException {
 Runnable r1 = getRunner(1, 251);
 Runnable r2 = getRunner(251, 501);
 Runnable r3 = getRunner(501, 751);
 Runnable r4 = getRunner(751, 1000);
 Thread t1 = new Thread(r1);
 t1.start();
 Thread t2 = new Thread(r2);
 t2.start();
 Thread t3 = new Thread(r3);
 t3.start();
 Thread t4 = new Thread(r4);
 t4.start();
 t1.join();
 t2.join();
 t3.join();
 t4.join();
 System.out.println("e = " + result);
 long stop = System.currentTimeMillis();
 long diff = stop - start1;
 System.out.println(diff + " ms");
 }
}

Question 3

Nice that you say use as many processors as you have. Pitty you don't make it generic by asking the pc his available processors by using : Runtime.getRuntime().availableProcessors(); and using one AtomicInteger what you use in the 4 threads for counting to the end, so if one thread crash, the other ones shall continue to do the whole work.

Question 4

@chillworld why would one thread crash, and why is it ok to continue if it did?

Question 5

In this example normally there shall not be a crash, but this is a system you can use for every task splitting. The advantage is also that your x threads finish simultane, what is maybe not true when you split it in x equal divisions.

Question 6

@chillworld Good point. It is unlikely that the work is the same for all threads.

Question 7

Sometimes optimising the code gives you more improvement than using more threads.

Question 8

This is 4.5x faster than the original code.

On my machine, the single threaded version was about the same speed as the multi-threaded version I wrote. The single thread version was much simpler.

import java.math.BigDecimal;
public class CalculateE {
 public static void main(String... ignored) {
 long start = System.nanoTime();
 BigDecimal e = BigDecimal.ONE;
 BigDecimal factorial = BigDecimal.ONE;
 for (int i = 1; i < 1000; i++) {
 long x = i * 3;
 factorial = factorial.multiply(BigDecimal.valueOf(x * (x - 1) * (x - 2)));
 e = e.add(BigDecimal.valueOf(9L * i * i + 1).divide(factorial, 9200, BigDecimal.ROUND_HALF_UP));
 }
 long time = System.nanoTime() - start;
 System.out.println("e = " + e);
 System.out.println("Took " + time / 1e9 + " seconds to calculate");
 }
}

prints

e = 2.7.....
Took 0.796560819 seconds to calculate

Note: running the multi-threaded example above reported

2626 ms

Question 9

A standalone factorial() function is always a big red flag. Same goes for pow(). When calculating power series most often they just not needed.

Consider a Horner schema instead (pseudocode below):

In a most straightforward way, without using 3i trick, the calculation would looks like

e = 1.0
for i = N, i > 0, i -= 1
 e = 1.0 + e/i

No factorials, no powers.

If using the trick, I'd first split the formula into two sums:

e = sum(1 / (3i)!) + sum((3i)**2 / (3i)!)

calculate them independently, and add the results. The first schema is a trivial adaptation of the above one; the second one is just a bit more involved:

result = 1.0
for i = 3*N, i > 0, i -= 3
 result = i*i * (1.0 + result/(i*(i-1)*(i-2)))

No factorials, no powers. No need for big integers as well.

PS: the biggest problem with the Horner schema is estimating the iteration count for the desired accuracy. Sometimes a heavy math is necessary. In such a simple case like e it is almost trivial.

Simulant Simulant 1593 silver badges8 bronze badges · Answer 1 · 2014-05-06 11:20:37Z

you can split up the calculation in your for loop into threads and sum up the results afterwards: here an example for 4 Threads: Use as many Threads as you have CPUs. It's important to add the result inside a synchronized block, otherwise you result could be corrupted by race conditions. I get a speed up of 50% with 4 threads compared to a single core execution.

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
public class CalculateE {
 static long start1 = System.currentTimeMillis();
 static BigDecimal result = BigDecimal.ONE;
 public static BigDecimal factorial(int x) {
 BigDecimal prod = new BigDecimal("1");
 for (int i = x; i > 1; i--) {
 prod = prod.multiply(new BigDecimal(i));
 }
 return prod;
 }
 public static Runnable getRunner(final int from, final int to) {
 Runnable runner = new Runnable() {
 @Override
 public void run() {
 BigDecimal e = BigDecimal.ZERO;
 for (int i = from; i < to; i++) {
 e = e.add(BigDecimal
 .valueOf(Math.pow(3 * i, 2) + 1)
 .divide(factorial(3 * i),
 new MathContext(10000, RoundingMode.HALF_UP)));
 }
 addResult(e);
 }
 };
 return runner;
 }
 public static synchronized void addResult(BigDecimal e){
 result = result.add(e);
 }
 public static void main(String[] args) throws InterruptedException {
 Runnable r1 = getRunner(1, 251);
 Runnable r2 = getRunner(251, 501);
 Runnable r3 = getRunner(501, 751);
 Runnable r4 = getRunner(751, 1000);
 Thread t1 = new Thread(r1);
 t1.start();
 Thread t2 = new Thread(r2);
 t2.start();
 Thread t3 = new Thread(r3);
 t3.start();
 Thread t4 = new Thread(r4);
 t4.start();
 t1.join();
 t2.join();
 t3.join();
 t4.join();
 System.out.println("e = " + result);
 long stop = System.currentTimeMillis();
 long diff = stop - start1;
 System.out.println(diff + " ms");
 }
}

Nice that you say use as many processors as you have. Pitty you don't make it generic by asking the pc his available processors by using : Runtime.getRuntime().availableProcessors(); and using one AtomicInteger what you use in the 4 threads for counting to the end, so if one thread crash, the other ones shall continue to do the whole work.
@chillworld why would one thread crash, and why is it ok to continue if it did?
In this example normally there shall not be a crash, but this is a system you can use for every task splitting. The advantage is also that your x threads finish simultane, what is maybe not true when you split it in x equal divisions.
@chillworld Good point. It is unlikely that the work is the same for all threads.
Sometimes optimising the code gives you more improvement than using more threads.

Peter Lawrey Peter Lawrey 7183 silver badges7 bronze badges · Answer 2 · 2014-05-06 12:41:07Z

This is 4.5x faster than the original code.

On my machine, the single threaded version was about the same speed as the multi-threaded version I wrote. The single thread version was much simpler.

import java.math.BigDecimal;
public class CalculateE {
 public static void main(String... ignored) {
 long start = System.nanoTime();
 BigDecimal e = BigDecimal.ONE;
 BigDecimal factorial = BigDecimal.ONE;
 for (int i = 1; i < 1000; i++) {
 long x = i * 3;
 factorial = factorial.multiply(BigDecimal.valueOf(x * (x - 1) * (x - 2)));
 e = e.add(BigDecimal.valueOf(9L * i * i + 1).divide(factorial, 9200, BigDecimal.ROUND_HALF_UP));
 }
 long time = System.nanoTime() - start;
 System.out.println("e = " + e);
 System.out.println("Took " + time / 1e9 + " seconds to calculate");
 }
}

prints

e = 2.7.....
Took 0.796560819 seconds to calculate

Note: running the multi-threaded example above reported

2626 ms

vnp vnp 58.6k4 gold badges55 silver badges144 bronze badges · Answer 3 · 2014-05-06 19:34:49Z

A standalone factorial() function is always a big red flag. Same goes for pow(). When calculating power series most often they just not needed.

Consider a Horner schema instead (pseudocode below):

In a most straightforward way, without using 3i trick, the calculation would looks like

e = 1.0
for i = N, i > 0, i -= 1
 e = 1.0 + e/i

No factorials, no powers.

If using the trick, I'd first split the formula into two sums:

e = sum(1 / (3i)!) + sum((3i)**2 / (3i)!)

calculate them independently, and add the results. The first schema is a trivial adaptation of the above one; the second one is just a bit more involved:

result = 1.0
for i = 3*N, i > 0, i -= 3
 result = i*i * (1.0 + result/(i*(i-1)*(i-2)))

No factorials, no powers. No need for big integers as well.

PS: the biggest problem with the Horner schema is estimating the iteration count for the desired accuracy. Sometimes a heavy math is necessary. In such a simple case like e it is almost trivial.

Stack Exchange Network

Better performance in calculating E

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Better performance in calculating E

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions