Improve nested loop for bio-statistics calculation

Question 1

I am doing a bio-statistics calculation, and the following code works. Can someone help me improve the messy nested loop?

for(int i=0; i<NN; i++) {
    for (int j=0; j<NN; j++) {
        if (i != j){
            thirdlayer = 0;
            for (int k=0; k<NN; k++) {
                fourthlayer = 0;
                for (int l=0; l<NN; l++) {
                    fourthlayer = fourthlayer + V[j*NN+l]*V[NN+l]*J[k*NN+l];
                }
                thirdlayer = thirdlayer + V[k]*V[i*NN+k]*fourthlayer;
            }
            if(pi_cod[j] != 0)
                Transitions[i*NN +j] = sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[i*NN +j]*thirdlayer/Padt;
        }
    }
}

Question 2

This doesn't look like matrix multiplication. What are V, J, pi_cod, and Transitions? Why is sqrt() involved in matrix multiplication?

Question 3

@200_success: I'd also like to know if this is C or C++. The sqrt() tells me it's the latter (assuming std:: was left out), but looking at the rest of the code, I hope I'm wrong.

Question 4

@Jamal I think it's C.

Question 5

The fourthlayer values don't depend on i, so you could precompute them for every j and k. This should reduce time complexity from O(NN^4) to O(NN^3).

Question 6

@200_success Sorry, I misspoke. It is not matrix multiplication; it's a bio-statistics algorithm.


alexrayne · Answer 1 · 2013-10-11 20:13:33Z

Using a lowercase l as the index is a bad idea because it looks like the digit 1. An uppercase L is better.

Instead of recomputing a product such as j*NN on every iteration, it's better to use a cached index that the for loop advances by NN:

int NN2 = NN*NN;
for(int iNN=k; iNN < (NN2+k); iNN+=NN) {
    thirdlayer = thirdlayer + V[k]*V[iNN]*fourthlayer;
}

This could be a bit faster.

Another trick is to make more use of pointers with array syntax. Before the third-layer loop, take pointers to the rows that the fourth-layer loop processes:

/* int is the type used in the answer as posted; use the arrays' actual element type */
int* VVj = &(V[j*NN]);
int* VNN = &(V[NN]);
for (int k=0; k<NN; k++) {
    int* JNNk = &(J[k*NN]);
    fourthlayer = 0;
    for (int l=0; l<NN; l++) {
        fourthlayer = fourthlayer + VVj[l]*VNN[l]*JNNk[l];
    }
    thirdlayer = thirdlayer + V[k]*V[i*NN+k]*fourthlayer;
}

A good compiler does this for you by itself, but with this decomposition the data dependencies are easier to see, and the code is a bit simpler and shorter.

You can also move the product V[j*NN+l]*V[NN+l] out of the fourth layer into a standalone vector that is prepared once in the outer j loop.
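A rough sketch of that standalone vector, assuming double elements and row-major storage. The function name fourth_layer_for_j, the scratch buffer W, and the output array out are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for a fixed j, stores in out[k] the value that
 * "fourthlayer" reaches for each k. W is caller-provided scratch space
 * of NN doubles. Element types are assumed to be double. */
void fourth_layer_for_j(int NN, const double *V, const double *J,
                        int j, double *W, double *out)
{
    assert(NN >= 2);  /* V[NN + l] is read, so V needs at least 2*NN elements */

    const double *Vj  = &V[(size_t)j * NN];  /* row j of V */
    const double *VNN = &V[NN];              /* the fixed row V[NN + l] */

    /* Prepared once per j instead of once per (j, k). */
    for (int l = 0; l < NN; l++)
        W[l] = Vj[l] * VNN[l];

    for (int k = 0; k < NN; k++) {
        const double *Jk = &J[(size_t)k * NN];  /* row k of J */
        double fourthlayer = 0.0;
        for (int l = 0; l < NN; l++)
            fourthlayer += W[l] * Jk[l];
        out[k] = fourthlayer;
    }
}
```

This removes one multiplication from the innermost loop. Whether it is measurably faster depends on the compiler and on NN, so it is worth timing before and after.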

Instead of the division (xxx)/Padt, use a multiplication by the reciprocal, *(1/Padt), which is faster, or move the division out of the last loop:

double thp = thirdlayer/Padt;
for(int i=0; i<NN; i++) {
    if(i != j && pi_cod[j] != 0)
        Transitions[i*NN +j] = sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[i*NN +j]*thp;
}

Instead of a conditional calculation, it's better to use a conditional assignment, since it can be optimized better on x86:

double thp = thirdlayer/Padt;
for(int i=0; i<NN; i++) {
    /* substitute 1 for a zero pi_cod[j] so the unconditional division is safe */
    double pcodj = (pi_cod[j] != 0)?pi_cod[j]: 1;
    double transition = sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pcodj))*Q[i*NN +j]*thp;
    if(i != j && pi_cod[j] != 0)
        Transitions[i*NN +j] = transition;
}

If skipped assignments are rare, the cost of the extra calculation should be negligible.

Jean-Bernard Pellerin · Answer 2 · 2013-10-08 17:07:30Z

Give this a shot. Though I suspect your compiler might have been doing this already.

for (int j=0; j<NN; j++) {
    thirdlayer = 0;
    for (int k=0; k<NN; k++) {
        fourthlayer = 0;
        for (int l=0; l<NN; l++) {
            fourthlayer = fourthlayer + V[j*NN+l]*V[NN+l]*J[k*NN+l];
        }
        for(int i=0; i<NN; i++) {
            thirdlayer = thirdlayer + V[k]*V[i*NN+k]*fourthlayer;
        }
    }
    for(int i=0; i<NN; i++) {
        if(i != j && pi_cod[j] != 0)
            Transitions[i*NN +j] = sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[i*NN +j]*thirdlayer/Padt;
    }
}

This is what nwellnhof meant. Now there are only three levels of loop nesting.

OK. I looked into that carefully and realized it is not a correct way to refactor the code at all.

technosaurus · Answer 3 · 2013-10-11 05:27:34Z

I removed "layer" from the names third and fourth because it made things annoyingly long, moved some of the initializers into the for() headers, precomputed some x*NN products, changed a = a + ... to a += ..., and moved the fourth-layer body into its for header (hence the trailing semicolon).

for(int i=0, iN=0; i<NN; i++, iN=i*NN) {
    for(int j=0, third=0, jN=0; j<NN; j++, jN=j*NN, third=0) {
        for(int k=0, fourth=0, kN=0; i!=j && k<NN; third += V[k]*V[iN+k]*fourth, k++, kN=k*NN)
            for(int l=0, jN=j*NN; l<NN; fourth+=V[jN+l]*V[NN+l]*J[kN+l], l++);
        if (i!=j && pi_cod[j] != 0)
            Transitions[iN+j]=sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[iN+j]*third/Padt;
    }
}

The loop-initialized fourth cannot be used outside the l loop, so third += ... * fourth fails. Similarly, iN and jN in Transitions[iN+j] also fail, as they hold random values in that scope.

Thanks for editing. But the logic is still not right, as it generates a different result. Also, the running time seems even slower than the original code.
