I am doing a bio-statistics calculation and the following code works. However, can someone help to improve the messy nested loop?
for(int i=0; i<NN; i++) {
for (int j=0; j<NN; j++) {
if (i != j){
thirdlayer = 0;
for (int k=0; k<NN; k++) {
fourthlayer = 0;
for (int l=0; l<NN; l++) {
fourthlayer = fourthlayer + V[j*NN+l]*V[NN+l]*J[k*NN+l];
}
thirdlayer = thirdlayer + V[k]*V[i*NN+k]*fourthlayer;
}
if(pi_cod[j] != 0)
Transitions[i*NN +j] = sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[i*NN +j]*thirdlayer/Padt;
}
}
}
3 Answers 3
Using small l
for the index is bad because it looks like the digit 1. It's better to use large L
.
Instead of j*NN
, it's better to use a cached index that increments by for
:
int NN2 = NN*NN
for(int iNN=k; iNN < (NN2+k); iNN+=NN) {
thirdlayer = thirdlayer + V[k]*V[iNN]*fourthlayer;
}
This could be a bit faster.
Another hook - more use pointer as array sintax: for 3layer better get a row vector in wich 4layer for process:
int* VVj = &(V[j*NN]);
int* VNN = &(V[NN]);
for (int k=0; k<NN; k++) {
int* JNNk = &(J[k*NN]);
fourthlayer = 0;
for (int l=0; l<NN; l++) {
fourthlayer = fourthlayer + VVj[l]*VNN[l]*JNNk[l];
}
thirdlayer = thirdlayer + V[k]*V[i*NN+k]*fourthlayer;
}
good compiler do it for you byself, but in such decomposition may better see data dependents, and it is a bit simpler and short
also you can deploy from 4layer V[j*NN+l]*V[NN+l] into stanalone vector that can be prepared in outer j cycle.
Instead of division (xxx)/Padt
(better=faster), use multiplication *(1/Padt)
, or move out from the last cycle:
double thp = thirdlayer/Padt;
for(int i=0; i<NN; i++) {
if(i != j && pi_cod[j] != 0)
Transitions[i*NN +j] = sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[i*NN +j]*thp;
}
Instead of a conditional calculation, it's better to use a conditional assignment since it can better optimized for x86:
double thp = thirdlayer/Padt;
for(int i=0; i<NN; i++) {
int pcodj = (pi_cod[j] != 0)?pi_cod[j]: 1;
double transition = sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[i*NN +j]*thp;
if(i != j && pi_cod[j] != 0)
Transitions[i*NN +j] = transition;
}
If it's rare misses for assignment, so penalty for calculation could be negligible.
Give this a shot. Though I suspect your compiler might have been doing this already.
for (int j=0; j<NN; j++) {
thirdlayer = 0;
for (int k=0; k<NN; k++) {
fourthlayer = 0;
for (int l=0; l<NN; l++) {
fourthlayer = fourthlayer + V[j*NN+l]*V[NN+l]*J[k*NN+l];
}
for(int i=0; i<NN; i++) {
thirdlayer = thirdlayer + V[k]*V[i*NN+k]*fourthlayer;
}
}
for(int i=0; i<NN; i++) {
if(i != j && pi_cod[j] != 0)
Transitions[i*NN +j] = sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[i*NN +j]*thirdlayer/Padt;
}
}
This is what nwellnhof meant. Now there are only 3 levels of nesting loops.
-
\$\begingroup\$ OK. I carefully looked into that and realized it was not correct way to re-factory the code at all \$\endgroup\$Adam Q– Adam Q2013年10月11日 00:21:45 +00:00Commented Oct 11, 2013 at 0:21
removed the layer from 3rd and 4th because it made things annoyingly long, but moved some of the initializers into the for() and precomputed some x*NN, changed the a = a +... to a+=... and moved the 4 layer into its for loop (thus the trailing semicolon)
for(int i=0, iN=0; i<NN; i++, iN=i*NN) {
for(int j=0, third=0, jN=0; j<NN; j++, jN=j*NN, third=0) {
for(int k=0, fourth=0, kN=0; i!=j && k<NN; third += V[k]*V[iN+k]*fourth, k++, kN=k*NN)
for(int l=0, jN=j*NN; l<NN; fourth+=V[jN+l]*V[NN+l]*J[kN+l], l++);
if (i!=j && pi_cod[j] != 0)
Transitions[iN+j]=sqrt(pi_cod[i]*pi_cod[1]/(pi_cod[0]*pi_cod[j]))*Q[iN+j]*third/Padt;
}
}
-
\$\begingroup\$ the loop initialized fourth cannot be used outside
l
loop, sothird+=..*fourth
fails. Similarly,iN
andjN
inTransitions[iN+j]
also fails as they are random number in this scope. \$\endgroup\$Adam Q– Adam Q2013年10月11日 17:16:40 +00:00Commented Oct 11, 2013 at 17:16 -
\$\begingroup\$ Thanks for editing. But the logical is still not right as it generates different result. Also, the running time seems even slower than original code \$\endgroup\$Adam Q– Adam Q2013年10月11日 20:25:20 +00:00Commented Oct 11, 2013 at 20:25
V
,J
,pi_cod
, andTransitions
? Why issqrt()
involved in matrix multiplication? \$\endgroup\$sqrt()
tells me it's the latter (assumingstd::
was left out), but looking at the rest of the code, I hope I'm wrong. \$\endgroup\$C
. \$\endgroup\$fourthlayer
values don't depend oni
, so you could precompute them for everyj
andk
. This should reduce time complexity fromO(NN^4)
toO(NN^3)
. \$\endgroup\$