Efficient implementation of a Dynamic Programming challenge

Question 1

I wrote the following code for a Dynamic Programming problem but the online judge gives me a 'Time Limit Exceeded' error. How can I optimize this code to make it more efficient, especially the part of initializing the DP array to 0.0 (I am using memset() right now)?

Problem:

There are N values in an array. Two players take turn by turn to play the game.
Player1, with 50% probability can pick the left most value or the right most value. 3.If there is only one value left in the array, the player, having no more choice, will just pick up that value.
Question is to calculate the expected sum of values that First player can obtain.
\$T <= 500\$ and \$N <= 2000\$

#include<iostream>
#include<iomanip>
#include<cstdio>
#include<cstring>
using namespace std;
#define max 2000
int main(){
 int T;
 double **dp = new double*[max];
 for(int i=0; i<max; ++i)
 dp[i] = new double[max];
 cin >> T;
 while(T--){
 int N;
 cin >> N;
 int V[N];
 for(int i=0; i<N; ++i)
 cin >> V[i] ;
 for(int i=0; i<N; ++i)
 memset(dp[i], 0.0, sizeof(dp[0][0])*N);
 for(int i=0; i<N; ++i)
 dp[i][i] = V[i];
 for(int j=1; j<N; ++j)
 for(int i=j-1; i>=0; --i){
 dp[j][i] = 0.5 * ( dp[i][i] + dp[j][j] + dp[j-1][i+1] );
 if( (i+2)<N )
 dp[j][i] += (0.250 * dp[j][i+2]);
 if( (j-2)>=0 )
 dp[j][i] += (0.250 * dp[j-2][i]);
 }
 cout << setprecision (5) << dp[N-1][0] << endl;
 }
 for(int i=0; i<max; ++i)
 delete[] dp[i];
 delete dp;
 return 0;
}

Question 2

Is there a public url that you could point us to for information on the problem?

Question 3

Glenn: I've updated the Question.

Question 4

What is the time limit? Do you have any idea how much you're failing it by? That should help indicate if another approach is required.

Question 5

sadly, I don't get to know the time when my code gets interuppted but for this question they give 4sec on their servers for K<=500 and N<=2000. I too guess theres' nothing much left to optimize so I've to come up with something better than O(N^2). like- O(NlogN) or O(N). Thanks for all your help guys.

Question 6

It looks like the nested loops of your algorithm are using only three rows of the dp matrix:

dp[j]
dp[j-1]
dp[j-2]

You can use this observation to transform your solution from one requiring O(N^2) space to one requiring O(N) space by changing

double **dp = new double*[max];

to

double dp[3][max]; // Now your matrix has a chance of fitting on the stack

and adding %3 to all operations that touch the first index of the dp matrix.

This may or may not eliminate the timeout, but it will make your program better by reducing the amount of resources that it needs to operate.

Question 7

Thanks for suggesting a memory efficient approach. I've updated my code but it still goes over the allowed Time Limit!

Question 8

@srbh.kmr Does it produce the correct output? Can you switch from double to float?

Question 9

I think so. Yes, it never gives me a 'Wrong answer' just runs out of time when submitted. and yes, I can change double to float but what difference would it make specifically? i wonder.

Question 10

@srbh.kmr Changing doubles to floats will reduce the amount of data needed for computation in half, improving the chance of hitting the cache more often.

Question 11

Oh. Right. I was under the wrong impression that both double and float are 4Bytes.

Question 12

There are some things that I can think of:

Is it quicker to deal with one blob of size max*max, and replacing dp[i][j] with dp[i+j*N]?
Are you sure that your values are doubles, or could they be 32, 16, or even 8-bit integers? - less space to alloc/memset. Similarly, is your 2000 limit a specified one or one "large enough"?
I'm not convinced that you actually need memset at all (You'll need to guard the dp[j-1][i+1] term which reaches below the diagonal for the first cells)! Because you're always refering to values to the 'bottom left' of the current cell, you never access values generated by the previous run...
Also, because you "shouldn't" access values below the diagonal, all the dp[i] lines are actually (N - i) in length, so less to alloc/zero. You will though need to make other changes to the indices in the loops (and especially guard the dp[j-1][i+1] term).

And although it won't actually save any time, you can 'squash' a couple of loops together.

for(int i=0; i<N; ++i) {
 int V;
 cin >> V;
 memset( dp[i], 0, sizeof(dp[0][0])*N );
 dp[i][i] = V;
}

And don't forget that the final delete should be delete[] dp;!

Question 13

Thanks for your review. Points noted but yes, result has to be a double. and T<=500 and N<=2000 is specified. and I need to do memset in order to zero out the dp[][] array for every new testcases. I know I'm allocating almost twice as many elements as I need to, but please take a look at my updated code. It still gives me 'Time Limit Exceeded' error!

Question 14

Actually, yes I can see I don't need that memset at all. Thanks,

score 2 · Answer 1 · 2012-10-03 14:31:37Z

2

\$\begingroup\$

It looks like the nested loops of your algorithm are using only three rows of the dp matrix:

dp[j]
dp[j-1]
dp[j-2]

You can use this observation to transform your solution from one requiring O(N^2) space to one requiring O(N) space by changing

double **dp = new double*[max];

to

double dp[3][max]; // Now your matrix has a chance of fitting on the stack

and adding %3 to all operations that touch the first index of the dp matrix.

This may or may not eliminate the timeout, but it will make your program better by reducing the amount of resources that it needs to operate.

Share

answered Oct 3, 2012 at 14:31

Sergey Kalinichenko's user avatar

Sergey Kalinichenko Sergey Kalinichenko

5221 gold badge7 silver badges10 bronze badges

\$\endgroup\$

6

\$\begingroup\$ Thanks for suggesting a memory efficient approach. I've updated my code but it still goes over the allowed Time Limit! \$\endgroup\$

srbh.kmr
– srbh.kmr

2012年10月03日 17:09:22 +00:00
Commented Oct 3, 2012 at 17:09
\$\begingroup\$ @srbh.kmr Does it produce the correct output? Can you switch from double to float? \$\endgroup\$

Sergey Kalinichenko
– Sergey Kalinichenko

2012年10月03日 17:11:06 +00:00
Commented Oct 3, 2012 at 17:11
\$\begingroup\$ I think so. Yes, it never gives me a 'Wrong answer' just runs out of time when submitted. and yes, I can change double to float but what difference would it make specifically? i wonder. \$\endgroup\$

srbh.kmr
– srbh.kmr

2012年10月03日 17:15:02 +00:00
Commented Oct 3, 2012 at 17:15
\$\begingroup\$ @srbh.kmr Changing doubles to floats will reduce the amount of data needed for computation in half, improving the chance of hitting the cache more often. \$\endgroup\$

Sergey Kalinichenko
– Sergey Kalinichenko

2012年10月03日 17:26:50 +00:00
Commented Oct 3, 2012 at 17:26
\$\begingroup\$ Oh. Right. I was under the wrong impression that both double and float are 4Bytes. \$\endgroup\$

srbh.kmr
– srbh.kmr

2012年10月03日 17:33:34 +00:00
Commented Oct 3, 2012 at 17:33

| Show 1 more comment

Glenn Rogers Glenn Rogers 2,10114 silver badges18 bronze badges · Answer 2 · 2012-10-03 15:43:20Z

There are some things that I can think of:

Is it quicker to deal with one blob of size max*max, and replacing dp[i][j] with dp[i+j*N]?
Are you sure that your values are doubles, or could they be 32, 16, or even 8-bit integers? - less space to alloc/memset. Similarly, is your 2000 limit a specified one or one "large enough"?
I'm not convinced that you actually need memset at all (You'll need to guard the dp[j-1][i+1] term which reaches below the diagonal for the first cells)! Because you're always refering to values to the 'bottom left' of the current cell, you never access values generated by the previous run...
Also, because you "shouldn't" access values below the diagonal, all the dp[i] lines are actually (N - i) in length, so less to alloc/zero. You will though need to make other changes to the indices in the loops (and especially guard the dp[j-1][i+1] term).

And although it won't actually save any time, you can 'squash' a couple of loops together.

for(int i=0; i<N; ++i) {
 int V;
 cin >> V;
 memset( dp[i], 0, sizeof(dp[0][0])*N );
 dp[i][i] = V;
}

And don't forget that the final delete should be delete[] dp;!

Thanks for your review. Points noted but yes, result has to be a double. and T<=500 and N<=2000 is specified. and I need to do memset in order to zero out the dp[][] array for every new testcases. I know I'm allocating almost twice as many elements as I need to, but please take a look at my updated code. It still gives me 'Time Limit Exceeded' error!
Actually, yes I can see I don't need that memset at all. Thanks,

Stack Exchange Network

Efficient implementation of a Dynamic Programming challenge

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Efficient implementation of a Dynamic Programming challenge

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions