0

I compared different type of allocating a 1d-array or 2d-array as follows. I found that using new operator is more efficient, maybe std::arrary and std::vector is a object, they are generic and safe but more time? Morever, I do not know why call new outside function is more efficent than that inside function?

#include <iostream>
#include <vector>
#include <array>
#include <ctime>
void test1 () {
int *arr = new int[10000];
for (int i=0; i<10000; ++i) {
 arr[i] = 3;
}
for (int i=0; i<10000; ++i) {
 int a = arr[i];
}
delete arr;
}
void test11 () {
int **arr = new int*[100];
for (int i=0; i<100; ++i) {
 arr[i] = new int[100];
}
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
}
delete [] arr;
}
void test2() {
std::vector<int> arr(10000);
for (int i=0; i<10000; ++i) {
 arr[i] = 3;
}
for (int i=0; i<10000; ++i) {
 int a = arr[i];
}
}
void test22() {
std::vector<std::vector<int> > arr(100, std::vector<int>(100));
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
}
}
void test3(int *arr, int n) {
for (int i=0; i<n; ++i) {
 arr[i] = 3;
}
for (int i=0; i<n; ++i) {
 int a = arr[i];
}
}
void test33(int **arr, int m, int n) {
for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 arr[i][j] = 3;
 }
}
for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 int a = arr[i][j];
 }
}
}
void test4() {
std::array<int, 10000> arr;
for (int i=0; i<10000; ++i) {
 arr[i] = 3;
}
for (int i=0; i<10000; ++i) {
 int a = arr[i];
}
}
void test44() {
std::array<std::array<int, 100>, 100> arr;
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j)
 int a = arr[i][j];
}
}
int main() {
clock_t start, end;
start = clock();
for (int i=0; i<1000; ++i) {
 test1();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test11();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test2();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test22();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 int *arr = new int[10000];
 test3(arr, 10000);
 delete arr;
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
int **arr = new int*[100];
for (int i=0; i<100; ++i) {
 arr[i] = new int[100];
}
for (int i=0; i<1000; ++i) {
 test33(arr, 100, 100);
}
delete [] arr;
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test4();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test44();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
}

the output is:

90 ms
80 ms
70 ms
120 ms
50 ms
40 ms
100 ms
190 ms

Thanks for your help, maybe I did not describe my question correctly, I write a function that will be calling many times, this function new a array then delete it:

void fun() {
 int *arr = new int[10000]; //maybe very big
 //todo something else
 delete arr; 
}

someone tell me it's not efficient, because it new and delete every time, now I have two question:

1. what is the correct way to memory management?

int *arr = new int[]; delete arr;
int **arr = new int*[]; delete [] arr;

wrong? maybe like this:

for (int i=0; i<n; ++i){
 delete [] arr;
}
delete arr;

2. what is the best way for me to write this function

kfsone
24.4k3 gold badges46 silver badges78 bronze badges
asked Mar 12, 2016 at 6:21
4
  • 1
    Before comparing anything, first fix your code: everything you allocate with new[] has to be released with delete[]. Commented Mar 12, 2016 at 7:31
  • 1
    Also did you compile in release mode with full optimizations? Commented Mar 12, 2016 at 11:45
  • "what is the correct way to memory management?" - using std::vector. Commented Mar 12, 2016 at 12:23
  • @Galik in the comments below he says that these are debug-build benchmarks. Commented Mar 12, 2016 at 19:48

3 Answers 3

1

I don't think your tests are right. Perhaps you are running in debug mode? I don't see any way that test11() could be faster than test1() (even though it is not freeing all of its memory).

Additionally, in many of the cases above, a release mode compiler would optimize away your code because it doesn't actually do anything:

for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
}

Any compiler will almost certainly eliminate that because 'a' isn't used, which means that 'i' and 'j' in turn aren't used.

for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}

Some compilers could even be good enough to eliminate that code since the memory is never read again, but most probably wouldn't.

I would advise not to worry about any performance overhead with vector vs. new int[]. In debug mode, you'll get free debugging aids (bounds checking) while in release mode, as long as you don't call functions that throw for out-of-range, the code will be essentially identical in performance for practical purposes. Plus, you don't have to worry about memory management (both test1() and test11() are not quite right).

answered Mar 12, 2016 at 6:35

8 Comments

Test 1 vs test 11? test 1 should beat test 11. Test 11 has crappy locality and will be cache missing more often, debug build or no. Dynamic 2D arrays suck. Commentary on the naivety looks pretty good. Optimizer will discard most of that code.
Check again, test11() is 2D. test1() is a straight single array.
Yeah got brain mixed up. Edited, but it looks like you got there too fast.
Sorry, my bad. I should've given you time to take it back.
Nope. My fault. Should have proofed before posting.
|
1

Let's do some work to improve the test, and give the standard library a chance by using it properly...

#include <iostream>
#include <vector>
#include <array>
#include <ctime>
#include <memory>
#include <algorithm>
#include <iterator>
void test1 () {
 auto arr = std::make_unique<int[]>(10000);
 std::fill(arr.get(), arr.get() + 10000, 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test11 () {
 auto arr = std::make_unique<std::unique_ptr<int[]>[]>(100);
 for (auto i = 0 ; i < 100 ; ++i) {
 arr[i] = std::make_unique<int[]>(100);
 }
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
 }
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
 }
}
void test2() {
 std::vector<int> arr(10000);
 std::fill(std::begin(arr), std::end(arr), 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test22() {
 std::vector<std::vector<int> > arr(100, std::vector<int>(100));
 std::for_each(begin(arr),
 end(arr),
 [](auto& inner) {
 std::fill(std::begin(inner), std::end(inner), 3);
 });
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
 }
}
void test3(int *arr, int n) {
 std::fill(arr, arr + n, 3);
 for (int i=0; i<n; ++i) {
 int a = arr[i];
 }
}
void test33(const std::unique_ptr<std::unique_ptr<int[]>[]>& arr, int m, int n) {
 for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 arr[i][j] = 3;
 }
 }
 for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 int a = arr[i][j];
 }
 }
}
void test4() {
 std::array<int, 10000> arr;
 std::fill(std::begin(arr), std::end(arr), 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test44() {
 std::array<std::array<int, 100>, 100> arr;
 std::for_each(begin(arr),
 end(arr),
 [](auto& inner) {
 std::fill(std::begin(inner), std::end(inner), 3);
 });
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j)
 int a = arr[i][j];
 }
}
int main() {
 clock_t start, end;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test1();
 }
 end = clock();
 std::cout << "test 1 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test11();
 }
 end = clock();
 std::cout << "test 11 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test2();
 }
 end = clock();
 std::cout << "test 2 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test22();
 }
 end = clock();
 std::cout << "test 22 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 int *arr = new int[10000];
 test3(arr, 10000);
 delete [] arr;
 }
 end = clock();
 std::cout << "test 3 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 auto arr = std::make_unique<std::unique_ptr<int[]>[]>(100);
 for (auto i = 0 ; i < 100 ; ++i) {
 arr[i] = std::make_unique<int[]>(100);
 }
 for (int i=0; i<1000; ++i) {
 test33(arr, 100, 100);
 }
 arr.reset();
 end = clock();
 std::cout << "test 33 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test4();
 }
 end = clock();
 std::cout << "test 4 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test44();
 }
 end = clock();
 std::cout << "test 44 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
}

results on my computer when compiled with -O2:

test 1 0.002 ms
test 11 13.506 ms
test 2 2.753 ms
test 22 13.738 ms
test 3 1.42 ms
test 33 1.552 ms
test 4 0 ms
test 44 0 ms

Let's also note that the arrays are "small" and are being allocated and deallocated repeatedly. If you can re-use the buffers, then the difference in timing will vanish completely.

Also note: test33 is fast because it never reallocates memory - you're re-using the buffer.

answered Mar 12, 2016 at 17:53

4 Comments

Isn't int a = arr[i][j]; a prime candidate for elision in an optimized build? Would that explain tests 1, 4 and 44?
You can't profile code like this. All of your code is being eliminated. The results, like the poster's, are bogus.
@kfsone the optimiser will see through that
@RobL understood. What this is really testing is memory allocation.
0

When operating on plain C arrays, compiler is able recognize opportunities to unwind the loops, saving a small amount of iteration overhead. Possibly (unlikely) compiler does not optimize out access loops for STL containers because it does not know whether or not they modify any member variables.

Also my best guess as to why the test11 is outperforming test1 is that it's interweaving multiple iterations of the outer loop (taking advantage of the fact that modern x86/x64 processors are superscalar), IE:

for(int j = 0; j < 100; ++j)
{
 for(int i = 0; i < 100; i+=4)
 {
 arr[i][j] = 3;
 arr[i+1][j] = 3;
 arr[i+2][j] = 3;
 arr[i+4][j] = 3;
 }
}

or possibly something else entirely. compilers can do some very complicated loop transformations to get the every little bit of performance.

answered Mar 12, 2016 at 7:13

1 Comment

Truth, but it can't easily fix the problems caused by the scattered memory allocation of the 2D array. The 1D array should demolish the 2D array unless some really sneaky pattern recognition is taking place and giving the 2D array a locality makeover.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.