efficiency of c++ arrays vs std::vector and std::array

Question 1

I compared different type of allocating a 1d-array or 2d-array as follows. I found that using new operator is more efficient, maybe std::arrary and std::vector is a object, they are generic and safe but more time? Morever, I do not know why call new outside function is more efficent than that inside function?

#include <iostream>
#include <vector>
#include <array>
#include <ctime>
void test1 () {
int *arr = new int[10000];
for (int i=0; i<10000; ++i) {
 arr[i] = 3;
}
for (int i=0; i<10000; ++i) {
 int a = arr[i];
}
delete arr;
}
void test11 () {
int **arr = new int*[100];
for (int i=0; i<100; ++i) {
 arr[i] = new int[100];
}
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
}
delete [] arr;
}
void test2() {
std::vector<int> arr(10000);
for (int i=0; i<10000; ++i) {
 arr[i] = 3;
}
for (int i=0; i<10000; ++i) {
 int a = arr[i];
}
}
void test22() {
std::vector<std::vector<int> > arr(100, std::vector<int>(100));
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
}
}
void test3(int *arr, int n) {
for (int i=0; i<n; ++i) {
 arr[i] = 3;
}
for (int i=0; i<n; ++i) {
 int a = arr[i];
}
}
void test33(int **arr, int m, int n) {
for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 arr[i][j] = 3;
 }
}
for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 int a = arr[i][j];
 }
}
}
void test4() {
std::array<int, 10000> arr;
for (int i=0; i<10000; ++i) {
 arr[i] = 3;
}
for (int i=0; i<10000; ++i) {
 int a = arr[i];
}
}
void test44() {
std::array<std::array<int, 100>, 100> arr;
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}
for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j)
 int a = arr[i][j];
}
}
int main() {
clock_t start, end;
start = clock();
for (int i=0; i<1000; ++i) {
 test1();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test11();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test2();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test22();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 int *arr = new int[10000];
 test3(arr, 10000);
 delete arr;
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
int **arr = new int*[100];
for (int i=0; i<100; ++i) {
 arr[i] = new int[100];
}
for (int i=0; i<1000; ++i) {
 test33(arr, 100, 100);
}
delete [] arr;
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test4();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
start = clock();
for (int i=0; i<1000; ++i) {
 test44();
}
end = clock();
std::cout << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
}

the output is:

90 ms
80 ms
70 ms
120 ms
50 ms
40 ms
100 ms
190 ms

Thanks for your help, maybe I did not describe my question correctly, I write a function that will be calling many times, this function new a array then delete it:

void fun() {
 int *arr = new int[10000]; //maybe very big
 //todo something else
 delete arr; 
}

someone tell me it's not efficient, because it new and delete every time, now I have two question:

1. what is the correct way to memory management?

int *arr = new int[]; delete arr;
int **arr = new int*[]; delete [] arr;

wrong? maybe like this:

for (int i=0; i<n; ++i){
 delete [] arr;
}
delete arr;

2. what is the best way for me to write this function

Question 2

Before comparing anything, first fix your code: everything you allocate with new[] has to be released with delete[].

Question 3

Also did you compile in release mode with full optimizations?

Question 4

"what is the correct way to memory management?" - using std::vector.

Question 5

@Galik in the comments below he says that these are debug-build benchmarks.

Question 6

I don't think your tests are right. Perhaps you are running in debug mode? I don't see any way that test11() could be faster than test1() (even though it is not freeing all of its memory).

Additionally, in many of the cases above, a release mode compiler would optimize away your code because it doesn't actually do anything:

for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
}

Any compiler will almost certainly eliminate that because 'a' isn't used, which means that 'i' and 'j' in turn aren't used.

for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}

Some compilers could even be good enough to eliminate that code since the memory is never read again, but most probably wouldn't.

I would advise not to worry about any performance overhead with vector vs. new int[]. In debug mode, you'll get free debugging aids (bounds checking) while in release mode, as long as you don't call functions that throw for out-of-range, the code will be essentially identical in performance for practical purposes. Plus, you don't have to worry about memory management (both test1() and test11() are not quite right).

Question 7

Test 1 vs test 11? test 1 should beat test 11. Test 11 has crappy locality and will be cache missing more often, debug build or no. Dynamic 2D arrays suck. Commentary on the naivety looks pretty good. Optimizer will discard most of that code.

Question 8

Check again, test11() is 2D. test1() is a straight single array.

Question 9

Yeah got brain mixed up. Edited, but it looks like you got there too fast.

Question 10

Sorry, my bad. I should've given you time to take it back.

Question 11

Nope. My fault. Should have proofed before posting.

Question 12

Let's do some work to improve the test, and give the standard library a chance by using it properly...

#include <iostream>
#include <vector>
#include <array>
#include <ctime>
#include <memory>
#include <algorithm>
#include <iterator>
void test1 () {
 auto arr = std::make_unique<int[]>(10000);
 std::fill(arr.get(), arr.get() + 10000, 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test11 () {
 auto arr = std::make_unique<std::unique_ptr<int[]>[]>(100);
 for (auto i = 0 ; i < 100 ; ++i) {
 arr[i] = std::make_unique<int[]>(100);
 }
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
 }
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
 }
}
void test2() {
 std::vector<int> arr(10000);
 std::fill(std::begin(arr), std::end(arr), 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test22() {
 std::vector<std::vector<int> > arr(100, std::vector<int>(100));
 std::for_each(begin(arr),
 end(arr),
 [](auto& inner) {
 std::fill(std::begin(inner), std::end(inner), 3);
 });
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
 }
}
void test3(int *arr, int n) {
 std::fill(arr, arr + n, 3);
 for (int i=0; i<n; ++i) {
 int a = arr[i];
 }
}
void test33(const std::unique_ptr<std::unique_ptr<int[]>[]>& arr, int m, int n) {
 for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 arr[i][j] = 3;
 }
 }
 for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 int a = arr[i][j];
 }
 }
}
void test4() {
 std::array<int, 10000> arr;
 std::fill(std::begin(arr), std::end(arr), 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test44() {
 std::array<std::array<int, 100>, 100> arr;
 std::for_each(begin(arr),
 end(arr),
 [](auto& inner) {
 std::fill(std::begin(inner), std::end(inner), 3);
 });
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j)
 int a = arr[i][j];
 }
}
int main() {
 clock_t start, end;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test1();
 }
 end = clock();
 std::cout << "test 1 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test11();
 }
 end = clock();
 std::cout << "test 11 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test2();
 }
 end = clock();
 std::cout << "test 2 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test22();
 }
 end = clock();
 std::cout << "test 22 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 int *arr = new int[10000];
 test3(arr, 10000);
 delete [] arr;
 }
 end = clock();
 std::cout << "test 3 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 auto arr = std::make_unique<std::unique_ptr<int[]>[]>(100);
 for (auto i = 0 ; i < 100 ; ++i) {
 arr[i] = std::make_unique<int[]>(100);
 }
 for (int i=0; i<1000; ++i) {
 test33(arr, 100, 100);
 }
 arr.reset();
 end = clock();
 std::cout << "test 33 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test4();
 }
 end = clock();
 std::cout << "test 4 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test44();
 }
 end = clock();
 std::cout << "test 44 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
}

results on my computer when compiled with -O2:

test 1 0.002 ms
test 11 13.506 ms
test 2 2.753 ms
test 22 13.738 ms
test 3 1.42 ms
test 33 1.552 ms
test 4 0 ms
test 44 0 ms

Let's also note that the arrays are "small" and are being allocated and deallocated repeatedly. If you can re-use the buffers, then the difference in timing will vanish completely.

Also note: test33 is fast because it never reallocates memory - you're re-using the buffer.

Question 13

Isn't int a = arr[i][j]; a prime candidate for elision in an optimized build? Would that explain tests 1, 4 and 44?

Question 14

You can't profile code like this. All of your code is being eliminated. The results, like the poster's, are bogus.

Question 15

@kfsone the optimiser will see through that

Question 16

@RobL understood. What this is really testing is memory allocation.

Question 17

When operating on plain C arrays, compiler is able recognize opportunities to unwind the loops, saving a small amount of iteration overhead. Possibly (unlikely) compiler does not optimize out access loops for STL containers because it does not know whether or not they modify any member variables.

Also my best guess as to why the test11 is outperforming test1 is that it's interweaving multiple iterations of the outer loop (taking advantage of the fact that modern x86/x64 processors are superscalar), IE:

for(int j = 0; j < 100; ++j)
{
 for(int i = 0; i < 100; i+=4)
 {
 arr[i][j] = 3;
 arr[i+1][j] = 3;
 arr[i+2][j] = 3;
 arr[i+4][j] = 3;
 }
}

or possibly something else entirely. compilers can do some very complicated loop transformations to get the every little bit of performance.

Question 18

Truth, but it can't easily fix the problems caused by the scattered memory allocation of the 2D array. The 1D array should demolish the 2D array unless some really sneaky pattern recognition is taking place and giving the 2D array a locality makeover.

Rob L Rob L 2,54016 silver badges23 bronze badges · Answer 1 · 2016-03-12 06:35:57Z

I don't think your tests are right. Perhaps you are running in debug mode? I don't see any way that test11() could be faster than test1() (even though it is not freeing all of its memory).

Additionally, in many of the cases above, a release mode compiler would optimize away your code because it doesn't actually do anything:

for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
}

Any compiler will almost certainly eliminate that because 'a' isn't used, which means that 'i' and 'j' in turn aren't used.

for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
}

Some compilers could even be good enough to eliminate that code since the memory is never read again, but most probably wouldn't.

I would advise not to worry about any performance overhead with vector vs. new int[]. In debug mode, you'll get free debugging aids (bounds checking) while in release mode, as long as you don't call functions that throw for out-of-range, the code will be essentially identical in performance for practical purposes. Plus, you don't have to worry about memory management (both test1() and test11() are not quite right).

Test 1 vs test 11? test 1 should beat test 11. Test 11 has crappy locality and will be cache missing more often, debug build or no. Dynamic 2D arrays suck. Commentary on the naivety looks pretty good. Optimizer will discard most of that code.
Check again, test11() is 2D. test1() is a straight single array.
Yeah got brain mixed up. Edited, but it looks like you got there too fast.

score 1 · Answer 2 · 2016-03-12 17:53:59Z

Let's do some work to improve the test, and give the standard library a chance by using it properly...

#include <iostream>
#include <vector>
#include <array>
#include <ctime>
#include <memory>
#include <algorithm>
#include <iterator>
void test1 () {
 auto arr = std::make_unique<int[]>(10000);
 std::fill(arr.get(), arr.get() + 10000, 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test11 () {
 auto arr = std::make_unique<std::unique_ptr<int[]>[]>(100);
 for (auto i = 0 ; i < 100 ; ++i) {
 arr[i] = std::make_unique<int[]>(100);
 }
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 arr[i][j] = 3;
 }
 }
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
 }
}
void test2() {
 std::vector<int> arr(10000);
 std::fill(std::begin(arr), std::end(arr), 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test22() {
 std::vector<std::vector<int> > arr(100, std::vector<int>(100));
 std::for_each(begin(arr),
 end(arr),
 [](auto& inner) {
 std::fill(std::begin(inner), std::end(inner), 3);
 });
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j) {
 int a = arr[i][j];
 }
 }
}
void test3(int *arr, int n) {
 std::fill(arr, arr + n, 3);
 for (int i=0; i<n; ++i) {
 int a = arr[i];
 }
}
void test33(const std::unique_ptr<std::unique_ptr<int[]>[]>& arr, int m, int n) {
 for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 arr[i][j] = 3;
 }
 }
 for (int i=0; i<m; ++i) {
 for (int j=0; j<n; ++j) {
 int a = arr[i][j];
 }
 }
}
void test4() {
 std::array<int, 10000> arr;
 std::fill(std::begin(arr), std::end(arr), 3);
 for (int i=0; i<10000; ++i) {
 int a = arr[i];
 }
}
void test44() {
 std::array<std::array<int, 100>, 100> arr;
 std::for_each(begin(arr),
 end(arr),
 [](auto& inner) {
 std::fill(std::begin(inner), std::end(inner), 3);
 });
 for (int i=0; i<100; ++i) {
 for (int j=0; j<100; ++j)
 int a = arr[i][j];
 }
}
int main() {
 clock_t start, end;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test1();
 }
 end = clock();
 std::cout << "test 1 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test11();
 }
 end = clock();
 std::cout << "test 11 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test2();
 }
 end = clock();
 std::cout << "test 2 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test22();
 }
 end = clock();
 std::cout << "test 22 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 int *arr = new int[10000];
 test3(arr, 10000);
 delete [] arr;
 }
 end = clock();
 std::cout << "test 3 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 auto arr = std::make_unique<std::unique_ptr<int[]>[]>(100);
 for (auto i = 0 ; i < 100 ; ++i) {
 arr[i] = std::make_unique<int[]>(100);
 }
 for (int i=0; i<1000; ++i) {
 test33(arr, 100, 100);
 }
 arr.reset();
 end = clock();
 std::cout << "test 33 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test4();
 }
 end = clock();
 std::cout << "test 4 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
 start = clock();
 for (int i=0; i<1000; ++i) {
 test44();
 }
 end = clock();
 std::cout << "test 44 " << (double)(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms" << std::endl;
}

results on my computer when compiled with -O2:

test 1 0.002 ms
test 11 13.506 ms
test 2 2.753 ms
test 22 13.738 ms
test 3 1.42 ms
test 33 1.552 ms
test 4 0 ms
test 44 0 ms

Let's also note that the arrays are "small" and are being allocated and deallocated repeatedly. If you can re-use the buffers, then the difference in timing will vanish completely.

Also note: test33 is fast because it never reallocates memory - you're re-using the buffer.

Isn't int a = arr[i][j]; a prime candidate for elision in an optimized build? Would that explain tests 1, 4 and 44?
You can't profile code like this. All of your code is being eliminated. The results, like the poster's, are bogus.
@RobL understood. What this is really testing is memory allocation.

nfries88 nfries88 5207 silver badges8 bronze badges · Answer 3 · 2016-03-12 07:13:20Z

When operating on plain C arrays, compiler is able recognize opportunities to unwind the loops, saving a small amount of iteration overhead. Possibly (unlikely) compiler does not optimize out access loops for STL containers because it does not know whether or not they modify any member variables.

Also my best guess as to why the test11 is outperforming test1 is that it's interweaving multiple iterations of the outer loop (taking advantage of the fact that modern x86/x64 processors are superscalar), IE:

for(int j = 0; j < 100; ++j)
{
 for(int i = 0; i < 100; i+=4)
 {
 arr[i][j] = 3;
 arr[i+1][j] = 3;
 arr[i+2][j] = 3;
 arr[i+4][j] = 3;
 }
}

or possibly something else entirely. compilers can do some very complicated loop transformations to get the every little bit of performance.

Truth, but it can't easily fix the problems caused by the scattered memory allocation of the 2D array. The 1D array should demolish the 2D array unless some really sneaky pattern recognition is taking place and giving the 2D array a locality makeover.

CollectivesTM on Stack Overflow

efficiency of c++ arrays vs std::vector and std::array

3 Answers 3

8 Comments

4 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

3 Answers 3

8 Comments

4 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related