I implemented a basic dynamic array in C++.
When deleting an element and shifting the others, the last element ends up duplicated (the original value still exists after the copy). I want to avoid leaking memory in that last slot; currently I handle it by shrinking the array once after a number of deletions. Are there better solutions?
Finally, I would also appreciate general feedback on this implementation.
#include <iostream>
#include <stdexcept>
using namespace std;

class DynamicArray {
protected:
    int size;
    int* arr;
    int length;

    void grow() {
        copyToNewSize(size * 2);
    }

    void shrink() {
        copyToNewSize(size / 2);
    }

    void copyToNewSize(int newSize) {
        int* temp = new int[newSize];
        if (newSize > size)
            for (int i = 0; i < size; i++) {
                temp[i] = arr[i];
            }
        else
            for (int i = 0; i < newSize; i++) {
                temp[i] = arr[i];
            }
        delete[] arr;
        size = newSize;
        arr = temp;
    }

    void checkIndex(int index) {
        if (index >= length || index < 0)
            throw "Index is out of the array range!";
    }

public:
    DynamicArray(int startingSize) {
        size = startingSize;
        arr = new int[size];
        length = 0;
    }

    ~DynamicArray() {
        delete[] arr;
    }

    int push(int value) {
        length++;
        if (length == (size + 1))
            grow();
        arr[length - 1] = value;
        return length;
    }

    int pop() {
        length--;
        int value = arr[length];
        // memory leaking, delete arr[length]
        if (length <= size / 2)
            shrink();
        return value;
    }

    int insert(int index, int value) {
        checkIndex(index);
        length++;
        if (length == (size + 1))
            grow();
        for (int i = length - 1; i > index; i--) {
            arr[i] = arr[i - 1];
        }
        arr[index] = value;
        return length;
    }

    int remove(int index) {
        checkIndex(index);
        int value = arr[index];
        length--;
        for (int i = index; i < length; i++)
            arr[i] = arr[i + 1];
        // memory leaking, delete arr[length]
        if (length <= (size / 2))
            shrink();
        return value;
    }

    int get(int index) {
        checkIndex(index);
        return arr[index];
    }

    void set(int index, int value) {
        checkIndex(index);
        arr[index] = value;
    }

    int search(int value) {
        for (int i = 0; i < length; i++) {
            if (arr[i] == value)
                return i;
        }
        return -1;
    }

    int getLength() {
        return length;
    }
};
3 Answers
First of all, there is a major problem with your class: it does not respect the rule of three/five/zero:
- it has a non-trivial destructor that frees memory allocated in the constructor
- it neither defines nor deletes the copy constructor and the copy assignment operator

Let us look at the following code:
    DynamicArray dynarr(16);
    // initialize values in dynarr
    if (...) { // opens a nested block
        DynamicArray dynarr2 = dynarr; // copy construct the new DynamicArray (1)
        // use it
    } // dynarr2 goes out of scope (2)
At (1), the compiler-generated copy constructor will blindly copy the members, making dynarr and dynarr2 share their inner int array.
At (2), the destructor for dynarr2 will delete the shared inner array: ZAP!... dynarr now has a dangling pointer, and Undefined Behaviour is ensured from there on.
There is another possible error in size management. You consistently divide it by 2 when removing elements and multiply it by 2 when adding. Fine. Except that if you let the size reach 0 by removing the last element of a DynamicArray, 0 * 2 is still 0, so the size will be permanently stuck at 0! You must handle that corner case by preventing the size from reaching 0 in shrink.
What follows are only minor possible improvements.
- Your code uses using namespace std;. This is not an error, because it is allowed by the language, and it is even common in many tutorials. It is a nice trick that allows you to use all the goodies from the standard library without the std:: prefix. But it does import a large number of unneeded symbols into the current namespace, somewhat defeating the whole namespace concept. In production code, best practices recommend never using using namespace std, and instead importing only the relevant symbols. Only a cosmetic improvement.
- In checkIndex, you use throw "Index is out of the array range!";. This actually throws a pointer to a string literal. Here again it is not an error, since it is allowed by the language, but best practices recommend throwing only objects of (subclasses of) the std::exception class. Users of your class will probably not expect to have to catch a char pointer. You had better use an out_of_range or a runtime_error exception if you think it is not worth declaring a custom exception.
- You compare size and newSize in copyToNewSize to know how many elements should be copied. But in fact you only have to copy length elements... This one is only a tiny optimization.
For your initial question, there is nothing bad in only shrinking after a number of removals, because a shrink involves a reallocation and a copy of the elements, so it is an expensive operation. For that reason, you should even consider not shrinking below a threshold (for example 8 elements). As a bonus, that directly prevents size from reaching 0, and you could even make that minimal size an optional parameter of your constructor.
- "and you could even make that minimal size an optional parameter of your constructor." Then you'd have to store it somewhere. That sounds unlikely to be good, although if you're implementing this for real (rather than as a learning exercise), then making it different from std::vector is a good idea; otherwise you'd just use std::vector. I'd be more inclined to make a tuning option like that an optional template parameter, or possibly a static member variable. – Peter Cordes, Sep 21, 2021 at 3:15
- @PeterCordes: I generally avoid integer template parameters, because they are a nightmare in a library: either you put the full definition in headers, or you have to instantiate a number of concrete classes and users will be restricted to them. I have never found a correct implementation of multidimensional arrays in C++ because of that (inner dimensions can only be constexpr). – Serge Ballesta, Sep 21, 2021 at 7:29
Consider using size_t instead of int. The actual type of size_t is platform-dependent; a common mistake is to assume size_t is the same as unsigned int, which can lead to problems, particularly as 64-bit architectures become more prevalent.

It is generally bad practice to repeat code that does the same thing. Consider changing copyToNewSize to something like this:
    void copyToNewSize(size_t newSize) {
        int* temp = new int[newSize];
        size_t copySize = std::min(size, newSize);
        for (size_t i = 0; i < copySize; i++) {
            temp[i] = arr[i];
        }
        delete[] arr;
        size = newSize;
        arr = temp;
    }
- I would suggest that you rename size to allocatedSize, to avoid confusion between size and length.
- 64-bit architectures are already widespread; what's still left to change is that real-world memory capacities continue to grow, so it's becoming more realistic (and maybe less rare in real-world usage) to actually have arrays that large. – Peter Cordes, Sep 21, 2021 at 3:17
- Also, std::copy exists; it can be a good idea to use it instead of open-coding a copy loop. – Peter Cordes, Sep 21, 2021 at 3:19
- #pragma omp parallel for on a copy loop? Yeah, could be for very large copies (like maybe more than 4GiB), on systems like a big Xeon where a single core can't saturate memory-controller bandwidth. (Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?) If you have so many large copies to do that it's worth considering threads just for that, though, probably algorithmic optimizations to avoid copying so much will be even better. Or see my comments on mdfst13's answer re: Linux mremap to avoid physical copying. – Peter Cordes, Sep 21, 2021 at 12:02
- (Or if other cores can be doing something useful while some are just copying, that might be good for overall throughput, unless multiple other threads are waiting for this copy to complete. On some CPUs there are diminishing returns for using even more threads to max out copy bandwidth, although on current Skylake-Xeon it's fairly linear for quite a few cores, because the per-core memory bandwidth is so small, compared to a desktop and also as a fraction of the available aggregate memory-controller bandwidth.) – Peter Cordes, Sep 21, 2021 at 12:05
- You forgot to add any compiler flags on TIO, so that was gcc -O0. If you just meant to link the code but not try to run it on the server, godbolt.org/z/jW91snGdj lets you look at the asm (using gcc -O3 -fopenmp; without -fopenmp, #pragma omp does nothing). – Peter Cordes, Sep 21, 2021 at 16:20
Don't talk like a pirate
I would prefer the name array (another reason not to use using namespace std;) over arr. And I'd prefer a more descriptive name like data or numbers to either.
Don't waste work
    int push(int value) {
        length++;
        if (length == (size + 1))
            grow();
        data[length - 1] = value;
        return length;
    }
You add and subtract three times. Consider
    int push(int value) {
        if (length == size) {
            grow();
        }
        data[length] = value;
        length++;
        return length;
    }
That adds once (the increment).
In general, I would avoid the single-statement (braceless) form of control structures. That can lead to odd, hard-to-find bugs. Worse, you use it inconsistently, which makes the code harder to read.
Fragile code
    if (length <= size / 2)
        shrink();
and
    void shrink() {
        copyToNewSize(size / 2);
    }
Combined, these are fragile. To work properly, you need size / 2 to evaluate to the same value in both places. Consider
    size_t bound = size / 2;
    if (length <= bound) {
        copyToNewSize(bound);
    }
Now it has to be the same in both places. Another alternative
    void shrinkIfNecessary() {
        size_t bound = size / 2;
        if (length <= bound) {
            copyToNewSize(bound);
        }
    }
Then replace the if and the call to shrink with shrinkIfNecessary.
In general, you should try to avoid parallel logic. If you need the same logic in two places, try to abstract it out into at least a variable (e.g. bound here) or a function (e.g. shrinkIfNecessary).
It's natural to think that grow should have a matching shrink. But here, the shrinking operation is triggered differently; they aren't mirror images of each other. You have to grow any time you exceed the bounds, but you don't need to shrink every time. So it actually makes sense to have grow and shrinkIfNecessary instead.
You also might consider moving the 2 to a named constant, e.g. GROWTH_FACTOR. But that will work fine even if it is inconsistent between growing and shrinking, so it's less important.
std::copy
    if (newSize > size)
        for (int i = 0; i < size; i++) {
            temp[i] = arr[i];
        }
    else
        for (int i = 0; i < newSize; i++) {
            temp[i] = arr[i];
        }
Here, you copy manually. But you don't need to: C++ has std::copy. So you could just say
    std::copy(arr, arr + std::min(size, newSize), temp);
Using std::min this way was already suggested in another answer here.
Note that std::copy requires #include <algorithm>.
This would get around the entire question of int vs. size_t vs. auto with regard to the loop index variables. No more loops means no more loop indexes.
The C++ way
C++ also has std::vector, which would get around the entire class.
The C way
If you prefer to reinvent the wheel, you might consider whether you want to go all the way and return to C memory management. Why C? Because in C you have realloc, which can make arrays smaller or larger. Usually, when making an array smaller, it will reuse the same memory and free just what is at the end. C++ doesn't have that capability (except in that C++ has access to the C routines). Of course, if you do that, you want to stop using new and delete in favor of calloc/malloc and free.
The realloc function also handles the copying for you if necessary. But it won't always be necessary, because realloc will usually reuse the existing array when making it smaller. And it will sometimes reuse the existing array when making it bigger as well.
Duplicates
When deleting an element and shifting the others, the last element ends up duplicated (the original value still exists after the copy).
In one sense, this is harmless. If you track the length correctly and initialize memory before use, the stale duplicate may well exist, but you'd never observe it.
If you are concerned about leaving traces in your program, note that realloc can make an array smaller efficiently enough that you could call it on every pop or remove. However, you'd still likely prefer to grow less frequently, because at least some of the time it would cause the array to need to be copied.
You might also consider manually clearing the entry. E.g. in remove:

    data[length] = 0;
That's the kind of thing you'd do if the data were sensitive for some reason.
- std::vector unfortunately can't use realloc, because it has to work on non-trivially-copyable types (i.e. whose object representation may need to change when copied to a new address, or just whose copy constructor has a side effect such as debug logging). But even more unfortunately, it isn't easy even in a template specialization for trivially-copyable types: C++ has replaceable new, and there's a guarantee that if std::vector allocates, it's via calls to new, so user-defined new will run. This replacement might be in a different compilation unit, so you can't template on it. – Peter Cordes, Sep 21, 2021 at 10:16
- IDK why C++ has refused to add a try_realloc allocator interface that could avoid copying, but it makes std::vector pretty garbage for huge arrays, especially compared to what it could be doing on Linux, where the mremap(2) system call does this at page granularity, trying to extend an existing mapping. And with MREMAP_MAYMOVE it can always avoid copying, by remapping the existing physical pages to the start of a larger extent of virtual address space if there wasn't room to grow at the original spot. (Still TLB misses, though.) – Peter Cordes, Sep 21, 2021 at 10:22
- So basically C++ requires std::vector to suck at clever reallocation optimizations, although if an implementation could detect that new wasn't replaced (or was promised via a command-line arg), the as-if rule would let std::vector use smarter realloc. Howard Hinnant said on SO that his attempts to improve things for C++11 didn't get traction. IMO adding a try_realloc interface to the language would have been easy; implementations could always define it as new + copy + delete, or return false. – Peter Cordes, Sep 21, 2021 at 10:29
- So yeah, writing your own container that used realloc or even mmap/mremap (intended for large allocations where bypassing a free-list makes sense) could be a case where it's actually interesting for real use, not just as a learning exercise. – Peter Cordes, Sep 21, 2021 at 10:31
- @PeterCordes std::vector is a template (and so is std::allocator), and templates can have specialisations. std::vector<TriviallyCopyable> can use realloc. – Caleth, Sep 21, 2021 at 12:39