As a C++ beginner coming from Java, I have become increasingly confused about memory management and how to avoid memory leaks. Is the code below risking a memory leak that I'm not aware of? Any help or constructive feedback would be greatly appreciated.
#pragma once

template <class T>
class DynamicArray {
private:
    T *m_arr;
    int m_length;   //number of elements currently stored in the array
    int m_capacity; //actual size of the array
public:
    DynamicArray();
    ~DynamicArray();
    T get(int index);              //O(1)
    void add(T obj);               //no need to push any objects forward, O(1)
    void insert(int index, T obj); //pushes forward all objects in front of the given index, then sets obj at the given index, O(n)
    void set(int index, T obj);    //sets the given index of m_arr to obj, O(1)
    void remove(int index);        //removes the object at the given index and pushes all the array contents back, O(n)
    int size();                    //O(1)
    void print();
};
#include <iostream>
#include "Header.h"

template<class T>
DynamicArray<T>::DynamicArray() : m_arr(new T[1]), m_length(0), m_capacity(1) {}

template<class T>
DynamicArray<T>::~DynamicArray() {
    delete[] m_arr;
}

template<class T>
T DynamicArray<T>::get(int index) {
    if (index < m_length && index >= 0)
        return m_arr[index];
    else throw ("Index out of bounds!");
}

template<class T>
void DynamicArray<T>::set(int index, T obj) {
    if (index < m_length && index >= 0) {
        m_arr[index] = obj;
    } else throw ("Index out of bounds!");
}

template<class T>
void DynamicArray<T>::add(T obj) {
    if (m_length == m_capacity) {
        T *new_arr = new T[m_length * 2];
        for (int i = 0; i < m_length; i++) {
            new_arr[i] = m_arr[i];
        }
        delete[] m_arr;
        m_arr = new_arr;
        m_capacity = m_capacity * 2;
    }
    m_arr[m_length] = obj;
    m_length++;
}

template<class T>
void DynamicArray<T>::insert(int index, T obj) {
    if (index < m_length && index >= 0) {
        int size;
        if (m_length == m_capacity) size = m_length * 2;
        else size = m_capacity;
        T *new_arr = new T[size];
        for (int i = 0, j = 0; i < m_length; i++, j++) {
            if (i == index) {
                new_arr[j] = obj;
                j++;
            }
            new_arr[j] = m_arr[i];
        }
        delete[] m_arr;
        m_arr = new_arr;
        m_capacity = m_capacity * 2;
        m_length++;
    } else throw ("Index out of bounds!");
}

template<class T>
void DynamicArray<T>::remove(int index) {
    if (index < m_length && index >= 0) {
        T *new_arr = new T[m_capacity];
        for (int i = 0, j = 0; i < m_length; i++, j++) {
            if (i == index) i++;
            if (i < m_length) new_arr[j] = m_arr[i];
        }
        delete[] m_arr;
        m_arr = new_arr;
        m_capacity = m_capacity * 2;
        m_length--;
    } else throw ("Index out of bounds!");
}

template<class T>
int DynamicArray<T>::size() {
    return m_length;
}

template<class T>
void DynamicArray<T>::print() {
    std::cout << m_arr[0];
    for (int i = 1; i < m_length; i++) {
        std::cout << ", " << m_arr[i];
    }
}
Welcome to C++, and welcome to Code Review.
C++ memory management is, as you probably have realized, tough and error-prone. There are many things that can easily go wrong. Assuming that no exception is thrown, I don't see obvious memory leaks in your code; however, there are still some issues worth discussing.
You can take a look at my implementation of a non-resizable dynamic array or a stack-based full-fledged vector for some inspiration.
Special member functions
You have not defined a copy constructor, copy assignment operator, or their move counterparts, so the compiler will synthesize ones that simply copy all the members — which is completely wrong, as the two dynamic arrays will then point to the same memory. Not only are the elements shared between the copies, causing modifications to one array to affect the other, but the two copies will attempt to free the same memory upon destruction, leading to a double-free error, which is way more serious than a memory leak.
Initialization semantics
It is generally expected that the constructor of the element type is called \$n\$ times if \$n\$ elements are pushed into the dynamic array. In your code, however, this is not the case: the number of constructor calls is determined by the capacity of the dynamic array. Elements are first default-initialized, and then copy-assigned to.
The correct way to solve this problem requires allocating an uninitialized buffer, and using placement new (or equivalent features) to construct the elements, which is another can of worms.
Exception safety
Think of what happens when the construction of an element throws an exception — your code will halt halfway, and there will be a memory leak. Resolving this problem would require a manual try block, or standard library facilities like std::uninitialized_copy (which essentially do the same under the hood) if you switched to uninitialized buffers and manual construction.
Move semantics
All of the elements are copied every time, which is wasteful. Make good use of move semantics when appropriate.
Miscellaneous
- Use std::size_t instead of int to store sizes and indexes.1
- get, size, and print should be const. Moreover, get should return a const T&. In fact, get and set would idiomatically be replaced by operator[].
- Don't throw a const char*. Use a dedicated exception class like std::out_of_range instead.
- Manual loops like

      for (int i = 0; i < m_length; i++) {
          new_arr[i] = m_arr[i];
      }

  are better replaced with calls to std::copy (or std::move).
- Re-allocating every time insert is called doesn't seem like a good idea. A better trade-off might be to append an element and then std::rotate it into the correct position (assuming rotation doesn't throw).
- Also, print might take an std::ostream& (or perhaps std::basic_ostream<Char, Traits>&) argument for extra flexibility.

1 As Andreas H. pointed out in the comments, this recommendation is subject to debate, since the use of unsigned arithmetic has its pitfalls. An alternative is to use std::ptrdiff_t and std::ssize (C++20) instead. You can write your own version of ssize as shown on the cppreference page if C++20 is not accessible.
- 1\$\begingroup\$ I agree with everything, but the unsigned size_t for index and size needs a bit more attention. Any operation where index arithmetic is to be performed is going to fail very easily. Think of a convolution loop:

      for (size_t n = 0; n < a.size() + b.size(); n++) {
          out[n] = 0;
          for (size_t m = std::max(0U, a.size() - 1 - n); m < std::min(b.size(), n - 1); m++)
              out[n] += a[n - m] * b[m];
      }

  The min and max expressions are there to prevent out-of-bounds access for arrays a and b. But they don't. The code will in almost all cases produce a segfault. There is no compiler warning. \$\endgroup\$ – Andreas H., Feb 3, 2021 at 8:17
- \$\begingroup\$ TL/DR: I know the standard library uses unsigned for index and size_t, but given the arithmetic rules in C & C++ it is, except for the simple cases, error-prone and actually wrong to use an unsigned index when working with containers. So it is perhaps better to use a signed type for index and also size() in the first place (the latter because otherwise you always get an unsigned/signed comparison warning). \$\endgroup\$ – Andreas H., Feb 3, 2021 at 8:20
- \$\begingroup\$ @AndreasH. I understand your sentiment about the use of unsigned numbers - integral arithmetic in C++ has never been satisfactory. I personally choose to follow the precedent set by the standard library and use unsigned arithmetic to keep consistency and reduce complications due to signedness conversion, as I suppose many would do, so I recommended size_t, but the wrapping semantics does require special treatment, and signed arithmetic is indeed necessary in many scenarios involving negative numbers. The alternative, of course, is to consistently use ptrdiff_t and ssize instead. \$\endgroup\$ – L. F., Feb 3, 2021 at 9:02
- 1\$\begingroup\$ @AndreasH. I edited the answer to reflect our discussion. \$\endgroup\$ – L. F., Feb 3, 2021 at 9:08
- 2\$\begingroup\$ I just found: github.com/ericniebler/stl2/issues/182#issuecomment-287683189 There is a link to a video of a panel discussion where a few "C++ gurus" argue that using unsigned size types was a mistake in the STL in the first place. I am not saying your answer is wrong, I just think the unexplained recommendation to follow the precedent can be dangerous. Otherwise good answer. \$\endgroup\$ – Andreas H., Feb 3, 2021 at 9:10
- The delete in remove looks strange. There is no reason to create new_arr: everything can (and should) be done in place. BTW, m_capacity = m_capacity * 2; there is just wrong (I suspect a copy-paste from insert).
- An if in a tight loop incurs a strong performance penalty. Consider breaking the insertion loop into two:

      for (int i = 0; i < index; i++) {
          new_arr[i] = arr[i];
      }
      new_arr[index] = obj;
      for (int i = index; i < m_length; i++) {
          new_arr[i + 1] = arr[i];
      }

- Which brings us to the next point: no naked loops. The two loops above copy ranges. Factor them into a function:

      copy_range(new_arr, arr, index);
      new_arr[index] = obj;
      copy_range(new_arr + index + 1, arr + index, m_length - index);

  Idiomatically, copy_range is better expressed in terms of iterators rather than indices.
- Consider shrinking the array after too many removals.
Const Correctness
A member function that doesn't modify the underlying object should normally be a const
member function. For example:
T get(int index); //O(1)
This doesn't modify the contents of the DynamicArray, and returns the element by value (not reference), so client code can't get access to the DynamicArray's internal data to modify it either. As such, this should almost certainly be a const member function:
T get(int index) const;
Avoiding Unnecessary Copying
Right now your add, insert, and set member functions all accept T objects by value.
void add(T obj);
void insert(int index, T obj);
void set(int index, T obj);
This means that when you're adding/inserting/setting an element in the DynamicArray, you're starting with one copy of the object in the client code, then copying that object to pass as an argument, then doing a copy assignment to put it into the array. For types that are "cheap" to copy (e.g., int) that's fine, but for types that are expensive to copy (e.g., if somebody creates a DynamicArray<DynamicArray<int>> where each sub-array is several megabytes of data) this would get extremely slow.
To avoid (at least some of) that copying, you can pass the input by reference instead:
void add(T const &obj);
void insert(int index, T const &obj);
void set(int index, T const &obj);
This avoids creating the copy solely for argument passing, so you end up with only two instances of the object: one in the client code and one in the array itself, but not an extra one being passed as the parameter.
Using a reference to a const object allows the reference to refer to a temporary object. For example, assume I have some Complex type that represents a complex number. If I have an expression like myComplex + yourComplex, that will create a temporary object. A reference to a const object can refer to that temporary (but without the const qualifier it can't).
Rule of 0/3/5
In C++, there's kind of a general rule that if you need to implement any of assignment, copy construction, and destruction, you probably need to implement all three to get a class that manages its resources correctly.
You have a destructor, and yes, you really need to do something to deal with copy construction and copy assignment. For example, if you create a copy of a DynamicArray:
DynamicArray<int> foo;
DynamicArray<int> bar = foo;
...the result tends to be on the ugly side (e.g., typically a core dump).
So you generally want to either implement (at least) copy construction and copy assignment, or else prevent code that tries to do it from compiling, usually with something like this:
DynamicArray(DynamicArray const &) = delete;
DynamicArray &operator=(DynamicArray const &) = delete;
Although it's not necessary to get correct (non-crashing) behavior, you may want to support move construction and move assignment as well:
DynamicArray(DynamicArray &&);
DynamicArray &operator=(DynamicArray &&);
So the rule of three covers destruction, copy construction and copy assignment. Adding move assignment and move construction gives the rule of 5.
That leaves the rule of 0: don't do any of this. In many cases you can use something like an std::unique_ptr
to automate handling of the copying/assignment/destruction, so your class doesn't need to do any of the above.
- \$\begingroup\$ Could you elaborate on how I would use std::unique_ptr in this context? I mostly understand how smart pointers work, but not the practical usage of them. \$\endgroup\$ – ethan warco, Feb 2, 2021 at 18:45
- \$\begingroup\$ @ethanwarco: In this case, it's not clear to me that a unique_ptr actually would work at all well (that's why I only mentioned it in passing, but maybe I should have been more explicit). In a typical case, using a std::unique_ptr to your data will result in a type that can be moved but not copied, and in this case, I'd guess you probably want to allow copying. \$\endgroup\$ – Jerry Coffin, Feb 2, 2021 at 19:10
- \$\begingroup\$ @JerryCoffin, you can allow copying on a type that includes a unique_ptr, you just have to implement the copy constructor yourself, which you have to do in this case anyway. \$\endgroup\$ – Jan Hudec, Feb 2, 2021 at 19:23
- \$\begingroup\$ @JanHudec: Yes, you can, but in a case like this, I think you probably lose about as much as you gain. \$\endgroup\$ – Jerry Coffin, Feb 2, 2021 at 21:15
- \$\begingroup\$ @JerryCoffin, what would you lose? A unique_ptr would guard you from a bunch of mistakes, take care of all deleting and some exception safety, and you'd only have to be careful with swapping it when reallocating. \$\endgroup\$ – Jan Hudec, Feb 3, 2021 at 6:06
Your implementations for "insert" and "remove" are very inefficient. First, there is no obvious reason why "remove" would reallocate the array. And there is no reason why "insert" would always reallocate the array. Here's what I would do:
Add a private method where you pass the intended capacity. If the intended capacity is greater than the capacity, it increases the array size. If the intended capacity is much less than the capacity, decrease the array size.
For "add", set the intended capacity to length + 1, then add the element. For "insert", set the intended capacity to length + 1, copy the elements at the end of the vector using std::copy_backward, then store the new element. For "remove", use std::copy to move everything into the right place, then set the intended size to length - 1.
For "insert", you should allow the position (length), with the same effect as "add".
Using std::copy and std::copy_backward means that instead of a for-loop you will run the most optimised code possible to move the array elements.
PS. Apple uses an array implementation that makes inserting/removing array elements both at the beginning and the end of the array cheap (operations at index i take O(min(i, length - i)) steps); this is done by allowing element 0 to be stored anywhere in the array, and allowing the array to wrap around at the end of the storage.
PPS. If you shrink the array when it becomes too small, make sure that you can't run into a situation where inserting one element grows the array, then deleting one element shrinks it, etc. etc.
- \$\begingroup\$ std::vector? Implementing vector well is a fairly non-trivial task, even for people with a lot of experience. \$\endgroup\$