Modern Vector implementation

Question 1

Please give me some feedback on my attempt at a vector. I wanted this to be the lowest level concept of a vector that I'm using, so I'm primarily concerned with speed, and I would build wrappers afterward for safety when its needed. This is why I'm not using the copy and swap idiom for the copy ctor. Anyway please critique and give explanations for changes where it can be improved for my goal.

vector.h:

#ifndef VECTOR_H
#define VECTOR_H
template<typename T>
class Vector
{
public:
 typedef T* iterator;
 typedef const T* const_iterator;
 Vector();
 Vector(unsigned int size);
 Vector(unsigned int size, const T& initial);
 Vector(const Vector<T>& v);
 Vector(Vector<T> && v);
 ~Vector();
 Vector<T>& operator =(Vector<T> const& v);
 Vector<T>& operator =(Vector<T> && v); // move assignment
 inline unsigned int capacity() const { return m_capacity; }
 inline unsigned int size() const { return m_size; }
 inline bool empty() const { return m_size == 0; }
 inline iterator begin() { return buff; }
 inline iterator end() { return buff + (m_size - 1) * sizeof(T); }
 inline const_iterator c_begin() { return buff; }
 inline const_iterator c_end() { return buff + (m_size - 1) * sizeof(T); }
 T& front();
 T& back();
 void pushBack(const T& val);
 void popBack();
 void reserve(unsigned int capacity);
 void resize(unsigned int size);
 T& operator [](unsigned int index);
 void clear();
private:
 unsigned int m_capacity;
 unsigned int m_size;
 unsigned int Log;
 T* buff;
 void swap(Vector<T> & v);
};
#endif

vector.cpp:

#include "vector.h"
#include <cmath> //for Log
template<typename T>
Vector<T>::Vector() : 
 m_capacity(0),
 m_size(0),
 Log(0),
 buff(nullptr) {}
template<typename T>
Vector<T>::Vector(unsigned int size) : //an argument of 0 leads to a capacity value of 1, distinct from the default ctor
 m_size(size), 
 Log(ceil(log((double)size) / log(2.0))), 
 m_capacity(1 << Log), 
 buff(size ? new T[m_capacity] : nullptr) {}
template<typename T>
Vector<T>::Vector(unsigned int size, const T& initial) :
 m_size(size), 
 Log(ceil(log((double)size) / log(2.0))), 
 m_capacity(1 << Log), 
 buff(size ? new T[m_capacity] : nullptr)
{
 for (unsigned int i = 0; i < size; i++)
 {
 //using placement new to place each element of v[i] in our already allocated buffer
 new(buffer + i) T(initial);
 }
}
template<typename T>
Vector<T>::Vector(Vector<T> const& v) :
 m_capacity(v.capacity()), 
 m_size(v.size()), 
 Log(v.Log), 
 buff(m_size ? new T[m_capacity] : nullptr)
{
 std::copy(v.buff, v.buff + v.m_size, buff); //std::copy is better than loops if we have both arrays
 //for (unsigned int i = 0; i < m_size; i++)
 //new(buffer + sizeof(T)*i) T(v[i]);
}
template<typename T>
Vector<T>::Vector(Vector<T> && v)
 : m_size(0),
 m_capacity(0),
 buff(nullptr),
 Log(0)
{
 swap(v);
}
template<typename T>
Vector<T>& Vector<T>::operator =(Vector<T> const& v)
// //not strong exception, but will write a wrapper that makes it strong exception; lowest level -> fastest
{
 delete[] buff;
 buff = nullptr;
 buff = v.m_capacity ? new T[v.m_capacity * sizeof(T)] : nullptr;
 m_size = v.m_size;
 m_capacity = v.m_capacity;
 Log = v.Log;
 std::copy(v.buff, v.buff + m_size-1, buff);
 return *this;
}
template<typename T>
Vector<T>& Vector<T>::operator =(Vector<T> && v)
{
 delete[] buff; //prep this
 m_size = 0;
 buff = nullptr;
 m_capacity = 0;
 Log = 0;
 swap(v);
 return *this;
}
template<typename T>
Vector<T>::~Vector()
{
 delete[] buff;
 buff = nullptr;
 m_size = 0;
 m_capacity = 0;
 Log = 0;
}
template<typename T>
T& Vector<T>::operator[](unsigned int index)
{
 return buff[index];
}
template<typename T>
T& Vector<T>::front()
{
 return buff[0];
}
template<typename T>
T& Vector<T>::back()
{
 return buff[m_size - 1];
}
template<typename T>
void Vector<T>::reserve(unsigned int capac)
{
 T* newbuff = new T[capac];
 for (unsigned int i = 0; i < m_size; i++)
 newbuff[i] = buff[i];
 m_capacity = capac;
 delete[] buff;
 buff = newbuff;
}
template<typename T>
void Vector<T>::resize(unsigned int size)
{
 Log = ceil(log((double)size) / log(2.0));
 reserve(1 << Log);
 m_size = size;
}
template<typename T>
void Vector<T>::pushBack(const T& val)
{
 if (m_size >= m_capacity)
 {
 reserve(1 << Log++);
 }
 buff[m_size++] = val;
}
template<typename T>
void Vector<T>::popBack()
{
 (reinterpret_cast<T*>(buff)[m_size-- - 1]).~T();
}
template<typename T>
void Vector<T>::clear()
{
 for (int i = 0; i < m_size; i++)
 (reinterpret_cast<T*>(buff)[i]).~T();
 m_size = 0;
 m_capacity = 0;
 Log = 0;
 buff = 0;
}
template<typename T>
void Vector<T>::swap(Vector<T> & v)
{
 std::swap(m_size, v.m_size);
 std::swap(m_capacity, v.m_capacity);
 std::swap(buff, v.buff);
 std::swap(Log, v.Log);
}

Question 2

Is there a reason for not simply using std::vector?

Question 3

If this is meant to be reinventing-the-wheel, you should add that tag (and consider using matching names such as push_back and cbegin).

Question 4

inline as a keyword for functions defined in class is useless because these functions get inline automatic.

Question 5

I don't think the code as posted was actually ready for a review. Please test your code before submitting it here.

Question 6

Why do you think that copy-and-swap idiom might reduce the speed?

Question 7

Memory management

new T[m_capacity] default-constructs m_capacity objects of type T in there.

This might hurt performance a lot if Ts default constructor is not trivial, and even then it still isn't necessary.

Even worse, in pushBack a new T object gets copy constructed in the place of the already existing one (if m_size < m_capacity), without properly deleting it. This might cause an abundance of errors, especially if Ts destructor is not trivial (e.g. it would leak memory).

Consider using std::aligned_storage for storing the objects instead.
The implementation leaks memory if an exception is thrown during any of the member functions which call new (e.g. due to no being able to allocate new memory). Using std::unique_ptr helps preventing those leaks.
clear leaks memory (no call to delete[]).
Some member functions (see below) set m_capacity, Log and/or m_size to unexpected values, which in turn can cause wrong allocations and/or out of bounds memory accesses in other places.
- reserve accesses out of bounds memory if capac < m_size due to newbuff[i] being out of bounds.
- For any given m_capacity, the first call to pushBack where m_size >= m_capacity doesn't allocate bigger space (Log gets increased after the call to reserve in reserve(1 << Log++)).
  
  This means Log is out of sync from m_capacity, buff[m_size++] is out of bounds and m_size > m_capacity after the call.
- The constructors taking a size parameter initialize m_capacity before Log (due to member definition order inside the class), but refer to the uninitialized value of Log for doing so. This sets m_capacity to a wrong value, and thus allocates a wrong amount of memory for buff (if size > 0).
pushBack has a potential use-after-free error: If val refers to an element inside this Vector and Vector needs to reallocate, val will dangle when the new object gets copy constructed.
The copy assignment operator deletes buff without checking whether this == &v. If that condition is true, it deleted v.buff (which it then will try to index in order to populate the newly allocated buff).
- Also, that newly allocated buff is too large by a factor of sizeof(T).

You might want to look into using std::unique_ptr<T[]> for buff. While this doesn't fix out of bounds memory access, it will help with some of the other issues.

Usage with standard algorithms

The name changes of push_back, pop_back, cbegin and cend make this container unusable for standard algorithms.
begin and end should provide a const overload, so they can be called on a const Vector<T>&.
Similarly, cbegin and cend should be declared const.
Also, push_front, pop_front, emplace, emplace_back, emplace_front, remove, insert and all the reverse iterator variants are missing.

Naming

The style of names for member variables is inconsistent: m_size/m_capacity, buff and Log. Choose one and stay consistent!

Class design

There is no way to handle types that have no default or no copy constructor. Please provide an overload on push_back(T&&) to accept non-copyable types, and emplace_front/emplace_back methods to construct T objects for types that have neither a copy nor a move constructor. (The emplace-type methods can also provide performance benefits by reducing copy/move constructions.)
A const Vector<T>& has no way to access elements inside it. Please provide const overloads for front, back and operator[].
Vector<T>::swap should likely be public.
Vector<T> isn't copyable unless T itself is. Thus, the copy constructor and copy assignment operator should be deleted for those cases.
Why force the buffer to a size that is a power of 2? If I specify an exact size in the constructor (or via reserve/resize), I'd expect the Vector to have exactly that size (and not waste additional memory).
No member functions give any exception specification. This prevents certain kinds of optimization. Especially the move constructor, move assignment operator and destructor should be noexcept.

Other issues

Why reinterpret_cast<T*>(buff)? buff is already of type T*.
inline actually doesn't do anything in this implementation.
clear() doesn't call destructors for objects between m_size and m_capacity. (Then again, it doesn't delete[] buff, which would fix that.)
Prefer {} over () for initialization. While there is no instance of the most vexing parse in this code, it always helps to be mindful.
As @TobySpeigh mentions, please put all code into the header file, as all code for templates needs to be accessible where instantiated. (Currently, one would have to use #include "vector.cpp", which is surprising.)
Iterators returned by end() and cend() are expected to point one element after the last one (i.e. buff + m_size). Currently, they point wildly out of bounds (buff + (m_size - 1) * sizeof(T)).
Sometimes std::copy is used for copying, sometimes a raw for loop. This is inconsistent.
- In some of those cases, they could be replaced by a range-based std::move (if T is nothrow movable).
- In some cases, the last element isn't copied due to bound miscalculations.

Performance

No member function gives any exception specification. This might impede performance.
Why use floating point math for Log? (And why use Log at all?)
Try to std::move objects if possible. This can be asserted with template meta programming.
You might consider using realloc for reallocations if T is trivially copyable.
And as always, if performance is important, measure it! Doing premature optimization (especially at the cost of readability or extensibility) is bad.

Also, be sure your code is working correctly before trying to optimize. It doesn't matter if it is really fast if it produces the wrong result!

Question 8

Many programmers prefer power-of-two heap allocations so as to minimize heap fragmentation. It also gets you O(log n) reallocation when repeatedly inserting.

Question 9

@Davislor: Every constant growth factor provides \$\mathcal{O}(\log n)\$ reallocations upon insertion. And minimizing head fragmentation might be a good general goal, but not wasting memory if the user explicitly specifies their size requirements should be weighed higher IMHO.

Question 10

@hoffmale Every constant growth factor does, yes. Allocating only as much memory as you need, though, might need a billion allocations if you add a billion elements one-by-one. The counterargument to wasting memory was: if you do a lot of resizing or deallocation of small objects, you might in practice end up with lots of heap blocks that are too small to use, but that take a long time to search.

Question 11

@hoffmale Added a section to my answer about that.

Question 12

@Davislor Power of two heap allocations lead to the walking vector problem if you have one growing vector; no amount of previously freed vector buffers can fit your new memory request. Any sub power of 2 can avoid this, where after some number K iterations the previously deallocated K buffers can be used for the K+2nd exponential growth.

Question 13

A few things you can improve upon:

Use more of `<algorithm>` and `<memory>`

Raw loops are verbose and unreadable. Use algorithms whenever you can -I've just noticed that you did it in some cases, but left the raw loops in others (cf reserve). You also seem not to know std::uninitialized_fill, which does exactly what you do manually in your third constructor, or std::destroy that's here to achieve what you do in clear (std::destroy_at is for individual elements, as in popBack).

Factorize constructors

Constructors can call other constructors. It avoids some copy-pasting. So, given:

template<typename T>
Vector<T>::Vector(unsigned int size) : //an argument of 0 leads to a capacity value of 1, distinct from the default ctor
 m_size(size), 
 Log(ceil(log((double)size) / log(2.0))), 
 m_capacity(1 << Log), 
 buff(size ? new T[m_capacity] : nullptr) {}

you can have:

template <typename T>
Vector<T>::Vector(unsigned size, const T& value) : Vector(size) {
 std::copy(buffer, buffer+size, value); 
}

Simplify your destructor

You don't need to reset all member variables in your destructor, since you can't use it after anyway.

Question 14

There is a bug in pushBack, you need to take into account the case where you call it on an existing member of the array.

For example in the code

Vector<int> v(1,0);
v.pushBack(v[0]);

pushBack will first resize the array, invalidating the reference to v[0]. It will then attempt to insert that invalid reference in the new array.

Question 15

Wow, totally overlooked that one with all the other memory management issues!

Question 16

Actually, this behavior is undefined in this specific case. Vector<int> v(1, 0) initializes m_capacity wrong, thus v.pushBack(v[0]) might not actually reallocate.

Question 17

Use the well-known names for methods

c_begin and c_end are so close to the standard-container cbegin and cend that users will forever curse you. Likewise for pushBack and popBack (instead of the usual push_back and pop_back).

Use a consistent naming scheme within the class

Members seem to have a mix of naming schemes: buff, m_size, Log. Pick one and stick with it.

Avoid unnecessary code

In the destructor, we have:

buff = nullptr;
m_size = 0;
m_capacity = 0;
Log = 0;

These statements are useless clutter, as the object is being destroyed, and the members will not be accessible after this. A good compiler may elide the code for you, but it can't address the bigger problem, that there's more lines for the human reader to understand.

Similarly, in the move-constructor, we perform most of the destructor actions before calling swap(). It's generally expected that a moved-from object should soon be going out of scope, and perfectly normal to just swap the contents on the assumption that this tidy-up will occur then.

The `clear()` method leaks memory

We set buff to a null pointer without deleting its resource - there's a missing delete[] buff there. It's really worth running the unit tests under Valgrind occasionally to catch stuff like that.

Don't delete contents on assignment...

...until you have checked for self-assignment. That's one mistake that copy-and-swap will save you from.

It will help performance if you also check whether the current capacity is already suitable for the new contents. If it's at least the required size (and not too much bigger - e.g. one or two reallocation steps at most), then you can reduce the work of new and delete by simply re-using the storage you allocated.

Use the correct names of functions from `<cmath>`

It appears that your compiler exercises its right to define names in the global namespace as well as in std. But it's not portable to rely on that, so call std::log and std::ceil by their full names.

Use a single file

The method definitions need to be visible to the users (as they are templates), so they should be in the header file with the declarations.

Question 18

Problems

Your main issue is that your usage of placement new (and manual destructor call) are incorrect and undefined behavior results because the lifespan of objects is inconsistent.

Code Review

Seems a bit likely to clash with other people

#ifndef VECTOR_H
#define VECTOR_H

You have to make these unique include your namespace into the guard. Talking about namespace; add a namespace around your code.

Use of inline is discouraged.

 inline unsigned int capacity() const { return m_capacity; }
 inline unsigned int size() const { return m_size; }
 inline bool empty() const { return m_size == 0; }
 inline iterator begin() { return buff; }
 inline iterator end() { return buff + (m_size - 1) * sizeof(T); }
 inline const_iterator c_begin() { return buff; }
 inline const_iterator c_end() { return buff + (m_size - 1) * sizeof(T); }

In this context it does not add any real meaning. It was supposed to be a hint to the compiler to inline the code. But it was long since shown that humans are very bad at making this choice and all compilers ignore the hint and only inline when they determine it is useful.

These look wrong.

 inline iterator end() { return buff + (m_size - 1) * sizeof(T); }
 inline const_iterator c_end() { return buff + (m_size - 1) * sizeof(T); }

There is no need to add sizeof(T) in those expressions. When you increment a pointer you are adding by the size of the object that it points at.

Seems like you need const versions of these two (in addition to the non const versions).

 T& front();
 T& back();
 T& operator [](unsigned int index);

You have move constructors. Why not move insertion.

 void pushBack(const T& val);

Why not just use a default parameter in the next constructor to implement this version of the constructor?

template<typename T>
Vector<T>::Vector() : 
 m_capacity(0),
 m_size(0),
 Log(0),
 buff(nullptr) {}

The order of the initialization list is not relevant.

template<typename T>
Vector<T>::Vector(unsigned int size) : //an argument of 0 leads to a capacity value of 1, distinct from the default ctor
 m_size(size), 
 Log(ceil(log((double)size) / log(2.0))), 
 m_capacity(1 << Log), 
 buff(size ? new T[m_capacity] : nullptr) {}

The members are initialized in the order they are declared in class declaration. Changing the order in the initializer list will not change that order. If you turn on your compiler warnings it will tell you this.

As a result these members are not all initialized correctly as some depend on values that have not been initialized yet.

The actual order that will be done is:

 m_capacity(1 << Log), // Notice here Log is used
 // But has not been initialized.
 m_size(size), 
 Log(ceil(log((double)size) / log(2.0))), 
 buff(size ? new T[m_capacity] : nullptr) {}

Incorrect usage of placement new.

 buff(size ? new T[m_capacity] : nullptr) // You are calling the constructor of `T` for every object in this array
 ....
 new(buffer + i) T(initial); // You are over writing an object that has had its constructor called.

Placement new can only be used on RAW memory. You are using it here on an already live object (an object that has had its constructor already called). This is undefined behavior.

It would have been more correct to use the assignment operator.

It is a good idea to mark the move constructor as noexcept.

template<typename T>
Vector<T>::Vector(Vector<T> && v)
 : m_size(0),
 m_capacity(0),
 buff(nullptr),
 Log(0)
{
 swap(v);
}

This allows for certain optimizations by the compiler. Since the only call is to swap() this should be true (traditionally swap is also noexcept).

The Assignment operator does not provide the strong exception guarantee. Also it is not self assignment safe. If you assign this object to itself it will have undefined behavior.

template<typename T>
Vector<T>& Vector<T>::operator =(Vector<T> const& v)
// //not strong exception, but will write a wrapper that makes it strong exception; lowest level -> fastest
{
 delete[] buff;
 buff = nullptr;
 buff = v.m_capacity ? new T[v.m_capacity * sizeof(T)] : nullptr;
 m_size = v.m_size;
 m_capacity = v.m_capacity;
 Log = v.Log;
 std::copy(v.buff, v.buff + m_size-1, buff);
 return *this;
}

I see you notice that in your comments. It is trivial to make this exception safe by using the copy and swap idiom (and it has no more overhead than doing it manually). Also the copy and swap idiom is self assignment safe.

Again move operator. Usually mark as noexcept. Unfortunately as you call delete you can't do that!

template<typename T>
Vector<T>& Vector<T>::operator =(Vector<T> && v)
{
 delete[] buff; //prep this
 m_size = 0;
 buff = nullptr;
 m_capacity = 0;
 Log = 0;
 swap(v);
 return *this;
}

There is no need to do the delete here. Simply do the swap. This will allow you to mark the method noexcept. After you have done the swap call clear on the RHS to force the objects in the vector to be destroyed (to mimic the requirements on the std::vector). Let the destructor of the RHS clean up the memory for the vector.

It also adds the opportunity to optimize the usage (as the storage can potentially be re-used).

Extra work being done.

template<typename T>
Vector<T>::~Vector()
{
 delete[] buff;
 // Everything below this point is a waste of time.
 // Don't do it.
 // Also assigning null to the pointer can potentially hide errors. When you are debugging.
 buff = nullptr;
 m_size = 0;
 m_capacity = 0;
 Log = 0;
}

This call to the destructor will get you into trouble.

template<typename T>
void Vector<T>::popBack()
{
 (reinterpret_cast<T*>(buff)[m_size-- - 1]).~T();
}

The trouble is that buff is a pointer to T so when you call delete in the destructor it is going call the destructor for T. By calling it here you have ended the objects life span so calling delete in the destructor is undefined behavior.

I wrote 5 articles about building a vector.

Vector - Resource Management Allocation
Vector - Resource Management Copy Swap
Vector - Resize
Vector - Simple Optimizations
Vector - The Other Stuff

Question 19

The Type of Your Size Parameters

Currently, you have them as unsigned int, but this is often different from size_t and ptrdiff_t. The most common example in 2018 is that most 64-bit compilers define unsigned int as 32 bits wide, which limits your vectors to 4GiB. It's also possible to imagine implementations with 16- or 24-bit size_t but a wider ALU (a 16-bit Windows or DOS program running on an 80386 or better?) The STL version uses size_t, which is 32 bits wide on 32-bit platforms and 64 bits wide on 64-bit platforms.

You will also often find some people who prefer signed to unsigned indices, on the grounds that it's very easy for arithmetic on indices to give you a negative number, and interpreting that as a very large unsigned number will lead to logic errors. This has a portability issue of its own, in that unsigned overflow and underflow are well-defined but signed overflow and underflow are not, but nearly all real-world implementations use two’s-complement math. If you want to use signed indices, the type for that is ptrdiff_t, which is the type of the result of subtracting one pointer from another.

Microsoft’s preferred solution is rsize_t, an optional annex to the C language standard. This is an unsigned quantity whose upper bit flags it as invalid (that is, a size_t that the library is meant to rigorously check for overflow).

Allocation Size

You choose to allocate blocks as powers of two. Elsewhere in the comments, when someone asked if it wouldn’t be better to allocate the exact amount of memory requested, I explain the arguments in favor of the power-of-two approach.

However, adding a .shrink_to_fit() method gives the client code the ability to decide that wasted memory is a bigger concern for that application than heap fragmentation. If the application doesn’t call that, they get the default behavior.

hoffmale hoffmale 6,52818 silver badges41 bronze badges · Answer 1 · 2018-07-12 17:29:14Z

Memory management

new T[m_capacity] default-constructs m_capacity objects of type T in there.

This might hurt performance a lot if Ts default constructor is not trivial, and even then it still isn't necessary.

Even worse, in pushBack a new T object gets copy constructed in the place of the already existing one (if m_size < m_capacity), without properly deleting it. This might cause an abundance of errors, especially if Ts destructor is not trivial (e.g. it would leak memory).

Consider using std::aligned_storage for storing the objects instead.
The implementation leaks memory if an exception is thrown during any of the member functions which call new (e.g. due to no being able to allocate new memory). Using std::unique_ptr helps preventing those leaks.
clear leaks memory (no call to delete[]).
Some member functions (see below) set m_capacity, Log and/or m_size to unexpected values, which in turn can cause wrong allocations and/or out of bounds memory accesses in other places.
- reserve accesses out of bounds memory if capac < m_size due to newbuff[i] being out of bounds.
- For any given m_capacity, the first call to pushBack where m_size >= m_capacity doesn't allocate bigger space (Log gets increased after the call to reserve in reserve(1 << Log++)).
  
  This means Log is out of sync from m_capacity, buff[m_size++] is out of bounds and m_size > m_capacity after the call.
- The constructors taking a size parameter initialize m_capacity before Log (due to member definition order inside the class), but refer to the uninitialized value of Log for doing so. This sets m_capacity to a wrong value, and thus allocates a wrong amount of memory for buff (if size > 0).
pushBack has a potential use-after-free error: If val refers to an element inside this Vector and Vector needs to reallocate, val will dangle when the new object gets copy constructed.
The copy assignment operator deletes buff without checking whether this == &v. If that condition is true, it deleted v.buff (which it then will try to index in order to populate the newly allocated buff).
- Also, that newly allocated buff is too large by a factor of sizeof(T).

You might want to look into using std::unique_ptr<T[]> for buff. While this doesn't fix out of bounds memory access, it will help with some of the other issues.

Usage with standard algorithms

The name changes of push_back, pop_back, cbegin and cend make this container unusable for standard algorithms.
begin and end should provide a const overload, so they can be called on a const Vector<T>&.
Similarly, cbegin and cend should be declared const.
Also, push_front, pop_front, emplace, emplace_back, emplace_front, remove, insert and all the reverse iterator variants are missing.

Naming

The style of names for member variables is inconsistent: m_size/m_capacity, buff and Log. Choose one and stay consistent!

Class design

There is no way to handle types that have no default or no copy constructor. Please provide an overload on push_back(T&&) to accept non-copyable types, and emplace_front/emplace_back methods to construct T objects for types that have neither a copy nor a move constructor. (The emplace-type methods can also provide performance benefits by reducing copy/move constructions.)
A const Vector<T>& has no way to access elements inside it. Please provide const overloads for front, back and operator[].
Vector<T>::swap should likely be public.
Vector<T> isn't copyable unless T itself is. Thus, the copy constructor and copy assignment operator should be deleted for those cases.
Why force the buffer to a size that is a power of 2? If I specify an exact size in the constructor (or via reserve/resize), I'd expect the Vector to have exactly that size (and not waste additional memory).
No member functions give any exception specification. This prevents certain kinds of optimization. Especially the move constructor, move assignment operator and destructor should be noexcept.

Other issues

Why reinterpret_cast<T*>(buff)? buff is already of type T*.
inline actually doesn't do anything in this implementation.
clear() doesn't call destructors for objects between m_size and m_capacity. (Then again, it doesn't delete[] buff, which would fix that.)
Prefer {} over () for initialization. While there is no instance of the most vexing parse in this code, it always helps to be mindful.
As @TobySpeigh mentions, please put all code into the header file, as all code for templates needs to be accessible where instantiated. (Currently, one would have to use #include "vector.cpp", which is surprising.)
Iterators returned by end() and cend() are expected to point one element after the last one (i.e. buff + m_size). Currently, they point wildly out of bounds (buff + (m_size - 1) * sizeof(T)).
Sometimes std::copy is used for copying, sometimes a raw for loop. This is inconsistent.
- In some of those cases, they could be replaced by a range-based std::move (if T is nothrow movable).
- In some cases, the last element isn't copied due to bound miscalculations.

Performance

No member function gives any exception specification. This might impede performance.
Why use floating point math for Log? (And why use Log at all?)
Try to std::move objects if possible. This can be asserted with template meta programming.
You might consider using realloc for reallocations if T is trivially copyable.
And as always, if performance is important, measure it! Doing premature optimization (especially at the cost of readability or extensibility) is bad.

Also, be sure your code is working correctly before trying to optimize. It doesn't matter if it is really fast if it produces the wrong result!

Many programmers prefer power-of-two heap allocations so as to minimize heap fragmentation. It also gets you O(log n) reallocation when repeatedly inserting.
@Davislor: Every constant growth factor provides \$\mathcal{O}(\log n)\$ reallocations upon insertion. And minimizing head fragmentation might be a good general goal, but not wasting memory if the user explicitly specifies their size requirements should be weighed higher IMHO.
@hoffmale Every constant growth factor does, yes. Allocating only as much memory as you need, though, might need a billion allocations if you add a billion elements one-by-one. The counterargument to wasting memory was: if you do a lot of resizing or deallocation of small objects, you might in practice end up with lots of heap blocks that are too small to use, but that take a long time to search.
@Davislor Power of two heap allocations lead to the walking vector problem if you have one growing vector; no amount of previously freed vector buffers can fit your new memory request. Any sub power of 2 can avoid this, where after some number K iterations the previously deallocated K buffers can be used for the K+2nd exponential growth.

papagaga papagaga 5,80712 silver badges25 bronze badges · Answer 2 · 2018-07-12 14:02:40Z

A few things you can improve upon:

Use more of `<algorithm>` and `<memory>`

Raw loops are verbose and unreadable. Use algorithms whenever you can -I've just noticed that you did it in some cases, but left the raw loops in others (cf reserve). You also seem not to know std::uninitialized_fill, which does exactly what you do manually in your third constructor, or std::destroy that's here to achieve what you do in clear (std::destroy_at is for individual elements, as in popBack).

Factorize constructors

Constructors can call other constructors. It avoids some copy-pasting. So, given:

template<typename T>
Vector<T>::Vector(unsigned int size) : //an argument of 0 leads to a capacity value of 1, distinct from the default ctor
 m_size(size), 
 Log(ceil(log((double)size) / log(2.0))), 
 m_capacity(1 << Log), 
 buff(size ? new T[m_capacity] : nullptr) {}

you can have:

template <typename T>
Vector<T>::Vector(unsigned size, const T& value) : Vector(size) {
 std::copy(buffer, buffer+size, value); 
}

Simplify your destructor

You don't need to reset all member variables in your destructor, since you can't use it after anyway.

David Brown David Brown 2111 silver badge5 bronze badges · Answer 3 · 2018-07-12 17:26:38Z

11

\$\begingroup\$

There is a bug in pushBack, you need to take into account the case where you call it on an existing member of the array.

For example in the code

Vector<int> v(1,0);
v.pushBack(v[0]);

pushBack will first resize the array, invalidating the reference to v[0]. It will then attempt to insert that invalid reference in the new array.

Share

answered Jul 12, 2018 at 17:26

David Brown's user avatar

David Brown David Brown

2111 silver badge5 bronze badges

\$\endgroup\$

2

\$\begingroup\$ Wow, totally overlooked that one with all the other memory management issues! \$\endgroup\$

hoffmale
– hoffmale

2018年07月12日 17:51:21 +00:00
Commented Jul 12, 2018 at 17:51
\$\begingroup\$ Actually, this behavior is undefined in this specific case. Vector<int> v(1, 0) initializes m_capacity wrong, thus v.pushBack(v[0]) might not actually reallocate. \$\endgroup\$

hoffmale
– hoffmale

2018年07月12日 18:11:19 +00:00
Commented Jul 12, 2018 at 18:11

Add a comment |

Toby Speight Toby Speight 87.1k14 gold badges104 silver badges322 bronze badges · Answer 4 · 2018-07-12 14:56:35Z

Use the well-known names for methods

c_begin and c_end are so close to the standard-container cbegin and cend that users will forever curse you. Likewise for pushBack and popBack (instead of the usual push_back and pop_back).

Use a consistent naming scheme within the class

Members seem to have a mix of naming schemes: buff, m_size, Log. Pick one and stick with it.

Avoid unnecessary code

In the destructor, we have:

buff = nullptr;
m_size = 0;
m_capacity = 0;
Log = 0;

These statements are useless clutter, as the object is being destroyed, and the members will not be accessible after this. A good compiler may elide the code for you, but it can't address the bigger problem, that there's more lines for the human reader to understand.

Similarly, in the move-constructor, we perform most of the destructor actions before calling swap(). It's generally expected that a moved-from object should soon be going out of scope, and perfectly normal to just swap the contents on the assumption that this tidy-up will occur then.

The `clear()` method leaks memory

We set buff to a null pointer without deleting its resource - there's a missing delete[] buff there. It's really worth running the unit tests under Valgrind occasionally to catch stuff like that.

Don't delete contents on assignment...

...until you have checked for self-assignment. That's one mistake that copy-and-swap will save you from.

It will help performance if you also check whether the current capacity is already suitable for the new contents. If it's at least the required size (and not too much bigger - e.g. one or two reallocation steps at most), then you can reduce the work of new and delete by simply re-using the storage you allocated.

Use the correct names of functions from `<cmath>`

It appears that your compiler exercises its right to define names in the global namespace as well as in std. But it's not portable to rely on that, so call std::log and std::ceil by their full names.

Use a single file

The method definitions need to be visible to the users (as they are templates), so they should be in the header file with the declarations.

Loki Astari Loki Astari 97.7k5 gold badges126 silver badges341 bronze badges · Answer 5 · 2018-07-12 20:18:58Z

Problems

Your main issue is that your usage of placement new (and manual destructor call) are incorrect and undefined behavior results because the lifespan of objects is inconsistent.

Code Review

Seems a bit likely to clash with other people

#ifndef VECTOR_H
#define VECTOR_H

You have to make these unique include your namespace into the guard. Talking about namespace; add a namespace around your code.

Use of inline is discouraged.

 inline unsigned int capacity() const { return m_capacity; }
 inline unsigned int size() const { return m_size; }
 inline bool empty() const { return m_size == 0; }
 inline iterator begin() { return buff; }
 inline iterator end() { return buff + (m_size - 1) * sizeof(T); }
 inline const_iterator c_begin() { return buff; }
 inline const_iterator c_end() { return buff + (m_size - 1) * sizeof(T); }

In this context it does not add any real meaning. It was supposed to be a hint to the compiler to inline the code. But it was long since shown that humans are very bad at making this choice and all compilers ignore the hint and only inline when they determine it is useful.

These look wrong.

 inline iterator end() { return buff + (m_size - 1) * sizeof(T); }
 inline const_iterator c_end() { return buff + (m_size - 1) * sizeof(T); }

There is no need to add sizeof(T) in those expressions. When you increment a pointer you are adding by the size of the object that it points at.

Seems like you need const versions of these two (in addition to the non const versions).

 T& front();
 T& back();
 T& operator [](unsigned int index);

You have move constructors. Why not move insertion.

 void pushBack(const T& val);

Why not just use a default parameter in the next constructor to implement this version of the constructor?

template<typename T>
Vector<T>::Vector() : 
 m_capacity(0),
 m_size(0),
 Log(0),
 buff(nullptr) {}

The order of the initialization list is not relevant.

template<typename T>
Vector<T>::Vector(unsigned int size) : //an argument of 0 leads to a capacity value of 1, distinct from the default ctor
 m_size(size), 
 Log(ceil(log((double)size) / log(2.0))), 
 m_capacity(1 << Log), 
 buff(size ? new T[m_capacity] : nullptr) {}

The members are initialized in the order they are declared in class declaration. Changing the order in the initializer list will not change that order. If you turn on your compiler warnings it will tell you this.

As a result these members are not all initialized correctly as some depend on values that have not been initialized yet.

The actual order that will be done is:

 m_capacity(1 << Log), // Notice here Log is used
 // But has not been initialized.
 m_size(size), 
 Log(ceil(log((double)size) / log(2.0))), 
 buff(size ? new T[m_capacity] : nullptr) {}

Incorrect usage of placement new.

 buff(size ? new T[m_capacity] : nullptr) // You are calling the constructor of `T` for every object in this array
 ....
 new(buffer + i) T(initial); // You are over writing an object that has had its constructor called.

Placement new can only be used on RAW memory. You are using it here on an already live object (an object that has had its constructor already called). This is undefined behavior.

It would have been more correct to use the assignment operator.

It is a good idea to mark the move constructor as noexcept.

template<typename T>
Vector<T>::Vector(Vector<T> && v)
 : m_size(0),
 m_capacity(0),
 buff(nullptr),
 Log(0)
{
 swap(v);
}

This allows for certain optimizations by the compiler. Since the only call is to swap() this should be true (traditionally swap is also noexcept).

The Assignment operator does not provide the strong exception guarantee. Also it is not self assignment safe. If you assign this object to itself it will have undefined behavior.

template<typename T>
Vector<T>& Vector<T>::operator =(Vector<T> const& v)
// //not strong exception, but will write a wrapper that makes it strong exception; lowest level -> fastest
{
 delete[] buff;
 buff = nullptr;
 buff = v.m_capacity ? new T[v.m_capacity * sizeof(T)] : nullptr;
 m_size = v.m_size;
 m_capacity = v.m_capacity;
 Log = v.Log;
 std::copy(v.buff, v.buff + m_size-1, buff);
 return *this;
}

I see you notice that in your comments. It is trivial to make this exception safe by using the copy and swap idiom (and it has no more overhead than doing it manually). Also the copy and swap idiom is self assignment safe.

Again move operator. Usually mark as noexcept. Unfortunately as you call delete you can't do that!

template<typename T>
Vector<T>& Vector<T>::operator =(Vector<T> && v)
{
 delete[] buff; //prep this
 m_size = 0;
 buff = nullptr;
 m_capacity = 0;
 Log = 0;
 swap(v);
 return *this;
}

There is no need to do the delete here. Simply do the swap. This will allow you to mark the method noexcept. After you have done the swap call clear on the RHS to force the objects in the vector to be destroyed (to mimic the requirements on the std::vector). Let the destructor of the RHS clean up the memory for the vector.

It also adds the opportunity to optimize the usage (as the storage can potentially be re-used).

Extra work being done.

template<typename T>
Vector<T>::~Vector()
{
 delete[] buff;
 // Everything below this point is a waste of time.
 // Don't do it.
 // Also assigning null to the pointer can potentially hide errors. When you are debugging.
 buff = nullptr;
 m_size = 0;
 m_capacity = 0;
 Log = 0;
}

This call to the destructor will get you into trouble.

template<typename T>
void Vector<T>::popBack()
{
 (reinterpret_cast<T*>(buff)[m_size-- - 1]).~T();
}

The trouble is that buff is a pointer to T so when you call delete in the destructor it is going call the destructor for T. By calling it here you have ended the objects life span so calling delete in the destructor is undefined behavior.

I wrote 5 articles about building a vector.

Vector - Resource Management Allocation
Vector - Resource Management Copy Swap
Vector - Resize
Vector - Simple Optimizations
Vector - The Other Stuff

Davislor Davislor 9,10519 silver badges39 bronze badges · Answer 6 · 2018-07-12 18:21:36Z

The Type of Your Size Parameters

Currently, you have them as unsigned int, but this is often different from size_t and ptrdiff_t. The most common example in 2018 is that most 64-bit compilers define unsigned int as 32 bits wide, which limits your vectors to 4GiB. It's also possible to imagine implementations with 16- or 24-bit size_t but a wider ALU (a 16-bit Windows or DOS program running on an 80386 or better?) The STL version uses size_t, which is 32 bits wide on 32-bit platforms and 64 bits wide on 64-bit platforms.

You will also often find some people who prefer signed to unsigned indices, on the grounds that it's very easy for arithmetic on indices to give you a negative number, and interpreting that as a very large unsigned number will lead to logic errors. This has a portability issue of its own, in that unsigned overflow and underflow are well-defined but signed overflow and underflow are not, but nearly all real-world implementations use two’s-complement math. If you want to use signed indices, the type for that is ptrdiff_t, which is the type of the result of subtracting one pointer from another.

Microsoft’s preferred solution is rsize_t, an optional annex to the C language standard. This is an unsigned quantity whose upper bit flags it as invalid (that is, a size_t that the library is meant to rigorously check for overflow).

Allocation Size

You choose to allocate blocks as powers of two. Elsewhere in the comments, when someone asked if it wouldn’t be better to allocate the exact amount of memory requested, I explain the arguments in favor of the power-of-two approach.

However, adding a .shrink_to_fit() method gives the client code the ability to decide that wasted memory is a bigger concern for that application than heap fragmentation. If the application doesn’t call that, they get the default behavior.

Stack Exchange Network

Modern Vector implementation

6 Answers 6

Memory management

Usage with standard algorithms

Naming

Class design

Other issues

Performance

Use more of `<algorithm>` and `<memory>`

Factorize constructors

Simplify your destructor

Use the well-known names for methods

Use a consistent naming scheme within the class

Avoid unnecessary code

The `clear()` method leaks memory

Don't delete contents on assignment...

Use the correct names of functions from `<cmath>`

Use a single file

Problems

Code Review

The Type of Your Size Parameters

Allocation Size

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Modern Vector implementation

6 Answers 6

Memory management

Usage with standard algorithms

Naming

Class design

Other issues

Performance

Use more of <algorithm> and <memory>

Factorize constructors

Simplify your destructor

Use the well-known names for methods

Use a consistent naming scheme within the class

Avoid unnecessary code

The clear() method leaks memory

Don't delete contents on assignment...

Use the correct names of functions from <cmath>

Use a single file

Problems

Code Review

The Type of Your Size Parameters

Allocation Size

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Use more of `<algorithm>` and `<memory>`

The `clear()` method leaks memory

Use the correct names of functions from `<cmath>`