Implementation of the vector
class in C++ for learning purposes. Tried to implement interesting parts from the API shown on cppreference.com, but some elements like custom allocator support are missing.
template <typename T>
class Vector {
public:
using value_type = T;
using size_type = std::size_t;
using difference_type = std::ptrdiff_t;
using reference = value_type&;
using const_reference = const value_type&;
using pointer = value_type*;
// Iterator support
using iterator = pointer;
using const_iterator = const pointer;
Vector(): data{nullptr}, m_size{0}, m_capacity{0} {}
explicit Vector(size_type capacity) : Vector() {
resize(capacity);
}
Vector(size_type count, const_reference value): Vector() {
resize(count, value);
}
template<typename InputIt>
Vector(InputIt first, InputIt last) : Vector() {
assign(first, last);
}
Vector(std::initializer_list<T> init) : Vector() {
assign(init.begin(), init.end());
}
Vector(const Vector& other): Vector() {
assignRange(other.begin(), other.end());
}
Vector& operator=(const Vector& other) {
Vector temp = other;
swap(temp);
return *this;
}
Vector(Vector&& other) noexcept : Vector() {
swap(other);
}
Vector& operator=(Vector&& other) noexcept {
swap(other);
return *this;
}
~Vector() {
clear();
deallocate(data);
}
size_type size() const noexcept {
return m_size;
}
reference operator[](size_type index) {
return data[index];
}
void resize(size_type count, const_reference value = T()) {
if (count < m_size) {
for (size_type i = count; i < m_size; i++) {
data[i].~T();
}
} else if (count > m_size) {
if (count > m_capacity) {
reserve(count);
}
for (size_type i = m_size; i < count; i++) {
new (data + i) T(value);
}
}
m_size = count;
}
void reserve(size_type count) {
if (m_capacity < count) {
pointer newData = allocate(count);
if (m_size > 0) {
std::uninitialized_move(data, data + m_size, newData);
}
deallocate(data);
data = newData;
m_capacity = count;
}
}
void reserve_if_needed() {
if (m_size == m_capacity) {
if (m_capacity == 0) {
reserve(1);
} else {
reserve(m_capacity * 2);
}
}
}
void push_back(const_reference item) {
reserve_if_needed();
data[m_size++] = item;
}
void pop_back() {
// The standard says pop_back() on an empty vector is
// undefined behavior, so this check is possibly unnecessary
// since implementations can technically do whatever
// in case of undefined behavior?
if (m_size > 0) {
data[m_size - 1].~T();
m_size -= 1;
}
}
template<typename... Args>
void emplace_back(Args&&... args) {
reserve_if_needed();
new (data + m_size) T(std::forward<Args>(args)...);
m_size++;
}
void shrink_to_fit() {
if (m_capacity > m_size) {
pointer new_data = allocate(m_size);
if (m_size > 0) {
std::uninitialized_move(data, data + m_size, new_data);
}
deallocate(data);
data = new_data;
m_capacity = m_size;
}
}
void swap (Vector& other) noexcept {
using std::swap;
swap(data, other.data);
swap(m_size, other.m_size);
swap(m_capacity, other.m_capacity);
}
// Iterator support
iterator begin() const {
return data;
}
iterator end() const {
return data + m_size;
}
iterator insert(iterator pos, const_reference item) {
return insert(pos, 1, item);
}
iterator insert(iterator pos, size_type count, const_reference item) {
size_type index = pos - data;
size_type remaining = m_size - index;
if (m_capacity < m_size + count) {
reserve(m_size + count);
}
std::uninitialized_move(data + index, data + m_size, data + index + count);
std::uninitialized_fill(data + index, data + index + count, item);
m_size += count;
return data + index;
}
iterator erase(iterator pos) {
return erase(pos, pos + 1);
}
iterator erase(iterator first, iterator last) {
size_type n_elements = last - first;
size_type index = first - data;
for (size_type i = index; i < index + n_elements; i++) {
data[i].~T();
}
std::move(data + index + n_elements, data + m_size, data + index);
m_size -= n_elements;
return first;
}
template<typename InputIt>
void assign(InputIt first, InputIt last) {
assignRange(first, last);
}
void clear() noexcept {
for (std::size_t i = 0; i < m_size; i++) {
data[i].~T();
}
m_size = 0;
}
template<typename InputIt>
void assignRange(InputIt first, InputIt last) {
clear();
resize(std::distance(first, last));
std::uninitialized_copy(first, last, data);
}
void assign(size_type count, const_reference value) {
Vector temp(count, value);
swap(temp);
}
private:
pointer data = nullptr;
size_type m_size;
size_type m_capacity;
pointer allocate(size_type count) {
return static_cast<pointer>(::operator new(count * sizeof(value_type)));
}
void deallocate(pointer p) {
if (p != nullptr) {
::operator delete(p);
}
}
};
2 Answers 2
As vector is such a common thing we review here I have written up a series of articles about implementing a vector.
Overview
Very good.
I found one major bug in push_back()
. You have potential memory leaks in reserve()
and shrinktofit()
that are easy to fix. You can simplify your assignment operator (currently you have copy and move versions) they can be combined into a single version that works for both.
Minor comments about some missing functionaity that is in std::vector
.
Code review
Sure you can have empty vectors.
Vector(): data{nullptr}, m_size{0}, m_capacity{0} {}
But I am wondering if the best strategy is not to simply always allocate capacity for minimal size. How often is a vector allocated but not used?
But pointed out by @chrysante below, the std::vector
default constructor is noexcept
so can't have any memory allocation (as that can potentially throw). So if you want to go that route you can mark this default constructor noexcept
.
Vector() noexcept
: data{nullptr}
, m_size{0}
, m_capacity{0}
{}
One comment on style. Like in variable declarations in the code blocks its nice to have member initialization in the constructor one per line (its easier to read). You are not trying to save vertical space.
Slight deviation from the interface of std::vector
! Sure you can do it. But it will confuse people. Also I use it to simplify things below. I have the same constructor in my class but mine is private so it can only be used internally.
explicit Vector(size_type capacity) : Vector() {
resize(capacity);
}
You can simplify these two assignments into a single method:
Vector& operator=(const Vector& other) {
Vector temp = other;
swap(temp);
return *this;
}
Vector& operator=(Vector&& other) noexcept {
swap(other);
return *this;
}
Easy to write as:
// Notice the parameter is value.
// If passed a RValue it is moved into the parameter.
// If passed an LValue it is copied.
// So you get the same effect with less code.
Vector& operator=(Vector other) noexcept
{
swap(other);
return *this;
}
This is good:
reference operator[](size_type index) {
return data[index];
}
But what about accesses to a const Vector? Just because you can't modify does not mean you can't use the operator[] on it.
const_reference operator[](size_type index) const
{
return data[index];
}
While we are here: Why is there no at()
method?
void resize(size_type count, const_reference value = T()) {
// Not valid here ^^^^^^
// That should be in the header only
This is fine. But std::vector
destroys them from back to front.
Just like how it deletes elements during destruction. This is to mimic the behavior of the C-style array (objects are destroyed in reverse order of creation).
if (count < m_size) {
for (size_type i = count; i < m_size; i++) {
data[i].~T();
}
} else if (count > m_size) {
if (count > m_capacity) {
reserve(count);
}
for (size_type i = m_size; i < count; i++) {
new (data + i) T(value);
}
}
m_size = count;
}
void reserve(size_type count) {
if (m_capacity < count) {
pointer newData = allocate(count);
// Note sure why you need to check against 0 here.
// Copy zero data is a NOOP so would not be more expensive.
// If I had to guess this is actually a pesimization as you
// code has an extra branch.
if (m_size > 0) {
// A new function to me.
// Thats really cool.
std::uninitialized_move(data, data + m_size, newData);
}
deallocate(data);
data = newData;
m_capacity = count;
}
}
The one thing here I would watch is that you have a potential leak.
It's hard to spot. But if the type T
does not support move construction then the compiler will use the copy constructor during the std::uninitialized_move
. If one of the copy constructors fails (i.e. throws) then you will leave this function needing to clean up newData
. Though your function does provide the strong exception guarantee.
You can make this simpler by re-using the Vector :-)
void reserve(size_type count) {
if (m_capacity < count) {
Vector temp(count) // Use your vector with pre-reserved size.
// Remember that Vector is a friend of Vector
// So you can reach into the other class and mess with
// its members (just remember to unit test).
std::uninitialized_move(data, data + m_size, temp.data);
temp.m_size = m_size;
swap(temp);
}
}
This is broken. You are pushing into uninitialized memory so you need to construct the object in place.
void push_back(const_reference item) {
reserve_if_needed();
data[m_size++] = item;
// Should be this.
new (std::addressof(data[m_size++])) T(item);
}
What about pushing an RVALUE?
Replace the above with:
// Note: Pass by value.
// RVALUE are moved into item then moved to container.
// LVALUE are copied into item then moved to the container.
void push_back(T item) {
reserve_if_needed();
new (std::addressof(data[m_size++])) T(std::move(item));
}
if (m_size > 0) {
// Why not swap the next two lines?
// This would make the line to call the destructor
// simpler and easier to read as you don't need the -1
data[m_size - 1].~T();
m_size -= 1;
}
Same issue as reserve()
void shrink_to_fit() {
if (m_capacity > m_size) {
pointer new_data = allocate(m_size);
if (m_size > 0) {
std::uninitialized_move(data, data + m_size, new_data);
}
deallocate(data);
data = new_data;
m_capacity = m_size;
}
}
Same solution:
void shrink_to_fit() {
if (m_capacity > m_size) {
Vector temp(m_size); // Vector with reserved capacity
std::uninitialized_move(data, data + m_size, temp.data);
temp.m_size = m_size;
swap(temp);
}
}
Sure standard iterators:
// Iterator support
iterator begin() const {
return data;
}
iterator end() const {
return data + m_size;
}
But what about const iterators or reverse iterators or const reverse iterators? Also when calling begin()
on a const object you will get a const_iterator.
iterator begin() { // normal begin
const_iterator begin() const { // begin on const object
const_iterator cbegin() const { // explicitly asking for const
reverse_iterator rbegin() { // normal rbegin
const_reverse_iterator rbegin() const { // rbegin on const object
const_reverse_iterator crbegin() const { // explicitly asking for re
etc
To be similar to C-style arrays (and the C++ standard idiom that objects are destroyed in reverse order of creation), you should destroy the members in reverse order.
void clear() noexcept {
for (std::size_t i = 0; i < m_size; i++) {
data[i].~T();
}
m_size = 0;
}
No need to check for null pointers here!
void deallocate(pointer p) {
if (p != nullptr) {
::operator delete(p);
}
}
Questions:
- What is the slight deviation from the standard interface in the explicit Vector(size_type capacity)? Is it that the param is named 'capacity' and not 'count?
If you look at the standard (I link to a non standard but reputable source) std::vector::vector you will see there is no constructor that takes a "capacity" (ie. you have allocated space but zero size). There is one that takes a size and fills it with values (but you have that one).
- When you say "T()" should be in the header only, do you mean the template declaration?
No. Default Parameters should be defined in the class header file only. They are not part of the declaration:
class Vector
{
void resize(size_type count, const_reference value = T());
// This is good.
// Reading the interface you see you can have a default value.
};
// Don't define the default parameters here.
// It should generate an error if you do this.
void Vector::resize(size_type count, const_reference value)
{
// STUFF
}
- How does constructing the item in place in push_back() resolve the uninitialized memory issue?
This is a problem becasue:
// This code uses the assignment operator.
data[m_size++] = item;
// The assignment operator assumes that the object
// referenced on the left hand side has already been constructed
// but in your case that is not true, this is unintialized memory.
// So you are using assignment to uninitialized memory
// which could be anything and thus with non trivial T
// will cause an issue.
This is solved by constructing in place:
// Should be this.
new (std::addressof(data[m_size++])) T(item);
The constructor assumes the memory has not been constructed before. You pass the address to operator new
it will not allocate space but simply use the pointer provided as the location and then call the constructor for the type T to correctly initialize the memory.
- do you have any references or documentation on the "pass by value" idiom for avoiding the two assignment operators?
Nope. There are lots of questions on Stackoverflow that go over this though. Should be simple to find.
- And does the SFINAE approach involve checking std::is_nothrow_move_constructible or something in overload resolution? I should look into how this is done in some compiler implementation
Yep.
-
2\$\begingroup\$ If you conform to the spec, you can't allocate in the default constructor. It's
noexcept
iff the allocator is noexcept default constructible and so isstd::allocator
. Allocating any buffer could potentially throw.explicit Vector(size_type capacity);
as implemented is not a deviation from the standard, only the parameter is named confusingly (should becount
or similar).void push_back(T item);
you should not declarepush_back
this way. IfT
is not moveable, the copy constructor may be called twice whenpush_back
is invoked and ifT
is expensive to copy this is a problem. \$\endgroup\$chrysante– chrysante2023年07月21日 08:15:27 +00:00Commented Jul 21, 2023 at 8:15 -
1\$\begingroup\$
begin()
should return aniterator
andbegin() const
should return aconst_iterator
. And the destruction order ofstd::vector
is unspecified. Good review otherwise :-) \$\endgroup\$chrysante– chrysante2023年07月21日 08:18:09 +00:00Commented Jul 21, 2023 at 8:18 -
1\$\begingroup\$ About the "take parameters by value instead of
const&
and&&
overloads" idiom: In general I strongly agree with this, but in generic code you can't really do it because you never know how expensive a move or a copy ofT
really is and in certain situations the object will be moved/copied twice. \$\endgroup\$chrysante– chrysante2023年07月21日 08:24:29 +00:00Commented Jul 21, 2023 at 8:24 -
\$\begingroup\$ @chrysante Interesting, this is the first I hear about potentially copying objects twice. In what situations does this happen? Oh, I see what you mean, if you don’t know what
T
is, you can’t assume it can be moved. Is that it? \$\endgroup\$Cris Luengo– Cris Luengo2023年07月21日 14:11:44 +00:00Commented Jul 21, 2023 at 14:11 -
1\$\begingroup\$ @jdav22 Questions answered above. See end of text above. \$\endgroup\$Loki Astari– Loki Astari2023年07月21日 20:32:22 +00:00Commented Jul 21, 2023 at 20:32
Overall you did a great job already, implementing containers is a very non trivial task. The other review already points out many things, I will add what I think is still missing.
In the list of typedefs you forgot the following:
using const_pointer = value_type const*;
using reverse_iterator = std::reverse_iterator<iterator>;
using const_reverse_iterator = std::reverse_iterator<const_iterator>;
And note that a const_iterator
is not a const iterator
but rather an iterator to const
. The way you declared it, it is simply an iterator
that is const
but you want it to be an iterator that iterates over a range of constant elements, so you could declare it this way:
using const_iterator = value_type const*;
This constructor is fine, but you might want to change the parameter name to count
or something similar.
explicit Vector(size_type count) : Vector() {
resize(count);
}
template<typename InputIt>
Vector(InputIt first, InputIt last) : Vector() { // (*)
assign(first, last);
}
This constructor is a bit tricky. Cppreference says this this about it:
This overload participates in overload resolution only if InputIt satisfies LegacyInputIterator, to avoid ambiguity with the overload (3).
Where "overload (3)" is Vector(size_type count, const T& value)
. You need to use SFINAE or a C++20 requires
clause to restrain the argument types. If you don't do this, this will happen:
Vector<int> v(10, 1); // We want to construct a Vector holding 10 ints with value 1
// But overload (*) will be selected because
// it's a better match and compilation will
// fail (if you're lucky)
The pre C++20 iterator concept is tricky to implement, especially with SFINAE. A preliminary solution could be to implement the pre C++11 solution and to require InputIt
to not be integral using SFINAE:
template<typename InputIt,
typename std::enable_if<!std::is_integral<T>::value, int>::type = 0>
Vector(InputIt first, InputIt last) : Vector() {
assign(first, last);
}
or using a requires
clause:
template<typename InputIt>
requires (!std::integral<InputIt>)
Vector(InputIt first, InputIt last) : Vector() {
assign(first, last);
}
In the resize
method you don't need the if (count < m_size)
and if (count > m_size)
checks. Simply do this:
void resize(size_type count, const_reference value = T()) {
for (size_type i = count; i < m_size; i++) {
data[i].~T();
}
if (count > m_capacity) {
reserve(count);
}
for (size_type i = m_size; i < count; i++) {
new (data + i) T(value);
}
m_size = count;
}
The for
loops already perform both checks so they are redundant.
In the reserve
and the shrink_to_fit
method you need to call the destructor of the elements in the old buffer after calling std::uninitialized_move
Also consider using guard clauses to have shallower indentation.
void reserve(size_type new_cap) {
if (m_capacity >= new_cap {
return;
}
pointer newData = allocate(new_cap);
std::uninitialized_move(data, data + m_size, newData);
std::destroy(data, data + m_size);
deallocate(data);
data = newData;
m_capacity = new_cap;
}
If you don't want to traverse the elements twice write a for loop and move and destroy manually.
Note that a conforming implementation of shrink_to_fit
can be a no-op:
void shrink_to_fit() {}
push_back
looks fine but to add an overload for r-values you can do this to make use of perfect forwarding:
void push_back(value_type const& item) {
push_back_impl(item);
}
void push_back(value_type&& item) {
push_back_impl(std::move(item));
}
template <typename U>
void push_back_impl(U&& item) {
reserve_if_needed();
::new (&data[m_size++]) value_type(std::forward<U>(item));
}
Note that you could do this:
void push_back(value_type item) {
reserve_if_needed();
::new (&data[m_size++]) value_type(std::move(item));
}
Now the implementation is very simple, but it can perform two moves/copies if invoked like this:
Vector<MyType> v;
MyType t;
v.push_back(t);
t
will be copied into push_back
's argument and then copied or moved into the vector. Depending on how expensive a move/copy of value_type
is, this can be fine. Unfortunately we don't know what value_type
is going to be, so this is suboptimal.
But you don't need any SFINAE tricks here as suggested by the other answer.
Also I would like to add, that in any other (non-template) context, I would always do what the other answer suggested and just accept the potential double-move. But in a template context you can't do it because you have to consider all possible potentially rogue types as value_type
.
Your implementation of pop_back
is fine if you don't mind the added overhead of the check for empty()
.
Your insert
method has a bug. If index + count
is less than m_size
, std::uninitialized_move
will construct new objects over existing ones. I can add a working implementation later, but this is tricky to implement. Make sure to add unit tests that check if all elements that have been constructed also get destroyed.
Make all your helper methods private.
Rename the buffer pointer to m_data
. The name data
is reserved for the .data()
method:
pointer data() { return m_data; }
const_pointer data() const { return m_data; }
I think there are more issues, I will edit this later.
-
\$\begingroup\$ Thanks, super helpful! Wish I could accept multiple answers, but have upvoted yours. \$\endgroup\$jdav22– jdav222023年07月23日 22:44:26 +00:00Commented Jul 23, 2023 at 22:44
-
1\$\begingroup\$ If I understand correctly, you would not overload on lvalue vs rvalue reference and use the by-value operator if you knew the type and knew that a move was not too expensive, but in a templated container you choose to overload because the type could be non-movable or expensive to move? \$\endgroup\$jdav22– jdav222023年07月23日 22:45:29 +00:00Commented Jul 23, 2023 at 22:45
-
1\$\begingroup\$ @jdav22 Yes, exactly \$\endgroup\$chrysante– chrysante2023年07月24日 15:47:49 +00:00Commented Jul 24, 2023 at 15:47
-
\$\begingroup\$ Sorry to revive an old thread, but just noticed the comment about the bug in insert(). As I understand it, the only issue here is not all elements in the vector get destroyed, correct? I'm not sure how to write a test to determine whether an element has been destroyed or not, do you have any suggestions? Also, to implement this correctly and destroy all the elements, the only strategy I can think of is to copy the overlapping elements into some other container, destroy what's in the overlap, then copy back, but there must be something more efficient. Any thoughts? Thanks! @chrysante \$\endgroup\$jdav22– jdav222023年08月23日 23:58:17 +00:00Commented Aug 23, 2023 at 23:58
start
,size
,capacity
), they have (start
,end
,buff_end
) \$\endgroup\$.~T()
and then acting as if it is a valid element on which you call assignment operations. That's a bug. You gotta to use placement new to construct the object instead after deletion. I'd rewrite places where it is uses unnecessarily. Also in assignment operation it is not a good practice to consistently allocate new memory, it is inefficient for such a basic-class. You should reuse current memory if possible. \$\endgroup\$