My teacher asked me to come up with a quick version of the C++ string
class. This is what I have written. It works fine and I have tested few cases as detailed in main()
. Now I am curious how good this program is actually in terms of modern C++ fundamentals.
#include <iostream>
#include <exception>
class STR
{
public:
STR(const char* ptr): sz(strlen(ptr)+1),m_data(new (std::nothrow) char[sz])
{
if (m_data)
{
memcpy(m_data, ptr, sz);
std::cout << "STR() - "<< getData() << ":" << getSize() << " chars" << '\n';
}
else
{
std::cerr << "Fatal Error - Cannot construct STR()";
}
}
STR() : m_data(nullptr),sz(0) {}
STR(const STR& obj)
{
sz = strlen(obj) + 1;
m_data = new (std::nothrow) char[sz];
memcpy(m_data, obj, sz);
std::cout << "STR(const STR&) - " << getData() << ":" << getSize() << " chars" << '\n';
}
size_t getSize() const noexcept
{
return sz;
}
char* getData() const
{
return this->m_data;
}
operator const char*() const {
return m_data;
}
STR& operator=(const char* other)
{
if (this->sz != 0 && this->m_data)
{
std::cout << "Destructing..." << getData() << '\n';
this->sz = 0;
delete[] this->m_data;
}
sz = strlen(other) + 1;
m_data = new (std::nothrow) char[sz];
memcpy(m_data, other, sz);
std::cout << "STR(const STR&) - " << getData() << ":" << getSize() << " chars" << '\n';
return *this;
}
STR& operator=(const STR& other)
{
if (this != &other)
{
//Release *this memory
if (this->m_data)
{
std::cout << "Destructing..." << getData() << '\n';
delete[] m_data;
}
sz = other.getSize();
//Allocate *this memory
this->m_data = new (std::nothrow) char[sz];
memcpy(m_data, other.getData(), sz);
std::cout << "operator=(const STR&) - " << getData() << ":" << getSize() << " chars" << '\n';
}
return *this;
}
//Move assignment
STR& operator=(STR&& other)
{
if (this != &other)
{
delete this->m_data;
//char* p = other.getData();
this->m_data = std::exchange(other.m_data, nullptr);
//size_t sz = other.getSize();
this->sz = std::exchange(other.sz, 0);
}
return *this;
}
Particularly here as I had to comment the private
members and make them public as there was no easy way to implement move assignment for STR
.
//private:
size_t sz;
char* m_data;
};
std::ostream& operator<<(std::ostream& os, const STR& str)
{
if (!str.getData())
return (os << "");
os << str.getData();
return os;
}
auto main()->int
{
//Object direct initialization
STR a{"Simple text"};
//Object direct initialization
STR b = "Another text";
STR c = a;
//Empty obj creation
STR d;
std::cout << "d:" << d << '\n';
//Empty object assignment
d = std::move(c);
std::cout << "d:" << d << '\n';
//Existing object assignment
b = d;
std::cout << "b:" << b << '\n';
//Object assigment using const char*
a = "What next";
std::cout << "a:" << a <<'\n';
}
Version 2.0 - Based on all the comments
#include <iostream>
#include <exception>
#include <memory>
class MyString
{
public:
MyString(const char* ptr): sz(strlen(ptr)+1), m_data( sz ? new char[sz]:nullptr)
{
std::copy(ptr, ptr + sz, m_data);
}
MyString() : m_data(nullptr),sz(0) {}
~MyString()
{
delete[] m_data;
}
MyString(const MyString& obj):sz(strlen(obj) + 1), m_data(new char[sz])
{
std::copy(obj.m_data, obj.m_data + sz, m_data);
}
MyString(MyString&& obj) noexcept :sz(obj.sz), m_data(obj.m_data)
{
obj.sz = 0;
obj.m_data = 0;
}
const size_t getSize() const noexcept
{
return sz;
}
const char* getData() const
{
return m_data;
}
operator const char*() const {
return m_data;
}
MyString& operator=(const char* other)
{
MyString tmp(other);
swap(tmp, *this);
return *this;
}
friend void swap(MyString& first, MyString& second)
{
using std::swap;
std::swap(first.m_data, second.m_data);
std::swap(first.sz, second.sz);
}
MyString& operator=(MyString other)
{
MyString tmp(other);
swap(other, *this);
return *this;
}
private:
size_t sz;
char* m_data;
};
std::ostream& operator<<(std::ostream& os, const MyString& str)
{
if (!str.getData())
return (os << "");
os << str.getData();
return os;
}
int main()
{
//Object direct initialization
MyString a{"Simple text"};
//Object direct initialization
MyString b = "Another text";
MyString c = a;
//Empty obj creation
MyString d;
std::cout << "d:" << d << '\n';
//Empty object assignment
d = std::move(c);
std::cout << "d:" << d << '\n';
//Existing object assignment
b = d; ---(1)
std::cout << "b:" << b << '\n';
//Object assigment using const char*
a = "What next";
std::cout << "a:" << a <<'\n';
}
The issue now is that for code line -- (1) , first the copy constructor is called and then assignment operator, I think that copy elision works with rvalue parameters and not with lvalue parameters for unifying assignments of type defined here.
And another todo: is using automated test cases and I am still trying to find out if C++11/14/17 have any test case frameworks or I have to use something external like gtest, cpptest, etc.
6 Answers 6
Error handling
STR(const char* ptr): sz(strlen(ptr)+1),m_data(new (std::nothrow) char[sz])
{
if (m_data)
{
memcpy(m_data, ptr, sz);
std::cout << "STR() - "<< getData() << ":" << getSize() << " chars" << '\n';
}
else
{
std::cerr << "Fatal Error - Cannot construct STR()";
}
}
Writing anything to the standard output in constructor is a terrible idea. It has nothing to do with the string class. Logging is a separate concern.
Printing the error message to std::cerr
is not a proper error handling method. If the allocation fails, the users of your class will never know it (they might not be sitting there looking at the screen). You should indicate the error by throwing an exception. I don't see the reason to use std::nothrow
here. It's easier to let it throw automatically.
This piece of code is even more broken:
m_data = new (std::nothrow) char[sz];
memcpy(m_data, obj, sz);
It invokes undefined behavior if the allocation fails.
Correctness
This class needs a custom destructor (that deletes m_data
). Otherwise, the memory leaks.
Copy and swap idiom
There's a lot of boilerplate code. You do almost the same thing in the assignment operator and the copy constructor. You can avoid it by using the copy and swap idiom. It'll also make the assignment simpler: you won't need to check it for self-assignment.
Modern C++
There's no point in using memcpy
here. Use std::copy
instead.
Useless comments
Comments like //Release *this memory
and //Move assignment
are worse than no comments at all: they clutter your code without providing any extra information. Delete them.
Design
Particularly here as I had to comment the private: members and make them public as there was no easy way to implement move assignment for STR
I don't understand how this is related to move assignment. Making private members public is an awful idea, anyway. It breaks encapsulation. Keep them private.
And did I say anything about not logging anything inside the STR
constructor and member-functions? I think I did. Keep your class focused one thing: the string itself.
Tests
I'd strongly recommend to write proper automated unit tests instead of checking a few things in the main
function.
Other notes
That auto main()->int
is completely pointless. It's just int main()
.
-
\$\begingroup\$ Indeed the best code review comments I had in long time. And your links conceptual and technical discussions greatly helped me to read a lot about everything I did incorrectly. I have tried to implement all the comments in version 2.0 of the code. \$\endgroup\$cpp11dev– cpp11dev2017年09月10日 13:26:23 +00:00Commented Sep 10, 2017 at 13:26
-
\$\begingroup\$ There's no point in using memcpy here. Use std::copy instead. I read in Notes that std::copy could be implemented using std::memmove. But my concern here is latency, what about if MyString has to be lowlatency, then I think we should use memmove directly and avoid std::copy altogether. \$\endgroup\$cpp11dev– cpp11dev2017年09月10日 13:36:03 +00:00Commented Sep 10, 2017 at 13:36
-
\$\begingroup\$ I don't get what latency has to do with this. If
std::copy
is implemented usingmemmove
rest assured that it will be inlined. \$\endgroup\$Matteo Italia– Matteo Italia2017年09月10日 15:19:22 +00:00Commented Sep 10, 2017 at 15:19 -
\$\begingroup\$ @cpp11dev For most of the modern C++ compilers,
std::copy
is not slower thanstd::memcpy
. A more detailed discussion on stackoverflow: stackoverflow.com/questions/4707012/…. \$\endgroup\$kraskevich– kraskevich2017年09月10日 17:33:21 +00:00Commented Sep 10, 2017 at 17:33
Your approach is not very good regarding exception safety. For example if exception is thrown the object state should stay valid. You are deleting the old data before allocating new array - so if new fails (even though you choose to use nothrow - you do not check the return value) you get incosistent state.
Proper way would be to write swap noexcept function And implement one of your operators like this
STR& operator=(const char* other)
{
STR temp(other); // can throw - but state of this is not changed yet
swap(temp); //no except - so safe
return *this; //cannot throw as well - so also safe
}
Move also will be implemented using swap. There is no need to delete anything during move - destructor of the object that is being swapped with this will take care of freeing the memory properly
Apart from what the others have already said, it's a bad idea to return a non-const pointer via the getData()
method as it breaks encapsulation - it's unclear now who owns the object and who should free the buffer memory. If you check how it's done in std::string
, you'll see that c_str()
and data()
methods that are more or less equivalent to your getData()
both return const char*
.
C++ designers some years ago actually made a mistake of returning a non-const char*
from the strstream
class. See here why it resulted in this class now being deprecated:
-
\$\begingroup\$ What class-invariant does
getData()
violate? And how does it violate encapsulation? \$\endgroup\$Deduplicator– Deduplicator2017年09月09日 20:16:06 +00:00Commented Sep 9, 2017 at 20:16 -
\$\begingroup\$ It violates encapsulation as it exposes the underlying data pointer, leaves the underlying data open to uncontrolled modification and makes it unclear who takes ownership. As for the class invariant, I've realised it's not true thanks to your question. sz can't be invalidated that way. Thanks for this comment. I've updated the answer. \$\endgroup\$KjMag– KjMag2017年09月09日 20:45:05 +00:00Commented Sep 9, 2017 at 20:45
-
1\$\begingroup\$ There's no issue with ownership, the function does not change it. Yes, there is an issue with the callee setting some byte to zero, or a zero to something else, because it's a counted-string / zero-terminated-string chimaera, but that's what contracts are for. It would probably be a good idea to go fully one of the ways... \$\endgroup\$Deduplicator– Deduplicator2017年09月09日 21:03:01 +00:00Commented Sep 9, 2017 at 21:03
Use the memory management tools the standard gives you
In particular, m_data
should be of type std::unique_ptr<char[]>
, not char*
.
The memory management you are doing is a standard and common thing... and also well-known to be an extremely common place for programmer errors. Use the right tool for this job, so that it's easier to write code, less likely to have errors, and more obvious to the reader what your implicit intentions are.
(and yes, you do have errors — for example, the way you wrote the code requires a destructor to deallocate the memory, and you don't have one!)
You have a const correctness error
getData()
is a const member, but it returns a pointer that would allow the user to modify the contents of the string! If you are going to have this function, you should have two variants
- A const version that returns
const char*
- A non-const version that returns
char*
although depending on your intentions, you may only want the first and not the second.
-
\$\begingroup\$ I'd spontaneously think that
m_data
could and perhaps should be ashared_ptr
, which will automatically go a long way towards copy on write. \$\endgroup\$Peter - Reinstate Monica– Peter - Reinstate Monica2017年09月11日 07:50:36 +00:00Commented Sep 11, 2017 at 7:50 -
\$\begingroup\$ @PeterA.Schneider: Copy On Write being incompatible with
std::string
requirements (in particular,c_str()
must be O(1)), usingstd::shared_ptr
would not be very helpful. \$\endgroup\$Matthieu M.– Matthieu M.2017年09月11日 13:22:01 +00:00Commented Sep 11, 2017 at 13:22 -
\$\begingroup\$ @MatthieuM. Apart from assuming that it was not the goal of this exercise to produce a perfectly standard-conformant string class (but rather to explore C++): Why does using
std::shared_ptr
instead ofunique_ptr
preventc_str()
from being of constant time complexity? I would think thatstd::shared_ptr::get()
(the function to call from withinc_str()
to get the actual data) is O(1). \$\endgroup\$Peter - Reinstate Monica– Peter - Reinstate Monica2017年09月11日 14:03:55 +00:00Commented Sep 11, 2017 at 14:03 -
1\$\begingroup\$ @PeterA.Schneider: Misremembered sorry, it's not
c_str
it'soperator[]
(the mutable version). With Copy On Write it must, pessimistically, trigger a copy which is O(N) and invalidates references/iterators; however since C++11 it's specified NOT to invalidate references/iterators. \$\endgroup\$Matthieu M.– Matthieu M.2017年09月11日 14:46:57 +00:00Commented Sep 11, 2017 at 14:46 -
\$\begingroup\$ @MatthieuM. Interesting, thanks. [Second thought: Sufficiently advanced iterators could magically stay valid ;-).] So is it established consensus that copy on write is impossible for std::string implementations in modern C++? \$\endgroup\$Peter - Reinstate Monica– Peter - Reinstate Monica2017年09月11日 16:12:40 +00:00Commented Sep 11, 2017 at 16:12
In addition to other comments, your copy constructor and assignment operator should use the already computed size (member sz
or getSize()
result).
Also there is a lack of consistency... Sometime, you put optional this->
and sometime not. Sometime you check for nullptr
, sometime not for similar code.
As a side note, you should not use uppercase name for class. Usually, uppercase word are used for macros. So it is preferable to follow some usual naming convention.
Separate Resource Handling
In C++, there are two sorts of classes:
- those which handle one resource,
- those which implement domain-specific functionality.
Attempting to shoehorn multiple responsibilities into a single class is a violation of the Single Responsibility Principle and will, in the mid-term to long-term, lead to an unmaintainable mess.
Therefore, I advise to use two classes:
StringImpl
will be responsible for handling the resource, this implies allocating it, moving it, assigning it, copying it and freeing it.String
will be responsible for string operations: catenating two strings, finding a character in the string, finding a pattern in the string, ... and will internally rely onStringImpl
for its resource handling.
In this answer, I will concentrate on StringImpl
since this is the only part you have actually presented, so String
is dead simple:
class String {
public:
String() = default;
explicit String(const char* str): impl(str) {}
String(const char* str, size_t size): impl(str, size) {}
bool empty() const { return impl.empty(); }
size_t size() const { return impl.size(); }
char* data() { return impl.data(); }
const char* data() const { return impl.data(); }
// Add in any necessary member here
private:
StringImpl impl;
};
Clearly lay out requirements
Code answers a requirement. It may be clear in your head, but it is not clear to us. It may be clear now, but it will not be months from now.
Therefore, before coding, you need to define requirements, and then use these requirements to inform your choices.
In this case, there are two sets of requirements:
- class invariants,
- method requirements.
And the method requirements themselves can be divided into:
- pre-conditions,
- post-conditions,
- algorithmic complexity,
- exceptions: is any thrown, what is the state of the instance if one is thrown, ...
- possibly specific restrictions.
To illustrate, let's established class invariants for String
:
- String will contain a NUL-terminated contiguous sequence of
char
And some method requirements for size
:
- pre-condition: none
- post-condition: none (it's
const
, so does not modify the class) - algorithmic complexity: O(1)
- exceptions: none
You do not have to use those exact requirements, of course. In this case, though, I really advise you to use a constant-time size()
method: all standard containers have a constant-time size()
method since std::list
was brought back into the fold in C++11.
Without this constant-time requirement, there are two valid implementations of size()
, one using strlen
and one using a mSize
data-member. With this requirement, only the implementation using mSize
is valid.
Avoid special cases
I applaud the effort in not allocating memory in the default constructor. Unfortunately here it comes at the cost of a special case that you have to remember in every single method.
It also, quite unfortunately, leaks to the client: getData
may return a null pointer!
In this specific case, I advise cheating by using a sentinel value: your empty string should pass itself off as ""
.
The default constructor thus becomes:
private:
static char EMPTY = 0;
public:
StringImpl(): mSize(0), mData(&EMPTY) {}
Now, only StringImpl
knows of this optimization1. Its data()
methods will always return a valid pointer so all the String
methods and other clients will be able to use a uniform treatment without NULL-check.
1 And it will have to special-case it in its special methods; delete[]
should never be called on &EMPTY
!
-
\$\begingroup\$ Amazing answer. And I really liked the approach you have suggested in tackling both the design and implementation. I believe, with your Impl design you are pointing towards PIMPL idiom. And your sentinel suggestion is what I had in my mind but could not implement it straight. Thank you for that. Appreciate your answer. So, by your suggestion, I should first design the class based on the requirements and then start my implementations, always. Cool \$\endgroup\$cpp11dev– cpp11dev2017年09月12日 14:30:58 +00:00Commented Sep 12, 2017 at 14:30
-
\$\begingroup\$ @cpp11dev: Two small clarifications. (1) I am not alluding to the PIMPL idiom, just proper layering of responsibilities instead of having one class doing it all. (2) You need not think about all requirements of all methods up-front, however I would suggest that each time you add a method you think about what you want it to do first, and if any change is necessary how that interacts with the requirements of the other methods (it's okay to change requirements, but this needs be a conscious choice). Ideally, changes of requirements should break tests, as those capture the requirements. \$\endgroup\$Matthieu M.– Matthieu M.2017年09月12日 15:17:45 +00:00Commented Sep 12, 2017 at 15:17
char* m_data
andm_data(new (std::nothrow) char[sz])
is not modern c++. If you use a container to keep the char array you wouldn't need to implement any of the move or copy constructors/assignment op. yourself, see this example: ideone.com/3D06K2 \$\endgroup\$