Simple modern C++ string class implementation

Question 1

My teacher asked me to come up with a quick version of the C++ string class. This is what I have written. It works fine and I have tested few cases as detailed in main(). Now I am curious how good this program is actually in terms of modern C++ fundamentals.

#include <iostream>
#include <exception>
class STR
{
public:
 STR(const char* ptr): sz(strlen(ptr)+1),m_data(new (std::nothrow) char[sz])
 {
 if (m_data)
 {
 memcpy(m_data, ptr, sz);
 std::cout << "STR() - "<< getData() << ":" << getSize() << " chars" << '\n';
 }
 else
 {
 std::cerr << "Fatal Error - Cannot construct STR()";
 }
 }
 STR() : m_data(nullptr),sz(0) {}
 STR(const STR& obj)
 {
 sz = strlen(obj) + 1;
 m_data = new (std::nothrow) char[sz];
 memcpy(m_data, obj, sz);
 std::cout << "STR(const STR&) - " << getData() << ":" << getSize() << " chars" << '\n';
 }
 size_t getSize() const noexcept 
 {
 return sz;
 }
 char* getData() const
 {
 return this->m_data;
 }
 operator const char*() const {
 return m_data;
 }
 STR& operator=(const char* other)
 {
 if (this->sz != 0 && this->m_data)
 {
 std::cout << "Destructing..." << getData() << '\n';
 this->sz = 0;
 delete[] this->m_data;
 }
 sz = strlen(other) + 1;
 m_data = new (std::nothrow) char[sz];
 memcpy(m_data, other, sz);
 std::cout << "STR(const STR&) - " << getData() << ":" << getSize() << " chars" << '\n';
 return *this;
 }
 STR& operator=(const STR& other)
 {
 if (this != &other)
 {
 //Release *this memory
 if (this->m_data)
 {
 std::cout << "Destructing..." << getData() << '\n';
 delete[] m_data;
 }
 sz = other.getSize();
 //Allocate *this memory
 this->m_data = new (std::nothrow) char[sz];
 memcpy(m_data, other.getData(), sz);
 std::cout << "operator=(const STR&) - " << getData() << ":" << getSize() << " chars" << '\n';
 }
 return *this;
 }
 //Move assignment
 STR& operator=(STR&& other)
 {
 if (this != &other)
 {
 delete this->m_data;
 //char* p = other.getData();
 this->m_data = std::exchange(other.m_data, nullptr);
 //size_t sz = other.getSize();
 this->sz = std::exchange(other.sz, 0);
 }
 return *this;
 }

Particularly here as I had to comment the private members and make them public as there was no easy way to implement move assignment for STR.

//private: 
 size_t sz;
 char* m_data;
};
std::ostream& operator<<(std::ostream& os, const STR& str)
{
 if (!str.getData())
 return (os << "");
 os << str.getData();
 return os;
}
auto main()->int
{
 //Object direct initialization
 STR a{"Simple text"};
 //Object direct initialization
 STR b = "Another text";
 STR c = a;
 //Empty obj creation
 STR d;
 std::cout << "d:" << d << '\n';
 //Empty object assignment
 d = std::move(c);
 std::cout << "d:" << d << '\n';
 //Existing object assignment
 b = d;
 std::cout << "b:" << b << '\n';
 //Object assigment using const char*
 a = "What next";
 std::cout << "a:" << a <<'\n';
}

Version 2.0 - Based on all the comments

#include <iostream>
#include <exception>
#include <memory>
class MyString
{
public:
 MyString(const char* ptr): sz(strlen(ptr)+1), m_data( sz ? new char[sz]:nullptr)
 {
 std::copy(ptr, ptr + sz, m_data);
 }
 MyString() : m_data(nullptr),sz(0) {}
 ~MyString()
 {
 delete[] m_data;
 }
 MyString(const MyString& obj):sz(strlen(obj) + 1), m_data(new char[sz]) 
 {
 std::copy(obj.m_data, obj.m_data + sz, m_data);
 }
 MyString(MyString&& obj) noexcept :sz(obj.sz), m_data(obj.m_data)
 {
 obj.sz = 0;
 obj.m_data = 0;
 }
 const size_t getSize() const noexcept
 {
 return sz;
 }
 const char* getData() const
 {
 return m_data;
 }
 operator const char*() const {
 return m_data;
 }
 MyString& operator=(const char* other)
 {
 MyString tmp(other);
 swap(tmp, *this);
 return *this;
 }
 friend void swap(MyString& first, MyString& second)
 {
 using std::swap;
 std::swap(first.m_data, second.m_data);
 std::swap(first.sz, second.sz);
 }
 MyString& operator=(MyString other)
 {
 MyString tmp(other);
 swap(other, *this);
 return *this;
 }
private:
 size_t sz;
 char* m_data;
};
std::ostream& operator<<(std::ostream& os, const MyString& str)
{
 if (!str.getData())
 return (os << "");
 os << str.getData();
 return os;
}
int main()
{
 //Object direct initialization
 MyString a{"Simple text"};
 //Object direct initialization
 MyString b = "Another text";
 MyString c = a;
 //Empty obj creation
 MyString d;
 std::cout << "d:" << d << '\n';
 //Empty object assignment
 d = std::move(c);
 std::cout << "d:" << d << '\n';
 //Existing object assignment
 b = d; ---(1)
 std::cout << "b:" << b << '\n';
 //Object assigment using const char*
 a = "What next";
 std::cout << "a:" << a <<'\n';
}

The issue now is that for code line -- (1) , first the copy constructor is called and then assignment operator, I think that copy elision works with rvalue parameters and not with lvalue parameters for unifying assignments of type defined here.

And another todo: is using automated test cases and I am still trying to find out if C++11/14/17 have any test case frameworks or I have to use something external like gtest, cpptest, etc.

Question 2

Just a comment as I don't have time to write a full answer: char* m_data and m_data(new (std::nothrow) char[sz]) is not modern c++. If you use a container to keep the char array you wouldn't need to implement any of the move or copy constructors/assignment op. yourself, see this example: ideone.com/3D06K2

Question 3

I think I know why you believe your members have to be public: it's because you believe encapsulation works at the object level. It does not: it works at the class level. Two instances of a class can access each other's private members directly.

Question 4

@ErnestFriedman-Hill: I think your comment is slightly misleading. There are two common forms of Data Abstraction in use today, Abstract Data Types and Objects. The fundamental difference between the two is that two instances of the same abstract data type can inspect each other's private representation, whereas two objects can never inspect each other's private representation, even if they are of the same type. So, encapsulation must work at the object level, because if it doesn't work at the object level, then you don't have objects in the first place, you have ADTs!

Question 5

@JörgWMittag What you say may be true in a general theoretical sense; nevertheless my statement stands for the specific case iof C++. I'll add that qualification to my comment.

Question 6

@ErnestFriedman-Hill I really appreciate if you can share an example or link to what you are trying to suggest here.

Question 7

Error handling

 STR(const char* ptr): sz(strlen(ptr)+1),m_data(new (std::nothrow) char[sz])
 {
 if (m_data)
 {
 memcpy(m_data, ptr, sz);
 std::cout << "STR() - "<< getData() << ":" << getSize() << " chars" << '\n';
 }
 else
 {
 std::cerr << "Fatal Error - Cannot construct STR()";
 }
 }

Writing anything to the standard output in constructor is a terrible idea. It has nothing to do with the string class. Logging is a separate concern.

Printing the error message to std::cerr is not a proper error handling method. If the allocation fails, the users of your class will never know it (they might not be sitting there looking at the screen). You should indicate the error by throwing an exception. I don't see the reason to use std::nothrow here. It's easier to let it throw automatically.

This piece of code is even more broken:

m_data = new (std::nothrow) char[sz];
memcpy(m_data, obj, sz);

It invokes undefined behavior if the allocation fails.

Correctness

This class needs a custom destructor (that deletes m_data). Otherwise, the memory leaks.

Copy and swap idiom

There's a lot of boilerplate code. You do almost the same thing in the assignment operator and the copy constructor. You can avoid it by using the copy and swap idiom. It'll also make the assignment simpler: you won't need to check it for self-assignment.

Modern C++

There's no point in using memcpy here. Use std::copy instead.

Useless comments

Comments like //Release *this memory and //Move assignment are worse than no comments at all: they clutter your code without providing any extra information. Delete them.

Design

Particularly here as I had to comment the private: members and make them public as there was no easy way to implement move assignment for STR

I don't understand how this is related to move assignment. Making private members public is an awful idea, anyway. It breaks encapsulation. Keep them private.

And did I say anything about not logging anything inside the STR constructor and member-functions? I think I did. Keep your class focused one thing: the string itself.

Tests

I'd strongly recommend to write proper automated unit tests instead of checking a few things in the main function.

Other notes

That auto main()->int is completely pointless. It's just int main().

Question 8

Indeed the best code review comments I had in long time. And your links conceptual and technical discussions greatly helped me to read a lot about everything I did incorrectly. I have tried to implement all the comments in version 2.0 of the code.

Question 9

There's no point in using memcpy here. Use std::copy instead. I read in Notes that std::copy could be implemented using std::memmove. But my concern here is latency, what about if MyString has to be lowlatency, then I think we should use memmove directly and avoid std::copy altogether.

Question 10

I don't get what latency has to do with this. If std::copy is implemented using memmove rest assured that it will be inlined.

Question 11

@cpp11dev For most of the modern C++ compilers, std::copy is not slower than std::memcpy. A more detailed discussion on stackoverflow: stackoverflow.com/questions/4707012/….

Question 12

Your approach is not very good regarding exception safety. For example if exception is thrown the object state should stay valid. You are deleting the old data before allocating new array - so if new fails (even though you choose to use nothrow - you do not check the return value) you get incosistent state.

Proper way would be to write swap noexcept function And implement one of your operators like this

STR& operator=(const char* other)
{
 STR temp(other); // can throw - but state of this is not changed yet
 swap(temp); //no except - so safe
 return *this; //cannot throw as well - so also safe
}

Move also will be implemented using swap. There is no need to delete anything during move - destructor of the object that is being swapped with this will take care of freeing the memory properly

Question 13

Apart from what the others have already said, it's a bad idea to return a non-const pointer via the getData() method as it breaks encapsulation - it's unclear now who owns the object and who should free the buffer memory. If you check how it's done in std::string, you'll see that c_str() and data() methods that are more or less equivalent to your getData() both return const char*.

C++ designers some years ago actually made a mistake of returning a non-const char* from the strstream class. See here why it resulted in this class now being deprecated:

https://stackoverflow.com/questions/45073029/what-i-can-use-instead-of-strstreamfreeze/45073301#45073301

Question 14

What class-invariant does getData() violate? And how does it violate encapsulation?

Question 15

It violates encapsulation as it exposes the underlying data pointer, leaves the underlying data open to uncontrolled modification and makes it unclear who takes ownership. As for the class invariant, I've realised it's not true thanks to your question. sz can't be invalidated that way. Thanks for this comment. I've updated the answer.

Question 16

There's no issue with ownership, the function does not change it. Yes, there is an issue with the callee setting some byte to zero, or a zero to something else, because it's a counted-string / zero-terminated-string chimaera, but that's what contracts are for. It would probably be a good idea to go fully one of the ways...

Question 17

Use the memory management tools the standard gives you

In particular, m_data should be of type std::unique_ptr<char[]>, not char*.

The memory management you are doing is a standard and common thing... and also well-known to be an extremely common place for programmer errors. Use the right tool for this job, so that it's easier to write code, less likely to have errors, and more obvious to the reader what your implicit intentions are.

(and yes, you do have errors — for example, the way you wrote the code requires a destructor to deallocate the memory, and you don't have one!)

You have a const correctness error

getData() is a const member, but it returns a pointer that would allow the user to modify the contents of the string! If you are going to have this function, you should have two variants

A const version that returns const char*
A non-const version that returns char*

although depending on your intentions, you may only want the first and not the second.

Question 18

I'd spontaneously think that m_data could and perhaps should be a shared_ptr, which will automatically go a long way towards copy on write.

Question 19

@PeterA.Schneider: Copy On Write being incompatible with std::string requirements (in particular, c_str() must be O(1)), using std::shared_ptr would not be very helpful.

Question 20

@MatthieuM. Apart from assuming that it was not the goal of this exercise to produce a perfectly standard-conformant string class (but rather to explore C++): Why does using std::shared_ptr instead of unique_ptr prevent c_str() from being of constant time complexity? I would think that std::shared_ptr::get()(the function to call from within c_str() to get the actual data) is O(1).

Question 21

@PeterA.Schneider: Misremembered sorry, it's not c_str it's operator[] (the mutable version). With Copy On Write it must, pessimistically, trigger a copy which is O(N) and invalidates references/iterators; however since C++11 it's specified NOT to invalidate references/iterators.

Question 22

@MatthieuM. Interesting, thanks. [Second thought: Sufficiently advanced iterators could magically stay valid ;-).] So is it established consensus that copy on write is impossible for std::string implementations in modern C++?

Question 23

In addition to other comments, your copy constructor and assignment operator should use the already computed size (member sz or getSize() result).

Also there is a lack of consistency... Sometime, you put optional this-> and sometime not. Sometime you check for nullptr, sometime not for similar code.

As a side note, you should not use uppercase name for class. Usually, uppercase word are used for macros. So it is preferable to follow some usual naming convention.

Question 24

Separate Resource Handling

In C++, there are two sorts of classes:

those which handle one resource,
those which implement domain-specific functionality.

Attempting to shoehorn multiple responsibilities into a single class is a violation of the Single Responsibility Principle and will, in the mid-term to long-term, lead to an unmaintainable mess.

Therefore, I advise to use two classes:

StringImpl will be responsible for handling the resource, this implies allocating it, moving it, assigning it, copying it and freeing it.
String will be responsible for string operations: catenating two strings, finding a character in the string, finding a pattern in the string, ... and will internally rely on StringImpl for its resource handling.

In this answer, I will concentrate on StringImpl since this is the only part you have actually presented, so String is dead simple:

class String {
public:
 String() = default;
 explicit String(const char* str): impl(str) {}
 String(const char* str, size_t size): impl(str, size) {}
 bool empty() const { return impl.empty(); }
 size_t size() const { return impl.size(); }
 char* data() { return impl.data(); }
 const char* data() const { return impl.data(); }
 // Add in any necessary member here
private:
 StringImpl impl;
};

Clearly lay out requirements

Code answers a requirement. It may be clear in your head, but it is not clear to us. It may be clear now, but it will not be months from now.

Therefore, before coding, you need to define requirements, and then use these requirements to inform your choices.

In this case, there are two sets of requirements:

class invariants,
method requirements.

And the method requirements themselves can be divided into:

pre-conditions,
post-conditions,
algorithmic complexity,
exceptions: is any thrown, what is the state of the instance if one is thrown, ...
possibly specific restrictions.

To illustrate, let's established class invariants for String:

String will contain a NUL-terminated contiguous sequence of char

And some method requirements for size:

pre-condition: none
post-condition: none (it's const, so does not modify the class)
algorithmic complexity: O(1)
exceptions: none

You do not have to use those exact requirements, of course. In this case, though, I really advise you to use a constant-time size() method: all standard containers have a constant-time size() method since std::list was brought back into the fold in C++11.

Without this constant-time requirement, there are two valid implementations of size(), one using strlen and one using a mSize data-member. With this requirement, only the implementation using mSize is valid.

Avoid special cases

I applaud the effort in not allocating memory in the default constructor. Unfortunately here it comes at the cost of a special case that you have to remember in every single method.

It also, quite unfortunately, leaks to the client: getData may return a null pointer!

In this specific case, I advise cheating by using a sentinel value: your empty string should pass itself off as "".

The default constructor thus becomes:

private:
 static char EMPTY = 0;
public:
 StringImpl(): mSize(0), mData(&EMPTY) {}

Now, only StringImpl knows of this optimization¹. Its data() methods will always return a valid pointer so all the String methods and other clients will be able to use a uniform treatment without NULL-check.

¹ And it will have to special-case it in its special methods; delete[] should never be called on &EMPTY!

Question 25

Amazing answer. And I really liked the approach you have suggested in tackling both the design and implementation. I believe, with your Impl design you are pointing towards PIMPL idiom. And your sentinel suggestion is what I had in my mind but could not implement it straight. Thank you for that. Appreciate your answer. So, by your suggestion, I should first design the class based on the requirements and then start my implementations, always. Cool

Question 26

@cpp11dev: Two small clarifications. (1) I am not alluding to the PIMPL idiom, just proper layering of responsibilities instead of having one class doing it all. (2) You need not think about all requirements of all methods up-front, however I would suggest that each time you add a method you think about what you want it to do first, and if any change is necessary how that interacts with the requirements of the other methods (it's okay to change requirements, but this needs be a conscious choice). Ideally, changes of requirements should break tests, as those capture the requirements.

kraskevich kraskevichkraskevich 5,66018 silver badges21 bronze badges · Accepted Answer · 2017-09-09 17:39:17Z

Error handling

 STR(const char* ptr): sz(strlen(ptr)+1),m_data(new (std::nothrow) char[sz])
 {
 if (m_data)
 {
 memcpy(m_data, ptr, sz);
 std::cout << "STR() - "<< getData() << ":" << getSize() << " chars" << '\n';
 }
 else
 {
 std::cerr << "Fatal Error - Cannot construct STR()";
 }
 }

Writing anything to the standard output in constructor is a terrible idea. It has nothing to do with the string class. Logging is a separate concern.

Printing the error message to std::cerr is not a proper error handling method. If the allocation fails, the users of your class will never know it (they might not be sitting there looking at the screen). You should indicate the error by throwing an exception. I don't see the reason to use std::nothrow here. It's easier to let it throw automatically.

This piece of code is even more broken:

m_data = new (std::nothrow) char[sz];
memcpy(m_data, obj, sz);

It invokes undefined behavior if the allocation fails.

Correctness

This class needs a custom destructor (that deletes m_data). Otherwise, the memory leaks.

Copy and swap idiom

There's a lot of boilerplate code. You do almost the same thing in the assignment operator and the copy constructor. You can avoid it by using the copy and swap idiom. It'll also make the assignment simpler: you won't need to check it for self-assignment.

Modern C++

There's no point in using memcpy here. Use std::copy instead.

Useless comments

Comments like //Release *this memory and //Move assignment are worse than no comments at all: they clutter your code without providing any extra information. Delete them.

Design

Particularly here as I had to comment the private: members and make them public as there was no easy way to implement move assignment for STR

I don't understand how this is related to move assignment. Making private members public is an awful idea, anyway. It breaks encapsulation. Keep them private.

And did I say anything about not logging anything inside the STR constructor and member-functions? I think I did. Keep your class focused one thing: the string itself.

Tests

I'd strongly recommend to write proper automated unit tests instead of checking a few things in the main function.

Other notes

That auto main()->int is completely pointless. It's just int main().

Indeed the best code review comments I had in long time. And your links conceptual and technical discussions greatly helped me to read a lot about everything I did incorrectly. I have tried to implement all the comments in version 2.0 of the code.
There's no point in using memcpy here. Use std::copy instead. I read in Notes that std::copy could be implemented using std::memmove. But my concern here is latency, what about if MyString has to be lowlatency, then I think we should use memmove directly and avoid std::copy altogether.
I don't get what latency has to do with this. If std::copy is implemented using memmove rest assured that it will be inlined.
@cpp11dev For most of the modern C++ compilers, std::copy is not slower than std::memcpy. A more detailed discussion on stackoverflow: stackoverflow.com/questions/4707012/….

Stack Exchange Network

Simple modern C++ string class implementation

6 Answers 6

Error handling

Correctness

Copy and swap idiom

Modern C++

Useless comments

Design

Tests

Other notes

Use the memory management tools the standard gives you

You have a const correctness error

Separate Resource Handling

Clearly lay out requirements

Avoid special cases

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Simple modern C++ string class implementation

6 Answers 6

Error handling

Correctness

Copy and swap idiom

Modern C++

Useless comments

Design

Tests

Other notes

Use the memory management tools the standard gives you

You have a const correctness error

Separate Resource Handling

Clearly lay out requirements

Avoid special cases

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions