Implementation of cstring wrapper class

Question 1

I want to have a thin wrapper (low overhead) around c-style string. Also, with some additional features like == and < operators for map indexing. I have come up with the following class.

Can somebody comment on it? From what I've gathered this class would take only as much memory as a pointer, so much less than std::string.

I also heard that this class has more overhead (execution-wise) than std::string and indeed that is the case for the constructor (char *), copy constructor and move constructor. Can somebody explain why?

inline char * dupStr(const char * s){
 if(!s){return nullptr;}
 size_t l = strlen(s) + 1;
 char * d = new char[l];
 memcpy(d,s,l);
 return d;
}
class CString_{
 private:
 const char *s_;
 public:
 CString_(const char *s):s_(dupStr(s)){}
 CString_(const std::string &s):CString_(s.c_str()){}
 CString_(const CString_ &that):s_(dupStr(that.s_)){}
 CString_(CString_ &&that):s_(that.s_){
 that.s_ = nullptr;
 }
 CString_ & operator= (const CString_ & that) = delete;
 CString_ & operator= (CString_ && that) = delete;
 ~CString_(){ delete[] s_;}
 const char * get() const {return s_;}
 bool operator<(const CString_ &that) const{
 return strcmp(s_,that.s_) < 0;
 }
 bool operator==(const CString_ &that) const{
 return strcmp(s_,that.s_) == 0;
 }
};

Question 2

Answers to Questions

I want to have a thin wrapper (low overhead) around c-style string.

I am not sure this class is 'low-overhead' compared to std::string or even a C-String. Because it still has all the overhead of creating dynamic space and copying.

Low-overhead: To me this means the cost of creating the object. If you use this as you measurement then you have not achieved your goal. The cost is about the same as both a std::string and C-String.

Now. It has memory management abilities of std::string but has a smaller footprint. It is the same size as a C-String but the advantage of memory management.

So it has some extra advantages (just not low overhead).

Also, with some additional features like == and < operators for map indexing.

Which is useful; like it.

Can somebody comment on it?

Not a problem.

From what I've gathered this class would take only as much memory as a pointer.

Probably.
But you can always test this by using sizeof() to find out for sure.

so much less than std::string.

It's probably about one third the size of a std::string (you can't make a definitive statement because the standard does not specify how to implement it).

Now a third sounds like a lot. But if a pointer is 4 bytes then a std::string would probably be 12 bytes. Not a great saving. Especially since the std::string has several other advantages (small string optimization: if the string is less than 12 bytes there is no dynamic memory allocation).

I also heard that this class has more overhead (execution-wise) than std::string and indeed that is the case for the constructor (char *), copy constructor and move constructor. Can somebody explain why?

I don't believe that. Your code is pretty optimal in those terms.

Review of Code

Not sure why this is a standalone function? Why not make it a private member of the class.

inline char * dupStr(const char * s){
 if(!s){return nullptr;} // Sure. Why save the space
 // Do this on multiple lines.
 // The point is to make the code
 // readable.
 size_t l = strlen(s) + 1;
 char * d = new char[l];
 memcpy(d,s,l);
 return d;
}

The move constructor. You should mark the move constructor as noexcept:

 CString_(CString_ &&that) noexcept
 : s_(that.s_)
 {
 that.s_ = nullptr;
 }

This is because the standard containers require this to do an optimal operations. If you don't specify this then they will fall back to using the copy constructor (note: only mark it as noexcept if it really is).

Sure you can do a get:

 const char * get() const {return s_;}

But it is sometimes also useful to define the cast operator:

 const char * operator() const {return get();}

This is useful so you can use CString_ in any location that you could use a C-String without having to change the code.

Low-Overhead.

If I was thinking low-overhead. Then I would make the class take ownership of the string object. That way there is no dynamic allocation or copying. But you get the advantages of RAII (so automatic destruction).

In addition I would get store the size. One of the largest cost of a C-String is repeatedly calculating the size of the string (we switched some old code from using C-String to std::string (a big job)). But we got a 40% speed increase.

The problem is how the C-String was created (malloc() or new). We can make the assumption that the user will only use string that come from C libraries. The problems are that you can't check and make that guarantee (one of the reasons std::string makes a copy).

 class CStringOwner
 {
 char* data;
 std::size_t size;
 public:
 CStringOwner(std::nullptr_t)
 : data(nullptr)
 , size(0)
 {}
 explicit CStringOwner(char* data)
 : data(data)
 , size(data ? strlen(data) : 0) // make sure null ptrs are correctly handled.
 {}
 // Copy.
 CStringOwner(CStringOwner const& copy)
 : data(reinterpret_cast<char*>(malloc(copy.size+1)))
 , size(copy.size)
 {
 if (data == nullptr) {
 throw std::bad_alloc("CStringOwner::CStringOwner: Copy: Failed on Malloc");
 }
 std::copy(copy.data, copy.data+copy.size+1, data);
 }
 CStringOwner& operator=(CStringOwner copy)
 {
 copy.swap(*this);
 return *this;
 }
 // Move
 CStringOwner(CStringOwner&& move) noexcept
 : data(nullptr)
 , size(0) 
 {
 move.swap(*this);
 }
 CStringOwner& operator=(CStringOwner&& move) noexcept
 {
 move.swap(*this);
 return *this;
 }
 // Destroy
 ~CStringOwner()
 {
 free(data);
 }
 void swap(CStringOwner& other) noexcept
 {
 using std::swap;
 swap(data, other.data);
 swap(size, other.size);
 }
 // Other shit goes here.
 };

Question 3

As a matter of fact, I can assume that C-Strings are malloc'd, so this works perfectly. Thanks.

Question 4

However, I would add !=NULL check after malloc to prevent memory leaks.

Question 5

Acts differently between being passed a null pointer value and a null pointer literal? Ewww.

Question 6

@Mikel: Yes definitely a bug that must be checked for. I am used to new semantics.

Question 7

Added explicit to the constructor: Because we are taking ownership of the pointer and then freeing it when the object is destroyed we don't want this to happen accidentally. So we force the user to be explicit when capturing a C-String. Added checks for null in the constructor (as we can do stringlen() on a `null). Added check for malloc failure in the copy constructor (throws bad_alloc) on allocation failure.

Question 8

Not really a wrapper

If I were to think about a no-overhead wrapper around a C-style string, I would think of a class that looks something like:

struct string_view {
 const char* data;
 size_t len;
 ~string_view() = default;
 // some other members
};

That is, something that does not do anything with memory allocation whatsoever. It does not own the data, it merely views it. This will be a very cheap abstraction that you can build on by adding all sorts of useful members (begin(), end(), c_str(), size(), etc.).

Your class isn't a cstring wrapper class. It's actually an implementation of string. But one that is strictly less useful than std::string. For one, yours is copy- and move-constructible, but neither copy- nor more-assignable? That seems fairly arbitrary.

Also simply because your class is smaller than std::string doesn't mean it'll perform better. For one thing, std::string has SSO which means it may not even need to allocate memory at all. On top of that, you don't provide or store the size, so to get the length of the string you'll have to call strlen - which inefficient, especially since you construct the string via dupStr which has to know the size anyway.

Some Specifics

If you are taking ownership of an array, prefer std::unique_ptr<const char[]>.

get() should be named c_str() to be consistent with people's expections of what a string class should be like. There should be a size() method, which will be inefficient since you don't store the length at the moment. You should also make your class container-like by providing a begin() and end().

Whenever you implement operator< and operator==, you really should implement all 6 comparisons or at least inherit from boost::totally_ordered. With this implementation, you could also implement a helper function:

int compare(const CString_& that) const {
 return strcmp(s_, that.s_);
}

which makes all of them easy:

friend bool operator<(const CString_& lhs, const CString_& rhs) const {
 return lhs.compare(rhs) < 0;
}
friend bool operator<=(const CString_& lhs, const CString_& rhs) const {
 return lhs.compare(rhs) <= 0;
}
friend bool operator!=(const CString_& lhs, const CString_& rhs) const {
 return lhs.compare(rhs) != 0;
}
// etc.

Question 9

Missed the comparison operators (myself). Liked that you covered unique_ptr +1

Question 10

You're going to have to be very cautious about this use of memory, particularly with the new in dupStr(). It could be easy to encounter a memory leak here if it's not deleted properly by the caller. This is really not the C++ way of handling memory.

Also, you don't need to bother making this function inline. The compiler is free to disregard this keyword use, especially if it's not applicable in this particular case.

Lastly, don't cram this statement so much:

if(!s){return nullptr;}

There's no dire need to save lines here, especially if it'll harm readability. Prefer to have each part on separate lines:

if (!s) {
 return nullptr;
}

Question 11

It looks like he might be putting the dupStr method in the header with the class, in which case it does need to be marked inline.

Question 12

Well, the question is what part should be low-overhead.

Your RAII-NTBS has several disadvantages over a std::string:

No short-string-optimization, so common tiny strings need a dynamic allocation just like longer strings do.
No explicit size saved, so cannot contain embedded zeroes, and determining the size is an O(n) operation. The latter has the obvious effect on constructing / assigning to your string-type.
No reserved size saved, so every time the string is expanded, it has to be reallocated.
Your class has a "partially-valid" state, if initialized from nullptr, moved from, or initialized from such a "partially-valid" source.
Comparing such a "partially-valid" object results in undefined behavior.
For no discernable reason, copy- and move-assignment are disabled.

Your dupStr-function should be a hidden implementation-detail of your class.
Let me for amusements-sake point out strdup, which while not part of the C standard is quite ubiquitous, and has subtly different semantics: It uses malloc instead of new[], which might matter, or not. Also, it assumes the argument is a valid NTBS instead of forwarding nullptr if received.

Your comparison-operators should be symmetric in their arguments, and they don't need access to the classes internals, thus best make them free non-friend functions.

Loki Astari Loki Astari 97.7k5 gold badges126 silver badges341 bronze badges · Accepted Answer · 2015-12-07 08:04:58Z

Answers to Questions

I want to have a thin wrapper (low overhead) around c-style string.

I am not sure this class is 'low-overhead' compared to std::string or even a C-String. Because it still has all the overhead of creating dynamic space and copying.

Low-overhead: To me this means the cost of creating the object. If you use this as you measurement then you have not achieved your goal. The cost is about the same as both a std::string and C-String.

Now. It has memory management abilities of std::string but has a smaller footprint. It is the same size as a C-String but the advantage of memory management.

So it has some extra advantages (just not low overhead).

Also, with some additional features like == and < operators for map indexing.

Which is useful; like it.

Can somebody comment on it?

Not a problem.

From what I've gathered this class would take only as much memory as a pointer.

Probably.
But you can always test this by using sizeof() to find out for sure.

so much less than std::string.

It's probably about one third the size of a std::string (you can't make a definitive statement because the standard does not specify how to implement it).

Now a third sounds like a lot. But if a pointer is 4 bytes then a std::string would probably be 12 bytes. Not a great saving. Especially since the std::string has several other advantages (small string optimization: if the string is less than 12 bytes there is no dynamic memory allocation).

I also heard that this class has more overhead (execution-wise) than std::string and indeed that is the case for the constructor (char *), copy constructor and move constructor. Can somebody explain why?

I don't believe that. Your code is pretty optimal in those terms.

Review of Code

Not sure why this is a standalone function? Why not make it a private member of the class.

inline char * dupStr(const char * s){
 if(!s){return nullptr;} // Sure. Why save the space
 // Do this on multiple lines.
 // The point is to make the code
 // readable.
 size_t l = strlen(s) + 1;
 char * d = new char[l];
 memcpy(d,s,l);
 return d;
}

The move constructor. You should mark the move constructor as noexcept:

 CString_(CString_ &&that) noexcept
 : s_(that.s_)
 {
 that.s_ = nullptr;
 }

This is because the standard containers require this to do an optimal operations. If you don't specify this then they will fall back to using the copy constructor (note: only mark it as noexcept if it really is).

Sure you can do a get:

 const char * get() const {return s_;}

But it is sometimes also useful to define the cast operator:

 const char * operator() const {return get();}

This is useful so you can use CString_ in any location that you could use a C-String without having to change the code.

Low-Overhead.

If I was thinking low-overhead. Then I would make the class take ownership of the string object. That way there is no dynamic allocation or copying. But you get the advantages of RAII (so automatic destruction).

In addition I would get store the size. One of the largest cost of a C-String is repeatedly calculating the size of the string (we switched some old code from using C-String to std::string (a big job)). But we got a 40% speed increase.

The problem is how the C-String was created (malloc() or new). We can make the assumption that the user will only use string that come from C libraries. The problems are that you can't check and make that guarantee (one of the reasons std::string makes a copy).

 class CStringOwner
 {
 char* data;
 std::size_t size;
 public:
 CStringOwner(std::nullptr_t)
 : data(nullptr)
 , size(0)
 {}
 explicit CStringOwner(char* data)
 : data(data)
 , size(data ? strlen(data) : 0) // make sure null ptrs are correctly handled.
 {}
 // Copy.
 CStringOwner(CStringOwner const& copy)
 : data(reinterpret_cast<char*>(malloc(copy.size+1)))
 , size(copy.size)
 {
 if (data == nullptr) {
 throw std::bad_alloc("CStringOwner::CStringOwner: Copy: Failed on Malloc");
 }
 std::copy(copy.data, copy.data+copy.size+1, data);
 }
 CStringOwner& operator=(CStringOwner copy)
 {
 copy.swap(*this);
 return *this;
 }
 // Move
 CStringOwner(CStringOwner&& move) noexcept
 : data(nullptr)
 , size(0) 
 {
 move.swap(*this);
 }
 CStringOwner& operator=(CStringOwner&& move) noexcept
 {
 move.swap(*this);
 return *this;
 }
 // Destroy
 ~CStringOwner()
 {
 free(data);
 }
 void swap(CStringOwner& other) noexcept
 {
 using std::swap;
 swap(data, other.data);
 swap(size, other.size);
 }
 // Other shit goes here.
 };

As a matter of fact, I can assume that C-Strings are malloc'd, so this works perfectly. Thanks.
However, I would add !=NULL check after malloc to prevent memory leaks.
Acts differently between being passed a null pointer value and a null pointer literal? Ewww.
@Mikel: Yes definitely a bug that must be checked for. I am used to new semantics.
Added explicit to the constructor: Because we are taking ownership of the pointer and then freeing it when the object is destroyed we don't want this to happen accidentally. So we force the user to be explicit when capturing a C-String. Added checks for null in the constructor (as we can do stringlen() on a `null). Added check for malloc failure in the copy constructor (throws bad_alloc) on allocation failure.

Stack Exchange Network

Implementation of cstring wrapper class

4 Answers 4

Answers to Questions

Review of Code

Low-Overhead.

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Implementation of cstring wrapper class

4 Answers 4

Answers to Questions

Review of Code

Low-Overhead.

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions