I want to have a thin wrapper (low overhead) around c-style string. Also, with some additional features like ==
and <
operators for map indexing. I have come up with the following class.
Can somebody comment on it? From what I've gathered this class would take only as much memory as a pointer, so much less than std::string
.
I also heard that this class has more overhead (execution-wise) than std::string
and indeed that is the case for the constructor (char *
), copy constructor and move constructor. Can somebody explain why?
inline char * dupStr(const char * s){
if(!s){return nullptr;}
size_t l = strlen(s) + 1;
char * d = new char[l];
memcpy(d,s,l);
return d;
}
class CString_{
private:
const char *s_;
public:
CString_(const char *s):s_(dupStr(s)){}
CString_(const std::string &s):CString_(s.c_str()){}
CString_(const CString_ &that):s_(dupStr(that.s_)){}
CString_(CString_ &&that):s_(that.s_){
that.s_ = nullptr;
}
CString_ & operator= (const CString_ & that) = delete;
CString_ & operator= (CString_ && that) = delete;
~CString_(){ delete[] s_;}
const char * get() const {return s_;}
bool operator<(const CString_ &that) const{
return strcmp(s_,that.s_) < 0;
}
bool operator==(const CString_ &that) const{
return strcmp(s_,that.s_) == 0;
}
};
4 Answers 4
Answers to Questions
I want to have a thin wrapper (low overhead) around c-style string.
I am not sure this class is 'low-overhead' compared to std::string
or even a C-String
. Because it still has all the overhead of creating dynamic space and copying.
Low-overhead: To me this means the cost of creating the object. If you use this as you measurement then you have not achieved your goal. The cost is about the same as both a std::string
and C-String
.
Now. It has memory management abilities of std::string
but has a smaller footprint. It is the same size as a C-String
but the advantage of memory management.
So it has some extra advantages (just not low overhead).
Also, with some additional features like == and < operators for map indexing.
Which is useful; like it.
Can somebody comment on it?
Not a problem.
From what I've gathered this class would take only as much memory as a pointer.
Probably.
But you can always test this by using sizeof()
to find out for sure.
so much less than std::string.
It's probably about one third the size of a std::string
(you can't make a definitive statement because the standard does not specify how to implement it).
Now a third sounds like a lot. But if a pointer is 4 bytes then a std::string
would probably be 12 bytes. Not a great saving. Especially since the std::string
has several other advantages (small string optimization: if the string is less than 12 bytes there is no dynamic memory allocation).
I also heard that this class has more overhead (execution-wise) than std::string and indeed that is the case for the constructor (char *), copy constructor and move constructor. Can somebody explain why?
I don't believe that. Your code is pretty optimal in those terms.
Review of Code
Not sure why this is a standalone function? Why not make it a private member of the class.
inline char * dupStr(const char * s){
if(!s){return nullptr;} // Sure. Why save the space
// Do this on multiple lines.
// The point is to make the code
// readable.
size_t l = strlen(s) + 1;
char * d = new char[l];
memcpy(d,s,l);
return d;
}
The move constructor. You should mark the move constructor as noexcept
:
CString_(CString_ &&that) noexcept
: s_(that.s_)
{
that.s_ = nullptr;
}
This is because the standard containers require this to do an optimal operations. If you don't specify this then they will fall back to using the copy constructor (note: only mark it as noexcept
if it really is).
Sure you can do a get:
const char * get() const {return s_;}
But it is sometimes also useful to define the cast operator:
const char * operator() const {return get();}
This is useful so you can use CString_
in any location that you could use a C-String
without having to change the code.
Low-Overhead.
If I was thinking low-overhead. Then I would make the class take ownership of the string object. That way there is no dynamic allocation or copying. But you get the advantages of RAII (so automatic destruction).
In addition I would get store the size. One of the largest cost of a C-String
is repeatedly calculating the size of the string (we switched some old code from using C-String to std::string
(a big job)). But we got a 40% speed increase.
The problem is how the C-String
was created (malloc()
or new
). We can make the assumption that the user will only use string that come from C libraries. The problems are that you can't check and make that guarantee (one of the reasons std::string
makes a copy).
class CStringOwner
{
char* data;
std::size_t size;
public:
CStringOwner(std::nullptr_t)
: data(nullptr)
, size(0)
{}
explicit CStringOwner(char* data)
: data(data)
, size(data ? strlen(data) : 0) // make sure null ptrs are correctly handled.
{}
// Copy.
CStringOwner(CStringOwner const& copy)
: data(reinterpret_cast<char*>(malloc(copy.size+1)))
, size(copy.size)
{
if (data == nullptr) {
throw std::bad_alloc("CStringOwner::CStringOwner: Copy: Failed on Malloc");
}
std::copy(copy.data, copy.data+copy.size+1, data);
}
CStringOwner& operator=(CStringOwner copy)
{
copy.swap(*this);
return *this;
}
// Move
CStringOwner(CStringOwner&& move) noexcept
: data(nullptr)
, size(0)
{
move.swap(*this);
}
CStringOwner& operator=(CStringOwner&& move) noexcept
{
move.swap(*this);
return *this;
}
// Destroy
~CStringOwner()
{
free(data);
}
void swap(CStringOwner& other) noexcept
{
using std::swap;
swap(data, other.data);
swap(size, other.size);
}
// Other shit goes here.
};
-
\$\begingroup\$ As a matter of fact, I can assume that C-Strings are malloc'd, so this works perfectly. Thanks. \$\endgroup\$Mikel– Mikel2015年12月07日 08:59:05 +00:00Commented Dec 7, 2015 at 8:59
-
\$\begingroup\$ However, I would add
!=NULL
check after malloc to prevent memory leaks. \$\endgroup\$Mikel– Mikel2015年12月07日 13:32:48 +00:00Commented Dec 7, 2015 at 13:32 -
\$\begingroup\$ Acts differently between being passed a null pointer value and a null pointer literal? Ewww. \$\endgroup\$Ben Voigt– Ben Voigt2015年12月07日 16:04:01 +00:00Commented Dec 7, 2015 at 16:04
-
\$\begingroup\$ @Mikel: Yes definitely a bug that must be checked for. I am used to
new
semantics. \$\endgroup\$Loki Astari– Loki Astari2015年12月07日 17:01:08 +00:00Commented Dec 7, 2015 at 17:01 -
2\$\begingroup\$ Added
explicit
to the constructor: Because we are taking ownership of the pointer and then freeing it when the object is destroyed we don't want this to happen accidentally. So we force the user to be explicit when capturing aC-String
. Added checks fornull
in the constructor (as we can dostringlen()
on a `null). Added check for malloc failure in the copy constructor (throws bad_alloc) on allocation failure. \$\endgroup\$Loki Astari– Loki Astari2015年12月07日 17:05:52 +00:00Commented Dec 7, 2015 at 17:05
Not really a wrapper
If I were to think about a no-overhead wrapper around a C-style string, I would think of a class that looks something like:
struct string_view {
const char* data;
size_t len;
~string_view() = default;
// some other members
};
That is, something that does not do anything with memory allocation whatsoever. It does not own the data, it merely views it. This will be a very cheap abstraction that you can build on by adding all sorts of useful members (begin()
, end()
, c_str()
, size()
, etc.).
Your class isn't a cstring wrapper class. It's actually an implementation of string. But one that is strictly less useful than std::string
. For one, yours is copy- and move-constructible, but neither copy- nor more-assignable? That seems fairly arbitrary.
Also simply because your class is smaller than std::string
doesn't mean it'll perform better. For one thing, std::string
has SSO which means it may not even need to allocate memory at all. On top of that, you don't provide or store the size, so to get the length of the string you'll have to call strlen
- which inefficient, especially since you construct the string via dupStr
which has to know the size anyway.
Some Specifics
If you are taking ownership of an array, prefer std::unique_ptr<const char[]>
.
get()
should be named c_str()
to be consistent with people's expections of what a string class should be like. There should be a size()
method, which will be inefficient since you don't store the length at the moment. You should also make your class container-like by providing a begin()
and end()
.
Whenever you implement operator<
and operator==
, you really should implement all 6 comparisons or at least inherit from boost::totally_ordered
. With this implementation, you could also implement a helper function:
int compare(const CString_& that) const {
return strcmp(s_, that.s_);
}
which makes all of them easy:
friend bool operator<(const CString_& lhs, const CString_& rhs) const {
return lhs.compare(rhs) < 0;
}
friend bool operator<=(const CString_& lhs, const CString_& rhs) const {
return lhs.compare(rhs) <= 0;
}
friend bool operator!=(const CString_& lhs, const CString_& rhs) const {
return lhs.compare(rhs) != 0;
}
// etc.
-
\$\begingroup\$ Missed the comparison operators (myself). Liked that you covered
unique_ptr
+1 \$\endgroup\$Loki Astari– Loki Astari2015年12月07日 17:12:12 +00:00Commented Dec 7, 2015 at 17:12
You're going to have to be very cautious about this use of memory, particularly with the new
in dupStr()
. It could be easy to encounter a memory leak here if it's not delete
d properly by the caller. This is really not the C++ way of handling memory.
Also, you don't need to bother making this function inline
. The compiler is free to disregard this keyword use, especially if it's not applicable in this particular case.
Lastly, don't cram this statement so much:
if(!s){return nullptr;}
There's no dire need to save lines here, especially if it'll harm readability. Prefer to have each part on separate lines:
if (!s) {
return nullptr;
}
-
1\$\begingroup\$ It looks like he might be putting the
dupStr
method in the header with the class, in which case it does need to be marked inline. \$\endgroup\$Corbin– Corbin2015年12月07日 07:09:23 +00:00Commented Dec 7, 2015 at 7:09
Well, the question is what part should be low-overhead.
Your RAII-NTBS has several disadvantages over a std::string
:
- No short-string-optimization, so common tiny strings need a dynamic allocation just like longer strings do.
- No explicit size saved, so cannot contain embedded zeroes, and determining the size is an O(n) operation. The latter has the obvious effect on constructing / assigning to your string-type.
- No reserved size saved, so every time the string is expanded, it has to be reallocated.
- Your class has a "partially-valid" state, if initialized from
nullptr
, moved from, or initialized from such a "partially-valid" source.
Comparing such a "partially-valid" object results in undefined behavior. - For no discernable reason, copy- and move-assignment are disabled.
Your dupStr
-function should be a hidden implementation-detail of your class.
Let me for amusements-sake point out strdup
, which while not part of the C standard is quite ubiquitous, and has subtly different semantics: It uses malloc
instead of new[]
, which might matter, or not. Also, it assumes the argument is a valid NTBS instead of forwarding nullptr
if received.
Your comparison-operators should be symmetric in their arguments, and they don't need access to the classes internals, thus best make them free non-friend functions.