I would like some feedback on what I should change with my semi remake of std::to_string
with UDLs (user-defined literals).
This is compiled and working on linux with C++11 using g++
.
#include <deque> // if you use vectors instead you have to add them to the
//string in reverse order (no push_front())
#include <string>
std::string operator"" _str(unsigned long long int num)
{
std::deque<char> digits;
for(char i = 0; num != 0;num = num / 10)
{
i = '0'+(num % 10);
digits.push_front(i);
}
std::string converternum;
for(auto i = 0; i< digits.size();i++)
converternum += digits[i];
return converternum;
}
char * operator"" _cstr(unsigned long long int num)
{
std::deque<char> digits;
for(char i = 0; num != 0;num = num / 10)
{
i = '0'+(num % 10);
digits.push_front(i);
}
std::string converternum;
for(auto i = 0; i< digits.size();i++)
converternum += digits[i];
char * string = (char *)converternum.c_str();
return string;//converternum.c_str();
}
4 Answers 4
I'm ignoring all the testing stuff, and just focusing on the two conversion functions.
std::string operator"" _str(unsigned long long int num)
{
std::deque<char> digits;
There's really no need for the deque. You can push_back()
all the digits directly into a std::string
, and then use std::reverse()
.
To make it even better, you could use the max value of an unsigned long long
and calculate the maximum number of digits you'll need, then use reserve()
. But that probably won't net you any real gains in reality, since you'll likely always be under the small-string optimization size where std::string
won't actually allocate any memory... another reason to ditch the deque and stick with strings.
for(char i = 0; num != 0;num = num / 10)
Have you tested your conversion functions with zero? Because if num
is zero, this loop won't fire at all, and digits
will be empty. So "0"_str
will result in ""
.
This is a pretty weird and confusing loop in general. i
looks like it's going to be some kind of loop counter, but it's not. All-in-all, this just looks like a case of trying to be too clever, and outsmarting yourself (due to the zero issue).
This could be expressed much more simply and clearly as something like:
do
{
digits.push_front('0' + (num % 10));
num /= 10;
} while (num != 0);
and it won't have the zero bug. (do
loops are less common than while
, but the reason to use one here - and this should be documented in a comment - is to guarantee that even if num
is zero, the loop will fire at least once.)
{snipped section about digit character codes}
std::string converternum;
I already mentioned that the deque is unnecessary, but if you're really going to use it, you should take advantage of the fact that you know digits.size()
, and use reserve()
for this string to prevent unnecessary allocations.
char * operator"" _cstr(unsigned long long int num)
This function is mostly identical to the first one, and thus has the same problems. Again, I would ditch the deque and just work with a string. The digits will be in reverse order, but in this case you can fix that with std::reverse_copy()
when you copy them into your output buffer.
char * string = (char *)converternum.c_str();
return string;//converternum.c_str();
Here is where you get into real trouble.
First, never, ever use C-style casts. In this case, you've cast away the constness of c_str()
... that's bad. Prior to C++17, if you want a non-const
pointer to std::string
's data, you have to jump through some hoops - you have to do something like &convertnum[0]
or &convertnum.front()
- and I agree that's a pain. But casting away constness is a red flag that won't pass most code reviews, because it is extremely dangerous.
Second, you are taking a pointer to convertnum
's internal data, and returning that... even though convertnum
is being destroyed. In other words, you are returning a pointer to freed memory, which is UB.
How do you solve these problems? Well, the don't answer is: Don't use C-style strings. In other words, don't have a _cstr
UDL. It's only going to give you grief (it already has!).
But if you really, really must... one option is to do what @hoffmale suggests, and instead of char*
, return std::unique_ptr<char[]>
. That might look something like this:
char* operator"" _cstr(unsigned long long int num)
{
auto digits = std::string{};
do
{
digits += '0' + (num % 10);
num /= 10;
} while (num != 0);
auto p = std::make_unique<char[]>(digits.size() + 1);
std::reverse_copy(digits.cbegin(), digits.cend(), p.get());
p[digits.size()] = '0円'; // have to manually add the null terminator
return p;
}
The problem with that is you're now longer returning a C-string, which makes "_cstr
" kind of a lie.
So another option is to have some kind of storage with a lifetime longer than the function that will retain the memory for you. Something like:
char* operator"" _cstr(unsigned long long int num)
{
// Note: you *can't* use a vector<string>, because string's
// memory may be invalidated on moves, like if the vector
// resizes.
static auto storage = std::vector<std::unique_ptr<char[]>>{};
auto digits = std::string{};
do
{
digits += '0' + (num % 10);
num /= 10;
} while (num != 0);
auto p = std::make_unique<char[]>(digits.size() + 1);
std::reverse_copy(digits.cbegin(), digits.cend(), p.get());
p[digits.size()] = '0円';
// Get the char*, and then move the unique_ptr into storage
// to let it be managed there.
auto result = p.get();
storage.push_back(std::move(p));
return result;
}
This will "work", and it's basically what some C libraries do (when they don't simply leak). But with no way to release the memory, it will just grow and grow. You could add a mechanism to release the memory, but you would need a way to be sure it's safe for every string. This solution is possible, but unwieldy.
The last solution - the worst case (other than outright UB) - is to define some sort of contract that client code must follow. In plain English, document the function by saying "you must delete[]
the memory when you're done with it". That requires programmers to behave, and is very easy to break, so it is not recommended. But... if you really want a C-string function, it's an option. A bad one, but an option nonetheless.
Summary
You have one critical error in your code:
- In
""_cstr(unsigned long long)
, you return a pointer to the internal memory of astd::string
that gets destroyed at the end of the function, meaning you're returning a pointer to invalid memory.
You have a couple of logical errors in your code:
- Both functions will return an empty string when the number is zero, instead of
"0"
.
You have a few inefficiencies:
- You use deques because you want to add the numbers in reverse order, and reason that makes sense because neither strings nor vectors have
push_front()
. But that's all unnecessary. You can simplypush_back()
directly into your result string, and then usestd::reverse()
. (Orpush_back()
into a string buffer, andstd::reverse_copy()
into your final buffer.)
And here are some tips to keep in mind:
It's really good that you're doing testing, but if you're going to do that, you might as well go all in and use something like Google Test. It's header-only, a single include, and it produces very pretty output. A full battery of tests can give you great confidence in your code. Using Google Test might look something like (fair warning, I don't actually use Google Test that often):
TEST(StrTest, HandlesZero) {
EXPECT_EQ(0_str, "0");
}
To test that the types are right, rather than trying to demangle the names (which, by the way, it's been a long time since I've used abi::__cxa_demangle()
, but I'm pretty sure you're not using it correctly), it's so much easier to just use type traits:
TEST(StrTest, ReturnType) {
EXPECT_TRUE((std::is_same_v<decltype(0_str), std::string>));
}
Another dirty trick you can use to examine types is deliberately generating errors:
template <typename T>
struct TD; // note, never define TD
TD<decltype(0_str)> t; // will trigger an error
The error generated will print the exact type of whatever is in the angle brackets, on every compiler. This is a hackish way to check types, but it can help when you're experimenting.
-
\$\begingroup\$ C++ guarantees that digits are contiguous. [lex.charset]/3: "In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous." \$\endgroup\$Jerry Coffin– Jerry Coffin2018年07月13日 03:07:55 +00:00Commented Jul 13, 2018 at 3:07
-
\$\begingroup\$ That's good to know. I'm battle-scarred after EBCDIC and its non-contiguous letters. \$\endgroup\$indi– indi2018年07月13日 03:13:23 +00:00Commented Jul 13, 2018 at 3:13
-
\$\begingroup\$ I certainly can't blame you for that! \$\endgroup\$Jerry Coffin– Jerry Coffin2018年07月13日 03:14:28 +00:00Commented Jul 13, 2018 at 3:14
-
1\$\begingroup\$ For tests of type equality, I prefer to do that at compile-time - e.g.
static_assert(std::is_same_v<decltype(0_str), std::string>);
\$\endgroup\$Toby Speight– Toby Speight2018年07月13日 08:26:25 +00:00Commented Jul 13, 2018 at 8:26 -
\$\begingroup\$ In case it's not obvious - the
static_assert
can still be in the test method. Example in a recent question of mine: Compute mean and variance, incrementally. \$\endgroup\$Toby Speight– Toby Speight2018年07月13日 15:02:32 +00:00Commented Jul 13, 2018 at 15:02
Note: Intermingling actual implementation code and testing code/usage example makes for bad reading.
std::string
has a constructor that takes two iterators and constructs its contents from them. This allows _str
to be simplified to:
std::string operator"" _str(unsigned long long int num)
{
std::deque<char> digits;
for(; num != 0; num /= 10)
{
auto digit = '0' + (num % 10);
digits.push_front(digit);
}
return std::string{ std::begin(digits), std::end(digits) };
}
Similar applies to _cstr
. But beware! Since converternum
in _cstr
gets destructed once the function returns, the pointer returned by c_str()
earlier will dangle. To return a valid char*
, a copy has to be made - and in that case, the memory could be managed directly anyways:
char* operator"" _cstr(unsigned long long int num)
{
std::deque<char> digits;
for(; num != 0; num /= 10)
{
char digit = char{'0'+(num % 10)};
digits.push_front(digit);
}
auto string = new char[digits.size() + 1];
std::copy(std::begin(digits), std::end(digits), string);
string[digits.size()] = '0円';
return string;
}
-
\$\begingroup\$ Thank you, I'm trying to learn more about the standard lib algorithms for these classes and appreciate your help in doing so \$\endgroup\$Shipof123– Shipof1232018年07月13日 01:09:38 +00:00Commented Jul 13, 2018 at 1:09
-
\$\begingroup\$ @Mememyselfandaverycreepy: Generally, I'd suggest returning a
std::unique_ptr<char[]>
, but that isn't a C-style string. If you need C-style strings (e.g. for calling C code), you might want to think about exchangingnew char[...]
formalloc
(C can callfree
, but it doesn't have access todelete
) \$\endgroup\$hoffmale– hoffmale2018年07月13日 01:12:34 +00:00Commented Jul 13, 2018 at 1:12
The other reviews have already covered the most important bits, so I'll just address a few things not already mentioned.
Use your own function
If you have strdup
available, the operator""_cstr
can become a one-liner:
char * operator"" _cstr(unsigned long long int num) {
return strdup(operator""_str(num).c_str());
}
If you don't have strdup
(it's a standard POSIX, but not standard C++ function), one could easily create one or simply write the few lines of code to create the equivalent. Alternatively, one could use the other operator as a base from which to construct a string, but I like this direction better. In fact, I'd probably not provide a cstr
operator at all because of the ugliness it introduces for memory management.
Perform static checking
Instead of using abi::__cxa_demangle()
which is, as you know, non-portable, we can do this:
static_assert(std::is_same<decltype(str), std::string>(),
"str does not match expected type 'std::string'\n");
static_assert(std::is_same<decltype(cstr), char *>(),
"cstr does not match expected type 'char *'\n");
One could even go crazy and define an ugly macro for this:
#define TYPE_CHECK(str, tipo) \
static_assert(std::is_same<decltype(str), tipo>(), \
# str " does not match expected type '" # tipo "'\n")
Then use it as:
TYPE_CHECK(str, std::string);
TYPE_CHECK(cstr, const char *);
As @indi mentioned you can use (const char *)
parameter list. This allows _cstr
to be constexpr
and simplifies it:
constexpr const char * operator"" _cstr(const char * cstr) {
return cstr;
}
constexpr
here helps you to perform static checking:
constexpr bool cstring_equal(const char * lhs, const char * rhs) {
return lhs[0] == rhs[0]
&& (lhs[0] == '0円' || cstring_equal(&lhs[1], &rhs[1]));
}
static_assert(cstring_equal(123_cstr, "123"), "123_cstr doesn't match expected value \"123\"");
Also (const char *)
parameter list allows floating-point: 12.3_cstr
.
auto cstr = "213";
instead ofauto cstr = 213_cstr;
? You save yourself from having to include this library, the 3 characters typed per use, and having to manage the lifetime of the C string. I'm not really seeing the benefit for either C strings or C++ strings. \$\endgroup\$operator ""_str(std::numeric_limits<long long>::digits10)
to get a numeric value in string form. \$\endgroup\$auto operator ""_str(char const* s) { return std::string{s}; }
- that is, take achar const*
parameter rather thanunsigned long long
. \$\endgroup\$