Compile time string manipulation in C++

Question 1

I recently had to convert some string literals and wondered whether that could be done at compile time using template metaprogramming. I couldn't find many examples online, so I started experimenting to see whether I could manage it (I'm not a template expert, so it seemed like a good opportunity to learn).

I ended up with a two-step process, first storing the converted string into a char array, and then wrapping the array inside a string view. Here is an example applied to a "CamelCase to snake_case" conversion:

#include <array>
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <tuple>
#include <type_traits>
#include <utility>

// Get the size of a string_view once converted to snake_case
constexpr size_t GetSnakeCaseSize(std::string_view str) {
    size_t ret = 0;
    for (size_t i = 0; i < str.length(); ++i) {
        if (i > 0 && str[i] >= 'A' && str[i] <= 'Z') {
            ret += 2;
        }
        else {
            ret += 1;
        }
    }
    return ret;
}

// Get an array of snake_case size from an array of string_view
template<size_t N, std::size_t... I>
constexpr std::array<size_t, N> GetSnakeCaseSize(const std::array<std::string_view, N>& a, std::index_sequence<I...>) {
    return std::array{GetSnakeCaseSize(a[I])...};
}

template<size_t N>
constexpr std::array<size_t, N> GetSnakeCaseSize(const std::array<std::string_view, N>& a) {
    return GetSnakeCaseSize(a, std::make_index_sequence<N>{});
}

// Get a snake_case char array from a string_view
template <size_t N>
constexpr std::array<char, N> ToSnakeCase(std::string_view str) {
    // We can't static_assert based on str, so throw instead
    if (GetSnakeCaseSize(str) != N) {
        throw std::invalid_argument("ToSnakeCase called with wrong output size");
    }
    std::array<char, N> output{};
    size_t index = 0;
    for (size_t i = 0; i < str.length(); ++i) {
        if (str[i] >= 'A' && str[i] <= 'Z') {
            if (i > 0) {
                output[index++] = '_';
            }
            output[index++] = str[i] + 'a' - 'A';
        }
        else {
            output[index++] = str[i];
        }
    }
    return output;
}

// Convert an array of string_view to a tuple of snake_case char arrays
template <size_t N, const std::array<size_t, N>& lengths, std::size_t... Is>
constexpr auto ArrayToSnakeCaseTuple(const std::array<std::string_view, N>& str, std::index_sequence<Is...>) {
    return std::make_tuple(ToSnakeCase<lengths[Is]>(str[Is])...);
}

template <size_t N, const std::array<size_t, N>& lengths>
constexpr auto ArrayToSnakeCaseTuple(const std::array<std::string_view, N>& str) {
    return ArrayToSnakeCaseTuple<N, lengths>(str, std::make_index_sequence<N>{});
}

// Get a string_view from char array
template <size_t N>
constexpr std::string_view ToStringView(const std::array<char, N>& a) {
    return std::string_view(a.data(), N);
}

// Create an array of string_view from a tuple of char arrays
template <typename Tuple, std::size_t... I>
constexpr std::array<std::string_view, std::tuple_size_v<Tuple>> TupleOfArraysToArrayOfStr(const Tuple& t, std::index_sequence<I...>) {
    return { ToStringView(std::get<I>(t)) ... };
}

template <typename Tuple>
constexpr auto TupleOfArraysToArrayOfStr(const Tuple& t) {
    return TupleOfArraysToArrayOfStr(t, std::make_index_sequence<std::tuple_size_v<Tuple>>{});
}

And usage example:

static constexpr std::array<std::string_view, 3> camel_case_strings = { "Hello", "MyWorld", "!!" };
static constexpr auto converted_lengths = GetSnakeCaseSize(camel_case_strings);
static constexpr auto arrays = ArrayToSnakeCaseTuple<std::size(camel_case_strings), converted_lengths>(camel_case_strings);
static constexpr auto snake_case_strings = TupleOfArraysToArrayOfStr(arrays);
static_assert(std::is_same_v<
 const std::array<std::string_view, std::size(camel_case_strings)>,
 decltype(snake_case_strings)>
);
static_assert(snake_case_strings[0] == "hello");
static_assert(snake_case_strings[1] == "my_world");
static_assert(snake_case_strings[2] == "!!");

Is there a way to improve or simplify it? I don't think we can avoid storing the snake_case char arrays somewhere, since that is where the strings actually live (string_view being only a convenient wrapper around that memory, if I understand correctly). As for the lengths of the intermediate output arrays, I wish I could hide them rather than store them permanently, but I couldn't find a way to make that work properly.

Also, I'm mainly interested in C++17 as that's what I'm most familiar with, but I'd also be curious if there is an easier way to do it with more modern versions of C++.
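
One possible direction for hiding the intermediate lengths in C++17 is sketched below. It is only an illustration under stated assumptions, not a drop-in replacement for the code above: the names SnakeCaseSize, SnakeCaseChars, SnakeCase and MyWorld are all hypothetical, and the idea is to pass each input string as a type exposing a static constexpr member (here called value), so that the length can be computed inside a class template instead of being declared by the caller. The conversion keeps the same ASCII-only rule as the original code.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Same ASCII-only rule as the question: one extra '_' before every
// upper-case letter except the first character.
constexpr std::size_t SnakeCaseSize(std::string_view str) {
    std::size_t size = 0;
    for (std::size_t i = 0; i < str.size(); ++i) {
        size += (i > 0 && str[i] >= 'A' && str[i] <= 'Z') ? 2 : 1;
    }
    return size;
}

// Build the converted characters; N is expected to be SnakeCaseSize(str).
template <std::size_t N>
constexpr std::array<char, N> SnakeCaseChars(std::string_view str) {
    std::array<char, N> output{};
    std::size_t index = 0;
    for (std::size_t i = 0; i < str.size(); ++i) {
        if (str[i] >= 'A' && str[i] <= 'Z') {
            if (i > 0) {
                output[index++] = '_';
            }
            output[index++] = static_cast<char>(str[i] + 'a' - 'A');
        }
        else {
            output[index++] = str[i];
        }
    }
    return output;
}

// Hypothetical wrapper: Source is any type with a static constexpr
// std::string_view member named value. The length stays inside the struct.
template <typename Source>
struct SnakeCase {
    static constexpr std::size_t size = SnakeCaseSize(Source::value);
    static constexpr std::array<char, size> storage = SnakeCaseChars<size>(Source::value);
    static constexpr std::string_view view{storage.data(), storage.size()};
};

// Hypothetical input type, one per string.
struct MyWorld { static constexpr std::string_view value = "MyWorld"; };

static_assert(SnakeCase<MyWorld>::view == "my_world");
```

The char array still has to live somewhere (here as the static member storage), but the caller no longer has to name the length. The cost is one small type per input string instead of a plain array of string_view.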

score 4 · Answer 1 · 2024-05-22 15:08:12Z

This test can catch more than just upper-case characters, and it can also miss some:

i > 0 && str[i] >= 'A' && str[i] <= 'Z'

In an ISO 8859-1 (Latin-1) locale, this fails to catch these upper-case letters: À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý.

And with EBCDIC encodings, it includes these non-letters: { } \ and other characters (varying with exact codepage, but typically including accented lower-case letters in the Latin ones).

Similarly, this is not a safe substitute for std::tolower():

str[i] + 'a' - 'A'

Comments:

Yeah, but std::tolower is not constexpr, so we can't use it here.

No, it's not; you'll need to implement a correct constexpr function yourself for these operations.
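
As a rough sketch of what such hand-written constexpr helpers could look like (the names kUpper, kLower, IsBasicUpper and ToBasicLower are hypothetical, and this covers only the 26 basic Latin letters), one option is to look characters up in an explicit alphabet instead of relying on 'A'..'Z' being contiguous. That avoids the EBCDIC false positives mentioned above. It does not handle accented letters such as À; those would need a table chosen for the specific encoding.

```cpp
#include <string_view>

// Hypothetical lookup tables: the two alphabets are listed explicitly, so the
// code makes no assumption about the numeric layout of the execution charset.
inline constexpr std::string_view kUpper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
inline constexpr std::string_view kLower = "abcdefghijklmnopqrstuvwxyz";

// True only for the 26 basic upper-case letters.
constexpr bool IsBasicUpper(char c) {
    return kUpper.find(c) != std::string_view::npos;
}

// Maps a basic upper-case letter to its lower-case counterpart and
// returns every other character unchanged.
constexpr char ToBasicLower(char c) {
    const auto pos = kUpper.find(c);
    return pos == std::string_view::npos ? c : kLower[pos];
}

static_assert(IsBasicUpper('Q'));
static_assert(!IsBasicUpper('{'));
static_assert(ToBasicLower('Q') == 'q');
static_assert(ToBasicLower('!') == '!');
```

The linear search is slower than a range comparison, which should not matter much for strings evaluated at compile time.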

score 2 · Answer 2 · 2024-05-22 12:34:33Z

Unclear identifier:

size_t ret = 0;

What's ret? Return? As GetSnakeCaseSize() returns "the size of a string_view once converted to snake_case", why not name it size or count?

Simplify:

if (i > 0 && str[i] >= 'A' && str[i] <= 'Z') {
    ret += 2;
}
else {
    ret += 1;
}

to:

ret += (i > 0 && std::isupper(static_cast<unsigned char> (str[i]))) + 1;

Or:

ret += (i > 0 && std::isupper(static_cast<unsigned char> (str[i]))) ? 2 : 1;
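
Since std::isupper is not constexpr, the one-liners above cannot be used as written in a function that must run at compile time. A minimal sketch of how the renaming and the conditional form might look in a constexpr context follows. IsAsciiUpper is a hypothetical stand-in for std::isupper that assumes an ASCII-compatible encoding, and needs_underscore is just an illustrative name.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical constexpr stand-in for std::isupper.
// Assumption: ASCII-compatible encoding, where 'A'..'Z' is contiguous.
constexpr bool IsAsciiUpper(char c) {
    return c >= 'A' && c <= 'Z';
}

// Size of str once converted to snake_case: each upper-case letter after
// the first character contributes an extra '_'.
constexpr std::size_t GetSnakeCaseSize(std::string_view str) {
    std::size_t size = 0;
    for (std::size_t i = 0; i < str.length(); ++i) {
        const bool needs_underscore = i > 0 && IsAsciiUpper(str[i]);
        size += needs_underscore ? 2 : 1;
    }
    return size;
}

static_assert(GetSnakeCaseSize("MyWorld") == 8);
```

Naming the condition keeps the conditional expression short and avoids doing arithmetic on a bool.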

Answered May 22, 2024 at 12:34 by Madagascar; edited May 22, 2024 at 21:20.

Comments:

adepierre (May 22, 2024 at 16:57): isupper doesn't seem to be constexpr from what I can see here: en.cppreference.com/w/cpp/string/byte/isupper

anon (May 22, 2024 at 20:43): "Simplify": I don't think the one-liner is simpler. I'm not a fan of bool-to-int conversions in arithmetic, and there's just too much happening on this line.

Madagascar (May 22, 2024 at 21:20): @adepierre You're right, I have removed that part from my answer.

Madagascar (May 22, 2024 at 21:21): @isanae Well, I added another alternative.
