I recently had to convert some literal strings and wondered whether that could be done at compile time using template metaprogramming. I couldn't find many examples online, so I started playing around to see whether I could manage it (I'm not a template expert, so it seemed like a good opportunity to learn some things).
I ended up with a two-step process, first storing the converted string into a char array, and then wrapping the array inside a string view. Here is an example applied to a "CamelCase to snake_case" conversion:
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <tuple>
#include <type_traits>
#include <utility>
// Get the size of a string_view once converted to snake_case
constexpr size_t GetSnakeCaseSize(std::string_view str) {
    size_t ret = 0;
    for (size_t i = 0; i < str.length(); ++i) {
        if (i > 0 && str[i] >= 'A' && str[i] <= 'Z') {
            ret += 2;
        }
        else {
            ret += 1;
        }
    }
    return ret;
}
// Get an array of snake_case size from an array of string_view
template<size_t N, std::size_t... I>
constexpr std::array<size_t, N> GetSnakeCaseSize(const std::array<std::string_view, N>& a, std::index_sequence<I...>) {
    return std::array{GetSnakeCaseSize(a[I])...};
}
template<size_t N>
constexpr std::array<size_t, N> GetSnakeCaseSize(const std::array<std::string_view, N>& a) {
    return GetSnakeCaseSize(a, std::make_index_sequence<N>{});
}
// Get a snake_case char array from a string_view
template <size_t N>
constexpr std::array<char, N> ToSnakeCase(std::string_view str) {
    // We can't static_assert based on str, so throw instead
    if (GetSnakeCaseSize(str) != N) {
        throw std::invalid_argument("ToSnakeCase called with wrong output size");
    }
    std::array<char, N> output{};
    size_t index = 0;
    for (size_t i = 0; i < str.length(); ++i) {
        if (str[i] >= 'A' && str[i] <= 'Z') {
            if (i > 0) {
                output[index++] = '_';
            }
            output[index++] = str[i] + 'a' - 'A';
        }
        else {
            output[index++] = str[i];
        }
    }
    return output;
}
// Convert an array of string_view to a tuple of snake_case char arrays
template <size_t N, const std::array<size_t, N>& lengths, std::size_t... Is>
constexpr auto ArrayToSnakeCaseTuple(const std::array<std::string_view, N>& str, std::index_sequence<Is...>) {
    return std::make_tuple(ToSnakeCase<lengths[Is]>(str[Is])...);
}
template <size_t N, const std::array<size_t, N>& lengths>
constexpr auto ArrayToSnakeCaseTuple(const std::array<std::string_view, N>& str) {
    return ArrayToSnakeCaseTuple<N, lengths>(str, std::make_index_sequence<N>{});
}
// Get a string_view from char array
template <size_t N>
constexpr std::string_view ToStringView(const std::array<char, N>& a) {
    return std::string_view(a.data(), N);
}
// Create an array of string_view from a tuple of char arrays
template <typename Tuple, std::size_t... I>
constexpr std::array<std::string_view, std::tuple_size_v<Tuple>> TupleOfArraysToArrayOfStr(const Tuple& t, std::index_sequence<I...>) {
    return { ToStringView(std::get<I>(t)) ... };
}
template <typename Tuple>
constexpr auto TupleOfArraysToArrayOfStr(const Tuple& t) {
    return TupleOfArraysToArrayOfStr(t, std::make_index_sequence<std::tuple_size_v<Tuple>>{});
}
And usage example:
static constexpr std::array<std::string_view, 3> camel_case_strings = { "Hello", "MyWorld", "!!" };
static constexpr auto converted_lengths = GetSnakeCaseSize(camel_case_strings);
static constexpr auto arrays = ArrayToSnakeCaseTuple<std::size(camel_case_strings), converted_lengths>(camel_case_strings);
static constexpr auto snake_case_strings = TupleOfArraysToArrayOfStr(arrays);
static_assert(std::is_same_v<
    const std::array<std::string_view, std::size(camel_case_strings)>,
    decltype(snake_case_strings)>
);
static_assert(snake_case_strings[0] == "hello");
static_assert(snake_case_strings[1] == "my_world");
static_assert(snake_case_strings[2] == "!!");
Is there a way to improve or simplify it? I don't think we can get around the necessity of storing the snake_case arrays somewhere, as that's where the strings actually live (string_view being only a convenient wrapper around that memory, if I understand correctly).
As for the lengths of the intermediate output arrays, I wish I could hide them and not have to store them permanently, but I couldn't find a way to make that work properly.
Also, I'm mainly interested in C++17 as that's what I'm most familiar with, but I'd also be curious if there is an easier way to do it with more modern versions of C++.
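One possible direction for hiding the intermediate lengths, sketched here for C++17: carry each source string as a static member of a small tag type, so that the converted size can be computed inside a class template rather than stored in a separate variable. The names `IsUpperAscii`, `SnakeCaseSize`, `SnakeCased`, `snake_case_views`, `Hello` and `MyWorld` are hypothetical and not part of the code above, the exception check is left out because the size is computed in the same expression, and the sketch assumes an ASCII-compatible execution character set.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Assumption: ASCII-compatible encoding, basic Latin letters only.
constexpr bool IsUpperAscii(char c) { return c >= 'A' && c <= 'Z'; }

// Size after conversion: one extra '_' per upper-case letter past index 0.
constexpr std::size_t SnakeCaseSize(std::string_view str) {
    std::size_t size = str.size();
    for (std::size_t i = 1; i < str.size(); ++i) {
        if (IsUpperAscii(str[i])) ++size;
    }
    return size;
}

// Same conversion loop as in the question, without the size check.
template <std::size_t N>
constexpr std::array<char, N> ToSnakeCase(std::string_view str) {
    std::array<char, N> output{};
    std::size_t index = 0;
    for (std::size_t i = 0; i < str.size(); ++i) {
        if (IsUpperAscii(str[i])) {
            if (i > 0) output[index++] = '_';
            output[index++] = static_cast<char>(str[i] - 'A' + 'a');
        } else {
            output[index++] = str[i];
        }
    }
    return output;
}

// Source must expose `static constexpr std::string_view value`.
// The char array lives in `storage`; its length never leaves this template.
template <typename Source>
struct SnakeCased {
    static constexpr auto storage =
        ToSnakeCase<SnakeCaseSize(Source::value)>(Source::value);
    static constexpr std::string_view view{storage.data(), storage.size()};
};

// Collect the converted views for a list of tag types.
template <typename... Sources>
inline constexpr std::array<std::string_view, sizeof...(Sources)>
    snake_case_views{SnakeCased<Sources>::view...};

// Hypothetical tag types holding the input strings.
struct Hello   { static constexpr std::string_view value = "Hello"; };
struct MyWorld { static constexpr std::string_view value = "MyWorld"; };

static_assert(SnakeCased<MyWorld>::view == "my_world");
static_assert(snake_case_views<Hello, MyWorld>[0] == "hello");
```

The trade-off is one small struct per input string instead of a single array of literals, so whether this is actually simpler probably depends on how many strings there are.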
2 Answers
This test can catch more than just upper-case characters, and it can also miss some:
i > 0 && str[i] >= 'A' && str[i] <= 'Z'
In an ISO 8859-1 (Latin-1) locale, this fails to catch these upper-case letters: À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý.
And with EBCDIC encodings, it includes these non-letters: { } \ and other characters (varying with the exact codepage, but typically including accented lower-case letters in the Latin ones).
Similarly, this is not a safe substitute for std::tolower():
str[i] + 'a' - 'A'
Comments:
Yeah but std::tolower is not constexpr so we can't use it here. – adepierre, May 23, 2024 at 16:21
No, it's not - you'll need to implement a correct constexpr function yourself for these operations. – Toby Speight, May 23, 2024 at 16:48
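As a rough illustration of what such hand-written constexpr helpers could look like, here is a minimal sketch that avoids arithmetic on character codes by looking letters up in two alphabet strings, so it does not depend on the letters being contiguous in the execution character set. The names `kUpper`, `kLower`, `IsBasicUpper` and `ToBasicLower` are hypothetical, and the sketch deliberately treats only the 26 basic Latin letters as letters; accented characters pass through unchanged, which may or may not be what a given application wants.

```cpp
#include <cstddef>
#include <string_view>

// The two alphabets are index-aligned: kLower[i] is the lower-case
// form of kUpper[i].
inline constexpr std::string_view kUpper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
inline constexpr std::string_view kLower = "abcdefghijklmnopqrstuvwxyz";

// True only for the 26 basic upper-case letters, whatever the encoding.
constexpr bool IsBasicUpper(char c) {
    return kUpper.find(c) != std::string_view::npos;
}

// Lower-case form of a basic upper-case letter; any other character
// is returned unchanged.
constexpr char ToBasicLower(char c) {
    const std::size_t pos = kUpper.find(c);
    return pos == std::string_view::npos ? c : kLower[pos];
}

static_assert(IsBasicUpper('Q'));
static_assert(!IsBasicUpper('{'));
static_assert(ToBasicLower('Q') == 'q');
static_assert(ToBasicLower('!') == '!');
```

The linear search is slower than a range comparison, but since everything here runs at compile time on short strings, that is unlikely to matter.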
Unclear identifier:
size_t ret = 0;
What's ret? Return? As GetSnakeCaseSize() returns "the size of a string_view once converted to snake_case", why not name this size or count?
Simplify:
if (i > 0 && str[i] >= 'A' && str[i] <= 'Z') {
    ret += 2;
}
else {
    ret += 1;
}
to:
ret += (i > 0 && std::isupper(static_cast<unsigned char> (str[i]))) + 1;
Or:
ret += (i > 0 && std::isupper(static_cast<unsigned char> (str[i]))) ? 2 : 1;
Comments:
isupper doesn't seem to be constexpr from what I can see here: en.cppreference.com/w/cpp/string/byte/isupper – adepierre, May 22, 2024 at 16:57
"Simplify": I don't think the one-liner is simpler. I'm not a fan of bool to int conversions in arithmetic, and there's just too much happening on this line. – anon, May 22, 2024 at 20:43
@adepierre You're right, I have removed that part from my answer. – Madagascar, May 22, 2024 at 21:20
@isanae Well, I added another alternative. – Madagascar, May 22, 2024 at 21:21
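Taking the points from this thread together, one way the size computation might be written so that it stays usable in constant expressions is sketched below. It keeps the range comparison from the question (so it assumes an ASCII-compatible encoding), uses the more descriptive name `size`, and avoids both std::isupper and bool-to-int arithmetic. `IsUpperAscii` and `SnakeCaseSize` are hypothetical names chosen for this sketch.

```cpp
#include <cstddef>
#include <string_view>

// Assumption: ASCII-compatible encoding, where 'A'..'Z' are contiguous.
constexpr bool IsUpperAscii(char c) { return c >= 'A' && c <= 'Z'; }

// Every input character yields one output character; each upper-case
// letter after the first position additionally yields a '_'.
constexpr std::size_t SnakeCaseSize(std::string_view str) {
    std::size_t size = str.size();
    for (std::size_t i = 1; i < str.size(); ++i) {
        if (IsUpperAscii(str[i])) {
            ++size;
        }
    }
    return size;
}

static_assert(SnakeCaseSize("Hello") == 5);    // "hello"
static_assert(SnakeCaseSize("MyWorld") == 8);  // "my_world"
static_assert(SnakeCaseSize("!!") == 2);       // "!!"
```

Starting the loop at index 1 removes the `i > 0` test from the condition, which is most of what made the original branch look busy.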