Extension
Implicit string views
I want to demonstrate how to solve a problem that comes up all the time when trying to write string algorithms... and also came up here, but I didn’t notice it in the original review.
Here is the problem: You want to write a function that can work with strings—or more specifically, views of strings, because you only want to inspect the string data, not take it or change it. For plain old char strings, this is trivial:
auto f(std::string_view sv) { ... }
This will automatically work for anything string-like... provided the character type is char:
f("NUL terminated char arrays work");
f("std::strings work"s);
f("std::string_views work (obv)"sv);
f(my_custom_string_type{"works so long as it is implicitly convertible to std::string_view (which anything 'string-like' should be)"});
Great!
Okay, but now you want to support all character types, not just char. So you naturally do this:
template<typename Char>
auto f(std::basic_string_view<Char> sv) { ... }
Only now... everything breaks. Not only do non-char strings (still) not work (except for specializations of basic_string_view)... even the char strings that worked before no longer work:
f("nope");
f("nope"s);
f("well, obviously this works"sv);
f(my_custom_string_type{"nope"});
Your compiler will probably tell you that it could not find a matching function. The only candidate requires a std::basic_string_view<Char> (and then it would deduce Char), and even though everything you threw at it will implicitly convert to a std::basic_string_view<Char>, they are not, in fact, actually std::basic_string_view<Char>.
So how to solve this problem?
As always, the solution is just another level of indirection:
template<typename Char>
auto f(std::basic_string_view<Char> sv) { ... }
template<typename S>
auto f(S&& s) { return f(std::basic_string_view<???>{std::forward<S>(s)}); }
Putting aside the question of what the ??? should be for a moment, this works. Anything string-like, with any character type, will either be converted to a string view... or will already be a string view... so there will be no problem finding a match.
But this is hacky, to say the least. So let’s tighten things up a bit.
The first issue is what the ??? should be. How do you get the character type from... literally anything?
Well... you don’t. You just let std::basic_string_view do the work. There are three possible cases here:
- The type is something that std::basic_string_view<Char> can be constructed from directly... which is specifically just Char const*. In that case, Char is obvious.
- The type is something that std::basic_string_view<Char> has a converting constructor for... specifically as a contiguous character range. In that case, std::basic_string_view<Char> has a deduction guide that deduces Char.
- The type is something that has a conversion operator to std::basic_string_view<Char>... that is, anything string-like that isn’t already a contiguous range. In that case, the type itself will tell you what Char is via its conversion operator.
In any case, we can let class template argument deduction (CTAD) do the work:
template<typename Char>
auto f(std::basic_string_view<Char> sv) { ... }
template<typename S>
auto f(S&& s) { return f(std::basic_string_view{std::forward<S>(s)}); }
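Now all of the earlier examples compile again, and each call ends up at the string-view overload with the right character type deduced. A quick sanity check (this assumes the standard literal suffixes are in scope; the strings themselves are just illustrative):
f("NUL terminated char arrays work again");
f(u"char16_t strings work too"s);
f(U"as do char32_t string_views"sv);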
But this is still not so great, because the second overload claims it takes literally everything. We clearly need to constrain it a bit.
What we need is a "string-like" concept:
template<typename T>
concept string_like = ???;
template<typename Char>
auto f(std::basic_string_view<Char> sv) { ... }
template<string_like S>
auto f(S&& s) { return f(std::basic_string_view{std::forward<S>(s)}); }
Okay, but how do we define "string-like"?
Well, anything that is implicitly convertible to a basic_string_view must be string-like. So:
template<typename T>
concept string_like = std::convertible_to<T, std::basic_string_view<???>>;
template<typename Char>
auto f(std::basic_string_view<Char> sv) { ... }
template<string_like S>
auto f(S&& s) { return f(std::basic_string_view{std::forward<S>(s)}); }
But again we have the problem... what do we put in the ???
This is a problem we already solved! We don’t put anything there. We just let CTAD do its thing:
template<typename T>
concept string_like = std::convertible_to<T, decltype(std::basic_string_view{std::declval<T>()})>;
template<typename Char>
auto f(std::basic_string_view<Char> sv) { ... }
template<string_like S>
auto f(S&& s) { return f(std::basic_string_view{std::forward<S>(s)}); }
The concept works in three steps:
- Use CTAD to deduce a string view type for the given type.
- Use decltype to get the type that was deduced.
- Make sure T is implicitly convertible to that kind of string view.
(We ultimately do an explicit conversion in the function itself, but we require that the type is implicitly convertible. That is on purpose. Anything that is only explicitly convertible to a string view may not be all that string-like. But anything that implicitly converts to a string view must be string-like.)
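As a quick illustration of what the concept accepts and rejects (this assumes C++23, where basic_string_view has a range deduction guide; the particular types here are just my own examples):
static_assert(    string_like<char const*>);
static_assert(    string_like<std::string>);
static_assert(    string_like<std::u16string_view>);
static_assert(not string_like<int>);               // CTAD fails, so not string-like
static_assert(not string_like<std::vector<char>>); // only *explicitly* convertible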
string_like or something like it is definitely something that should be in your toolkit—or at least you should keep the technique in mind for when you need it.
In the specific case here, it would look like this:
template<typename T>
concept string_like = std::convertible_to<T, decltype(std::basic_string_view{std::declval<T>()})>;
template<typename Char>
constexpr auto are_isomorphic(std::basic_string_view<Char> s1, std::basic_string_view<Char> s2) { ... }
template<string_like S1, string_like S2>
constexpr auto are_isomorphic(S1&& s1, S2&& s2)
{
return are_isomorphic(std::basic_string_view{std::forward<S1>(s1)}, std::basic_string_view{std::forward<S2>(s2)});
}
Because there are multiple strings involved here, you may also want to confirm that they both have the same character type.
For that, you would need a type trait, like so:
template<string_like S>
using string_char_t = typename decltype(std::basic_string_view{std::declval<S>()})::value_type;
Which you would use to constrain the generic template:
template<string_like S1, string_like S2>
requires std::same_as<string_char_t<S1>, string_char_t<S2>>
constexpr auto are_isomorphic(S1&& s1, S2&& s2)
{
return are_isomorphic(std::basic_string_view{std::forward<S1>(s1)}, std::basic_string_view{std::forward<S2>(s2)});
}
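With that constraint in place, mixing character types is rejected at compile time rather than failing somewhere deep inside the implementation. A tiny sketch (assuming the sv suffix is in scope):
are_isomorphic("egg"sv, "add"sv);     // OK: both views use char
// are_isomorphic("egg"sv, u"add"sv); // error: string_char_t<S1> (char) is not
//                                    // the same as string_char_t<S2> (char16_t)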
And now you’re off to the races.
Avoiding linear search
As suggested by @PeterCordes in the comments, if the character size is small enough, you could avoid linear searches for matches by using indexing instead. There is one little wrinkle, which is why this was worth extending the review for, but it’s otherwise pretty straightforward. The following is not at all hand-optimized, but any decent optimizing compiler should have an easy time with it.
// Needs <algorithm>, <array>, <bitset>, <limits>, <ranges>, <string_view>,
// and <type_traits>.
template<typename Char>
// Specialize if the bit size of the type is 8 (or less, I suppose).
//
// This is perfectly safe because any type that is legit for use as a
// character in a basic_string_view *must* have a numeric_limits
// specialization.
requires ((std::numeric_limits<Char>::digits + std::numeric_limits<Char>::is_signed) <= 8)
constexpr auto are_isomorphic(
    std::basic_string_view<Char> s,
    std::basic_string_view<Char> z) -> bool
{
    // Get the easy check out of the way first.
    if (s.length() != z.length())
        return false;

    // One slot per possible character value (note the + 1: max() is the
    // largest value, not the number of values).
    constexpr auto size = std::size_t{std::numeric_limits<std::make_unsigned_t<Char>>::max()} + 1;

    auto mapping = std::array<Char, size>{};
    auto seen = std::bitset<size>{};

    return std::ranges::all_of(std::views::zip(s, z),
        [&mapping, &seen](auto&& t)
        {
            auto const [cs, cz] = t;
            auto const i = static_cast<std::make_unsigned_t<Char>>(cs);

            // If we've already seen the character, then it must be mapped to
            // its partner; if it isn't, we're done. (If it is, the pair was
            // fully checked when it was first recorded, so there is nothing
            // more to do.)
            if (seen.test(i))
                return mapping[i] == cz;

            // We have not seen this character before, so before recording
            // the new mapping, we have to make sure no *other* character is
            // already mapped to cz.
            //
            // This is the only tricky part!
            //
            // If the target character is "zero", then we can't just search
            // for it in the mapping... because "zero" will be the value set
            // for every character we have not yet seen.
            //
            // When the target character is *not* zero, we just do the easy
            // thing: any hit in the mapping must be a real, seen mapping,
            // because we zero-filled the mapping at the beginning. (If we
            // had not, it would have garbage data in it, and we would also
            // have to check the seen bit.)
            //
            // When the target character *is* zero, we need to loop through
            // all seen mappings, and check them. Unfortunate, but without a
            // sentinel value, it's probably the best we can do. And we can
            // note that "zero" values are vanishingly rare in strings
            // (though undeniably not so in generic data).
            if (cz != Char{})
            {
                if (std::ranges::find(mapping, cz) != std::ranges::end(mapping))
                    return false;
            }
            else
            {
                // So cz is "zero". That means we can't just search for it in
                // the mapping, because the mapping holds "zero" for every
                // unseen pair.
                //
                // So instead we loop through the seen bitset, and for every
                // mapping that has been seen, we check to see if the mapped
                // value is cz. If it is, then the strings can't be
                // isomorphic.
                for (auto n = 0uz; n != seen.size(); ++n)
                {
                    if (seen.test(n) and mapping[n] == cz)
                        return false;
                }
            }

            // Record the new mapping.
            mapping[i] = cz;
            seen.set(i);
            return true;
        }
    );
}
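And a few quick sanity checks, which should even work at compile time given the C++23 features used above (the example strings are just mine for illustration):
using namespace std::string_view_literals;

static_assert(    are_isomorphic("egg"sv,   "add"sv));
static_assert(    are_isomorphic("paper"sv, "title"sv));
static_assert(not are_isomorphic("foo"sv,   "bar"sv));
static_assert(not are_isomorphic("badc"sv,  "baba"sv));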
Can this be parallelized or, dare we hope, vectorized?
Maybe!
What you could do is split long strings up into chunks, and for each chunk, generate the array<Char, size> mapping and bitset<size> seen table, then merge them all.
Merging the seen tables is trivial; it’s just merged = seen1 | seen2.
Merging the mapping is a bit trickier. What you want to do is, when the mapped element does not match, check that the seen flag is only set in one of the two seen tables—that is, for every position n where mapping1[n] != mapping2[n], seen1.test(n) ^ seen2.test(n) must be true. When the element does match... no biggie; either it was the same mapped pair, or both were never seen (and thus both still "zero"), or one was seen and it happened to be "zero" (which happens to match the unseen case)... all three cases are fine. I’m not really a SIMD guy, but the operations I just described—producing a mask where the mappings don’t match, and then getting their indices and doing the XOR tests with those indices—sure seem doable.
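To make the merge rule concrete, here is a minimal, purely scalar sketch of that step (merge_chunk_results and its parameters are names I am inventing here; this is just the logic, not a vectorized version):
template<typename Char, std::size_t N>
constexpr auto merge_chunk_results(
    std::array<Char, N>& mapping1, std::bitset<N>& seen1,
    std::array<Char, N> const& mapping2, std::bitset<N> const& seen2) -> bool
{
    for (auto n = 0uz; n != N; ++n)
    {
        if (mapping1[n] != mapping2[n])
        {
            // A mismatch is only acceptable if exactly one of the chunks
            // actually saw this source character; the other entry is just
            // the zero fill.
            if (seen1.test(n) and seen2.test(n))
                return false;
            // Keep whichever mapping was actually seen.
            if (seen2.test(n))
                mapping1[n] = mapping2[n];
        }
    }
    // Merging the seen tables is just the bitwise OR described above.
    seen1 |= seen2;
    return true;
}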
But all through this section we have not asked the Goldblum question. Sure, this can be done... but...
...
... should it?
In my opinion? No.
Let’s be real. This function, while useful and possibly used in some utility or another, is hardly a massively important, linchpin operation. It probably won’t be called thousands or millions of times a second. And it probably won’t be called on strings that are tens of thousands or millions of characters long.
Even in the situation where everything is stacked in such a way as to give the best possible advantage to the indexed algorithm over the linear search, you’re literally just talking about avoiding the cost of iterating two bytes at a time over 512 bytes (so 256 iterations) that are always right there, hot in the closest cache. And that is the absolute worst case. In the average case, you’d only be iterating two bytes at a time over 256 bytes (128 iterations), and in the expected realistic case, your strings will usually not be 100+ characters long... so the actual amount of iterating will be much, much smaller. I’m actually skeptical that any cost difference would even be measurable above noise. (And remember that even the indexed solution might need to loop too! Granted, it is far less likely, because NUL does not usually appear in strings, but there is actually a theoretically possible worst case with string data that is just a million NUL characters long. The indexed solution would have to do 256 loop iterations for every character, while the linear search would only ever do one.)
Yes, sure, an algorithm should be as efficient as reasonably possible... but the key word there is "reasonably".
I get the allure of being nerd sniped by a deep, low-level optimization problem that could even get you into managing individual registers and flags and farting around with vector operations. But I am not a scientist or mathematician; I am an engineer. And as an engineer, I can’t allow myself to be blinded by the shininess of the problem, because there are important things that are being forgotten by the kinds of people who want to drop everything and dive into problems like this.
Probably the biggest one is this: nothing is free. Every tiny little bit of extra complication or complexity you add to a problem brings with it an order of magnitude increase in the cost of making your solution real. And by real I mean: actually coded... AND FULLY TESTED... so that it can actually be used in real-world systems.
If you write a single solution to the problem—either like the original code, or any of the versions offered in the reviews—then you need a battery of tests to verify it. So long as you have only a single solution, that same battery of tests should suffice... meaning that you can freely refactor and experiment with tweaks to try and improve the efficiency, without increasing the maintenance burden. You would have working code, that is verifiably tested, and can be improved.
But the moment you add a second solution—one that special-cases for some factor—you not only double the maintenance burden—because now you need a second battery of tests to exercise the new solution—you actually more than double it—because you also need tests to confirm that the second solution is being selected when you think it is, and is not being selected when you don’t want it to be.
And that is the thing that too many people fail to consider: any additional complexity you add to the solution... even if it may increase the efficiency of the actual code... always requires more testing... and that testing very often costs far, far more than any tiny gains you get from twiddling bits in individual registers, however satisfying that may be. I am perpetually horrified by how few people think of testing at all, or at best give it only a cursory thought. As far as I am concerned, any code that is not properly tested is garbage. I don’t care how sexy or efficient it is; if it has not been verified by a battery of tests, it is absolutely useless. I cannot put it in any project I am responsible for. Only a truly incompetent engineer ships untested code. (And yes, I am straight up calling out the 90+% of coders out there who never test their code... many of whom don’t even know how!)
Of course, if you don’t care about producing code for real-world systems... if you’re actually just in it for the intellectual thrill... then by all means, twiddle bits as much as you please. But if you care about being a real, professional programmer, who produces code that gets used in real-world systems, then you might need to tamp down that nerd lust to fart around with a problem until you’ve squeezed every interesting bit out of it, and instead of sexiness or shininess or coolness, make robustness and proven reliability the thing you focus on.
And yes yes yes of course, there are cases where it is worthwhile, and maybe even necessary, to squeeze every last bit of performance out of some bit of code, and where special-casing is not only useful, but possibly even required to get that performance in the key use cases. That is not generally the case, but yes, there are some functions that are so important, and called so often, that they should be hyper-optimized. But, again, if you want to make real-world code, you need to be able to identify when that is the case... and when it is not.
This case? Sure seems like it’s a "not".
My advice is that your time as a programmer is far better spent on writing (and learning how to write) proper tests for functions like this, than on dicking around with hyper-optimizing it by special-casing. That skill—properly testing and validating code—is worth a billion times more than whatever you might gain counting bytes and cycles. You could literally spend years hyper-fixating on making this function as fast as possible in every possible case... all time which will otherwise be effectively wasted. Meanwhile, every minute you spend doing testing and other validation of whatever code you have now will pay off in piles, because not only will the skills themselves be immediately valuable and transferable... you will also have made this function usable in real-world projects that require verified code (even if you never actually use it).
Don’t get nerd sniped by esoteric crap. Focus on reliability and testing. It is and always will be far more important, and useful, to make code that works... provably works... than to make code that is a few nanoseconds faster.
namespace io::github::coderodde::string::utils {
I know this is ported from a Java solution, but in C++ we don’t really use the "reversed domain name" notation for namespacing. Because of that, things that would probably not be a problem in Java become problematic. In particular, the top-level identifier io is extremely likely to be used for something else. (Also, what would you do if you had a .int domain?)
If you really must use the reversed domain name for your namespace, the better way to do it in C++ would be:
namespace io_github_coderodde::string::utils {
But frankly, there is pretty much zero chance of just coderodde colliding with anything else, so I’d just do namespace coderodde::string::utils. (I’d probably also ditch the utils sub-namespace, because what else would be in a string namespace other than string utilities? String algorithms and/or string concepts?... sure, but are you really going to need separate namespaces for those?)
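In other words, something like:
namespace coderodde::string {
    // ... string utilities, algorithms, concepts ...
}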
template <typename Char>
constexpr bool are_isomorphic(const std::basic_string_view<Char>& s,
const std::basic_string_view<Char>& z) {
Really not a fan of indenting at the namespace level, not least because it will change depending on how you write the namespace:
namespace io::github::coderodde::string::utils {
// indent one level, but...
namespace io {
namespace github {
namespace coderodde {
namespace string {
namespace utils {
// indent five levels? for the exact same thing?
You may argue that it would be silly to break up a namespace preamble as above, and for that extreme case it would be, but in reality it often happens organically. Very often you are in a namespace and have to open and close a sub-namespace—like a detail namespace—or close the current namespace to refer to something higher up—like what you would have to do to add std::hash support to std::chrono::duration.
It also just wastes at least one indentation level’s worth of horizontal space, for no good reason, because all code should be in a namespace.
Anywho, as I’m sure you’re aware, basic_string_view has two template parameters: the character and the character traits. If you want to ignore the character traits, that’s fine—I wouldn’t blame you, they’re a bad idea that should be obsoleted—but you should at least document that choice.
And finally, never take views by reference, const or otherwise. That is not what views are about. Views are defined to be lightweight and copyable references to other data. Re-referencing them is redundant.
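In other words, take the views by value, as the later examples in this review do:
template <typename Char>
constexpr auto are_isomorphic(std::basic_string_view<Char> s,
                              std::basic_string_view<Char> z) -> bool;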
std::unordered_map<Char, Char> mapping1;
std::unordered_map<Char, Char> mapping2;
This strikes me as a pretty good case study of something I observe in a lot of computer science-y types: for them, the big-O asymptotic behaviour of an algorithm is everything, to the point where it completely blinds the coder to some otherwise pretty obvious realities. I mean, "obviously" a hash map will be faster than a non-hashed map, because the former has \$O(1)\$ lookup while the latter has \$O(\log n)\$, right? Right?
No. Not right.
Think about it. What do you think the hash of a character—any character, of any character type that would be legal to use in a basic_string_view—would be?
Would it not be... just the character?
That means the hash is "perfect" and free, but it also means that you are paying for all the hashing machinery and bookkeeping for no reason at all. Without even thinking about the structure of a non-hashed map versus a hash map, it should be painfully obvious that if a hash is just "the thing", then a plain map of "the thing" will be more efficient than something that has to have all the extra hashing machinery.
So even without leaving the theoretical domain, it should be pretty obvious that a map without hashing will be more efficient than a map with hashing, when the hash function is the identity operation.
But now we need to leave the theoretical realm and consider the reality of modern computer hardware, the expected sizes of the data you will be working with in this function, and what you are actually doing with that data.
There is a very famous experiment that Bjarne Stroustrup did, and mentioned in a talk years ago, where he measured the performance of randomly inserting data into a sorted std::list versus a sorted std::vector, and then randomly removing elements from them. (Note: even though both containers were sorted, I believe Stroustrup still used linear searches, rather than binary searches, just to be fair to std::list.) Now, you’d probably think—as Stroustrup did—that while std::vector might dominate with small data sets, surely std::list with its \$O(1)\$ insertion and removal will outperform std::vector... which requires shifting all subsequent elements anytime anything is inserted or removed in the middle. But it turns out that std::vector wins, even up to data sets with hundreds of thousands of elements.
Why? Because of modern computer architecture. Modern computers basically always have caches... often multiple levels of caches, and often caches that are huge, measured in several megabytes. They are also often specially tuned to access data sequentially, using prefetch to get data that it can expect will be accessed next given that you have accessed the previous data in order. Put those facts together, and even though node-based accesses technically have "better" big-O performance, the hidden constants that come from cache miss costs are so huge that simple linear access wins out for virtually all real-world data set sizes.
And let’s think about what you’re actually doing here... you are doing two searches... through two totally different node-based containers. Even if each of those were faster than a simple linear search through a vector—which, again, Stroustrup’s experiment showed it is not, for data set sizes up into the hundreds of thousands—you’re doing two of them. (Actually, in some cases you’re doing up to six of them. More on that shortly.) And given that you can do a single pass through the mapping... yeah, using node-based containers really doesn’t make a lot of sense.
If you doubt that, then by all means, profile. But I would bet that for any size string you would ever expect to see, a simple, flat std::vector mapping will be much faster.
The algorithm becomes simpler, too! And you can reserve if you want to, to pay only once for an allocation, rather than for every character (twice!!!).
template <typename Char>
constexpr auto are_isomorphic(
std::basic_string_view<Char> s,
std::basic_string_view<Char> z) -> bool
{
if (s.length() != z.length())
return false;
auto mapping = std::vector<std::pair<Char, Char>>{};
// You can reserve here if you want. The tough part is knowing how much
// to reserve. Reserving s.length() will almost certainly be an
// over-estimate. I suppose you could use a heuristic, which you can tune.
for (auto const [cs, cz] : std::views::zip(s, z))
{
auto p = std::ranges::begin(mapping);
auto const end = std::ranges::end(mapping);
// Go through the mapping, searching for *either* cs *or* cz.
// Note that this loop could be parallelized, or even vectorized!
// You could just use a parallel algorithm instead of a raw loop.
//
// You should probably use an algorithm in any case, but I thought
// that would make the algorithm less clear. The *correct* way to
// do this would probably be to use std::ranges::find_if() (and also
// maybe std::ranges::for_each() for the outer loop). How, though?
// I'll say that's an "exercise for the reader". (It's not hard!)
for (; p != end; ++p)
{
// If either cs or cz is found, check the other half of the
// mapping. If it doesn't match, the strings are not isomorphic.
// If it does match, we're done looping through the mapping.
if (std::get<0>(*p) == cs)
{
if (std::get<1>(*p) != cz)
return false;
else
break;
}
if (std::get<1>(*p) == cz)
{
if (std::get<0>(*p) != cs)
return false;
else
break;
}
}
// If we got to the end of the mapping without finding either cs or
// cz, we need to add a new mapping item for them.
if (p == end)
mapping.emplace_back(cs, cz);
}
return true;
}
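For reference, here is one way that "exercise for the reader" might come out, keeping the outer range-for and replacing only the raw inner loop with std::ranges::find_if(). This is just a sketch of that one change, not necessarily the tidiest formulation:
for (auto const [cs, cz] : std::views::zip(s, z))
{
    // Find the first mapping entry involving either cs or cz (if any).
    auto const p = std::ranges::find_if(mapping,
        [cs, cz](auto const& m)
        { return std::get<0>(m) == cs or std::get<1>(m) == cz; });
    if (p == std::ranges::end(mapping))
        mapping.emplace_back(cs, cz);           // brand new pair
    else if (std::get<0>(*p) != cs or std::get<1>(*p) != cz)
        return false;                           // half of an existing pair clashes
}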
If you are expecting really, really long strings, then it might be a good idea to use std::deque rather than std::vector. std::deque is almost std::vector, but where std::vector has to completely reallocate and move all data every time it needs to grow, std::deque does not; it just allocates a new chunk. It’s a little slower, but probably not noticeably so for this use case. Especially when compared to node-based containers.
Back to the code:
if (!mapping1.contains(chs)) {
mapping1[chs] = chz;
}
This is a really, really bad way to go about this. This requires doing up to two runs through the map: first for the .contains(), and possibly a second for the insert. This is what .insert_or_assign() or .try_emplace() were made for. What you probably want to do here is to try-emplace the new data, and if that returns that the data already exists, then do the check:
auto const [it1, inserted1] = mapping1.try_emplace(chs, chz);
Then later you do:
if (mapping1[chs] != chz || mapping2[chz] != chs) {
return false;
}
... which requires running through the mapping again, even though you already had the mapping item you wanted! Either it was found within .contains(), or created in the assignment... but either way, you threw that information away, and now have to search through the map again.
Instead, because you already have the iterator to the mapping item you are interested in—whether it was newly inserted or not—you should use it:
if (std::get<1>(*it1) != chz or std::get<1>(*it2) != chs)
return false;
So, altogether:
auto const [it1, inserted1] = mapping1.try_emplace(chs, chz);
auto const [it2, inserted2] = mapping2.try_emplace(chz, chs);
if (std::get<1>(*it1) != chz or std::get<1>(*it2) != chs)
return false;
Or to avoid an unnecessary romp through the second map when you already know the first one failed the test:
// Basically: if chs was already in the map (not inserted), but its partner
// was not chz, we're done.
if (auto const [p, inserted] = mapping1.try_emplace(chs, chz); not inserted and std::get<1>(*p) != chz)
return false;
// Same for the reverse map.
if (auto const [p, inserted] = mapping2.try_emplace(chz, chs); not inserted and std::get<1>(*p) != chs)
return false;
Either way, we get, at max, a single pass through the two maps (and possibly only a single pass through one of them).
But again, a single pass through a flat bi-mapped vector would probably be much, much faster than two passes through node-based maps (whether unnecessarily hashed or not).
QUICK EDIT: Ah, one thing I forgot I wanted to mention: if you really want to follow best practices, then you should have an overload that takes an allocator, which you can use in your mappings. That gives the user control over memory allocation... which is crucial for me in many of the domains I work in. You can still have an overload that just takes the two string views and uses the default standard allocator, for ergonomic purposes.
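A rough sketch of what such an overload might look like, using the flat-vector mapping from earlier (the exact signature, the allocator rebinding, and the delegation are my own illustration, not something from the original code):
template <typename Char, typename Allocator>
constexpr auto are_isomorphic(
    std::basic_string_view<Char> s,
    std::basic_string_view<Char> z,
    Allocator const& alloc) -> bool
{
    if (s.length() != z.length())
        return false;
    // Rebind the caller's allocator to the mapping's element type.
    using pair_alloc = typename std::allocator_traits<Allocator>
        ::template rebind_alloc<std::pair<Char, Char>>;
    auto mapping = std::vector<std::pair<Char, Char>, pair_alloc>{pair_alloc{alloc}};
    for (auto const [cs, cz] : std::views::zip(s, z))
    {
        auto const p = std::ranges::find_if(mapping,
            [cs, cz](auto const& m)
            { return std::get<0>(m) == cs or std::get<1>(m) == cz; });
        if (p == std::ranges::end(mapping))
            mapping.emplace_back(cs, cz);
        else if (std::get<0>(*p) != cs or std::get<1>(*p) != cz)
            return false;
    }
    return true;
}

// The ergonomic two-view overload then just delegates with the default
// allocator:
template <typename Char>
constexpr auto are_isomorphic(
    std::basic_string_view<Char> s,
    std::basic_string_view<Char> z) -> bool
{
    return are_isomorphic(s, z, std::allocator<std::pair<Char, Char>>{});
}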