I am trying to write some string algorithms which I want to work for any kinds of strings and/or locale. I managed to get some results that do work, but I am not sure that what I am doing is idiomatic or anything. Here is a function:
template<typename String>
auto upper(String str,
const std::ctype<typename String::value_type>& f =
std::use_facet<std::ctype<typename String::value_type>>(std::locale{}))
-> String
{
f.toupper(&str[0], &str[0]+str.size());
return str;
}
I have problems getting around the way C++ works with locales and UTF-8. This algorithm is intended to take a string (well, a range of characters with a standard string interface) and returning a copy of it with all the characters converted to uppercase according to a given locale. If no locale (ctype facet) is given, the last used locale is used.
Is there anyway to improve this code (taking locale
vs facet
) for examples so that it would be the most useful?
-
\$\begingroup\$ I am not sure std::locale{} will give you the C-Locale. It will give you the current locale (which will be C-Locale if not explicitly set (though on windows it may inherit from the system and Linux will get it from environment variables automatically (I think)). So (if I am correct (and there is an if there)) if no locale is supplied it will use the application current locale \$\endgroup\$Loki Astari– Loki Astari2014年01月01日 20:59:06 +00:00Commented Jan 1, 2014 at 20:59
1 Answer 1
I think I'd prefer to have the user pass a locale instead of a facet. In most typical cases, a user will deal only with locales, not with the individual facets that make up a particular locale. It's also relatively easy for a user to create a locale on demand, so the code looks something like:
std::string input{"Now is the time for every good man to come of the aid of his country."};
std::string result = upper(input, std::locale("en-us"));
or:
std::string result = upper(input, std::locale(""));
...though I suppose we can probably expect that the default parameter will be used a lot more often than not, in which case all of this is moot.
Anyway, using a locale as the parameter lets us move the use_facet
call inside the function body and use auto
for it instead of writing out its full type:
auto const& f = std::use_facet<std::ctype<typename String::value_type>>(locale);
Not a huge change, but somewhat simpler nonetheless. Given that you're depending on String
being contiguous and supporting random access, it might be worth considering making use of that a little more explicitly, by replacing &str[0]+str.size()
with &str[str.size()]
.
Although I'm somewhat on the fence about it, I'm also less than excited about the idea of using a trailing return type when it's not actually necessary. Maybe it's just a sign of my age, but I still tend to prefer the return type in its traditional location when possible.
Putting all those together, we'd end up with something on this order:
template<typename String>
String upper(String str, std::locale const &locale = std::locale())
{
auto const& f = std::use_facet<std::ctype<typename String::value_type>>(locale);
f.toupper(&str[0], &str[str.size()]);
return str;
}
I'm not sure anybody could call that a huge improvement, but I do think it's at least a minor one, especially in convenience to the user.
-
\$\begingroup\$ Considering the length of the function, any improvement could be called huge. Thanks for your answer :) \$\endgroup\$Morwenn– Morwenn2014年03月02日 21:56:02 +00:00Commented Mar 2, 2014 at 21:56
-
\$\begingroup\$ Considering the trailing return type, I consider it a mere matter of taste and I actually love it - even though I was reluctant to use it at first. By the way, Loki told me the exact same thing that you did about it a few weeks a ago. \$\endgroup\$Morwenn– Morwenn2014年03月02日 21:58:08 +00:00Commented Mar 2, 2014 at 21:58