My code is:
void utf8_append(UChar32 cp, std::string& str) {
    size_t offset = str.size();
    str.resize(offset + U8_LENGTH(cp));
    auto ptr = reinterpret_cast<uint8_t*>(&str[0]);
    U8_APPEND_UNSAFE(ptr, offset, static_cast<uint32_t>(cp));
}
This works but seems ugly. Maybe I am overlooking a simpler approach?
Relevant documentation: https://unicode-org.github.io/icu/userguide/strings/utf-8.html and https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/utf8_8h.html.
1 Answer
Beauty is in the eye of the beholder. I say it is perfectly valid and correct code! The only thing you might get rid of is the static_cast<uint32_t>, as a UChar32, which is an alias for int32_t, will implicitly convert to uint32_t without warnings.
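To make the conversion in question concrete, here is a minimal sketch that uses only standard headers and no ICU. The function names are hypothetical and exist only for illustration. Both functions return the same value; the difference is that stricter warning flags such as -Wsign-conversion in GCC and Clang report the implicit form, while the explicit cast stays silent.

```cpp
#include <cstdint>

// Hypothetical helper: relies on the implicit int32_t -> uint32_t conversion.
// This compiles, but -Wsign-conversion reports the change of signedness.
inline std::uint32_t implicit_conversion(std::int32_t cp) {
    std::uint32_t result = cp;
    return result;
}

// Hypothetical helper: same value, with the conversion spelled out,
// so strict warning settings have nothing to report.
inline std::uint32_t explicit_conversion(std::int32_t cp) {
    return static_cast<std::uint32_t>(cp);
}
```

Whether the cast can be dropped therefore depends on the project's warning settings rather than on the language rules.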
You could also use append() instead of resize(), avoiding the addition, and remove the temporary ptr, to finally get:
void utf8_append(UChar32 cp, std::string& str) {
    auto offset = str.size();
    str.append(U8_LENGTH(cp), {});
    U8_APPEND_UNSAFE(reinterpret_cast<uint8_t *>(&str[0]), offset, cp);
}
If you can use C++17, str.data() is slightly nicer than &str[0] in my opinion. Or you could write &str.front().
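For anyone who wants to try the grow-then-fill pattern without linking ICU, here is a minimal sketch under stated assumptions. utf8_length_sketch and utf8_append_sketch are hypothetical stand-ins for U8_LENGTH and the function above, not ICU API. The sketch also assumes that surrogates and out-of-range values should be rejected, which goes beyond what the original function does; the ICU documentation describes U8_LENGTH as returning 0 for such values, so the _UNSAFE macro would then write past the resized region.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for U8_LENGTH: number of UTF-8 bytes for cp,
// or 0 if cp is a surrogate or lies outside the Unicode range.
inline std::size_t utf8_length_sketch(std::uint32_t cp) {
    if (cp <= 0x7F) return 1;
    if (cp <= 0x7FF) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp <= 0xFFFF) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

// Hypothetical stand-in for utf8_append: grow the string first, then
// write the encoded bytes into the new tail. Returns false and leaves
// str unchanged when cp cannot be encoded.
inline bool utf8_append_sketch(std::int32_t cp, std::string& str) {
    std::uint32_t c = static_cast<std::uint32_t>(cp);
    const std::size_t len = utf8_length_sketch(c);
    if (len == 0) return false;

    const std::size_t offset = str.size();
    str.append(len, '\0');

    // Fill the continuation bytes from the last one backwards,
    // consuming six bits of the code point per byte.
    for (std::size_t i = len - 1; i > 0; --i) {
        str[offset + i] = static_cast<char>(0x80u | (c & 0x3Fu));
        c >>= 6;
    }

    // The remaining high bits go into the lead byte, whose prefix
    // depends on the sequence length.
    static const std::uint32_t lead_prefix[] = {0x00u, 0xC0u, 0xE0u, 0xF0u};
    str[offset] = static_cast<char>(lead_prefix[len - 1] | c);
    return true;
}
```

For example, calling utf8_append_sketch(0x20AC, s) on a string holding "a" appends the three bytes E2 82 AC, and passing 0xD800 returns false without touching the string.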
- "without warnings" Not with this project's warning settings. And no C++17, unfortunately. – Alexey Romanov, Oct 21, 2020 at 18:37
- Ah ok. Well if they are that strict then you're stuck with the static_cast of course. You could consider using std::basic_string<uint8_t> to get rid of both casts, but it will probably open up a can of worms elsewhere in your codebase. – G. Sliepen, Oct 21, 2020 at 18:48