std::codecvt_utf8_utf16
Defined in header <codecvt>
template<
    class Elem,
    unsigned long Maxcode = 0x10ffff,
    std::codecvt_mode Mode = (std::codecvt_mode)0 >
class codecvt_utf8_utf16
    : public std::codecvt<Elem, char, std::mbstate_t>;
(deprecated in C++17)
(removed in C++26)
std::codecvt_utf8_utf16 is a std::codecvt facet which encapsulates conversion between a UTF-8 encoded byte string and a UTF-16 encoded character string. If Elem is a 32-bit type, one UTF-16 code unit will be stored in each 32-bit character of the output sequence.
This is an N:M conversion facet (one to four UTF-8 bytes map to one or two UTF-16 code units), so it cannot be used with std::basic_filebuf, which only permits 1:N conversions, such as UTF-32/UTF-8, between the internal and the external encodings. This facet can be used with std::wstring_convert.
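The remark about 32-bit element types can be illustrated with a minimal sketch. The helper name utf8_to_utf16_in_char32 is hypothetical and introduced only for this illustration; the input is spelled with byte escapes so the result does not depend on the source or execution character set.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper (not part of the library): decodes UTF-8 bytes into
// UTF-16 code units, each stored in its own 32-bit char32_t element.
std::u32string utf8_to_utf16_in_char32(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char32_t>, char32_t> conv;
    std::u32string units = conv.from_bytes(bytes);

    // UTF-16 never needs more code units than UTF-8 needs bytes.
    assert(units.size() <= bytes.size());
    return units;
}
```

Under these assumptions, passing the four bytes "\xF0\x9F\x8D\x8C" (U+1F34C) should yield two elements holding the surrogate pair 0xD83C 0xDF4C rather than a single UTF-32 code point, which is the N:M behavior described above.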
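Template parameters
Elem - the internal character type: char16_t, char32_t, or wchar_t
Maxcode - the largest value of Elem that this facet will read or write without error
Mode - a constant of type std::codecvt_mode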
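As a hedged illustration of the non-default template arguments, the sketch below defines two aliases: one that lowers Maxcode to 0xFFFF and one that sets Mode to std::consume_header. The alias and function names (BmpOnly, SkipBom, bmp_only_accepts, decode_skipping_bom) are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical aliases, introduced only for this sketch.
// Maxcode = 0xFFFF: code points above the Basic Multilingual Plane are rejected.
using BmpOnly = std::codecvt_utf8_utf16<char16_t, 0xFFFF>;
// Mode = consume_header: a leading UTF-8 byte order mark is skipped on input.
using SkipBom = std::codecvt_utf8_utf16<char16_t, 0x10FFFF, std::consume_header>;

// Returns true if `bytes` decodes without exceeding Maxcode.
// std::wstring_convert reports a failed conversion by throwing std::range_error
// when it was constructed without an error string.
bool bmp_only_accepts(const std::string& bytes)
{
    try {
        std::wstring_convert<BmpOnly, char16_t>{}.from_bytes(bytes);
        return true;
    } catch (const std::range_error&) {
        return false;
    }
}

// Decodes `bytes`, dropping an initial byte order mark (EF BB BF) if present.
std::u16string decode_skipping_bom(const std::string& bytes)
{
    std::u16string units = std::wstring_convert<SkipBom, char16_t>{}.from_bytes(bytes);
    assert(units.size() <= bytes.size());
    return units;
}
```

With these definitions, a three-byte sequence such as "\xE6\xB0\xB4" (U+6C34) is expected to be accepted by bmp_only_accepts, while the four-byte sequence for U+1F34C is expected to be rejected.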
Member functions
(constructor) - constructs a new codecvt_utf8_utf16 facet (public member function)
(destructor) - destroys a codecvt_utf8_utf16 facet (public member function)
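std::codecvt_utf8_utf16::codecvt_utf8_utf16
explicit codecvt_utf8_utf16( std::size_t refs = 0 );
Constructs a new std::codecvt_utf8_utf16 facet and passes the initial reference counter refs to the base class.
Parameters
refs - the number of references that link to the facet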
std::codecvt_utf8_utf16::~codecvt_utf8_utf16
Destroys the facet. Unlike the locale-managed facets, this facet's destructor is public.
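Because the destructor is public, the facet can be created as an ordinary automatic object and driven through the public std::codecvt interface without a locale. The following is a minimal sketch under that assumption; encode_with_facet is a hypothetical helper name, not a library function.

```cpp
#include <cassert>
#include <codecvt>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes UTF-16 code units as UTF-8 by calling out()
// on a facet with automatic storage duration.
std::string encode_with_facet(const std::u16string& units)
{
    std::codecvt_utf8_utf16<char16_t> facet;  // destroyed at scope exit; no locale owns it
    std::mbstate_t state{};                   // initial conversion state

    // UTF-8 needs at most 3 bytes per UTF-16 code unit
    // (a surrogate pair, 2 units, encodes to 4 bytes).
    std::string bytes(units.size() * 3, '\0');
    const char16_t* from_next = units.data();
    char* to_next = &bytes[0];

    const std::codecvt_base::result res =
        facet.out(state,
                  units.data(), units.data() + units.size(), from_next,
                  &bytes[0], &bytes[0] + bytes.size(), to_next);
    if (res != std::codecvt_base::ok)
        throw std::range_error("UTF-16 to UTF-8 conversion failed");

    assert(to_next <= &bytes[0] + bytes.size());
    bytes.resize(static_cast<std::size_t>(to_next - &bytes[0]));
    return bytes;
}
```

The same pattern would not compile with a plain std::codecvt specialization, whose destructor is protected.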
Inherited from std::codecvt
Nested types
intern_type
internT
extern_type
externT
state_type
stateT
Data members
Member functions
Protected member functions
do_out [virtual] - converts a string from InternT to ExternT, such as when writing to file (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
do_in [virtual] - converts a string from ExternT to InternT, such as when reading from file (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
do_unshift [virtual] - generates the termination character sequence of ExternT characters for incomplete conversion (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
do_encoding [virtual] - returns the number of ExternT characters necessary to produce one InternT character, if constant (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
do_always_noconv [virtual] - tests if the facet encodes an identity conversion for all valid argument values (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
do_length [virtual] - calculates the length of the ExternT string that would be consumed by conversion into given InternT buffer (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
do_max_length [virtual] - returns the maximum number of ExternT characters that could be converted into a single InternT character (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
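The protected virtual functions are reached through the public non-virtual members of std::codecvt (encoding(), max_length(), always_noconv()). The sketch below queries them for this facet; FacetProperties and query_facet_properties are hypothetical names used only here, and the concrete values returned by encoding() and max_length() depend on the implementation.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>

// Hypothetical aggregate, introduced only for this sketch.
struct FacetProperties {
    int  encoding;       // value returned by encoding()
    int  max_length;     // value returned by max_length()
    bool always_noconv;  // value returned by always_noconv()
};

FacetProperties query_facet_properties()
{
    const std::codecvt_utf8_utf16<char16_t> facet;
    const FacetProperties props{ facet.encoding(),
                                 facet.max_length(),
                                 facet.always_noconv() };

    // At least one external character is needed to produce an internal one.
    assert(props.max_length > 0);
    return props;
}
```

Since UTF-8 is a variable-width encoding, always_noconv() is expected to be false, and common implementations report 0 from encoding().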
Inherited from std::codecvt_base
Enumeration constant | Description |
---|---|
ok | conversion was completed with no error |
partial | not all source characters were converted |
error | encountered an invalid character |
noconv | no conversion required, input and output types are the same |
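The result codes can be observed by calling in() directly. This is a minimal sketch: try_decode and describe are hypothetical helpers, and the exact code returned for malformed input is stated here as the behavior of common implementations rather than as a guarantee.

```cpp
#include <cassert>
#include <codecvt>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: runs in() over `bytes` and reports the result code.
std::codecvt_base::result try_decode(const std::string& bytes)
{
    std::codecvt_utf8_utf16<char16_t> facet;
    std::mbstate_t state{};

    // One spare element keeps the output range non-empty even for empty input.
    std::u16string buffer(bytes.size() + 1, u'\0');
    const char* from_next = bytes.data();
    char16_t* to_next = &buffer[0];

    const std::codecvt_base::result res =
        facet.in(state,
                 bytes.data(), bytes.data() + bytes.size(), from_next,
                 &buffer[0], &buffer[0] + buffer.size(), to_next);
    assert(to_next <= &buffer[0] + buffer.size());
    return res;
}

// Maps a result code to its enumerator name.
const char* describe(std::codecvt_base::result r)
{
    switch (r) {
        case std::codecvt_base::ok:      return "ok";
        case std::codecvt_base::partial: return "partial";
        case std::codecvt_base::error:   return "error";
        case std::codecvt_base::noconv:  return "noconv";
    }
    return "unknown";
}
```

Well-formed input such as "z\xC3\x9F" yields ok. A sequence cut off in the middle, such as "\xF0\x9F", typically yields partial, and a byte that can never appear in UTF-8, such as "\xFF", typically yields error. This facet is not expected to return noconv, because its internal and external character types differ.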
Example
    #include <cassert>
    #include <codecvt>
    #include <cstdint>
    #include <iostream>
    #include <locale>
    #include <string>

    int main()
    {
        std::string u8 = "z\u00df\u6c34\U0001f34c";
        std::u16string u16 = u"z\u00df\u6c34\U0001f34c";

        // UTF-8 to UTF-16/char16_t
        std::u16string u16_conv = std::wstring_convert<
            std::codecvt_utf8_utf16<char16_t>, char16_t>{}.from_bytes(u8);
        assert(u16 == u16_conv);
        std::cout << "UTF-8 to UTF-16 conversion produced "
                  << u16_conv.size() << " code units:\n" << std::showbase << std::hex;
        for (char16_t c : u16_conv)
            std::cout << static_cast<std::uint16_t>(c) << ' ';

        // UTF-16/char16_t to UTF-8
        std::string u8_conv = std::wstring_convert<
            std::codecvt_utf8_utf16<char16_t>, char16_t>{}.to_bytes(u16);
        assert(u8 == u8_conv);
        std::cout << "\nUTF-16 to UTF-8 conversion produced "
                  << std::dec << u8_conv.size() << " bytes:\n" << std::hex;
        for (char c : u8_conv)
            std::cout << +static_cast<unsigned char>(c) << ' ';
        std::cout << '\n';
    }
Output:
    UTF-8 to UTF-16 conversion produced 5 code units:
    0x7a 0xdf 0x6c34 0xd83c 0xdf4c
    UTF-16 to UTF-8 conversion produced 10 bytes:
    0x7a 0xc3 0x9f 0xe6 0xb0 0xb4 0xf0 0x9f 0x8d 0x8c
Defect reports
The following behavior-changing defect reports were applied retroactively to previously published C++ standards.
DR | Applied to | Behavior as published | Correct behavior |
---|---|---|---|
LWG 2229 | C++98 | the constructor and destructor were not specified | specifies them |
See also
Character conversions | locale-defined multibyte (UTF-8, GB18030) | UTF-8 | UTF-16 |
---|---|---|---|
UTF-16 | mbrtoc16 / c16rtomb (with C11's DR488) | codecvt<char16_t,char,mbstate_t>, codecvt_utf8_utf16<char16_t> | N/A |
UCS-2 | c16rtomb (without C11's DR488) | codecvt_utf8<char16_t> | codecvt_utf16<char16_t> |
UTF-32 | mbrtoc32 / c32rtomb | codecvt<char32_t,char,mbstate_t> | codecvt_utf16<char32_t> |
system wchar_t: UTF-32 (non-Windows) | mbsrtowcs / wcsrtombs | codecvt_utf8<wchar_t> | codecvt_utf16<wchar_t> |
(enum)
(class template)
(class template)