I have been working with Unicode in C++ for several days now, and it is still very unclear to me. I have a few questions about its usage and would be happy if they could be answered. The goal is simply to output strings with the proper Unicode characters.

As far as I understand, � is printed when a character is broken, for example when you cast a wchar_t to a char.

My OS: Kubuntu 19.10

g++ --version
g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

1. Why does this work, given that std::string should only be able to store chars and "é" is not a char?

setlocale(LC_ALL, "en_US.utf8");
std::cout << "é" << std::endl;
output: é

2. Printing a wchar_t is very strange. Why do the following snippets produce the output they do?

setlocale(LC_ALL, "en_US.utf8");
wchar_t a = L'é';
std::cout << a << std::endl;
output: 233

setlocale(LC_ALL, "en_US.utf8");
wchar_t a = L'é';
std::wcout << a << std::endl;
output: �

setlocale(LC_ALL, "en_US.utf8");
wchar_t a = L'é';
printf("%lc\n", a);
output: é

setlocale(LC_ALL, "en_US.utf8");
wchar_t a = L'é';
wprintf(L"%lc\n", a);
output: é

PS: The setlocale(LC_ALL, "en_US.utf8") call is there as suggested by this source. Without it, std::wcout prints question marks instead of the proper characters.
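A minimal sketch of what that setlocale call changes: it selects the character encoding the C library uses when turning wide characters into bytes. The C standard specifies that printf's %lc conversion converts its argument as if by wcrtomb, so the helper below performs that step explicitly. to_multibyte is a hypothetical name, and the usage comment assumes the en_US.utf8 locale is installed, as on the asker's Kubuntu system.

```cpp
#include <cassert>
#include <climits>  // MB_LEN_MAX
#include <clocale>  // std::setlocale, LC_ALL (used by callers)
#include <cwchar>   // std::wcrtomb, std::mbstate_t
#include <string>

// Hypothetical helper: converts one wide character to the multibyte encoding
// selected by the current C locale (LC_CTYPE). printf("%lc") performs this
// same conversion internally, which is why setlocale() matters.
// Returns an empty string if the locale's encoding cannot represent wc.
std::string to_multibyte(wchar_t wc) {
    std::mbstate_t state{};
    char buf[MB_LEN_MAX];
    const std::size_t n = std::wcrtomb(buf, wc, &state);
    if (n == static_cast<std::size_t>(-1)) {
        return {};  // EILSEQ: not representable in this locale
    }
    assert(n <= sizeof buf);
    return std::string(buf, n);
}

// Usage (assumes the en_US.utf8 locale is installed):
//   std::setlocale(LC_ALL, "en_US.utf8");
//   to_multibyte(L'é');  // the two UTF-8 bytes C3 A9
```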

asked Dec 9, 2019 at 20:14
  • Extended ASCII? Commented Dec 9, 2019 at 20:17
  • Also note that "é" is not a std::string but a string literal, which is of type const char[N] Commented Dec 9, 2019 at 20:18
  • @NathanOliver-ReinstateMonica I had never heard of extended ASCII, so thanks! Nevertheless, if I understand correctly, this only explains my first question. Commented Dec 9, 2019 at 20:20
  • What system are you on? Commented Dec 9, 2019 at 20:21
  • Q: "C++ - Why isn't the unicode output correct?" A: "Because you used C++ and unicode in the same sentence" 😞 Commented Dec 9, 2019 at 20:26

1 Answer

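  • g++ uses UTF-8 as its default execution charset (you can change it with -fexec-charset=). That means the "é" in your first example is encoded in UTF-8 as the two bytes 0xC3 0xA9: the literal is a const char[3] (two bytes plus the terminating '\0'), so it fits in chars, and the terminal decodes those two bytes back into é.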

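  • 2.a There is no operator<< taking a narrow ostream and a wchar_t. That means the latter is promoted to an integer and displayed as a number, 233, which is the code point of é (U+00E9); wchar_t, like char, is an integral type. (Since C++20 this overload is explicitly deleted, so that line no longer compiles.)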
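To connect the two points: the 233 printed in 2.a is the code point U+00E9, and UTF-8 stores that code point as the two bytes C3 A9, which is what GCC puts in the narrow literal "é". The sketch below encodes a single code point to UTF-8 by hand to make that mapping visible. encode_utf8 is a hypothetical helper, not a standard function, and it skips validation beyond a range check.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a standard function): encodes one Unicode code
// point as UTF-8. A sketch only: it does not reject surrogates (U+D800..U+DFFF).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF);  // largest valid code point
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx + 2 continuation
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + 3 continuation
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Usage: the 233 printed by std::cout << a is U+00E9, and
//   encode_utf8(233) == "\xC3\xA9"
// which matches the bytes GCC stores for the narrow literal "é"
// (assuming a UTF-8 source file and the default -fexec-charset).
```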

The others work as expected, and I don't think they need more explanation. One thing to be aware of, though, is that your environment must be configured correctly. That's why I asked you to pipe the output through | od -t x1: to check that the bytes written were the expected ones. As it stands, the problem is a display issue; if you still have it, you'll need to check the configuration of your terminal emulator.
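The od -t x1 check can also be sketched inside the program: dumping the bytes of a string as hex shows what is actually written, independently of how the terminal renders it. hex_dump is a hypothetical helper for illustration; its output resembles the data column of od -t x1 without the offsets.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats every byte of s as two lowercase hex digits
// separated by spaces, similar to the data column of `od -t x1` (no offsets).
// Printing hex_dump(str) to std::cerr shows the bytes the program produces,
// independently of how the terminal renders them.
std::string hex_dump(const std::string& s) {
    std::string out;
    for (unsigned char byte : s) {
        char buf[3];  // two hex digits + '\0'
        const int written = std::snprintf(buf, sizeof buf, "%02x",
                                          static_cast<unsigned>(byte));
        assert(written == 2);
        (void)written;  // silence the unused warning when NDEBUG is set
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Usage: hex_dump("é") yields "c3 a9" with a UTF-8 execution charset,
// the same pair od -t x1 should show wherever é was written.
```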

answered Dec 9, 2019 at 20:51

3 Comments

const char arr[] = "россиянᔙинΩaऋ"; std::cout << arr << std::endl; Could you explain why the code above works and prints correctly? This cannot be due to extended ASCII, as mentioned in one comment on the question, because these characters are not part of that set. The character 'ऋ', for example, takes 3 bytes in UTF-8. How can that be stored in a char, which has a size of 1 byte? strlen(arr) gives 27. Is it possible that a character like 'ऋ' simply occupies three chars?
@Spixmaster, yes, that's what is happening: GCC uses UTF-8 as the encoding for narrow string literals.
@Spixmaster const char arr[] = "россиянᔙинΩaऋ"; std::cout << arr << std::endl; works fine if 1) your source file is saved as UTF-8, 2) your compiler is set to interpret the source code as UTF-8 and to emit string literal data in the final executable as UTF-8, and 3) your console supports displaying UTF-8 output from your executable. However, you should not rely on all of this being true. If you are using an up-to-date compiler (or at least a C++11 one), consider using the u8 or L prefix on string literals to handle Unicode strings, rather than relying on the compiler's charset settings.
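As the replies say, each non-ASCII character simply occupies several consecutive chars. One way to see this is to count code points separately from bytes: in UTF-8 every continuation byte has the bit pattern 10xxxxxx, so counting only the other bytes counts characters (strictly, code points). utf8_code_points is a hypothetical name for this sketch, and it assumes valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>  // std::strlen, for comparing bytes vs. code points

// Hypothetical helper: counts Unicode code points in a NUL-terminated UTF-8
// string by skipping continuation bytes (bit pattern 10xxxxxx). Assumes the
// input is valid UTF-8; it does not validate.
std::size_t utf8_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;  // a lead byte starts a new code point
        }
    }
    return count;
}

// Usage with the string from the comment above (UTF-8 source assumed):
//   const char arr[] = "россиянᔙинΩaऋ";
//   std::strlen(arr);        // 27 bytes
//   utf8_code_points(arr);   // 13 code points
```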
