
I'm using Windows 11. I have a program "Hello.exe"

#include <iostream>
int main(int argc, char* argv[])
{
    for (int i = 0; i < argc; i++)
    {
        std::cout << argv[i] << std::endl;
    }
}

If I pass in a Japanese UTF-8 character to this program

Hello.exe う

Then nothing is printed. Strangely, the byte recorded in argv for this character is 3f (an ASCII question mark), whereas its UTF-8 encoding should be e3 81 86.

What I've tried

(1) If I print this character directly from my code, the encoding is correct in memory and the character is printed to stdout.

SetConsoleOutputCP(CP_UTF8);
printf("う");

(2) I also tried using wmain instead of main, but the character can't be printed either. The bytes stored in argv are 46 30, i.e. the single UTF-16LE code unit 0x3046.

#include <iostream>
int wmain(int argc, wchar_t** argv)
{
    for (int i = 0; i < argc; i++)
    {
        std::wcout << argv[i] << std::endl;
    }
}

(3) I also wrote a Python program, which does the same thing, and the character can be printed.

What am I missing?

9
  • 1
    kinda relevant What is the encoding of argv? ... Commented Jun 17 at 4:22
  • 1
    "I also tried using wmain instead of main. Have the same problem." - please show your attempt. wmain should work well. Commented Jun 17 at 4:22
  • 3
    You need to use wmain and convert the utf16 characters to utf8 Commented Jun 17 at 6:06
  • 1
    For wmain, the value stored in argv[1][0] should be 0x3046 (a single 16-bit value). Do you see something else? Are you running the program from an IDE? Commented Jun 17 at 6:50
  • 1
    @n.m.couldbeanAI See (2) under "What I've tried". It looks like the result is exactly what you say it should be. Commented Jun 17 at 15:12
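
As a rough illustration of the conversion suggested in the comments above, here is a minimal, portable sketch that encodes UTF-16 code units as UTF-8. The function name utf16_to_utf8 is made up for this example and is not from the original post. On Windows one would more likely call WideCharToMultiByte with CP_UTF8 than hand-roll this.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): encode UTF-16 code units
// as UTF-8. Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)            // high surrogate
        {
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            }
            else
            {
                cp = 0xFFFD;                         // unpaired high surrogate
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            cp = 0xFFFD;                             // stray low surrogate
        }

        if (cp < 0x80)                               // 1 byte
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)                         // 2 bytes
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)                       // 3 bytes
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else                                         // 4 bytes
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the character in the question, the code unit 0x3046 comes out as the three bytes e3 81 86. Since wchar_t is 16 bits wide on Windows, the arguments received by wmain could be copied into a std::u16string before calling such a helper, and the console would still need SetConsoleOutputCP(CP_UTF8) to display the result.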

1 Answer


Use UTF-8 on Windows

Windows uses UTF-16 encoded text wherever its native API expects strings. This makes cross-platform programs harder to implement, since other operating systems typically prefer UTF-8 as their Unicode encoding. The good news is that UTF-8 can now be used in Windows applications as well.

  1. Embed UTF-8 Manifest

    Windows 10 since May 2019 (version 1903), and Windows 11 of course, support the UTF-8 code page. With a manifest file embedded in the .exe file, the developer can tell Windows to use UTF-8 as the active code page when running the application. The manifest file typically looks like this:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
     <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
     <application>
     <windowsSettings>
     <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
     </windowsSettings>
     </application>
    </assembly>
    

    You can use mt.exe to add the manifest to the executable, or add the file as a manifest in the .vcxproj in Visual Studio.

  2. Compile with /utf-8

    The Microsoft compiler (MSVC) needs the /utf-8 flag to know that the source files are encoded in UTF-8 and that the text in the compiled program should be UTF-8 as well. Don't forget that flag in your projects.

  3. Configure the console as UTF-8

    For Windows console applications, call SetConsoleOutputCP(CP_UTF8); for output and SetConsoleCP(CP_UTF8); for input at the start of the main function. Curiously, this is required even with the manifest, as the console defaults to the Windows OEM code page and not UTF-8.

    BUG: from my experiments, it seems that on Windows 10, reading UTF-8 input from the console does not work, whatever you try, unless you call ReadConsoleW yourself and convert the result. On Windows 11, however, it works.

  4. Always use the ANSI Windows API

    Windows API functions exist in two flavors. There are functions ending in A (for ANSI) that expect const char* zero-terminated strings, and there are those ending in W (for wide) that expect const wchar_t* zero-terminated strings. The type wchar_t is 16 bits wide on Windows, and the wide strings are expected to be UTF-16LE encoded.

    Since you enabled UTF-8 as the application code page, you don't want to use the W wide API, but the A ANSI functions. So, although you actually want to support Unicode, define neither the _UNICODE nor the UNICODE macro, as those would select the W variant of the API. Alternatively, in Visual Studio, select Use Multi-Byte Character Set for the Character Set parameter (in Advanced configuration properties).

    Then you can also use the Unicode-agnostic macros like MessageBox, which will properly resolve to MessageBoxA.

    Unfortunately, a few Windows API functions exist only in a UTF-16 (wchar_t*) version. For those, you need to convert your UTF-8 string to UTF-16 manually, for example with std::codecvt or MultiByteToWideChar.
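
To make the UTF-8 to UTF-16 step in point 4 concrete, here is a small portable sketch of what such a conversion does. It is a hand-written stand-in, not the Windows API: the name utf8_to_utf16 is hypothetical, and it does not reject overlong or otherwise non-canonical sequences. In a real Windows program, MultiByteToWideChar with CP_UTF8 would normally do this job.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Windows API): decode UTF-8 into UTF-16 code units.
// Invalid lead bytes and missing continuation bytes become U+FFFD.
// Overlong encodings are NOT detected; this is only a sketch.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;                 // continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }
        ++i;

        for (; extra > 0; --extra)
        {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80)
            {
                cp = 0xFFFD;                   // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }

        if (cp >= 0x10000)
        {
            // Code points above the BMP need a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        else
        {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}
```

Because wchar_t is 16 bits wide on Windows, the resulting code units could be copied into a std::wstring and passed to a W function.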

Example

Here is a Hello World demonstration

Hello-UTF-8.cpp: must be stored with UTF-8 encoding. BOM is permitted, but not recommended.

#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <iostream>
#include <string>
#include <cstdio>
int main(int argc, char* argv[])
{
    // Switch console output and input to UTF-8.
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);
    std::string str = "議論\n";
    for (int i = 0; i < argc; i++)
    {
        str += argv[i];
        str += "\n";
    }
    std::cout << str;
    // The file name is a UTF-8 string literal.
    FILE* file = fopen("Деякий файл.txt", "wt");
    if (file != nullptr)
    {
        fputs(str.c_str(), file);
        fclose(file);
    }
    // MessageBox resolves to MessageBoxA because UNICODE is not defined.
    MessageBox(nullptr, str.c_str(), "Γεια σου κόσμε", MB_OK);
}

utf8.manifest: exactly as above (I don't care about the dummy name):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
 <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
 <application>
 <windowsSettings>
 <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
 </windowsSettings>
 </application>
</assembly>

Compiled and run on PowerShell (for proper Unicode handling):

PS E:\Привет> cl Hello-UTF-8.cpp /utf-8 /nologo User32.lib /EHsc
Hello-UTF-8.cpp
PS E:\Привет> mt -nologo -manifest utf8.manifest -outputresource:Hello-UTF-8.exe;#1
PS E:\Привет> .\Hello-UTF-8.exe こんにちは κόσμος
議論
E:\Привет\Hello-UTF-8.exe
こんにちは
κόσμος
PS E:\Привет> dir *.txt

    Répertoire : E:\Привет

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        20.06.2025     11:11             72 Деякий файл.txt
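
To check what actually arrives in argv, it can help to dump the raw bytes of each argument, in the same notation the question uses (3f versus e3 81 86). The following is a minimal sketch. The helper name hex_bytes is made up for this example and is not part of the program above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of s as two lowercase hex digits,
// separated by single spaces.
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4];
    for (unsigned char c : s)
    {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty())
        {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Printing hex_bytes(argv[i]) inside the loop of the example should show e3 81 86 for う if the argument really is UTF-8, whereas a lone 3f means the character was replaced by a question mark before the program saw it, as described in the question.
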
genpfault
answered Jun 17 at 7:38

1 Comment

In addition, the UCRT (Universal C Runtime) has supported UTF-8 since April 2018 (Windows 10 version 1803, 10.0.17134.0). q.v. UTF-8 support. (Why did it take Microsoft so long? Because UTF-8 came out after Microsoft had committed to UCS-2, which later became Microsoft's support of UTF-16. It took quite a few more years for UTF-8 to become a generally preferred encoding format.)
