I'm using Windows 11. I have a program "Hello.exe":
#include <iostream>
int main(int argc, char* argv[])
{
    for (int i = 0; i < argc; i++)
    {
        std::cout << argv[i] << std::endl;
    }
}
If I pass a Japanese UTF-8 character to this program:
Hello.exe う
then nothing is printed. Strangely, the byte recorded in argv for this character is 3f, whereas its UTF-8 encoding should be e3 81 86.
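The byte values quoted above can be obtained by dumping each argument in hexadecimal. Below is a minimal sketch; the helper name `to_hex` is hypothetical and not part of the original program. Note that 3f is the ASCII code of `?`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of a string as space-separated
// lowercase hex pairs, e.g. "e3 81 86".
std::string to_hex(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (unsigned char b : bytes)
    {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(b));
        if (!out.empty())
        {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Inside the loop of the program above, `std::cout << to_hex(argv[i]) << std::endl;` would then print the raw bytes instead of the text.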
What I've tried
(1) If I print this character directly from my code, its encoding in memory is correct and the character is printed to stdout:
SetConsoleOutputCP(CP_UTF8);
printf("う");
(2) I also tried using wmain instead of main; the character can't be printed either. The value stored in argv is 46 30:
#include <iostream>
int wmain(int argc, wchar_t** argv)
{
    for (int i = 0; i < argc; i++)
    {
        std::wcout << argv[i] << std::endl;
    }
}
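For reference, the two byte sequences seen so far can be compared by encoding the code point of う (U+3046) by hand. This is a minimal sketch with hypothetical helper names, not part of the original programs. It suggests that 46 30 is the UTF-16LE form of the same character that UTF-8 writes as e3 81 86.

```cpp
#include <string>

// Hypothetical helper: UTF-8 bytes of one Unicode code point.
// Assumes cp is a valid scalar value (<= U+10FFFF, not a surrogate).
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: UTF-16LE bytes of one code point in the
// Basic Multilingual Plane (a single 16-bit unit, low byte first).
std::string encode_utf16le_bmp(char32_t cp)
{
    std::string out;
    out += static_cast<char>(cp & 0xFF);         // low byte
    out += static_cast<char>((cp >> 8) & 0xFF);  // high byte
    return out;
}
```

With these helpers, `encode_utf8(0x3046)` yields the bytes e3 81 86 and `encode_utf16le_bmp(0x3046)` yields 46 30.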
(3) I also wrote a Python program, which does the same thing, and the character can be printed.
What am I missing?
1 Answer
Use UTF-8 on Windows
Windows uses UTF-16 encoded text wherever it expects strings. This makes cross-platform programs harder to implement, since other operating systems typically use UTF-8 as their preferred Unicode encoding. The good news is that it is now possible to use UTF-8 in Windows applications as well.
Embed UTF-8 Manifest
Windows 10 since May 2019 (version 1903), and Windows 11 of course, support the UTF-8 code page. With the help of a manifest file that needs to be embedded in the .exe file, the developer can tell Windows to set the UTF-8 code page when running the application. The manifest file typically looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>
You use mt.exe to add the manifest to the executable, or add the file as a manifest in the .vcxproj in Visual Studio.
Compile with /utf-8
The Microsoft compiler (MSVC) needs the /utf-8 flag to know that the source files are encoded in UTF-8 and that you want string literals stored as UTF-8 in the executable (the flag sets both the source and the execution character set). Don't forget that flag in your projects.
Configure the console as UTF-8
For Windows console applications, call SetConsoleOutputCP(CP_UTF8); for output and SetConsoleCP(CP_UTF8); for input at the start of the main function. This is curiously required even with the manifest, as the console defaults to the Windows OEM code page and not UTF-8.
BUG: from my experiments, it seems that on Windows 10, inputting UTF-8 from the console does not work, whatever you try, unless you call ReadConsoleW manually and adjust accordingly. On Windows 11, however, it works.
Always use the ANSI Windows API
Windows API functions exist in two flavors. Functions ending in A (for ANSI) expect const char* zero-terminated strings, and functions ending in W (for wide) expect const wchar_t* zero-terminated strings. The type wchar_t is 16 bits wide on Windows, and wide strings are expected to be UTF-16LE encoded.
Since you enabled UTF-8 as the application code page, you don't want to use the W wide API, but the A ANSI functions. So, although you actually want to support Unicode, define neither the _UNICODE nor the UNICODE macro, as those would select the W variant of the API. Alternatively, in Visual Studio, select Use Multi-Byte Character Set for the Character Set parameter (in the Advanced configuration properties).
Then you can also use the Unicode-agnostic macros like MessageBox, which will properly select MessageBoxA.
There are unfortunately some rare Windows APIs that only exist in a UTF-16 (wchar_t*) version. For those, you will need to manually convert your UTF-8 string into UTF-16, for example with std::codecvt or MultiByteToWideChar.
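To illustrate what that UTF-8 to UTF-16 conversion does, here is a minimal portable sketch. It is a hand-written decoder rather than MultiByteToWideChar or std::codecvt, so that it compiles on any platform. The name `utf8_to_utf16` is hypothetical, and the code assumes well-formed UTF-8 input and performs no validation. In a real Windows program, MultiByteToWideChar with CP_UTF8 would be the more usual choice.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Assumption: the input is well-formed UTF-8; invalid sequences are not detected.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow

        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i)
        {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, wchar_t and char16_t are both 16 bits wide but remain distinct types, so passing the result to a W function would still need a cast or a copy into a std::wstring.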
Example
Here is a Hello World demonstration
Hello-UTF-8.cpp: must be stored with UTF-8 encoding. BOM is permitted, but not recommended.
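#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <iostream>
#include <string>
#include <cstdio>
int main(int argc, char* argv[])
{
    // Switch console output and input to UTF-8.
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);
    std::string str = "議論\n";
    for (int i = 0; i < argc; i++)
    {
        str += argv[i];
        str += "\n";
    }
    std::cout << str;
    // The file name is a UTF-8 string; the manifest makes the A APIs accept it.
    FILE* file = fopen("Деякий файл.txt", "wt");
    if (file != nullptr)
    {
        fputs(str.c_str(), file);
        fclose(file);
    }
    // MessageBox resolves to MessageBoxA because UNICODE is not defined.
    MessageBox(nullptr, str.c_str(), "Γεια σου κόσμε", MB_OK);
}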
utf8.manifest: exactly as above (I don't care about the dummy name):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>
Compiled and run on PowerShell (for proper Unicode handling):
PS E:\Привет> cl Hello-UTF-8.cpp /utf-8 /nologo User32.lib /EHsc
Hello-UTF-8.cpp
PS E:\Привет> mt -nologo -manifest utf8.manifest -outputresource:Hello-UTF-8.exe;#1
PS E:\Привет> .\Hello-UTF-8.exe こんにちは κόσμος
議論
E:\Привет\Hello-UTF-8.exe
こんにちは
κόσμος
PS E:\Привет> dir *.txt
Répertoire : E:\Привет
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 20.06.2025 11:11 72 Деякий файл.txt
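If the output still shows question marks, one thing worth checking is whether the manifest was really embedded. A possible way, sketched below, is to query the process ANSI code page with GetACP, which should return CP_UTF8 (65001) when the activeCodePage setting is in effect. The helper name `ansi_code_page_is_utf8` is hypothetical, and the non-Windows branch is only a placeholder so that the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper: reports whether the process ANSI code page is UTF-8.
bool ansi_code_page_is_utf8()
{
#ifdef _WIN32
    // With the UTF-8 manifest embedded, GetACP() is expected to return 65001.
    return GetACP() == CP_UTF8;
#else
    // Assumption: outside Windows there is no ANSI code page to check.
    return true;
#endif
}
```

Calling this at the start of main and printing a warning when it returns false would make a missing or ignored manifest easier to notice.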
"wmain instead of main. Have the same problem." - please show your attempt. wmain should work well.