Before showing the difficult case of Windows 10 (21H2 here), let me show an easy one on Ubuntu 22.04.
I put files whose names contain Chinese and Korean characters on a USB flash drive, plugged it into Ubuntu and into Windows, and compared how the names are displayed.
Ubuntu 22.04 displays them very well; its terminal renders the Unicode characters correctly:
$ la
total 196
drwxr-xr-x 5 chj chj 32768 1970年01月01日 08:00:00 ./
drwxr-x---+ 3 root root 4096 2022年06月09日 00:43:35 ../
drwxr-xr-x 2 chj chj 32768 2021年10月24日 17:09:04 LEXAR-128G/
drwxr-xr-x 2 chj chj 32768 2022年02月17日 18:30:34 Raspi-wallpaper/
drwxr-xr-x 4 chj chj 32768 2016年02月17日 00:59:34 'System Volume Information'/
-rw-r--r-- 1 chj chj 23 2022年06月09日 08:39:10 한국파일.txt
-rw-r--r-- 1 chj chj 17 2022年06月09日 08:27:30 电脑文件.txt
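For anyone who wants to reproduce the test, the two files can be created with a short script. This is only a minimal sketch, assuming Python 3: the file names come from the listing above, while the helper names `create_test_files` and `describe`, the temporary directory, and the file contents are placeholders chosen for illustration.

```python
import os
import tempfile

# File names taken from the listing above; the contents are placeholders.
NAMES = ["한국파일.txt", "电脑文件.txt"]


def create_test_files(directory):
    """Create one small file per name and return the sorted directory listing."""
    for name in NAMES:
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.write("test\n")
    return sorted(os.listdir(directory))


def describe(name):
    """Return the code points of the non-ASCII characters in a file name."""
    return [f"U+{ord(ch):04X}" for ch in name if ord(ch) > 0x7F]


if __name__ == "__main__":
    # Use a throwaway directory here; point it at the USB drive to repeat the test.
    with tempfile.TemporaryDirectory() as tmp:
        for name in create_test_files(tmp):
            print(name, describe(name))
```

Printing the code points alongside the names makes it easier to tell a font problem (the code points are right but the glyphs are boxes) from an encoding problem (the code points themselves are wrong).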
On Windows 10, however, I have to try many fonts (font families) by hand until I find one that displays them properly.
On an English-language Win10 the default CMD font is Consolas; I then tried Lucida Console, SimHei (黑体), and finally NSimSun (新宋体).
Even though NSimSun displays them correctly in this case, I'm still not sure whether NSimSun can cope with Unicode characters from other countries or character sets (assuming at least one font covering that character set is installed on the system).
[Screenshots of the CMD window with each font, in order: Consolas, Lucida Console, SimHei, NSimSun]
It is 2022 now, and I wonder why Microsoft makes it so hard for a user to view Unicode characters correctly and conveniently in a CMD window. Is there a best practice for this?
1 Answer
Windows Console was created before Unicode existed. A terrible decision was made to represent each text character as a fixed-length 16-bit value (UCS-2). Because UCS-2 is a fixed-width 16-bit encoding, it cannot represent code points above U+FFFF, so it cannot cover all of Unicode. In addition, Windows Console renders its text with GDI, which does not support font fallback, so the console cannot display glyphs for code points that are missing from the currently selected font. A Microsoft expert assesses this in the article "Windows Command-Line: Unicode and UTF-8 Output Text Buffer".
I have been unable to find a truly good solution to the problem.
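The 16-bit limit described above can be checked directly. The following is a minimal sketch in Python, not anything taken from the console's own code; the helper names `utf16_units` and `fits_in_ucs2` are made up for illustration. It counts how many 16-bit code units UTF-16 needs per character and tests whether a string stays within the range a single 16-bit value can hold.

```python
def utf16_units(ch):
    """Return the number of 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-le")) // 2


def fits_in_ucs2(text):
    """True if every character is at or below U+FFFF (one 16-bit value each)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for ch in ["A", "电", "한", "🅰"]:
        print(f"U+{ord(ch):04X} {ch!r}: {utf16_units(ch)} UTF-16 code unit(s)")
    print(fits_in_ucs2("한국파일.txt"))
    print(fits_in_ucs2("C🅰L"))
```

The Chinese and Korean file names from the question each fit in a single 16-bit unit per character, so for those names the missing font fallback is the limitation that matters; characters such as 🅰 (U+1F170) need a surrogate pair and run into the 16-bit limit as well.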
- Why was it a "terrible" decision, given that Windows Console was created before Unicode existed? Have you tried Windows Terminal as a solution? – Andrew Morton, Nov 6, 2024 at 20:57
- @andrew-morton: Whether or not Unicode existed is irrelevant; anyone with an American middle-school level of education should have instantly known that 2^16 is only 65k glyphs, and then thought: "Chinese alone will consume ~20k of those! That's not even enough to satisfy our existing needs, so it won't be good enough for the future, either." and then selected a 32-bit encoding instead. – Ace Frahm, Nov 22, 2024 at 0:08
- It was good enough at the time, when storage, processing, and transmission of data were much more expensive. Didn't Windows Terminal work for you? – Andrew Morton, Nov 22, 2024 at 15:54
- There are half a dozen other reasons why it was a dumb decision ẾⱾꝔĘĆI̢ẢL̡ĹƳ at that time(!). I would have included 3 extra paragraphs WITH C🅰L🆄L🅰T🅸O🅽S, but there's an arbitrary character limit on answers. – Ace Frahm, Dec 2, 2024 at 17:24
- [...] cmd is probably more effort than just getting a better and more modern terminal replacement.
- [...] cmd but the default Windows console emulator, which has always sucked. It's the very definition of legacy.
- [...] cmd does this just fine – in Windows Terminal.
- [...] chcp 65001.