1448 – UTF-8 output to console is seriously broken

D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Summary: UTF-8 output to console is seriously broken
Status: REOPENED
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: x86 Windows
Importance: P3 normal
Assignee: No Owner
URL:
Keywords:
Duplicates: 1608
Depends on:
Blocks:
Reported: 2007-08-28 22:51 UTC by Alexander Solovey
Modified: 2024-12-13 17:47 UTC

See Also:


Attachments
Small test case for the same problem in DMC (336 bytes, application/octet-stream)
2007-08-28 22:52 UTC, Alexander Solovey

Description Alexander Solovey 2007-08-28 22:51:06 UTC
If the Windows console code page is set to 65001 (UTF-8) and a program outputs non-ASCII characters in UTF-8 encoding, there is no more output after the first newline that follows an accented character. I believe the problem is in the underlying DMC stdio, but it is more disturbing with D, as D has good Unicode support and is very convenient for working with international texts.
This problem has been reported in the newsgroup several times before, see for example http://www.digitalmars.com/d/archives/digitalmars/D/announce/openquran_v0.21_8492.html
Here is the code to illustrate the problem:
////////
import std.c.stdio;
import std.c.windows.windows;
extern(Windows) export BOOL SetConsoleOutputCP( UINT );
void main() {
 SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
 // Codepoint 00e9 is "Latin small letter e with acute"
 puts( "Output utf-8 accented char \u00e9\n... and the rest is cut off!\n" );
}
/////////
If you run it, "... and the rest is cut off!" won't be displayed. Do not forget to set the console font to Lucida Console before trying this.
Comment 1 Alexander Solovey 2007-08-28 22:52:24 UTC
Created attachment 172
Small test case for the same problem in DMC
Comment 2 Stewart Gordon 2007-08-29 13:03:13 UTC
The problem doesn't show if I use the Windows API (either WriteConsole or WriteFile) to output. So the bug must be somewhere in DM's stdio implementation.
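A minimal sketch of the Windows API route described above, assuming current druntime bindings (`core.sys.windows.windows`). The helper name `writeUtf8ToStderr` is hypothetical and not from this report. On Windows it hands the UTF-8 bytes straight to `WriteFile`, bypassing C stdio; elsewhere it falls back to `fwrite` so the sketch still compiles and runs.

```d
import core.sys.windows.windows; // empty module on non-Windows targets

/// Hypothetical helper: write raw UTF-8 bytes to stderr without going
/// through the C runtime's stdio layer on Windows.
bool writeUtf8ToStderr(const(char)[] text)
{
    version (Windows)
    {
        HANDLE h = GetStdHandle(STD_ERROR_HANDLE);
        if (h is null || h == INVALID_HANDLE_VALUE)
            return false;
        DWORD written;
        // Only the success flag is checked in this sketch.
        return WriteFile(h, text.ptr, cast(DWORD) text.length, &written, null) != 0;
    }
    else
    {
        // Fallback so the example also runs off Windows.
        import core.stdc.stdio : fwrite, stderr;
        return fwrite(text.ptr, 1, text.length, stderr) == text.length;
    }
}

void main()
{
    version (Windows)
        SetConsoleOutputCP(CP_UTF8); // same as "chcp 65001"

    writeUtf8ToStderr("Output utf-8 accented char \u00e9\n");
    writeUtf8ToStderr("... and the rest follows\n");
}
```

This only illustrates the comparison made in this comment; it does not fix output that goes through `puts`, `fputs` or `std.stdio`.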
Comment 3 Walter Bright 2007-09-28 22:15:07 UTC
Fixed in dmd 1.021 and 2.004
Comment 4 Martin Krejcirik 2007-10-29 11:02:51 UTC
The problem was NOT fixed for stderr (DMD 1.022)
Comment 5 Martin Krejcirik 2007-10-29 11:04:25 UTC
*** Bug 1608 has been marked as a duplicate of this bug. ***
Comment 6 Martin Krejcirik 2008-09-03 10:57:24 UTC
I hope this gets fixed one day. Here is an updated example where it still doesn't work (for stderr; stdout is OK) as of DMD 1.035:
import std.c.stdio;
import std.c.windows.windows;
extern(Windows) export BOOL SetConsoleOutputCP( UINT );
void main() {
 SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
 // Codepoint 00e9 is "Latin small letter e with acute"
 fputs("Output utf-8 accented char \u00e9\n... and the rest is OK\n", stdout);
 fputs("Output utf-8 accented char \u00e9\n... and the rest is cut off!\n", stderr);
 fputs("STDOUT.\n", stdout);
 fputs("STDERR.\n", stderr);
}
Comment 7 Kevin 2012-02-07 22:48:48 UTC
Sort of works for me.
The text doesn't get cut off, but the Unicode characters don't get displayed either.
C:\Users\Kevin\Documents\D Projects\ConsoleApp1\ConsoleApp1\bin>ConsoleApp1.exe
Output utf-8 accented char é
... and the rest is OK
Output utf-8 accented char ��
... and the rest is cut off!
STDOUT.
STDERR.
C:\Users\Kevin\Documents\D Projects\ConsoleApp1\ConsoleApp1\bin>
Comment 8 Martin Krejcirik 2013-03-19 18:21:18 UTC
Status update as of DMD 2.062 (Win XP 32 bit)
Still the same error for the above mentioned example, however, when modified to use write instead of fputs:
import std.stdio;
import std.c.windows.windows;
extern(Windows) BOOL SetConsoleOutputCP( UINT );
void main() {
 SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
 stderr.write("STDERR:Output utf-8 accented char \u00e9\n... and the rest is cut off!\n");
 stderr.write("end_STDERR.\n");
}
I get this error:
STDERR:Output utf-8 accented char é
... and the rest is cut off!
std.exception.ErrnoException@D:\PROGRAMS\DMD2\WINDOWS\BIN\..\..\src\phobos\std\stdio.d(1264): (No error)
----------------
0x0040D874
0x0040D6FF
0x00402218
0x00402189
0x00402121
0x00402030
0x0040354E
0x00403151
0x00402388
0x7C81776F in RegisterWaitForInputIdle
----------------
So if anybody has a clue what's going on there...
Comment 9 Axel Bender 2013-08-07 00:55:43 UTC
I can confirm this issue. When enumerating a directory (via dirEntries()) containing a file with a character in the CP850/CP1252 space (e.g. "säb"), depending on the codepage settings, the output is as follows:
chcp 1252 => output is "säb" (Unicode encoding for "ä")
chcp 65001 => output is "säbstd.exception.ErrnoException@D:\tools\d\bin\..\src\phobos\std\stdio.d(1352): (No error)"
In both cases e.g. cmd's dir shows the correct results.
The correct results are also shown when using - not really comparable - C with printf().
Tried the case in cmd, console2, and conemu. All show the same results.
It'd really be nice if this bug would get fixed...
Comment 10 Axel Bender 2013-08-07 00:58:06 UTC
Addendum: Windows 7 64-bit, dmd v2.063.2.
Sorry.
Comment 11 Martin Krejcirik 2014-02-24 17:18:25 UTC
Hallelujah, this (comment 8) seems fixed, finally. Can anybody confirm? Works for me on Windows XP 32 bit, dmd 2.065.0
Beware, fputs still doesn't work. I think it's a C library problem.
Comment 12 Sum Proxy 2014-10-25 09:26:49 UTC
The issue still exists in 
DMD32 D Compiler v2.065, Windows 7
==============
Code:
==============
import std.stdio;
import std.c.windows.windows;
extern(Windows) BOOL SetConsoleOutputCP( UINT );
void main() {
 SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
 stderr.write("STDERR:Output utf-8 accented char \u00e9\n... and the rest is cut off!\n");
 stderr.write("end_STDERR.\n");
}
==============
Output:
==============
STDERR:Output utf-8 accented char é
... and the rest is cut off!
==============
end_STDERR.\n 
is not written
Comment 13 Martin Krejcirik 2016-02-09 21:07:53 UTC
Final note, as this is unlikely to be fixed: use -m32mscoff and Microsoft VS linker.
Comment 14 Martin Krejcirik 2016-11-30 11:14:40 UTC
Partial fix or workaround in druntime for unhandled exceptions:
https://github.com/dlang/druntime/pull/1687 
Comment 15 kinke 2019-06-13 18:33:49 UTC
Still an issue, but apparently restricted to stderr (and independent of the DigitalMars/MS runtime):
```
import core.stdc.stdio;
import core.sys.windows.wincon, core.sys.windows.winnls;
void main()
{
 const oldCP = GetConsoleOutputCP(); // SetConsoleOutputCP returns a BOOL, not the previous code page
 SetConsoleOutputCP(CP_UTF8);
 scope(exit) SetConsoleOutputCP(oldCP);
 fprintf(stdout, "HellöѬ LDC\n");
 fflush(stdout);
 fprintf(stderr, "HellöѬ LDC\n");
 fflush(stderr);
}
```
=>
```
HellöѬ LDC
Hell
```
Tested with DMD 2.086.0 (-m32, -m32mscoff, -m64) and LDC on Win10.
Comment 16 kinke 2019-06-15 09:47:31 UTC
Update: it's working with Win10 v1903 (with the exact same binary that didn't work with v1803). According to Rainer Schütze, it's working since v1809. See https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/.
Comment 17 RazvanN 2019-08-12 12:05:57 UTC
(In reply to kinke from comment #16)
> Update: it's working with Win10 v1903 (with the exact same binary that
> didn't work with v1803). According to Rainer Schütze, it's working since
> v1809. See
> https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/.
So is this issue fixed? I don't have a Windows machine to test it. Should we close this?
Comment 18 kinke 2019-08-12 13:03:28 UTC
This isn't solved, but would now be solvable with recent Windows versions.
There are 2 things about this:
* DMD outputs a mix of UTF-8 and strings in the current codepage, AFAIK without setting any console codepage, so DMD output on Windows can be garbage. LDC v1.17 fixes this for LDC.
* User programs writing UTF-8 strings to the console suffer from the same issue. This *could* be worked around by setting the console codepage in druntime's _d_run_main and resetting it to the original one before termination.
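A rough user-level sketch of the second point, assuming it is acceptable to switch the console code page from module constructors rather than inside druntime's `_d_run_main`. The variable name `savedCP` is hypothetical. This is only an approximation of the idea and not how druntime or LDC implement it.

```d
version (Windows)
{
    import core.sys.windows.wincon : GetConsoleOutputCP, SetConsoleOutputCP;
    import core.sys.windows.winnls : CP_UTF8;

    // Code page that was active at startup (0 if there is no console).
    private __gshared uint savedCP;

    shared static this()
    {
        savedCP = GetConsoleOutputCP();
        SetConsoleOutputCP(CP_UTF8);
    }

    shared static ~this()
    {
        // Restore the original code page before the process terminates.
        if (savedCP != 0)
            SetConsoleOutputCP(savedCP);
    }
}

void main()
{
    import std.stdio : stderr, writeln;

    writeln("stdout: caf\u00e9");
    stderr.writeln("stderr: caf\u00e9");
}
```

Per comment 16, stderr output in code page 65001 is only reported to work from Windows 10 v1809 onwards, so on older systems this sketch would not be expected to help.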
Comment 19 RazvanN 2019-10-24 09:32:25 UTC
(In reply to kinke from comment #18)
> This isn't solved, but would now be solvable with recent Windows versions.
> 
> There are 2 things about this:
> * DMD outputs a mix of UTF-8 and strings in the current codepage, AFAIK
> without setting any console codepage, so DMD output on Windows can be
> garbage. LDC v1.17 fixes this for LDC.
How does LDC solve the problem?
> * User programs writing UTF-8 strings to the console suffer from the same
> issue. This *could* be worked around by setting the console codepage in
> druntime's _d_run_main and resetting it to the original one before
> termination.
Comment 20 dlangBugzillaToGithub 2024-12-13 17:47:58 UTC
THIS ISSUE HAS BEEN MOVED TO GITHUB
https://github.com/dlang/dmd/issues/17629
DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB

