readdir() returns inaccessible name if file was created with invalid UTF-8
Corinna Vinschen
corinna-cygwin@cygwin.com
Wed Jul 23 07:53:41 GMT 2025
On Jul 23 05:44, Thomas Wolff via Cygwin wrote:
> OK, suppose I'd consider to switch to mbs[[n]r]towcs, collecting bytes until
> the function gives me a result.
> This would work fine as long as I receive only valid sequences. But look at
> input string test case
> char nonbmp[] = {0xF8, 0x88, 0x8A, 0xAF, 0x2D, 0}; // an invalid sequence
> followed by a valid char
> The functions only return -1 and (in the case of mbsnrtowcs) do not advance
> the input pointer.
> So how am I supposed to recognize that the invalid sequence has ended and a
> valid character has arrived?
Yeah, I see the problem. One of the slightly puzzling behaviours
of mbsnrtowcs is the fact that the src pointer stays at the start of
the invalid sequence. I think the idea is to skip the invalid sequence
byte-wise until mbsnrtowcs reports a valid sequence again.
What bugs me is that we have the choice between a broken mbrtowc on
one side and a chance to generate broken filenames on the other side.
I think we should actually revert fa272e05bbd0 ("wcstombs: also call
__WCTOMB on terminating NUL if output buffer is NULL") and see if we can
fix the filename issue in the Cygwin functions for filename conversion
alone.
Any ideas appreciated.
Corinna