musl/src/stdio/ungetwc.c, branch master musl - an implementation of the standard library for Linux-based systems fix FILE buffer underflow in ungetwc 2016年04月26日T19:26:40+00:00 Rich Felker dalias@aerifal.cx 2016年04月26日T19:26:40+00:00 6ed791e768d83b40ed56c99dbb1ed72c1e49aae7 commit 7e816a6487932cbb3cb71d94b609e50e81f4e5bf (version 1.1.11 release cycle) moved the code that performs wchar_t to multibyte conversion across code that used the resulting length in bytes, thereby breaking the unget buffer space check in ungetwc and clobbering up to three bytes below the start of the buffer. for allocated FILEs (all read-enabled FILEs except stdin), the underflow clobbers at most the FILE-specific locale pointer. no stores are performed through this pointer, but subsequent loads may result in a crash or mismatching encoding rule (UTF-8 multibyte vs byte-based). for stdin, the buffer lies in .bss and the underflow may clobber another object. in practice, for libc.so the adjacent object seems to be stderr's buffer, which is completely unused, but this could vary with linking options, or when static linking. applications which do not attempt to use more than one character of ungetwc pushback, or which do not use ungetwc, are not affected.
commit 7e816a6487932cbb3cb71d94b609e50e81f4e5bf (version 1.1.11
release cycle) moved the code that performs wchar_t to multibyte
conversion across code that used the resulting length in bytes,
thereby breaking the unget buffer space check in ungetwc and
clobbering up to three bytes below the start of the buffer.
for allocated FILEs (all read-enabled FILEs except stdin), the
underflow clobbers at most the FILE-specific locale pointer. no stores
are performed through this pointer, but subsequent loads may result in
a crash or mismatching encoding rule (UTF-8 multibyte vs byte-based).
for stdin, the buffer lies in .bss and the underflow may clobber
another object. in practice, for libc.so the adjacent object seems to
be stderr's buffer, which is completely unused, but this could vary
with linking options, or when static linking.
applications which do not attempt to use more than one character of
ungetwc pushback, or which do not use ungetwc, are not affected.
byte-based C locale, phase 2: stdio and iconv (multibyte callers) 2015年06月16日T06:10:29+00:00 Rich Felker dalias@aerifal.cx 2015年06月16日T05:35:31+00:00 16f18d036d9a7bf590ee6eb86785c0a9658220b6 this patch adjusts libc components which use the multibyte functions internally, and which depend on them operating in a particular encoding, to make the appropriate locale changes before calling them and restore the calling thread's locale afterwards. activating the byte-based C locale without these changes would cause regressions in stdio and iconv. in the case of iconv, the current implementation was simply using the multibyte functions as UTF-8 conversions. setting a multibyte UTF-8 locale for the duration of the iconv operation allows the code to continue working. in the case of stdio, POSIX requires that FILE streams have an encoding rule bound at the time of setting wide orientation. as long as all locales, including the C locale, used the same encoding, treating high bytes as UTF-8, there was no need to store an encoding rule as part of the stream's state. a new locale field in the FILE structure points to the locale that should be made active during fgetwc/fputwc/ungetwc on the stream. it cannot point to the locale active at the time the stream becomes oriented, because this locale could be mutable (the global locale) or could be destroyed (locale_t objects produced by newlocale) before the stream is closed. instead, a pointer to the static C or C.UTF-8 locale object added in commit commit aeeac9ca5490d7d90fe061ab72da446c01ddf746 is used. this is valid since categories other than LC_CTYPE will not affect these functions.
this patch adjusts libc components which use the multibyte functions
internally, and which depend on them operating in a particular
encoding, to make the appropriate locale changes before calling them
and restore the calling thread's locale afterwards. activating the
byte-based C locale without these changes would cause regressions in
stdio and iconv.
in the case of iconv, the current implementation was simply using the
multibyte functions as UTF-8 conversions. setting a multibyte UTF-8
locale for the duration of the iconv operation allows the code to
continue working.
in the case of stdio, POSIX requires that FILE streams have an
encoding rule bound at the time of setting wide orientation. as long
as all locales, including the C locale, used the same encoding,
treating high bytes as UTF-8, there was no need to store an encoding
rule as part of the stream's state.
a new locale field in the FILE structure points to the locale that
should be made active during fgetwc/fputwc/ungetwc on the stream. it
cannot point to the locale active at the time the stream becomes
oriented, because this locale could be mutable (the global locale) or
could be destroyed (locale_t objects produced by newlocale) before the
stream is closed. instead, a pointer to the static C or C.UTF-8 locale
object added in commit commit aeeac9ca5490d7d90fe061ab72da446c01ddf746
is used. this is valid since categories other than LC_CTYPE will not
affect these functions.
fix idiom for setting stdio stream orientation to wide 2015年06月13日T05:17:16+00:00 Rich Felker dalias@aerifal.cx 2015年06月13日T05:17:16+00:00 536c6d5a4205e2a3f161f2983ce1e0ac3082187d the old idiom, f->mode |= f->mode+1, was adapted from the idiom for setting byte orientation, f->mode |= f->mode-1, but the adaptation was incorrect. unless the stream was alreasdy set byte-oriented, this code incremented f->mode each time it was executed, which would eventually lead to overflow. it could be fixed by changing it to f->mode |= 1, but upcoming changes will require slightly more work at the time of wide orientation, so it makes sense to just call fwide. as an optimization in the single-character functions, fwide is only called if the stream is not already wide-oriented.
the old idiom, f->mode |= f->mode+1, was adapted from the idiom for
setting byte orientation, f->mode |= f->mode-1, but the adaptation was
incorrect. unless the stream was alreasdy set byte-oriented, this code
incremented f->mode each time it was executed, which would eventually
lead to overflow. it could be fixed by changing it to f->mode |= 1,
but upcoming changes will require slightly more work at the time of
wide orientation, so it makes sense to just call fwide. as an
optimization in the single-character functions, fwide is only called
if the stream is not already wide-oriented.
remove another invalid skip of locking in ungetwc 2015年06月06日T18:20:30+00:00 Rich Felker dalias@aerifal.cx 2015年06月06日T18:20:30+00:00 312eea2ea4f4363fb01b73660c08bfcf43dd3bb4
remove invalid skip of locking in ungetwc 2015年06月06日T18:11:17+00:00 Rich Felker dalias@aerifal.cx 2015年06月06日T18:11:17+00:00 7e816a6487932cbb3cb71d94b609e50e81f4e5bf aside from being invalid, the early check only optimized the error case, and likely pessimized the common case by separating the two branches on isascii(c) at opposite ends of the function.
aside from being invalid, the early check only optimized the error
case, and likely pessimized the common case by separating the
two branches on isascii(c) at opposite ends of the function.
fix failure of ungetc and ungetwc to work on files in eof status 2015年05月29日T04:04:36+00:00 Rich Felker dalias@aerifal.cx 2015年05月29日T03:08:12+00:00 2b4fcfdacf93c3dfd6ac15e31790a9e154374679 these functions were written to handle clearing eof status, but failed to account for the __toread function's handling of eof. with this patch applied, __toread still returns EOF when the file is in eof status, so that read operations will fail, but it also sets up valid buffer pointers for read mode, which are set to the end of the buffer rather than the beginning in order to make the whole buffer available to ungetc/ungetwc. minor changes to __uflow were needed since it's now possible to have non-zero buffer pointers while in eof status. as made, these changes remove a 'fast path' bypassing the function call to __toread, which could be reintroduced with slightly different logic, but since ordinary files have a syscall in f->read, optimizing the code path does not seem worthwhile. the __stdio_read function is also updated not to zero the read buffer pointers on eof/error. while not necessary for correctness, this change avoids the overhead of calling __toread in ungetc after reaching eof, and it also reduces code size and increases consistency with the fmemopen read operation which does not zero the pointers.
these functions were written to handle clearing eof status, but failed
to account for the __toread function's handling of eof. with this
patch applied, __toread still returns EOF when the file is in eof
status, so that read operations will fail, but it also sets up valid
buffer pointers for read mode, which are set to the end of the buffer
rather than the beginning in order to make the whole buffer available
to ungetc/ungetwc.
minor changes to __uflow were needed since it's now possible to have
non-zero buffer pointers while in eof status. as made, these changes
remove a 'fast path' bypassing the function call to __toread, which
could be reintroduced with slightly different logic, but since
ordinary files have a syscall in f->read, optimizing the code path
does not seem worthwhile.
the __stdio_read function is also updated not to zero the read buffer
pointers on eof/error. while not necessary for correctness, this
change avoids the overhead of calling __toread in ungetc after
reaching eof, and it also reduces code size and increases consistency
with the fmemopen read operation which does not zero the pointers.
clean up stdio_impl.h 2012年11月08日T21:39:41+00:00 Rich Felker dalias@aerifal.cx 2012年11月08日T21:39:41+00:00 835f9f950e2f6059532bd9ab9857a856ed21a4fd this header evolved to facilitate the extremely lazy practice of omitting explicit includes of the necessary headers in individual stdio source files; not only was this sloppy, but it also increased build time. now, stdio_impl.h is only including the headers it needs for its own use; any further headers needed by source files are included directly where needed.
this header evolved to facilitate the extremely lazy practice of
omitting explicit includes of the necessary headers in individual
stdio source files; not only was this sloppy, but it also increased
build time.
now, stdio_impl.h is only including the headers it needs for its own
use; any further headers needed by source files are included directly
where needed.
major stdio overhaul, using readv/writev, plus other changes 2011年03月28日T05:14:44+00:00 Rich Felker dalias@aerifal.cx 2011年03月28日T05:14:44+00:00 e3cd6c5c265cd481db6e0c5b529855d99f0bda30 the biggest change in this commit is that stdio now uses readv to fill the caller's buffer and the FILE buffer with a single syscall, and likewise writev to flush the FILE buffer and write out the caller's buffer in a single syscall. making this change required fundamental architectural changes to stdio, so i also made a number of other improvements in the process: - the implementation no longer assumes that further io will fail following errors, and no longer blocks io when the error flag is set (though the latter could easily be changed back if desired) - unbuffered mode is no longer implemented as a one-byte buffer. as a consequence, scanf unreading has to use ungetc, to the unget buffer has been enlarged to hold at least 2 wide characters. - the FILE structure has been rearranged to maintain the locations of the fields that might be used in glibc getc/putc type macros, while shrinking the structure to save some space. - error cases for fflush, fseek, etc. should be more correct. - library-internal macros are used for getc_unlocked and putc_unlocked now, eliminating some ugly code duplication. __uflow and __overflow are no longer used anywhere but these macros. switch to read or write mode is also separated so the code can be better shared, e.g. with ungetc. - lots of other small things.
the biggest change in this commit is that stdio now uses readv to fill
the caller's buffer and the FILE buffer with a single syscall, and
likewise writev to flush the FILE buffer and write out the caller's
buffer in a single syscall.
making this change required fundamental architectural changes to
stdio, so i also made a number of other improvements in the process:
- the implementation no longer assumes that further io will fail
 following errors, and no longer blocks io when the error flag is set
 (though the latter could easily be changed back if desired)
- unbuffered mode is no longer implemented as a one-byte buffer. as a
 consequence, scanf unreading has to use ungetc, to the unget buffer
 has been enlarged to hold at least 2 wide characters.
- the FILE structure has been rearranged to maintain the locations of
 the fields that might be used in glibc getc/putc type macros, while
 shrinking the structure to save some space.
- error cases for fflush, fseek, etc. should be more correct.
- library-internal macros are used for getc_unlocked and putc_unlocked
 now, eliminating some ugly code duplication. __uflow and __overflow
 are no longer used anywhere but these macros. switch to read or
 write mode is also separated so the code can be better shared, e.g.
 with ungetc.
- lots of other small things.
fix all implicit conversion between signed/unsigned pointers 2011年03月25日T20:34:03+00:00 Rich Felker dalias@aerifal.cx 2011年03月25日T20:34:03+00:00 9ae8d5fc71a4b61ec826d58f03f7b543755fb1d4 sadly the C language does not specify any such implicit conversion, so this is not a matter of just fixing warnings (as gcc treats it) but actual errors. i would like to revisit a number of these changes and possibly revise the types used to reduce the number of casts required.
sadly the C language does not specify any such implicit conversion, so
this is not a matter of just fixing warnings (as gcc treats it) but
actual errors. i would like to revisit a number of these changes and
possibly revise the types used to reduce the number of casts required.
initial check-in, version 0.5.0 2011年02月12日T05:22:29+00:00 Rich Felker dalias@aerifal.cx 2011年02月12日T05:22:29+00:00 0b44a0315b47dd8eced9f3b7f31580cf14bbfc01

AltStyle によって変換されたページ (->オリジナル) /