musl/src/internal/shgetc.c, branch master
musl - an implementation of the standard library for Linux-based systems

fix possible access to uninitialized memory in shgetc (via scanf)
2020-04-17T19:55:17+00:00 Rich Felker dalias@aerifal.cx 2020-04-17T19:31:16+00:00 086542fb5bc88f590547147365630b9a44df223b
shgetc sets up to be able to perform an "unget" operation without the
caller having to remember and pass back the character value, and for
this purpose used a conditional store idiom:
 if (f->rpos[-1] != c) f->rpos[-1] = c
to make it safe to use with non-writable buffers (set up by the
sh_fromstring macro or __string_read with sscanf).
however, validity of this depends on the buffer space at rpos[-1]
being initialized, which is not the case under some conditions
(including at least unbuffered files and fmemopen ones).
whenever data was read "through the buffer", the desired character
value is already in place and does not need to be written. thus,
rather than testing for the absence of the value, we can test for
rpos<=buf, indicating that the last character read could not have come
from the buffer, and thereby that we have a "real" buffer (possibly of
zero length) with writable pushback (UNGET bytes) below it.
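
The replacement test can be sketched as follows. This is a minimal illustration, not the code from the commit: the struct, the function name and the `SKETCH_UNGET` size are assumptions made for the example, and only the `rpos <= buf` condition is taken from the message above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_UNGET 8  /* assumed size of the pushback area below buf */

/* Hypothetical, stripped-down stand-in for the two FILE fields involved. */
struct sketch_file {
	unsigned char *buf;   /* start of the buffer proper */
	unsigned char *rpos;  /* next byte to be read */
};

/* Record c so that a later rpos-- acts as an unget.
 * Returns 1 if a store was performed, 0 if none was needed. */
static int sketch_record(struct sketch_file *f, int c)
{
	assert(c >= 0 && c <= 255);
	/* rpos <= buf: the byte cannot have come out of the buffer, so
	 * rpos[-1] lies in the writable pushback area. The old byte there
	 * is never read, so it may be uninitialized. */
	if (f->rpos <= f->buf) {
		f->rpos[-1] = (unsigned char)c;
		return 1;
	}
	/* rpos > buf: the byte was read through the buffer and is already
	 * at rpos[-1]; nothing is written, so read-only buffers are safe. */
	return 0;
}
```

With a zero-length buffer placed above an uninitialized pushback area, the store happens without the old byte being inspected. With a string literal as the buffer and `rpos` already advanced, no write is attempted.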

fix crash/out-of-bound read in sscanf
2019-03-15T00:52:18+00:00 Rich Felker dalias@aerifal.cx 2019-03-15T00:52:18+00:00 8f12c4e110acb3bbbdc8abfb3a552c3ced718039
commit d6c855caa88ddb1ab6e24e23a14b1e7baf4ba9c7 caused this
"regression", though the behavior was undefined before, overlooking
that f->shend=0 was being used as a sentinel for "EOF" status (actual
EOF or hitting the scanf field width) of the stream helper (shgetc)
functions.
obviously the shgetc macro could be adjusted to check for a null
pointer in addition to the != comparison, but it's the hot path, and
adding extra code/branches to it begins to defeat the purpose.
so instead of setting shend to a null pointer to block further reads,
which no longer works, set it to the current position (rpos). this
makes the shgetc macro work with no change, but it breaks shunget,
which can no longer look at the value of shend to determine whether to
back up. Szabolcs Nagy suggested a solution which I'm using here:
setting shlim to a negative value is inexpensive to test at shunget
time, and automatically re-trips the cnt>=shlim stop condition in
__shgetc no matter what the original limit was.
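
A rough sketch of the sentinel scheme described above, under stated assumptions: the struct and the `sketch_*` names are hypothetical, `cnt` is kept as a plain field for simplicity, and only the ideas (shend set to rpos, shlim set negative, the `cnt>=shlim` stop test) come from the message.

```c
#include <assert.h>

/* Hypothetical subset of the scan-helper state. */
struct sketch_scan {
	const unsigned char *rpos;   /* current read position */
	const unsigned char *shend;  /* fast-path reads stop here */
	long shlim;                  /* 0 = no limit, >0 = limit, <0 = stopped */
	long cnt;                    /* bytes consumed so far */
};

/* Enter the "EOF" state without using a null shend. */
static void sketch_stop(struct sketch_scan *f)
{
	f->shend = f->rpos;  /* rpos != shend is now false: no fast-path reads */
	f->shlim = -1;       /* marks the stopped state for unget */
}

/* Back up one byte unless the last get hit the stop state. */
static void sketch_unget(struct sketch_scan *f)
{
	if (f->shlim >= 0) f->rpos--;
}

/* Slow-path stop test: a negative limit always trips it. */
static int sketch_must_stop(const struct sketch_scan *f)
{
	assert(f->cnt >= 0);
	return f->shlim && f->cnt >= f->shlim;
}
```

Because any non-negative count is greater than or equal to -1, the stop test keeps firing after `sketch_stop` regardless of what the original limit was, and the unget check costs a single sign test.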

fix undefined behavior in strto* via FILE buffer pointer abuse
2018-09-15T06:48:25+00:00 Rich Felker dalias@aerifal.cx 2018-09-15T06:33:08+00:00 d6c855caa88ddb1ab6e24e23a14b1e7baf4ba9c7
in order to produce FILE objects to pass to the intscan/floatscan
backends without any (prohibitively costly) extra buffering layer, the
strto* functions set the FILE's rend (read end) buffer pointer to an
invalid value at the end of the address space, or SIZE_MAX/2 past the
beginning of the string. this led to undefined behavior comparing and
subtracting the end pointer with the buffer position pointer (rpos).
the comparison issue is easily eliminated by using != instead of <.
however the subtractions require nontrivial changes:
previously, f->shcnt stored the count that would have been read if
consuming the whole buffer, which required an end pointer for the
buffer. the purpose for this was that it allowed reading it and adding
rpos-rend at any time to get the actual count so far, and required no
adjustment at the time of __shgetc (actual function call) since the
call would only happen when reaching the end of the buffer.
to get rid of the dependency on rend, instead offset shcnt by buf-rpos
(start of buffer) at the time of last __shlim/__shgetc call. this
makes for slightly more work in __shgetc the function, but for the
inline macro it's still just as easy to compute the current count.
since the scan helper interfaces used here are a big hack, comments
are added to document their contracts and what's going on with their
implementations.
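
The rebased counter can be illustrated like this. It is a sketch only: the struct and function names are assumptions, and the arithmetic follows the description above (shcnt offset by buf-rpos, current count recovered from rpos and buf, no end pointer involved).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the fields used for counting. */
struct sketch_cnt {
	const unsigned char *buf;   /* start of the buffer */
	const unsigned char *rpos;  /* current read position */
	long shcnt;                 /* total so far, offset by buf - rpos */
};

/* Called whenever the true total is known (limit setup, slow-path get):
 * store it relative to the buffer start instead of the buffer end. */
static void sketch_rebase(struct sketch_cnt *f, long total)
{
	assert(total >= 0);
	f->shcnt = (long)(f->buf - f->rpos) + total;
}

/* Current count at any time; needs only buf and rpos, both valid pointers
 * into the same object, so the subtraction is well defined. */
static long sketch_count(const struct sketch_cnt *f)
{
	return f->shcnt + (long)(f->rpos - f->buf);
}
```

For instance, rebasing at a total of 10 with `rpos` three bytes into the buffer stores 7, and each later fast-path read raises the computed count by one without touching `shcnt`.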

fix major scanf breakage with unbuffered streams, fmemopen, etc.
2013-06-22T21:11:17+00:00 Rich Felker dalias@aerifal.cx 2013-06-22T21:11:17+00:00 c20804500deebaabc56f383d48dd1ac77dce8349
the shgetc api, used internally in scanf and int/float scanning code
to handle field width limiting and pushback, was designed assuming
that pushback could be achieved via a simple decrement on the file
buffer pointer. this only worked by chance for regular FILE streams,
due to the linux readv bug workaround in __stdio_read which moves the
last requested byte through the buffer rather than directly back to
the caller. for unbuffered streams and streams not using __stdio_read
but some other underlying read function, the first character read
could be completely lost, and replaced by whatever junk happened to be
in the unget buffer.
to fix this, simply have shgetc, when it performs an underlying read
operation on the stream, store the character read at the -1 offset
from the read buffer pointer. this is valid even for unbuffered
streams, as they have an unget buffer located just below the start of
the zero-length buffer. the check to avoid storing the character when
it is already there is to handle the possibility of read-only buffers.
no application-exposed FILE types are allowed to use read-only
buffers, but sscanf and strto* may use them internally when calling
functions which use the shgetc api.
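
As an illustration of why the store makes pushback-by-decrement work on an unbuffered stream, here is a small sketch. The layout and all names are assumptions made for the example. The pushback bytes are zero-filled first because the conditional store reads the old byte before deciding whether to write.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_UNGET 8  /* assumed size of the pushback area */

/* Hypothetical unbuffered stream: a zero-length buffer whose start
 * sits directly above SKETCH_UNGET writable pushback bytes. */
struct sketch_unbuf {
	unsigned char unget[SKETCH_UNGET];
	unsigned char *buf;
	unsigned char *rpos;
};

static void sketch_unbuf_init(struct sketch_unbuf *f)
{
	memset(f->unget, 0, sizeof f->unget);  /* the check below reads this */
	f->buf = f->rpos = f->unget + SKETCH_UNGET;
}

/* c was handed straight back by the underlying read and never passed
 * through the buffer; park it at rpos[-1] unless it is already there. */
static int sketch_unbuf_deliver(struct sketch_unbuf *f, int c)
{
	assert(c >= 0 && c <= 255);
	if (f->rpos[-1] != c) f->rpos[-1] = (unsigned char)c;
	return c;
}

/* Pushback is then a plain pointer decrement. */
static void sketch_unbuf_unget(struct sketch_unbuf *f)
{
	f->rpos--;
}
```

After delivering a byte and calling `sketch_unbuf_unget`, `*rpos` is the delivered byte rather than whatever happened to be in the pushback area.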

fix buggy limiter handling in shgetc
2012-04-16T19:36:18+00:00 Rich Felker dalias@aerifal.cx 2012-04-16T19:36:18+00:00 cc762434d91a2f441a1d2f44962ab1d4854b607b
this is needed for upcoming new scanf

fix broken shgetc limiter logic (wasn't working)
2012-04-16T05:55:37+00:00 Rich Felker dalias@aerifal.cx 2012-04-16T05:55:37+00:00 f007bb854b0b2d2d12cd45a8feb674fa9abe70b2

fix incorrect initial count in shgetc when data is already buffered
2012-04-11T04:26:41+00:00 Rich Felker dalias@aerifal.cx 2012-04-11T04:26:41+00:00 7ef1a9bba56aa756d8166c4c93cf4a178d6c0c0c

add "scan helper getc" and rework strtod, etc. to use it
2012-04-11T01:47:37+00:00 Rich Felker dalias@aerifal.cx 2012-04-11T01:47:37+00:00 2162541f38d3f642f5a643010548d62220d55a4d
the immediate benefit is a significant debloating of the float parsing
code by moving the responsibility for keeping track of the number of
characters read to a different module.
by linking shgetc with the stdio buffer logic, counting logic is
deferred to buffer refill time, keeping the calls to shgetc fast and
light.
in the future, shgetc will also be useful for integrating the new
float code with scanf, which needs to not only count the characters
consumed, but also limit the number of characters read based on field
width specifiers.
shgetc may also become a useful tool for simplifying the integer
parsing code.
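
The idea of deferring the counting to refill time can be sketched with a toy reader. Everything here is hypothetical (the struct, the 4-byte chunk, the in-memory source and the `sketch_*` names); it is meant only to show a per-byte fast path that touches no counter, not how musl implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader over an in-memory source, with a tiny buffer so
 * that refills happen often. */
struct sketch_reader {
	const char *src;          /* input not yet copied into chunk */
	size_t src_len;
	unsigned char chunk[4];
	unsigned char *rpos, *rend;
	long before;              /* bytes consumed from earlier chunks */
};

static void sketch_open(struct sketch_reader *r, const char *s, size_t n)
{
	r->src = s;
	r->src_len = n;
	r->rpos = r->rend = r->chunk;
	r->before = 0;
}

/* Slow path: the only place where the running count is updated. */
static int sketch_refill(struct sketch_reader *r)
{
	size_t n = r->src_len < sizeof r->chunk ? r->src_len : sizeof r->chunk;
	r->before += (long)(r->rpos - r->chunk);
	r->rpos = r->rend = r->chunk;
	if (!n) return -1;  /* end of input */
	memcpy(r->chunk, r->src, n);
	r->src += n;
	r->src_len -= n;
	r->rend = r->chunk + n;
	return *r->rpos++;
}

/* Fast path: one comparison and one increment, no counting. */
#define sketch_getc(r) \
	((r)->rpos != (r)->rend ? *(r)->rpos++ : sketch_refill(r))

/* The count is reconstructed on demand from the buffer position. */
static long sketch_consumed(const struct sketch_reader *r)
{
	return r->before + (long)(r->rpos - r->chunk);
}
```

A parser built on `sketch_getc` never increments a counter itself. It asks `sketch_consumed` once at the end, which is the kind of bookkeeping the message describes moving out of the float parsing code.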
