musl/src/thread/synccall.c, branch master musl - an implementation of the standard library for Linux-based systems synccall: add separate exit_sem to fix thread release logic bug 2023年11月06日T18:05:24+00:00 Markus Wichmann nullplan@gmx.net 2023年10月31日T16:03:44+00:00 7f3a2925369c00abc33457400e33632e6dacb8ae The code intends for the sem_post() in line 97 (now 98) to only unblock target threads waiting on line 29. But after the first thread is released, the next sem_post() might also unblock a thread waiting on line 36. That would cause the thread to return to the execution of user code before all threads are done, leading to user code being executed in a mixed-credentials environment. What's more, if this happens more than once, then the mass release on line 110 (now line 111) will cause multiple threads to execute the callback at the same time, and the callbacks are currently not written to cope with that situation. Adding another semaphore allows the caller to say explicitly which threads it wants to release.
The code intends for the sem_post() in line 97 (now 98) to only unblock
target threads waiting on line 29. But after the first thread is
released, the next sem_post() might also unblock a thread waiting on
line 36. That would cause the thread to return to the execution of user
code before all threads are done, leading to user code being executed in
a mixed-credentials environment.
What's more, if this happens more than once, then the mass release on
line 110 (now line 111) will cause multiple threads to execute the
callback at the same time, and the callbacks are currently not written
to cope with that situation.
Adding another semaphore allows the caller to say explicitly which
threads it wants to release.
use alt signal stack when present for implementation-internal signals 2022年08月20日T16:24:49+00:00 Rich Felker dalias@aerifal.cx 2022年08月20日T16:24:49+00:00 2e5fff43dd7fc808197744c67cca7908ac19bb4f a request for this behavior has been open for a long time. the motivation is that application code, particularly under some language runtimes designed around very-low-footprint coroutine type constructs, may be operating with extremely small stack sizes unsuitable for receiving signals, using a separate signal stack for any signals it might handle. progress on this was blocked at one point trying to determine whether the implementation is actually entitled to clobber the alt stack, but the phrasing "available to the implementation" in the POSIX spec for sigaltstack seems to make it clear that the application cannot rely on the contents of this memory to be preserved in the absence of signal delivery (on the abstract machine, excluding implementation-internal signals) and that we can therefore use it for delivery of signals that "don't exist" on the abstract machine. no change is made for SIGTIMER since it is always blocked when used, and accepted via sigwaitinfo rather than execution of the signal handler.
a request for this behavior has been open for a long time. the
motivation is that application code, particularly under some language
runtimes designed around very-low-footprint coroutine type constructs,
may be operating with extremely small stack sizes unsuitable for
receiving signals, using a separate signal stack for any signals it
might handle.
progress on this was blocked at one point trying to determine whether
the implementation is actually entitled to clobber the alt stack, but
the phrasing "available to the implementation" in the POSIX spec for
sigaltstack seems to make it clear that the application cannot rely on
the contents of this memory to be preserved in the absence of signal
delivery (on the abstract machine, excluding implementation-internal
signals) and that we can therefore use it for delivery of signals that
"don't exist" on the abstract machine.
no change is made for SIGTIMER since it is always blocked when used,
and accepted via sigwaitinfo rather than execution of the signal
handler.
avoid set*id/setrlimit misbehavior and hang in vforked/cloned child 2020年09月18日T03:58:01+00:00 Rich Felker dalias@aerifal.cx 2020年09月17日T19:09:46+00:00 a5aff1972c9e3981566414b09a28e331ccd2be5d taking the deprecated/dropped vfork spec strictly, doing pretty much anything but execve in the child is wrong and undefined. however, these are commonly needed operations to setup the child state before exec, and historical implementations tolerated them. for single-threaded parents, these operations already worked as expected in the vforked child. however, due to the need for __synccall to synchronize id/resource limit changes among all threads, calling these functions in the vforked child of a multithreaded parent caused a misdirected broadcast signaling of all threads in the parent. these signals could kill the parent entirely if the synccall signal handler had never been installed in the parent, or could be ignored if it had, or could signal/kill one or more utterly wrong processes if the parent already terminated (due to vfork semantics, only possible via fatal signal) and the parent tids were recycled. in any case, the expected number of semaphore posts would never happen, so the child would permanently hang (with all signals blocked) waiting for them. to mitigate this, and also make the normal usage case work as intended, treat the condition where the caller's actual tid does not match the tid in its thread structure as single-threaded, and bypass the entire synccall broadcast operation.
taking the deprecated/dropped vfork spec strictly, doing pretty much
anything but execve in the child is wrong and undefined. however,
these are commonly needed operations to setup the child state before
exec, and historical implementations tolerated them.
for single-threaded parents, these operations already worked as
expected in the vforked child. however, due to the need for __synccall
to synchronize id/resource limit changes among all threads, calling
these functions in the vforked child of a multithreaded parent caused
a misdirected broadcast signaling of all threads in the parent. these
signals could kill the parent entirely if the synccall signal handler
had never been installed in the parent, or could be ignored if it had,
or could signal/kill one or more utterly wrong processes if the parent
already terminated (due to vfork semantics, only possible via fatal
signal) and the parent tids were recycled. in any case, the expected
number of semaphore posts would never happen, so the child would
permanently hang (with all signals blocked) waiting for them.
to mitigate this, and also make the normal usage case work as
intended, treat the condition where the caller's actual tid does not
match the tid in its thread structure as single-threaded, and bypass
the entire synccall broadcast operation.
rewrite __synccall in terms of global thread list 2019年02月16日T15:11:22+00:00 Rich Felker dalias@aerifal.cx 2019年02月16日T14:13:45+00:00 e4235d70672d9751d7718ddc2b52d0b426430768 the __synccall mechanism provides stop-the-world synchronous execution of a callback in all threads of the process. it is used to implement multi-threaded setuid/setgid operations, since Linux lacks them at the kernel level, and for some other less-critical purposes. this change eliminates dependency on /proc/self/task to determine the set of live threads, which in addition to being an unwanted dependency and a potential point of resource-exhaustion failure, turned out to be inaccurate. test cases provided by Alexey Izbyshev showed that it could fail to reflect newly created threads. due to how the presignaling phase worked, this usually yielded a deadlock if hit, but in the worst case it could also result in threads being silently missed (allowed to continue running without executing the callback).
the __synccall mechanism provides stop-the-world synchronous execution
of a callback in all threads of the process. it is used to implement
multi-threaded setuid/setgid operations, since Linux lacks them at the
kernel level, and for some other less-critical purposes.
this change eliminates dependency on /proc/self/task to determine the
set of live threads, which in addition to being an unwanted dependency
and a potential point of resource-exhaustion failure, turned out to be
inaccurate. test cases provided by Alexey Izbyshev showed that it
could fail to reflect newly created threads. due to how the
presignaling phase worked, this usually yielded a deadlock if hit, but
in the worst case it could also result in threads being silently
missed (allowed to continue running without executing the callback).
split internal lock API out of libc.h, creating lock.h 2018年09月12日T22:40:35+00:00 Rich Felker dalias@aerifal.cx 2018年09月12日T14:19:54+00:00 5f12ffe1239a5e4f8d4e98e2dff4e191a71f4693 this further reduces the number of source files which need to include libc.h and thereby be potentially exposed to libc global state and internals. this will also facilitate further improvements like adding an inline fast-path, if we want to do so later.
this further reduces the number of source files which need to include
libc.h and thereby be potentially exposed to libc global state and
internals.
this will also facilitate further improvements like adding an inline
fast-path, if we want to do so later.
revise the definition of multiple basic locks in the code 2018年01月09日T18:15:27+00:00 Jens Gustedt Jens.Gustedt@inria.fr 2018年01月03日T13:17:12+00:00 32482f61da7650ff10741bd5aedd66bbc3ea165b In all cases this is just a change from two volatile int to one.
In all cases this is just a change from two volatile int to one.
fix spurious EINTR errors from multithreaded set*id, etc. 2017年01月19日T16:45:01+00:00 Rich Felker dalias@aerifal.cx 2017年01月19日T16:45:01+00:00 6894f8472614e22c76820b6469d2551d17e024ed commit 78a8ef47c4d92b7680c52a85f80a81e29da86bb9 inadvertently removed the SA_RESTART flag from the sigaction for the internal signal handler used by __synccall for broadcasting. as a result, programs which did not use interrupting signals but which used set*id() in a multithreaded context could wrongly observe EINTR errors they're not prepared to handle.
commit 78a8ef47c4d92b7680c52a85f80a81e29da86bb9 inadvertently removed
the SA_RESTART flag from the sigaction for the internal signal handler
used by __synccall for broadcasting. as a result, programs which did
not use interrupting signals but which used set*id() in a
multithreaded context could wrongly observe EINTR errors they're not
prepared to handle.
make all objects used with atomic operations volatile 2015年03月04日T03:50:02+00:00 Rich Felker dalias@aerifal.cx 2015年03月04日T03:50:02+00:00 56fbaa3bbe73f12af2bfbbcf2adb196e6f9fe264 the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,*p,f(*p)); where the latter is intended to show multiple loads of *p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
overhaul __synccall and fix AS-safety and other issues in set*id 2015年01月16日T04:17:38+00:00 Rich Felker dalias@aerifal.cx 2015年01月16日T04:17:38+00:00 78a8ef47c4d92b7680c52a85f80a81e29da86bb9 multi-threaded set*id and setrlimit use the internal __synccall function to work around the kernel's wrongful treatment of these process properties as thread-local. the old implementation of __synccall failed to be AS-safe, despite POSIX requiring setuid and setgid to be AS-safe, and was not rigorous in assuring that all threads were caught. in a worst case, threads late in the process of exiting could retain permissions after setuid reported success, in which case attacks to regain dropped permissions may have been possible under the right conditions. the new implementation of __synccall depends on the presence of /proc/self/task and will fail if it can't be opened, but is able to determine that it has caught all threads, and does not use any locks except its own. it thereby achieves AS-safety simply by blocking signals to preclude re-entry in the same thread. with this commit, all known conformance and safety issues in set*id functions should be fixed.
multi-threaded set*id and setrlimit use the internal __synccall
function to work around the kernel's wrongful treatment of these
process properties as thread-local. the old implementation of
__synccall failed to be AS-safe, despite POSIX requiring setuid and
setgid to be AS-safe, and was not rigorous in assuring that all
threads were caught. in a worst case, threads late in the process of
exiting could retain permissions after setuid reported success, in
which case attacks to regain dropped permissions may have been
possible under the right conditions.
the new implementation of __synccall depends on the presence of
/proc/self/task and will fail if it can't be opened, but is able to
determine that it has caught all threads, and does not use any locks
except its own. it thereby achieves AS-safety simply by blocking
signals to preclude re-entry in the same thread.
with this commit, all known conformance and safety issues in set*id
functions should be fixed.
eliminate use of cached pid from thread structure 2014年07月06日T03:29:55+00:00 Rich Felker dalias@aerifal.cx 2014年07月06日T03:29:55+00:00 83dc6eb087633abcf5608ad651d3b525ca2ec35e the main motivation for this change is to remove the assumption that the tid of the main thread is also the pid of the process. (the value returned by the set_tid_address syscall was used to fill both fields despite it semantically being the tid.) this is historically and presently true on linux and unlikely to change, but it conceivably could be false on other systems that otherwise reproduce the linux syscall api/abi. only a few parts of the code were actually still using the cached pid. in a couple places (aio and synccall) it was a minor optimization to avoid a syscall. caching could be reintroduced, but lazily as part of the public getpid function rather than at program startup, if it's deemed important for performance later. in other places (cancellation and pthread_kill) the pid was completely unnecessary; the tkill syscall can be used instead of tgkill. this is actually a rather subtle issue, since tgkill is supposedly a solution to race conditions that can affect use of tkill. however, as documented in the commit message for commit 7779dbd2663269b465951189b4f43e70839bc073, tgkill does not actually solve this race; it just limits it to happening within one process rather than between processes. we use a lock that avoids the race in pthread_kill, and the use in the cancellation signal handler is self-targeted and thus not subject to tid reuse races, so both are safe regardless of which syscall (tgkill or tkill) is used.
the main motivation for this change is to remove the assumption that
the tid of the main thread is also the pid of the process. (the value
returned by the set_tid_address syscall was used to fill both fields
despite it semantically being the tid.) this is historically and
presently true on linux and unlikely to change, but it conceivably
could be false on other systems that otherwise reproduce the linux
syscall api/abi.
only a few parts of the code were actually still using the cached pid.
in a couple places (aio and synccall) it was a minor optimization to
avoid a syscall. caching could be reintroduced, but lazily as part of
the public getpid function rather than at program startup, if it's
deemed important for performance later. in other places (cancellation
and pthread_kill) the pid was completely unnecessary; the tkill
syscall can be used instead of tgkill. this is actually a rather
subtle issue, since tgkill is supposedly a solution to race conditions
that can affect use of tkill. however, as documented in the commit
message for commit 7779dbd2663269b465951189b4f43e70839bc073, tgkill
does not actually solve this race; it just limits it to happening
within one process rather than between processes. we use a lock that
avoids the race in pthread_kill, and the use in the cancellation
signal handler is self-targeted and thus not subject to tid reuse
races, so both are safe regardless of which syscall (tgkill or tkill)
is used.

AltStyle によって変換されたページ (->オリジナル) /