commit 8cca79a72cccbdb54726125d690d7d0095fc2409 added use of SYS_pause to exit() without accounting for newer archs omitting the syscall. use the newly-added __sys_pause abstraction instead, which uses SYS_ppoll when SYS_pause is missing.
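the fallback described above can be sketched roughly as follows. this is only an illustration: pause_compat is a hypothetical stand-in for the internal __sys_pause helper, not its actual definition, and the sketch assumes an arch that provides at least one of SYS_pause or SYS_ppoll (time64-only archs would need SYS_ppoll_time64 instead). pause_demo is likewise a made-up driver that shows the call returning once a signal handler has run.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/* hypothetical stand-in for the internal __sys_pause helper */
static long pause_compat(void)
{
#ifdef SYS_pause
	return syscall(SYS_pause);
#else
	/* no fds, no timeout, no signal mask: sleeps until a signal handler runs */
	return syscall(SYS_ppoll, (void *)0, 0L, (void *)0, (void *)0, 0L);
#endif
}

static void on_alarm(int sig) { (void)sig; }

/* demo driver: returns 1 if pause_compat was interrupted with EINTR */
int pause_demo(void)
{
	struct sigaction sa;
	/* repeating 10ms timer, so a tick that fires early cannot strand us */
	struct itimerval tick = { { 0, 10000 }, { 0, 10000 } };
	struct itimerval off = { { 0, 0 }, { 0, 0 } };

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, 0) < 0) return -1;
	if (setitimer(ITIMER_REAL, &tick, 0) < 0) return -1;

	long r = pause_compat();
	int e = errno;
	setitimer(ITIMER_REAL, &off, 0);
	return r == -1 && e == EINTR;
}
```

in exit itself no handler is expected to wake the blocked threads for long; the caller simply loops on the pause call until the process terminates.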
previously, global dtors, which are executed after all atexit handlers have been called rather than being implemented as an atexit handler themselves, would deadlock if they called atexit. it was intentional to disallow adding more atexit handlers past the last point where they would be executed, since a successful return from atexit imposes a contract that the handler will be executed, but this was only considered in the context of calls to atexit from other threads, not calls from the dtors. to fix this, release the lock after the exit handlers loop completes, but set a flag first so that all future calls to atexit return a failure code.
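a minimal sketch of the flag-then-unlock idea, assuming a fixed-size handler table and a pthread mutex in place of the internal lock. all names here (reg_handler, run_handlers_and_close, finished, sample_handler) are hypothetical and chosen for illustration only.

```c
#include <pthread.h>

#define MAX_HANDLERS 32

static void (*handlers[MAX_HANDLERS])(void);
static int count;
static int finished; /* set once the handler loop has completed */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* hypothetical atexit-like registration: fails once handlers have run */
int reg_handler(void (*fn)(void))
{
	int ret = -1;
	pthread_mutex_lock(&lock);
	if (!finished && count < MAX_HANDLERS) {
		handlers[count++] = fn;
		ret = 0;
	}
	pthread_mutex_unlock(&lock);
	return ret;
}

/* run handlers in reverse order of registration, then refuse new ones */
void run_handlers_and_close(void)
{
	pthread_mutex_lock(&lock);
	while (count > 0) {
		void (*fn)(void) = handlers[--count];
		/* drop the lock so a handler may itself register handlers */
		pthread_mutex_unlock(&lock);
		fn();
		pthread_mutex_lock(&lock);
	}
	finished = 1;                 /* set the flag first ... */
	pthread_mutex_unlock(&lock);  /* ... then release the lock */
}

static int handler_runs;
static void sample_handler(void) { handler_runs++; }
```

because the lock is released at the end, a late caller such as a global dtor gets an immediate failure return instead of blocking on a lock that will never be released.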
per the C and POSIX standards, calling exit "more than once", including via return from main, produces undefined behavior. this language predates threads, and at the time it was written, could only have applied to recursive calls to exit via atexit handlers. C++ likewise makes calls to exit from global dtors undefined. nonetheless, by the present specification as written, concurrent calls to exit by multiple threads also have undefined behavior. originally, our implementation of exit did have locking to handle concurrent calls safely, but that was changed in commit 2e55da911896a91e95b24ab5dc8a9d9b0718f4de based on it being undefined. from a standpoint of both hardening and quality of implementation, that change seems to have been a mistake. this change adds back locking, but with awareness of the lock owner so that recursive calls to exit can be trapped rather than deadlocking. this also opens up the possibility of allowing recursive calls to succeed, if future consensus ends up being in favor of that. prior to this change, exit already behaved partly as if protected by a lock as long as atexit was linked, but multiple threads calling exit could concurrently "pop off" atexit handlers and execute them in parallel with one another rather than serialized in the reverse order of registration. this was a likely unnoticed but potentially very dangerous manifestation of the undefined behavior. if on the other hand atexit was not linked, multiple threads calling exit concurrently could each run their own instance of global dtors, if any, likely producing double-free situations. now, if multiple threads call exit concurrently, all but the first will permanently block (in SYS_pause) until the process terminates, and all atexit handlers, global dtors, and stdio flushing/position consistency will be handled in the thread that arrived first. this is really the only reasonable way to define concurrent calls to exit. 
it is not recommended usage, but may become so in the future if there is consensus/standardization, as there is a push from the rust language community (and potentially other languages interoperating with the C runtime) to make concurrent calls to the language's exit interfaces safe even when multiple languages are involved in a program, and this is only possible by having the locking in the underlying C exit.
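the owner-aware lock can be illustrated with a single compare-and-swap on a word holding the owner's thread id. this is a sketch under assumptions, using C11 atomics rather than the library's internal primitives; classify_exit_call and the enum are hypothetical names, and the caller is assumed to pass a nonzero thread id.

```c
#include <stdatomic.h>

enum exit_entry {
	EXIT_FIRST,      /* this thread won the lock and runs the exit sequence */
	EXIT_RECURSIVE,  /* the owner called exit again: trap rather than deadlock */
	EXIT_CONCURRENT  /* another thread owns the lock: block until termination */
};

static atomic_int exit_owner; /* 0 = unowned, otherwise tid of the owner */

/* hypothetical helper: decide what a call to exit from thread tid should do */
enum exit_entry classify_exit_call(int tid)
{
	int expected = 0;
	if (atomic_compare_exchange_strong(&exit_owner, &expected, tid))
		return EXIT_FIRST;
	/* on failure, expected holds the tid of the current owner */
	return expected == tid ? EXIT_RECURSIVE : EXIT_CONCURRENT;
}
```

in a real exit, EXIT_RECURSIVE would map to a crash and EXIT_CONCURRENT to an endless loop around the pause call; recording the owner is what makes it possible to tell the two cases apart.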
as the outcome of Austin Group tracker issue #62, future editions of POSIX have dropped the requirement that fork be AS-safe. this allows but does not require implementations to synchronize fork with internal locks and give forked children of multithreaded parents a partly or fully unrestricted execution environment where they can continue to use the standard library (per POSIX, they can only portably use AS-safe functions). up until recently, taking this allowance did not seem desirable. however, commit 8ed2bd8bfcb4ea6448afb55a941f4b5b2b0398c0 exposed the extent to which applications and libraries are depending on the ability to use malloc and other non-AS-safe interfaces in MT-forked children, by converting latent very-low-probability catastrophic state corruption into predictable deadlock. dealing with the fallout has been a huge burden for users/distros. while it looks like most of the non-portable usage in applications could be fixed given sufficient effort, at least some of it seems to occur in language runtimes which are exposing the ability to run unrestricted code in the child as part of the contract with the programmer. any attempt at fixing such contracts is not just a technical problem but a social one, and is probably not tractable. this patch extends the fork function to take locks for all libc singletons in the parent, and release or reset those locks in the child, so that when the underlying fork operation takes place, the state protected by these locks is consistent and ready for the child to use. locking is skipped in the case where the parent is single-threaded so as not to interfere with the legacy AS-safety property of fork in single-threaded programs. lock order is mostly arbitrary, but the malloc locks (including the bump allocator's, in case it's used) must be taken after the locks on any subsystems that might use malloc, and non-AS-safe locks cannot be taken while the thread list lock is held, imposing a requirement that it be taken last.
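the locking scheme can be sketched with two ordinary mutexes standing in for the libc singletons. this is an illustration only: locked_fork, subsystem_lock, alloc_lock and the multithreaded flag are hypothetical, the thread list lock is omitted, and a real implementation may reset rather than unlock the child's copies of the locks.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* stand-ins for a subsystem that may call malloc, and for malloc itself */
static pthread_mutex_t subsystem_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static int multithreaded = 1; /* assumed to be tracked by the library */

pid_t locked_fork(void)
{
	int need_locks = multithreaded; /* skip locking when single-threaded */
	if (need_locks) {
		/* allocator lock goes after any subsystem that might allocate */
		pthread_mutex_lock(&subsystem_lock);
		pthread_mutex_lock(&alloc_lock);
	}
	pid_t pid = fork();
	if (need_locks) {
		/* runs in both parent and child: state was consistent at fork */
		pthread_mutex_unlock(&alloc_lock);
		pthread_mutex_unlock(&subsystem_lock);
	}
	return pid;
}

/* demo driver: the child takes the allocator lock again and exits 0 */
int fork_demo(void)
{
	pid_t pid = locked_fork();
	if (pid < 0) return -1;
	if (pid == 0) {
		pthread_mutex_lock(&alloc_lock);
		pthread_mutex_unlock(&alloc_lock);
		_exit(0);
	}
	int status;
	if (waitpid(pid, &status, 0) < 0) return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

without the locking in the parent, another thread could hold alloc_lock at the moment of the fork, and the child's attempt to take it would block forever.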
this change lifts undocumented restrictions on calls by replacement mallocs to libc functions that might take these locks, and sets the stage for lifting restrictions on the child execution environment after multithreaded fork. care is taken to #define macros to replace all four functions (malloc, calloc, realloc, free) even if not all of them will be used, using an undefined symbol name for the ones intended not to be used so that any inadvertent future use will be caught at compile time rather than directed to the wrong implementation.
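the macro technique might look roughly like this. the names internal_malloc, internal_free and the undefined_* symbols are hypothetical, and the tiny bump allocator exists only to make the sketch self-contained; the point is that all four names are redirected, with the unused ones pointing at symbols that are never declared or defined.

```c
#include <stddef.h>

/* toy bump allocator, present only so the sketch stands alone */
static _Alignas(16) unsigned char pool[4096];
static size_t used;

static void *internal_malloc(size_t n)
{
	if (n > sizeof pool - used) return 0;
	n = (n + 15) & ~(size_t)15; /* keep allocations 16-byte aligned */
	void *p = pool + used;
	used += n;
	return p;
}

static void internal_free(void *p) { (void)p; /* bump allocator: no-op */ }

/* redirect all four names, including the ones this file must not use */
#define malloc  internal_malloc
#define free    internal_free
#define calloc  undefined_internal_calloc   /* never declared: use is an error */
#define realloc undefined_internal_realloc  /* never declared: use is an error */

/* ordinary-looking code below now reaches the internal allocator */
char *dup_bytes(const char *s, size_t n)
{
	char *p = malloc(n);
	if (!p) return 0;
	for (size_t i = 0; i < n; i++) p[i] = s[i];
	return p;
}

void release_bytes(char *p) { free(p); }
```

if someone later adds a calloc or realloc call to such a file, the build fails on the undeclared symbol instead of silently binding to the public (possibly replaced) allocator.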
assert is not specified to flush open stdio streams, and doing so can block indefinitely waiting for a lock already held or an output operation to a file that can't accept more output until an unsatisfiable condition is met.
the dummy definition of __abort_lock in sigaction.c served exactly the purpose that putting the lock in its own source file could and should have achieved. while we're moving it, give it a proper declaration.
this further reduces the number of source files which need to include libc.h and thereby be potentially exposed to libc global state and internals. this will also facilitate further improvements like adding an inline fast-path, if we want to do so later.
this cleans up what had become widespread inline use of "GNU C" style attributes directly in the source, and lowers the barrier to increased use of hidden visibility, which will be useful for recovering some of the efficiency lost when the protected visibility hack was dropped in commit dc2f368e565c37728b0d620380b849c3a1ddd78f, especially on archs where the PLT ABI is costly.
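as an illustration of wrapping the attribute in a macro, a sketch might look like the following. the exact macro name and the header it lives in are assumptions here, the function names are made up, and the attribute syntax requires GCC or a compatible compiler.

```c
/* assumed macro: hides the GNU C attribute spelling behind one word */
#define hidden __attribute__((__visibility__("hidden")))

/* internal helper: not exported from the shared library, so calls to it
 * can be bound directly rather than going through the PLT */
hidden int internal_helper(int x);

/* public entry point with default visibility */
int public_api(int x)
{
	return internal_helper(x) + 1;
}

hidden int internal_helper(int x)
{
	return 2 * x;
}
```

with the attribute on the declaration, every translation unit that sees it can treat the symbol as local to the library, which is where the saving on PLT-heavy archs comes from.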
Linux makes this surprisingly difficult, but it can be done. the trick here is using the fact that we control the implementation of sigaction to prevent changing the disposition of SIGABRT to anything but SIG_DFL after abort has tried and failed to terminate the process simply by calling raise(SIGABRT).