musl/src/malloc/oldmalloc, branch master musl - an implementation of the standard library for Linux-based systems oldmalloc: preserve errno across free 2021年01月30日T22:28:08+00:00 Rich Felker dalias@aerifal.cx 2021年01月30日T22:28:08+00:00 9b77aaca86b53c367f23505c24dd3c02e240efad as an outcome of Austin Group issue #385, future versions of the standard will require free not to alter the value of errno. save and restore it individually around the calls to madvise and munmap so that the cost is not imposed on calls to free that do not result in any syscall.
as an outcome of Austin Group issue #385, future versions of the
standard will require free not to alter the value of errno. save and
restore it individually around the calls to madvise and munmap so that
the cost is not imposed on calls to free that do not result in any
syscall.
fix build regression in oldmalloc 2021年01月30日T22:26:34+00:00 Rich Felker dalias@aerifal.cx 2021年01月30日T22:26:34+00:00 98b9df994c85dcb6a8a5a9099495dd44c7cf2bce commit 8d37958d58cf36f53d5fcc7a8aa6d633da6071b2 inadvertently broke oldmalloc by having it implement __libc_malloc rather than __libc_malloc_impl.
commit 8d37958d58cf36f53d5fcc7a8aa6d633da6071b2 inadvertently broke
oldmalloc by having it implement __libc_malloc rather than
__libc_malloc_impl.
lift child restrictions after multi-threaded fork 2020年11月11日T20:55:30+00:00 Rich Felker dalias@aerifal.cx 2020年11月11日T18:37:33+00:00 167390f05564e0a4d3fcb4329377fd7743267560 as the outcome of Austin Group tracker issue #62, future editions of POSIX have dropped the requirement that fork be AS-safe. this allows but does not require implementations to synchronize fork with internal locks and give forked children of multithreaded parents a partly or fully unrestricted execution environment where they can continue to use the standard library (per POSIX, they can only portably use AS-safe functions). up until recently, taking this allowance did not seem desirable. however, commit 8ed2bd8bfcb4ea6448afb55a941f4b5b2b0398c0 exposed the extent to which applications and libraries are depending on the ability to use malloc and other non-AS-safe interfaces in MT-forked children, by converting latent very-low-probability catastrophic state corruption into predictable deadlock. dealing with the fallout has been a huge burden for users/distros. while it looks like most of the non-portable usage in applications could be fixed given sufficient effort, at least some of it seems to occur in language runtimes which are exposing the ability to run unrestricted code in the child as part of the contract with the programmer. any attempt at fixing such contracts is not just a technical problem but a social one, and is probably not tractable. this patch extends the fork function to take locks for all libc singletons in the parent, and release or reset those locks in the child, so that when the underlying fork operation takes place, the state protected by these locks is consistent and ready for the child to use. locking is skipped in the case where the parent is single-threaded so as not to interfere with legacy AS-safety property of fork in single-threaded programs. lock order is mostly arbitrary, but the malloc locks (including bump allocator in case it's used) must be taken after the locks on any subsystems that might use malloc, and non-AS-safe locks cannot be taken while the thread list lock is held, imposing a requirement that it be taken last.
as the outcome of Austin Group tracker issue #62, future editions of
POSIX have dropped the requirement that fork be AS-safe. this allows
but does not require implementations to synchronize fork with internal
locks and give forked children of multithreaded parents a partly or
fully unrestricted execution environment where they can continue to
use the standard library (per POSIX, they can only portably use
AS-safe functions).
up until recently, taking this allowance did not seem desirable.
however, commit 8ed2bd8bfcb4ea6448afb55a941f4b5b2b0398c0 exposed the
extent to which applications and libraries are depending on the
ability to use malloc and other non-AS-safe interfaces in MT-forked
children, by converting latent very-low-probability catastrophic state
corruption into predictable deadlock. dealing with the fallout has
been a huge burden for users/distros.
while it looks like most of the non-portable usage in applications
could be fixed given sufficient effort, at least some of it seems to
occur in language runtimes which are exposing the ability to run
unrestricted code in the child as part of the contract with the
programmer. any attempt at fixing such contracts is not just a
technical problem but a social one, and is probably not tractable.
this patch extends the fork function to take locks for all libc
singletons in the parent, and release or reset those locks in the
child, so that when the underlying fork operation takes place, the
state protected by these locks is consistent and ready for the child
to use. locking is skipped in the case where the parent is
single-threaded so as not to interfere with legacy AS-safety property
of fork in single-threaded programs. lock order is mostly arbitrary,
but the malloc locks (including bump allocator in case it's used) must
be taken after the locks on any subsystems that might use malloc, and
non-AS-safe locks cannot be taken while the thread list lock is held,
imposing a requirement that it be taken last.
give libc access to its own malloc even if public malloc is interposed 2020年11月11日T16:38:21+00:00 Rich Felker dalias@aerifal.cx 2020年11月11日T05:22:34+00:00 8d37958d58cf36f53d5fcc7a8aa6d633da6071b2 allowing the application to replace malloc (since commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732) has brought multiple headaches where it's used from various critical sections in libc components. for example: - the thread-local message buffers allocated for dlerror can't be freed at thread exit time because application code would then run in the context of a non-existant thread. this was handled in commit aa5a9d15e09851f7b4a1668e9dbde0f6234abada by queuing them for free later. - the dynamic linker has to be careful not to pass memory allocated at early startup time (necessarily using its own malloc) to realloc or free after redoing relocations with the application and all libraries present. bugs in this area were fixed several times, at least in commits 0c5c8f5da6e36fe4ab704bee0cd981837859e23f and 2f1f51ae7b2d78247568e7fdb8462f3c19e469a4 and possibly others. - by calling the allocator from contexts where libc-internal locks are held, we impose undocumented requirements on alternate malloc implementations not to call into any libc function that might attempt to take these locks; if they do, deadlock results. - work to make fork of a multithreaded parent give the child an unrestricted execution environment is blocked by lock order issues as long as the application-provided allocator can be called with libc-internal locks held. these problems are all fixed by giving libc internals access to the original, non-replaced allocator, for use where needed. it can't be used everywhere, as some interfaces like str[n]dup, open_[w]memstream, getline/getdelim, etc. are required to provide the called memory obtained as if by (the public) malloc. and there are a number of libc interfaces that are "pure library" code, not part of some internal singleton, and where using the application's choice of malloc implementation is preferable -- things like glob, regex, etc. one might expect there to be significant cost to static-linked programs, pulling in two malloc implementations, one of them mostly-unused, if malloc is replaced. however, in almost all of the places where malloc is used internally, care has been taken already not to pull in realloc/free (i.e. to link with just the bump allocator). this size optimization carries over automatically. the newly-exposed internal allocator functions are obtained by renaming the actual definitions, then adding new wrappers around them with the public names. technically __libc_realloc and __libc_free could be aliases rather than needing a layer of wrapper, but this would almost surely break certain instrumentation (valgrind) and the size and performance difference is negligible. __libc_calloc needs to be handled specially since calloc is designed to work with either the internal or the replaced malloc. as a bonus, this change also eliminates the longstanding ugly dependency of the static bump allocator on order of object files in libc.a, by making it so there's only one definition of the malloc function and having it in the same source file as the bump allocator.
allowing the application to replace malloc (since commit
c9f415d7ea2dace5bf77f6518b6afc36bb7a5732) has brought multiple
headaches where it's used from various critical sections in libc
components. for example:
- the thread-local message buffers allocated for dlerror can't be
 freed at thread exit time because application code would then run in
 the context of a non-existant thread. this was handled in commit
 aa5a9d15e09851f7b4a1668e9dbde0f6234abada by queuing them for free
 later.
- the dynamic linker has to be careful not to pass memory allocated at
 early startup time (necessarily using its own malloc) to realloc or
 free after redoing relocations with the application and all
 libraries present. bugs in this area were fixed several times, at
 least in commits 0c5c8f5da6e36fe4ab704bee0cd981837859e23f and
 2f1f51ae7b2d78247568e7fdb8462f3c19e469a4 and possibly others.
- by calling the allocator from contexts where libc-internal locks are
 held, we impose undocumented requirements on alternate malloc
 implementations not to call into any libc function that might
 attempt to take these locks; if they do, deadlock results.
- work to make fork of a multithreaded parent give the child an
 unrestricted execution environment is blocked by lock order issues
 as long as the application-provided allocator can be called with
 libc-internal locks held.
these problems are all fixed by giving libc internals access to the
original, non-replaced allocator, for use where needed. it can't be
used everywhere, as some interfaces like str[n]dup, open_[w]memstream,
getline/getdelim, etc. are required to provide the called memory
obtained as if by (the public) malloc. and there are a number of libc
interfaces that are "pure library" code, not part of some internal
singleton, and where using the application's choice of malloc
implementation is preferable -- things like glob, regex, etc.
one might expect there to be significant cost to static-linked
programs, pulling in two malloc implementations, one of them
mostly-unused, if malloc is replaced. however, in almost all of the
places where malloc is used internally, care has been taken already
not to pull in realloc/free (i.e. to link with just the bump
allocator). this size optimization carries over automatically.
the newly-exposed internal allocator functions are obtained by
renaming the actual definitions, then adding new wrappers around them
with the public names. technically __libc_realloc and __libc_free
could be aliases rather than needing a layer of wrapper, but this
would almost surely break certain instrumentation (valgrind) and the
size and performance difference is negligible. __libc_calloc needs to
be handled specially since calloc is designed to work with either the
internal or the replaced malloc.
as a bonus, this change also eliminates the longstanding ugly
dependency of the static bump allocator on order of object files in
libc.a, by making it so there's only one definition of the malloc
function and having it in the same source file as the bump allocator.
only use memcpy realloc to shrink if an exact-sized free chunk exists 2020年06月16日T04:58:22+00:00 Rich Felker dalias@aerifal.cx 2020年06月16日T04:53:57+00:00 fca7428c096066482d8c3f52450810288e27515c otherwise, shrink in-place. as explained in the description of commit 3e16313f8fe2ed143ae0267fd79d63014c24779f, the split here is valid without holding split_merge_lock because all chunks involved are in the in-use state.
otherwise, shrink in-place. as explained in the description of commit
3e16313f8fe2ed143ae0267fd79d63014c24779f, the split here is valid
without holding split_merge_lock because all chunks involved are in
the in-use state.
fix memset overflow in oldmalloc race fix overhaul 2020年06月16日T04:46:09+00:00 Rich Felker dalias@aerifal.cx 2020年06月16日T04:34:12+00:00 cb5babdc8d624a3e3e7bea0b4e28a677a2f2fc46 commit 3e16313f8fe2ed143ae0267fd79d63014c24779f introduced this bug by making the copy case reachable with n (new size) smaller than n0 (original size). this was left as the only way of shrinking an allocation because it reduces fragmentation if a free chunk of the appropriate size is available. when that's not the case, another approach may be better, but any such improvement would be independent of fixing this bug.
commit 3e16313f8fe2ed143ae0267fd79d63014c24779f introduced this bug by
making the copy case reachable with n (new size) smaller than n0
(original size). this was left as the only way of shrinking an
allocation because it reduces fragmentation if a free chunk of the
appropriate size is available. when that's not the case, another
approach may be better, but any such improvement would be independent
of fixing this bug.
only disable aligned_alloc if malloc was replaced but it wasn't 2020年06月11日T02:05:03+00:00 Rich Felker dalias@aerifal.cx 2020年06月11日T02:05:03+00:00 1fc67fc117f9d25d240d46bbef78ebccacec7097 it both malloc and aligned_alloc have been replaced but the internal aligned_alloc still gets called, the replacement is a wrapper of some sort. it's not clear if this usage should be officially supported, but it's at least a plausibly interesting debugging usage, and easy to do. it should not be relied upon unless it's documented as supported at some later time.
it both malloc and aligned_alloc have been replaced but the internal
aligned_alloc still gets called, the replacement is a wrapper of some
sort. it's not clear if this usage should be officially supported, but
it's at least a plausibly interesting debugging usage, and easy to do.
it should not be relied upon unless it's documented as supported at
some later time.
reintroduce calloc elison of memset for direct-mmapped allocations 2020年06月11日T00:51:17+00:00 Rich Felker dalias@aerifal.cx 2020年06月11日T00:44:51+00:00 25cef5c591fbee755a53f0d7920f4f554f343a53 a new weak predicate function replacable by the malloc implementation, __malloc_allzerop, is introduced. by default it's always false; the default version will be used when static linking if the bump allocator was used (in which case performance doesn't matter) or if malloc was replaced by the application. only if the real internal malloc is linked (always the case with dynamic linking) does the real version get used. if malloc was replaced dynamically, as indicated by __malloc_replaced, the predicate function is ignored and conditional-memset is always performed.
a new weak predicate function replacable by the malloc implementation,
__malloc_allzerop, is introduced. by default it's always false; the
default version will be used when static linking if the bump allocator
was used (in which case performance doesn't matter) or if malloc was
replaced by the application. only if the real internal malloc is
linked (always the case with dynamic linking) does the real version
get used.
if malloc was replaced dynamically, as indicated by __malloc_replaced,
the predicate function is ignored and conditional-memset is always
performed.
move __malloc_replaced to a top-level malloc file 2020年06月11日T00:42:54+00:00 Rich Felker dalias@aerifal.cx 2020年06月11日T00:42:54+00:00 501a92660ca2ed6469a0a01474574ce9930356cd it's not part of the malloc implementation but glue with musl dynamic linker.
it's not part of the malloc implementation but glue with musl dynamic
linker.
switch to a common calloc implementation 2020年06月11日T00:34:34+00:00 Rich Felker dalias@aerifal.cx 2020年06月10日T23:41:27+00:00 28f64fa6caeb621780bf116bc8cae7d7a48b477e abstractly, calloc is completely malloc-implementation-independent; it's malloc followed by memset, or as we do it, a "conditional memset" that avoids touching fresh zero pages. previously, calloc was kept separate for the bump allocator, which can always skip memset, and the version of calloc provided with the full malloc conditionally skipped the clearing for large direct-mmapped allocations. the latter is a moderately attractive optimization, and can be added back if needed. however, further consideration to make it correct under malloc replacement would be needed. commit b4b1e10364c8737a632be61582e05a8d3acf5690 documented the contract for malloc replacement as allowing omission of calloc, and indeed that worked for dynamic linking, but for static linking it was possible to get the non-clearing definition from the bump allocator; if not for that, it would have been a link error trying to pull in malloc.o. the conditional-clearing code for the new common calloc is taken from mal0_clear in oldmalloc, but drops the need to access actual page size and just uses a fixed value of 4096. this avoids potentially needing access to global data for the sake of an optimization that at best marginally helps archs with offensively-large page sizes.
abstractly, calloc is completely malloc-implementation-independent;
it's malloc followed by memset, or as we do it, a "conditional memset"
that avoids touching fresh zero pages.
previously, calloc was kept separate for the bump allocator, which can
always skip memset, and the version of calloc provided with the full
malloc conditionally skipped the clearing for large direct-mmapped
allocations. the latter is a moderately attractive optimization, and
can be added back if needed. however, further consideration to make it
correct under malloc replacement would be needed.
commit b4b1e10364c8737a632be61582e05a8d3acf5690 documented the
contract for malloc replacement as allowing omission of calloc, and
indeed that worked for dynamic linking, but for static linking it was
possible to get the non-clearing definition from the bump allocator;
if not for that, it would have been a link error trying to pull in
malloc.o.
the conditional-clearing code for the new common calloc is taken from
mal0_clear in oldmalloc, but drops the need to access actual page size
and just uses a fixed value of 4096. this avoids potentially needing
access to global data for the sake of an optimization that at best
marginally helps archs with offensively-large page sizes.

AltStyle によって変換されたページ (->オリジナル) /