musl/src/malloc/malloc.c, branch master

move oldmalloc to its own directory under src/malloc

2020-06-03T23:23:02+00:00

this sets the stage for replacement, and makes it practical to keep
oldmalloc around as a build option for a while if that ends up being
useful.

only the files which are actually part of the implementation are
moved. memalign and posix_memalign are entirely generic. in theory
calloc could be pulled out too, but it's useful to have it tied to the
implementation so as to optimize out unnecessary memset when
implementation details make it possible to know the memory is already
clear.
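
a minimal sketch of why keeping calloc tied to the implementation pays off. it assumes the allocator can tell whether a block was just obtained from the kernel, since fresh anonymous mappings are zero-filled. alloc_with_origin and calloc_sketch are hypothetical names, not musl internals. the toy backend just calls malloc and never claims the memory is already clear, so the memset always runs here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical internal entry point: returns n bytes and reports whether
 * the memory is already known to be zero (e.g. fresh anonymous mmap).
 * This toy version always uses malloc, so it never claims that. */
static void *alloc_with_origin(size_t n, bool *fresh_from_os)
{
	*fresh_from_os = false;
	return malloc(n);
}

/* Sketch of an implementation-aware calloc. */
static void *calloc_sketch(size_t m, size_t n)
{
	/* Reject m*n overflow before multiplying. */
	if (n && m > SIZE_MAX / n) {
		errno = ENOMEM;
		return NULL;
	}
	size_t len = m * n;
	bool fresh;
	void *p = alloc_with_origin(len, &fresh);
	if (!p) return NULL;
	/* Only clear memory the allocator cannot vouch for. */
	if (!fresh) memset(p, 0, len);
	return p;
}
```

a generic calloc living outside the implementation would have no way to ask that question, so it would have to memset unconditionally.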

move __expand_heap into malloc.c

2020-06-03T23:17:19+00:00

this function is no longer used elsewhere, and moving it reduces the
number of source files specific to the malloc implementation.

fix unbounded heap expansion race in malloc

2020-06-02T23:39:37+00:00

this has been a longstanding issue reported many times over the years,
with it becoming increasingly clear that it could be hit in practice.
under concurrent malloc and free from multiple threads, it's possible
to hit usage patterns where unbounded amounts of new memory are
obtained via brk/mmap despite the total nominal usage being small and
bounded.

the underlying cause is that, as a fundamental consequence of keeping
locking as fine-grained as possible, the state where free has unbinned
an already-free chunk to merge it with a newly-freed one, but has not
yet re-binned the combined chunk, is exposed to other threads. this is
bad even with small chunks, and leads to suboptimal use of memory, but
where it really blows up is where the already-freed chunk in question
is the large free region "at the top of the heap". in this situation,
other threads momentarily see a state of having almost no free memory,
and conclude that they need to obtain more.

as far as I can tell there is no fix for this that does not harm
performance. the fix made here forces all split/merge of free chunks
to take place under a single lock, which also takes the place of the
old free_lock, being held at least momentarily at the time of free to
determine whether there are neighboring free chunks that need merging.
as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no
longer make sense and are deleted. simplified merging now takes place
inline in free (__bin_chunk) and realloc.

as commented in the source, holding the split_merge_lock precludes any
chunk transition from in-use to free state. for the most part, it also
precludes change to chunk header sizes. however, __memalign may still
modify the sizes of an in-use chunk to split it into two in-use
chunks. arguably this should require holding the split_merge_lock, but
that would necessitate refactoring to expose it externally, which is a
mess. and it turns out not to be necessary, at least assuming the
existing sloppy memory model malloc has been using, because if free
(__bin_chunk) or realloc sees any unsynchronized change to the size,
it will also see the in-use bit being set, and thereby can't do
anything with the neighboring chunk that changed size.
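
a toy model of the single-lock approach, assuming chunks are kept in address order with a size and an in-use bit each. toy_chunk, toy_free and the spinlock are hypothetical; real chunk headers, bins and __bin_chunk are not modeled. the whole check-and-merge runs under one lock, so no other thread can observe a neighbor that has been unbinned but not yet re-binned, and a neighbor with its in-use bit set is never touched.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy heap: chunks stored in address order. A real allocator keeps the
 * size and in-use bit in each chunk's header instead. */
struct toy_chunk { size_t size; bool inuse; };

#define MAX_CHUNKS 16
static struct toy_chunk heap[MAX_CHUNKS];
static size_t nchunks;

/* One lock (a toy spinlock here) guards every split and merge. */
static atomic_flag split_merge_lock = ATOMIC_FLAG_INIT;

static void remove_at(size_t i)
{
	memmove(&heap[i], &heap[i + 1], (nchunks - i - 1) * sizeof heap[0]);
	nchunks--;
}

/* Free chunk i and merge it with any free neighbors; returns the index
 * of the resulting free chunk. */
static size_t toy_free(size_t i)
{
	while (atomic_flag_test_and_set_explicit(&split_merge_lock,
	                                         memory_order_acquire))
		; /* spin */
	heap[i].inuse = false;
	/* Only neighbors whose in-use bit is clear are merged. */
	if (i + 1 < nchunks && !heap[i + 1].inuse) {
		heap[i].size += heap[i + 1].size;
		remove_at(i + 1);
	}
	if (i > 0 && !heap[i - 1].inuse) {
		heap[i - 1].size += heap[i].size;
		remove_at(i);
		i--;
	}
	atomic_flag_clear_explicit(&split_merge_lock, memory_order_release);
	return i;
}
```

freeing the in-use chunk between two free ones collapses all three into a single free chunk before the lock is released.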

restore lock-skipping for processes that return to single-threaded state

2020-05-22T21:45:47+00:00

the design used here relies on the barrier provided by the first lock
operation after the process returns to single-threaded state to
synchronize with actions by the last thread that exited. by storing
the intent to change modes in the same object used to detect whether
locking is needed, it's possible to avoid an extra (possibly costly)
memory load after the lock is taken.
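
a sketch of the idea, assuming a need_locks flag where a negative value records the intent to switch to single-threaded mode. the names and the spinlock are hypothetical simplifications. the first lock() after the transition still takes the lock, and with it the barrier, then clears the flag based on the value it already loaded, so no extra load is needed once the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the libc-wide flag:
 *   1  -> multithreaded, always lock
 *   0  -> single-threaded, skip locking
 *  -1  -> just returned to single-threaded; lock once more so the
 *         barrier synchronizes with the last exiting thread */
static int need_locks;

static void lock(atomic_int *lk)
{
	int nl = need_locks; /* single load decides everything */
	if (nl) {
		while (atomic_exchange_explicit(lk, 1, memory_order_acquire))
			; /* spin */
		if (nl < 0) need_locks = 0; /* finish the mode change */
	}
}

static void unlock(atomic_int *lk)
{
	/* Skipped locks leave *lk at 0, so there is nothing to release. */
	if (atomic_load_explicit(lk, memory_order_relaxed))
		atomic_store_explicit(lk, 0, memory_order_release);
}

/* Hypothetical hook run by the last exiting thread. */
static void note_single_threaded(void)
{
	need_locks = -1;
}
```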

don't use libc.threads_minus_1 as relaxed atomic for skipping locks

2020-05-22T21:39:57+00:00

after all but the last thread exits, the next thread to observe
libc.threads_minus_1==0 and conclude that it can skip locking fails to
synchronize with any changes to memory that were made by the
last-exiting thread. this can produce data races.

on some archs, at least x86, memory synchronization is unlikely to be
a problem; however, with the inline locks in malloc, skipping the lock
also eliminated the compiler barrier, and caused code that needed to
re-check chunk in-use bits after obtaining the lock to reuse a stale
value, possibly from before the process became single-threaded. this
in turn produced corruption of the heap state.

some uses of libc.threads_minus_1 remain, especially for allocation of
new TLS in the dynamic linker; otherwise, it could be removed
entirely. it's made non-volatile to reflect that the remaining
accesses are only made under lock on the thread list.

instead of libc.threads_minus_1, libc.threaded is now used for
skipping locks. the difference is that libc.threaded is permanently
true once an additional thread has been created. this will produce
some performance regression in processes that are mostly
single-threaded but occasionally create threads. in the future it
may be possible to bring back the full lock-skipping, but more care
needs to be taken to produce a safe design.
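
a minimal model of the difference between the two variables, with hypothetical hooks standing in for thread creation and exit. threads_minus_1 drops back to 0 when threads exit, but the sticky threaded flag does not, so lock skipping never resumes; that is the performance regression described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two variables discussed above. */
static int threads_minus_1; /* returns to 0 as threads exit */
static bool threaded;       /* set once, never cleared */

static void on_thread_create(void)
{
	threads_minus_1++;
	threaded = true;
}

static void on_thread_exit(void)
{
	threads_minus_1--;
}

/* Lock-skipping decision after this change: once any thread has ever
 * been created, always lock. */
static bool must_lock(void)
{
	return threaded;
}
```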

move declarations for malloc internals to malloc_impl.h

2018-09-12T18:34:28+00:00

reintroduce hardening against partially-replaced allocator

2018-04-20T02:22:11+00:00

commit 618b18c78e33acfe54a4434e91aa57b8e171df89 removed the previous
detection and hardening since it was incorrect. commit
72141795d4edd17f88da192447395a48444afa10 already handled all that
remained for hardening the static-linked case. in the dynamic-linked
case, have the dynamic linker check whether malloc was replaced and
make that information available.

with these changes, the properties documented in commit
c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 are restored: if calloc is
not provided, it will behave as malloc+memset, and any of the
memalign-family functions not provided will fail with ENOMEM.
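
a sketch of the memalign-family guard, assuming a malloc_replaced flag like the one the dynamic linker now makes available. aligned_alloc_sketch is hypothetical, and posix_memalign only stands in for the internal carving of an aligned chunk, which would be invalid if malloc and free belonged to a different allocator.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag: true if malloc was replaced by the application. */
static bool malloc_replaced;

static void *aligned_alloc_sketch(size_t align, size_t len)
{
	/* Refuse rather than hand a foreign allocator's free() a chunk
	 * carved up by the system allocator. */
	if (malloc_replaced) {
		errno = ENOMEM;
		return NULL;
	}
	void *p;
	int err = posix_memalign(&p, align, len);
	if (err) {
		errno = err;
		return NULL;
	}
	return p;
}
```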

return chunks split off by memalign using __bin_chunk instead of free

2018-04-20T00:56:26+00:00

this change serves multiple purposes:
1. it ensures that static linking of memalign-family functions will
pull in the system malloc implementation, thereby causing link errors
if an attempt is made to link the system memalign functions with a
replacement malloc (incomplete allocator replacement).
2. it eliminates calls to free that are unpaired with allocations,
which are confusing when setting breakpoints or tracing execution.
as a bonus, making __bin_chunk external may discourage aggressive and
unnecessary inlining of it.
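
the split memalign performs can be illustrated with plain address arithmetic: over-allocate, round the start up to the alignment, and the pieces left before and after the aligned block are what get returned to the allocator, which after this change happens via __bin_chunk. memalign_split is a hypothetical helper; chunk headers and minimum chunk sizes are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct split { uintptr_t start; size_t lead, tail; };

/* Given an over-sized region [base, base+len) with
 * len >= want + align - 1, locate an aligned block of `want` bytes and
 * the leftover sizes before (lead) and after (tail) it. */
static struct split memalign_split(uintptr_t base, size_t len,
                                   size_t align, size_t want)
{
	assert(align && (align & (align - 1)) == 0); /* power of two */
	assert(len >= want + align - 1);
	uintptr_t start = (base + align - 1) & ~(uintptr_t)(align - 1);
	struct split s;
	s.start = start;
	s.lead = start - base;
	s.tail = len - s.lead - want;
	return s;
}
```

for example, a 200-byte region at 0x1010 with 64-byte alignment and a 100-byte request yields an aligned block at 0x1040, a 48-byte leading piece and a 52-byte trailing piece.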

move malloc implementation types and macros to an internal header

2018-04-19T22:44:17+00:00

revert detection of partially-replaced allocator

2018-04-19T19:25:48+00:00

commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 included checks to
make calloc fall back to memset if used with a replaced malloc that
didn't also replace calloc, and the memalign family fail if free has
been replaced. however, the checks gave false positives for
replacement whenever malloc or free resolved to a PLT entry in the
main program.

for now, disable the checks so as not to leave libc in a broken state.
this means that the properties documented in the above commit are no
longer satisfied; failure to replace calloc and the memalign family
along with malloc is unsafe if they are ever called.

the calloc checks were correct but useless for static linking. in both
cases (simple or full malloc), calloc and malloc are in a source file
together, so replacement of one but not the other would give linking
errors. the memalign-family check was useful for static linking, but
broken for dynamic as described above, and can be replaced with a
better link-time check.