this sets the stage for replacement, and makes it practical to keep oldmalloc around as a build option for a while if that ends up being useful. only the files which are actually part of the implementation are moved. memalign and posix_memalign are entirely generic. in theory calloc could be pulled out too, but it's useful to have it tied to the implementation so as to optimize out unnecessary memset when implementation details make it possible to know the memory is already clear.
this function is no longer used elsewhere, and moving it reduces the number of source files specific to the malloc implementation.
this has been a longstanding issue reported many times over the years, with it becoming increasingly clear that it could be hit in practice. under concurrent malloc and free from multiple threads, it's possible to hit usage patterns where unbounded amounts of new memory are obtained via brk/mmap despite the total nominal usage being small and bounded. the underlying cause is that, as a fundamental consequence of keeping locking as fine-grained as possible, the state where free has unbinned an already-free chunk to merge it with a newly-freed one, but has not yet re-binned the combined chunk, is exposed to other threads. this is bad even with small chunks, and leads to suboptimal use of memory, but where it really blows up is where the already-freed chunk in question is the large free region "at the top of the heap". in this situation, other threads momentarily see a state of having almost no free memory, and conclude that they need to obtain more. as far as I can tell there is no fix for this that does not harm performance. the fix made here forces all split/merge of free chunks to take place under a single lock, which also takes the place of the old free_lock, being held at least momentarily at the time of free to determine whether there are neighboring free chunks that need merging. as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no longer make sense and are deleted. simplified merging now takes place inline in free (__bin_chunk) and realloc. as commented in the source, holding the split_merge_lock precludes any chunk transition from in-use to free state. for the most part, it also precludes change to chunk header sizes. however, __memalign may still modify the sizes of an in-use chunk to split it into two in-use chunks. arguably this should require holding the split_merge_lock, but that would necessitate refactoring to expose it externally, which is a mess. and it turns out not to be necessary, at least assuming the existing sloppy memory model malloc has been using, because if free (__bin_chunk) or realloc sees any unsynchronized change to the size, it will also see the in-use bit being set, and thereby can't do anything with the neighboring chunk that changed size.
the design used here relies on the barrier provided by the first lock operation after the process returns to single-threaded state to synchronize with actions by the last thread that exited. by storing the intent to change modes in the same object used to detect whether locking is needed, it's possible to avoid an extra (possibly costly) memory load after the lock is taken.
after all but the last thread exits, the next thread to observe libc.threads_minus_1==0 and conclude that it can skip locking fails to synchronize with any changes to memory that were made by the last-exiting thread. this can produce data races. on some archs, at least x86, memory synchronization is unlikely to be a problem; however, with the inline locks in malloc, skipping the lock also eliminated the compiler barrier, and caused code that needed to re-check chunk in-use bits after obtaining the lock to reuse a stale value, possibly from before the process became single-threaded. this in turn produced corruption of the heap state. some uses of libc.threads_minus_1 remain, especially for allocation of new TLS in the dynamic linker; otherwise, it could be removed entirely. it's made non-volatile to reflect that the remaining accesses are only made under lock on the thread list. instead of libc.threads_minus_1, libc.threaded is now used for skipping locks. the difference is that libc.threaded is permanently true once an additional thread has been created. this will produce some performance regression in processes that are mostly single-threaded but occasionally creating threads. in the future it may be possible to bring back the full lock-skipping, but more care needs to be taken to produce a safe design.
commit 618b18c78e33acfe54a4434e91aa57b8e171df89 removed the previous detection and hardening since it was incorrect. commit 72141795d4edd17f88da192447395a48444afa10 already handled all that remained for hardening the static-linked case. in the dynamic-linked case, have the dynamic linker check whether malloc was replaced and make that information available. with these changes, the properties documented in commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 are restored: if calloc is not provided, it will behave as malloc+memset, and any of the memalign-family functions not provided will fail with ENOMEM.
this change serves multiple purposes: 1. it ensures that static linking of memalign-family functions will pull in the system malloc implementation, thereby causing link errors if an attempt is made to link the system memalign functions with a replacement malloc (incomplete allocator replacement). 2. it eliminates calls to free that are unpaired with allocations, which are confusing when setting breakpoints or tracing execution. as a bonus, making __bin_chunk external may discourage aggressive and unnecessary inlining of it.
commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 included checks to make calloc fallback to memset if used with a replaced malloc that didn't also replace calloc, and the memalign family fail if free has been replaced. however, the checks gave false positives for replacement whenever malloc or free resolved to a PLT entry in the main program. for now, disable the checks so as not to leave libc in a broken state. this means that the properties documented in the above commit are no longer satisfied; failure to replace calloc and the memalign family along with malloc is unsafe if they are ever called. the calloc checks were correct but useless for static linking. in both cases (simple or full malloc), calloc and malloc are in a source file together, so replacement of one but not the other would give linking errors. the memalign-family check was useful for static linking, but broken for dynamic as described above, and can be replaced with a better link-time check.