this eliminates the last need for the SHARED macro to control how files in the src tree are compiled. the same code is used for both libc.a and libc.so, with additional code for the dynamic linker (from the new ldso tree) being added to libc.so but not libc.a. separate .o and .lo object files still exist for the src tree, but the only difference is that the .lo files are built as PIC. in the future, if/when we add dlopen support for static-linked programs, much of the code in dynlink.c may be moved back into the src tree, but properly factored into separate source files. in that case, the code in the ldso tree will be reduced to just the dynamic linker entry point, self-relocation, and loading of libraries needed by the main application.
the function name is still __-prefixed because it requires an asm wrapper to pass the caller's address in order for RTLD_NEXT to work. since this was the last function in dynlink.c still used for static linking, now the whole file is conditional on SHARED being defined.
the ultimate goal of this change is to get all code used in libc.a out of dynlink.c, so that the dynamic linker code can be moved to its own tree and object files in the src tree can all be shared between libc.a and libc.so.
if two or more threads accessed tls in a dso that was loaded after the threads were created, then __tls_get_new could do out-of-bound memory access (leading to segfault). accidentally byte count was used instead of element count when the new dtv pointer was computed. (dso->new_dtv is (void**).) it is rare that the same dso provides dtv for several threads, the crash was not observed in practice, but possible to trigger.