Update: I have another post that covers more ground (ELF loading, kernel module loading etc): http://lastweek.io/notes/dynamic_linking/.
- `csu/libc-start.c`: `__libc_start_main()` is the entry point. Inside, it calls `__libc_csu_init()`, then the user's `main()`.
- Great reference: Linux x86 Program Start Up. I keep a PDF copy in this repo.
- ELF's `.interp` section points to the dynamic linker, which is itself part of glibc.
- Related code: `elf/rtld.c`, `sysdeps/generic/`, `sysdeps/x86_64/`, and more.
- Inside `dl_main()`, you can see how `LD_PRELOAD` is handled.
- `GOT[1]` contains the address of the `link_map` data structure. `GOT[2]` points to `_dl_runtime_resolve()`! This is the runtime dynamic linker entry point.
The file `sysdeps/generic/dl-machine.h` populates `GOT[1]` and `GOT[2]`:

    /* Set up the loaded object described by L so its unrelocated PLT
       entries will jump to the on-demand fixup code in dl-runtime.c.  */
    static inline int
    elf_machine_runtime_setup (struct link_map *l, int lazy)
    {
      extern void _dl_runtime_resolve (Elf32_Word);

      if (lazy)
        {
          /* The GOT entries for functions in the PLT have not yet been
             filled in.  Their initial contents will arrange when called
             to push an offset into the .rel.plt section, push
             _GLOBAL_OFFSET_TABLE_[1], and then jump to
             _GLOBAL_OFFSET_TABLE[2].  */
          Elf32_Addr *got = (Elf32_Addr *) D_PTR (l, l_info[DT_PLTGOT]);
          got[1] = (Elf32_Addr) l;  /* Identify this shared object.  */

          /* This function will get called to fix up the GOT entry
             indicated by the offset on the stack, and then jump to the
             resolved address.  */
          got[2] = (Elf32_Addr) &_dl_runtime_resolve;
        }

      return lazy;
    }

`_dl_runtime_resolve()` is architecture specific and is a mix of assembly and C code.
The flow is similar to syscall handling: it first saves the registers,
then calls the actual resolver, then restores all saved registers.
For 64-bit x86, the source code is in `sysdeps/x86_64/dl-trampoline.h`:

        .globl _dl_runtime_resolve
        .type _dl_runtime_resolve, @function
    _dl_runtime_resolve:
        ...
        ...
        # Copy args pushed by PLT in register.
        # %rdi: link_map, %rsi: reloc_index
        mov (LOCAL_STORAGE_AREA + 8)(%BASE), %RSI_LP
        mov LOCAL_STORAGE_AREA(%BASE), %RDI_LP
        call _dl_fixup          # Call resolver.
        mov %RAX_LP, %R11_LP    # Save return value
        ...

Bingo, `_dl_fixup()` is the final piece of the runtime dynamic linker resolver. We can find it in `elf/dl-runtime.c`, the file for on-demand PLT fixup:

    /* This function is called through a special trampoline from the PLT the
       first time each PLT entry is called.  We must perform the relocation
       specified in the PLT of the given shared object, and return the resolved
       function address to the trampoline, which will restart the original call
       to that address.  Future calls will bounce directly from the PLT to the
       function.  */
    DL_FIXUP_VALUE_TYPE
    attribute_hidden __attribute ((noinline)) ARCH_FIXUP_ATTRIBUTE
    _dl_fixup (
    # ifdef ELF_MACHINE_RUNTIME_FIXUP_ARGS
               ELF_MACHINE_RUNTIME_FIXUP_ARGS,
    # endif
               struct link_map *l, ElfW(Word) reloc_arg)
    {
      ...
    }

Understanding this piece of code requires some effort. Happy hacking!
The ELF files produced by recent GCC versions differ slightly from
those described in older textbooks and papers.
The differences are small, though. Use `man elf` to check the latest layout.
- When a program imports a certain function or variable, the linker will include a string with the function or variable's name in the `.dynstr` section.
- It also adds a symbol (Elf Sym) that refers to that name in the `.dynsym` section, and a relocation (Elf Rel) pointing to that symbol in the `.rela.plt` section.
- `.rela.dyn` and `.rela.plt` are for imported variables and functions, respectively.
- `.plt` is the normal one: it contains instructions.
- `.got` and `.got.plt`: the first holds entries for variables, the second holds entries for functions called through the PLT. Both provide essentially the same global offset table functionality.
Figure: Relationship among `.dynstr`, `.dynsym`, and `.rela.dyn` or `.rela.plt` (image1).

Figure: PIC lazy binding (image2).
Also note that nowadays even a non-PIC binary always has GOT and PLT sections. In theory, it could use load-time relocation instead. I suspect GOT and PLT are adopted for the following two reasons: a) load-time relocation needs to modify code when the binary is loaded, which is not good, especially since the code section is probably not writable; b) GOT/PLT lazy binding is a performance win at start-up time. However, keep in mind that GOT/PLT lazy binding pays an extra runtime cost!
Reading: