Is it possible to write (pack) a shorter than 145 bytes version of a program with "Hello world" (plus new line) output if the length of the program is measured as a number of bytes in program's ELF (x86) representation? Reduction technique in mind is described here:
Please reply with providing a code example. Note, 1 says "Hi world" and [2] returns 42 instead of an output, hence I assume current solution to be 142 + (len("Hello") - len("Hi")) which is 145.
[EDIT]
Limits of ELF: syntactically a program translated into ELF is determined by its interpreter (libc), which is a combination of:
- Target architecture (Generic, AMD64, ARM, IA-32, MIPS, etc.)
- OS kernel (which communicates with implementation of ELF interpreter and often is itself packed into an ELF binary)
From TIS spec. 1.2
There is one valid program interpreter for programs conforming to the ELF specification for the Intel architecture: /usr/lib/libc.so.1
In theory - all clear - multiple combinations times version differences (Arch, OS) are possible. In practice - because standard is less strict than any BNF for CFG or other finer (smaller) formal language - there can be implementation differences (options) including shorter length of the program.
On one hand, because ELF is not that precise (one naturally expects) it is much more difficult to do code-golf with it. Hence a "Hello world\n" program is expected to be by default in TIS ELF 1.2, x86 (Generic Intel), Linux 2.6.(20+). On the other hand having a shorter ELF that runs e.g. on *BSD seems like an extremely valuable knowledge to me!
For example ELF64 (latest draft) is much more interesting incl. the differences with ELF(x86). So, please share a solution that can be accepted with correction to specific configuration.
Motivation: It is the practical part that is interesting for me and code-golf is a very nice way to show how tricky it is to come up with any machine- (or even byte) code in general and why ELF does apparently such a good job (in my understanding). Thus a golf solution is not only cool per say but also can provide a practical knowledge of unexpected interpretation differences of ELF (if an alternative combination is given).
2 Answers 2
Even if you require full adherence to the ELF specification, you can squeeze it into 98 bytes:
org 0x04B34000
db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident
dd 0, 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw 0x34 ; e_ehsize
dw 0x20 ; e_phentsize
phdr: dd 1 ; e_phnum ; p_type
; e_shentsize
dd 0 ; e_shnum ; p_offset
; e_shstrndx
db 0 ; p_vaddr
_start: inc eax
mov bl, 4
mov dl, 12 ; p_paddr
jmp short part2
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
str: db 'Hello world', 10
part2: mov ecx, str
again: xchg eax, ebx
int 0x80
jmp short again
filesize equ $ - $$
Or if you prefer hex bytes:
7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 03 00 01 00 00 00 35 40 B3 04 2C 00 00 00 00 00 00 00 00 00 00 00 34 00 20 00 01 00 00 00 00 00 00 00 00 40 B3 04 B2 0C EB 1C 62 00 00 00 62 00 00 00 05 00 00 00 00 10 00 00 48 65 6C 6C 6F 20 77 6F 72 6C 64 0A B9 4C 40 B3 04 93 CD 80 EB FB
The ELF specification explicitly permits different sections to overlap, so it is perfectly legal to let the program segment header table share bytes with the ELF header. Furthermore, the x86 version of the ELF standard states that the p_paddr field is ignored, as opposed to merely being unused (which typically comes with the requirement that it be set to zero. Thus it is safe to contain arbitrary bytes.
Finally, three more bytes are saved by overlapping the code with the load address.
-
1\$\begingroup\$ Direct from the source ;) The xchg/jmp is particularly clever. \$\endgroup\$primo– primo2012年05月05日 08:41:43 +00:00Commented May 5, 2012 at 8:41
-
1\$\begingroup\$ to make it work on amd64, I've added
BITS 32at the very top. It doesn't change the byte count in the executable. \$\endgroup\$jfs– jfs2016年05月16日 20:41:08 +00:00Commented May 16, 2016 at 20:41 -
\$\begingroup\$ Try it online! \$\endgroup\$Chris– Chris2020年10月02日 04:31:21 +00:00Commented Oct 2, 2020 at 4:31
Looking at link [2] A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux, by appending the literal string 'Hello, world!' to the end (instead of letting the last 7 bytes auto fill with zeros), you should be able to squeeze it to 59 bytes, assuming your print logic isn't longer than 8 bytes.
In fact, the user 'breadbox', who I assume to be the writer of the post, has posted a 57 byte version on anarchy golf. It might be useful to look at some of his other post mortems: they seem to be all ELF format.
EDIT: 61 bytes
Using breadbox's nested header approach, I was able to produce this 61 byte solution:
BITS 32
org 0x05000000
db 0x7F, "ELF"
dd 1
dd 0
dd $$
dw 2
dw 3
dd 0x0500001B
dd 0x0500001B
dd 4
mov dl, 12
mov ecx, msg
int 0x80
db 0x25
dw 0x20
dw 0x01
inc eax
int 0x80
msg db 'Hello world', 10
Assemble as:
nasm -fbin hello.asm
chmod +x hello
./hello
Producing:
00000000 7f 45 4c 46 01 00 00 00 00 00 00 00 00 00 00 05 |.ELF............|
00000010 02 00 03 00 1b 00 00 05 1b 00 00 05 04 00 00 00 |................|
00000020 b2 0c b9 31 00 00 05 cd 80 25 20 00 01 00 40 cd |...1.....% ...@.|
00000030 80 48 65 6c 6c 6f 20 77 6f 72 6c 64 0a |.Hello world.|
The code begins at 0x001B:
0540000000 add eax, 0x00000004 ;sys_write
B20C mov dl, 12 ;message length
B92E000005 mov ecx, msg ;message pointer
CD80 int 0x80 ;syscall
2520000100 and eax, 0x00010020 ;clears eax
40 inc eax ;sys_exit
CD80 int 0x80 ;syscall
The 32-bit add is necessary (instead of mov al, 4 for example), because it's also used as the program header offset, and needs to be exactly 4 (where the program header starts in the above code). The and later on is used to clear eax, because 0x20 and 0x01 are unavoidable header values.
EDIT: 111 bytes (with proper headers)
If, on the other hand, you'd rather your headers adhere to the ELF specifications (even if your kernel ignores them), you could use this instead:
BITS 32
org 0x08048000
ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr
_start: mov al, 4
mov dl, msg_len
mov ecx, msg
int 0x80
mov al, 1
int 0x80
msg db 'Hello world', 10
msg_len equ $ - msg
filesize equ $ - $$
It's tempting to overlap the last 8 bytes of the ehdr with the first 8 bytes of the phdr, since they are identical (and therefore, all headers would be correct), but in the spirit of being proper, I decided not to.
This assembles to a 111 byte solution:
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 03 00 01 00 00 00 54 80 04 08 34 00 00 00 |........T...4...|
00000020 00 00 00 00 00 00 00 00 34 00 20 00 01 00 00 00 |........4. .....|
00000030 00 00 00 00 01 00 00 00 00 00 00 00 00 80 04 08 |................|
00000040 00 80 04 08 6f 00 00 00 6f 00 00 00 05 00 00 00 |....o...o.......|
00000050 00 10 00 00 b0 04 b2 0c b9 63 80 04 08 cd 80 b0 |.........c......|
00000060 01 cd 80 48 65 6c 6c 6f 20 77 6f 72 6c 64 0a |...Hello world.|
-
\$\begingroup\$ That tutorial says "half of the values in this file violate some part of the ELF standard", whereas Adam says his is "the minimal possible ELF without doing any sneaky trickery from that article". Since the tutorial is cited by the OP, it looks like I misread the present question as requiring adherence to the ELF standard. OTOH, the OP says in a comment that it's "about limits of ELF", so I'm not sure. \$\endgroup\$r.e.s.– r.e.s.2012年05月04日 11:51:57 +00:00Commented May 4, 2012 at 11:51
-
\$\begingroup\$ On my x86-64 machine running 32-bit Ubuntu, the 60-byte executable outputs "Hello, world!\n", then causes a "Segmentation fault" message. (Neverminding the run-time error, the OP requires just "Hello world\n" -- saving you two more bytes.) \$\endgroup\$r.e.s.– r.e.s.2012年05月04日 14:06:55 +00:00Commented May 4, 2012 at 14:06
-
\$\begingroup\$ This was the result of a missing system exit call, and I've corrected for this at the cost of 3 bytes. I've also added a 111 'clean' version, in which all of the headers are set correctly. \$\endgroup\$primo– primo2012年05月04日 16:42:33 +00:00Commented May 4, 2012 at 16:42
-
2\$\begingroup\$ I haven't actually read the ELF standard, but I'd be surprised if it explicitly said that the headers can't overlap (as long as all the values are still set according to spec). So I'd guess the 8 byte saving ought to be fine. \$\endgroup\$Ilmari Karonen– Ilmari Karonen2012年05月04日 21:44:12 +00:00Commented May 4, 2012 at 21:44
-
\$\begingroup\$ The 61-byte solution given here is largely equivalent to the 62-byte solution I have on my page, the extra char being the comma that's missing from this version. The 57-byte version only works on certain versions of the 2.4 kernel; by loading to address zero, it allows for greater interspersing of code inside the header fields. (See the above web page for a link and more details.) \$\endgroup\$breadbox– breadbox2012年05月07日 08:48:21 +00:00Commented May 7, 2012 at 8:48
7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 03 00 01 00 00 00 54 80 04 08 34 00 00 00 00 00 00 00 00 00 00 00 34 00 20 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 80 04 08 00 80 04 08 74 00 00 00 74 00 00 00 05 00 00 00 00 10 00 00 B0 04 31 DB 43 B9 69 80 04 08 31 D2 B2 0C CD 80 31 C0 40 CD 80 48 65 6C 6C 6F 20 77 6F 72 6C 64 0A. \$\endgroup\$