Assembly Language - Waikato Linux Users Group


AssemblyLanguage is a 1:1 translation of MachineCode into English mnemonics.

The Art of AssemblyLanguage Programming is a delicate topic. By programming in AssemblyLanguage you can hand-optimize code and achieve efficiency that is difficult, if not impossible, to duplicate in a higher level language. However, current computers are fast enough that most code can be written in less efficient higher level languages. AssemblyLanguage is still used for embedded systems (where space and CPU speed are limited), and in parts of an OperatingSystem that are run very frequently or must run fast (InterruptHandlers etc.). Some parts of the GNU C library are also written in assembly for the same reasons (for example, some of the maths functions).

AssemblyLanguage code is not portable across different CPU architectures, of which there are many: Intel x86, MIPS, and the Motorola m68000 series, to name but a few. Early versions of Unix were written in assembler, and when BellLabs got new machines, they re-wrote their operating system for the new MachineCode, until they finally re-wrote most of it in C in 1973.

AssemblyLanguage code is difficult to understand and maintain. It is usually easier to start from scratch than to debug faulty code.

A Compiler such as GCC hides the AssemblyLanguage code it generates on the way to producing object files and executables. It is however possible to make it write out the AssemblyLanguage code instead by passing it the -S CommandLine option.

Here is an example. First, the C code:

#include <stdio.h>
int main(void) {
 int i;
 i = 5;
 i = i * 3;
 printf("%d\n",i);
 i = 0xff;
 return i;
}

Now you can translate this to assembler. If I do this on an x86 (i.e. an Intel machine), I get:

$ gcc -S x.c && cat x.s
 .file "x.c"
 .section .rodata
.LC0:
 .string "%d\n"
 .text
.globl main
 .type main, @function
main:
 pushl %ebp
 movl %esp, %ebp
 subl $8, %esp
 andl $-16, %esp
 movl $0, %eax
 addl $15, %eax
 addl $15, %eax
 shrl $4, %eax
 sall $4, %eax
 subl %eax, %esp
 movl $5, -4(%ebp)
 movl -4(%ebp), %edx
 movl %edx, %eax
 addl %eax, %eax
 addl %edx, %eax
 movl %eax, -4(%ebp)
 subl $8, %esp
 pushl -4(%ebp)
 pushl $.LC0
 call printf
 addl $16, %esp
 movl $255, -4(%ebp)
 movl -4(%ebp), %eax
 leave
 ret
 .size main, .-main
 .section .note.GNU-stack,"",@progbits
 .ident "GCC: (GNU) 3.4.6"

movl, addl, subl, etc are mnemonics for individual CPU instruction OpCodes. %esp, %ebp etc are mnemonics for registers. For example, %esp is the Stack Pointer - it points to the top of the current process's Stack. The first movl copies the value in %esp into %ebp, then the subl subtracts 8 from %esp, so that the Stack has grown by 8 bytes (on x86 the Stack grows towards lower addresses). Further down, the movl that stores 5 writes it into the Stack, 4 bytes below the address held in %ebp. This address is where the variable i is being stored, so all accesses to i in the C code become references to this memory location in MachineCode. We can also witness an optimization here: instead of doing i*3, it does (i+i)+i. That's the pair of addl instructions. Below that, it pushes printf's arguments (the value of i and a pointer to the format string) on the stack and calls printf, which reads its arguments from the stack.

As you can see, explaining what AssemblyLanguage code is doing line-by-line is tediously boring. This is how programmers used to write code, and it is a common fact that AssemblyLanguage programmers get paid more per line of code than those who hack away in higher level languages.

We can also note that it is extremely bad for your health to rely on the GCC output of some C code when learning x86 AssemblyLanguage. GCC generates extremely horrid code on occasion, especially when working with multiplication and division, because the x86 multiplication and division instructions are restricted in the registers they can use.

However, the output of GCC can be a tremendously useful resource when optimising C code. Especially when mixing different sizes of integers (char, int, long), the resulting MachineCode is sometimes flooded with unexpected type-conversion instructions. While concealed at the C level, these extra instructions are quite obvious in the AssemblyLanguage (lots of and instructions and often additional mov).

Another sample piece of AssemblyLanguage code for Linux can be found in the HelloWorld page.


CategoryProgrammingLanguages
