AT&T assembly - Basic loop & write - follow-up

Question 1

This is a follow-up question to this one: AT&T Assembly - Basic loop & write

The code loops to display "Hello, World!" ten times.

I used syscall instead of int $0x80, used a decrementing loop to avoid a useless instruction, and commented the code. Is there any way to make it better?

When debugging with GDB, it appears exit $0 is part of l. However, I would like it to be part of _start (since l represents the loop only). Is that possible?

.section .data
 hello: .string "Hello, world!\n"
 len = . -hello
 .equ EXIT, 60
 .equ WRITE, 1
 .equ STDOUT, 1
.section .bss
 # Write a str of length len on the standard output.
 .macro write str, len
 movq $WRITE, %rax
 movq $STDOUT, %rdi
 movq \str, %rsi
 movq \len, %rdx
 syscall
 .endm
 # Exit with the specified error code
 .macro exit code
 movq $EXIT, %rax
 movq \code, %rdi
 syscall
 .endm
.section .text
 .globl _start
 # Loops to display "Hello, World!" ten times.
 _start:
 movq $10, %r8 # Counter
 l:
 write $hello, $len
 dec %r8
 jnz l
 exit $0

Answer

This is much better.

Since this is v2, let me introduce you to a couple of slightly more advanced concepts.

First: The second time you execute write, what will be in (most of) those registers (rdi, rsi, rdx)?

It's easy to think in terms of C (or practically every other high level language), and how variables have 'scope.' But asm's registers (even when used with call) don't work that way.

There is nothing 'wrong' with using macros like this. It makes reading / maintaining much easier. And in more complex examples, you probably wouldn't be able to assume that these registers haven't been modified since the last write. But since we are critiquing this particular case, be aware that the code is slightly less efficient than it could be (since it re-assigns values that are already there 3 * 9 times).

Second (follows from the first):

write modifies a LOT of registers (as a percentage of how many registers there are). In addition to the ones you see (rax, rdi, rsi, rdx), there are also the registers 'clobbered' by using syscall (rcx, r11). It's going to be difficult to compose complex code if every time you want to write something, that many registers get destroyed.

This leads us to the concept of 'calling conventions' (something I've written about before). In short, you come up with (or use an existing) standard set of rules about what registers your macros/routines are allowed to modify, then avoid using those 'volatile' registers in the routines that call them (or save the contents of the registers yourself before making the call). And if the macros/routines need more registers than your 'rules' allow, they must push/pop all the additional registers to preserve their contents.
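To make the "save the registers yourself" option concrete, here is a minimal sketch, assuming Linux x86-64 and GNU as. The loop counter is deliberately kept in rcx, one of the registers syscall overwrites, so the caller has to push/pop it around the call. The label name again and the use of .ascii (no trailing NUL) are my choices for illustration, not part of the original code.

```asm
# Sketch: the caller preserves a volatile register around a syscall.
# Assumes Linux x86-64, assembled with GNU as and linked with ld.
    .section .data
hello:  .ascii "Hello, world!\n"    # .ascii: no trailing NUL (assumption, differs from the post)
len = . - hello

    .section .text
    .globl _start
_start:
    movq  $3, %rcx          # counter kept in rcx, which syscall clobbers
again:
    push  %rcx              # caller saves the volatile register
    movq  $1, %rax          # WRITE
    movq  $1, %rdi          # STDOUT
    movq  $hello, %rsi
    movq  $len, %rdx
    syscall                 # overwrites rax (return value), rcx and r11
    pop   %rcx              # restore the counter
    dec   %rcx
    jnz   again
    movq  $60, %rax         # EXIT
    movq  $0, %rdi
    syscall
```

Without the push/pop pair, rcx would hold whatever syscall left in it and the loop count would be garbage. Keeping the counter in a register the call does not touch (r8 in the original) avoids the save and restore entirely.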

Yes, this is starting to get a bit deeper. But register handling is something you need to start thinking about if you want to read/write asm. Additionally, it starts to explain some of the 'junk' you see when you disassemble your C code. Those push/pops you see around the calls and at the top of functions? This is why they are there.

I could talk about the costs and benefits of macros vs. routines, but this is probably enough for now.

A few more thoughts:

Add a comment at the top of the file. Even something simple like The code loops to display "Hello, World!" ten times.
Stylistically, perhaps . - hello? Negative hello seems weird.

Yes, nitpicky. But that's what you get when you ask for code reviews on such basic code.
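On the GDB question: one option, assuming GNU as targeting ELF, is to give the loop a local label. Symbols that start with .L are not emitted to the object file's symbol table by default, so the debugger has no l symbol to attribute the trailing code to and should show everything under _start. The name .Lloop is just an illustration; the rest is the macro-expanded original.

```asm
# Sketch: a .L-prefixed local label keeps the loop out of the symbol table.
# Assumes Linux x86-64, GNU as, ELF output.
    .section .data
hello:  .string "Hello, world!\n"
len = . - hello

    .section .text
    .globl _start
_start:
    movq  $10, %r8          # Counter
.Lloop:                     # local label: not visible as a symbol in GDB
    movq  $1, %rax          # WRITE
    movq  $1, %rdi          # STDOUT
    movq  $hello, %rsi
    movq  $len, %rdx
    syscall
    dec   %r8
    jnz   .Lloop
    movq  $60, %rax         # EXIT
    movq  $0, %rdi
    syscall
```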

Edit1:

Hmm. You haven't accepted an answer yet. Were you looking for something else? Or perhaps I wasn't clear about what I was trying to explain? Let me take a shot at explaining this again:

Using macros isn't the same as defining routines. Instead of defining the code in one place and calling it from other places, the assembler pastes the entire macro body in wherever it is invoked.

So your code basically expands to:

movq $10, %r8
l:
 movq $1, %rax
 movq $1, %rdi
 movq $hello, %rsi
 movq $15, %rdx
 syscall
 dec %r8
 jnz l
movq $60, %rax
movq $0, %rdi
syscall

If you were to call write twice in a row, you'd get:

movq $10, %r8
l:
 movq $1, %rax
 movq $1, %rdi
 movq $hello, %rsi
 movq $15, %rdx
 syscall
 movq $1, %rax
 movq $1, %rdi
 movq $hello, %rsi
 movq $15, %rdx
 syscall
 dec %r8
 jnz l
movq $60, %rax
movq $0, %rdi
syscall

Now, if I'm not mistaken, the syscall you are using overwrites rax (uses it as a return value), but leaves the other parameters alone. Such being the case, it would be slightly more efficient to write:

movq $1, %rdi
movq $hello, %rsi
movq $15, %rdx
movq $10, %r8
l:
 movq $1, %rax # or perhaps movq %rdi, %rax?
 syscall
 dec %r8
 jnz l
movq $60, %rax
movq $0, %rdi
syscall

This doesn't lend itself well to macros. But if performance was more important to you than "maintainability," this would be (slightly) better.

Alternatively, you could do this as a routine. The existing code for write touches 6 of the 16 general-purpose x86-64 registers (rax, rdi, rsi, rdx, plus rcx and r11 via syscall), stomping on their existing values in order to make the call. Registers are a precious and limited resource. If you were doing anything much more complex, you would start to run out. Using routines allows you to bundle code in such a way that only a limited number of registers get modified, causing a minimum amount of disruption in the code that calls it.

For example, if you were to use the Microsoft x86 'fastcall' calling convention (a poor choice for 64-bit Linux, but useful as an illustration), then the first parameter gets placed in rcx, the second in rdx, and the return value (if any) goes in rax. rcx, rdx and rax can all be changed by the routine, but all other registers must be returned unchanged.

So, re-working write with this in mind, we get something like this:

# On entry:
# rcx points to the string to print
# rdx contains the length of the string
write:
push %rsi # Save the non-volatile registers we modify
push %rdi
push %r11
movq %rcx, %rsi # Move the string pointer to the correct register
movq $WRITE, %rax
movq $STDOUT, %rdi
syscall
# At this point:
# rax contains the return value from the call
# rcx/r11 have been clobbered
pop %r11 # Restore the registers and return
pop %rdi
pop %rsi
ret

You can call it like this:

mov $hello, %rcx
mov $len, %rdx
call write
# At this point, the contents of rcx and rdx are undefined.

Note that when I say "rcx and rdx are undefined," I mean they are undefined by definition. Yes, you can look at write and see what they would contain, but you pretend like you don't. This way someone can modify write to work slightly differently, and every place that calls it will still work correctly. As long as everybody follows the agreed-upon 'rules.'
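For comparison, here is a sketch of the same idea using the System V AMD64 convention, which is what C code on 64-bit Linux normally follows: arguments arrive in rdi and rsi, the result comes back in rax, and rbx is one of the registers a routine must preserve. The routine name write_str and the label again are hypothetical, and .ascii is used so no NUL byte is printed.

```asm
# Sketch: write as a routine under the System V AMD64 convention.
# Assumes Linux x86-64, GNU as. write_str is a hypothetical name.
    .section .data
hello:  .ascii "Hello, world!\n"    # no trailing NUL (assumption)
len = . - hello

    .section .text
# write_str: rdi = pointer to the bytes, rsi = number of bytes.
# Returns the syscall result in rax. Only caller-saved registers are modified.
write_str:
    movq  %rsi, %rdx        # length  -> third syscall argument
    movq  %rdi, %rsi        # pointer -> second syscall argument
    movq  $1, %rdi          # STDOUT
    movq  $1, %rax          # WRITE
    syscall                 # clobbers rax, rcx, r11: all caller-saved here
    ret

    .globl _start
_start:
    movq  $10, %rbx         # counter in a callee-saved register
again:
    movq  $hello, %rdi
    movq  $len, %rsi
    call  write_str         # rdi, rsi, rdx, rcx, r11 are undefined afterwards
    dec   %rbx              # rbx is guaranteed to survive the call
    jnz   again
    movq  $60, %rax         # EXIT
    movq  $0, %rdi
    syscall
```

Because every register the routine touches is already caller-saved under this convention, write_str needs no push/pop at all, and the caller only has to pick a preserved register for its counter.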

The implications here define a lot of how registers actually get used, both by compilers when they generate asm and by people who write their own. If you know how registers will be treated when you make a call, that helps you choose which registers to use for what. For example, write needs the length in rdx. So if you needed to count how many bytes were in a string before passing it to write, then using rdx when doing the count suddenly makes a lot more sense. And using rcx to hold your 1-10 loop counter would obviously be a poor choice, since it gets wiped out during each write.
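A sketch of that length-counting idea, assuming Linux x86-64 and GNU as: the string pointer goes straight into rsi and the byte count is accumulated in rdx, so both are already where the write syscall expects them when the loop ends. The labels msg, count and done are made up for the example.

```asm
# Sketch: count the string's bytes directly in rdx, then write it.
# Assumes Linux x86-64, GNU as.
    .section .data
msg:    .string "Hello, world!\n"   # .string appends the NUL we scan for

    .section .text
    .globl _start
_start:
    movq  $msg, %rsi        # pointer lands in write's second-argument register
    xorl  %edx, %edx        # rdx = 0 (writing edx zero-extends into rdx)
count:
    cmpb  $0, (%rsi,%rdx)   # is the byte at msg + rdx the terminating NUL?
    je    done
    inc   %rdx              # one more byte before the NUL
    jmp   count
done:
    movq  $1, %rax          # WRITE
    movq  $1, %rdi          # STDOUT
    syscall                 # rsi and rdx need no shuffling
    movq  $60, %rax         # EXIT
    movq  $0, %rdi
    syscall
```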

To sum up:

Using the write macro allows for easy-to-read code. It also allows you to call the macro (somewhat) generically. However, it has some limitations that may make it a bad choice for more complicated code, or if performance is a primary consideration.

That's what I see when I read this code.

David Wohlferd · Accepted Answer · 2016-12-27 04:24:13Z
