Exercise description:
"Write a program that takes a double word (4 bytes) as an argument, and then adds all the 4 bytes. It returns the sum as output. Note that all the bytes are considered to be of unsigned value.
Example: For the number 03ff0103 the program will calculate 0x03 + 0xff + 0x01 + 0x3 = 0x106, and the output will be 0x106
HINT: Use division to get to the values of the highest two bytes."
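As a reference for checking the assembly, here is a small C model of the exercise. C is used only as a readable stand-in for the arithmetic, and the function names are hypothetical. Following the hint, the high word is reached by dividing by 0x10000; each word then contributes its two bytes, treated as unsigned values.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of the two unsigned bytes of a 16-bit word. */
uint32_t word_byte_sum(uint32_t word)
{
    return (word & 0xFFu) + ((word >> 8) & 0xFFu);
}

/* Hypothetical reference model: sum of all four bytes of a dword,
   using division (as the hint suggests) to reach the high word. */
uint32_t dword_byte_sum(uint32_t x)
{
    uint32_t low  = x % 0x10000u;   /* bytes 0 and 1 */
    uint32_t high = x / 0x10000u;   /* bytes 2 and 3 */
    return word_byte_sum(low) + word_byte_sum(high);
}
```

With the value from the exercise, dword_byte_sum(0x03ff0103) gives 0x04 for the low word plus 0x102 for the high word, i.e. 0x106.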
Full description on GitHub: xorpd
The code I've written:
format PE console
entry start
include 'win32a.inc'
; ===============================================
section '.text' code readable executable
start:
mov eax, 0x01020304
xor ebp, ebp
process_eax:
movzx ebx, al
add ecx, ebx
movzx ebx, ah
add ecx, ebx
cmp ebp, 0x1
je print_result
xor edx, edx
mov ebx, 0xffff
div ebx
mov ebp, 0x1
jmp process_eax
print_result:
mov eax, ecx
call print_eax ; Provided by the teacher. Prints eax to the console.
exitProgram:
; Exit the process:
push 0
call [ExitProcess]
include 'training.inc'
I think it works. I've tried it with different values and the sums were correct.
Screenshot with the output of the code above (with 0x01020304 as the hardcoded value).
But it's surely not the most efficient way to solve the exercise.
Since you're still learning, I won't cheat you out of the opportunity to discover for yourself, but I will offer some words of advice on how you can improve your program.
Minimize register use
The current code uses eax, ebx, ecx, edx and ebp. One of the most important things for an assembly language programmer is to use registers efficiently and effectively. This particular task can easily be done with just two registers.
Prefer shift to division
As alluded to in a comment, shift instructions are typically much faster to execute than divide instructions. For that reason, in tasks like this, it's much more common to see a shift than a divide.
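For unsigned operands, dividing by 0x10000 and shifting right by 16 yield the same high word, which is why the shift can stand in for the DIV here. A minimal C check of that equivalence (the function names are hypothetical; optimizing C compilers typically turn such an unsigned division by a power of two into a shift on their own):

```c
#include <assert.h>
#include <stdint.h>

/* High word via division, as the exercise hint suggests (DIV). */
uint32_t high_word_by_div(uint32_t x)
{
    return x / 0x10000u;
}

/* High word via a logical right shift (SHR). For unsigned values
   this is the same as dividing by 2^16 = 65536. */
uint32_t high_word_by_shr(uint32_t x)
{
    return x >> 16;
}
```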
Avoid loops
Branching tends to be disruptive for processors, because a mispredicted branch forces the pipeline to throw away work already in progress. While modern desktop machines tend to compensate for this via branch prediction, speculative execution and large cache sizes, code often runs faster if loops and branches are avoided entirely. This can confer other benefits, such as a more predictable running time, which can be important for scheduling in Real Time Operating Systems (RTOS) and, in some kinds of cryptographic code, for providing some resistance to side-channel attacks.
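To illustrate the idea without spoiling the assembly exercise, here is the same computation as straight-line C: each byte is shifted down, masked and added, so the source contains no loop and no conditional branch. This is only a sketch, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Straight-line byte sum: four shift/mask/add steps, no loop, no branch. */
uint32_t byte_sum_branchless(uint32_t x)
{
    return (x & 0xFFu)            /* byte 0 */
         + ((x >> 8) & 0xFFu)     /* byte 1 */
         + ((x >> 16) & 0xFFu)    /* byte 2 */
         + (x >> 24);             /* byte 3, nothing left above it */
}
```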
There are a few errors in this program.
You build the result in ECX, but you did not clear that register beforehand. If the results are correct, as you stated, it's because ECX happened to be zero when your program started, and you got lucky.
To bring the high word down to the low word, you need to divide by 65536 (0x10000), not by 65535 (0xffff) like you did.
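The effect of the wrong divisor is easy to check numerically. In this C sketch (the function name is hypothetical), the bytes of the low word are added as in the question, then the value is divided by the given divisor and the two low bytes of the quotient (AL and AH after DIV) are added. For 0x01020304 both divisors happen to give the quotient 0x102, which is why the test with that value looked correct; for 0xdeadface, dividing by 0xffff gives 0xdeae instead of 0xdead, so the sum comes out one too high.

```c
#include <assert.h>
#include <stdint.h>

/* Models the question's two passes (assuming ECX starts at 0):
   add the bytes of the low word, divide by `divisor`, then add the
   two low bytes of the quotient. With 0x10000 this is the intended
   algorithm; with 0xffff it reproduces the bug. */
uint32_t byte_sum_with_divisor(uint32_t x, uint32_t divisor)
{
    uint32_t q = x / divisor;
    return (x & 0xFFu) + ((x >> 8) & 0xFFu)
         + (q & 0xFFu) + ((q >> 8) & 0xFFu);
}
```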
Optimizations.
Instead of dividing, a mere shift right by 16 would produce the same result.
Of course I noticed that the task hinted at using division, but then again a hint is just a hint, not something mandatory!
The second movzx ebx, ah could also be written as mov bl, ah, since the highest 24 bits of EBX are still zero (the preceding movzx ebx, al cleared them).
You're using EBP as a flag (values 0 and 1 only). You can replace cmp ebp, 0x1 by the shorter test ebp, ebp (2 bytes instead of 3). Remember to jump on the opposite condition: jnz print_result.
For the same reason, you can replace mov ebp, 0x1 by the shorter inc ebp (1 byte instead of 5).
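Your program, modified based on the above:
start:
mov eax, 0x01020304   ; value whose bytes are summed
xor ecx, ecx          ; ECX = running sum, must start at 0
xor ebp, ebp          ; EBP = pass flag (0 = low word, 1 = high word)
process_eax:
movzx ebx, al         ; add the low byte of AX
add ecx, ebx
mov bl, ah            ; add the high byte of AX (upper 24 bits of EBX are 0)
add ecx, ebx
test ebp, ebp         ; second pass already done?
jnz print_result
xor edx, edx          ; EDX:EAX is the dividend
mov ebx, 0x10000
div ebx               ; EAX = EAX / 65536, i.e. the high word
inc ebp               ; mark the second pass
jmp process_eax
print_result: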
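Your program, modified further to use one register fewer.
Only repeat the code when the division leaves a non-zero quotient in AX.
start:
mov eax, 0x01020304
xor ecx, ecx          ; ECX = running sum
process_eax:
movzx ebx, al         ; add the low byte of AX
add ecx, ebx
mov bl, ah            ; add the high byte of AX (upper 24 bits of EBX are 0)
add ecx, ebx
xor edx, edx          ; EDX:EAX is the dividend
mov ebx, 0x10000
div ebx               ; EAX = quotient = high word
test ax, ax           ; anything left to process?
jnz process_eax ;At most 1 time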
Your program, modified further to use two registers fewer and to prefer a shift over a divide.
Only repeat the code when the shift leaves a non-zero EAX.
start:
mov eax, 0x01020304
xor ecx, ecx          ; ECX = running sum
process_eax:
movzx ebx, al         ; add the low byte of AX
add ecx, ebx
mov bl, ah            ; add the high byte of AX (upper 24 bits of EBX are 0)
add ecx, ebx
shr eax, 16           ; bring the high word down; ZF is set if EAX becomes 0
jnz process_eax ;At most 1 time
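The control flow of this last version can be modelled in C as a do/while loop: add the two low bytes, shift right by 16, and go round again only while something is left (the JNZ after SHR). A sketch, with a hypothetical function name, whose comments map each step to the corresponding instruction:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the SHR/JNZ loop: at most two passes for a 32-bit value. */
uint32_t byte_sum_shr_loop(uint32_t x)
{
    uint32_t sum = 0;                 /* xor ecx, ecx */
    do {
        sum += x & 0xFFu;             /* movzx ebx, al / add ecx, ebx */
        sum += (x >> 8) & 0xFFu;      /* mov bl, ah / add ecx, ebx */
        x >>= 16;                     /* shr eax, 16 */
    } while (x != 0);                 /* jnz process_eax */
    return sum;
}
```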
… eax & 0xFF to the sum and shift eax right by 8 would suffice, wouldn't it?
… mov ebx, 0xffff / div ebx supposed to do? Does your code work for 0xdeadface?
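The first comment suggests an even simpler loop: add the lowest byte (eax & 0xFF) and shift eax right by 8 until nothing is left. Sketched in C with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Byte-at-a-time variant: add the lowest byte, then shift it out.
   The loop stops as soon as the remaining value is zero, so it runs
   at most four times for a 32-bit input. */
uint32_t byte_sum_by8(uint32_t x)
{
    uint32_t sum = 0;
    while (x != 0) {
        sum += x & 0xFFu;   /* eax & 0xFF */
        x >>= 8;            /* shr eax, 8 */
    }
    return sum;
}
```

For 0xdeadface it returns 0xde + 0xad + 0xfa + 0xce = 0x353, which makes a convenient test value for the original program.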