I try to learn (flat-, FASM-) Assembly currently.
For to become more familiar with the concept of branching I made a tiny program. Inspired by an exercise I had to do once in school when learning PHP.
"Write a program which takes an integer given by the user. The program then calculates and prints the square numbers from 1 to the given number."
Here my source code. I have added comments in the code. Trying to explain my thoughts and motivations.
format PE console
entry start
include 'win32a.inc'
; ===========================================
; = Displays the square number from 1 until =
; = number n. n is given by the user. =
; = @param [ integer ] n - The upper =
; = limit. The last integer to calculate =
; = the square number of. =
; ===========================================
; - Example --------------------------------
; 100 // User have entered 100.
; 1
; 4
; 9
; 10 // 16 in base 10
; 19 // 25 in base 10
; 24 // 36 in base 10
; 31 // ...
; 40
; 51
; 64
; 79
section '.text' code readable executable
start:
call read_hex ; Provided by teacher. Reads in a hexadecimal number from stdin.
mov ecx, eax
mov ebx, 1 ; First number ...
cmp eax, 1000000 ; When the number received is smaller the upper limit ...
jle createSquareNumbers ; ... then you do not have to override it.
mov ecx, ebp ; ... otherwise you override it with the upper limit.
createSquareNumbers:
mov edx, 0
mov eax, ebx
mul ebx ; Multiply the number with itself.
call print_eax ; ; Provided by teacher. Prints eax to stdin.
inc ebx ; Go to the next integer.
cmp ebx, ecx
jle createSquareNumbers
push 0
call [ExitProcess]
include 'training.inc'
It seems to work as expected.
But I'm sure there's a lot of room for improvements ...
So therefore: Hints and comments from more experienced Assembly programmers very appreciated. :)
-
\$\begingroup\$ assembly questions need to state which instruction set architecture (ISA) they are targeting. Is this x86 assembly? \$\endgroup\$Toby Speight– Toby Speight2023年02月11日 15:03:31 +00:00Commented Feb 11, 2023 at 15:03
3 Answers 3
cmp eax, 1000000 ; When the number received is smaller the upper limit ... jle createSquareNumbers ; ... then you do not have to override it. mov ecx, ebp ;
I suspect that the upper limit is in the ebp
register. (Why else would you move it to ecx
when the input is too large?) Then it would be better to not use the immediate value and just write:
cmp eax, ebp
createSquareNumbers: mov edx, 0
Prior to the multiplication it is not necessary to zero the edx
register. It's the division that needs this. This way you'll shave off some bytes.
You should test for overflow on the multiplication! You are printing results from the EAX
register only, but look what happens when the square exceeds the 32 bits of EAX
. You should use this overflow (EDX
<> 0) as an extra reason to exit the loop.
You should write in a more readable fashion.
I suggest you make all the instructions, operands, and comments start in their own columns.
start:
call read_hex ; Provided by teacher. Reads in a hexadecimal number from stdin.
mov ecx, eax
mov ebx, 1 ; First number ...
cmp eax, ebp ; When the number received is smaller the upper limit ...
jle createSquareNumbers ; ... then you do not have to override it.
mov ecx, ebp ; ... otherwise you override it with the upper limit.
-
\$\begingroup\$ Concerning "cmp eax, ebp" : Uuhh, ... I got an error in there. You get it right. Thanks again. \$\endgroup\$michael.zech– michael.zech2017年02月12日 15:09:35 +00:00Commented Feb 12, 2017 at 15:09
The biggest problem here—and generally with assembly—is the use of registers in the code without explicit documentation of what those registers contain, e.g., in the form of comments. Certain assumptions can be made based on common calling conventions (like that functions will return their result in eax
), but that should still be documented for clarity. In other cases, it is not quite so obvious. For example, in this code, ebp
apparently contains the "upper limit", but other than assuming that the code works and reverse-engineering its logic, I have no basis for that assumption. In fact, in the current implementation, the contents of the ebp
register are a pre-condition for the function, so its use should unarguably be documented as part of the function's signature.
I further assume that ebp
is actually just 1000000
, in which case this portion of your code is overly verbose (and thus inefficient). You can simply do:
cmp eax, 1000000 ; see if input is over the upper limit
jle createSquareNumbers ; if not, skip the next line
mov eax, 1000000 ; if it is, bound it
If you are targeting the Pentium Pro or later (so basically, any modern x86 processor) you can use conditional move instructions like so:
cmp eax, ebp
cmovg eax, ebp
In some cases, conditional moves will make the code faster by eliminating the possibility of branch mispredictions. In this case, you won't see any performance improvement, but these instructions are still something to be aware of and this does make the code slightly easier to read. (Assuming that you want to keep the dependency on the ebp
register, which might be handy if you want to make the upper limit variable. Otherwise, just hard-code the 1000000 and use the first version. Better yet, define a constant for that upper bound so you can give it a readable name.)
mov edx, 0
This is completely correct code for setting a register to 0. However, it is not idiomatic x86 assembly because it is inefficient. Both compilers and expert assembly-language programmers will instead write this as:
xor edx, edx
This works because bitwise XORing any value with itself gives you back 0. The advantage is that it is shorter (2 bytes instead of 5 bytes) and faster.
The only time you would ever want to use mov reg, 0
is if you didn't want to clobber the flags, which is handy when you're preparing to do a conditional move and in certain other tightly hand-optimized code. Otherwise, whenever you want to clear a register, think xor reg, reg
. While this is arguably a micro-optimization, you really do need to know this idiom so you can read other peoples' (and compilers') assembly code, so you might as well use it yourself.
I can tell that you're working with signed integer values because you're using the l
and g
condition codes. If you were working with unsigned integer values, you'd be using the b
and a
condition codes. Therefore, you should be doing a signed integer multiply (imul
instead of mul
). In fact, imul
is preferable for a number of reasons, chief among them is the fact that it has a normal, two-operand form that works just like any other instruction, which means you don't have to worry about the hard-coded register operands that mul
requires. This makes the code easier to optimize and, frankly, easier to read. You can use imul
for unsigned multiplications, too, as long as you aren't worried about overflow, since the lower 32 bits are always the same. (And clearly you aren't worried about that here, since your print_eax
function used to display the output can only print 32-bit values!)
There is, unfortunately, one bug lying in wait here:
You are only checking the upper bound, so what happens when the input value is 0? Your code will print 1
, when instead it should print nothing! Also, I wonder if it is possible for read_hex
to return a negative value? I don't have the function's implementation, or any documentation for it, so I can't tell. If you're not confident that it won't ever return a negative value, your code needs to handle this case. Fortunately, you can fix both of these potential problems easily by adding a check for the lower bound before entering the loop.
And while we're nitpicking, some of your comments have "bugs", too:
call print_eax ; ; Provided by teacher. Prints eax to stdin.
That should be
stdout
, notstdin
! :-); =========================================== ; = Displays the square number from 1 until = ; = number n. n is given by the user. = ; = @param [ integer ] n - The upper = ; = limit. The last integer to calculate = ; = the square number of. = ; ===========================================
Your function does not actually take a parameter. Rather, it retrieves the input by calling the
read_hex
function. Therefore, this documentation is wrong, because it suggests that the function should be able to read a parameter from the normal register used for passing parameters in whatever calling convention it uses, which is not correct. It is important to document your functions, and it's important to document them accurately. In actuality, the function neither receives nor returns any values.
In terms of readability, you should align your code into vertical columns so that instruction opcodes are clearly separated from their operands. This makes it much easier to read and scan. Also, you should either vertically align your end-of-line comments, or put them above the code.
I would also rename your createSquareNumbers
label to printSquares
, since (1) it prints output rather than simply creating things, and (2) "numbers" is redundant. I also personally like to use PascalCase for labels to make them distinct, but whichever is fine, as long as you're consistent.
Finally, I would prefer to use the ecx
register as my loop "counter", since that's what it was originally intended for (the "c" stands for "counter"). Technically, this doesn't matter at all in modern x86 code, and you can treat all of the registers as general-purpose registers, using them interchangeably. And, indeed, in optimized code, you will want to do this. But when you're writing it by hand and don't necessarily care about speed, it is just nice and readable to use registers in a predictable way.
Here is how I would personally write the code for your function, taking all of these things that I've mentioned into account:
start:
; Provided by teacher: reads in a hexadecimal number from stdin,
; returning it in EAX.
call read_hex
; Check for underflow: if the input value is less than or equal to 0,
; we skip the loop altogether.
; NOTE: The naive way to do this check is "cmp eax, 0", but as with
; the XOR trick discussed above, "test eax, eax" is both
; shorter and faster, so that's what everyone uses when they
; want to test the value of a register against 0.
; (You still use CMP with memory operands, of course.)
test eax, eax
jle finished
; Check for overflow: if the input value is larger than 1000000,
; we bound it to be exactly 1000000.
cmp eax, 1000000
mov edx, 1000000
cmovg eax, edx
; Prepare to enter the loop:
mov edx, eax ; make a copy of the input
mov ecx, 1 ; initialize our counter
printSquares:
mov eax, ecx ; put the counter into our scratch register, EAX
imul eax, eax ; EAX = EAX * EAX
; Print the square (in EAX) by calling teacher-provided function,
; which prints the contents of EAX to stdout.
; It may or may not clobber EAX (we don't care),
; but it does not return a significant value.
; TODO: Make sure it doesn't clobber ECX or EDX!
call print_eax
inc ecx ; increment our counter
; See if we've reached the input value yet.
; If not, continue looping.
cmp ecx, edx
jle printSquares
finished:
; We're now done with the loop, so exit.
push 0
call [ExitProcess]
Warning: There is still potentially a bug in here, depending on how the print_eax
function behaves. Normally, in most calling conventions, a function is allowed to clobber the eax
, ecx
, and edx
registers. Therefore, to be on the safe side, you would have to assume that print_eax
could clobber the values in these registers, meaning that you would have to either pick different registers to store your temporary values in, or you would have to explicitly save the contents of these registers before calling print_eax
and restore them afterwards. However, if you know from the documentation or implementation of print_eax
that it does not clobber either ecx
or edx
, then you are safe. Your original code had this bug, too, yet it worked properly, so I assume this is safe.
-
\$\begingroup\$ I want to further note that modern CPUs directly understand that xor and sub with the same register as both arguments is a code to clear that register to zero. \$\endgroup\$JDługosz– JDługosz2017年02月13日 04:01:04 +00:00Commented Feb 13, 2017 at 4:01
-
\$\begingroup\$ Another case when you want a
mov reg, 0
is when you want to align code, avoiding placing anynop
s at all — such difference in length of instructions can be handy. \$\endgroup\$Ruslan– Ruslan2017年02月13日 05:31:14 +00:00Commented Feb 13, 2017 at 5:31 -
1\$\begingroup\$ Yeah, I linked to a really comprehensive discussion of what modern CPUs understand about the XOR idiom. I didn't want to get into it any more here, even to provide a summary. It is a lot more complicated than your comment makes it sound, @JDługosz. Some CPUs recognize SUB, but not all. All CPUs that recognize a zeroing idiom recognize XOR as one. \$\endgroup\$Cody Gray– Cody Gray2017年02月13日 11:42:07 +00:00Commented Feb 13, 2017 at 11:42
-
1\$\begingroup\$ @ruslan No, that would be inefficient and ineffective. Why clobber a register and potentially introduce a false dependency for no reason? There are tons of pseudo-NOP operations that you can use for padding if you need it that all have the advantage of not clearing a register. Assemblers know about them and will insert them automatically if you use whatever
ALIGN
directive they provide. \$\endgroup\$Cody Gray– Cody Gray2017年02月13日 11:43:19 +00:00Commented Feb 13, 2017 at 11:43 -
1\$\begingroup\$ No modern version of Microsoft's compiler uses SUB in this way, nor any other compiler from a major vendor. I have seen VC 4.2 use SUB, and perhaps earlier versions did, too, but the processors of that vintage weren't even superscalar and didn't have register renaming, so none of this would be relevant. The first to have that was the Pentium Pro, and to my knowledge, it did not recognize SUB. Microsoft updated their compiler to use XOR in response. As discussed in the linked answer, some processors now do recognize SUB, but not all. @JDługosz \$\endgroup\$Cody Gray– Cody Gray2017年02月13日 11:53:14 +00:00Commented Feb 13, 2017 at 11:53
You can simplify the algorithm. This multiplication isn't required:
mul ebx ; Multiply the number with itself.
If we have n
and n2
, it's easy to compute (n+1)2
- that's just n2 + 2n + 1
, so all we need to do in each iteration is (in pseudo-code):
square += n
n += 1
square += n
That should be more energy-efficient than
square := n * n
n += 1