Euler's Totient Function in x86 assembly (MASM)

Question 1

My task was to implement Euler's totient function, and I'm looking for any and all criticism.

.386 
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD 
.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
 x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
 push ebp ; setup stack, save regs
 mov ebp, esp
 push ebx
 push ecx
 push edx
 mov ecx, x ; init result as x
; Consider the prime factors of x and subtract their multiples
; from the result
 mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
 mul ebx 
 cmp DWORD PTR x, eax ; loop if( x >= p * p)
 jnge endL3;
; Check if p is a prime factor
 cdq
 mov eax, x
 div ebx 
 cmp edx, 0 ; if(x % p == 0)
 jne noprime
;if yes, update x and result 
L2: cdq 
 mov eax, x
 div ebx
 cmp edx, 0 ; if(x % p == 0)
 jne endL2
 mov x , eax ; x /= p 
 jmp L2
endL2:
 cdq
 mov eax, ecx
 div ebx
 sub ecx, eax ;result -= result / p
noprime: 
 inc ebx 
 jmp L3
endL3: 
; If x has a prime factor greater than sqrt(x)
 cmp DWORD PTR x, 1
 jle finish
 cdq
 mov eax, ecx
 div DWORD PTR x
 sub ecx, eax
finish:
 mov eax, ecx
 pop edx
 pop ecx
 pop ebx
 pop ebp
 ret 4
main:
 push 100
 call _phi@4
 INVOKE ExitProcess, 0
 END main

Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?

Question 2

How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.

Question 3

To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.

Question 4

you need to define efficient so that we know if you're talking about size or speed

Question 5

fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache

Question 6

In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.

Question 7

You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.

In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.

Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.

Question 8

test vs cmp

Question 9

@LưuVĩnhPhúc I've incorporated that link into my answer.

Question 10

I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.

If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.

Question 11

I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.

score 18 · Accepted Answer · 2018-04-05 03:50:31Z

You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.

In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.

Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.

1

\$\begingroup\$ test vs cmp \$\endgroup\$

phuclv
– phuclv

2018年04月05日 16:05:53 +00:00
Commented Apr 5, 2018 at 16:05
\$\begingroup\$ @LưuVĩnhPhúc I've incorporated that link into my answer. \$\endgroup\$

1201ProgramAlarm
– 1201ProgramAlarm

2018年04月05日 19:55:05 +00:00
Commented Apr 5, 2018 at 19:55

Stack Exchange Network

Euler's Totient Function in x86 assembly (MASM)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Euler's Totient Function in x86 assembly (MASM)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions