28
\$\begingroup\$

My task was to implement Euler's totient function, and I'm looking for any and all criticism.

.386 
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD 
.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
 x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
 push ebp ; setup stack, save regs
 mov ebp, esp
 push ebx
 push ecx
 push edx
 mov ecx, x ; init result as x
; Consider the prime factors of x and subtract their multiples
; from the result
 mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
 mul ebx 
 cmp DWORD PTR x, eax ; loop if( x >= p * p)
 jnge endL3;
; Check if p is a prime factor
 cdq
 mov eax, x
 div ebx 
 cmp edx, 0 ; if(x % p == 0)
 jne noprime
;if yes, update x and result 
L2: cdq 
 mov eax, x
 div ebx
 cmp edx, 0 ; if(x % p == 0)
 jne endL2
 mov x , eax ; x /= p 
 jmp L2
endL2:
 cdq
 mov eax, ecx
 div ebx
 sub ecx, eax ;result -= result / p
noprime: 
 inc ebx 
 jmp L3
endL3: 
; If x has a prime factor greater than sqrt(x)
 cmp DWORD PTR x, 1
 jle finish
 cdq
 mov eax, ecx
 div DWORD PTR x
 sub ecx, eax
finish:
 mov eax, ecx
 pop edx
 pop ecx
 pop ebx
 pop ebp
 ret 4
main:
 push 100
 call _phi@4
 INVOKE ExitProcess, 0
 END main

Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?

Toby Speight
87.2k14 gold badges104 silver badges322 bronze badges
asked Apr 5, 2018 at 3:22
\$\endgroup\$
5
  • 2
    \$\begingroup\$ How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit. \$\endgroup\$ Commented Apr 5, 2018 at 8:59
  • 3
    \$\begingroup\$ To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it. \$\endgroup\$ Commented Apr 5, 2018 at 12:16
  • 3
    \$\begingroup\$ you need to define efficient so that we know if you're talking about size or speed \$\endgroup\$ Commented Apr 5, 2018 at 12:49
  • 4
    \$\begingroup\$ fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache \$\endgroup\$ Commented Apr 5, 2018 at 16:05
  • \$\begingroup\$ In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job. \$\endgroup\$ Commented Apr 5, 2018 at 17:32

2 Answers 2

18
\$\begingroup\$

You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.

In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.

Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.

answered Apr 5, 2018 at 3:50
\$\endgroup\$
2
  • 1
    \$\begingroup\$ test vs cmp \$\endgroup\$ Commented Apr 5, 2018 at 16:05
  • \$\begingroup\$ @LưuVĩnhPhúc I've incorporated that link into my answer. \$\endgroup\$ Commented Apr 5, 2018 at 19:55
4
\$\begingroup\$

I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.

If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.

answered Apr 5, 2018 at 23:09
\$\endgroup\$
1
  • 3
    \$\begingroup\$ I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea. \$\endgroup\$ Commented Apr 6, 2018 at 13:11

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.