Please critique this very, very basic routine which returns the length of a given char buffer or "string."
strlen: ; NOTE: RDI IS THE DEFAULT SRC FOR SCASB
push rdi
push rcx
xor rcx, rcx
mov rcx, -1
xor al, al
cld
repne scasb
neg rcx
sub rcx, 1
mov rax, rcx
pop rcx
pop rdi
ret
1 Answer
Saving rcx is usually not necessary, because it is not callee-saved in the common calling conventions. On Linux (and similar systems), rdi also does not need to be saved. I guess you're saving it because in the Win64 calling convention rdi is callee-saved rather than an argument register. You can save them anyway if you want, which can be useful if you're using a custom calling convention. Pushing an even number of registers leaves the stack misaligned, though. The call already pushed an 8-byte return address, so after an even number of 8-byte pushes rsp is still not 16-byte aligned. You will probably get away with that here. But if, for example, you call a function that spills XMM registers, it may save them at locations it assumes are 16-byte aligned, and there are some other cases where misalignment causes trouble.
xor rcx, rcx
mov rcx, -1
The xor is not useful. Zeroing rcx before you overwrite it is not needed for correctness, and it doesn't help performance either, because a mov into a 64-bit (or 32-bit) register already has no dependency on the register's previous value. By the way, when you do want to zero a 64-bit register, you can use a 32-bit xor, since writing to the low 32 bits of a register zeroes the upper half of the 64-bit register. There is no real performance difference, but the 32-bit version often saves the REX prefix. The exception is when one of the "numbered registers" (r8 to r15) is an operand, because the prefix is needed anyway.
Because -x - 1 = (~x + 1) - 1 = ~x (using the two's-complement definition -x = ~x + 1), and you don't use the flags set by the sub, the sequence
neg rcx
sub rcx, 1
mov rax, rcx
is equivalent to:
not rcx
mov rax, rcx
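So, all combined, this function could be simplified slightly to the following, assuming saving rdi and rcx is useful. The cld can go because both the System V and Win64 calling conventions require the direction flag to be clear on function entry. Also, xor eax, eax replaces xor al, al to avoid a partial-register write: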
strlen:
push rdi
push rcx
mov rcx, -1
xor eax, eax
repne scasb
not rcx
mov rax, rcx
pop rcx
pop rdi
ret
How do you feel about xor ecx, ecx ; dec rcx (5 bytes) instead of mov rcx, -1 (7 bytes)? Or even lea rcx, -1[rax] (4 bytes)? But more importantly: comments. When it comes to asm, I'm a big fan of lots of comments. In particular, if registers are being saved for custom calling reasons (or whatever), you'd certainly want some comments saying so. – David Wohlferd, Jul 31, 2018 at 23:25 UTC
xor r32, r32 should be used even for the high-numbered registers, since xor r64, r64 is not recognized as a zeroing idiom on KNL (Knights Landing). @DavidWohlferd see "Set all bits in CPU register to 1 efficiently". – phuclv, Aug 1, 2018 at 2:05 UTC
@phuclv That link seems to like my lea rcx, -1[rax] solution, since we already have a zeroed register we can use (rax). – David Wohlferd, Aug 1, 2018 at 2:53 UTC
Note that if you want this strlen(rdi) to return the same value as size_t strlen(const char *s); from <string.h>, you still need to subtract 1 from rcx, because "strlen() calculates the length of the string pointed to by s, excluding the terminating null byte". – Christian Hujer, Jul 26, 2022 at 6:28 UTC
xor al, al. In general, avoid partial register updates like that.