These are a few string routines (just the mem* ones). I've tried to optimize them as best I can without letting them get too big, but I'm unsure whether I've done a good job. I'd prefer size over speed, unless the speed gain costs only a few extra bytes, in which case that would be fine. I'd also prefer not to sacrifice simplicity for speed.
memchr.S (related):
.globl memchr
memchr:
mov %rdx, %rcx
movzbl %sil, %eax
repne scasb
lea -1(%rdi), %rax
test %rcx, %rcx
cmove %rcx, %rax
ret
memcmp.S:
.globl memcmp
memcmp:
mov %rdx, %rcx
repe cmpsb
movzbl -1(%rdi), %eax
movzbl -1(%rsi), %edx
sub %edx, %eax
ret
memcpy.S:
.globl memcpy
memcpy:
mov %rdx, %rcx
mov %rdi, %rax
rep movsb
ret
memmove.S:
.globl memmove
memmove:
mov %rdx, %rcx
mov %rdi, %rax
cmp %rdi, %rsi
jge 0f
dec %rdx
add %rdx, %rdi
add %rdx, %rsi
std
0: rep movsb
cld
ret
memrchr.S:
.globl memrchr
memrchr:
mov %rdx, %rcx
add %rdx, %rdi
movzbl %sil, %eax
std
repne scasb
cld
lea 1(%rdi), %rax
test %rcx, %rcx
cmove %rcx, %rax
ret
memset.S:
.globl memset
memset:
mov %rdx, %rcx
mov %rdi, %rdx
movzbl %sil, %eax
rep stosb
mov %rdx, %rax
ret
As usual for Stack Exchange sites, this code is released under CC/by-sa 3.0, but any future changes can be accessed here.
3 Answers
The code looks straightforward and really optimized for size and simplicity.
There's a small detail that I would change, though: replace cmove with cmovz, to make the code more expressive. It's not that "being equal" is of any interest here; it's the zeroness of %rcx that is interesting.
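For example, the tail of memchr with the mnemonic swapped (cmove and cmovz assemble to the same opcode, so this only changes how the intent reads):
lea -1(%rdi), %rax
test %rcx, %rcx       # what we actually care about: did the count hit zero?
cmovz %rcx, %rax      # same encoding as cmove, clearer name
ret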
I like the omitted second jmp in memmove. It's obvious after thinking about it for a few seconds.
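Roughly, and only as a sketch of my reading of that remark, a version that keeps the two copy paths separate would need an extra jump, something like this (not code from the question):
cmp %rdi, %rsi
jge 0f
dec %rdx
add %rdx, %rdi
add %rdx, %rsi
std
rep movsb
cld
jmp 1f                # the extra jump the posted version saves
0: rep movsb
1: ret
The posted code instead lets both paths share the 0: rep movsb; cld; ret tail, the unconditional cld being harmless on the forward path.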
According to this quote, it's OK to rely on the direction flag always being cleared. I still suggest writing a few unit tests to be on the safe side.
- See my answer for a bug that I found on my own (found by writing unit tests). – S.S. Anne, Aug 16, 2019 at 22:57
There's a bug in your memchr: if it finds %sil in the very last byte of the buffer, %rcx has counted down to zero even though the byte was found, so the cmove incorrectly returns NULL.
To fix that, do something like this:
.globl memchr
memchr:
mov %rdx, %rcx
movzbl %sil, %eax
repne scasb
sete %cl              # remember whether scasb stopped on a match (ZF set)
lea -1(%rdi), %rax    # candidate pointer to the matching byte
test %cl, %cl
cmovz %rcx, %rax      # not found: %rcx has counted down to zero, so return NULL
ret
The same applies to memrchr.
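Untested, but applying the same pattern to the posted memrchr would presumably mean replacing everything from repne scasb down, keeping the setup above it as is:
repne scasb
cld                   # cld does not touch ZF, so the scasb result survives
sete %cl              # remember whether scasb stopped on a match
lea 1(%rdi), %rax     # candidate pointer to the matching byte
test %cl, %cl
cmovz %rcx, %rax      # not found: %rcx is zero here, so return NULL
ret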
In memmove you have the following:
cmp %rdi, %rsi
jge 0f
(cmp rsi, rdi in Intel syntax, I believe.) For rsi = 8000_0000_0000_0000h and rdi = 7FFF_FFFF_FFFF_FFFFh (a case where we want to jump and do a forward move), the signed conditional branch "jump if greater or equal" treats rsi as less than rdi, because rsi is a negative number in 64-bit two's complement while rdi is positive. So it doesn't jump and does a backwards move, which is incorrect. Pointers should be compared as unsigned values: use the equivalent unsigned branch "jump if above or equal", jae, instead.
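For reference, a minimal sketch of the routine with only that branch changed and everything else kept as posted:
.globl memmove
memmove:
mov %rdx, %rcx
mov %rdi, %rax        # return the original destination
cmp %rdi, %rsi
jae 0f                # unsigned: src >= dst, a plain forward copy is safe
dec %rdx
add %rdx, %rdi        # src < dst and possibly overlapping:
add %rdx, %rsi        # point at the last byte of each buffer
std                   # and copy backwards
0: rep movsb
cld
ret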
- Isn't this only an issue when a userspace address and a kernelspace address are mixed? – user555045, Aug 17, 2019 at 14:26
- How likely is this in reality, though? The least you could expect is a segfault. – S.S. Anne, Aug 17, 2019 at 14:27
- It may not be an issue depending on the operating system / address-space layout. However, if it does happen, then a wrong move direction (if the buffers are actually overlapping) will result in silently corrupting the destination buffer. – ecm, Aug 17, 2019 at 14:29
- This won't ever happen because addresses are only actually used up to 48 bits, much less than 0x8000000000000000. – S.S. Anne, Sep 25, 2019 at 21:40