7
\$\begingroup\$

These are a few string routines (just the mem* ones), written in x86-64 assembly using AT&T syntax. I've tried to optimize them as best I can without making them too big, but I'm unsure whether I've done a good job.

I prefer small code over fast code, although spending a few extra bytes for speed is fine. I would also prefer not to sacrifice simplicity for speed.

memchr.S (related):

.globl memchr
memchr:
 mov %rdx, %rcx
 movzbl %sil, %eax
 repne scasb
 lea -1(%rdi), %rax
 test %rcx, %rcx
 cmove %rcx, %rax
 ret

memcmp.S:

.globl memcmp
memcmp:
 mov %rdx, %rcx
 repe cmpsb
 movzbl -1(%rdi), %eax
 movzbl -1(%rsi), %edx
 sub %edx, %eax
 ret

memcpy.S:

.globl memcpy
memcpy:
 mov %rdx, %rcx
 mov %rdi, %rax
 rep movsb
 ret

memmove.S:

.globl memmove
memmove:
 mov %rdx, %rcx
 mov %rdi, %rax
 cmp %rdi, %rsi
 jge 0f
 dec %rdx
 add %rdx, %rdi
 add %rdx, %rsi
 std
0: rep movsb
 cld
 ret

memrchr.S:

.globl memrchr
memrchr:
 mov %rdx, %rcx
 add %rdx, %rdi
 movzbl %sil, %eax
 std
 repne scasb
 cld
 lea 1(%rdi), %rax
 test %rcx, %rcx
 cmove %rcx, %rax
 ret

memset.S:

.globl memset
memset:
 mov %rdx, %rcx
 mov %rdi, %rdx
 movzbl %sil, %eax
 rep stosb
 mov %rdx, %rax
 ret

As usual for Stack Exchange sites, this code is released under CC BY-SA 3.0, but any future changes can be accessed here.

asked Aug 16, 2019 at 19:38
\$\endgroup\$

3 Answers

5
\$\begingroup\$

The code looks straightforward and well optimized for size and simplicity.

There's one small detail I would change, though: write cmovz instead of cmove to make the code more expressive. The two mnemonics assemble to the same instruction, but "being equal" is of no interest here; what matters is whether %rcx is zero.

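I like the omitted second jmp in memmove: the backward path simply falls through into the same rep movsb that the forward path jumps to. It's obvious after thinking about it for a few seconds.

According to this quote, it's OK to rely on the direction flag being clear on function entry.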

I still suggest writing a few unit tests to be on the safe side.
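As a rough idea of what such tests could look like, here is a minimal C sketch for memchr. The names `memchr_fn` and `check_memchr` are hypothetical, not part of the original code, and the sketch assumes the assembly routine can be linked under a name that can be passed as a function pointer. The libc `memchr` serves as a stand-in below.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: any function with memchr's signature. */
typedef void *(*memchr_fn)(const void *, int, size_t);

/* Runs a few edge-case checks against the implementation under test
   and returns the number of checks that failed. */
static int check_memchr(memchr_fn fn)
{
    static const char buf[] = "abcd";   /* 4 bytes of interest plus the NUL */
    int failures = 0;

    failures += fn(buf, 'a', 4) != buf;       /* match in the first byte */
    failures += fn(buf, 'd', 4) != buf + 3;   /* match in the last byte */
    failures += fn(buf, 'x', 4) != NULL;      /* no match at all */
    failures += fn(buf, 'd', 3) != NULL;      /* match lies just past n */
    failures += fn(buf, 'a', 0) != NULL;      /* zero-length buffer */
    return failures;
}
```

Calling `check_memchr(memchr)` should return 0 for a conforming implementation. The last-byte and zero-length cases are the ones most likely to trip up a `rep`-based routine.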

S.S. Anne
answered Aug 16, 2019 at 21:15
\$\endgroup\$
1
  • \$\begingroup\$ See my answer for a bug that I found on my own (found by writing unit tests). \$\endgroup\$ Commented Aug 16, 2019 at 22:57
3
\$\begingroup\$

There's a bug in memchr when the byte in %sil is found in the last byte of the buffer at %rdi: repne scasb then stops with %rcx equal to zero even though the byte has been found, so the function incorrectly returns zero.

To fix that, test the zero flag left by the last comparison rather than the counter, something like this:

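.globl memchr
memchr:
 mov %rdx, %rcx          /* rcx = n, the scan count */
 movzbl %sil, %eax       /* al = byte to search for */
 repne scasb             /* scan until a match or until rcx reaches 0 */
 /* caveat: if n is 0 the scan never runs and ZF is whatever the caller left */
 sete %cl                /* cl = 1 if the last compare matched (ZF set) */
 lea -1(%rdi), %rax      /* rdi stops one past the last byte examined */
 test %cl, %cl
 cmovz %rcx, %rax        /* no match: rcx is 0 here, so return NULL */
 ret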

The same applies to memrchr.
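To illustrate why the counter alone is ambiguous, here is a small C model of the scan. It is only a sketch: `repne_scasb`, `memchr_by_counter` and `memchr_by_flag` are hypothetical names, and the model starts with the flag clear, whereas the real instruction leaves ZF untouched when the count is zero.

```c
#include <stddef.h>

/* Models `repne scasb`: compares bytes against al until one matches or
   the count runs out. Afterwards *rdi points one past the last byte
   examined and *rcx holds the remaining count. The return value models
   ZF: 1 if the last comparison matched, 0 otherwise. */
static int repne_scasb(const unsigned char **rdi, size_t *rcx, unsigned char al)
{
    int zf = 0;   /* assumption: flag clear on entry */
    while (*rcx != 0) {
        zf = (**rdi == al);
        ++*rdi;
        --*rcx;
        if (zf)
            break;
    }
    return zf;
}

/* Mirrors the original routine: decides based on the counter. */
static const void *memchr_by_counter(const void *s, int c, size_t n)
{
    const unsigned char *rdi = s;
    size_t rcx = n;
    repne_scasb(&rdi, &rcx, (unsigned char)c);
    return rcx == 0 ? NULL : rdi - 1;   /* wrong when the last byte matches */
}

/* Mirrors the fix: decides based on the match flag. */
static const void *memchr_by_flag(const void *s, int c, size_t n)
{
    const unsigned char *rdi = s;
    size_t rcx = n;
    int zf = repne_scasb(&rdi, &rcx, (unsigned char)c);
    return zf ? rdi - 1 : NULL;
}
```

Searching "abcd" for 'd' with n = 4 leaves the counter at zero in both variants, so only the flag-based one can tell a match in the last byte apart from no match.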

answered Aug 16, 2019 at 22:41
\$\endgroup\$
1
\$\begingroup\$

In memmove you have the following:

 cmp %rdi, %rsi
 jge 0f

(That is cmp rsi, rdi in Intel syntax, I believe.) Take rsi = 8000_0000_0000_0000h and rdi = 7FFF_FFFF_FFFF_FFFFh. The source lies above the destination, so we want to jump to the forward move. However, the signed conditional branch jge ("jump if greater or equal") evaluates rsi as "less than" rdi, because rsi is a negative number in 64-bit two's complement while rdi is positive. The branch is therefore not taken and the routine makes a backward move, which is incorrect. You should use the equivalent unsigned branch jae ("jump if above or equal") instead.
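The difference between the two branches can be sketched in C by comparing the same address bits as signed and as unsigned integers. The function names are hypothetical, and the cast to `int64_t` assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/* Models `cmp %rdi, %rsi` followed by `jge`: a signed comparison.
   Returns 1 if the branch to the forward move would be taken.
   Assumes the cast wraps values above INT64_MAX to negative numbers. */
static int forward_signed(uint64_t src, uint64_t dst)
{
    return (int64_t)src >= (int64_t)dst;
}

/* Models `cmp %rdi, %rsi` followed by `jae`: an unsigned comparison. */
static int forward_unsigned(uint64_t src, uint64_t dst)
{
    return src >= dst;
}
```

With src = 0x8000000000000000 and dst = 0x7FFFFFFFFFFFFFFF, `forward_unsigned` returns 1 while `forward_signed` returns 0. The two agree whenever both addresses have the same top bit.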

answered Aug 17, 2019 at 14:17
\$\endgroup\$
4
  • 1
    \$\begingroup\$ Isn't this only an issue when a userspace address and kernelspace address are mixed? \$\endgroup\$ Commented Aug 17, 2019 at 14:26
  • \$\begingroup\$ How likely is this in reality, though? The least you could expect is a segfault. \$\endgroup\$ Commented Aug 17, 2019 at 14:27
  • 1
    \$\begingroup\$ It may not be an issue depending on the operating system / address-space layout. However, if it does happen, then a wrong move direction (if the buffers are actually overlapping) will result in silently corrupting the destination buffer. \$\endgroup\$ Commented Aug 17, 2019 at 14:29
  • \$\begingroup\$ This won't ever happen because addresses are only actually used up to 48 bits, much less than 0x8000000000000000. \$\endgroup\$ Commented Sep 25, 2019 at 21:40
