5
\$\begingroup\$

The following routine is the block-allocation procedure in a fixed-size allocator being written for library use. It is designed to be accessed from C++ (the un-mangled symbol name is void * Superblock::alloc_block()). The performance of this routine is the critical to the library; and I, with almost no prior experience in assembly, am looking for suggestions about performance and also any holes in the logic. It seems to be working correctly right now, but I'm worried about edge cases.

Execution context: written for x86_64 System V abi, full allocator may be found here: https://github.com/cmura81/experimental-allocator.

alloc_sb_alloc_block.s:

.globl __ZN10Superblock11alloc_blockEv
// 'this is passed in %rdi
// Variables:
// this->beginning = 0(%rdi) +8
// this->last_privately_freed_block = 8(%rdi) +8
// this->last_publicly_freed_block = 16(%rdi) +8
// this->block_size = 24(%rdi) +2
// this->free_blocks = 26(%rdi) +2
// this->max_blocks = 28(%rdi) +2
__ZN10Superblock11alloc_blockEv:
 xorq %rcx, %rcx
 // this->beginning -> rsi
 movq 0(%rdi), %rsi
 // allocated_block = this->last_privately_freed_block
 movq 8(%rdi), %rax
 // if this->last_privately_freed_block == NULL
 testq %rax, %rax
 jz __ZN10Superblock11alloc_blockEv.i_pubchalloc
 // this->free_blocks--
 decw 26(%rdi)
 // Move 2 bytes from rax into edx
 movzwl (%rax), %edx
 notw %dx
 jz __ZN10Superblock11alloc_blockEv.i_nbi_ffff
 notw %dx
 movslq %edx, %rcx
 // this->last_privately_freed_block = this->beginning + next_block_index
 addq %rcx, %rsi
 movq %rsi, 8(%rdi)
 ret
__ZN10Superblock11alloc_blockEv.i_nbi_ffff:
 movq 0,ドル 8(%rdi)
 ret
__ZN10Superblock11alloc_blockEv.i_pubchalloc:
 movq 16(%rdi), %rax
 // if this->last_publicly_freed_block == NULL
 testq %rax, %rax
 jz __ZN10Superblock11alloc_blockEv.i_noalloc
 // this->free_blocks--
 decw 26(%rdi)
 // if *(uint16_t)(this->last_publicly_freed_block + next_block_index) == 0xFFFF
 movzwl (%rax), %edx
 notw %dx
 jz __ZN10Superblock11alloc_blockEv.i_pubnbi_ffff
 notw %dx
 movslq %edx, %rcx
 // this->last_privately_freed_block = this->beginning + next_block_index
 addq %rcx, %rsi
 movq %rsi, 16(%rdi)
 ret
__ZN10Superblock11alloc_blockEv.i_pubnbi_ffff:
 movq 0,ドル 8(%rdi)
 ret
__ZN10Superblock11alloc_blockEv.i_noalloc:
 movq 0,ドル %rax
 ret
asked May 29, 2018 at 0:49
\$\endgroup\$
1
  • \$\begingroup\$ Don't you have some unit tests for the edge cases? Perhaps you should have posted them - it's often easier to find gaps in the testing than to spot the errors in the code. \$\endgroup\$ Commented May 29, 2018 at 7:55

1 Answer 1

3
\$\begingroup\$

These comments all have to do with optimization, not "logic holes." Also, they are written without having tested them. Obviously you'll want to both ensure correct function, and profile to see if they actually improve anything.

1) As an obvious first step:

xorq %rcx, %rcx

can be done as

xorq %ecx, %ecx

It produces the same effect, but it's 1 byte shorter.

2) It looks like you are using notw %dx (in 2 different places) to see if dx is 0xffff? I might try cmp 0ドルxffff, %dx instead. It's 2 bytes longer, but you don't have to "put it back" if it's not (the extra notw %dx costs 3 bytes).

3) (Check me on this one) You are using movslq %edx, %rcx to move edx to rcx. However because of how you populate edx, I believe it already clears the upper bits. Such being the case, a simple movq %rdx, %rcx might suffice.

4) The __ZN10Superblock11alloc_blockEv.i_pubchalloc loop should probably have a pause inserted. Perhaps something like this:

__ZN10Superblock11alloc_blockEv.i_pubchalloc2:
 pause
__ZN10Superblock11alloc_blockEv.i_pubchalloc:
 movq 16(%rdi), %rax
 // if this->last_publicly_freed_block == NULL
 testq %rax, %rax
 jz __ZN10Superblock11alloc_blockEv.i_pubchalloc2

See the docs on pause for why this is a good idea.

These are all pretty nitpicky. Since there's only the 1 loop, there's just not that much to optimize.

answered May 29, 2018 at 2:26
\$\endgroup\$
2
  • \$\begingroup\$ Actually, .i_pubchalloc isn't a loop - it was a typo on my part; should have been .i_noalloc, never got noticed because it's an extremely low-probability event \$\endgroup\$ Commented May 29, 2018 at 2:51
  • \$\begingroup\$ Ahh. I just assumed it was a multi-threading thing. \$\endgroup\$ Commented May 29, 2018 at 2:59

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.