The following routine is the block-allocation procedure in a fixed-size allocator being written for library use. It is designed to be accessed from C++ (the un-mangled symbol name is void * Superblock::alloc_block()
). The performance of this routine is the critical to the library; and I, with almost no prior experience in assembly, am looking for suggestions about performance and also any holes in the logic. It seems to be working correctly right now, but I'm worried about edge cases.
Execution context: written for x86_64 System V abi, full allocator may be found here: https://github.com/cmura81/experimental-allocator.
alloc_sb_alloc_block.s:
.globl __ZN10Superblock11alloc_blockEv
// 'this is passed in %rdi
// Variables:
// this->beginning = 0(%rdi) +8
// this->last_privately_freed_block = 8(%rdi) +8
// this->last_publicly_freed_block = 16(%rdi) +8
// this->block_size = 24(%rdi) +2
// this->free_blocks = 26(%rdi) +2
// this->max_blocks = 28(%rdi) +2
__ZN10Superblock11alloc_blockEv:
xorq %rcx, %rcx
// this->beginning -> rsi
movq 0(%rdi), %rsi
// allocated_block = this->last_privately_freed_block
movq 8(%rdi), %rax
// if this->last_privately_freed_block == NULL
testq %rax, %rax
jz __ZN10Superblock11alloc_blockEv.i_pubchalloc
// this->free_blocks--
decw 26(%rdi)
// Move 2 bytes from rax into edx
movzwl (%rax), %edx
notw %dx
jz __ZN10Superblock11alloc_blockEv.i_nbi_ffff
notw %dx
movslq %edx, %rcx
// this->last_privately_freed_block = this->beginning + next_block_index
addq %rcx, %rsi
movq %rsi, 8(%rdi)
ret
__ZN10Superblock11alloc_blockEv.i_nbi_ffff:
movq 0,ドル 8(%rdi)
ret
__ZN10Superblock11alloc_blockEv.i_pubchalloc:
movq 16(%rdi), %rax
// if this->last_publicly_freed_block == NULL
testq %rax, %rax
jz __ZN10Superblock11alloc_blockEv.i_noalloc
// this->free_blocks--
decw 26(%rdi)
// if *(uint16_t)(this->last_publicly_freed_block + next_block_index) == 0xFFFF
movzwl (%rax), %edx
notw %dx
jz __ZN10Superblock11alloc_blockEv.i_pubnbi_ffff
notw %dx
movslq %edx, %rcx
// this->last_privately_freed_block = this->beginning + next_block_index
addq %rcx, %rsi
movq %rsi, 16(%rdi)
ret
__ZN10Superblock11alloc_blockEv.i_pubnbi_ffff:
movq 0,ドル 8(%rdi)
ret
__ZN10Superblock11alloc_blockEv.i_noalloc:
movq 0,ドル %rax
ret
-
\$\begingroup\$ Don't you have some unit tests for the edge cases? Perhaps you should have posted them - it's often easier to find gaps in the testing than to spot the errors in the code. \$\endgroup\$Toby Speight– Toby Speight2018年05月29日 07:55:59 +00:00Commented May 29, 2018 at 7:55
1 Answer 1
These comments all have to do with optimization, not "logic holes." Also, they are written without having tested them. Obviously you'll want to both ensure correct function, and profile to see if they actually improve anything.
1) As an obvious first step:
xorq %rcx, %rcx
can be done as
xorq %ecx, %ecx
It produces the same effect, but it's 1 byte shorter.
2) It looks like you are using notw %dx
(in 2 different places) to see if dx is 0xffff? I might try cmp 0ドルxffff, %dx
instead. It's 2 bytes longer, but you don't have to "put it back" if it's not (the extra notw %dx
costs 3 bytes).
3) (Check me on this one) You are using movslq %edx, %rcx
to move edx to rcx. However because of how you populate edx, I believe it already clears the upper bits. Such being the case, a simple movq %rdx, %rcx
might suffice.
4) The __ZN10Superblock11alloc_blockEv.i_pubchalloc
loop should probably have a pause
inserted. Perhaps something like this:
__ZN10Superblock11alloc_blockEv.i_pubchalloc2:
pause
__ZN10Superblock11alloc_blockEv.i_pubchalloc:
movq 16(%rdi), %rax
// if this->last_publicly_freed_block == NULL
testq %rax, %rax
jz __ZN10Superblock11alloc_blockEv.i_pubchalloc2
See the docs on pause
for why this is a good idea.
These are all pretty nitpicky. Since there's only the 1 loop, there's just not that much to optimize.
-
\$\begingroup\$ Actually,
.i_pubchalloc
isn't a loop - it was a typo on my part; should have been.i_noalloc
, never got noticed because it's an extremely low-probability event \$\endgroup\$Lux– Lux2018年05月29日 02:51:59 +00:00Commented May 29, 2018 at 2:51 -
\$\begingroup\$ Ahh. I just assumed it was a multi-threading thing. \$\endgroup\$David Wohlferd– David Wohlferd2018年05月29日 02:59:24 +00:00Commented May 29, 2018 at 2:59
Explore related questions
See similar questions with these tags.