I just made a working implementation of calculating a string's length, but I'm not exactly sure it's the most efficient way. I divide the string up in blocks of four bytes each (32 bits), and add 4 to EAX. Once I hit a completely NULL block, I roll back the count of EAX by 4 bytes, and then calculate the final block's size one byte at a time. It only works using ASCII characters, and is limited to 32 bit computing (though that can be easily upgraded to 64 bit soon). I'm using assembly on a Linux maching using NASM. Here's the code:
; Function to compute the length of a string in blocks
; this only works in ASCII, or else it may give the wrong character count.
; input register: EAX - the string
section .data
BYTE_COUNT equ 4 ;4 bytes = 32 bits
NULL_TERMINATOR equ 0 ;0,円 aka 0
section .text
global _strlen ;main entry point
_strlen:
push ebp ; c calling convention: preserve ebp
mov ebx, eax ; mov to ebx the string
jmp _NextBlock ; go to _NextChar
_NextBlock:
cmp word [eax], NULL_TERMINATOR ; Compare whether or not this block is completely null (e.g. 0)
jz _RollbackFinalBlock ; if it is, jump to _RollbackFinalBlock
; if not...
add eax, BYTE_COUNT ; Add to EAX the block size
jmp _NextBlock; repeat loop
_RollbackFinalBlock:
sub eax, BYTE_COUNT ;Subtract the block size from EAX, because it is null
jmp _FinalizeFinalBlock ; go to _FinalizeFinalBlock
_FinalizeFinalBlock:
cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
jz _Exit ;if this byte is null, proceed to _Exit
; if not...
inc eax ;increment EAX
jmp _FinalizeFinalBlock ;repeat loop
_Exit:
dec eax ; We will have a null terminator at the end. remove it.
sub eax, ebx ; compute difference to get answer, and store result in EAX
pop ebp ; load preserved EBP value from the stack
ret ; exit this function
You can test this function by linking it with a c program, and calling it from there.
The problem I have is that I have an extra jump to _RollbackFinalBlock and _FinalizeFinalBlock. Is there a way to make this implementation faster and more efficient?
2 Answers 2
I have a number of comments that I hope you find helpful.
Code comments
First, your code is generally well commented and I had little difficulty in understanding what the code was doing or why. That's a good thing. A few minor points, though. First, your comment for the overall function says
; input register: EAX - the string
however, it's probably useful to point out that EAX is actually a pointer to the string and isn't intended to contain a whole string.
Also, comments should tell something not obvious about the code. So this:
jmp _NextBlock; repeat loop
is arguably OK, but this:
inc eax ;increment EAX
is not really helpful. Instead you should say why the code is incrementing EAX
rather than simply repeating what the instruction does.
Structure
Overall, the structure was OK, but the line
jmp _NextBlock ; go to _NextChar
near the top of the function is not needed, since _NextBlock
is the next instruction anyway. The same is true of the jmp
just before _FinalizeFinalBlock
. Generally speaking, you should strive to eliminate unconditional jumps. So for example, your code includes this:
_FinalizeFinalBlock:
cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
jz _Exit ;if this byte is null, proceed to _Exit
; if not...
inc eax ;increment EAX
jmp _FinalizeFinalBlock ;repeat loop
_Exit:
But you could easily restructure that to eliminate the unconditional jmp
like this:
dec eax ;
_FinalizeFinalBlock:
inc eax ;increment EAX
cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
jnz _FinalizeFinalBlock ; keep going until final block is NULL
_Exit:
Algorithm
The most significant problem I see with the code is the algorithm that relies on strings having multiple trailing NUL bytes. This may be true for code that calls this function, but it's not generally true of C strings which typically end with only a single NUL character.
Instruction set use
The repnz scasb
instruction is probably going to be a lot more efficient that the loop you've constructed. I'd recommend reading about it and then incorporating that into your code.
Register use
The ebx
register is destroyed and not saved by the function. This may be fine for your purposes, but it should be documented in comments at the top of the function.
Also, the code doesn't actually alter ebp
so there isn't any point to pushing it onto the stack and popping it off at the end of the routine. (Maybe you meant ebx
?)
Update:
On my machine (quad core running 3.2GHz under 64-bit Linux) I found this version to be slightly (15%) faster than the corresponding repnz scasb
version:
strlen2:
mov edx, [esp + 4]
xor eax,eax ; count = 0
jmp overit
looptop:
inc edx
inc eax
overit:
cmp byte[edx],0
jnz looptop
ret
-
1\$\begingroup\$ Thanks for the review. I still have to brush up on the REPNZ SCASB instruction sets, but from what I can tell it can compare the string, byte for byte, for the null character until it is found. I implemented something similar which compares it byte for byte, but it took about 43 microseconds to complete. With this version of the code, it took 29-30 microseconds to complete. Let's see if REPNZ SCASB is able to do better. \$\endgroup\$Paco G– Paco G2014年04月24日 17:13:33 +00:00Commented Apr 24, 2014 at 17:13
I'm using assembly on a Linux maching using NASM.
In that case, the parameter passed to _strlen
is pushed on the stack, not passed in the eax register.
cmp word [eax], NULL_TERMINATOR ; Compare whether or not this block is completely null (e.g. 0)
jz _RollbackFinalBlock ; if it is, jump to _RollbackFinalBlock
; if not...
add eax, BYTE_COUNT ; Add to EAX the block size
This is wrong: because you can't assume that a string has more than one zero-byte. Your implementation detects end-of-string only if all the word [eax]
bytes are 0; whereas it should detect end-of-string if any of the word [eax]
bytes are 0.
Also, is word
what you wanted to say: or should you have written dword
?
Is there a way to make this implementation faster and more efficient?
An algorithm like this might be faster than a naive implementation (not necessarily faster than an incorrect implementation like yours) because it tests 4 bytes at a time.
Alternatively on some processors there are SSE2 instructions which can be used for strlen.
Or, the simplest solution (and fast on some CPUs) is repnz scasb
.
-
\$\begingroup\$ Regarding repnz scasb, it requires that ECX be loaded with the counter. Where am I going to get the value for ECX, and if I did get the value already for ECX, wouldn't that already have been the string length? \$\endgroup\$Paco G– Paco G2014年04月25日 01:29:26 +00:00Commented Apr 25, 2014 at 1:29
-
1\$\begingroup\$ @pacoalphonso google.com/search?q=strlen+ecx suggests using
sub ecx, ecx
followed bynot ecx
to set it to0xffffffff
\$\endgroup\$ChrisW– ChrisW2014年04月25日 01:46:02 +00:00Commented Apr 25, 2014 at 1:46