String length calculation implementation in Assembly (NASM)

Question 1

I just made a working implementation of calculating a string's length, but I'm not exactly sure it's the most efficient way. I divide the string up in blocks of four bytes each (32 bits), and add 4 to EAX. Once I hit a completely NULL block, I roll back the count of EAX by 4 bytes, and then calculate the final block's size one byte at a time. It only works using ASCII characters, and is limited to 32 bit computing (though that can be easily upgraded to 64 bit soon). I'm using assembly on a Linux maching using NASM. Here's the code:

; Function to compute the length of a string in blocks
; this only works in ASCII, or else it may give the wrong character count.
; input register: EAX - the string
section .data
 BYTE_COUNT equ 4 ;4 bytes = 32 bits 
 NULL_TERMINATOR equ 0 ;0,円 aka 0
section .text
 global _strlen ;main entry point
 _strlen:
 push ebp ; c calling convention: preserve ebp
 mov ebx, eax ; mov to ebx the string
 jmp _NextBlock ; go to _NextChar
 _NextBlock: 
 cmp word [eax], NULL_TERMINATOR ; Compare whether or not this block is completely null (e.g. 0)
 jz _RollbackFinalBlock ; if it is, jump to _RollbackFinalBlock
 ; if not...
 add eax, BYTE_COUNT ; Add to EAX the block size
 jmp _NextBlock; repeat loop
 _RollbackFinalBlock:
 sub eax, BYTE_COUNT ;Subtract the block size from EAX, because it is null
 jmp _FinalizeFinalBlock ; go to _FinalizeFinalBlock
 _FinalizeFinalBlock:
 cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
 jz _Exit ;if this byte is null, proceed to _Exit
 ; if not...
 inc eax ;increment EAX
 jmp _FinalizeFinalBlock ;repeat loop
 _Exit:
 dec eax ; We will have a null terminator at the end. remove it. 
 sub eax, ebx ; compute difference to get answer, and store result in EAX
 pop ebp ; load preserved EBP value from the stack
 ret ; exit this function

You can test this function by linking it with a c program, and calling it from there.

The problem I have is that I have an extra jump to _RollbackFinalBlock and _FinalizeFinalBlock. Is there a way to make this implementation faster and more efficient?

Question 2

I have a number of comments that I hope you find helpful.

Code comments

First, your code is generally well commented and I had little difficulty in understanding what the code was doing or why. That's a good thing. A few minor points, though. First, your comment for the overall function says

; input register: EAX - the string

however, it's probably useful to point out that EAX is actually a pointer to the string and isn't intended to contain a whole string.

Also, comments should tell something not obvious about the code. So this:

 jmp _NextBlock; repeat loop

is arguably OK, but this:

 inc eax ;increment EAX

is not really helpful. Instead you should say why the code is incrementing EAX rather than simply repeating what the instruction does.

Structure

Overall, the structure was OK, but the line

 jmp _NextBlock ; go to _NextChar

near the top of the function is not needed, since _NextBlock is the next instruction anyway. The same is true of the jmp just before _FinalizeFinalBlock. Generally speaking, you should strive to eliminate unconditional jumps. So for example, your code includes this:

 _FinalizeFinalBlock:
 cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
 jz _Exit ;if this byte is null, proceed to _Exit
 ; if not...
 inc eax ;increment EAX
 jmp _FinalizeFinalBlock ;repeat loop
 _Exit:

But you could easily restructure that to eliminate the unconditional jmp like this:

 dec eax ; 
 _FinalizeFinalBlock:
 inc eax ;increment EAX
 cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
 jnz _FinalizeFinalBlock ; keep going until final block is NULL
 _Exit:

Algorithm

The most significant problem I see with the code is the algorithm that relies on strings having multiple trailing NUL bytes. This may be true for code that calls this function, but it's not generally true of C strings which typically end with only a single NUL character.

Instruction set use

The repnz scasb instruction is probably going to be a lot more efficient that the loop you've constructed. I'd recommend reading about it and then incorporating that into your code.

Register use

The ebx register is destroyed and not saved by the function. This may be fine for your purposes, but it should be documented in comments at the top of the function.

Also, the code doesn't actually alter ebp so there isn't any point to pushing it onto the stack and popping it off at the end of the routine. (Maybe you meant ebx?)

Update:

On my machine (quad core running 3.2GHz under 64-bit Linux) I found this version to be slightly (15%) faster than the corresponding repnz scasb version:

strlen2:
 mov edx, [esp + 4]
 xor eax,eax ; count = 0
 jmp overit
looptop:
 inc edx
 inc eax
overit:
 cmp byte[edx],0
 jnz looptop
 ret

Question 3

Thanks for the review. I still have to brush up on the REPNZ SCASB instruction sets, but from what I can tell it can compare the string, byte for byte, for the null character until it is found. I implemented something similar which compares it byte for byte, but it took about 43 microseconds to complete. With this version of the code, it took 29-30 microseconds to complete. Let's see if REPNZ SCASB is able to do better.

Question 4

I'm using assembly on a Linux maching using NASM.

In that case, the parameter passed to _strlen is pushed on the stack, not passed in the eax register.

 cmp word [eax], NULL_TERMINATOR ; Compare whether or not this block is completely null (e.g. 0)
 jz _RollbackFinalBlock ; if it is, jump to _RollbackFinalBlock
 ; if not...
 add eax, BYTE_COUNT ; Add to EAX the block size

This is wrong: because you can't assume that a string has more than one zero-byte. Your implementation detects end-of-string only if all the word [eax] bytes are 0; whereas it should detect end-of-string if any of the word [eax] bytes are 0.

Also, is word what you wanted to say: or should you have written dword?

Is there a way to make this implementation faster and more efficient?

An algorithm like this might be faster than a naive implementation (not necessarily faster than an incorrect implementation like yours) because it tests 4 bytes at a time.

Alternatively on some processors there are SSE2 instructions which can be used for strlen.

Or, the simplest solution (and fast on some CPUs) is repnz scasb.

Question 5

Regarding repnz scasb, it requires that ECX be loaded with the counter. Where am I going to get the value for ECX, and if I did get the value already for ECX, wouldn't that already have been the string length?

Question 6

@pacoalphonso google.com/search?q=strlen+ecx suggests using sub ecx, ecx followed by not ecx to set it to 0xffffffff

Edward EdwardEdward 67.2k4 gold badges120 silver badges284 bronze badges · Answer 1 · 2014-04-24 14:27:06Z

I have a number of comments that I hope you find helpful.

Code comments

First, your code is generally well commented and I had little difficulty in understanding what the code was doing or why. That's a good thing. A few minor points, though. First, your comment for the overall function says

; input register: EAX - the string

however, it's probably useful to point out that EAX is actually a pointer to the string and isn't intended to contain a whole string.

Also, comments should tell something not obvious about the code. So this:

 jmp _NextBlock; repeat loop

is arguably OK, but this:

 inc eax ;increment EAX

is not really helpful. Instead you should say why the code is incrementing EAX rather than simply repeating what the instruction does.

Structure

Overall, the structure was OK, but the line

 jmp _NextBlock ; go to _NextChar

near the top of the function is not needed, since _NextBlock is the next instruction anyway. The same is true of the jmp just before _FinalizeFinalBlock. Generally speaking, you should strive to eliminate unconditional jumps. So for example, your code includes this:

 _FinalizeFinalBlock:
 cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
 jz _Exit ;if this byte is null, proceed to _Exit
 ; if not...
 inc eax ;increment EAX
 jmp _FinalizeFinalBlock ;repeat loop
 _Exit:

But you could easily restructure that to eliminate the unconditional jmp like this:

 dec eax ; 
 _FinalizeFinalBlock:
 inc eax ;increment EAX
 cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
 jnz _FinalizeFinalBlock ; keep going until final block is NULL
 _Exit:

Algorithm

The most significant problem I see with the code is the algorithm that relies on strings having multiple trailing NUL bytes. This may be true for code that calls this function, but it's not generally true of C strings which typically end with only a single NUL character.

Instruction set use

The repnz scasb instruction is probably going to be a lot more efficient that the loop you've constructed. I'd recommend reading about it and then incorporating that into your code.

Register use

The ebx register is destroyed and not saved by the function. This may be fine for your purposes, but it should be documented in comments at the top of the function.

Also, the code doesn't actually alter ebp so there isn't any point to pushing it onto the stack and popping it off at the end of the routine. (Maybe you meant ebx?)

Update:

On my machine (quad core running 3.2GHz under 64-bit Linux) I found this version to be slightly (15%) faster than the corresponding repnz scasb version:

strlen2:
 mov edx, [esp + 4]
 xor eax,eax ; count = 0
 jmp overit
looptop:
 inc edx
 inc eax
overit:
 cmp byte[edx],0
 jnz looptop
 ret

Thanks for the review. I still have to brush up on the REPNZ SCASB instruction sets, but from what I can tell it can compare the string, byte for byte, for the null character until it is found. I implemented something similar which compares it byte for byte, but it took about 43 microseconds to complete. With this version of the code, it took 29-30 microseconds to complete. Let's see if REPNZ SCASB is able to do better.

ChrisW ChrisWChrisW 13k1 gold badge35 silver badges76 bronze badges · Answer 2 · 2014-04-24 22:38:05Z

I'm using assembly on a Linux maching using NASM.

In that case, the parameter passed to _strlen is pushed on the stack, not passed in the eax register.

 cmp word [eax], NULL_TERMINATOR ; Compare whether or not this block is completely null (e.g. 0)
 jz _RollbackFinalBlock ; if it is, jump to _RollbackFinalBlock
 ; if not...
 add eax, BYTE_COUNT ; Add to EAX the block size

This is wrong: because you can't assume that a string has more than one zero-byte. Your implementation detects end-of-string only if all the word [eax] bytes are 0; whereas it should detect end-of-string if any of the word [eax] bytes are 0.

Also, is word what you wanted to say: or should you have written dword?

Is there a way to make this implementation faster and more efficient?

An algorithm like this might be faster than a naive implementation (not necessarily faster than an incorrect implementation like yours) because it tests 4 bytes at a time.

Alternatively on some processors there are SSE2 instructions which can be used for strlen.

Or, the simplest solution (and fast on some CPUs) is repnz scasb.

Regarding repnz scasb, it requires that ECX be loaded with the counter. Where am I going to get the value for ECX, and if I did get the value already for ECX, wouldn't that already have been the string length?
@pacoalphonso google.com/search?q=strlen+ecx suggests using sub ecx, ecx followed by not ecx to set it to 0xffffffff

Stack Exchange Network

String length calculation implementation in Assembly (NASM)

2 Answers 2

Code comments

Structure

Algorithm

Instruction set use

Register use

Update:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

String length calculation implementation in Assembly (NASM)

2 Answers 2

Code comments

Structure

Algorithm

Instruction set use

Register use

Update:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions