Unsigned integer division in ARM Cortex-M0+ assembly

Question 1

I am writing a subroutine for unsigned integer division in Assembly. I will call the subroutine DIVU.

Inputs: R1 will be the dividend. The divisor will be in R0.
Outputs: The quotient is going to be in R0 and the remainder in R1.

Basically, I am trying to make something like this:

R1 / R0 = R0 remainder R1

If R0=0, I want to leave the input parameters unchanged and set the C flag when it returns. Otherwise, I just want to clear the C flag. I do not want to change any other registers' values after returning.

I have followed this idea:

Quotient = 0; 
while (Dividend >= Divisor) { 
 Dividend -= Divisor; 
 Quotient += 1;
}
Remainder = Dividend;
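Here is a minimal C sketch of the contract described above. `divu_model` is a hypothetical name. Its two pointer arguments stand in for R0 and R1, and its `bool` return value stands in for the C flag. It follows the stated spec (R0 = 0 means divide error), not any particular asm version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of the DIVU contract.
   On entry *r0 = divisor, *r1 = dividend.
   On return *r0 = quotient, *r1 = remainder, and the return value
   plays the role of the C flag: true means "divide by zero". */
static bool divu_model(uint32_t *r0, uint32_t *r1)
{
    if (*r0 == 0)
        return true;                /* R0 = 0: leave inputs alone, "set C" */

    uint32_t divisor = *r0;
    uint32_t dividend = *r1;
    uint32_t quotient = 0;

    while (dividend >= divisor) {   /* unsigned comparison */
        dividend -= divisor;
        quotient += 1;
    }
    assert(dividend < divisor);     /* what's left is a valid remainder */

    *r0 = quotient;
    *r1 = dividend;
    return false;                   /* success: "C clear" */
}
```

For example, with `*r0 = 7` and `*r1 = 23` it leaves 3 in `*r0` and 2 in `*r1`, and returns `false`.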

This is just a learning exercise, so the low performance of repeated subtraction is ok, as discussed in comments on the original Stack Overflow question posted before writing this code.

And in Assembly this is what I produced:

DIVU
 CMP R1,#0 ;compares R1 to 0
 BEQ AnsZero ;if R1=0, it branches to AnsZero (the final answer will be 0)
 CMP R0,#0 ;compares R0 to 0
 BEQ EndFlag ;if R0=0, it will go to the end to set C flag
 PUSH {R3, LR} ;saves R3 so it can be used as a counter for quotient
 MOV R3,#0 ;sets R3 to 0
 While CMP R0,R1 ;start of while loop 
 BLT EndWhil ;Branches to end of while when dividend < divisor, otherwise goes through loop
 SUB R1,R1,R0 ;R1=R1-R0 , dividend=dividend-divisor
 ADD R3,R3,#1 ;R3=R3+1, quotient=quotient+1 (init is zero, so 0+1=1 if one successful loop)
 B While ;continues loop
EndWhil MOV R0,R3 ;R0=R3, the register that had the divisor gets the quotient
 POP {R3, PC} ;R3's original value is returned
 BX LR ;ends subroutine
EndFlag SUBS R0,R0,#1 
 MOV R0, #0 
 BX LR ;ends subroutine
AnsZero MOV R0,#0 ;sets R0=0 because R1=0, 0/X = 0 remainder 0
 BX LR ;ends subroutine
 BX LR ;ends subroutine

Question 2

@PeterCordes did you have the improved solution you had mentioned in the other thread?

Question 3

Still working on it. Thumb mode is difficult; there's no MOV reg, #immediate, only MOVS, so it's actually hard to return with the C flag set. IDK if I can avoid saving/restoring a register in the return path that needs to have C set, just because ADD and SUB immediate are only available in their flag-setting forms. It would be a lot more sensible to make callers check for divide by zero instead of doing this weird flag-return calling convention!

Question 4

There is a MOV R#,R# instruction that won't set any flags if that is what you are trying to avoid. But thank you!

Question 5

This is my handy list of things I know I can use: imgur.com/a/PTh4Y (I took screenshots and posted them on Imgur for you).

Question 6

That's exactly what I meant by having to save/restore a register, if I have to use an extra register to hold an immediate zero without setting flags. See stackoverflow.com/questions/30980160/… and infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0432c/…. BTW, the usual notation would be MOV Rd, Rn or something, not R#. I first read that as MOV Rd, #imm.

Question 7

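You have at least a couple bugs. The LT condition is signed less-than. You need BLO to branch on the unsigned Lower-Than condition (branch if Carry is clear). The CMP operands are also reversed. CMP R0,R1 sets the flags from divisor - dividend, so a less-than branch leaves the loop when the divisor is below the dividend. To exit when dividend < divisor, use CMP R1,R0 / BLO EndWhil. See also this article about carry vs. overflow.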

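Also, R1 only ends up holding the remainder if the loop stops as soon as the dividend drops below the divisor. The loop condition has to be right for that output too.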
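The following small C model shows why the signed condition breaks unsigned division. It reports which branch is taken after `CMP a, b` for the same 32-bit patterns. The helper names are hypothetical. A dividend with its top bit set looks negative to BLT but huge to BLO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* After CMP a, b: BLT is taken when a < b as SIGNED integers
   (N != V), BLO (= BCC) when a < b as UNSIGNED integers (C clear). */
static bool blt_taken(uint32_t a, uint32_t b)
{
    /* Converting values >= 2^31 to int32_t is implementation-defined
       before C23; GCC and Clang wrap them to negative values. */
    return (int32_t)a < (int32_t)b;
}

static bool blo_taken(uint32_t a, uint32_t b)
{
    return a < b;
}

static void compare_example(void)
{
    /* A dividend with the top bit set looks negative to the signed test. */
    uint32_t dividend = 0x90000000u, divisor = 3;
    assert(blt_taken(dividend, divisor));   /* signed: "less"     */
    assert(!blo_taken(dividend, divisor));  /* unsigned: not less */
}
```

For small values both conditions agree. That is why the signed version can look correct in simple tests and then fail only for dividends of 2^31 or more.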

Your custom calling convention makes life difficult. Flag return values appear to be cumbersome in Thumb mode, because many instructions are only available in flag-setting form. (Cortex-M0 only supports Thumb mode, with these instructions.) It's also strange not to let your function clobber R2 and R3, which the standard calling convention allows. Letting it clobber them would shrink the function itself, although it could increase overall code size if many call sites then have to save those registers around the call.

It's normal to arrange a loop so the conditional branch is at the end. That reduces the instruction count by one (removing the unconditional branch). If you can't guarantee that the body should run at least once, either test the condition before falling into the do{}while()-style loop, or jump straight to the test at the bottom.
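Here is the same repeated subtraction with the test at the bottom, written as a do{}while() in C. It sketches the loop shape only, and `divu_dowhile` is a hypothetical name. The body always runs once, the final pass overshoots by one subtraction, and the fix-up after the loop undoes it. In asm the `old >= divisor` test comes for free from the flags the subtraction sets, so the loop needs only one conditional branch.

```c
#include <assert.h>
#include <stdint.h>

/* Repeated subtraction with the loop test at the bottom (do{}while),
   mirroring an asm loop whose only branch is the conditional one at
   the end. Precondition: divisor != 0. */
static uint32_t divu_dowhile(uint32_t dividend, uint32_t divisor,
                             uint32_t *remainder)
{
    assert(divisor != 0);

    uint32_t count = 0;             /* number of subtractions performed */
    uint32_t old;
    do {
        count += 1;
        old = dividend;
        dividend -= divisor;        /* wraps around on the last pass */
    } while (old >= divisor);       /* "no borrow" condition */

    /* The last pass subtracted once too often: undo it. */
    *remainder = dividend + divisor;
    return count - 1;
}
```

Moving the correction out of the loop costs a couple of instructions after the loop and saves one per iteration.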

You can combine the exit code-paths around AnsZero. You have MOV R0, #0 / BX LR twice, so you should just put AnsZero pointing at the first one and leave out the second. You also have two consecutive BX LR instructions, where you previously had a B to the next line at the end of the function. Never branch to the instruction that normal fall-through execution would take you to anyway.

The comment character in GNU as for ARM is @. ; is used in x86 NASM / MASM, and ARM's own armasm (used by Keil) also uses it, but the GNU assembler uses ; to separate multiple instructions on the same line. Making your code assemble with GAS seems like a good idea. Note that mov r3, #0 won't assemble with -mcpu=cortex-m0, because movs is the only immediate-mov instruction it supports. Cortex-M0 has very limited instruction choices.

Further style points: use : after label names, even if your assembler syntax doesn't technically require it. Some people may like to omit it when assembling data sections, but I don't think anyone likes it for code sections.

All-upper-case for asm instructions and register names is a valid choice. I don't like it, but I guess it doesn't hurt. Using it for symbol names is a bad idea, because you don't want to have to use all-caps names to call it from C.

Avoid useless comments like CMP R0,#0 ;compares R0 to 0. Asm mnemonics are not that hard to decipher (except PowerPC). Comment space is limited; don't waste it saying the same thing the reader learned from reading the code itself.

Leave blank spaces between logically-separate blocks of code, even when there aren't branches. This improves human-readability.

I like to leave a space between operands in the operand list, like cmp r0, #0 instead of cmp r0,#0.

My version:

Always comment the top of your function with some high level description of input/output register usage. Just like in a higher-level language, describe the contract the function makes with its caller.

My asm code usually ends up littered with comments about alternatives I decided against. It's not an ideal example of good style.

.syntax unified @ allow 2 or 3 operand forms of instructions.
.cpu cortex-m0
.thumb @ this is probably implied already by the .cpu
@@ input: R0=divisor, R1=dividend
@@ calculate R1/R0 by repeated subtraction
@@ output: R0=quotient, R1=remainder, C flag unset.
@@ or on division by zero: R0,R1 unchanged, C flag set.
@@ Other regs unmodified (even r2 and r3, which the normal calling convention allows functions to use as scratch regs)
.globl divu
divu:
 CMP R1, #0 @ return 0,0 instead of divide error for the 0/0 corner case.
 BEQ zero_dividend @ label names that describe why you go there are usually good. Comments at the label can describe what happens there.
 CMP R0, #0
 BEQ div_by_zero
 @CMP R0, R1 @ let this case fall through the loop once, instead of slowing down the common case to speed up this special case.
 @BLO QuotientZero
 PUSH {R3, LR} @ LR doesn't make a good scratch reg, since many insns can only use low regs (R0-R7). Push/popping it saves a BX LR
 MOVS R3, #0 @ R3 = quotient = repeated-subtraction counter
@ LDR R3, =#-1 @ account for the loop overshoot up-front. But don't do this because cortex-m0 can't encode it in one insn other than a PC-relative load
sub_loop: @do{
 ADDS R3, #1 @ quotient += 1. (init is zero, so 0+1=1 if one successful loop)
 SUBS R1, R0 @ dividend -= divisor and set flags,
 @ CMP R0, R1 @ ...avoiding this cmp instruction. Potentially a significant speedup for a tight loop.
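 BHS sub_loop @} while (no borrow), i.e. while the old dividend was >= divisor (unsigned). After a subtraction ARM's C flag means NOT-borrow, the opposite of x86's CF, so BHS (= BCS) continues the loop. BLO would exit after one pass.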
@EndWhile:
 @@ now we've subtracted one too many times: the borrow (C clear) from that last SUBS is what ended the loop.
 @@ It's worth extra instructions outside the loop to save one inside the loop.
 @@ The overshoot left R1 wrapped around, so add the divisor back to get the remainder.
 ADD R1, R0 @ remainder, undoing the overshoot. This non-flag-setting form leaves C alone.
 SUBS R0, R3, #1 @ quotient, undoing the overshoot. On ARM this SETS C (no borrow), except when R3 wrapped to 0, which only happens for 0xFFFFFFFF / 1.
 ADDS R0, #0 @ clear C: adding zero never carries. TODO: avoid this otherwise-redundant instruction
 POP {R3, PC} @ return by popping straight into the PC
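div_by_zero: @@ We only get here with r0 == 0, and CMP R0, #0 (0 - 0, no borrow) already set the C flag
 BX LR @ R0 and R1 are unchanged and C is set: report division by zero
zero_dividend: @@ We only get here with r1 == 0, and CMP R1, #0 (no borrow) set the C flag, so clear it
 ADDS R0, R1, #0 @ R0 = 0 because R1 is 0 here, and adding zero clears C. MOVS R0, #0 would leave C set (MOV Rd, #imm isn't available for Cortex-M0)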
 BX LR
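ARM's carry convention for subtraction is the mirror image of x86's. That makes it easy to get backwards when working out which branch continues the loop and which return path leaves C set. The minimal C model of the C flag below makes each case checkable. The helper names are hypothetical. The rules follow ARM's definition of SUBS as `a + NOT(b) + 1`, with C as the carry-out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C flag after ARM SUBS/CMP a, b: set when there is NO borrow,
   i.e. when a >= b as unsigned values (x86's CF is the opposite). */
static bool arm_sub_c(uint32_t a, uint32_t b)
{
    return a >= b;
}

/* C flag after ARM ADDS a, b: set on unsigned carry-out. */
static bool arm_add_c(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}

static void carry_examples(void)
{
    assert(arm_sub_c(0, 0));        /* CMP R0, #0 with R0 == 0: C set        */
    assert(!arm_sub_c(0, 1));       /* SUBS R0, #1 with R0 == 0: C clear     */
    assert(arm_sub_c(42, 42));      /* CMP R0, R0 always sets C              */
    assert(arm_sub_c(9, 7));        /* no borrow: BHS/BCS taken              */
    assert(!arm_sub_c(2, 7));       /* borrow: BLO/BCC taken                 */
    assert(!arm_add_c(1234, 0));    /* adding zero never carries: C clear    */
    assert(arm_add_c(0xFFFFFFFFu, 1));
}
```

With these rules, a loop that subtracts until it borrows has to continue on BHS (C set) and falls out with C clear. Likewise, CMP Rn, #0 on a zero register leaves C set, while subtracting 1 from zero clears it.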

I'm not an ARM expert, and there may be a slight difference between popping into PC vs. running BX LR. On ARMv4T, only BX LR could return from Thumb code to ARM code or vice versa; from ARMv5T onward, popping into PC interworks too. Either is fine for Thumb -> Thumb returns, which are your only option on a Cortex-M0.

This really does assemble. I didn't test it, but the disassembly looks like we'd expect (which is a useful sanity check):

$ arm-linux-gnueabi-objdump -d arm-divu.o 
arm-divu.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <divu>:
 0: 2900 cmp r1, #0
 2: d00a beq.n 1a <zero_dividend>
 4: 2800 cmp r0, #0
 6: d007 beq.n 18 <div_by_zero>
 8: b508 push {r3, lr}
 a: 2300 movs r3, #0
0000000c <sub_loop>:
 c: 3301 adds r3, #1
 e: 1a09 subs r1, r1, r0
 10: d3fc bcc.n c <sub_loop>
 12: 4401 add r1, r0
 14: 1e58 subs r0, r3, #1
 16: 4280 cmp r0, r0
 18: bd08 pop {r3, pc}
0000001a <div_by_zero>:
 1a: 3801 subs r0, #1
0000001c <zero_dividend>:
 1c: 2000 movs r0, #0
 1e: 4770 bx lr

I don't know much at all about tuning for Cortex-M0, but perhaps aligning the top of sub_loop would be good. Maybe to a 16-byte boundary, or at least so all three instructions are in the same 16-byte block. (Currently the branch is in the next block after the ADDS/SUBS.)

Question 8

This is great! I do enjoy your commenting, it makes it really easy to follow what is happening with the code. Is the sub-loop just a loop in a subroutine, or is there more to it?

Question 9

@JasonR: It's a loop that subtracts... It's an attempt to be more descriptive with the labels than While :) And thanks. Yeah, my code may be cluttered, but I think I'm pretty good at writing comments that explain what's going on at the higher level, not just describing what the asm instructions already say.

Question 10

Okay. I thought it was something simple; I was just double-checking your naming. I'm gonna read through the rest of the code and comments you made and try to learn it all. Thanks for spending the time on this.

Peter Cordes · Accepted Answer · 2016-09-15 03:15:31Z