I wrote a program in AVR assembly that converts 32-bit unsigned binary numbers to 8-digit decimals using the shift-and-add-3 algorithm. (I know that 32 bits can hold more than 8 digits, but I only need 8.)
The 32-bit input is in R16-R19 (low to high).
The 8-digit output is in R20-R23 (low to high), two digits per byte: one in the lower nibble, one in the higher nibble.
My problem: it takes ~1500 cycles to convert a 16-bit number and ~2000 cycles to convert a 32-bit number.
Can anybody suggest a faster, more professional method for this? Running a 2000-cycle procedure on an ATtiny at 32.768 kHz is not something I am comfortable with.
Memory usage map: [image: memory map for BinaryToBCD]
Definitions:
.def a0 = r16
.def a1 = r17
.def a2 = r18
.def a3 = r19
.def b0 = r20
.def b1 = r21
.def b2 = r22
.def b3 = r23
.def i = r24
.def j = r25
The code:
BinaryToBCD:
clr b0
clr b1
clr b2
clr b3
ldi i, 32
sts 0x0068, i ;(SRAM s8)
BinaryToBCD_1:
clc
rol a0
rol a1
rol a2
rol a3
rol b0
rol b1
rol b2
rol b3
lds i, 0x0068 ;(SRAM s8)
dec i
sts 0x0068, i ;(SRAM s8)
brne BinaryToBCD_2
ret
BinaryToBCD_2:
cpi b0, 0
breq BinaryToBCD_3
mov i, b0
rcall Add3ToNibbles
mov b0, i
BinaryToBCD_3:
cpi b1, 0
breq BinaryToBCD_4
mov i, b1
rcall Add3ToNibbles
mov b1, i
BinaryToBCD_4:
cpi b2, 0
breq BinaryToBCD_5
mov i, b2
rcall Add3ToNibbles
mov b2, i
BinaryToBCD_5:
cpi b3, 0
breq BinaryToBCD_1
mov i, b3
rcall Add3ToNibbles
mov b3, i
rjmp BinaryToBCD_1
Add3ToNibbles:
mov j, i
andi j, 0b00001111
cpi j, 5
in j, SREG
sbrs j, 0
subi i, -3
mov j, i
swap j
andi j, 0b00001111
cpi j, 5
in j, SREG
sbrs j, 0
subi i, -48
ret
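For readers who find the assembly hard to follow, here is a rough C model of the same shift-and-add-3 idea. It is only a sketch: the function names are made up for illustration, the array bcd[0..3] stands in for b0..b3, and it assumes the input is below 100000000 so that 8 digits are enough. It corrects the nibbles before each shift instead of after, which gives the same result because the first correction acts on zeros.

```c
#include <stdint.h>

/* Add 3 to each nibble of a packed-BCD byte that holds 5 or more.
 * (Illustrative counterpart of Add3ToNibbles.) */
static uint8_t add3_to_nibbles(uint8_t v)
{
    if ((v & 0x0F) >= 0x05)
        v += 0x03;
    if ((v & 0xF0) >= 0x50)
        v += 0x30;
    return v;
}

/* Shift-and-add-3 conversion of x into 8 packed BCD digits.
 * bcd[0] receives the two least significant digits, bcd[3] the two
 * most significant ones. Assumes x < 100000000. */
static void binary_to_bcd(uint32_t x, uint8_t bcd[4])
{
    bcd[0] = bcd[1] = bcd[2] = bcd[3] = 0;

    for (int i = 0; i < 32; i++) {
        /* Correct every digit that would overflow when doubled. */
        for (int k = 0; k < 4; k++)
            bcd[k] = add3_to_nibbles(bcd[k]);

        /* Shift the whole chain (x, then bcd) left by one bit. */
        uint8_t carry = (uint8_t)(x >> 31);
        x <<= 1;
        for (int k = 0; k < 4; k++) {
            uint8_t out = (uint8_t)(bcd[k] >> 7);
            bcd[k] = (uint8_t)((bcd[k] << 1) | carry);
            carry = out;
        }
    }
}
```

Calling binary_to_bcd(12345678, bcd) should leave 0x78, 0x56, 0x34, 0x12 in bcd[0] to bcd[3].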
- What about a look-up table and exploiting the fact that it is triangular? – venny, Sep 15, 2014 at 10:08
- Why not use a faster internal oscillator? – Golaž, Sep 15, 2014 at 11:47
- Please tell me more about the table and the "triangularity"; I do not know what you mean. I cannot use a faster oscillator because this chip also manages time and date. 32768 Hz gives the highest precision; with it I only need 16*2 bit overflow on the timer. – Gábor DANI, Sep 15, 2014 at 15:35
- @GáborDani What I meant was to have an array of decimal numbers with one byte per digit for every 2^n, like (3,2,7,6,8), (1,6,3,8,4), (0,8,1,9,2). Then you go through the individual bits of the binary number and add the matching numbers into an 8-byte array (one byte per digit). As you go from bit 31 to 0, the decimals get shorter, so fewer additions are required (that is what I meant by triangularity). – venny, Sep 15, 2014 at 16:01
- I would try writing it in C and looking at the compiler's output (do not forget to switch on optimization); you may learn some tricks to apply to your own code. – vlad_tepesch, Apr 5, 2015 at 20:37
2 Answers
This is based on venny's approach (venny called it "triangularity"), expressed in pseudo-C:
uint32_t x;                                        // input value to convert
uint8_t w[10] = { 2, 1, 4, 7, 4, 8, 3, 6, 4, 8 };  // decimal digits of 2^31 = 2147483648
uint8_t r[10] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };  // result digits, initially 0
for (int i = 31; i >= 0; i--)
{
    if (x & (1UL << i))   // is bit i of x set?
        add(r, w);        // r += w: at most 1 ASCII ADD and 9 ASCII ADD with carry
    divide(w, 2);         // w /= 2: at most 10 SHIFT RIGHT
}
The add and divide routines need no explanation, IMO.
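In case it helps, here is one possible way to write those two routines in plain C. This is a sketch under stated assumptions, not tuned AVR code: each decimal digit sits in its own byte, most significant digit first, as in w and r above. The names divide_by_2, binary_to_digits and NDIGITS are invented for this example.

```c
#include <stdint.h>

#define NDIGITS 10  /* 2^32 - 1 = 4294967295 has 10 decimal digits */

/* r += w, digit by digit, starting at the least significant digit. */
static void add(uint8_t r[NDIGITS], const uint8_t w[NDIGITS])
{
    uint8_t carry = 0;
    for (int k = NDIGITS - 1; k >= 0; k--) {
        uint8_t s = (uint8_t)(r[k] + w[k] + carry);
        carry = (s >= 10);
        r[k] = carry ? (uint8_t)(s - 10) : s;
    }
}

/* w /= 2, as decimal long division starting at the most significant digit. */
static void divide_by_2(uint8_t w[NDIGITS])
{
    uint8_t rem = 0;
    for (int k = 0; k < NDIGITS; k++) {
        uint8_t cur = (uint8_t)(rem * 10 + w[k]);
        w[k] = (uint8_t)(cur / 2);
        rem = (uint8_t)(cur % 2);
    }
}

/* Convert x to decimal digits by adding the digits of 2^i for every set bit. */
static void binary_to_digits(uint32_t x, uint8_t r[NDIGITS])
{
    uint8_t w[NDIGITS] = { 2, 1, 4, 7, 4, 8, 3, 6, 4, 8 };  /* 2^31 */

    for (int k = 0; k < NDIGITS; k++)
        r[k] = 0;

    for (int i = 31; i >= 0; i--) {
        if (x & ((uint32_t)1 << i))
            add(r, w);
        divide_by_2(w);
    }
}
```

This version always walks all 10 digits for clarity. To get the speed benefit venny described, the loops could skip the leading zero digits of w, which grow in number as i decreases.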
There are a number of papers and application notes on the subject. For example, http://www.element14.com/community/servlet/JiveServlet/downloadBody/47820-102-3-258641/Cypress.Application_Notes_35.pdf