I'm working on a game for the Atari VCS and have been dusting off my 30+ year old skills in writing MOS 6502 assembly language.
The VCS runs at 1.19 MHz and is based on a MOS 6507 chip with some custom logic and only 128 bytes of RAM.
Outline of Problem
The game needs to display a value as 3 ASCII digits, but I'm working in full range signed and unsigned bytes, rather than BCD, so I now have the problem of converting from a full range byte to ASCII digits.
What I have so Far
I have been able to put the following together, and it appears to work. I'd welcome any tips for making it smaller, faster etc.
Currently the program is estimated to run in around 114+ cycles, just counting the instruction cycle counts with no branches taken.
;
; FormatUnsigned.asm
;
; 6502 routine to format an unsigned byte into a space
; followed by 2 decimal digits if < 100, or 3 decimal
; digits if >= 100.
;
        seg.u Variables
        org $80
char1   .byte           ; output chars
char2   .byte
char3   .byte
Temp    .byte           ; temp
        seg Code
        org $f000
TestFormat
        lda #211        ; sample value
        jsr FormatUnsigned ; format
        brk             ; halt
; A = value to format unsigned
FormatUnsigned:
        ldy #$20        ; default hundreds digit
        sty char1       ; is a space
; calculate char1
        ldy #$30        ; '0'
        sec             ; set carry for sub loop
Sub100
        sbc #100
        iny             ; y = '1', '2'
        bcs Sub100      ; loop whilst carry is not borrowed
        adc #100        ; add 100 back
        dey             ; and take back the inc to y
        cpy #$30        ; if y is '0', just leave the space in there
        beq SkipHundreds
        sty char1       ; save '1' or '2' into char1
SkipHundreds
; format value < 100 into BCD
        tay             ; save a in y
        lsr
        lsr
        lsr
        lsr
        and #$F         ; get high nybble
        tax             ; into x for indexing
        lda highNybble,x
        sta Temp        ; save in temp
        tya             ; get value - 100s back
        and #$0f        ; low nybble
        tay             ; low nybble in y
        cmp #$0a        ; <10?
        bcc NoSub10
        sbc #$0a        ; subtract 10
        tay             ; save in y
        clc
        lda #$10        ; Add '10' to bcd saved value
        adc Temp
        sta Temp
        tya
NoSub10
        sed             ; decimal mode
        adc Temp        ; add bcd value to 0-9 in a
        sta Temp        ; save bcd value
        cld             ; leave decimal mode
Write2ndChar
        lsr             ; get high nybble
        lsr
        lsr
        lsr
        and #$0f
        adc #$30
        sta char2       ; save 2nd character
Write3rdChar
        lda Temp
        and #$0f        ; get low nybble
        adc #$30
        sta char3       ; save 3rd character
        rts
highNybble
        .byte $00
        .byte $16
        .byte $32
        .byte $48
        .byte $64
        .byte $80
        .byte $96
The program works correctly for tested inputs, producing an output of $32 $31 $31 for the decimal value 211.
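Since only a sample value is mentioned, an exhaustive cross-check against a reference model may be worthwhile. The Python sketch below is a simplified model and not part of the original post; every name in it (expected_chars, to_bcd, char2_as_written) is hypothetical. expected_chars follows the format described in the header comment. char2_as_written models only the final lsr, lsr, lsr, lsr, and #$0f, adc #$30 sequence, on the assumption that the fourth lsr leaves bit 3 of the BCD value in the carry flag and that adc then adds that carry, as there is no clc in between.

```python
def expected_chars(value):
    """Reference output: a space plus 2 digits below 100, else 3 digits."""
    text = f" {value:02d}" if value < 100 else f"{value:d}"
    return text.encode("ascii")


def to_bcd(n):
    """Packed BCD of a number in 0..99, as it would be held in Temp."""
    return ((n // 10) << 4) | (n % 10)


def char2_as_written(bcd):
    """Model of lsr x4 / and #$0f / adc #$30 with no clc in between."""
    carry = (bcd >> 3) & 1             # the fourth lsr shifts bit 3 into carry
    return (bcd >> 4) + 0x30 + carry   # adc adds the carry as well


# Values below 100 where the modelled tens character differs from the reference.
suspects = [n for n in range(100)
            if char2_as_written(to_bcd(n)) != expected_chars(n)[1]]
```

In this model the sample value 211 comes out as expected, while suspects holds the values whose units digit is 8 or 9. Those inputs look worth running through the real routine or an emulator before relying on it.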
1 Answer
First observations.
Memory access is costly in terms of cycles. Your solution makes 10 memory accesses. The program I present later on needs just 3.
By placing the most used variables in the zero-page, you obtained the best possible memory access performance, but avoiding variables is preferable.
The FormatUnsigned subroutine uses some 76 bytes and the byte-variables add another 11 bytes. This is a lot for such a humble task. The routine I present later on is written in 43 bytes (revised down from 46) and only adds 3 byte-variables.
An opportunity for optimizing.
        lsr
        lsr
        lsr
        lsr
        and #$F         ; get high nybble
The and #$0F instruction is not needed, because the 4 lsr instructions in a row already leave the high nibble empty. Removing it shaves 2 bytes off the program and reduces the number of cycles by 2. These benefits double because you used this construct twice in the program.
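That claim can be checked exhaustively with a short Python model (a sketch only; lsr4 is a hypothetical helper name, and Python is used here purely to model the 8-bit shifts):

```python
def lsr4(value):
    """Four logical shifts right on an 8-bit value, as four lsr instructions do."""
    for _ in range(4):
        value = (value & 0xFF) >> 1
    return value


# The mask changes nothing for any possible byte value.
mask_is_redundant = all((lsr4(v) & 0x0F) == lsr4(v) for v in range(256))
```

Note that the check covers only the value left in the accumulator; the mask does not touch the carry flag either way.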
An unclear comment.
; 6502 routine to format an unsigned byte into a space
; followed by 2 decimal digits if < 100, or 3 decimal
; digits if >= 100.
From this comment I understand that numbers in the range [0,9] will be shown with 1 leading space and 1 leading zero. Wouldn't you agree that this looks ugly?
The version I propose next shows 2 leading spaces for such numbers.
My version.
The worst case runs in 108 cycles. This happens with values from 190 to 199.
The best case for this code runs in just 44 cycles. Not surprisingly this is for inputs from 0 to 9.
I personally don't like the decimal mode of the 6502 and so this solution doesn't use it.
        ldx #$20        ;" "
        sec
        sbc #100
        bcc OK1
        ldx #$31        ;"1"
        sbc #100        ; (1)
        bcc OK1
        inx             ;"2"
        sbc #100        ; (2)
OK1
        adc #100
        stx char1
        ldy #$2F        ;"0"-1 (4)
        sec
Sub10
        sbc #10         ; (3)
        iny             ;["0","9"] (4)
        bcs Sub10
        adc #10
        cpy #$30        ;"0"
        bne OK2
        cpx #$20        ;" " (5)
        bne OK2         ; (5)
        ldy #$20        ;" "
OK2
        sty char2
        ora #$30        ;["0","9"] (6)
        sta char3
- (1)(2)(3) The carry flag is already set. No need to use sec before sbc.
- (2) This instruction is meant to be a jump over the following adc #100. This would have required 3 bytes using jmp STXCHAR1 or 2 bytes using bcs STXCHAR1. In both cases it would also have taken 3 cycles to run. By writing a complementary subtraction I undo the effect of the following addition, and realise the shortest code and the least cycles!
- (4) The peculiar value $2F is correct because the loop at Sub10 always runs the iny instruction at least once. Therefore 1 to 10 iterations of this loop will yield Y-register values from $30 to $39.
- (5) This extra code takes care of the 2 leading spaces on numbers from 0 to 9.
- (6) Normally here we would like to add 48 to the value in the accumulator holding [0,9]. Using the adc #48 instruction would have required a clc to give the correct result. By using ora #48 the result is equally correct but the code is 1 byte smaller and 2 cycles faster.
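The routine and the notes above can be cross-checked for all 256 inputs with a small Python model. This is a sketch under stated assumptions, not an emulator: sbc and adc are hypothetical helpers that model only binary-mode arithmetic and the carry flag, and format_unsigned mirrors the listing step by step while ignoring every other flag.

```python
def sbc(a, m, carry):
    """Binary-mode SBC: A - M - (1 - C). Returns (result, carry_out)."""
    r = a - m - (1 - carry)
    return r & 0xFF, int(r >= 0)


def adc(a, m, carry):
    """Binary-mode ADC: A + M + C. Returns (result, carry_out)."""
    r = a + m + carry
    return r & 0xFF, int(r > 0xFF)


def format_unsigned(value):
    """Model of the routine above; returns char1, char2, char3 as bytes."""
    a, x = value, 0x20              # ldx #$20
    a, c = sbc(a, 100, 1)           # sec / sbc #100
    if c:                           # bcc OK1 not taken
        x = 0x31                    # ldx #$31
        a, c = sbc(a, 100, c)       # (1) carry is already set
        if c:
            x += 1                  # inx
            a, c = sbc(a, 100, c)   # (2) undone by the adc below
    a, c = adc(a, 100, c)           # OK1: adc #100, carry is clear here
    y, c = 0x2F, 1                  # ldy #$2F / sec
    while True:                     # Sub10 loop
        a, c = sbc(a, 10, c)
        y += 1                      # iny runs at least once (4)
        if not c:
            break
    a, c = adc(a, 10, c)            # adc #10
    if y == 0x30 and x == 0x20:     # (5) blank the tens digit as well
        y = 0x20
    return bytes([x, y, a | 0x30])  # (6) ora #$30


all_match = all(format_unsigned(v) == f"{v:3d}".encode("ascii")
                for v in range(256))
```

The comparison target is the value right-aligned in 3 characters with leading spaces, which is the layout described for this version.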
Still room for improvements.
In an effort to get rid of the separate sec instruction I came up with a 1 byte smaller solution. This also no longer contains the complementary adc #100 that you commented about.
        ldx #$20        ;" "
        cmp #100
        bcc OK1
        sbc #100
        ldx #$31        ;"1"
        cmp #100
        bcc OK1
        sbc #100
        inx             ;"2"
OK1
        stx char1
Nice, but wouldn't it be possible to combine a few things (the first sbc #100 and the second cmp #100)? Yes. Using sbc #200 and a conditional complementary adc #100, the code has yet again shrunk, this time by another 2 bytes.
        ldx #$20        ;" "
        cmp #100
        bcc OK1
        ldx #$32        ;"2"
        sbc #200
        bcs OK1
        adc #100
        dex             ;"1"
OK1
        stx char1
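To confirm that the three hundreds-digit variants agree, they can be modelled side by side in Python. This is a sketch with hypothetical names (sbc, adc, hundreds_v1 to hundreds_v3, reference); it models only the accumulator, the X register and the carry flag in binary mode, and treats cmp #100 as a plain comparison that sets carry when A >= 100.

```python
def sbc(a, m, carry):
    """Binary-mode SBC: A - M - (1 - C). Returns (result, carry_out)."""
    r = a - m - (1 - carry)
    return r & 0xFF, int(r >= 0)


def adc(a, m, carry):
    """Binary-mode ADC: A + M + C. Returns (result, carry_out)."""
    r = a + m + carry
    return r & 0xFF, int(r > 0xFF)


def hundreds_v1(a):
    """sec / sbc #100 chain followed by the shared adc #100."""
    x = 0x20
    a, c = sbc(a, 100, 1)
    if c:
        x = 0x31
        a, c = sbc(a, 100, c)
        if c:
            x += 1
            a, c = sbc(a, 100, c)
    a, c = adc(a, 100, c)
    return x, a


def hundreds_v2(a):
    """cmp #100 / sbc #100, done twice."""
    x = 0x20
    if a >= 100:                    # cmp #100 leaves carry set
        a, c = sbc(a, 100, 1)
        x = 0x31
        if a >= 100:
            a, c = sbc(a, 100, 1)
            x += 1
    return x, a


def hundreds_v3(a):
    """cmp #100 / sbc #200 with a conditional adc #100."""
    x = 0x20
    if a >= 100:
        x = 0x32
        a, c = sbc(a, 200, 1)
        if not c:                   # value was 100..199
            a, c = adc(a, 100, c)
            x -= 1
    return x, a


def reference(v):
    """Expected (char1, remainder) pair for an input byte."""
    return (0x20 if v < 100 else 0x30 + v // 100), v % 100


variants_agree = all(
    hundreds_v1(v) == hundreds_v2(v) == hundreds_v3(v) == reference(v)
    for v in range(256)
)
```

Each function returns the character stored in char1 together with the remainder left in the accumulator for the tens and units stage.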
Some speed measurements.
All of these numbers relate to the first part of the program where the hundreds digit gets calculated.
Input      Question   Answer1   Answer2   Answer3
-------    --------   -------   -------   -------
  0- 99       24         14        10        10     cycles
100-199       33         20        18        19     cycles
200-255       40         23        21        16     cycles
-------    --------   -------   -------   -------
Average     31.02      18.31     15.53     14.83    cycles
Size          21         20        19        17     bytes
The average assumes that all inputs from 0 to 255 are equally likely.
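The averages can be reproduced by weighting each row by the number of inputs in its range (100, 100 and 56 values). A minimal Python sketch, using the cycle counts from the table and hypothetical names RANGE_SIZES, CYCLES and average_cycles:

```python
RANGE_SIZES = (100, 100, 56)        # inputs in 0-99, 100-199, 200-255

CYCLES = {                          # cycles per range, copied from the table
    "Question": (24, 33, 40),
    "Answer1": (14, 20, 23),
    "Answer2": (10, 18, 21),
    "Answer3": (10, 19, 16),
}


def average_cycles(per_range):
    """Mean cycle count over all 256 equally likely inputs."""
    total = sum(n * c for n, c in zip(RANGE_SIZES, per_range))
    return total / sum(RANGE_SIZES)


for name, per_range in CYCLES.items():
    print(f"{name}: {average_cycles(per_range):.2f}")
```

For example, the Question column works out to (100*24 + 100*33 + 56*40) / 256 = 7940 / 256, which rounds to 31.02.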
- Thanks for that. Some good points, the most basic of which I'd already managed to hit, including the 'optimizing for carry flag state'. You have carried over the adc #100. In fact, there's a shorter solution avoiding adding back the 100 deducted. I will return with some changes. Thanks, Jonathan (Jonathan Watmough, Jan 23, 2017 at 12:26)
- You must have read my mind! Early this week I came up with a new solution that's both shorter and faster. Today I've added it to my answer. (Sep Roland, Jan 29, 2017 at 19:38)