\$\begingroup\$

I'm working on a game for the Atari VCS and have been dusting off my 30+ year old skills in writing MOS 6502 assembly language.

The VCS runs at 1.19 MHz and is based on a MOS 6507 chip with some custom logic and only 128 bytes of RAM.

Outline of Problem

The game needs to display a value as 3 ASCII digits, but I'm working in full range signed and unsigned bytes, rather than BCD, so I now have the problem of converting from a full range byte to ASCII digits.

What I have so Far

I have been able to put the following together, and it appears to work. I'd welcome any tips for making it smaller, faster etc.

Currently the program is estimated to run in around 114+ cycles, just counting the instruction cycle counts with no branches taken.

;
; FormatUnsigned.asm
;
; 6502 routine to format an unsigned byte into a space
; followed by 2 decimal digits if < 100, or 3 decimal
; digits if >= 100.
;
 seg.u Variables
 org $80
char1 .byte ; output chars
char2 .byte
char3 .byte
Temp .byte ; temp
 seg Code
 org $f000
TestFormat
 lda #211 ; sample value
 jsr FormatUnsigned ; format
 brk ; halt
; A = value to format unsigned
FormatUnsigned:
 ldy #$20 ; default hundreds digit
 sty char1 ; is a space
; calculate char1
 ldy #$30 ; '0'
 sec ; set carry for sub loop
Sub100
 sbc #100
 iny ; y = '1', '2'
 bcs Sub100 ; loop whilst carry is not borrowed
 adc #100 ; add 100 back
 dey ; and take back the inc to y
 cpy #$30 ; if y is '0', just leave the space in there
 beq SkipHundreds
 sty char1 ; save '1' or '2' into char1
SkipHundreds
; format value < 100 into BCD
 tay ; save a in y
 lsr
 lsr
 lsr
 lsr
 and #$F ; get high nybble
 tax ; into x for indexing
 lda highNybble,x
 sta Temp ; save in temp
 tya ; get value - 100s back
 and #$0f ; low nybble
 tay ; low nybble in y
 cmp #$0a ; <10?
 bcc NoSub10
 sbc #$0a ; subtract 10
 tay ; save in y
 clc 
 lda #$10 ; Add '10' to bcd saved value
 adc Temp
 sta Temp
 tya
NoSub10
 sed ; decimal mode
 adc Temp ; add bcd value to 0-9 in a
 sta Temp ; save bcd value
 cld ; leave decimal mode
Write2ndChar
 lsr ; get high nybble
 lsr
 lsr
 lsr
 and #$0f
 adc #$30
 sta char2 ; save 2nd character
Write3rdChar
 lda Temp
 and #0ドルf ; get low nybble
 adc #$30
 sta char3 ; save 3rd character
 rts
highNybble
 .byte $00
 .byte $16
 .byte $32
 .byte $48
 .byte $64
 .byte $80
 .byte $96

The program works correctly for tested inputs, producing an output of $32 $31 $31 for the decimal value 211.

200_success
asked Jan 15, 2017 at 20:29
\$\endgroup\$

1 Answer

\$\begingroup\$

First observations.

  • Memory access is costly in terms of cycles. Your solution makes 10 such accesses; the program I present later on needs just 3.

  • By placing the most used variables in the zero-page, you obtained the best possible memory access performance, but avoiding variables is preferable.

  • The FormatUnsigned subroutine uses some 76 bytes and the byte-variables add another 11 bytes. This is a lot for such a humble task. The routine I present later on is written in 43 bytes and adds only 3 byte-variables.


An opportunity for optimizing.

lsr
lsr
lsr
lsr
and #$F ; get high nybble

You don't need the and #$0F instruction, because 4 lsr instructions in a row already leave the high nibble empty. Removing it shaves 2 bytes off the program and reduces the number of cycles by 2. These benefits double because you used this construct twice in the program.
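The redundancy can be checked exhaustively with a small model. This is only a sketch in Python, not 6502 code; `lsr` and `high_nibble` are hypothetical helpers that imitate the instruction on an 8-bit accumulator.

```python
def lsr(a):
    """Model a 6502 LSR on the accumulator: returns (result, carry_out)."""
    return (a >> 1) & 0xFF, a & 1


def high_nibble(a):
    """Apply four LSRs in a row, as the routine under review does."""
    for _ in range(4):
        a, _carry = lsr(a)
    return a


# For every possible byte, a following AND #$0F changes nothing.
redundant = all(high_nibble(a) == (high_nibble(a) & 0x0F) for a in range(256))
print(redundant)
```

Because the check covers all 256 byte values, it does not depend on which inputs the routine happens to receive.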


An unclear comment.

; 6502 routine to format an unsigned byte into a space
; followed by 2 decimal digits if < 100, or 3 decimal
; digits if >= 100.

From this comment I understand that numbers in the range [0,9] will be shown with 1 leading space and 1 leading zero. Wouldn't you agree that this is ugly?
The version I propose next will show 2 leading spaces on such numbers.
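For comparison, the layout aimed at here (the value right-justified in a 3-character field and padded with spaces) is a one-liner in a high-level language. The sketch below is Python, and `format_reference` is a hypothetical name; it is meant only as a reference against which the assembly output could be compared.

```python
def format_reference(value):
    """Right-justify an unsigned byte in a 3-character field."""
    if not 0 <= value <= 255:
        raise ValueError("expected an unsigned byte")
    return f"{value:3d}"


for v in (7, 42, 211):
    print(repr(format_reference(v)))
```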


My version.

The worst case runs in 108 cycles. This happens with values from 190 to 199.
The best case for this code runs in just 44 cycles. Not surprisingly this is for inputs from 0 to 9.
I personally don't like the decimal mode of the 6502 and so this solution doesn't use it.

 ldx #$20 ;" "
 sec
 sbc #100
 bcc OK1
 ldx #$31 ;"1"
 sbc #100 ; (1)
 bcc OK1
 inx ;"2"
 sbc #100 ; (2)
OK1
 adc #100
 stx char1
 ldy #$2F ;"0"-1 (4)
 sec
Sub10
 sbc #10 ; (3)
 iny ;["0","9"] (4)
 bcs Sub10
 adc #10
 cpy #$30 ;"0"
 bne OK2
 cpx #$20 ;" " (5)
 bne OK2 ; (5)
 ldy #$20 ;" "
OK2
 sty char2
 ora #$30 ;["0","9"] (6)
 sta char3
  • (1)(2)(3) The carry flag is already set. No need to use sec before sbc.
  • (2) This instruction is meant to be a jump over the following adc #100. This would have required 3 bytes using jmp STXCHAR1 or 2 bytes using bcs STXCHAR1. In both cases it would also have taken 3 cycles to run. By writing a complementary subtraction I undo the effect of the following addition, and realise the shortest code and the least cycles!
  • (4) The peculiar value $2F is correct because the loop at Sub10 always executes the iny instruction at least once. Therefore 1 to 10 iterations of this loop will yield Y-register values from $30 to $39.
  • (5) This extra code takes care of the 2 leading spaces on numbers from 0 to 9.
  • (6) Normally here we would like to add 48 to the value in the accumulator holding [0,9]. Using the adc #48 instruction would have required a clc to give correct result. By using the ora #48 the result is equally correct but the code is 1 byte smaller and 2 cycles faster.
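The carry-flag reasoning in these notes can be traced with a small model. The following is a sketch in Python rather than 6502 code. `sbc`, `adc` and `format_model` are hypothetical helpers, and the model assumes binary (non-decimal) mode and tracks only the accumulator, X, Y and the carry flag.

```python
def sbc(a, m, c):
    """Binary-mode SBC: A - M - (1 - C). Returns (A, carry)."""
    r = a - m - (1 - c)
    return r & 0xFF, 1 if r >= 0 else 0


def adc(a, m, c):
    """Binary-mode ADC: A + M + C. Returns (A, carry)."""
    r = a + m + c
    return r & 0xFF, 1 if r > 0xFF else 0


def format_model(a):
    x = 0x20                       # ldx #$20
    c = 1                          # sec
    a, c = sbc(a, 100, c)          # sbc #100
    if c:                          # bcc OK1 not taken
        x = 0x31                   # ldx #$31
        a, c = sbc(a, 100, c)      # sbc #100   (1)
        if c:                      # bcc OK1 not taken
            x += 1                 # inx
            a, c = sbc(a, 100, c)  # sbc #100   (2)
    a, c = adc(a, 100, c)          # OK1: adc #100
    char1 = x                      # stx char1
    y = 0x2F                       # ldy #$2F
    c = 1                          # sec
    while True:                    # Sub10
        a, c = sbc(a, 10, c)       # sbc #10    (3)
        y += 1                     # iny
        if not c:                  # bcs Sub10 not taken
            break
    a, c = adc(a, 10, c)           # adc #10
    if y == 0x30 and x == 0x20:    # cpy / bne / cpx / bne
        y = 0x20                   # ldy #$20
    return bytes([char1, y, a | 0x30]).decode("ascii")


print(repr(format_model(211)), repr(format_model(5)))
```

Running the model over all inputs from 0 to 255 and comparing against a space-padded, 3-character decimal string is a quick way to gain confidence in the control flow before assembling.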

Still room for improvements.

In an effort to get rid of the separate sec instruction I came up with a 1 byte smaller solution. This also no longer contains the complementary adc #100 that you commented about.

 ldx #$20 ;" "
 cmp #100
 bcc OK1
 sbc #100
 ldx #$31 ;"1"
 cmp #100
 bcc OK1
 sbc #100
 inx ;"2"
OK1
 stx char1

Nice, but wouldn't it be possible to combine a few things (the first sbc #100 and the second cmp #100)? Yes. Using sbc #200 and a conditional complementary adc #100 the code has yet again shrunk, this time by another 2 bytes.

 ldx #$20 ;" "
 cmp #100
 bcc OK1
 ldx #$32 ;"2"
 sbc #200
 bcs OK1
 adc #100
 dex ;"1"
OK1
 stx char1
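The sbc #200 trick relies on 8-bit wrap-around: for inputs from 100 to 199 the subtraction borrows, and adding 100 back lands on the input minus 100. A sketch in Python that models this (not 6502 code; `sbc`, `adc` and `hundreds_model` are hypothetical helpers, binary mode assumed):

```python
def sbc(a, m, c):
    """Binary-mode SBC: A - M - (1 - C). Returns (A, carry)."""
    r = a - m - (1 - c)
    return r & 0xFF, 1 if r >= 0 else 0


def adc(a, m, c):
    """Binary-mode ADC: A + M + C. Returns (A, carry)."""
    r = a + m + c
    return r & 0xFF, 1 if r > 0xFF else 0


def hundreds_model(a):
    """Return (char1, remainder) for the sbc #200 variant."""
    x = 0x20                       # ldx #$20
    if a >= 100:                   # cmp #100 / bcc OK1 (carry set when A >= 100)
        c = 1                      # carry left set by the cmp
        x = 0x32                   # ldx #$32
        a, c = sbc(a, 200, c)      # sbc #200
        if not c:                  # bcs OK1 not taken: input was 100..199
            a, c = adc(a, 100, c)  # adc #100
            x -= 1                 # dex
    return chr(x), a               # OK1: stx char1


print(hundreds_model(42), hundreds_model(150), hundreds_model(250))
```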

Some speed measurements.

All of these numbers relate to the first part of the program where the hundreds digit gets calculated.

 Input   Question  Answer1  Answer2  Answer3
-------  --------  -------  -------  -------
  0- 99     24        14       10       10    cycles
100-199     33        20       18       19    cycles
200-255     40        23       21       16    cycles
-------  --------  -------  -------  -------
Average    31.02    18.31    15.53    14.83   cycles
Size        21        20       19       17    bytes

For the average it is understood that all numbers from 0 to 255 have the same probability.
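The averages are weighted by the number of inputs in each range (100, 100 and 56 values). A short Python sketch of that arithmetic, using the cycle counts from the table; the names `cycles` and `averages` are only illustrative:

```python
# (first, last) input range -> cycles for Question, Answer1, Answer2, Answer3
cycles = {
    (0, 99): (24, 14, 10, 10),
    (100, 199): (33, 20, 18, 19),
    (200, 255): (40, 23, 21, 16),
}


def averages(table):
    """Average cycle count per column, every input 0..255 equally likely."""
    totals = [0, 0, 0, 0]
    for (first, last), counts in table.items():
        weight = last - first + 1  # number of inputs in this range
        for i, n in enumerate(counts):
            totals[i] += weight * n
    return [round(t / 256, 2) for t in totals]


print(averages(cycles))
```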

answered Jan 22, 2017 at 1:20
\$\endgroup\$
  • Thanks for that. Some good points, the most basic of which I'd already managed to hit, including the 'optimizing for carry flag state'. You have carried over the adc #100. In fact, there's a shorter solution avoiding adding back the 100 deducted. I will return with some changes. Thanks, Jonathan (commented Jan 23, 2017 at 12:26)
  • You must have read my mind! Early this week I came up with a new solution that's both shorter and faster. Today I've added it to my answer. (commented Jan 29, 2017 at 19:38)
