\$\begingroup\$

-ish because I removed the input functionality from the compiler, and the compiler does not support nested loops.

I've recently been reading up on compilers and how they work. Although this doesn't use most of what I've learned about (lexing and parsing), I thought it would be fun to try to create a brainf*ck compiler rather than an interpreter.

bfc.c

#include <stdio.h>
static int i;
static char *code; // these two as static so get_amt_to_change can easily interact
int get_amt_to_change(char c); // so the compiler doesn't write (ie) add di, 1 ten times
int main(int argc, char **argv) {
 code = argv[1];
 puts("xor di, di\n"
 "setup_loop:\n"
 "mov byte [tape + di], 0\n"
 "add di, 1\n"
 "cmp di, 101\n"
 "jne setup_loop\n"
 "xor di, di"); // sets up the tape with all 0's
 int loop_count = 0; // to keep track of asm subroutines for [ and ]
 for(i = 0; code[i] != '\0'; i++) {
 switch(code[i]) {
 case '+':
 printf("add byte [tape + di], %d\n", get_amt_to_change('+'));
 break;
 case '-':
 printf("sub byte [tape + di], %d\n", get_amt_to_change('-'));
 break;
 case '>':
 printf("add di, %d\n", get_amt_to_change('>'));
 break;
 case '<':
 printf("sub di, %d\n", get_amt_to_change('<'));
 break;
 case '.':
 puts("mov ah, 0Eh\n"
 "mov al, byte [tape +di]\n"
 "int 10h");
 break;
 case '[':
 printf("cmp byte [tape + di], 0\n"
 "je end_loop%d\n"
 "start_loop%d:\n", loop_count, loop_count);
 break;
 case ']':
 printf("cmp byte [tape + di], 0\n"
 "jne start_loop%d\n"
 "end_loop%d:\n", loop_count, loop_count);
 loop_count++; // to not repeat subroutine names
 break;
 }
 }
 puts("jmp $\nsection .bss\ntape resb 100"); // a 100 byte tape
 return 0;
}
int get_amt_to_change(char c) {
 int amt;
 for(amt = 0; code[i] == c; amt++, i++);
 i--; // if it wasn't == c, then go back one character and find out what it was equal to in the next call
 return amt;
}

The above code reads Brainf*ck code passed as a command-line argument and produces assembly code, which can then be passed to the NASM assembler.
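The run-counting helper could also be written without the two file-level variables. The following is only a minimal sketch, assuming a hypothetical function name `count_run` and a position passed by pointer (neither appears in the original bfc.c):

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times `c` repeats starting at src[*pos].
 * On return, *pos sits on the last character of the run, so the caller's
 * own increment moves on to the next unread character.
 * Precondition: src[*pos] == c and c is not the NUL terminator. */
static int count_run(const char *src, size_t *pos, char c) {
    assert(c != '\0' && src[*pos] == c);
    int amt = 0;
    while (src[*pos] == c) {
        amt++;
        (*pos)++;
    }
    (*pos)--; /* step back onto the final character of the run */
    return amt;
}
```

A caller would pass the address of its loop index, for example `count_run(code, &i, '+')`, which keeps the same step-back behaviour as `get_amt_to_change` while making the dependency on the index explicit.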

Examples

Purpose: (deleted: nothing. brainf*ck doesn't have a purpose) logs 'd' to output

Brainf*ck

Note: the backslashes are there so the shell doesn't treat the < and > symbols as redirection operators.

\>++++++++++[\<++++++++++\>-]\<.

Assembly

xor di, di
setup_loop:
mov byte [tape + di], 0
add di, 1
cmp di, 101
jne setup_loop
xor di, di
add di, 1
add byte [tape + di], 10
cmp byte [tape + di], 0
je end_loop0
start_loop0:
sub di, 1
add byte [tape + di], 10
add di, 1
sub byte [tape + di], 1
cmp byte [tape + di], 0
jne start_loop0
end_loop0:
sub di, 1
mov ah, 0Eh
mov al, byte [tape +di]
int 10h
jmp $
section .bss
tape resb 100

Purpose: to show the generated assembly code for each of the symbols

Brainf*ck

++--\>\>\<\<.,[]

Assembly

xor di, di
setup_loop:
mov byte [tape + di], 0
add di, 1
cmp di, 101
jne setup_loop
xor di, di
add byte [tape + di], 2
sub byte [tape + di], 2
add di, 2
sub di, 2
mov ah, 0Eh
mov al, byte [tape +di]
int 10h
cmp byte [tape + di], 0
je end_loop0
start_loop0:
cmp byte [tape + di], 0
jne start_loop0
end_loop0:
jmp $
section .bss
tape resb 100
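The loop labels in the listings above come from a single counter that is only incremented at `]`, which is why nested loops cannot work. One common approach is to keep a stack of open label numbers. The following is a minimal sketch, not part of the original compiler; `struct loop_stack`, `emit_loop_open`, `emit_loop_close` and the `MAX_DEPTH` limit are all hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <stdio.h>

#define MAX_DEPTH 64 /* assumed nesting limit for this sketch */

struct loop_stack {
    int ids[MAX_DEPTH]; /* label numbers of the currently open loops */
    int depth;          /* how many loops are open right now */
    int next_id;        /* next unused label number */
};

/* Emits the code for '[' and returns the label number used,
 * or -1 if the nesting is deeper than MAX_DEPTH. */
static int emit_loop_open(FILE *out, struct loop_stack *s) {
    assert(out != NULL && s != NULL);
    if (s->depth == MAX_DEPTH)
        return -1;
    int id = s->next_id++;
    s->ids[s->depth++] = id;
    fprintf(out, "cmp byte [tape + di], 0\n"
                 "je end_loop%d\n"
                 "start_loop%d:\n", id, id);
    return id;
}

/* Emits the code for ']' and returns the label number it closed,
 * or -1 if there is no matching '['. */
static int emit_loop_close(FILE *out, struct loop_stack *s) {
    assert(out != NULL && s != NULL);
    if (s->depth == 0)
        return -1;
    int id = s->ids[--s->depth];
    fprintf(out, "cmp byte [tape + di], 0\n"
                 "jne start_loop%d\n"
                 "end_loop%d:\n", id, id);
    return id;
}
```

With this scheme the innermost `[` is always the one closed by the next `]`, and a non-zero `depth` at the end of the input indicates an unmatched `[`.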

Questions

  • My C code only has one subroutine. Are there any other logical ones to add?

  • I did a lot of optimizing on the assembly code output. Could it be further optimized?

  • My main problem with compilers I've written in the past is that I've over-complicated things. Is that an issue with this (C) code?

Jamal
asked Mar 8, 2015 at 22:40
\$\endgroup\$

1 Answer

\$\begingroup\$

You need to enlarge tape because your initialization zeroes 101 bytes.

 section .bss
 tape resb 101

When in graphics mode, the teletype function uses the BL and BH registers as arguments.

 mov bx,0007h ;Display page 0 and Color 7
 mov ah, 0Eh
 mov al, byte [tape +di]
 int 10h

When in text mode, the teletype function uses the BH register as an argument.

 mov bh,0 ;Display page 0
 mov ah, 0Eh
 mov al, byte [tape +di]
 int 10h
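On the compiler side, the two variants could be selected by a flag. The following is a rough sketch only; the function name `format_putchar` and its `graphics_mode` parameter are assumptions made for illustration, and the emitted instructions are the ones shown above:

```c
#include <assert.h>
#include <stdio.h>

/* Formats the assembly for '.' into buf and returns snprintf's result.
 * graphics_mode != 0: set BH (page 0) and BL (color 7) via BX.
 * graphics_mode == 0: set only BH (page 0). */
static int format_putchar(char *buf, size_t size, int graphics_mode) {
    assert(buf != NULL && size > 0);
    const char *page_setup = graphics_mode
        ? "mov bx, 0007h\n"
        : "mov bh, 0\n";
    return snprintf(buf, size,
                    "%s"
                    "mov ah, 0Eh\n"
                    "mov al, byte [tape + di]\n"
                    "int 10h\n",
                    page_setup);
}
```

The caller would print the buffer in the `case '.':` branch; whether a flag like this is worth the extra code depends on whether the generated program can ever start in a graphics mode.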

You can optimize the setup code by iterating backwards. It shaves off 2 instructions! It too will leave DI=0.

 mov di, 101
setup_loop:
 sub di,1
 mov byte [tape + di], 0
 jnz setup_loop
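To keep the number of zeroed bytes and the reserved tape size from drifting apart again, both could be generated from one value. This is a minimal sketch under that assumption; `format_tape_setup`, `format_tape_bss` and the `tape_size` parameter are hypothetical and do not come from the original code:

```c
#include <assert.h>
#include <stdio.h>

/* Formats the backwards clearing loop: DI starts at tape_size and counts
 * down, so bytes tape_size-1 .. 0 are zeroed and DI ends at 0. */
static int format_tape_setup(char *buf, size_t size, int tape_size) {
    assert(buf != NULL && size > 0 && tape_size > 0);
    return snprintf(buf, size,
                    "mov di, %d\n"
                    "setup_loop:\n"
                    "sub di, 1\n"
                    "mov byte [tape + di], 0\n"
                    "jnz setup_loop\n",
                    tape_size);
}

/* Formats the matching reservation, using the same tape_size. */
static int format_tape_bss(char *buf, size_t size, int tape_size) {
    assert(buf != NULL && size > 0 && tape_size > 0);
    return snprintf(buf, size,
                    "section .bss\n"
                    "tape resb %d\n",
                    tape_size);
}
```

Because `mov` does not change the flags, the `jnz` still tests the result of the `sub`, exactly as in the listing above.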

I don't know if you care, but the code for [ ] by itself produces an infinite loop if the byte at [tape + di] is anything but zero.

answered Mar 10, 2015 at 17:30
\$\endgroup\$
  • I am a little confused by your second improvement recommendation. Since the BL and BH arguments are not required, why would I waste space by adding that instruction? It seems pointless to me. Commented Mar 10, 2015 at 22:18
  • My BIOS reference includes them. You can also look it up in Ralf Brown's Interrupt List. If you know for sure the display is not in a graphics mode, then only code mov bh,0. Commented Mar 10, 2015 at 22:47
  • I, too, use that list, but through my testing I've discovered that those are not necessary (at least in text mode). I actually hadn't thought that the user might embed the code into another bit of code, which might be in a graphics mode at the point of entry. I recommend adding to your answer that the page number is necessary for graphics modes. Commented Mar 10, 2015 at 22:58
