This is a simple interpreter for the esoteric brainfuck programming language, written in C. It is one of my personal projects and I would love to receive expert advice.
Behavior
All characters except the eight commands
><+-.,[]
are ignored and treated as comments.The array has 30,000 cells and each cell in the array contains a byte (most commonly 8 bits).
Moving the pointer outside the bounds of the array will result in undefined behavior.
Running
After successfully installing the interpreter, you can run it:
brain <file>
This is an example of running brain:
brain example.bf
You can visit the Git Repository at gitlab.com/andy-sb/brain for building/compiling instructions and further information.
Code
In brain.c
I have:
#include <stdio.h>
#include <stdlib.h>
void goto_open(char **codep)
{
int search = 1;
while (search > 0) {
--*codep;
switch (**codep) {
case '[':
--search;
break;
case ']':
++search;
break;
default:
break;
}
}
}
void goto_close(char **codep)
{
int search = 1;
while (search > 0) {
++*codep;
switch (**codep) {
case '[':
++search;
break;
case ']':
--search;
break;
default:
break;
}
}
}
void exec_command(char **codep, char **memp)
{
switch (**codep) {
case '>':
++*memp;
break;
case '<':
--*memp;
break;
case '+':
++**memp;
break;
case '-':
--**memp;
break;
case '.':
fputc(**memp, stdout);
break;
case ',':
**memp = fgetc(stdin);
break;
case '[':
if (!**memp)
goto_close(codep);
break;
case ']':
if (**memp)
goto_open(codep);
break;
default:
break;
}
}
void interpret(FILE *fp)
{
fseek(fp, 0, SEEK_END);
long fsize = ftell(fp);
fseek(fp, 0, SEEK_SET);
char code[fsize];
char *codep = code;
fread(code, 1, fsize, fp);
char mem[30000] = {0};
char *memp = mem;
while (*codep != '0円') {
exec_command(&codep, &memp);
++codep;
}
}
int main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "Usage: %s <file>\n", argv[0]);
exit(EXIT_FAILURE);
}
FILE *fp = fopen(argv[1], "r");
if (!fp) {
fprintf(stderr, "Error: failed to open file '%s'\n", argv[1]);
exit(EXIT_FAILURE);
}
interpret(fp);
fclose(fp);
return EXIT_SUCCESS;
}
1 Answer 1
The code is easy to read and to follow. There isn't a global in sight, the indentation is consistent, you use EXIT_SUCCESS
and EXIT_FAILURE
, and the control flow is simple. Therefore I have no comments about superficial aspects of style.
However, there is a bug in interpret
: You are reading uninitialized memory. The code assumes that code
will end in a null character; this is not usually the case. Keep in mind that ftell
, fseek
, fread
do not account for a terminating 0円
.
Most of the time, fsize
will be the number of characters in the file. If the file happens to end in 0円
, all is well. But if it ends in some other fashion, then code[fsize - 1]
(the last valid index of code
) will not be 0円
. Hence codep
will begin wandering into memory outside of code
until it chances upon a 0.
In C, it is valid to have a pointer to one element past the end of an array. In other words, there is no issue having codep > code + fsize
. The problem occurs when codep
is dereferenced, since the contents of *codep
is undefined.
You can fix this by reserving space for a null character and adding it yourself.
After fixing the problem, there are some other things to be careful about. You check the the return value of fopen
, but you don't do the same for other input/output functions. For example, fgetc
's result is assigned to a cell blindly, even though fgetc
does not return a char
. It returns an int
, which might be EOF
; you should check for and handle this case, as long as the type of your cell is char
. (See also the remarks on parameterization below.)
Function name | Error return |
---|---|
fputc |
EOF |
fgetc |
EOF |
fseek |
a nonzero value |
ftell |
-1L |
fread |
a value less than the third argument in the call |
fprintf |
a negative value |
fclose |
EOF |
My (somewhat controversial) stance on error checking is to do it for any function that can fail. Most people would probably not check fputc
and fprintf
, even if they check the others. I certainly don't object to this, but it isn't my personal choice.
In each failure case, you should exit(EXIT_FAILURE)
, probably after printing an indication of what happened. The perror
function might be useful.
Why not perform bounds checking to avoid undefined behavior if the pointer moves outside the memory? You can print an error message and leave the pointer unchanged. A similar case to look out for is when a [
or a ]
is missing. One way to handle bounds checking is to use offsets (probably of type size_t
) instead of real pointers.
I would also suggest parameterizing the program a bit more. Your specification does say that the memory is 30 000 cells in size, but why not give the 30000
a symbolic name? But I think the most interesting aspect is the types you use.
For the most part, char
works great as a cell type. However, there are two properties to note: (1) char
might be wider than eight bits; (2) char
might be signed or unsigned. The former is much less important than the latter. Because the signedness of char
is uncertain, I would suggest using a fixed-width and fixed-signedness type.
Regardless of the underlying type, you should probably give a meaningful name to it with typedef
—perhaps cell_type
. I recommend doing this even if you stick with char
.
Most brainfuck interpreters use signed, eight-bit cells. One choice of type is naturally int8_t
, but such a type may not be provided by an implementation. You can test for its presence by checking whether INT8_WIDTH
is defined. Alternatively, you could use int_least8_t
or int_fast8_t
, both of which are guaranteed to be provided by a conforming C implementation.
You should pay attention to overflow in memory cells, since overflow of signed numbers in C is undefined behavior. The simplest thing to do is to use unsigned types (uint8_t
, uint_least8_t
, uint_fast8_t
), which wrap on overflow.
Finally, I should address your method of reading the input file. It is simple to read it all in at once, but ftell
and fseek
might fail, and you're wasting memory. An alternative is to store only the text associated with currently active loops. This is much more complicated to implement, however, especially if you want to allow loops to be arbitrarily sized.
-
1\$\begingroup\$ Can I just set the condition
code < code + fsize
for the while-loop instead of reserving space for a null character and adding it myself? \$\endgroup\$Andy Sukowski-Bang– Andy Sukowski-Bang2021年04月14日 15:38:34 +00:00Commented Apr 14, 2021 at 15:38 -
\$\begingroup\$ @AndySukowski-Bang Yes, that would work. Some people might find the null more natural, if
code
is supposed to be understood as a string. It's up to you. \$\endgroup\$texdr.aft– texdr.aft2021年04月14日 15:59:46 +00:00Commented Apr 14, 2021 at 15:59 -
2\$\begingroup\$ Welcome to the Code Review Community, nice first answer. \$\endgroup\$2021年04月17日 12:37:04 +00:00Commented Apr 17, 2021 at 12:37