BrainF**k interpreter in python

Question 1

Here's what I've come up with:

def bf_interpreter(code: str) -> str:
 """
 Interprets BrainF**k code
 Examples:
 ++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
 Hello World!
 +++++>++++[<+>-]++++++++[<++++++>-]<.
 9
 """
 outputs = []
 ptr = 0
 values = [0]
 length = 1
 brackets = []
 index = 0
 code_length = len(code)
 while index < code_length:
 char = code[index]
 while length <= ptr:
 length += 1
 values.append(0)
 if char == '>': ptr += 1
 if char == '<': ptr -= 1
 if char == '+': values[ptr] = (values[ptr] + 1) % 256
 if char == '-': values[ptr] = (values[ptr] - 1) % 256
 if char == '[':
 brackets.append(index)
 if char == ']':
 if values[ptr] == 0:
 brackets.pop()
 else:
 index = brackets[-1]
 if char == '.':
 outputs.append(chr(values[ptr]))
 if char == ',':
 values[ptr] = ord(input())
 index += 1
 return ''.join(outputs)

How do I make this look better? (More consistent, as well as strictly following PEP 8 conventions)

Question 2

Can you describe the initial intention, what are the parsing rules in your custom "interpreter" ?

Question 3

@AlexV Thanks for pointing that out. I've updated my question as such.

Question 4

@RomanPerekhrest I don't quite understand what you mean by parsing rules. Could you please elaborate?

Question 5

When char == '[', you're supposed to check whether the value at the pointer is zero; if it is, you need to jump to after the corresponding ].

Question 6

All of your if char == '>': ptr += 1 and similar checks should use elif after the first check. By using if for all of the checks, you're forcing them all to run, even once you've found a match. This is wasteful because the checks are necessarily exclusive of each other. Once one check is true, none of the other checks can be. For example, you should have:

if char == '>': ptr += 1
elif char == '<': ptr -= 1
elif char == '+': values[ptr] = (values[ptr] + 1) % 256
elif char == '-': values[ptr] = (values[ptr] - 1) % 256

Now the checks stop once a match is found which prevents unnecessary equality checks.

I'd also try to break this down into a few functions to help testing. Right now, you can only test bf_interpreter as one whole. You could have a function that takes the current character, and the state of the program (the brackets, ptr, outputs...), and returns a new state. That way you can easily test for a given state if a certain command produces a correct new state.

I'm assuming this loop is just to add padding so you don't go off the end of the slots?:

while length <= ptr:
 length += 1
 values.append(0)

You could make that a little neater by just using math and some concatenation. You could also just get rid of length and use len(values):

needed = ptr - len(values)
values += [0] * needed

ptr - len(values) calculates how many slots are needed, then [0] * needed produces that many 0s, and += adds them to values. If needed is negative, [0] * needed will produce [], and essentially cause no change.

If you want to avoid the temporary list that [0] * needed creates, you could replace that with:

values += (0 for _ in range(needed))

Now += just pulls from a generator that produces values as needed.

And then, just like how you don't need length, you don't need code_length either. len(code) is fine; len runs in constant time. You don't need to cache it for performance reasons.

Here's some timings to show the difference in runtime this can cause:

import timeit
TEST_CODE = "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."
>>> timeit.timeit(lambda: bf_interpreter_orig(TEST_CODE), number=int(2e5)) # Two hundred thousand tests
77.3481031 # Takes 77 seconds
>>> timeit.timeit(lambda: bf_interpreter_len(TEST_CODE), number=int(2e5))
88.93794809999997

Where bf_interpreter_orig is your original code, and bf_interpreter_len is your original code but using len.

Yes, there's a difference. Note though, that's a ~11 second difference across 200,000 calls. That works out to roughly 58 microseconds per call to the interpreting function.

Unless you're calling bf_interpreter hundreds of thousands of times in a tight loop, the difference is unlikely to matter. This also likely has nothing to do with the fact that you're requesting a length, and more to do with one extra function call. Function calls aren't super fast in Python. Likely any extra call to any function would have similar effects.

Question 7

Yes, it would be possible to use len(values), but wouldn't it be more time consuming? If having a variable to calculate the length takes 4.78 seconds, using len(values) takes 7.549 seconds.

Question 8

@Srivaths See my edit at the bottom. The time difference isn't as big as your findings show. You're likely using too small of a sample, or a poor benchmarking method.

Question 9

I did not exactly use the bf_interpreter function to time the difference. See pastebin.com/uZNR3ArW

Question 10

@Srivaths When I run that exact test, I get 13.8 seconds for code1, and 11.2 seconds for code2. That's in the neighborhood of the results that I got with my full test (81% as fast for my test vs 86% as fast with your test).

Question 11

Your tests seem abnormally slow. Do you have something running in the background that could be messing with tests?

Question 12

+1 on specifying argument code: str and return type -> str.

I implemented a class seen below which has the advantage of seeing chars and their corresponding functions at a glance. Adding new chars if need be is incredibly easy.

 self.actions = {
 '>': self.greater_than,
 '<': self.less_than,
 '+': self.plus,
 '-': self.minus,
 '[': self.left_bracket,
 ']': self.right_bracket,
 '.': self.dot,
 ',': self.comma
 }

You can use a

try:
 ...
except KeyError:
 ...

to detect unrecognised chars.

Complete Class


class BFInterpreter:
 def __init__(self):
 self.outputs = []
 self.ptr = 0
 self.values = [0]
 self.length = 1
 self.brackets = []
 self.index = 0
 def greater_than(self):
 self.ptr += 1
 def less_than(self):
 self.ptr -= 1
 def plus(self):
 self.values[self.ptr] = (self.values[self.ptr] + 1) % 256
 def minus(self):
 self.values[self.ptr] = (self.values[self.ptr] - 1) % 256
 def left_bracket(self):
 self.brackets.append(self.index)
 def right_bracket(self):
 if self.values[self.ptr] == 0:
 self.brackets.pop()
 else:
 self.index = self.brackets[-1]
 def dot(self):
 self.outputs.append(chr(self.values[self.ptr]))
 def comma(self):
 self.values[self.ptr] = ord(input())
 def evaluate(self, code):
 self.code = code
 self.code_length = len(self.code)
 self.actions = {
 '>': self.greater_than,
 '<': self.less_than,
 '+': self.plus,
 '-': self.minus,
 '[': self.left_bracket,
 ']': self.right_bracket,
 '.': self.dot,
 ',': self.comma
 }
 while self.index < self.code_length:
 char = self.code[self.index]
 while self.length <= self.ptr:
 self.length += 1
 self.values.append(0)
 self.actions[char]()
 self.index += 1
 return ''.join(self.outputs)

Usage

bf = BFInterpreter()
print(bf.evaluate('+++++>++++[<+>-]++++++++[<++++++>-]<.'))

code has been specified as an argument in the evaluate method rather than in the constructor to be able to evaluate without creating new objects.

Question 13

Now change BFInterpreter to BFCode(str) and change evaluate(self, code) to __call__(self). Boom.

Question 14

"...to be able to evaluate without creating new objects." And what exactly is the point of that? You cannot reuse the object after calling evaluate anyways. Your concept of __init__ + evaluate is bad for readability and usability of your code. I would either make a wrapper method that creates a BFInterpreter, then calls evaluate on it and returns the result; or create what I call a "function-like class": a class with __new__ method in which you create an instance, call some method(s) on it (such as evaluate), and return the result (instead of returning an instance of the class).

Question 15

In the spirit of you once forge a spoon and use it may times, not creating a spoon each time you want to use it. You create an object then use a method many times versus creating a new instance each time you want to evaluate

Question 16

Have you read and understood my comment? You cannot use your instance multiple times, since you don't reset the variables self.outputs, self.values, etc. in the evaluate method. You could use the object multiple times if you did reset those variables in the evaluate method, but then there would be absolutely no point in creating and reusing the object, since there would be no initialization of the object done in the constructor. You might as well change evaluate to a classmethod and create a new instance in that method.

Question 17

Oh and by the way, your implementation is neither thread- nor exception-safe. It would be, however, if you did what I told you. PS. if you still believe that your object is reusable, just try to reuse it on these inputs: '+++++>++++[<+>-]++++++++[<++++++>-]<.' and '' (an empty string). The first should (and does) print 9; however the second should print nothing but prints 9 instead. And one more thing, your program fails for inputs which contain other characters than the 8 control ones, which makes it neither a valid BF interpreter, nor equivalent with OP's original program.

score 8 · Accepted Answer · 2019-12-02 16:38:40Z

All of your if char == '>': ptr += 1 and similar checks should use elif after the first check. By using if for all of the checks, you're forcing them all to run, even once you've found a match. This is wasteful because the checks are necessarily exclusive of each other. Once one check is true, none of the other checks can be. For example, you should have:

if char == '>': ptr += 1
elif char == '<': ptr -= 1
elif char == '+': values[ptr] = (values[ptr] + 1) % 256
elif char == '-': values[ptr] = (values[ptr] - 1) % 256

Now the checks stop once a match is found which prevents unnecessary equality checks.

I'd also try to break this down into a few functions to help testing. Right now, you can only test bf_interpreter as one whole. You could have a function that takes the current character, and the state of the program (the brackets, ptr, outputs...), and returns a new state. That way you can easily test for a given state if a certain command produces a correct new state.

I'm assuming this loop is just to add padding so you don't go off the end of the slots?:

while length <= ptr:
 length += 1
 values.append(0)

You could make that a little neater by just using math and some concatenation. You could also just get rid of length and use len(values):

needed = ptr - len(values)
values += [0] * needed

ptr - len(values) calculates how many slots are needed, then [0] * needed produces that many 0s, and += adds them to values. If needed is negative, [0] * needed will produce [], and essentially cause no change.

If you want to avoid the temporary list that [0] * needed creates, you could replace that with:

values += (0 for _ in range(needed))

Now += just pulls from a generator that produces values as needed.

And then, just like how you don't need length, you don't need code_length either. len(code) is fine; len runs in constant time. You don't need to cache it for performance reasons.

Here's some timings to show the difference in runtime this can cause:

import timeit
TEST_CODE = "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."
>>> timeit.timeit(lambda: bf_interpreter_orig(TEST_CODE), number=int(2e5)) # Two hundred thousand tests
77.3481031 # Takes 77 seconds
>>> timeit.timeit(lambda: bf_interpreter_len(TEST_CODE), number=int(2e5))
88.93794809999997

Where bf_interpreter_orig is your original code, and bf_interpreter_len is your original code but using len.

Yes, there's a difference. Note though, that's a ~11 second difference across 200,000 calls. That works out to roughly 58 microseconds per call to the interpreting function.

Unless you're calling bf_interpreter hundreds of thousands of times in a tight loop, the difference is unlikely to matter. This also likely has nothing to do with the fact that you're requesting a length, and more to do with one extra function call. Function calls aren't super fast in Python. Likely any extra call to any function would have similar effects.

Yes, it would be possible to use len(values), but wouldn't it be more time consuming? If having a variable to calculate the length takes 4.78 seconds, using len(values) takes 7.549 seconds.
@Srivaths See my edit at the bottom. The time difference isn't as big as your findings show. You're likely using too small of a sample, or a poor benchmarking method.
I did not exactly use the bf_interpreter function to time the difference. See pastebin.com/uZNR3ArW
@Srivaths When I run that exact test, I get 13.8 seconds for code1, and 11.2 seconds for code2. That's in the neighborhood of the results that I got with my full test (81% as fast for my test vs 86% as fast with your test).
Your tests seem abnormally slow. Do you have something running in the background that could be messing with tests?

Stack Exchange Network

BrainF**k interpreter in python

2 Answers 2

Complete Class

Usage

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

BrainF**k interpreter in python

2 Answers 2

Complete Class

Usage

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions