Lately there have been some questions regarding calculators, where the responses typically suggest simplifying the logic and using either an abstract syntax tree or a token list (like I did myself in a Java calculator question).
To prove the concept, and as a training exercise for myself, I implemented this calculator in Python 2.x, to visualize one way of implementing a calculator which supports operator precedence, exponentiation and floor division.
Here is the code:
# -*- coding: utf-8 -*-

from __future__ import division, print_function

from textwrap import dedent
import math
import operator as op
from collections import OrderedDict
import re
import doctest


# Regex to split into numbers and operators
TOKEN_SPLITTER = re.compile("""
    \s*            # Ignore leading whitespace
    (              # Group the result, which is either ...
      (?<!\d)-?    # A possibly negative (but not the operator minus)
      \d+          # number
      (?:\.\d+)?   # with optional fraction part
    |              # ... or an alternate group of ...
      (?://|\*\*)  # Non-capturing two-character operator
    | (?:[+-/*^])  # or single-character operator
    )              # ... end of group
    \s*            # Ignore trailing whitespace
    """, re.X)

# Define legal operators in precedence order
OPERATORS = OrderedDict([
    # '(' and ')' are not implemented yet!, and needs special care
    ('^', math.pow),
    ('**', math.pow),
    ('*', op.mul),
    ('/', op.div),
    ('//', op.floordiv),
    ('+', op.add),
    ('-', op.sub),
])


def tokenizer(expression):
    """Splits expression into numbers and operator tokens, and returns a list.

    The expression can contain numbers with or without a decimal part, and
    various operators. In general spaces are allowed, but to be able to
    correctly determine negative numbers, there can be no space in between
    the '-' and the number, i.e. "-5" is legal, but "- 5" is tokenized as
    the minus operator and the number 5.

    Here are some various expressions with resulting token lists:

    >>> tokenizer("10-2^3*5/2--5")
    [10, '-', 2, '^', 3, '*', 5, '/', 2, '-', -5]
    >>> tokenizer("1*2.0/3**4+5-6")
    [1, '*', 2.0, '/', 3, '**', 4, '+', 5, '-', 6]
    >>> tokenizer(" 2 * -3 + 6 /3--5 ")
    [2, '*', -3, '+', 6, '/', 3, '-', -5]
    """
    def _numberify(token):
        """Turn token into int or float, if possible."""
        try:
            # Check if it is an int and return it
            return int(token)

        except ValueError:
            # Not an int, is it possibly a float?
            try:
                return float(token)

            except ValueError:
                # Not a number, return operator as is
                return token

    ## Split into numbers and operators, and make number strings into numbers
    return map(_numberify, TOKEN_SPLITTER.findall(expression))


def calculate(expression):
    """Calculate the value of the list or expression.

    The function can be called with a string expression, or a prebuilt
    list of tokens from tokenizer(). It will apply the calculations
    according to precedence of operators, and return the correct answer.

    Note that it will perform integer calculations, unless you specify
    (one or more) floats in the numbers. And it will raise a ValueError
    if the numbers and operators don't match a legal calculation.

    To be able to correctly determine negative numbers, there can be no
    space in between the '-' and the number. For powers you can use either
    '**' or '^', and if you use '//' you get a floored division.

    Here are some examples:

    >>> calculate("23//2-2^3*5/2--5")   # Combining most operators
    -4.0
    >>> calculate("1 + 2*3 - 5 / 6")    # The last term becomes 0...
    7
    >>> calculate("2**2** 2*3 - 2*2^4") # Strange spacing here...
    16.0
    """
    # Check whether our list is an expression, or a prebuilt token list
    if isinstance(expression, str):
        token_list = tokenizer(expression)

    else:
        token_list = expression

    # Loop through all operators in precedence order, applying each of them
    for operator in OPERATORS:
        # Apply the operator and reduce the two surrounding elements
        # using this operation
        while True:
            # Find operator in list, and break out of loop if not found
            try:
                idx = token_list.index(operator)
            except ValueError:
                break

            # Operator found, calculate and reduce
            if idx == 0 or idx == len(token_list):
                raise ValueError("No numbers to perform operation on")
            operation_slice = slice(idx - 1, idx + 2)
            operands = token_list[operation_slice]

            # Here is the magic: It replaces a slice of the list, with the
            # calculated result of the operation applied to the operands
            token_list[operation_slice] = [OPERATORS[operator](operands[0], operands[2])]

    # If the token list only has one element left, we're done calculating
    # and can return the result
    if len(token_list) == 1:
        return token_list[0]

    # If however we've exhausted all operators, and the list has multiple
    # elements, there is an error in the original expression.
    raise ValueError("Couldn't calculate all of it! Remaining tokens: {}".format(token_list))


def main():
    print(dedent("""
        Welcome to an interactive calculator!
        Enter your expression and hit enter to calculate it. The calculator
        supports the normal operations, powers and floor division. Currently
        parentheses are not allowed. Enter 'quit' as the expression when you
        want to end the calculator."""))

    while True:
        expression = raw_input("Type in expression (or 'quit'): ").lower()
        if expression == 'quit':
            break
        print(" {} = {}".format(expression, calculate(expression)))


if __name__ == '__main__':
    doctest.testmod()
    main()
Some extra comments on the code:
- Doctest is used to verify correct behavior of tokenizer() and calculate(). This is triggered through the doctest.testmod() in the main code.
- An OrderedDict is used to preserve the order (and thus the precedence) of the operators.
- Code is written in Python 2.7, but the only change needed should be to use input() instead of raw_input() in the main code.
- I've used if __name__ == '__main__': to separate the code and allow for it to be called as a module function, or to be used as an interactive calculator if called as a script (see the short usage example after this list).
- The heavy lifting of the tokenizer is done using a regex, which is kind of a beast, but it does do the work.
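To illustrate the module-function use, here is a minimal usage sketch under Python 2, assuming the script is saved as calculator.py (that file name is my own choice, not part of the code above):

from calculator import tokenizer, calculate  # assumed module name

print(tokenizer("10-2^3*5/2--5"))    # [10, '-', 2, '^', 3, '*', 5, '/', 2, '-', -5]
print(calculate("1 + 2*3 - 5 / 6"))  # 7, since op.div performs integer division on ints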
Some possible extensions
Some extensions I've allowed for, but not yet implemented, are:
- Parentheses – Since calculate() also accepts a token list, I could(/will) implement parentheses through recursion on the sub-list between matching parentheses (see the sketch after this list).
- Square root – One could implement a dedicated operator for square roots, which I haven't done yet, but this can be calculated using **0.5
- Constants – In addition to _numberify() one could implement similar code to add constants to the calculator, or if so inclined extend the syntax into something like a = 5, b=10: 5 * 39 / a + b. This should only require a higher-level operator, and a little more regex magic. (Possibly a pre-parser to split/parse constants from the expression.)
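To make the parentheses idea a bit more concrete, here is a rough sketch of how it could work. It is not part of the code above; it assumes the tokenizer regex is extended to also emit '(' and ')' tokens, and that calculate() calls this helper on the token list before looping over OPERATORS. The helper name reduce_parentheses is just my own:

def reduce_parentheses(token_list):
    """Replace each parenthesised group with its calculated value."""
    while '(' in token_list:
        open_idx = token_list.index('(')
        depth = 0
        # Find the ')' that matches the first '('
        for close_idx in range(open_idx, len(token_list)):
            if token_list[close_idx] == '(':
                depth += 1
            elif token_list[close_idx] == ')':
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("Unbalanced parentheses")
        # Recurse: calculate() (assumed to call this helper first) evaluates the
        # sub-list, and the single result replaces the whole '(...)' group
        inner_value = calculate(token_list[open_idx + 1:close_idx])
        token_list[open_idx:close_idx + 1] = [inner_value]
    return token_list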
So there you have it, a calculator utilizing a token list, which can be used both as a module function and as an interactive calculator. Can you review the chosen solution and suggest possible improvements?
3 Answers
Not exactly an "improvement" to your code, but this comment:
# '(' and ')' are not implemented yet!, and needs special care
I would change this to:
# TODO: '(' and ')' are not implemented yet!, and needs special care
It is an extremely small detail, but there are good reasons for it. My editor is set up to highlight the word TODO, and it also makes it easy to search through your code and determine where work needs to be done. It doesn't matter too much for this small project, but in larger projects you'll frequently find TODOs throughout the code.
- I've used that a lot in Visual Studio, but in my current Python editor I don't think it supports the ToDo comments. But it is still good advice! – holroy, Dec 21, 2015 at 22:56
Coding style looks very good to me, but I found some issues in functionality:
- A simple bug: this line is trying to check if idx points to the first or last element, so it should use len(token_list)-1:

  if idx == 0 or idx == len(token_list):

- Because re.findall skips over any non-matching parts of the string, the program is silent about incorrect input. A simple way to address this would be to add an alternative capturing group that matches any character in the end, and raise an error if that group matches something (see the sketch after this list).
- Some operators should have the same precedence. For example 1-2+3 should evaluate to 2, not -4.
- Dealing with the minus sign could be improved; 1 -2 trips the program up. Right now the regex is trying to distinguish between the two uses of -. A better place for that is in the parser function calculate.
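As an illustration of the second point, one possible way to do this is to capture anything the token alternatives do not recognise in a second group, and raise when it matches. The exact pattern layout, the extra group and the error message below are only a suggestion, not the original code:

import re

TOKEN_SPLITTER = re.compile(r"""
    \s*
    (?:
      (                         # Group 1: a valid token, i.e. a number ...
        (?<!\d)-?\d+(?:\.\d+)?
        | //|\*\*               # ... or a two-character operator
        | [-+*/^]               # ... or a single-character operator
      )
      | (\S)                    # Group 2: anything else is an illegal character
    )
    \s*
    """, re.X)

def tokenizer(expression):
    def _numberify(token):      # same helper as in the original tokenizer()
        try:
            return int(token)
        except ValueError:
            try:
                return float(token)
            except ValueError:
                return token

    tokens = []
    for token, invalid in TOKEN_SPLITTER.findall(expression):
        if invalid:
            raise ValueError("Illegal character in expression: {!r}".format(invalid))
        tokens.append(_numberify(token))
    return tokens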
- I did have some issues with the minus, which I thought I'd resolved all of. Seems like I need to implement my first thought of having operators at the same precedence level, and going through multiple operators in the same run. Good catch on the other items as well. – holroy, Dec 22, 2015 at 16:30
Your code is neat, easy to read and follow. I have just a few nitpicks:
- The indentation on the regex feels weird; especially the comments.
- You use OPERATORS[operator] inside a for operator in OPERATORS: loop; I would have used for operator, operation in OPERATORS.items() (or iteritems in Python 2) and used operation directly within the loop.
- I know that you value vertical spacing, but sometimes it feels weird, especially between the first if and the first else in calculate. You also write try: idx = token_list.index(operator) and except ValueError: break without a blank line, whereas every other try .. except has spacing between the two clauses.
- You can simplify your search of operands using tuple unpacking:

  operation_slice = slice(idx - 1, idx + 2)
  left_operand, _, right_operand = token_list[operation_slice]
  token_list[operation_slice] = [operation(left_operand, right_operand)]  # see second bullet

  This lets you get rid of the if idx == 0 or idx == len(token_list) check, since the unpacking will produce "ValueError: need more than X values to unpack" (X being 0, 1 or 2) if the operator is applied to the wrong number of arguments. You can either catch this one and produce your own error, or let it bubble up to the top. (A short sketch combining this with the items() suggestion above follows this list.)
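For concreteness, the operator loop in calculate() might then look roughly like this, combining the items() and tuple-unpacking suggestions (a sketch, not a tested drop-in replacement):

for operator, operation in OPERATORS.items():  # or .iteritems() in Python 2
    while True:
        try:
            idx = token_list.index(operator)
        except ValueError:
            break

        operation_slice = slice(idx - 1, idx + 2)
        # A malformed expression gives fewer than three values in the slice,
        # so the unpacking raises ValueError, to be caught or left to bubble up
        left_operand, _, right_operand = token_list[operation_slice]
        token_list[operation_slice] = [operation(left_operand, right_operand)]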
Python 3 compatibility
Note that on top of the raw_input/input (and possibly iteritems/items) difference, there is a major change in Python 3: map returns an iterator instead of a list. Thus I advise you to use an explicit list comprehension (return [_numberify(token) for token in TOKEN_SPLITTER.findall(expression)]) as the return value of tokenizer, so that the doctest will always be true, whatever the version is.
You could then easily use either:
try:
    OPERATORS_ITEMS = OPERATORS.iteritems
    input_method = raw_input
except AttributeError:
    OPERATORS_ITEMS = OPERATORS.items
    input_method = input

or

try:
    input_method = raw_input
    OPERATORS_ITEMS = OPERATORS.iteritems
except NameError:
    input_method = input
    OPERATORS_ITEMS = OPERATORS.items
to discriminate between Python 2 and 3. And use these new functions in your code.
- I left out the tuple unpacking just because of the Python 2 and 3 issues, but you are right that this can be circumvented using the alias function as you suggest. Good review! – holroy, Dec 22, 2015 at 16:28