Lately there have been some questions regarding calculators, where the responses typically suggest simplifying the logic and using either an abstract syntax tree or a token list (like I did myself in a Java calculator question).
To prove the concept, and as a training exercise for myself, I implemented this calculator in Python 2.x, to visualize one way of implementing a calculator which supports operator precedence, exponentiation and floor division.
Here is the code:
# -*- coding: utf-8 -*-

from __future__ import division, print_function

from textwrap import dedent
import math
import operator as op
from collections import OrderedDict
import re
import doctest


# Regex to split into numbers and operators
TOKEN_SPLITTER = re.compile("""
    \s*            # Ignore leading whitespace
    (              # Group the result, which is either ...
      (?<!\d)-?    # A possibly negative (but not the operator minus)
      \d+          # number
      (?:\.\d+)?   # with optional fraction part
    |              # ... or an alternate group of ...
      (?://|\*\*)  # Non-capturing two-character operator
    | (?:[+-/*^])  # or single-character operator
    )              # ... end of group
    \s*            # Ignore trailing whitespace
    """, re.X)

# Define legal operators in precedence order
OPERATORS = OrderedDict([
    # '(' and ')' are not implemented yet!, and needs special care
    ('^', math.pow),
    ('**', math.pow),
    ('*', op.mul),
    ('/', op.div),
    ('//', op.floordiv),
    ('+', op.add),
    ('-', op.sub),
])


def tokenizer(expression):
    """Splits expression into numbers and operator tokens, and returns a list.

    The expression can contain numbers with or without a decimal part, and
    various operators. In general spaces are allowed, but to be able to
    correctly determine negative numbers, there can be no space in between
    the '-' and the number, i.e. "-5" is legal, but "- 5" is tokenized as
    the minus operator and the number 5.

    Here are some various expressions with resulting token lists:

    >>> tokenizer("10-2^3*5/2--5")
    [10, '-', 2, '^', 3, '*', 5, '/', 2, '-', -5]
    >>> tokenizer("1*2.0/3**4+5-6")
    [1, '*', 2.0, '/', 3, '**', 4, '+', 5, '-', 6]
    >>> tokenizer(" 2 * -3 + 6 /3--5 ")
    [2, '*', -3, '+', 6, '/', 3, '-', -5]
    """
    def _numberify(token):
        """Turn token into int or float, if possible."""
        try:
            # Check if it is an int and return it
            return int(token)

        except ValueError:
            # Not an int, is it possibly a float?
            try:
                return float(token)

            except ValueError:
                # Not a number, return operator as is
                return token

    ## Split into numbers and operators, and make number strings into numbers
    return map(_numberify, TOKEN_SPLITTER.findall(expression))


def calculate(expression):
    """Calculate the value of the list or expression.

    The function can be called with a string expression, or a prebuilt
    list of tokens from tokenizer(). It will apply the calculations
    according to precedence of operators, and return the correct answer.

    Note that it will perform integer calculations, unless you specify
    (one or more) floats in the numbers. And it will raise a ValueError
    if the numbers and operators don't match a legal calculation.

    To be able to correctly determine negative numbers, there can be no
    space in between the '-' and the number. For powers you can use either
    '**' or '^', and if you use '//' you get a floored division.

    Here are some examples:

    >>> calculate("23//2-2^3*5/2--5")   # Combining most operators
    -4.0
    >>> calculate("1 + 2*3 - 5 / 6")    # The last term becomes 0...
    7
    >>> calculate("2**2** 2*3 - 2*2^4") # Strange spacing here...
    16.0
    """
    # Check whether our list is an expression, or a prebuilt token list
    if isinstance(expression, str):
        token_list = tokenizer(expression)

    else:
        token_list = expression

    # Loop through all operators in precedence order, applying each of them
    for operator in OPERATORS:
        # Apply the operator and reduce the two surrounding elements
        # using this operation
        while True:
            # Find operator in list, and break out of loop if not found
            try:
                idx = token_list.index(operator)
            except ValueError:
                break

            # Operator found, calculate and reduce
            if idx == 0 or idx == len(token_list):
                raise ValueError("No numbers to perform operation on")
            operation_slice = slice(idx - 1, idx + 2)
            operands = token_list[operation_slice]

            # Here is the magic: It replaces a slice of the list, with the
            # calculated result of the operation applied to the operands
            token_list[operation_slice] = [OPERATORS[operator](operands[0], operands[2])]

    # If the token list only has one element left, we're done calculating
    # and can return the result
    if len(token_list) == 1:
        return token_list[0]

    # If however we've exhausted all operators, and the list has multiple
    # elements, there is an error in the original expression.
    raise ValueError("Couldn't calculate all of it! Remaining tokens: {}".format(token_list))


def main():
    print(dedent("""
        Welcome to an interactive calculator!
        Enter your expression and hit enter to calculate it. The calculator
        supports the normal operations, powers and floor division. Currently
        parentheses are not allowed. Enter 'quit' as the expression when you
        want to end the calculator."""))

    while True:
        expression = raw_input("Type in expression (or 'quit'): ").lower()
        if expression == 'quit':
            break
        print(" {} = {}".format(expression, calculate(expression)))


if __name__ == '__main__':
    doctest.testmod()
    main()
Some extra comments on the code:
- Doctest is used to verify correct behavior of tokenizer() and calculate(). This is triggered through the doctest.testmod() in the main code.
- An OrderedDict is used to preserve the order (and thus the precedence) of the operators.
- Code is written in Python 2.7, but the only change needed should be to use input() instead of raw_input() in the main code.
- I've used if __name__ == '__main__': to separate the code and allow for it to be called as a module function, or to be used as an interactive calculator if called as a script (see the short usage example after this list).
- The heavy lifting of the tokenizer is done using a regex, which is kind of a beast, but it does do the work.
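To illustrate the module-function use, here is a minimal usage sketch under Python 2, assuming the script is saved as calculator.py (that file name is my own choice, not part of the code above):

from calculator import tokenizer, calculate  # assumed module name

print(tokenizer("10-2^3*5/2--5"))    # [10, '-', 2, '^', 3, '*', 5, '/', 2, '-', -5]
print(calculate("1 + 2*3 - 5 / 6"))  # 7, since op.div performs integer division on ints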
Some possible extensions
Some extensions I've allowed for, but not yet implemented, are:
- Parentheses – Since calculate() also accepts a token list, I could(/will) implement parentheses through recursion on the sub-list between matching parentheses (see the sketch after this list).
- Square root – One could implement a dedicated operator for square roots, which I haven't done yet, but this can be calculated using **0.5
- Constants – In addition to _numberify() one could implement similar code to add constants to the calculator, or if so inclined extend the syntax into something like a = 5, b=10: 5 * 39 / a + b. This should only require a higher-level operator, and a little more regex magic. (Possibly a pre-parser to split/parse constants from the expression.)
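To make the parentheses idea a bit more concrete, here is a rough sketch of how it could work. It is not part of the code above; it assumes the tokenizer regex is extended to also emit '(' and ')' tokens, and that calculate() calls this helper on the token list before looping over OPERATORS. The helper name reduce_parentheses is just my own:

def reduce_parentheses(token_list):
    """Replace each parenthesised group with its calculated value."""
    while '(' in token_list:
        open_idx = token_list.index('(')
        depth = 0
        # Find the ')' that matches the first '('
        for close_idx in range(open_idx, len(token_list)):
            if token_list[close_idx] == '(':
                depth += 1
            elif token_list[close_idx] == ')':
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("Unbalanced parentheses")
        # Recurse: calculate() (assumed to call this helper first) evaluates the
        # sub-list, and the single result replaces the whole '(...)' group
        inner_value = calculate(token_list[open_idx + 1:close_idx])
        token_list[open_idx:close_idx + 1] = [inner_value]
    return token_list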
So there you have it, a calculator utilizing a token list, which can be used both as a module function and as an interactive calculator. Can you review the chosen solution and suggest possible improvements?
3 Answers
Not exactly an "improvement" to your code, but this comment:
# '(' and ')' are not implemented yet!, and needs special care
I would change this to:
# TODO: '(' and ')' are not implemented yet!, and needs special care
It is an extremely small detail, but there are good reasons for it. My editor is set up to highlight the word TODO, and it also makes it easy to search through your code and determine where work needs to be done. It doesn't matter too much for this small project, but in larger projects you'll frequently find TODOs throughout the code.
- I've used that a lot in Visual Studio, but in my current Python editor I don't think it supports the ToDo comments. But it is still good advice! – holroy, Dec 21, 2015 at 22:56
Coding style looks very good to me, but I found some issues in functionality:
- A simple bug: this line is trying to check if idx points to the first or last element, so it should use len(token_list)-1:

  if idx == 0 or idx == len(token_list):

- Because re.findall skips over any non-matching parts of the string, the program is silent about incorrect input. A simple way to address this would be to add an alternative capturing group that matches any character in the end, and raise an error if that group matches something (see the sketch after this list).
- Some operators should have the same precedence. For example 1-2+3 should evaluate to 2, not -4.
- Dealing with the minus sign could be improved; 1 -2 trips the program up. Right now the regex is trying to distinguish between the two uses of -. A better place for that is in the parser function calculate.
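As an illustration of the second point, one possible way to do this is to capture anything the token alternatives do not recognise in a second group, and raise when it matches. The exact pattern layout, the extra group and the error message below are only a suggestion, not the original code:

import re

TOKEN_SPLITTER = re.compile(r"""
    \s*
    (?:
      (                         # Group 1: a valid token, i.e. a number ...
        (?<!\d)-?\d+(?:\.\d+)?
        | //|\*\*               # ... or a two-character operator
        | [-+*/^]               # ... or a single-character operator
      )
      | (\S)                    # Group 2: anything else is an illegal character
    )
    \s*
    """, re.X)

def tokenizer(expression):
    def _numberify(token):      # same helper as in the original tokenizer()
        try:
            return int(token)
        except ValueError:
            try:
                return float(token)
            except ValueError:
                return token

    tokens = []
    for token, invalid in TOKEN_SPLITTER.findall(expression):
        if invalid:
            raise ValueError("Illegal character in expression: {!r}".format(invalid))
        tokens.append(_numberify(token))
    return tokens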
- I did have some issues with the minus, which I thought I'd resolved all of. Seems like I need to implement my first thought of having operators at the same precedence level, and going through multiple operators in the same run. Good catch on the other items as well. – holroy, Dec 22, 2015 at 16:30
Your code is neat, easy to read and follow. I have just a few nitpicks:
- The indentation on the regex feels weird; especially the comments.
- You use OPERATORS[operator] inside a for operator in OPERATORS: loop; I would have used for operator, operation in OPERATORS.items() (or iteritems in Python 2) and used operation directly within the loop.
- I know that you value vertical spacing, but sometimes it feels weird, especially between the first if and the first else in calculate. You also write try: idx = token_list.index(operator) and except ValueError: break without a blank line, whereas every other try .. except has spacing between the two clauses.
- You can simplify your search of operands using tuple unpacking:

  operation_slice = slice(idx - 1, idx + 2)
  left_operand, _, right_operand = token_list[operation_slice]
  token_list[operation_slice] = [operation(left_operand, right_operand)]  # see second bullet

  This lets you get rid of the if idx == 0 or idx == len(token_list) check, since the unpacking will produce "ValueError: need more than X values to unpack" (X being 0, 1 or 2) if the operator is applied to the wrong number of arguments. You can either catch this one and produce your own error, or let it bubble up to the top. (A short sketch combining this with the items() suggestion above follows this list.)
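For concreteness, the operator loop in calculate() might then look roughly like this, combining the items() and tuple-unpacking suggestions (a sketch, not a tested drop-in replacement):

for operator, operation in OPERATORS.items():  # or .iteritems() in Python 2
    while True:
        try:
            idx = token_list.index(operator)
        except ValueError:
            break

        operation_slice = slice(idx - 1, idx + 2)
        # A malformed expression gives fewer than three values in the slice,
        # so the unpacking raises ValueError, to be caught or left to bubble up
        left_operand, _, right_operand = token_list[operation_slice]
        token_list[operation_slice] = [operation(left_operand, right_operand)]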
Python 3 compatibility
Note that on top of the raw_input/input (and possibly iteritems/items) difference, there is a major change in Python 3: map returns an iterator instead of a list. Thus I advise you to use an explicit list comprehension (return [_numberify(token) for token in TOKEN_SPLITTER.findall(expression)]) as the return value of tokenizer, so that the doctest will always be true, whatever the version is.
You could then easily use either:
try:
    OPERATORS_ITEMS = OPERATORS.iteritems
    input_method = raw_input
except AttributeError:
    OPERATORS_ITEMS = OPERATORS.items
    input_method = input

or

try:
    input_method = raw_input
    OPERATORS_ITEMS = OPERATORS.iteritems
except NameError:
    input_method = input
    OPERATORS_ITEMS = OPERATORS.items
to discriminate between Python 2 and 3. And use these new functions in your code.
- I left out the tuple unpacking just because of the Python 2 and 3 issues, but you are right that this can be circumvented using the alias function as you suggest. Good review! – holroy, Dec 22, 2015 at 16:28