I implemented the Shunting-yard algorithm in Python. I have experience in C#, but I'm looking for comments on making this more Pythonic. I'd also be interested to know whether these methods would be better as classmethods instead of staticmethods.
from collections import deque
import re


class ShuntingYardParser:
    """Converts an infix expression to a postfix expression
    using the Shunting-yard algorithm"""

    _operatorPrecedence = {"*": 3,
                           "/": 3,
                           "+": 2,
                           "-": 2}

    @staticmethod
    def _tokenize(expression):
        tokenizer = re.compile(r'\s*([()+*/-]|\d+)')
        index = 0
        tokens = []
        while index < len(expression):
            match = tokenizer.match(expression, index)
            if match is None:
                raise ValueError("Syntax error.")
            tokens.append(match.group(1))
            index = match.end()
        return tokens

    @staticmethod
    def _is_operand(token):
        try:
            int(token)
            return True
        except ValueError:
            return False

    @staticmethod
    def _is_operator(token):
        return token in ShuntingYardParser._operatorPrecedence

    @staticmethod
    def parse(infix_expression):
        """Parses the infix expression and returns the postfix expression"""
        if not isinstance(infix_expression, str):
            raise TypeError("Input is not of type string")
        output = list()
        operators = deque()
        tokens = ShuntingYardParser._tokenize(infix_expression)
        for token in tokens:
            if ShuntingYardParser._is_operand(token):
                output.append(token)
            elif ShuntingYardParser._is_operator(token):
                current_operator_precedence = ShuntingYardParser._operatorPrecedence[token]
                # Pop while the operator on top of the stack has higher or
                # equal precedence (the operators are all left-associative).
                while (operators
                       and operators[-1] != "("
                       and ShuntingYardParser._operatorPrecedence[operators[-1]] >= current_operator_precedence):
                    output.append(operators.pop())
                operators.append(token)
            elif token == "(":
                operators.append(token)
            elif token == ")":
                # Pop all the operators in front of the "(".
                while operators and operators[-1] != "(":
                    output.append(operators.pop())
                # The previous operation would have removed all the operators
                # because there is no matching opening parenthesises.
                if not operators:
                    raise ValueError("Non matching parenthesises in the input.")
                # Remove matching parenthesis.
                operators.pop()
        # Flush the remaining operators from the top of the stack down.
        while operators:
            operator = operators.pop()
            # If there are still opening parenthesises in the stack this means
            # we haven't found a matching closing one.
            if operator == "(":
                raise ValueError("Non matching parenthesises in the input.")
            output.append(operator)
        return output


if __name__ == "__main__":
    while True:
        action = input("Enter infix expression:")
        try:
            postfix_notation = ShuntingYardParser.parse(action)
            print(postfix_notation)
        except ValueError as e:
            print(e.args)
Unit tests:
from unittest import TestCase
from ShuntingYardParser import ShuntingYardParser


class TestShuntingYardParser(TestCase):
    def test_single_number(self):
        # arrange & act
        output = ShuntingYardParser.parse("5")
        # assert
        self.assertEqual(output, list("5"))

    def test_non_string_input(self):
        # arrange & act & assert
        self.assertRaises(TypeError, ShuntingYardParser.parse, 5)

    def test_matching_parenthesises(self):
        # arrange & act
        output = ShuntingYardParser.parse("(5+2)")
        # assert
        self.assertEqual(output, ["5", "2", "+"])

    def test_non_matching_parenthesises_start(self):
        # arrange & act & assert
        self.assertRaises(ValueError, ShuntingYardParser.parse, "((5+2)")

    def test_non_matching_parenthesises_end(self):
        # arrange & act & assert
        self.assertRaises(ValueError, ShuntingYardParser.parse, "(5+2))")
1 Answer
Interface design
Note that you only support nonnegative integers as operands.
Your algorithm correctly rearranges the input tokens in postfix order, but it would be more useful if it preserved the operator/operand categorization in its output. It's likely that the results will be fed to an RPN evaluator, and it would be nice if the RPN evaluator didn't have to redo that work. (Your ._is_operator() and ._is_operand() methods are "private", and therefore shouldn't be reused.)
Consider turning .parse() and .tokenize() into generators that yield their results as they go, instead of building a list to be returned.
I don't see any reason to define ShuntingYardParser as a class. The class doesn't keep any instance state, so it's just a glorified namespace. It would be less cumbersome to write these "methods" as functions in the ShuntingYardParser.py module.
Implementation
The tokenizer could take advantage of regular expressions to classify the token types, so that you wouldn't need ._is_operand() and ._is_operator(). I suggest using the tokenizer recipe in Python's re documentation.
In .parse(), operators doesn't need to be a collections.deque; a list would work just fine.
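For example, a plain list already provides everything the operator stack needs:

```python
# A list gives O(1) amortized push/pop at its right end, which is all
# the shunting-yard operator stack requires.
operators = []
operators.append('+')       # push
operators.append('*')       # push
top = operators[-1]         # peek at the top of the stack
popped = operators.pop()    # pop from the top
print(top, popped)          # * *
```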
You don't really need the if not isinstance(infix_expression, str) test, since regular expression searches on anything other than a string (or bytestring) will raise a TypeError anyway.
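A quick demonstration that re itself rejects non-string input:

```python
import re

# Matching against a non-string raises TypeError without any explicit
# isinstance check on our part.
try:
    re.compile(r'\d+').match(5)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```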
"parenthesises" is not a word. The exception message could be more informative, specifying whether an opening or closing parenthesis is unmatched.
Similarly, the "Syntax error" could be more specific about the offending token.
To print the message associated with a caught exception (rather than some weird args tuple), do:

except ValueError as e:
    print(e)
Suggested solution
ShuntingYardParser.py
from collections import namedtuple
import re


class Token(namedtuple('Token', 'kind value')):
    @property
    def precedence(self):
        if self.kind == 'OPERATOR':
            return {
                '*': 3, '/': 3,
                '+': 2, '-': 2,
            }[self.value]


def tokenize(expression):
    """
    Yield tokens in the order that they are encountered in the expression.
    """
    # https://docs.python.org/3/library/re.html#writing-a-tokenizer
    token_spec = [
        ('PAREN', r'[()]'),
        ('OPERATOR', r'[+*/-]'),
        ('OPERAND', r'\d+'),
        ('IGNORE', r'\s+'),
        ('JUNK', r'\S+?\b'),
    ]
    tokenizer = re.compile('|'.join(
        '(?P<{kind}>{pattern})'.format(kind=kind, pattern=pattern)
        for kind, pattern in token_spec
    ))
    for match in tokenizer.finditer(expression):
        kind, value = match.lastgroup, match.group(match.lastgroup)
        if kind == 'JUNK':
            raise ValueError('Unrecognized token: {0}'.format(value))
        elif kind != 'IGNORE':
            yield Token(kind, value)


def parse(infix_expression):
    """
    Yield tokens in the infix expression, reordered as a postfix expression.
    """
    operators = []
    for token in tokenize(infix_expression):
        if token.kind == 'OPERAND':
            yield token
        elif token.kind == 'OPERATOR':
            # Pop while the top of the stack has higher or equal precedence
            # (all of these operators are left-associative).
            while (operators and
                   operators[-1].value != '(' and
                   operators[-1].precedence >= token.precedence):
                yield operators.pop()
            operators.append(token)
        elif token.value == '(':
            operators.append(token)
        elif token.value == ')':
            # Pop all the operators in front of the "(".
            while operators and operators[-1].value != '(':
                yield operators.pop()
            if not operators:
                raise ValueError('Unmatched )')
            # Remove matching parenthesis.
            operators.pop()
    # Flush the remaining operators from the top of the stack down.
    while operators:
        operator = operators.pop()
        if operator.value == '(':
            raise ValueError('Unmatched (')
        yield operator


if __name__ == '__main__':
    while True:
        try:
            postfix_expr = parse(input("Enter infix expression: "))
            print([op for op in postfix_expr])
        except ValueError as e:
            print(e)
Unit tests:
import unittest
import ShuntingYardParser


class TestShuntingYardParser(unittest.TestCase):
    def test_single_number(self):
        output = list(ShuntingYardParser.parse("5"))
        self.assertEqual(output, [
            ShuntingYardParser.Token('OPERAND', '5'),
        ])

    def test_non_string_input(self):
        # arrange & act & assert
        self.assertRaises(TypeError, lambda: list(ShuntingYardParser.parse(5)))

    ...
Comments:

- Thanks! I have a few questions: how come the join iterable parameter does not require [] around it? Should I raise an exception if I call .precedence on a non-operator Token? Can I replace the 'magic strings' (e.g. 'OPERAND', 'OPERATOR', ...) by global variables in the module scope? Or would that be bad practice? – VincentC, Aug 9, 2018

- '|'.join(...) is using a generator expression, which is lazier and therefore better than a list comprehension. I've chosen to make the .precedence of a non-operator None; I consider that to be the simplest reasonable design, though you could choose to raise an exception instead. As for the various kinds of operators, you could use an enum, and arguably it would be better than my magic strings. – 200_success, Aug 10, 2018