I implemented the Shunting-yard algorithm in Python. I have experience in C#, but I'm looking for comments on making this more Pythonic. I'd also be interested to know whether these methods would be better as classmethods instead of staticmethods.
from collections import deque
import re


class ShuntingYardParser:
    """Converts an infix expression to a postfix expression
    using the Shunting-yard algorithm"""

    _operatorPrecedence = {"*": 3,
                           "/": 3,
                           "+": 2,
                           "-": 2}

    @staticmethod
    def _tokenize(expression):
        tokenizer = re.compile(r'\s*([()+*/-]|\d+)')
        index = 0
        tokens = []
        while index < len(expression):
            match = tokenizer.match(expression, index)
            if match is None:
                raise ValueError("Syntax error.")
            tokens.append(match.group(1))
            index = match.end()
        return tokens

    @staticmethod
    def _is_operand(token):
        try:
            int(token)
            return True
        except ValueError:
            return False

    @staticmethod
    def _is_operator(token):
        return token in ShuntingYardParser._operatorPrecedence

    @staticmethod
    def parse(infix_expression):
        """Parses the infix expression and returns the postfix expression"""
        if not isinstance(infix_expression, str):
            raise TypeError("Input is not of type string")
        output = list()
        operators = deque()
        tokens = ShuntingYardParser._tokenize(infix_expression)
        for token in tokens:
            if ShuntingYardParser._is_operand(token):
                output.append(token)
            elif ShuntingYardParser._is_operator(token):
                current_operator_precedence = ShuntingYardParser._operatorPrecedence[token]
                # Pop while the operator on top of the stack has higher or
                # equal precedence (the operators are all left-associative).
                while (operators
                       and operators[-1] != "("
                       and ShuntingYardParser._operatorPrecedence[operators[-1]] >= current_operator_precedence):
                    output.append(operators.pop())
                operators.append(token)
            elif token == "(":
                operators.append(token)
            elif token == ")":
                # Pop all the operators in front of the "(".
                while operators and operators[-1] != "(":
                    output.append(operators.pop())
                # The previous operation would have removed all the operators
                # because there is no matching opening parenthesises.
                if not operators:
                    raise ValueError("Non matching parenthesises in the input.")
                # Remove matching parenthesis.
                operators.pop()
        # Flush the remaining operators from the top of the stack down.
        while operators:
            operator = operators.pop()
            # If there are still opening parenthesises in the stack this means
            # we haven't found a matching closing one.
            if operator == "(":
                raise ValueError("Non matching parenthesises in the input.")
            output.append(operator)
        return output


if __name__ == "__main__":
    while True:
        action = input("Enter infix expression:")
        try:
            postfix_notation = ShuntingYardParser.parse(action)
            print(postfix_notation)
        except ValueError as e:
            print(e.args)
Unit tests:
from unittest import TestCase
from ShuntingYardParser import ShuntingYardParser


class TestShuntingYardParser(TestCase):
    def test_single_number(self):
        # arrange & act
        output = ShuntingYardParser.parse("5")
        # assert
        self.assertEqual(output, list("5"))

    def test_non_string_input(self):
        # arrange & act & assert
        self.assertRaises(TypeError, ShuntingYardParser.parse, 5)

    def test_matching_parenthesises(self):
        # arrange & act
        output = ShuntingYardParser.parse("(5+2)")
        # assert
        self.assertEqual(output, ["5", "2", "+"])

    def test_non_matching_parenthesises_start(self):
        # arrange & act & assert
        self.assertRaises(ValueError, ShuntingYardParser.parse, "((5+2)")

    def test_non_matching_parenthesises_end(self):
        # arrange & act & assert
        self.assertRaises(ValueError, ShuntingYardParser.parse, "(5+2))")
1 Answer
Interface design
Note that you only support nonnegative integers as operands.
Your algorithm correctly rearranges the input tokens in postfix order, but it would be more useful if it preserved the operator/operand categorization in its output. It's likely that the results will be fed to an RPN evaluator, and it would be nice if the RPN evaluator didn't have to redo that work. (Your ._is_operator() and ._is_operand() methods are "private", and therefore shouldn't be reused.)
Consider turning .parse() and .tokenize() into generators that yield their results as they go, instead of building a list to be returned.
I don't see any reason to define ShuntingYardParser as a class. The class doesn't keep any instance state, so it's just a glorified namespace. It would be less cumbersome to write these "methods" as functions in the ShuntingYardParser.py module.
Implementation
The tokenizer could take advantage of regular expressions to classify the token types, so that you wouldn't need ._is_operand() and ._is_operator(). I suggest using the tokenizer recipe in Python's re documentation.
In .parse(), operators doesn't need to be a collections.deque; a list would work just fine.
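For example, a plain list already provides everything the operator stack needs:

```python
# A list gives O(1) amortized push/pop at its right end, which is all
# the shunting-yard operator stack requires.
operators = []
operators.append('+')       # push
operators.append('*')       # push
top = operators[-1]         # peek at the top of the stack
popped = operators.pop()    # pop from the top
print(top, popped)          # * *
```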
You don't really need the if not isinstance(infix_expression, str) test, since regular expression searches on anything other than a string (or bytestring) will raise a TypeError anyway.
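A quick demonstration that re itself rejects non-string input:

```python
import re

# Matching against a non-string raises TypeError without any explicit
# isinstance check on our part.
try:
    re.compile(r'\d+').match(5)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```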
"parenthesises" is not a word. The exception message could be more informative, specifying whether an opening or closing parenthesis is unmatched.
Similarly, the "Syntax error" could be more specific about the offending token.
To print the message associated with a caught exception (rather than some weird args tuple), do:

except ValueError as e:
    print(e)
Suggested solution
ShuntingYardParser.py
from collections import namedtuple
import re


class Token(namedtuple('Token', 'kind value')):
    @property
    def precedence(self):
        if self.kind == 'OPERATOR':
            return {
                '*': 3, '/': 3,
                '+': 2, '-': 2,
            }[self.value]


def tokenize(expression):
    """
    Yield tokens in the order that they are encountered in the expression.
    """
    # https://docs.python.org/3/library/re.html#writing-a-tokenizer
    token_spec = [
        ('PAREN', r'[()]'),
        ('OPERATOR', r'[+*/-]'),
        ('OPERAND', r'\d+'),
        ('IGNORE', r'\s+'),
        ('JUNK', r'\S+?\b'),
    ]
    tokenizer = re.compile('|'.join(
        '(?P<{kind}>{pattern})'.format(kind=kind, pattern=pattern)
        for kind, pattern in token_spec
    ))
    for match in tokenizer.finditer(expression):
        kind, value = match.lastgroup, match.group(match.lastgroup)
        if kind == 'JUNK':
            raise ValueError('Unrecognized token: {0}'.format(value))
        elif kind != 'IGNORE':
            yield Token(kind, value)


def parse(infix_expression):
    """
    Yield tokens in the infix expression, reordered as a postfix expression.
    """
    operators = []
    for token in tokenize(infix_expression):
        if token.kind == 'OPERAND':
            yield token
        elif token.kind == 'OPERATOR':
            # Pop while the top of the stack has higher or equal precedence
            # (all of these operators are left-associative).
            while (operators and
                   operators[-1].value != '(' and
                   operators[-1].precedence >= token.precedence):
                yield operators.pop()
            operators.append(token)
        elif token.value == '(':
            operators.append(token)
        elif token.value == ')':
            # Pop all the operators in front of the "(".
            while operators and operators[-1].value != '(':
                yield operators.pop()
            if not operators:
                raise ValueError('Unmatched )')
            # Remove matching parenthesis.
            operators.pop()
    # Flush the remaining operators from the top of the stack down.
    while operators:
        operator = operators.pop()
        if operator.value == '(':
            raise ValueError('Unmatched (')
        yield operator


if __name__ == '__main__':
    while True:
        try:
            postfix_expr = parse(input("Enter infix expression: "))
            print([op for op in postfix_expr])
        except ValueError as e:
            print(e)
Unit tests:
import unittest
import ShuntingYardParser


class TestShuntingYardParser(unittest.TestCase):
    def test_single_number(self):
        output = list(ShuntingYardParser.parse("5"))
        self.assertEqual(output, [
            ShuntingYardParser.Token('OPERAND', '5'),
        ])

    def test_non_string_input(self):
        # arrange & act & assert
        self.assertRaises(TypeError, lambda: list(ShuntingYardParser.parse(5)))

    ...
Comments:

- Thanks! I have a few questions: how come the join iterable parameter does not require [] around it? Should I raise an exception if I call .precedence on a non-operator Token? Can I replace the 'magic strings' (e.g. 'OPERAND', 'OPERATOR', ...) by global variables in the module scope? Or would that be bad practice? – VincentC, Aug 9, 2018

- '|'.join(...) is using a generator expression, which is lazier and therefore better than a list comprehension. I've chosen to make the .precedence of a non-operator None; I consider that to be the simplest reasonable design, though you could choose to raise an exception instead. As for the various kinds of operators, you could use an enum, and arguably it would be better than my magic strings. – 200_success, Aug 10, 2018