Return to Answer

added 141 characters in body

edited May 15, 2018 at 3:55

Don't be disheartened, though, parsing arbitrarily complex expressions is not trivial. Even the famous Shunting Yard algorithm is incomplete and fails on various inputs &8212; fails to detect bad input and/or doesn't properly parse (and it gets worse ifsuch as we try to enhance the grammar, such as having unary negation and subtraction together in the same grammar).

I prefer to identifyInfix expressions naturally have two states at the start of any given token: unary and binary — hence, andI prefer to model the two states explicitly.&nbsp To track any prior state I prefer to use two stacks, one for operands, and one for operators. The states are implemented by two different sections of code (i.e. there's no need for a state variable as we know we're in unary state if we're in the unary section of code, and vice versa).

Don't be disheartened, though, parsing arbitrarily complex expressions is not trivial. Even the Shunting Yard algorithm is incomplete and fails on various inputs (and it gets worse if we try to enhance the grammar, such as having unary negation and subtraction together in the same grammar).

I prefer to identify two states: unary and binary, and use two stacks, one for operands, and one for operators. The states are implemented by two different sections of code (i.e. there's no need for a state variable as we know we're in unary state if we're in the unary section of code, and vice versa).

Don't be disheartened, though, parsing arbitrarily complex expressions is not trivial. Even the famous Shunting Yard algorithm is incomplete and fails on various inputs &8212; fails to detect bad input and/or doesn't properly parse (such as we try to enhance the grammar, such as having unary negation and subtraction together in the same grammar).

Infix expressions naturally have two states at the start of any given token: unary and binary — hence, I prefer to model the two states explicitly.&nbsp To track any prior state I prefer to use two stacks, one for operands, and one for operators. The states are implemented by two different sections of code (i.e. there's no need for a state variable as we know we're in unary state if we're in the unary section of code, and vice versa).

added 224 characters in body

Source Link

edited May 15, 2018 at 1:28

Erik Eidt

edited May 15, 2018 at 1:28

Erik Eidt

"End of input" is allowed in the binary state (we could allow it in unary state also if you wanted to allow the empty expression). On EOInput we have to reconcile the operator stack against, well the lowest precedence operator "nothing". if we encounter an ( grouping paren on the stack, then that is an input error of unbalanced ( paren. When all is said and done, if the parse succeeds, the operator stack is empty and the result is the top and only item on the operand stack.

added 224 characters in body

Source Link

edited May 15, 2018 at 1:21

Erik Eidt

edited May 15, 2018 at 1:21

Erik Eidt

Don't be disheartened, though, parsing arbitrarily complex expressions is not trivial. Even the Shunting Yard algorithm is incomplete and fails on various inputs (and it gets worse if youwe try to enhance the grammar, such as having unary negation and subtraction together in the same grammar).

I prefer to identify two states: unary and binary, and use two stacks, one for operands, and one for operators. The states are implemented by two different sections of code (i.e. there's no need for a state variable as youwe know you'rewe're in unary state if you'rewe're in the unary section of code, and vice versa).

The parse starts in the unary state where (, -, +, numbers (and variables) are accepted (things we don't recognize should error). These are grouping paren, unary negation, unary plus, etc... After a number or variable push it to operand stack and switch to binary state; otherwise, push the operator and stay in unary state.

In binary state, youwe can accept ), +, -, *, / (things we don't recognize should error). After ) stay in binary state and reduce operators on on the operator stack until reaching a matching ( (if it's not there there's an unbalanced ).; for the others: reconcile the newest operator with the top operator on the operator stack (here operator precedence is handled, like * higher than +), then push that newest operator, and switch back to unary state.

Reconciling a higher precedence operator already on the stack means consuming it (from the operator stack) and consuming its operands from the operand stack, and then, returning to the operand stack a representation of the operationsintermediate computation.

For example, in 4*5+6, when we get to the + operator, we have 4 and 5 on the operand stack, and * on the operator stack. Since * is higher precedence than +, the * is "executed", by taking4 and 5* off the operandoperator stack, and taking *4 and 5 off the operatoroperand stack, and multiplying the two popped(popped) operands, and then pushing the result (either 20, or if you're doing a compiler, then a tree/AST node that representing the multiplication of 4*5) back onto the operand stack.

By treating the unary and binary states separately, not only can we distinguish between unary - and subtraction -, if we see, for example, ( in the binary state, then that means we have a function call parenthesis (e.g. a(b)), instead of a grouping paren as when same seen in the unary state(a+b). It is pretty hard to parse infix expressions well without separating these states.

Don't be disheartened, though, parsing arbitrarily complex expressions is not trivial. Even the Shunting Yard algorithm is incomplete and fails on various inputs (and it gets worse if you try to enhance the grammar, such as having unary negation and subtraction together in the same grammar).

I prefer to identify two states: unary and binary, and use two stacks, one for operands, and one for operators. The states are implemented by two different sections of code (i.e. there's no need for a state variable as you know you're in unary state if you're in the unary section of code, and vice versa).

The parse starts in the unary state where (, -, +, numbers (and variables) are accepted. These are grouping paren, unary negation, unary plus, etc... After a number or variable push it to operand stack and switch to binary state; otherwise, push the operator and stay in unary state.

In binary state, you can accept ), +, -, *, /. After ) stay in binary state and reduce operators on on the operator stack until reaching a matching ( (if it's not there there's an unbalanced ).; for the others: reconcile the newest operator with the top operator on the operator stack (here operator precedence is handled, like * higher than +), then push that newest operator, and switch back to unary state.

Reconciling a higher precedence operator already on the stack means consuming it and its operands from the operand stack, and returning to the operand stack a representation of the operations.

For example, in 4*5+6, when we get to the + operator, we have 4 and 5 on the operand stack, and * on the operator stack. Since * is higher precedence than +, the * is "executed", by taking4 and 5 off the operand stack, taking * off the operator stack, and multiplying the two popped operands, and then pushing the result (either 20, or if you're doing a compiler, then a tree/AST node that representing the multiplication of 4*5) back onto the operand stack.

In binary state, we can accept ), +, -, *, / (things we don't recognize should error). After ) stay in binary state and reduce operators on on the operator stack until reaching a matching ( (if it's not there there's an unbalanced ).; for the others: reconcile the newest operator with the top operator on the operator stack (here operator precedence is handled, like * higher than +), then push that newest operator, and switch back to unary state.

For example, in 4*5+6, when we get to the + operator, we have 4 and 5 on the operand stack, and * on the operator stack. Since * is higher precedence than +, the * is "executed", by taking * off the operator stack, and taking 4 and 5 off the operand stack, and multiplying the two (popped) operands, and then pushing the result (either 20, or if you're doing a compiler, then a tree/AST node that representing the multiplication of 4*5) back onto the operand stack.

Source Link

answered May 15, 2018 at 1:14

Erik Eidt

answered May 15, 2018 at 1:14

Erik Eidt

lang-cpp