I recently wrote the following code for a codewars.com kata for evaluating mathematical expressions. I've been writing ruby for years, but almost everything I do is for personal use and isn't shared, so I've gotten essentially zero feedback on my ruby style. The following block of code (with more comments than usual) is fairly typical of my style, but it never feels as ruby-ish as the code snippets I see online.
What can be changed to make this code closer to ideal idiomatic ruby?
#parse the initial expression, break it up into an array
# of tokens (numbers, parens, operations, etc)
# this also detects negation and cheges the symbol from '-' to 'n'
# and finally, this converts numbers to floats
def tokenize_expression(expression)
#remove spaces
s = expression.gsub(/\s/,'')
#convert negations to 'n' character
while md = s.match(%r{(?<![\d\)])-}) do
s[md.begin(0)] = 'n'
end
#iterate through string
#if number, get full number and add to array
#otherwise grab just the first character which will be an operation or parenthesis
tokens = []
while not s.empty?
if not s.match(%r{^\d}).nil? #first char is digit
md = s.match %r{[\d\.]+}
s = md.post_match
tokens << md[0].to_f
else
tokens << s[0] #first char is parenthesis or operation
s = s[1..-1] #everything but first char
end
end
tokens
end
#take the array and make sub arrays based on the parentheses
# e.g. ['(', '1', '+', '(', '2', '+', '3', ')', ')'] -> ['1', '+', ['2', '+', '3']]
def nest_parens(tokens)
result = []
stack = []
first, *rest = *tokens
while not first.nil? do
case first #look at first token
when '('
stack.push result #store current partial result on stack if open parens
result = [] #start new result
when ')'
child = result #store result in temp var
result = stack.pop #get previous partial result
result << child #add temp result to current result
else
result << first #add this token to the current result
end
first, *rest = *rest
end
throw "Unclosed parenthesis" if not stack.empty?
result
end
#find all the neagtions and convert them to nested postfix
# e.g. '5-n6' becomes '5-[n 6]'
def postfix_negation(tokens)
return tokens if not tokens.is_a? Array
tokens = tokens.map{ |t| postfix_negation(t) } #recursively process everything below the current level
result = []
first, *rest = *tokens
while not first.nil?
case first
when 'n'
second, *rest = *rest
result << [first, second] #e.g. [n 6]
else
result << first
end
first, *rest = *rest
end
result
end
#find all operations (mult/div or plus/minus) and convert to nested postfix
# e.g. '1+2*3' becomes '[+ 1 [* 2 3]]'
def postfix_ops(tokens, ops=['/','*'])
return tokens if not tokens.is_a? Array
tokens = tokens.map{ |t| postfix_ops(t, ops) } #recursively process everything below the current level
result = []
first, *rest = *tokens
while not first.nil?
second = rest.first #if there is an operator, second will contain it
if ops.include? second
second, third, *rest = *rest
first = [second, first, third] #[op, arg1, arg2]. This now becomes first and is compared again to the following tokens,
next #which will handle cases like 1+2+3 --> [+ [+ 1 2] 3]
else
result << first
end
first, *rest = *rest
end
result
end
#take a fully processed, postfix tree and recursively evaluate the expressions
def eval(tree)
return tree if not tree.is_a? Array #if this isn't an array, return it
tree = tree.map {|n| eval(n) } #recursively process everything below the current level
return tree.first if tree.length == 1 #sometimes we end up with a single value as an array, e.g. [5], so just return the inner value
first, second, third = tree #process arguments
case first
when 'n' then return -second
when '+' then return second + third
when '-' then return second - third
when '*' then return second * third
when '/' then return second / third
else raise "Unkown Op: #{first}"
end
end
#wrapper to call all the needed steps for processing
def calc(expression)
tokens = tokenize_expression(expression)
tokens = nest_parens(tokens)
tokens = postfix_negation(tokens)
tokens = postfix_ops(tokens, ['/','*'])
tokens = postfix_ops(tokens, ['+','-'])
eval(tokens)
end
#test input
[
"2/(2+3)*4",
"2-(-(3*-2))",
"3*((4*5)*6*(7*8))",
"-(55--(-(1+2))--12)"
].each do |s|
puts "\nString: #{s}"
puts "Eval: #{calc(s)}"
end
3 Answers 3
Single responsibility principle
#parse the initial expression, break it up into an array
# of tokens (numbers, parens, operations, etc)
# this also detects negation and cheges the symbol from '-' to 'n'
# and finally, this converts numbers to floats
def tokenize_expression(expression)
Your function tokenize_expression
does:
- break [the initial expression] up into an array
- detects negation and changes the symbol from '-' to 'n'
- converts numbers to floats
This should be three functions:
def negation_to_n(expression)
s = expression.gsub(/\s/,'')
while md = s.match(%r{(?<![\d\)])-}) do
s[md.begin(0)] = 'n'
end
s
end
def split_tokens(expression)
### Implement
end
def numbers_to_floats(tokens)
tokens.map{|t| t.match(%r{^\d}) ? t.to_f : t}
end
Use functional programming when practical
You should not feel forced to stick to imperative programming even when it becomes so hard to follow, for example negation_to_n
can be written just like (please note that the string should be passed in already without spaces):
def matching_indexes(s, regex)
s
.enum_for(:scan, regex)
.map { Regexp.last_match.begin(0) }
end
def negation_to_n(s)
(0...s.length)
.map{|i| matching_indexes(s, %r{(?<![\d\)])-}).include?(i) ? 'n' : s[i]}
end
Another example is split_tokens
:
def split_tokens(s)
s
.chars
.chunk {|d| is_digit?(d)}
.map{|_, xs| xs}
.map {|g| g.all? {|ch| is_digit?(ch)} ? g.join : g}
.flatten
end
Surely other parts can be similarly simplified.
-
\$\begingroup\$ I like what you have suggested with matching_indexes and negation_to_n. I'm not really familiar with ".enum_for", so I'll have to look into incorporating that in the future. I see what you are trying to suggest with break_into_array and numbers_to_floats but I question the usefulness. Detecting the full number and breaking it out is the hard part and requires the bulk of the work. Making a whole new function just to avoid appending ".to_f" on the end seems questionable. I suppose the "to_f" could be added to the operands in eval, but it would look messy \$\endgroup\$Zack– Zack2016年02月10日 22:19:29 +00:00Commented Feb 10, 2016 at 22:19
-
1\$\begingroup\$ I see where you are coming from: Making a whole new function you feel like writing a function more is a chore or a difficulty, but I think that more functions make it easier to write code because you can test them one by one. \$\endgroup\$Caridorc– Caridorc2016年02月10日 22:28:19 +00:00Commented Feb 10, 2016 at 22:28
-
\$\begingroup\$ For what it's worth, tokenize_expression was originally a full class with a next_token() and to_a() methods, but it seemed like overkill and needlessly complicated, especially for purposes of soliciting feedback, so I merged it into a single function for posting. \$\endgroup\$Zack– Zack2016年02月10日 22:36:50 +00:00Commented Feb 10, 2016 at 22:36
-
2\$\begingroup\$ Instead of everything in your
split_token
method, might I suggests.scan /\d+|./
? \$\endgroup\$Jordan– Jordan2016年02月11日 01:18:08 +00:00Commented Feb 11, 2016 at 1:18 -
2\$\begingroup\$ @Zack please don't simplify your code when posting here. We want your real code, precisely to avoid the "but my real code looks like this" comments. Imagine how you'd feel if you took the time to offer assistance to someone and they replied with that. \$\endgroup\$RubberDuck– RubberDuck2016年02月11日 03:38:33 +00:00Commented Feb 11, 2016 at 3:38
Updated Code After Feedback
I've remodeled my code after Caridorc's example. It took a lot of time to refactor the code, possibly longer than I spent writing it in the first place. I'd guess it is a little more readable and idiomatic than it was before, although I'm not sure it was worth the time investment. Regardless, I've at least learned several new ruby tricks out of all of this. Any thought on new code?
def apply_negation_symbol(s)
copy = s.clone
s
.enum_for(:scan, %r{(?<![\d\)])-})
.map { Regexp.last_match.begin(0) }
.each { |ix| copy[ix] = 'n' }
copy
end
def nest_parenthesis(s)
stack = []
s
.chars
.inject Array.new do |arr, char|
case(char)
when '(' then stack.push arr; Array.new
when ')' then stack.pop << arr
else arr << char
end
end
end
def is_numerical?(c)
not (c =~ %r{[\d\.]}).nil?
end
def digits_to_floats(tree)
return tree if not tree.is_a? Array
tree = tree.map {|t| digits_to_floats(t) }
tree
.chunk {|d| is_numerical?(d)}
.map{|_, xs| xs}
.flat_map {|g| g.all? {|ch| is_numerical?(ch)} ? g.join.to_f : g}
end
def negation_to_postfix(tree)
return tree if not tree.is_a? Array
tree = tree.map{ |t| negation_to_postfix(t) }
tree
.slice_before{|item| item == 'n'}
.flat_map { |arr| arr.first == 'n' ? [arr.first(2), arr[2..-1]] : arr}
.delete_if { |item| item.is_a? Array and item.empty?}
end
def operations_to_postfix(tree, ops=['/','*'])
return tree if not tree.is_a? Array
tree = tree.map{ |t| operations_to_postfix(t, ops) }
tree
.map.with_index{ |item, ix| ops.include?(item) ? ix : nil}
.compact
.reverse
.each do |ix|
arg1, op, arg2 = tree.slice!(ix-1..ix+1)
tree[ix-1] = [op, arg1, arg2]
end
tree
end
def calc(expression)
s = apply_negation_symbol(expression)
tree = nest_parenthesis(s)
tree = digits_to_floats(tree)
tree = negation_to_postfix(tree)
tree = operations_to_postfix(tree, ops=['/','*'])
tree = operations_to_postfix(tree, ops=['+','-'])
eval(tree)
end
-
1\$\begingroup\$ I don't want to be negative for the sake of being negative, but while this is more functional, which I like, I find it way less readable than your original code. \$\endgroup\$Jordan– Jordan2016年02月11日 23:59:43 +00:00Commented Feb 11, 2016 at 23:59
-
\$\begingroup\$ Those were more or less my feelings. Some of it is better, some of it is worse, and it took a whole lot of time. Changing negation_to_postfix() to use the same array replacement logic I used in operations_to_postfix() would help a little for readability, but altering arrays in place (even though it is a local variable) feels like it violates the spirit of functional programming. \$\endgroup\$Zack– Zack2016年02月12日 00:05:56 +00:00Commented Feb 12, 2016 at 0:05
Overall I think your code is great. Almost zero WTFs. Of course, I wouldn't be posting if I didn't have suggestions. I've broken it up into methods below, with a "Miscellaneous" section at the end for general advice.
tokenize_expression
I'll take this in two sections:
Convert negations to n
I think you overthought this part a bit. This:
#convert negations to 'n' character
while md = s.match(%r{(?<![\d\)])-}) do
s[md.begin(0)] = 'n'
end
...is better expressed like this:
s.gsub!(/(?<![\d\)])-/, 'n')
Consume tokens
Your code:
while not s.empty?
if not s.match(%r{^\d}).nil? #first char is digit
md = s.match %r{[\d\.]+}
s = md.post_match
tokens << md[0].to_f
else
tokens << s[0] #first char is parenthesis or operation
s = s[1..-1] #everything but first char
end
end
Firstly, this line:
if not s.match(%r{^\d}).nil?
...would be better expressed as:
if s.match(/^\d/) # `nil` is falsey in Ruby
...but since you're not using the MatchData object returned by match
, you should use the =~
operator (it returns a Fixnum, which is always truthy, or nil
):
if s =~ /^\d/
Secondly, this is a waste:
if not s.match(%r{^\d}).nil? #first char is digit
md = s.match %r{[\d\.]+}
s = md.post_match
You're throwing away the MatchData object from the first line and then performing a very similar match on the second line. You can do this with one match operation:
if md = s.match(/^\d+(\.\d+)?/)
s = md.post_match
This Regexp matches one or more digits optionally followed by a decimal point and one or more digits. (This also solves a bug in your code that would have matched 1.2.3
as a valid number.)
We can improve this further, though:
until s.empty?
tokens <<
if num = s.slice!(/^\d+(\.\d+)?/)
num.to_f
else
s.slice!(0)
end
end
Instead of matching the expression, checking the result, adding the matched part to tokens
and setting s
to everything after it, we do all of those things at once by passing the expression to String#slice!
. If nothing is matched slice!
returns nil
, so we do s.slice!(0)
to chop off just the first character instead.
Or further, although we're getting somewhat occult now:
while s.slice!(/^\d+(?:\.\d+)?|(.)/)
tokens << (1ドル || $&.to_f)
end
This is basically the same except I've added an alternation that, if a number wasn't matched, captures the first character in capture group 1 (1ドル
). If a number was matched, 1ドル
will be nil
and so we call to_f
on the whole match ($&.to_f
).
nest_parens
This method is really solid. I only have one suggestion, which is to use Enumerable#reduce
or Enumerable#each_with_object
instead of a while
loop:
result = tokens.reduce([]) do |res, token|
case token
when '(' then stack.push(res); []
when ')' then stack.pop << res
else res << token
end
end
postfix_negation
This one is also really solid. Unlike the above I don't think reduce
works well here (since you're sometimes, but not always, consuming a second element).
One thing that's slightly smelly to me is having first, *rest = tokens
before your loop and first, *rest = rest
at the bottom:
first, *rest = *tokens
while not first.nil?
case first
when 'n'
second, *rest = *rest
result << [first, second] #e.g. [n 6]
else
result << first
end
first, *rest = *rest
end
Also, I think a case
expression is overkill when you only have one condition and an else
. I'd probably rewrite your loop like this:
while first = tokens.shift
if first == 'n'
result << [ first, tokens.shift ]
else
result << first
end
end
postfix_ops
Similar to the above:
while first = tokens.shift
if ops.include?(tokens.first)
second, third = tokens.shift(2)
tokens.unshift([ second, first, third ])
else
result << first
end
end
As you can see, I prefer a, b = rest.shift(2)
to a, b, *rest = rest
.
eval
Don't shadow Kernel#eval
. I'd rename this method to evaluate
.
Otherwise, this method is very good. One thing I'd do is remove all of the return
s from your case
expression. The case
is the last expression in the method, so whichever then
is reached will be the return value of the method:
case first
when 'n' then -second
when '+' then second + third
# ...
else raise "Unknown Op: #{first}"
end
If you want to get fancy, you could replace this with the following:
if first == 'n'
first, second, third = 0, '-', second
end
first.send(second, third)
However, this way you lose error handling for unknown operators. You could add it back in simply enough, but by then it might not be worth it.
Miscellaneous
Use consistent indentation. Two spaces is standard for Ruby.
Don't use
do
after awhile
(orfor
). It's implied.Use
unless
instead ofif not
.while not first.nil? do
should either bewhile first
oruntil first.nil?
. Similarly, I recommenduntil s.empty?
instead ofwhile not s.empty?
.You don't need the splat before
tokens
infirst, *rest = *tokens
, becausetokens
is already an array.Regexp "percent notation" (
%r{...}
) is good for reducing visual noise when your expression contains slashes, because you don't have to escape them (%r{a/b/c}
vs./a\/b\/c/
); however, when you don't have any slashes, I find it actually adds visual noise and prefer normal slash notation.
-
\$\begingroup\$ Thanks for your feedback, Jordan. A few of your suggestions I've already done on the improved code. Your suggestion about using gsub() for swapping out the negation symbol made me smack my forehead; I really overthought that one. With regards to Misc#5, the splat before tokens is required for what I want (a Lisp style first() and rest() destructuring); it doesn't work without it. As for Misc #6, I left it %r{} to keep the regexes consistent, rather than mixing escape syntax. Thank you again for your feedback \$\endgroup\$Zack– Zack2016年02月12日 00:13:27 +00:00Commented Feb 12, 2016 at 0:13
-
\$\begingroup\$ You're mistaken about the splat. Take a look: ideone.com/PtwYaQ A splat on the right-hand side of an assignment just invokes
to_ary
, on it; sincetokens
is an array this is a no-op. \$\endgroup\$Jordan– Jordan2016年02月12日 00:22:23 +00:00Commented Feb 12, 2016 at 0:22 -
\$\begingroup\$ Hmm. According to this article (devblog.avdi.org/2010/01/31/first-and-rest-in-ruby), it was required for proper handling of empty lists, but maybe it's changed in more recent versions of ruby (which testing seems to confirm) \$\endgroup\$Zack– Zack2016年02月12日 01:13:09 +00:00Commented Feb 12, 2016 at 1:13
Explore related questions
See similar questions with these tags.
.each
to iterate over lists instead of yourwhile
s... \$\endgroup\$