Counting words / lines in Ruby

Question 1

I solved this problem in Ruby:

Write a utility that takes 3 command-line parameters P1, P2 and P3. P3 is OPTIONAL (see below). P1 is always a file path/name. P2 can take the values:

"lines"

"words"

"find"

P3 is relevant/needed only if P2 is "find"; otherwise it is not.

So, the utility does the following:

If P2 is "lines" it says how many lines the file has

If P2 is "words" it says how many words it has (the complete file)

If P2 is "find" it prints out the lines where P3 is present

My solution looks like this:

#!/usr/bin/env ruby
def print_usage
  puts "Usage: #{$0} <file> words|lines"
  puts "       #{$0} <file> find <what-to-find>"
end
class LineCounter
  # Initialize instance variables
  def initialize
    @line_count = 0
  end
  def process(line)
    @line_count += 1
  end
  def print_result
    puts "#{@line_count} lines"
  end
end
class WordCounter
  # Initialize instance variables
  def initialize
    @word_count = 0
  end
  def process(line)
    @word_count += line.scan(/\w+/).size
  end
  def print_result
    puts "#{@word_count} words"
  end
end
class WordMatcher
  # Initialize instance variables, using constructor parameter
  def initialize(word_to_find)
    @matches = []
    @word_to_find = word_to_find
  end
  def process(line)
    if line.scan(/#{@word_to_find}/).size > 0
      @matches << line
    end
  end
  def print_result
    @matches.each { |line|
      puts line
    }
  end
end
# Main program
if __FILE__ == $PROGRAM_NAME
  processor = nil
  # Try to find a line-processor
  if ARGV.length == 2
    if ARGV[1] == "lines"
      processor = LineCounter.new
    elsif ARGV[1] == "words"
      processor = WordCounter.new
    end
  elsif ARGV.length == 3 && ARGV[1] == "find"
    word_to_find = ARGV[2]
    processor = WordMatcher.new(word_to_find)
  end
  if not processor
    # Print usage and exit if no processor found
    print_usage
    exit 1
  else
    # Process the lines and print result
    File.readlines(ARGV[0]).each { |line|
      processor.process(line)
    }
    processor.print_result
  end
end

My questions are:

Is there a more Ruby-esque way of solving it?
More compact, but still readable / elegant?

It seems checking for correct command-line parameter combinations takes up a lot of space...

Contrast it to the Scala version found here:

https://gist.github.com/anonymous/93a975cb7aba6dae5a91#file-counting-scala

Question 2

If you are satisfied with any of the answers, you should select the one that was most helpful to you.

Question 3

Some notes:

Those counter classes are probably overkill, keep it simple.
Ruby is an OOP language, but it's not necessary to create a bunch of classes for simple scripts like this.
Idiomatic: if not x -> if !x
Idiomatic: { ... } for one-line blocks, do/end for multi-line.

I'd write:

fail("Usage: #{$0} PATH (lines|words|find REGEXP)") unless ARGV.size >= 2
path, mode, optional_regexp = ARGV
# each_line streams the file line by line (IO#lines is not available in Ruby 3.0+)
File.open(path) do |fd|
  case mode
  when "lines"
    puts(fd.each_line.count)
  when "words"
    puts(fd.each_line.map { |line| line.split.size }.reduce(0, :+))
  when "find"
    if optional_regexp
      fd.each_line { |line| puts(line) if line.match(optional_regexp) }
    else
      fail("mode find requires a REGEXP argument")
    end
  else
    fail("Unknown mode: #{mode}")
  end
end

Question 4

Thanks for the tips about idiomatic Ruby code. And thanks for the example. I knew there was a "Ruby way" of doing it... short, compact, pragmatic, to the point, yet readable.

Question 5

Upvoted. Great answer. One small suggestion for an improvement: Put all the argument checking and fail statements at the top. Then the program reads: 1. data validation 2. actual content. It has the added benefit of getting rid of all the "if... else" statements.
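
A minimal sketch of what that suggestion might look like, assuming the same three modes. The names `MODES`, `parse_args` and `run` are hypothetical and do not appear in the answer; the sketch raises `ArgumentError` instead of calling `fail` with a bare string so a caller can tell bad arguments from other failures.

```ruby
# Sketch only: MODES, parse_args and run are illustrative names.
MODES = %w[lines words find].freeze

# Step 1: data validation. Returns [path, mode, pattern] or raises ArgumentError.
def parse_args(argv)
  path, mode, pattern = argv
  raise ArgumentError, "Usage: PATH (lines|words|find REGEXP)" if path.nil? || mode.nil?
  raise ArgumentError, "Unknown mode: #{mode}" unless MODES.include?(mode)
  raise ArgumentError, "mode find requires a REGEXP argument" if mode == "find" && pattern.nil?
  [path, mode, pattern]
end

# Step 2: actual content. No validation branches left; returns the text to print.
def run(path, mode, pattern = nil)
  File.open(path) do |fd|
    case mode
    when "lines" then fd.each_line.count.to_s
    when "words" then fd.each_line.sum { |line| line.split.size }.to_s
    when "find"  then fd.each_line.grep(Regexp.new(pattern)).join
    end
  end
end

# In a script, the two steps would be chained like this:
#   puts run(*parse_args(ARGV))
```

Because `parse_args` rejects every invalid combination before the file is opened, the `case` in `run` needs no `else` branch.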

Question 6

Formatting

Most Rubyists favor some white space between methods, such as:

class LineCounter

  # Initialize instance variables
  def initialize
    @line_count = 0
  end

  def process(line)
    @line_count += 1
  end

  def print_result
    puts "#{@line_count} lines"
  end

end

{...} vs do...end

For multi-line blocks, prefer do...end:

File.readlines(arguments.path).each do |line|
  arguments.processor.process(line)
end

Comments

Comments, when used, should say something the code doesn't already say. This comment, and some of the others, can be eliminated without injuring the reader's ability to understand the code:

  # Initialize instance variables
  def initialize
    @line_count = 0
  end

Argument parsing

You are correct that argument parsing in this script has the potential to be improved. There are a few different ideas that could help here.

Separate class

I usually like to put argument parsing in its own class:

class Arguments

  attr_reader :path
  attr_reader :processor

  def initialize(argv)
    @path = argv[0]
    if argv.length == 2
      if argv[1] == "lines"
        @processor = LineCounter.new
      elsif argv[1] == "words"
        @processor = WordCounter.new
      end
    elsif argv.length == 3 && argv[1] == "find"
      word_to_find = argv[2]
      @processor = WordMatcher.new(word_to_find)
    end
    if not @processor
      print_usage
      exit 1
    end
  end

  private

  def print_usage
    puts "Usage: #{$0} <file> words|lines"
    puts "       #{$0} <file> find <what-to-find>"
  end

end

The main program becomes:

if __FILE__ == $PROGRAM_NAME
  arguments = Arguments.new(ARGV)
  File.readlines(arguments.path).each { |line|
    arguments.processor.process(line)
  }
  arguments.processor.print_result
end

I had more I was going to write, but after seeing the simplicity of @tokland's answer, I think the approaches I was going to take are not so good.

Question 7

Thanks for the tips. Interesting approach with your Arguments class... Have you considered using a special library for command-line argument validation?

Question 8

@Sebastian Yes, I did. optparse, of course, only takes care of switch (--foo) arguments, so it would be no help. I have often looked for libraries which do good handling of non-switch arguments; I am not aware of one that just parses arguments. The ones I've seen have strong opinions on parts of your program that are not argument parsing.
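
To illustrate the point about optparse, here is a small sketch, assuming a made-up `--verbose` switch and a hypothetical helper named `split_args`. `OptionParser#parse` consumes the switches it knows about and returns the remaining non-switch arguments untouched, so validating them is still up to the script.

```ruby
require "optparse"

# Sketch only: split_args and the --verbose switch are illustrative.
# Returns [options, positional]: the parsed switches and the leftover arguments.
def split_args(argv)
  options = { verbose: false }
  parser = OptionParser.new do |opts|
    opts.on("-v", "--verbose", "Print extra detail") { options[:verbose] = true }
  end
  # parse works on a copy of argv and returns whatever is not a switch.
  positional = parser.parse(argv)
  [options, positional]
end

options, positional = split_args(["-v", "f1", "find", "time"])
p options     # the parsed switch
p positional  # the file, mode and pattern, still unvalidated
```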

Question 9

As you have not indicated whether you are looking for a quick and dirty--possibly one-off--solution, or production code, and have said nothing of file size, I decided to suggest something you could employ for the former purpose, when the file is not humongous (because I read it all into a string):

fname, op, regex = ARGV
s = File.read(fname)
case op
when 'rows'
 puts s[-1] == $/ ? s.count($/) : s.count($/) + 1
when 'words'
 puts s.split.size
when 'find' 
 regex = /#{regex}/
 s.each_line {|l| puts l if l =~ regex}
end

where $/ is the end-of-line character(s). Let's create a file for demonstration purposes:

text =<<_
Now is the time
for all good
Rubiests to
spend some
time coding. 
_
File.write('f1', text)

If the above code is in the file 'file_op.rb', we get these results:

ruby 'file_op.rb' 'f1' 'rows' #=> 5
ruby 'file_op.rb' 'f1' 'words' #=> 13
ruby 'file_op.rb' 'f1' 'find' 'time'
 #=> Now is the time
 # time coding.
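
The 'rows' branch adds one to the newline count when the file does not end in a newline. A small sketch of that logic on plain strings, assuming "\n" as the line separator and using the hypothetical name `count_rows`:

```ruby
# Sketch only: count_rows mirrors the ternary in the 'rows' branch,
# with "\n" hard-coded in place of $/.
def count_rows(s)
  s.end_with?("\n") ? s.count("\n") : s.count("\n") + 1
end

p count_rows("a\nb\n")    # last line terminated by a newline
p count_rows("a\nb")      # last line not terminated
p count_rows("")          # empty input still counts as one row here
p "".each_line.count      # each_line yields nothing for an empty string
```

The last two calls show where this formula and `each_line.count` disagree: an empty file.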

Question 10

Thanks for the super-compact solution. It is a good example and serves me well. However, I would like to show a "usage" text in case of missing / incorrect arguments. But please don't change your example! I like that it's so short.

Question 11

I think you can remove the + [nil]. Unlike Python, Ruby lets you destructure even if the sizes do not match.
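
A quick sketch of that behaviour, using a hypothetical helper named `split_argv`: with multiple assignment, missing values become nil and surplus values are ignored, whereas the equivalent unpacking in Python raises a ValueError.

```ruby
# Sketch only: split_argv is an illustrative name.
def split_argv(argv)
  path, mode, pattern = argv   # sizes need not match
  { path: path, mode: mode, pattern: pattern }
end

p split_argv(["f1", "words"])                  # pattern is nil
p split_argv(["f1", "find", "time", "extra"])  # "extra" is dropped
```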

Question 12

Sebastian, I figured you could add whatever data checks you wanted. @tokland, thank you, that is good to know; I have edited my answer. I'd also like to thank Ruby.

tokland · Accepted Answer · 2014-02-12 19:01:21Z
