Regular Expression to remove blank lines

Question 1

This is basically my first Ruby class, which is to remove all empty lines and blank times from a .txt file. A simple test case is also included.

I tested the code and it works fine. But I did not have much idea about if the code style is good, or if there are some potential issues. Can someone please do a quick review and kindly provide some feedback?

Is there any potential issue in the code? did I miss some boundary check?
Is the code style good in general? e.g, name convention, indent, etc.
what can I do if I want to make it more professional? e.g., better choice of functions, error-handling, object-oriented, test-driven, etc.

require 'test/unit'
class BlankLineRemover
 def initialize()
 puts "BlankLineRemover initialized."
 end
 def generate(input_filename, output_filename)
 reader = File.read(input_filename)
 f = File.new(output_filename, "w")
 f.puts remove(reader)
 f.close
 end
 def remove(stringToremove)
 regEx = /^[\s]*$\n/
 stringToReturn = stringToremove.gsub(regEx, '')
 stringToReturn.strip!
 if stringToReturn.length >= 1
 return stringToReturn
 else
 return ""
 end
 end
end
class TestCleaner < Test::Unit::TestCase
 def test_basic
 sInput1 = "line1\n\t\nline4\n\tline5\n"
 sExpectedOutput1 = "line1\nline4\n\tline5"
 sInput2=""
 sExpectedOutput2 = ""
 sInput3="\n\t\t\n"
 sExpectedOutput3 = ""
 testCleaner = BlankLineRemover.new
 assert_equal(sExpectedOutput1, testCleaner.remove(sInput1))
 assert_equal(sExpectedOutput2, testCleaner.remove(sInput2))
 assert_equal(sExpectedOutput3, testCleaner.remove(sInput3))
 end
end
unless ARGV.length == 2
 puts "Usage: ruby Blanker.rb input.txt output.txt\n"
 exit
end
simpleRemover = BlankLineRemover.new
simpleRemover.generate(ARGV[0],ARGV[1])

Question 2

On the method remove you have an if/else that could be re-written as stringToReturn.length >= 1 ? stringToReturn : "", but to check that a string has a size bigger than one you're better of with !stringToReturn.empty? but then again the only way to fall on that else is if stringToReturn is an empty string (which is the return value of the else) and therefore you don't need that if/else at all and simply let stringToReturn.strip! be the return value of that method.

Question 3

I think, it is not great idea to read full file at once, you may need to read it line by line and populate result file in a go. Also, you can remove "()" parameterless method declaration. Usually, only underscores are used in variable and method names in Ruby to separate words instead of camel case.

Question 4

I would also remove the following: \r\n

Also, this part could be simplified a tad:

if stringToReturn.length >= 1
 return stringToReturn
else
 return ""
end

to:

 stringToReturn.length >= 1 ? stringToReturn : ""

Once you have the \r\n removed, don't forget to test for that. Lastly (just a style issue), I wouldn't use camelcase variables in Ruby as it's not commonly done (that I've seen), so stringToremove could become string_to_remove for instance.

Question 5

@ James - thx for yr feedback. but which "following: \r\n" did u mean? can you specify?

Question 6

He means the newline \r\n, check this: en.wikipedia.org/wiki/Newline and see how it can vary depending on the OS

Question 7

got it. thx for the clarification. ( i guess you guys know each other:) ). besides the string length check and newline, any feedbacks at other aspects like error-handling, object-oriented, testing case, file handling, etc?

Question 8

@ Derp/James - I guess I have the completionist mind-set. so forgive me if i am getting overly detail-oriented here. :)

Question 9

I added a little more to my answer.

Question 10

What follows is a fairly harsh critique, but I'm hoping that that's what you're looking for as you seem to want to improve you ruby and programming style. Ruby is a lovely language, but getting to know it takes some time - eventually you'll realise that just about everything you write used to take twice as many lines. So, a few pointers:

Variables

Generally in ruby normal variables are underscored. So instead of aVariableName you should probably use a_variable_name. Both will work, but the latter would fit the Ruby style better.

Additionally you choice of variable names seems to go from overly verbose to the opposite. Some are great, but others leave my guessing.

For example where you write:

reader = File.read(input_filename)

I would prefer to write:

in_file = File.read(in_filename)

Functions

Like your variables, the naming of your functions could do with a touch-up. Your functions do not make use of the implicit return that Ruby offers, something that is often considered a better practice.

For example where you write:

 def remove(stringToremove)
 regEx = /^[\s]*$\n/
 stringToReturn = stringToremove.gsub(regEx, '')
 stringToReturn.strip!
 if stringToReturn.length >= 1
 return stringToReturn
 else
 return ""
 end
 end

In this regard I might have written:

 def remove(stringToremove)
 regEx = /^[\s]*$\n/
 stringToReturn = stringToremove.gsub(regEx, '').strip
 end

Also on your initialize function you include brackets in the definitions this is not required, and is discouraged. Additionally you don't need to include a initializer if there's nothing for it to do.

Rubyness

Some things you might want to consider further as you do more programming in ruby are:

Chaining. For example you can do a_string.gsub("some", "substitution").strip.reverse and other such things all in one line and still keep readability high. Obviously there's a point where it's too much, but that's your judgement.
Your code formatting is a bit out. In a lot of places you seem to be using extra spaces. For example you class definition is class BlankLineRemover and should really be class BlankLineRemover. You might be using soft-tabs here, and if that's the case - please don't.
You spacing is also very generous. Unless you're separating code into particular chunks then you can skip a few newlines.
This is a real nit-pick, but generally using .size is considered better than .length.

Rewrite

Here's your main class rewritten the way I would have gone about it.

class BlankLineRemover
 def run(in_filename, out_filename)
 File.open(out_filename, "w") do |out_file|
 out_file.write clean(File.read(in_filename))
 end
 end
 def clean(input)
 input.gsub(/^[\s]*$\n/, "").strip
 end
end

Question 11

Also, you may want to consider using File.expand_path() unless the filenames are known to be full paths. That's caught me more than once.

Question 12

Good point, that too.

Question 13

Here's how I might reformulate your core business logic (warning -- untested -- you should probably consider this pseudocode):

outfile = File.new(outfilename)
File.read(infilename) do |line|
 unless line =~ /^[\s]*$\n/ # unless all spaces
 outfile.puts(line)
 end
end

I might write the class around this as the 'core' method -- note that one of the keys in a dynamic language is that it's important to keep the level of abstraction of your statements consistent throughout a code 'neighborhood'. In terms of 'Rubyisms' you could demonstrate knowledge of Ruby's mixin capabilities by writing a AbstractFileTransformer that you could extend from, while includ-ing a module BlankLineRemovingCopyTransformation -- it would basically just be refactoring the class you've written into two, and then creating a 'concrete instance' which mixed both together. (Alternatively you could do this with classes and yield.) If you wish I can attempt to provide a pseudocode example here, but at any rate, let's consider your tests for a moment.

Given the apparent simplicity of the functionality, you may certainly be forgiven for writing a barebones test case. I might suggest splitting the test case into several sub-tests which together describe the system. Note there's lots of different kinds of tests you can write besides ones that just demonstrate 'how it works' -- you can also write tests about integrating different pieces of the system together (as opposed to testing each in isolation); you can write tests that involve intentionally giving the system bad input and guaranteeing that it fails, etc. You could have a test which tries to invoke your script as a command-line tool and the validates the behavior is as expected. It's worth trying to write a lot of different kinds of tests for an application.

james_schorrjames_schorr · Accepted Answer · 2011-09-22 04:00:21Z

2

\$\begingroup\$

I would also remove the following: \r\n

Also, this part could be simplified a tad:

if stringToReturn.length >= 1
 return stringToReturn
else
 return ""
end

to:

 stringToReturn.length >= 1 ? stringToReturn : ""

Once you have the \r\n removed, don't forget to test for that. Lastly (just a style issue), I wouldn't use camelcase variables in Ruby as it's not commonly done (that I've seen), so stringToremove could become string_to_remove for instance.

Share

answered Sep 22, 2011 at 4:00

james_schorrjames_schorr

\$\endgroup\$

6

\$\begingroup\$ @ James - thx for yr feedback. but which "following: \r\n" did u mean? can you specify? \$\endgroup\$

Jonathan Smith
– Jonathan Smith

2011年09月22日 04:29:06 +00:00
Commented Sep 22, 2011 at 4:29
\$\begingroup\$ He means the newline \r\n, check this: en.wikipedia.org/wiki/Newline and see how it can vary depending on the OS \$\endgroup\$

derp
– derp

2011年09月22日 04:36:27 +00:00
Commented Sep 22, 2011 at 4:36
\$\begingroup\$ got it. thx for the clarification. ( i guess you guys know each other:) ). besides the string length check and newline, any feedbacks at other aspects like error-handling, object-oriented, testing case, file handling, etc? \$\endgroup\$

Jonathan Smith
– Jonathan Smith

2011年09月22日 05:13:49 +00:00
Commented Sep 22, 2011 at 5:13
\$\begingroup\$ @ Derp/James - I guess I have the completionist mind-set. so forgive me if i am getting overly detail-oriented here. :) \$\endgroup\$

Jonathan Smith
– Jonathan Smith

2011年09月22日 05:17:06 +00:00
Commented Sep 22, 2011 at 5:17
\$\begingroup\$ I added a little more to my answer. \$\endgroup\$

james_schorr
– james_schorr

2011年09月22日 13:02:33 +00:00
Commented Sep 22, 2011 at 13:02

| Show 1 more comment

Stack Exchange Network

Regular Expression to remove blank lines

3 Answers 3

Variables

Functions

Rubyness

Rewrite

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Regular Expression to remove blank lines

3 Answers 3

Variables

Functions

Rubyness

Rewrite

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions