7
\$\begingroup\$

I've refactored my code such that every function has only one responsibility. Now I'd like like to work on making it DRY.

I'm noticing a lot of duplication in the DocumentAnalyzer class. In nearly every function, I make a reference to @document.

Should my one-statement functions be collapsed into one line?

require 'set'
class Document
 attr_reader :words
 def initialize(words)
 @words = words
 end
end
class DocumentAnalyzer
 def initialize(document)
 @document = document
 end
 def unique_words
 @document.words.to_set
 end
 def unique_words_count
 unique_words.count
 end
 def word_count
 @document.words.size
 end
 def total_characters
 total = 0.0
 @document.words.each { |word| total += word.size }
 total 
 end
 def average_word_length
 total_characters / word_count
 end
end
d = Document.new %w(here are some words)
da = DocumentAnalyzer.new(d)
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Jun 28, 2014 at 23:45
\$\endgroup\$

1 Answer 1

10
\$\begingroup\$

To my eyes, there should really only be one class: Document.

As you noticed, the methods in DocumentAnalyzer all use @document. The analyzer class is so tied up in Document (and its methods are so simple) that it probably shouldn't be a wholly separate class.

But if there's a reason for it to be separate class (I can't think of any), then there's nothing inherently wrong in referencing the same variable in several places.

DRY does not mean "Don't ever repeat anything, ever!". It means "Don't duplicate logic". Accessing a variable isn't in itself logic; logic is what you do with it.

So if you want to always upcase all the words (or something), you should make a method to do that for you, instead of repeating that bit of logic everywhere you need it. But even so, you'd still have to call your upcased_words method in all those places, so it's not like you can avoid any and all repetition.

And no, don't collapse your methods. A single line is still a method body. Collapsing code is practically only used when you make a completely empty method or class on purpose, and want to really highlight that it's intended to be empty. E.g. def no_op; end.
You'll also see it in code that defines custom exception classes. The classes just inherit from an existing exception class, but they neither add or override anything; the point is just to define them as distinct classes. Hence, they're usually defined with class CustomError < ExistingError; end

Other things I noticed:

  • Fix your indentation. Sometimes it's 2 spaces, sometimes it's 4. Stick to 2.

  • Your Document class could - in its current form - be replaced by a plain array, really. It doesn't really do anything as it is right now. I'd expect a class named Document to (for instance) have methods to load/read a file or something. Instead, you give it an array, and it stores that array. But if you already have that array, why bother?

  • You don't need Set. You can just do words.uniq

  • Your total_characters method is odd. Why does total start out as a float? It'll always be an integer (unless you've discovered fractional characters)

  • Speaking of, the summing up can be done with inject (aka reduce)

    def total_characters
     @document.words.inject(0) { |sum, word| sum += word.length }
    end
    
  • Also, the method should probably be called character_count to stick to the convention established by word_count and unique_word_count. Neither of those is called total_...

  • You should probably memoize your methods. That is, store the return value after calculating it the first time, so it doesn't have to be calculated again later:

    def total_characters
     @total_characters ||= @document.words.inject(0) { |sum, word| sum += word.length }
    end
    

    This code will return the instance variable immediately unless it's nil or false. Otherwise, it'll actually evaluate the right-hand expression (i.e. sum up the character count), assign the result to the instance variable, and then return the value.

  • If you want to memoize return values, you should also freeze the words array. Otherwise you can do something like some_document.words.fill("foobar") after which your previously computed values stop making sense.

  • I might just skip the *_count methods, since they're not even shortcuts. Anyone can call some_document.words.count or some_document.unique_words.count. Heck, you could even skip unique_words since it's the same as words.uniq (which is even shorter!).
    Basically, if the class itself doesn't rely on a method, and the logic is trivial, there's no reason to add that method (i.e. unique_words isn't used by the class, and is very trivial).
    Edit As jwg points out in the comments, memoization might be a good reason to add a method "ahead of time". Still, if the class itself doesn't have an obvious use for the method, think twice before adding it.

Here's what I might write

class Document
 attr_reader :words
 def initialize(words)
 @words = words.dup.freeze
 end
 def characters
 @characters ||= words.flat_map { |word| word.chars.to_a }.freeze
 end
 def average_word_length
 return 0 if words.empty? # avoid division by zero
 characters.count / words.count.to_f
 end
end

Again, I don't think the *_count methods add anything, since you can just write .count anyway:

doc = Document.new(%w(here are some words))
doc.words # => ["here", "are", "some", "words"]
doc.characters # => ["h", "e", "r", "e", "a", "r", "e", "s", "o", "m", "e", "w", "o", "r", "d", "s"]
doc.average_word_length # => 4.0
doc.words.count # => 4
doc.characters.count # => 16
doc.words.uniq # => ["here", "are", "some", "words"]
doc.words.uniq.count # => 4

As you can see, I've also left out the unique_words method, because it's so simple. However, if the words array is very long, it might be more efficient to provide a memoized method for people to use. I.e.

def unique_words
 @unique_words ||= words.uniq
end
answered Jun 29, 2014 at 1:16
\$\endgroup\$
4
  • \$\begingroup\$ Don't you think the possibilty of memoizing later (when it can be judged whether the optimization is worthwhile) makes it worth writing the method rather than the direct call now? \$\endgroup\$ Commented Jun 29, 2014 at 7:38
  • 1
    \$\begingroup\$ @jwg Indeed, that'd be a good reason to add the methods right away. But the *_count methods are so simple that memoization won't make much of a difference, so it's mostly unique_words that would benefit. I kept it apart for simplicity's sake, but your point is valid. \$\endgroup\$ Commented Jun 29, 2014 at 11:01
  • \$\begingroup\$ Thanks for the review, @Flambino! The reason I created a DocumentAnalyzer class is because Documents don't actually have unique words, and don't inherently provide word counts and average word lengths. In real life, I would be asking the DocumentAnalyzer those questions. Thanks for shedding light on the fact that DRY doesn't mean to not repeat myself ever, but not to repeat logic. I'm a little lost on memoization and ||=. This is my first time seeing and using #uniq and #freeze. Cool stuff. You've also helped me to become more mindful of how I name functions. \$\endgroup\$ Commented Jun 29, 2014 at 16:19
  • 1
    \$\begingroup\$ @DylanRichards Makes sense, but in real life, documents aren't just an array of words either. So if we're going to represent our "document" like that, we might as well give it all the other things too. As for @x ||= y you can read it as "if @x is non-nil/non-false, then set it to y. Otherwise, leave it alone". So the first time you call it, @x is nil, and thus it gets set to whatever's on the right-hand side. But the next time, @x already has a value, so it doesn't have to be set again. \$\endgroup\$ Commented Jun 29, 2014 at 16:31

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.