I've refactored my code such that every function has only one responsibility. Now I'd like like to work on making it DRY.
I'm noticing a lot of duplication in the DocumentAnalyzer
class. In nearly every function, I make a reference to @document
.
Should my one-statement functions be collapsed into one line?
require 'set'
class Document
attr_reader :words
def initialize(words)
@words = words
end
end
class DocumentAnalyzer
def initialize(document)
@document = document
end
def unique_words
@document.words.to_set
end
def unique_words_count
unique_words.count
end
def word_count
@document.words.size
end
def total_characters
total = 0.0
@document.words.each { |word| total += word.size }
total
end
def average_word_length
total_characters / word_count
end
end
d = Document.new %w(here are some words)
da = DocumentAnalyzer.new(d)
1 Answer 1
To my eyes, there should really only be one class: Document
.
As you noticed, the methods in DocumentAnalyzer
all use @document
. The analyzer class is so tied up in Document
(and its methods are so simple) that it probably shouldn't be a wholly separate class.
But if there's a reason for it to be separate class (I can't think of any), then there's nothing inherently wrong in referencing the same variable in several places.
DRY does not mean "Don't ever repeat anything, ever!". It means "Don't duplicate logic". Accessing a variable isn't in itself logic; logic is what you do with it.
So if you want to always upcase all the words (or something), you should make a method to do that for you, instead of repeating that bit of logic everywhere you need it. But even so, you'd still have to call your upcased_words
method in all those places, so it's not like you can avoid any and all repetition.
And no, don't collapse your methods. A single line is still a method body. Collapsing code is practically only used when you make a completely empty method or class on purpose, and want to really highlight that it's intended to be empty. E.g. def no_op; end
.
You'll also see it in code that defines custom exception classes. The classes just inherit from an existing exception class, but they neither add or override anything; the point is just to define them as distinct classes. Hence, they're usually defined with class CustomError < ExistingError; end
Other things I noticed:
Fix your indentation. Sometimes it's 2 spaces, sometimes it's 4. Stick to 2.
Your
Document
class could - in its current form - be replaced by a plain array, really. It doesn't really do anything as it is right now. I'd expect a class namedDocument
to (for instance) have methods to load/read a file or something. Instead, you give it an array, and it stores that array. But if you already have that array, why bother?You don't need
Set
. You can just dowords.uniq
Your
total_characters
method is odd. Why doestotal
start out as a float? It'll always be an integer (unless you've discovered fractional characters)Speaking of, the summing up can be done with
inject
(akareduce
)def total_characters @document.words.inject(0) { |sum, word| sum += word.length } end
Also, the method should probably be called
character_count
to stick to the convention established byword_count
andunique_word_count
. Neither of those is calledtotal_...
You should probably memoize your methods. That is, store the return value after calculating it the first time, so it doesn't have to be calculated again later:
def total_characters @total_characters ||= @document.words.inject(0) { |sum, word| sum += word.length } end
This code will return the instance variable immediately unless it's
nil
orfalse
. Otherwise, it'll actually evaluate the right-hand expression (i.e. sum up the character count), assign the result to the instance variable, and then return the value.If you want to memoize return values, you should also
freeze
thewords
array. Otherwise you can do something likesome_document.words.fill("foobar")
after which your previously computed values stop making sense.I might just skip the
*_count
methods, since they're not even shortcuts. Anyone can callsome_document.words.count
orsome_document.unique_words.count
. Heck, you could even skipunique_words
since it's the same aswords.uniq
(which is even shorter!).
Basically, if the class itself doesn't rely on a method, and the logic is trivial, there's no reason to add that method (i.e.unique_words
isn't used by the class, and is very trivial).
Edit As jwg points out in the comments, memoization might be a good reason to add a method "ahead of time". Still, if the class itself doesn't have an obvious use for the method, think twice before adding it.
Here's what I might write
class Document
attr_reader :words
def initialize(words)
@words = words.dup.freeze
end
def characters
@characters ||= words.flat_map { |word| word.chars.to_a }.freeze
end
def average_word_length
return 0 if words.empty? # avoid division by zero
characters.count / words.count.to_f
end
end
Again, I don't think the *_count
methods add anything, since you can just write .count
anyway:
doc = Document.new(%w(here are some words))
doc.words # => ["here", "are", "some", "words"]
doc.characters # => ["h", "e", "r", "e", "a", "r", "e", "s", "o", "m", "e", "w", "o", "r", "d", "s"]
doc.average_word_length # => 4.0
doc.words.count # => 4
doc.characters.count # => 16
doc.words.uniq # => ["here", "are", "some", "words"]
doc.words.uniq.count # => 4
As you can see, I've also left out the unique_words
method, because it's so simple. However, if the words
array is very long, it might be more efficient to provide a memoized method for people to use. I.e.
def unique_words
@unique_words ||= words.uniq
end
-
\$\begingroup\$ Don't you think the possibilty of memoizing later (when it can be judged whether the optimization is worthwhile) makes it worth writing the method rather than the direct call now? \$\endgroup\$jwg– jwg2014年06月29日 07:38:41 +00:00Commented Jun 29, 2014 at 7:38
-
1\$\begingroup\$ @jwg Indeed, that'd be a good reason to add the methods right away. But the
*_count
methods are so simple that memoization won't make much of a difference, so it's mostlyunique_words
that would benefit. I kept it apart for simplicity's sake, but your point is valid. \$\endgroup\$Flambino– Flambino2014年06月29日 11:01:12 +00:00Commented Jun 29, 2014 at 11:01 -
\$\begingroup\$ Thanks for the review, @Flambino! The reason I created a
DocumentAnalyzer
class is becauseDocument
s don't actually have unique words, and don't inherently provide word counts and average word lengths. In real life, I would be asking theDocumentAnalyzer
those questions. Thanks for shedding light on the fact that DRY doesn't mean to not repeat myself ever, but not to repeat logic. I'm a little lost on memoization and||=
. This is my first time seeing and using#uniq
and#freeze
. Cool stuff. You've also helped me to become more mindful of how I name functions. \$\endgroup\$Dylan Richards– Dylan Richards2014年06月29日 16:19:15 +00:00Commented Jun 29, 2014 at 16:19 -
1\$\begingroup\$ @DylanRichards Makes sense, but in real life, documents aren't just an array of words either. So if we're going to represent our "document" like that, we might as well give it all the other things too. As for
@x ||= y
you can read it as "if @x is non-nil/non-false, then set it to y. Otherwise, leave it alone". So the first time you call it,@x
is nil, and thus it gets set to whatever's on the right-hand side. But the next time,@x
already has a value, so it doesn't have to be set again. \$\endgroup\$Flambino– Flambino2014年06月29日 16:31:01 +00:00Commented Jun 29, 2014 at 16:31