This is a simple Ruby program to analyze the number of lines characters speak in Shakespeare's Macbeth using Nokogiri and open-uri to parse a given url containing xml. I wrote this as practice getting used to Ruby and TDD.
I'm new to Ruby so I'm wondering if anything looks strange to more experienced Rubyists.
Note: I had trouble comparing hashes in the spec "correctly analyzes character line counts"
. Even after calling to_s
on both, it looks equal but the test fails. I had trouble with that. I'm wondering if this is even a good test to make with large hashes?
Also: I was wondering if I should be testing the calls using open-uri open
? I didn't want to download from the webpage in the tests. I'd appreciate it if someone might demonstrate how to do this. I tried using a Mock with something like expect(Kernel).to receive(:open).with(MACBETH_URL)
but had trouble with it.
Given:
Write a command-line program that prints the number of lines spoken by each character in the play.
Sample usage/output (using made-up numbers):
$ ruby macbeth_analyzer.rb 543 Macbeth 345 Banquo 220 Duncan (etc.)
You can find an XML-encoded version of Macbeth here: http://www.ibiblio.org/xml/examples/shakespeare/macbeth.xml. Your program should download and parse this file at runtime.
Your solution must be tested, preferably via TDD. Running your tests should not download the play from the ibiblio.org server.
Note: some lines are attributed to a speaker called "ALL". Your program should ignore these.
macbeth_analyzer.rb
#!/usr/bin/env ruby
require_relative 'parser'
require_relative 'printer'
class MacbethAnalyzer
attr_reader :speaker_line_counts
MACBETH_URL = "http://www.ibiblio.org/xml/examples/shakespeare/macbeth.xml"
IGNORE_SPEAKERS = %w(ALL)
def initialize
@parser = Parser.new(MACBETH_URL)
@speaker_line_counts = analyze_speaker_line_counts
end
def analyze_speaker_line_counts
speaker_line_counts = Hash.new(0)
speakers = @parser.parse(query: '//SPEAKER')
speakers.map do |speaker|
unless IGNORE_SPEAKERS.include? speaker.text
speaker_line_counts[speaker.text] += @parser.parse(
node: speaker, query: '../LINE').count
end
end
speaker_line_counts
end
end
if __FILE__ == 0ドル
analyzer = MacbethAnalyzer.new
Printer.print_hash_val_capitalized_key(
hash: analyzer.speaker_line_counts, filter: 'ALL')
end
macbeth_analyzer_spec.rb
require 'macbeth_analyzer'
describe MacbethAnalyzer do
describe "#analyze_speaker_line_counts" do
it "analyzes speaker line counts and stores in hash" do
SPEAKER_HASH_COUNT = 40
analyzer = MacbethAnalyzer.new
result = analyzer.analyze_speaker_line_counts
expect(result.count).to eq(SPEAKER_HASH_COUNT)
expect(result["MACBETH"]).to eq(719)
expect(result["BANQUO"]).to eq(113)
expect(result["DUNCAN"]).to eq(70)
end
it "should ignore speaker title 'ALL'" do
analyzer = MacbethAnalyzer.new
result = analyzer.analyze_speaker_line_counts
expect(result["ALL"]).to eq(0)
end
xit "correctly analyzes character line counts" do
macbeth_line_counts = {
"First Witch": 62,
"Second Witch": 27,
"Third Witch": 27,
"DUNCAN": 70,
"MALCOLM": 212,
"Sergeant": 35,
"LENNOX": 74,
"ROSS": 135,
"MACBETH": 719,
"BANQUO": 113,
"ANGUS": 21,
"LADY MACBETH": 265,
"Messenger": 23,
"FLEANCE": 2,
"Porter": 46,
"MACDUFF": 180,
"DONALBAIN": 10,
"Old Man": 11,
"ATTENDANT": 1,
"First Murderer": 30,
"Second Murderer": 15,
"Both Murderers": 2,
"Servant": 5,
"Third Murderer": 8,
"Lords": 3,
"HECATE": 39,
"Lord": 21,
"First Apparition": 2,
"Second Apparition": 4,
"Third Apparition": 5,
"LADY MACDUFF": 41,
"Son": 20,
"Doctor": 45,
"Gentlewoman": 23,
"MENTEITH": 12,
"CAITHNESS": 11,
"SEYTON": 5,
"SIWARD": 30,
"Soldiers": 1,
"YOUNG SIWARD": 7
}
macbeth = MacbethAnalyzer.new
result = macbeth.analyze_speaker_line_counts
# TODO: how to make this pass? tried to_s both
expect(result).to eq(macbeth_line_counts)
end
end
end
parser.rb
require 'open-uri'
require 'nokogiri'
class Parser
def initialize(url)
@doc = Nokogiri::XML(open(url))
end
def parse(node: @doc, query:)
node.xpath(query)
end
end
parser_spec.rb
require 'parser'
describe Parser do
describe "#parse" do
it "parses speaker nodes" do
SPEAKER_COUNT = 650
parser = Parser.new('spec/fixtures/macbeth.xml')
result = parser.parse(query: '//SPEAKER')
expect(result.count).to eq(SPEAKER_COUNT)
end
end
end
printer.rb
class Printer
def self.print_hash_val_capitalized_key(hash:, filter: nil)
hash.each do |key, value|
puts "#{value} #{key.capitalize}" unless key.to_s === filter
end
end
end
printer_spec.rb
require 'printer'
describe Printer do
describe "#print_hash_val_capitalized_key" do
it "prints a hash of name keys capitalized and count values" do
test_hash = {
"LENNOX": 74,
"ROSS": 135,
"MACBETH": 719,
}
expect{Printer.print_hash_val_capitalized_key(hash: test_hash)}.to output(
"74 Lennox\n135 Ross\n719 Macbeth\n").to_stdout
end
it "excludes a given filter from output" do
test_hash = {
"LENNOX": 74,
"ALL": 24,
"ROSS": 135,
"MACBETH": 719,
}
expect{Printer.print_hash_val_capitalized_key(hash: test_hash, filter: "ALL")}.to output(
"74 Lennox\n135 Ross\n719 Macbeth\n").to_stdout
end
end
end
1 Answer 1
Naming
Choose Names at the Appropriate Level of Abstraction
I noticed an issue that reminded me of something I read in Clean Code, which says:
"Don’t pick names that communicate implementation; choose names the reflect the level of abstraction of the class or function you are working in." (Martin, 311)
Martin gives an example of a Modem interface that contains:
boolean dial(String phoneNumber);
He points out that in some applications, modems aren't connected by dailing but rather hardwiring or sending a port number, etc. The point is that using phoneNumber
is the wrong level of abstraction. He suggests changing to:
boolean connect(String connectionLocator);
This better communicates the ways a consumer could use the interface.
The relation to my application is in Parser:
class Parser
def initialize(url)
@doc = Nokogiri::XML(open(url))
end
...
open-uri
's open()
takes a uri (who would've thought?). A uri is a level of abstraction above a url. In the tests, I initialize a Parser
object with a local xml fixture uri (not a url):
parser = Parser.new('spec/fixtures/macbeth.xml')
Therefore, Parser should become:
class Parser
def initialize(uri)
@doc = Nokogiri::XML(open(uri))
end
...
So this may seem rather trivial in the context of my small application but it does reflect a valuable lesson from Clean Code in choosing names at the appropriate level of abstraction.
Make Meaningful Distinctions
When I started working on this problem, early on I decided what the responsibilities would be. I reasoned that we'd need a concrete runner class MacbethAnalyzer that's responsible for handling the specifics of the problem statement, a Parser, who's responsible for parsing, and a Printer to handle print.
Having completed a working implementation, I notice that Parser
is only responsible for parsing Xml, which has a clear dependency in the constructor:
class Parser
def initialize(uri)
@doc = Nokogiri::XML(open(uri))
end
The given problem is small, but it's easy to imagine a larger project where we parse from several sources of varying formats beyond XML. It seems clear that class Parser
isn't specific enough and should be more meaningfully distinguished by renaming to XmlParser
.
Dependency Inversion Principle
The constructor of MacbethAnalzyer
has tight coupling in the dependency to the low-level implementation details of XmlParser
:
class MacbethAnalyzer
@parser = XmlParser.new(MACBETH_URL)
@speaker_line_counts = analyze_speaker_line_counts
end
What if XmlParser
changes? What if we wanted to use a different kind of Parser? What if we wanted to construct a Fake object for testing and pass that to MacbethAnalyzer
? These are issues related to the Dependency Inversion Principle.
The solution here is to use "dependency injection" to inject Parser
into MacbethAnalyzer
:
class MacbethAnalyzer
attr_reader :speaker_line_counts
MACBETH_URL = "http://www.ibiblio.org/xml/examples/shakespeare/macbeth.xml"
IGNORE_SPEAKERS = %w(ALL)
def initialize(parser)
@parser = parser
@speaker_line_counts = analyze_speaker_line_counts
end
def analyze_speaker_line_counts
speaker_line_counts = Hash.new(0)
speakers = parser.parse(query: '//SPEAKER')
speakers.map do |speaker|
unless IGNORE_SPEAKERS.include? speaker.text
speaker_line_counts[speaker.text] += parser.parse(
node: speaker, query: '../LINE').count
end
end
speaker_line_counts
end
private
attr_reader :parser
end
if __FILE__ == 0ドル
analyzer = MacbethAnalyzer.new(XmlParser.new(MACBETH_URL))
Printer.print_hash_val_capitalized_key(
hash: analyzer.speaker_line_counts, filter: 'ALL')
end
Now MacbethAnalyzer
doesn't depend on a lower-level implementation; we use duck typing to make things flexible, removing the dependency and resulting in much more loosely coupled code.
Explore related questions
See similar questions with these tags.