Refactoring a large case statement

3 Feb, 2019

0 views

ruby

I’m currently building a gem for converting curl commands to something else, and one early problem I needed to deal with was parsing the curl command arguments.

The basic idea for a gem is to convert curl command to a sweet looking hash that we can use to do magical things. So when given:

curl https://isrubydead.com/

Our program converts it to:

{
 "body": "",
 "method": "GET",
 "headers": {},
 "url": "https://isrubydead.com/"
}

We’ll call this hash a Request, so let’s define it:

Request = Struct.new(:body, :method, :headers, :url) do
 def initialize
 self.method = 'GET'
 self.headers = {}
 self.url = ''
 self.body = ''
 end
end

Our program should expect the curl command as a command line argument, so let’s use ARGV.first to read the curl command:

require 'shellwords'
module ParseCurl
 class << self
 def parse(command)
 self.request = Request.new
 self.parser = Parser.new(request)
 parser.parse(Shellwords.split(command.strip))
 request.to_h
 end
 private
 attr_accessor :request
 attr_accessor :parser
 end
end
ParseCurl.parse(ARGV.first)

We use Shellwords here to convert the curl command to an array of tokens, to make it easier to parse.

Now that we have a place to store the result of parsing the curl command, and we are passing the command line argument to a parser, it’s time to write an actual parser.

Being a very serious and very senior developer, I’ve ended up with this very appealing class:

class Parser
 def initialize(request)
 @request = request
 end
 def parse(words)
 self.words = words
 while (word = words.shift)
 parse_word(word)
 end
 end
 private
 attr_accessor :words
 attr_reader :request
 def parse_word(word)
 case word
 when URI::DEFAULT_PARSER.make_regexp
 request.url = word
 when /\A(-H|--header)\Z/
 parse_header
 when /\A--compressed\Z/
 request.headers['Accept-Encoding'] ||= 'deflate, gzip'
 when /\A-X|\A--request\Z/
 request.method = extract_method(word)
 when /\A(-u|--user)\Z/
 request.headers['Authorization'] ||=
 "Basic #{Base64.strict_encode64(words.shift)}"
 when /\A(-b|--cookie)\Z/
 request.headers['Set-Cookie'] ||= words.shift
 when /\A(-A|--user-agent)\Z/
 request.headers['User-Agent'] ||= words.shift
 when /\A(-I|--head)\Z/
 request.method = 'HEAD'
 when /\A(-d|--data|--data-ascii)\Z/
 parse_data
 when /\A--data-binary\Z/
 parse_data(strip_newlines: false)
 when /\A--data-raw\Z/
 parse_data(raw: true)
 when /\A--data-urlencode\Z/
 parse_data(url_encode: true)
 when /\./
 request.url = "http://#{word}" if request.url == ''
 end
 end
end

I know, a thing a beauty.

What’s also very nice is that this class also have to know everything about parsing curl arguments; from header and method arguments to various data arguments.

Super nice! Just look at these beautiful methods, that all belong to this class:

class Parser
 private
 def parse_data(**args)
 request.method = 'POST' if request.method == 'GET' ||
 request.method == 'HEAD'
 request.headers['Content-Type'] ||=
 'application/x-www-form-urlencoded'
 data = extract_data(args)
 request.body = if request.body.empty?
 data
 else
 "#{request.body}&#{data}"
 end
 end
 def parse_header
 header = extract_header(words.shift)
 return unless header
 request.headers[header.keys.first] =
 header.values.first
 end
 def extract_data(args)
 data = words.shift
 data = read_file(data, args) if data =~ /\A@/ || args[:raw]
 data
 end
 def read_file(word,
 strip_newlines: true,
 raw: false,
 url_encode: false)
 path = raw ? word : word[1..-1]
 content = File.read(path)
 content.gsub!(/\n|\r/, '') if strip_newlines
 content = CGI.escape(content) if url_encode
 content
 rescue Errno::ENOENT
 file_path = File.expand_path(data[1..-1])
 raise FileNotFoundError,
 "Cannot read file: #{file_path}. Does it exist?"
 end
 # Needed for extracting -XPUT and similar formats.
 def extract_method(word)
 return words.shift unless word =~ /^-X/
 if !(method = word.match(/^-X(.*)/)[1]).empty?
 method
 else
 words.shift
 end
 end
 def extract_header(header)
 [header.split(/: (.+)/)].to_h
 rescue StandardError
 nil
 end
end

Marvelous! This class knows absolutely everything about curl and all the arguments and options. We could ask it the curl’s shoe size, and it would give us the answer.

Now that we have thought this class so much, and it has achieved the master’s degree worth of education in the University of curl, I’m a little worried it’s going to take over the world.

Sure, it only knows about curl for now, but what happens when someone decides it should know about, I don’t know, some scary AI thing. It looks innocent right now, but then you turn around and it gets interested in doors.

I don’t trust it one bit, so let’s steal some knowledge from this class and let’s spread it to other classes.

I’ll now start explaining the refactoring step by step, but you can take a look at the final result if you’d like to skip.

This class will from now on be ignorant of curl arguments, so the humongous case statement becomes:

class Parser
 def parse_word(word)
 Word.parse(request, words, word)
 end
end

Since every scenario we have to cover in the case statement consists of the regex check and the parsing part, the idea is to abstract those into class.

Let’s introduce the Word::Base class:

module Word
 class Base
 # Used to check if the word given matches the subclass' regex.
 def self.===(word)
 word =~ regex
 end
 def self.regex
 raise NotImplementedError, "#{name} doesn't have .regex defined"
 end
 def parse(*)
 raise NotImplementedError, "#{self.class.name} doesn't have #parse defined"
 end
 end
end

The plan is to have one base class, that will be inherited for every scenario we’d like to cover. Here’s the first example of one such class:

module Word
 class Header < Base
 def self.regex
 /\A(-H|--header)\Z/
 end
 def parse(*)
 header = extract_header(words.shift)
 return unless header
 request.headers[header.keys.first] = header.values.first
 end
 private
 def extract_header(header)
 [header.split(/: (.+)/)].to_h
 rescue StandardError
 nil
 end
 end
end

This class stores information about everything that’s needed to parse a header argument. It detects the argument (when the curl command includes -H or --header in arguments) and then changes the request hash, which we return in the end.

Since this class has to know about words and request too, we’ll add that to the base class. We expect other descendants to need that too:

module Word
 class Base
 def initialize(words, request)
 @words = words
 @request = request
 end
 private
 attr_reader :words
 attr_reader :request
 end
end

The next problem we have is that there’s no connection between the original parse_word method and this new class we have created, so let’s write a method that connects those two:

module Word
 def self.parse(request, words, word)
 subclass = Word::Base.subclasses
 .detect { |klass| klass === word }
 subclass&.new(words, request)&.parse(word)
 end
 class Base
 # List of subclasses, first-inherited class first.
 def self.subclasses
 ObjectSpace.each_object(Base.singleton_class)
 .reject { |klass| klass == Base }.reverse
 end
 end
end

With this method we can find a Word::Base subclass that matches the word that was given and then parses that word. If we find the subclass, we try to parse the word. So our current code works for parsing --header or -h words from curl.

Our code now supports parsing headers, so let’s proceed to add classes for every argument we’d like to support:

module Word
 class Compressed < Base
 def self.regex
 /\A--compressed\Z/
 end
 def parse(*)
 request.headers['Accept-Encoding'] ||= 'deflate, gzip'
 end
 end
 class Request < Base
 def self.regex
 /\A-X|\A--request\Z/
 end
 def parse(word)
 request.method = extract_method(word)
 end
 private
 # Needed for extracting -XPUT and similar formats.
 def extract_method(word)
 return words.shift unless word =~ /^-X/
 if !(method = word.match(/^-X(.*)/)[1]).empty?
 method
 else
 words.shift
 end
 end
 end
 class Authorization < Base
 def self.regex
 /\A(-u|--user)\Z/
 end
 def parse(*)
 request.headers['Authorization'] ||=
 "Basic #{Base64.strict_encode64(words.shift)}"
 end
 end
 class Cookie < Base
 def self.regex
 /\A(-b|--cookie)\Z/
 end
 def parse(*)
 request.headers['Set-Cookie'] ||= words.shift
 end
 end
 class UserAgent < Base
 def self.regex
 /\A(-A|--user-agent)\Z/
 end
 def parse(*)
 request.headers['User-Agent'] ||= words.shift
 end
 end
 class Head < Base
 def self.regex
 /\A(-I|--head)\Z/
 end
 def parse(*)
 request.method = 'HEAD'
 end
 end
 class Data < Base
 def self.regex
 /\A(-d|--data|--data-ascii)\Z/
 end
 def options
 {}
 end
 def parse(*)
 request.method = 'POST' if request.method == 'GET' ||
 request.method == 'HEAD'
 request.headers['Content-Type'] ||=
 'application/x-www-form-urlencoded'
 data = extract_data(options)
 request.body = if request.body.empty?
 data
 else
 "#{request.body}&#{data}"
 end
 end
 def extract_data(args)
 data = words.shift
 data = read_file(data, args) if data =~ /\A@/ ||
 args[:raw]
 data
 end
 def read_file(word,
 strip_newlines: true,
 raw: false,
 url_encode: false)
 path = raw ? word : word[1..-1]
 content = File.read(path)
 content.gsub!(/\n|\r/, '') if strip_newlines
 content = CGI.escape(content) if url_encode
 content
 rescue Errno::ENOENT
 file_path = File.expand_path(data[1..-1])
 raise FileNotFoundError,
 "Cannot read file: #{file_path}. Does it exist?"
 end
 end
 class StripNewlinesData < Data
 def self.regex
 /\A--data-binary\Z/
 end
 def options
 { strip_newlines: false }
 end
 end
 class RawData < Data
 def self.regex
 /\A--data-raw\Z/
 end
 def options
 { raw: true }
 end
 end
 class URLEncodeData < Data
 def self.regex
 /\A--data-urlencode\Z/
 end
 def options
 { url_encode: true }
 end
 end
 class URL < Base
 def self.===(word)
 word =~ URI::DEFAULT_PARSER.make_regexp
 end
 def parse(word)
 request.url = word
 end
 end
 class NonStrictURL < Base
 def self.regex
 /\./
 end
 def parse(word)
 request.url = "http://#{word}" if request.url == ''
 end
 end
end

Here’s the final code we end up with:

module Word
 def self.parse(request, words, word)
 subclass = Word::Base.subclasses
 .detect { |klass| klass === word }
 subclass&.new(words, request)&.parse(word)
 end
 class Base
 def initialize(words, request)
 @words = words
 @request = request
 end
 # List of subclasses, first-inherited class appears first.
 def self.subclasses
 ObjectSpace.each_object(Base.singleton_class)
 .reject { |klass| klass == Base }.reverse
 end
 # Used to check if the word given matches the subclass' regex.
 def self.===(word)
 word =~ regex
 end
 def self.regex
 raise NotImplementedError,
 "#{name} doesn't have .regex defined"
 end
 def parse(*)
 raise NotImplementedError,
 "#{self.class.name} doesn't have #parse defined"
 end
 private
 attr_reader :words
 attr_reader :request
 end
 class Header < Base
 def self.regex
 /\A(-H|--header)\Z/
 end
 def parse(*)
 header = extract_header(words.shift)
 return unless header
 request.headers[header.keys.first] = header.values.first
 end
 private
 def extract_header(header)
 [header.split(/: (.+)/)].to_h
 rescue StandardError
 nil
 end
 end
 class Compressed < Base
 def self.regex
 /\A--compressed\Z/
 end
 def parse(*)
 request.headers['Accept-Encoding'] ||= 'deflate, gzip'
 end
 end
 class Request < Base
 def self.regex
 /\A-X|\A--request\Z/
 end
 def parse(word)
 request.method = extract_method(word)
 end
 private
 # Needed for extracting -XPUT and similar formats.
 def extract_method(word)
 return words.shift unless word =~ /^-X/
 if !(method = word.match(/^-X(.*)/)[1]).empty?
 method
 else
 words.shift
 end
 end
 end
 class Authorization < Base
 def self.regex
 /\A(-u|--user)\Z/
 end
 def parse(*)
 request.headers['Authorization'] ||=
 "Basic #{Base64.strict_encode64(words.shift)}"
 end
 end
 class Cookie < Base
 def self.regex
 /\A(-b|--cookie)\Z/
 end
 def parse(*)
 request.headers['Set-Cookie'] ||= words.shift
 end
 end
 class UserAgent < Base
 def self.regex
 /\A(-A|--user-agent)\Z/
 end
 def parse(*)
 request.headers['User-Agent'] ||= words.shift
 end
 end
 class Head < Base
 def self.regex
 /\A(-I|--head)\Z/
 end
 def parse(*)
 request.method = 'HEAD'
 end
 end
 class Data < Base
 def self.regex
 /\A(-d|--data|--data-ascii)\Z/
 end
 def options
 {}
 end
 def parse(*)
 request.method = 'POST' if request.method == 'GET' ||
 request.method == 'HEAD'
 request.headers['Content-Type'] ||=
 'application/x-www-form-urlencoded'
 data = extract_data(options)
 request.body = if request.body.empty?
 data
 else
 "#{request.body}&#{data}"
 end
 end
 def extract_data(args)
 data = words.shift
 data = read_file(data, args) if data =~ /\A@/ ||
 args[:raw]
 data
 end
 def read_file(word,
 strip_newlines: true,
 raw: false,
 url_encode: false)
 path = raw ? word : word[1..-1]
 content = File.read(path)
 content.gsub!(/\n|\r/, '') if strip_newlines
 content = CGI.escape(content) if url_encode
 content
 rescue Errno::ENOENT
 file_path = File.expand_path(data[1..-1])
 raise FileNotFoundError,
 "Cannot read file: #{file_path}. Does it exist?"
 end
 end
 class StripNewlinesData < Data
 def self.regex
 /\A--data-binary\Z/
 end
 def options
 { strip_newlines: false }
 end
 end
 class RawData < Data
 def self.regex
 /\A--data-raw\Z/
 end
 def options
 { raw: true }
 end
 end
 class URLEncodeData < Data
 def self.regex
 /\A--data-urlencode\Z/
 end
 def options
 { url_encode: true }
 end
 end
 class URL < Base
 def self.===(word)
 word =~ URI::DEFAULT_PARSER.make_regexp
 end
 def parse(word)
 request.url = word
 end
 end
 class NonStrictURL < Base
 def self.regex
 /\./
 end
 def parse(word)
 request.url = "http://#{word}" if request.url == ''
 end
 end
end
class Parser
 def initialize(request)
 @request = request
 end
 def parse(words)
 self.words = words
 while (word = words.shift)
 parse_word(word)
 end
 end
 private
 attr_accessor :words
 attr_reader :request
 def parse_word(word)
 Word.parse(request, words, word)
 end
end
module ParseCurl
 class << self
 def parse(command)
 self.request = Request.new
 self.parser = Parser.new(request)
 parser.parse(Shellwords.split(command.strip))
 request.to_h
 end
 private
 attr_accessor :request
 attr_accessor :parser
 end
end
ParseCurl.parse(ARGV.first)

Note how each class contains knowledge specific to the argument we’re trying to cover with it and nothing else. I think that’s an improvement since it’s still easy to support new arguments, and we don’t pollute a single class with many methods.