5
\$\begingroup\$

I put together this Ruby script (1.9.3) to load data into MySQL from many .xlsx files that contain data entered by people. These files typically have many illegal characters and it has been difficult to handle the different errors. I have tried multiple rescue and character escaping approaches but now this method is not good and I have several more in the same class that are similar but load from different formats.

I would love some code review to help make this more functional. This is a standalone Ruby script without Rails or activerecord. I am using the logger, rubyXL, and MySQL2 gems.

 def paid_adv_load_ppc
 begin
 Dir["#{@paid_adv_work_path}*.xlsx"].each do |file|
 if file =~ /(PPC)/ && file !~ /(~)/
 begin 
 @file = file
 begin
 workbook = RubyXL::Parser.parse(file)
 data = workbook.worksheets[0]
 sheet1 = data.extract_data
 begin
 sheet1.each_with_index do |row,index|
 if index > 0 && row[1] != nil
 service = row[3].to_s.gsub(/['"]/, '') 
 impressions = row[9].to_i 
 clicks = row[10].to_i
 cost = row[11].to_f
 total = row[12].to_f
 puts "index: #{index} row: #{row}"
 @db.query("INSERT INTO ppc
 (service,impressions,clicks,cost,total)
 VALUES ('#{service}',#{impressions},#{clicks},#{cost},#{total}) ")
 end
 end
 rescue Exception => e
 @log.info("There is a problem loading row for index #{index} :#{row} because of #{e.message}")
 next
 end
 rescue Exception => e
 @log.info("There is a problem loading #{@file}: #{e.message}")
 next
 end 
 end
 @log.info("Loaded file #{@file} into database")
 FileUtils.rm(@file)
 end
 end
 rescue Exception => e
 @log.warn("Unable to process the data feed: #{@file} because #{e.message}")
 end
 end
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Mar 30, 2014 at 20:24
\$\endgroup\$

1 Answer 1

7
\$\begingroup\$

For starters, I'd break the script up into many more methods. If nothing else it'll avoid the "pyramid code" (all that nested indentation) you've got going on. But it should also make the code easier to understand, maintain and extend.

I'd also, as an overall approach, favor writing a file of INSERT statements, rather than inserting each row, one at a time. At least for testing, I'd highly recommend this. You can always read this file directly into mysql later. The point is that if, say, the first 2 rows get inserted fine, but the 3rd fails, you'll already have those first 2 row in the database, when you run the script again. And since the .xlsx files only get removed if everything works out, the same file may be imported several times - perhaps failing halfway several times - resulting in a bunch of doubled rows.

Speaking of files, I'd avoid completely removing the xlsx file once it's done. Probably better to move it or rename it, so you still have it, but it won't get read again. In general, just try to avoid destructive operations unless necessary.

Anyway, the code below is an attempt to break up your code. I haven't tried it, so don't just blindly run it or anything.

Moreover, I'm using RubyXL's get_table method here - I don't know if that's actually useful in your case. However, I imagine that the xlsx-files have a header row, since you always skip the first row. If those headers are sensical, you can use them to your advantage, and avoid the hard-coded (and opaque) column indices.

Some of the methods here have optional arguments that default to instance variables. I won't recommend that for production use; it's there for testing purposes. You can either make the arguments required, or you can remove the arguments, and rely on instance variables.

# Hash for translating xlsx columns to mysql columns
# The procs may not be necessary; get_table may handle conversion
COLUMN_TRANSLATIONS = {
 "Service" => { column: "service", proc: :to_s },
 "Impressions" => { column: "impressions", proc: :to_i },
 "Clicks" => { column: "clicks", proc: :to_i },
 "Cost" => { column: "cost", proc: :to_f },
 "Total" => { column: "total", proc: :to_f }
}.freeze
# Gets the relevant .xlsx-files
def xlsx_files(directory = @paid_adv_work_path)
 path = File.join(directory, "*(PPC)*.xlsx")
 files = Dir.glob(path)
 files.reject { |file| File.basename(file) =~ /^~/ }
end
# Use get_table to get structured data
def read_xlsx(file)
 workbook = RubyXL::Parser.parse(file)
 worksheet = workbook.worksheets[0]
 worksheet.get_table(COLUMN_TRANSLATIONS.keys)[:table]
end
# Translate into a mysql-friendly format
def translate_table(table)
 table.map do |row|
 row.map do |column, value|
 lookup = COLUMN_TRANSLATIONS[name.to_s] # get_table uses symbol keys; we want strings
 value = value.send(lookup[:proc])
 [lookup[:column], value]
 end
 end
end
# Insert a row
def insert_rows(rows, connection = @db)
 columns = COLUMN_TRANSLATIONS.map { |name, info| info[:column] }
 rows.each do |row|
 columns = row.keys.join(", ")
 values = row.values.map { |value| "'#{connection.escape(value)}'" }.join(", ")
 query = "INSERT INTO ppc (#{columns}) VALUES (#{values});"
 connection.query(query)
 end
end
# Put it together per-file
def import_xlsx_file(file, connection = @db)
 table = read_xlsx(file)
 rows = translate_table(table)
 insert_rows(rows, connection)
end
# And all together now...
def import
 xlsx_files.each do |file|
 begin
 import_xlsx_file(file)
 # Handle file removal/renaming here
 rescue => e
 @log.info "Failed to import #{file}: #{e.message}"
 end
 end
end

An advantage of having the optional arguments is that you can swap out the database connection with a stand-in object, like:

class QueryLogger
 def initialize(logger)
 @logger = logger
 end
 def query(string)
 @logger.info(string)
 end
end

And try something like import_xlsx_file(some_file, QueryLogger.new(@log)) to log the queries instead of actually running them on the server.

answered Mar 30, 2014 at 22:13
\$\endgroup\$
2
  • 1
    \$\begingroup\$ one detail: rescue Exception => exc is no good, should capture only StandardError, which is completely equivalent to rescue => exc. \$\endgroup\$ Commented Mar 31, 2014 at 18:38
  • \$\begingroup\$ @tokland Oops, you're right - amending the code. Thanks \$\endgroup\$ Commented Mar 31, 2014 at 19:04

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.