3
\$\begingroup\$

I need to convert some CSV data into a format recognizable by some old script my work uses. Changing the format is pretty much out of the question. So I've written this:

$rate_plan_terms = [
 "pp",
 "01",
 "12",
 "24",
 "36"
]
def generate(fname)
 puts "generating doc for #{fname}"
 file = File.open(fname)
 standard_offer_hash = {}
 features_hash = {}
 file.readlines[1..-1].each do |row|
 # standard_offers
 _region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.split(",")
#used? no yes no yes kinda yes yes yes
 standard_offer_hash[product_cid] ||= {} 
 standard_offer_hash[product_cid][:offerType] = product_name # just keep it as product_cid?
 standard_offer_hash[product_cid][:ratePlan] ||= []
 $rate_plan_terms.each do |term|
 rate_plan = {
 :ratePlanName => offer_name,
 :ratePlanCode => offer_cid,
 :ratePlanTerm => term,
 :offerEffectiveDate => "",
 :ratePlanMRC => ""
 }
 standard_offer_hash[product_cid][:ratePlan] << rate_plan unless standard_offer_hash[product_cid][:ratePlan].include? rate_plan # they will be repeated because of features
 end
 # features
 features_hash[component_cid] ||= {
 :featureName => component_name,
 :featureCode => component_cid,
 :featureType => "FEATURE",
 :featureTerm => "",
 :featureEffectiveDate => "",
 :featureMRC => ""
 }
 end
 feature_arr = features_hash.map {|k,f| f}
 standard_offer_arr = standard_offer_hash.map {|k,o| o}
 product_arr = []
 pricelist = {
 :features => { :feature => feature_arr }
 :ChannelOrgType => "OCM",
 :Region => "ALL",
 :product => product_arr,
 :businessRelationship => "",
 :standardOffer => standard_offer_arr,
 }
 {:PriceList => pricelist}
end

Yeah, complicated. This is running on up to a million lines per file, usually two files, every day. So it needs to be faster than it currently is.

FYI, the CSV is loosely separated into many products belong to many offers, and products have many features, but two different products can have the same features. So there is a lot of repeating.

There seem to be two time sinks here. One is the Array#include? I'm running over and over again. I wanted to transform that into a build hash -> convert to array later approach I took with features_arr and standard_offers_arr to avoid doing searches. But it depends on the product_cid, which (could) change per row.

The second is those two hashes to arrays at the end. The standard_offer_hash is going to be huge. That's can't be very fast.

To be completely honest I just don't see a way to optimize this further. Any suggestions?

I can show some output for what the generated structure needs to look like, if it would help.

asked Oct 27, 2014 at 16:27
\$\endgroup\$
2
  • \$\begingroup\$ Is the idea to write the output to a file? Or just to have it in memory? \$\endgroup\$ Commented Oct 27, 2014 at 17:23
  • \$\begingroup\$ It needs to remain in memory to get passed into another function, which does some magical back-end stuff and then prints it to stdout \$\endgroup\$ Commented Oct 27, 2014 at 17:24

1 Answer 1

3
\$\begingroup\$
  • You open the file, but you don't explicitly close it. Bad karma.

  • You read all lines into memory, it seems, rather than go through them one by one. Memory-wise that's pretty inefficient.

  • Any reason for $rate_plan_terms to be global? That too seems like bad karma. A constant would make more sense, I think.

  • Have you considered using Ruby's bundled CSV parser? It'll be more robust.

  • It'd be easier to just use Hash#values rather than map { |k, v| v }

As for the use of include?, you skip that by making :rate_plan a hash, and use a key like "#{offer_name}#{offer_cid}". Simpler to explain in code, so see below.

require "csv"
def generate(fname)
 puts "generating doc for #{fname}"
 terms = %w{pp 01 12 24 36}.freeze # just made this local here, a const would still be better
 offers = {}
 features = {}
 CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
 # If the CSV file has a proper header line, you can also access cells by their name
 _region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields
 offers[product_cid] ||= {
 offerType: product_name
 ratePlan: {}
 }
 key = "#{offer_name}#{offer_cid}"
 offers[product_cid][:ratePlan][key] ||= terms.map do |term|
 {
 ratePlanName: offer_name,
 ratePlanCode: offer_cid,
 ratePlanTerm: term,
 offerEffectiveDate: "",
 ratePlanMRC: ""
 }
 end
 features[component_cid] ||= {
 featureName: component_name,
 featureCode: component_cid,
 featureType: "FEATURE",
 featureTerm: "",
 featureEffectiveDate: "",
 featureMRC: ""
 }
 end
 offsers.each do |key, offer|
 offer[:ratePlan] = offer[:ratePlan].values
 end
 {
 PriceList: {
 features: { feature: features.values }
 ChannelOrgType: "OCM",
 Region: "ALL",
 product: [],
 businessRelationship: "",
 standardOffer: offers.values
 }
 }
end

Still not pretty (it's a tad over the 5-10 line limit you might strive for for methods in Ruby).

It might be nice to break things into methods like feature_hash(name, cid), which would generate the separate hashes.

require "csv"
TERMS = %w{pp 01 12 24 36}.freeze
def generate(fname)
 puts "generating doc for #{fname}"
 offers = {}
 features = {}
 CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
 # If the CSV file has a proper header line, you can also access cells by their name
 _region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields
 offers[product_cid] ||= {
 offerType: product_name
 ratePlan: {}
 }
 key = "#{offer_name}#{offer_cid}"
 offers[product_cid][:ratePlan][key] ||= rate_plan_array(offer_name, offer_cid)
 features[component_cid] ||= feature_hash(component_name, component_cid)
 end
 offsers.each do |key, offer|
 offer[:ratePlan] = offer[:ratePlan].values
 end
 {
 PriceList: {
 features: { feature: features.values }
 ChannelOrgType: "OCM",
 Region: "ALL",
 product: [],
 businessRelationship: "",
 standardOffer: offers.values
 }
 }
end
def rate_plan_array(name, cid)
 TERMS.map do |term|
 {
 ratePlanName: offer_name,
 ratePlanCode: offer_cid,
 ratePlanTerm: term,
 offerEffectiveDate: "",
 ratePlanMRC: ""
 }
 end
end
def feature_hash(name, cid)
 {
 featureName: name,
 featureCode: cid,
 featureType: "FEATURE",
 featureTerm: "",
 featureEffectiveDate: "",
 featureMRC: ""
 }
end

Either way you cut it, though, it probably won't win any beauty contests.

answered Oct 27, 2014 at 20:41
\$\endgroup\$
2
  • \$\begingroup\$ This has a lot of great suggestions. Your solution to cycle through offers to fix the :rateplan class is great for what I was looking for. \$\endgroup\$ Commented Oct 28, 2014 at 14:38
  • \$\begingroup\$ terms being global was an artifact of old code, but even then it was inappropriate. \$\endgroup\$ Commented Oct 28, 2014 at 14:40

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.