The code is working fine; however, I would like to clean up the code after the unless, and can I store the thread ID in the database too? Is there a way to improve performance?
require 'csv'

class CsvDb
  class << self
    def convert_save(model_name, csv_data, field_name = nil)
      target_model = model_name.classify.constantize
      csv_file = csv_data.read
      row_headers = {}
      counter = 0
      # Thread.new do
      CSV.parse(csv_file) do |row|
        if counter == 0
          temp = row
          row_headers = Hash[temp.map.with_index.to_a]
          counter += 1
          next
        end
        unless row[row_headers["name"]].nil?
          temp = {}
          business_type = row[row_headers["business_type_id"]]
          business_type_id = business_type == 'Online' ? 1 : business_type == 'Local' ? 2 : 3
          temp[:business_type_id] = business_type_id
          temp[:user_id] = row[row_headers["user_id"]]
          temp[:name] = row[row_headers["name"]]
          temp[:country_id] = row[row_headers["country_id"]]
          temp[:homepage] = row[row_headers["homepage"]] ||= ""
          temp[:telephone] = row[row_headers["telephone"]] ||= ""
          temp[:email] = row[row_headers["email"]] ||= ""
          temp[:address] = row[row_headers["address"]] ||= ""
          temp[:latitude] = row[row_headers["latitude"]]
          temp[:longitude] = row[row_headers["longitude"]]
          temp[:facebook] = row[row_headers["facebook"]] ||= ""
          temp[:twitter] = row[row_headers["twitter"]] ||= ""
          temp[:google] = row[row_headers["google"]] ||= ""
          temp[:instagram] = row[row_headers["instagram"]] ||= ""
          temp[:pinterest] = row[row_headers["pinterest"]] ||= ""
          temp[:free_shipping] = row[row_headers["free_shipping"]]
          temp[:ship_details] = row[row_headers["ship_details"]] ||= ""
          temp[:category_ids] = [row[row_headers["category_ids"]]]
          temp[:style_ids] = [row[row_headers["style_ids"]]]
          temp[:shipping_country_ids] = [row[row_headers["shipping_country_ids"]]]
          business = target_model.new(temp)
          business.save
        end
      end
      ActiveRecord::Base.connection.close
    end
    # end
  end
end
2 Answers
The following code is not tested, so some minor changes may be necessary. I'm not 100% sure it will work as is; don't hesitate to tell me if there is anything wrong with the code.
Refactoring the existing code
The first thing I did was to set the headers option to true. It enables parsing of the first line of the CSV and, instead of giving you an array for each row, it returns a hash-like object whose keys are taken from the header. This replaces the code you wrote to parse the header of the file.
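For example, a quick standalone sketch of what the headers option gives you (not part of the importer, just an illustration):

require 'csv'

sample = "name,business_type_id\nAcme,Online\n"
CSV.parse(sample, headers: true, header_converters: :symbol) do |row|
  # each row acts like a hash keyed by the (symbolized) header names
  puts row[:name]              # => Acme
  puts row[:business_type_id]  # => Online
end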
Then I replaced the nested ternary operators with a case/when expression for the business_type_id, which is far more readable.
Next I made two small functions to extract all the needed values; they use reduce to build a hash from an array of symbols.
Finally, I merge all the hashes in the constructor of the target_model before saving.
CSV.parse(csv_file, headers: true, header_converters: :symbol) do |row|
  business_type_id = case row[:business_type_id]
                     when 'Online' then 1
                     when 'Local'  then 2
                     else 3
                     end

  # Hash does not support +, so the pieces are combined with merge
  target_model.new(
    { business_type_id: business_type_id }
      .merge(extract_required_fields(row))
      .merge(extract_optional_fields(row))
  ).save
end
def extract_required_fields(row)
  [:user_id, :name, :country_id, :free_shipping, :category_ids, :style_ids, :shipping_country_ids]
    .reduce({}) do |carry, item|
      carry[item] = row[item]
      carry # reduce replaces the accumulator with the block's return value, so hand back the hash
    end
end

def extract_optional_fields(row)
  [:homepage, :telephone, :email, :address, :facebook, :twitter, :google, :instagram, :pinterest, :ship_details]
    .reduce({}) do |carry, item|
      carry[item] = row[item] || ''
      carry
    end
end
Bulk update
But this code, even though it has better separation of concerns and less duplication, will not really speed up your import, because the performance bottleneck is the insert operation performed for each item by the save method.
So the solution is to do a bulk insert instead of saving each row separately: you group the inserts!
To do so, you can use the https://github.com/zdennis/activerecord-import gem, which allows you to do something like this:
entities = []
CSV.parse(csv_file, headers: true, header_converters: :symbol) do |row|
  business_type_id = case row[:business_type_id]
                     when 'Online' then 1
                     when 'Local'  then 2
                     else 3
                     end

  entities << target_model.new(
    { business_type_id: business_type_id }
      .merge(extract_required_fields(row))
      .merge(extract_optional_fields(row))
  )

  if entities.size == 100
    target_model.import entities
    entities.clear # reset the buffer, otherwise the batch keeps growing and is re-imported at the end
  end
end

# don't forget to save the remaining entities:
target_model.import entities unless entities.empty?
In this case we perform the save operation every 100 rows, which makes the import process much faster!
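If your rows are already validated before the import, activerecord-import can also skip the per-record validations, which saves even more time. A small sketch, only appropriate if you trust the incoming data:

# skip ActiveRecord validations during the bulk insert (only for pre-validated data)
target_model.import entities, validate: false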
Getting the Thread's ID
To collect the thread's ID, I suggest using the inspect method on the current thread. You'll get more than just the thread's ID: you'll have all the information you need about which thread processed the insert operation.
thread_identifier = Thread.current.inspect
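If you want to persist it, you could store that string on each imported record. This is only a sketch and assumes a dedicated column on the target table (the imported_by_thread column below is hypothetical):

# hypothetical: requires an imported_by_thread string column on the target table
temp[:imported_by_thread] = Thread.current.inspect
# or, if a shorter numeric identifier is enough for the current thread:
temp[:imported_by_thread] = Thread.current.object_id.to_s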
- Dinesh Saini (Apr 18, 2016 at 5:18): Should I write code in the model now, or can I use the same code where it is placed now?
- lilobase (Apr 18, 2016 at 7:59): You should be able to use the code as it is placed now.
- Dinesh Saini (Apr 18, 2016 at 8:59): But there is an issue with the bulk update: I have to skip a few columns, which is not possible with the gem.
- lilobase (Apr 18, 2016 at 9:28): Do you have to skip columns or rows? For the columns, just remove them from the constructor's hash; for the rows you can make the append (<<) conditional.
- Dinesh Saini (Apr 18, 2016 at 9:40): I need to remove columns because the column name is the same as the nested model's. That column is just there to help the user who creates the CSV: the CSV has the fields [:user_id, :shipping_countries, :shipping_country_ids], and shipping_countries is unused. The first version covers this case, but the gem version does not.
No idea what "thread ID" means.
I'll just focus on the translation of the CSV rows into an object.
There's just too much repetition in the unless body. All those fields should be listed once, then handled in a loop instead of hard-coded like this. E.g.
unless row[row_headers["name"]].nil?
  temp = {}
  business_type = row[row_headers["business_type_id"]]
  business_type_id = business_type == "Online" ? 1 : business_type == "Local" ? 2 : 3
  temp[:business_type_id] = business_type_id
  for name in [:user_id, :name, :country_id, :latitude, :longitude, :free_shipping, :category_ids, :style_ids, :shipping_country_ids]
    temp[name] = row[row_headers[name.to_s]]
  end
  for name in [:homepage, :telephone, :email, :address, :facebook, :twitter, :google, :instagram, :pinterest, :ship_details]
    temp[name] = row[row_headers[name.to_s]] ||= ""
  end
  business = target_model.new(temp)
  business.save
end
That said, I'm sure there are better ways to do this in Ruby ...
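For instance, here is a minimal sketch of the same loop written with each_with_object instead of for, using the same row and row_headers variables as above (untested):

required = [:user_id, :name, :country_id, :latitude, :longitude,
            :free_shipping, :category_ids, :style_ids, :shipping_country_ids]
optional = [:homepage, :telephone, :email, :address, :facebook, :twitter,
            :google, :instagram, :pinterest, :ship_details]

# build the attributes hash without managing an explicit accumulator
temp = required.each_with_object({}) { |name, h| h[name] = row[row_headers[name.to_s]] }
optional.each { |name| temp[name] = row[row_headers[name.to_s]] || "" }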
- Dinesh Saini (Apr 17, 2016 at 18:01): Thanks. Since there are millions of records and the process takes a long time, I would like to use threads to improve performance. The concern is how I can store the thread's data and status in the database, because the thread ID does not stay the same.
- "I would like to fix code after unless and can I store thread ID in the database too?" Could you rephrase that? I have no idea what you're trying to convey.
- By "fix code after unless", are you talking about formatting? You know fixing code is off-topic.