What are best practices of creating database importer in Rails?

Question 1

I have sample database in GES files and I wrote importer in Ruby.

Sample file: WAR_APO.GES: https://gist.github.com/regedarek/cd0ce73c5d1e14ff9818

I will have several of GES files which I want to move to my data folder in app and I want to process them manually.

For most of them logic is the same but the difference are filename, model and attributes hash.
I want to also and new cases like: Update when type is U or Delete if is D

How should I refactor that:

class PrototypePharmaceuticImporter
 ID = '00'
 K = "#{ID}K"
 F = "#{ID}F"
 FB = "#{ID}FB"
 D = "#{ID}D"
 I = "#{ID}I"
 U = "#{ID}U"
 E = "#{ID}E"
 TYPES = [K, F, FB, D, I, U, E]
 ARTICLE_TYPE = 57
 BARCODE = 15
 GROUP_CODE = 1
 GROUP_KEY = 40
 GROUP_NAME = 2
 FORM_CODE = 1
 FORM_KEY = 38
 FORM_NAME = 2
 FORM_SHORT_NAME = 3
 LONGNAME = 28
 PACKAGE_TYPE = 46
 PHARMACY_ONLY = 3
 PRESCRIPTION_ONLY = 47
 PRICE = 2
 WEIGHT = 14
 attr_accessor :file, :k, :f
 def initialize
 @pharmaceutics_file = File.open 'data/PAC_APO.GES', 'r'
 @forms_file = File.open 'data/DAR_APO.GES', 'r'
 @groups_file = File.open 'data/WAR_APO.GES', 'r'
 @k = []
 @f = []
 extract_groups_to_database
 end
 def extract_pharmaceutics_to_database
 add = true
 tmp = []
 type = K
 @pharmaceutics_file.each_with_index do |line, i|
 break if (i > 96 * 1000 + 760 + 9)
 _type = line.strip
 _changed = TYPES.include? _type
 if _changed && i > 0
 case type
 when K then @k << tmp
 when F then @f << tmp
 when FB then @f << tmp
 when I, U, D
 hash = {
 article_type: tmp[ARTICLE_TYPE],
 price: tmp[PRICE],
 weight: tmp[WEIGHT],
 package_type: tmp[PACKAGE_TYPE],
 group_code: tmp[GROUP_KEY],
 form_code: tmp[FORM_KEY],
 name: tmp[LONGNAME],
 barcode: tmp[BARCODE],
 pharmacy_only: tmp[PHARMACY_ONLY],
 prescription_only: tmp[PRESCRIPTION_ONLY]
 }
 Pharmaceutic.create(hash)
 end
 tmp = []
 type = _type
 end
 tmp << clean(line)
 end
 end
 def extract_forms_to_database
 add = true
 tmp = []
 type = K
 @forms_file.each_with_index do |line, i|
 break if (i > 96 * 1000 + 760 + 9)
 _type = line.strip
 _changed = TYPES.include? _type
 if _changed && i > 0
 case type
 when K then @k << tmp
 when F then @f << tmp
 when FB then @f << tmp
 when I, U, D
 hash = {
 code: tmp[FORM_CODE],
 name: tmp[FORM_NAME],
 short_name: tmp[FORM_SHORT_NAME]
 }
 PharmaceuticForm.create(hash)
 end
 tmp = []
 type = _type
 end
 tmp << clean(line)
 end
 end
 def extract_groups_to_database
 add = true
 tmp = []
 type = K
 @groups_file.each_with_index do |line, i|
 break if (i > 96 * 1000 + 760 + 9)
 _type = line.strip
 _changed = TYPES.include? _type
 if _changed && i > 0
 case type
 when K then @k << tmp
 when F then @f << tmp
 when FB then @f << tmp
 when I, U, D
 hash = {
 code: tmp[GROUP_CODE],
 name: tmp[GROUP_NAME]
 }
 PharmaceuticGroup.create(hash)
 end
 tmp = []
 type = _type
 end
 tmp << clean(line)
 end
 end
 private
 def clean line
 line.strip
 .gsub(/^[\d]{2}/, '')
 .gsub(/\\[s|S]39/, 'ß')
 .gsub(/\\a25/, 'ä')
 .gsub(/\\A25/, 'Ä')
 .gsub(/\\o25/, 'ö')
 .gsub(/\\O25/, 'Ö')
 .gsub(/\\u25/, 'ü')
 .gsub(/\\U25/, 'Ü')
 end
end

Basically, I want to get rid of duplications in these methods.

Question 2

Some notes:

As usual, I'd recommend a more functional approach. It's not a theoretical issue, functional code is more concise, more clear. When I see @k = [] I get the shakes thinking that this variable can (and will) be modified just everywhere in the class (an instance variable is just a nasty global variable -for the scope a class, granted- with a pretty name).
Also related to FP, your code is hard to abstract because it's written with procedures instead of functions: they don't take values (as arguments) and return values, everything works by side-effects through instance variables. No referential transparency, no idempotence...
Don't create so many individual constants, group them in hashes.
File.open 'data/PAC_APO.GES', 'r'. Hardcoded files in classes? mmmm.
96 * 1000 + 760 + 9. Magic number, define it as a constant so it has a name.

Answering your particular question about how to avoid repeating code for those 3 methods: whatever is different put it as arguments. If some value depends on something the caller cannot know yet about (the code under when I, U, D), then use an argument block as callback. So it might be written:

def extract_to_database(collection)
 add = true
 tmp = []
 type = K
 collection.each_with_index do |line, i|
 break if (i > 96 * 1000 + 760 + 9)
 _type = line.strip
 _changed = TYPES.include? _type
 if _changed && i > 0
 case type
 when K then @k << tmp
 when F then @f << tmp
 when FB then @f << tmp
 when I, U, D
 yield(tmp)
 end
 tmp = []
 type = _type
 end
 tmp << clean(line)
 end
end

Note that I kept your old imperative code (a functional approach would require rewriting the full class), but at least is now DRYed. I'd strongly encourage to try the functional paradigm, it's hard at first but it pays off on the long run. My article on this matter.

Question 3

Could you give me simple advice how to get rid of using @k = [] from this method? I read your article and it is great but I have problem with implementing some of your hints. So I changed it to use sql for inserting and it looks like that: gist.github.com/regedarek/6172861. Thanks for this article it is great.

Question 4

I'll try to write a refactor of the gist later. For now, note that @f is defined nowhere. Try to write it without instance variables.

tokland tokland 11.2k1 gold badge21 silver badges26 bronze badges · Accepted Answer · 2013-07-26 12:31:22Z

Some notes:

As usual, I'd recommend a more functional approach. It's not a theoretical issue, functional code is more concise, more clear. When I see @k = [] I get the shakes thinking that this variable can (and will) be modified just everywhere in the class (an instance variable is just a nasty global variable -for the scope a class, granted- with a pretty name).
Also related to FP, your code is hard to abstract because it's written with procedures instead of functions: they don't take values (as arguments) and return values, everything works by side-effects through instance variables. No referential transparency, no idempotence...
Don't create so many individual constants, group them in hashes.
File.open 'data/PAC_APO.GES', 'r'. Hardcoded files in classes? mmmm.
96 * 1000 + 760 + 9. Magic number, define it as a constant so it has a name.

Answering your particular question about how to avoid repeating code for those 3 methods: whatever is different put it as arguments. If some value depends on something the caller cannot know yet about (the code under when I, U, D), then use an argument block as callback. So it might be written:

def extract_to_database(collection)
 add = true
 tmp = []
 type = K
 collection.each_with_index do |line, i|
 break if (i > 96 * 1000 + 760 + 9)
 _type = line.strip
 _changed = TYPES.include? _type
 if _changed && i > 0
 case type
 when K then @k << tmp
 when F then @f << tmp
 when FB then @f << tmp
 when I, U, D
 yield(tmp)
 end
 tmp = []
 type = _type
 end
 tmp << clean(line)
 end
end

Note that I kept your old imperative code (a functional approach would require rewriting the full class), but at least is now DRYed. I'd strongly encourage to try the functional paradigm, it's hard at first but it pays off on the long run. My article on this matter.

Could you give me simple advice how to get rid of using @k = [] from this method? I read your article and it is great but I have problem with implementing some of your hints. So I changed it to use sql for inserting and it looks like that: gist.github.com/regedarek/6172861. Thanks for this article it is great.
I'll try to write a refactor of the gist later. For now, note that @f is defined nowhere. Try to write it without instance variables.

Stack Exchange Network

What are best practices of creating database importer in Rails?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

What are best practices of creating database importer in Rails?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions