I have sample database in GES
files and I wrote importer in Ruby.
Sample file: WAR_APO.GES
: https://gist.github.com/regedarek/cd0ce73c5d1e14ff9818
I will have several of GES
files which I want to move to my data
folder in app and I want to process them manually.
- For most of them logic is the same but the difference are filename, model and attributes hash.
- I want to also and new cases like:
Update
when type isU
orDelete
if isD
How should I refactor that:
class PrototypePharmaceuticImporter
ID = '00'
K = "#{ID}K"
F = "#{ID}F"
FB = "#{ID}FB"
D = "#{ID}D"
I = "#{ID}I"
U = "#{ID}U"
E = "#{ID}E"
TYPES = [K, F, FB, D, I, U, E]
ARTICLE_TYPE = 57
BARCODE = 15
GROUP_CODE = 1
GROUP_KEY = 40
GROUP_NAME = 2
FORM_CODE = 1
FORM_KEY = 38
FORM_NAME = 2
FORM_SHORT_NAME = 3
LONGNAME = 28
PACKAGE_TYPE = 46
PHARMACY_ONLY = 3
PRESCRIPTION_ONLY = 47
PRICE = 2
WEIGHT = 14
attr_accessor :file, :k, :f
def initialize
@pharmaceutics_file = File.open 'data/PAC_APO.GES', 'r'
@forms_file = File.open 'data/DAR_APO.GES', 'r'
@groups_file = File.open 'data/WAR_APO.GES', 'r'
@k = []
@f = []
extract_groups_to_database
end
def extract_pharmaceutics_to_database
add = true
tmp = []
type = K
@pharmaceutics_file.each_with_index do |line, i|
break if (i > 96 * 1000 + 760 + 9)
_type = line.strip
_changed = TYPES.include? _type
if _changed && i > 0
case type
when K then @k << tmp
when F then @f << tmp
when FB then @f << tmp
when I, U, D
hash = {
article_type: tmp[ARTICLE_TYPE],
price: tmp[PRICE],
weight: tmp[WEIGHT],
package_type: tmp[PACKAGE_TYPE],
group_code: tmp[GROUP_KEY],
form_code: tmp[FORM_KEY],
name: tmp[LONGNAME],
barcode: tmp[BARCODE],
pharmacy_only: tmp[PHARMACY_ONLY],
prescription_only: tmp[PRESCRIPTION_ONLY]
}
Pharmaceutic.create(hash)
end
tmp = []
type = _type
end
tmp << clean(line)
end
end
def extract_forms_to_database
add = true
tmp = []
type = K
@forms_file.each_with_index do |line, i|
break if (i > 96 * 1000 + 760 + 9)
_type = line.strip
_changed = TYPES.include? _type
if _changed && i > 0
case type
when K then @k << tmp
when F then @f << tmp
when FB then @f << tmp
when I, U, D
hash = {
code: tmp[FORM_CODE],
name: tmp[FORM_NAME],
short_name: tmp[FORM_SHORT_NAME]
}
PharmaceuticForm.create(hash)
end
tmp = []
type = _type
end
tmp << clean(line)
end
end
def extract_groups_to_database
add = true
tmp = []
type = K
@groups_file.each_with_index do |line, i|
break if (i > 96 * 1000 + 760 + 9)
_type = line.strip
_changed = TYPES.include? _type
if _changed && i > 0
case type
when K then @k << tmp
when F then @f << tmp
when FB then @f << tmp
when I, U, D
hash = {
code: tmp[GROUP_CODE],
name: tmp[GROUP_NAME]
}
PharmaceuticGroup.create(hash)
end
tmp = []
type = _type
end
tmp << clean(line)
end
end
private
def clean line
line.strip
.gsub(/^[\d]{2}/, '')
.gsub(/\\[s|S]39/, 'ß')
.gsub(/\\a25/, 'ä')
.gsub(/\\A25/, 'Ä')
.gsub(/\\o25/, 'ö')
.gsub(/\\O25/, 'Ö')
.gsub(/\\u25/, 'ü')
.gsub(/\\U25/, 'Ü')
end
end
Basically, I want to get rid of duplications in these methods.
1 Answer 1
Some notes:
As usual, I'd recommend a more functional approach. It's not a theoretical issue, functional code is more concise, more clear. When I see
@k = []
I get the shakes thinking that this variable can (and will) be modified just everywhere in the class (an instance variable is just a nasty global variable -for the scope a class, granted- with a pretty name).Also related to FP, your code is hard to abstract because it's written with procedures instead of functions: they don't take values (as arguments) and return values, everything works by side-effects through instance variables. No referential transparency, no idempotence...
Don't create so many individual constants, group them in hashes.
File.open 'data/PAC_APO.GES', 'r'
. Hardcoded files in classes? mmmm.96 * 1000 + 760 + 9
. Magic number, define it as a constant so it has a name.
Answering your particular question about how to avoid repeating code for those 3 methods: whatever is different put it as arguments. If some value depends on something the caller cannot know yet about (the code under when I, U, D
), then use an argument block as callback. So it might be written:
def extract_to_database(collection)
add = true
tmp = []
type = K
collection.each_with_index do |line, i|
break if (i > 96 * 1000 + 760 + 9)
_type = line.strip
_changed = TYPES.include? _type
if _changed && i > 0
case type
when K then @k << tmp
when F then @f << tmp
when FB then @f << tmp
when I, U, D
yield(tmp)
end
tmp = []
type = _type
end
tmp << clean(line)
end
end
Note that I kept your old imperative code (a functional approach would require rewriting the full class), but at least is now DRYed. I'd strongly encourage to try the functional paradigm, it's hard at first but it pays off on the long run. My article on this matter.
-
\$\begingroup\$ Could you give me simple advice how to get rid of using @k = [] from this method? I read your article and it is great but I have problem with implementing some of your hints. So I changed it to use sql for inserting and it looks like that: gist.github.com/regedarek/6172861. Thanks for this article it is great. \$\endgroup\$tomekfranek– tomekfranek2013年08月07日 10:21:33 +00:00Commented Aug 7, 2013 at 10:21
-
1\$\begingroup\$ I'll try to write a refactor of the gist later. For now, note that
@f
is defined nowhere. Try to write it without instance variables. \$\endgroup\$tokland– tokland2013年08月07日 10:30:32 +00:00Commented Aug 7, 2013 at 10:30