I have two csv files, pricat.csv
which contains objects I need to populate my DB with, and mapping.csv
which specifies how the value in pricat.csv
must be displayed in my DB, for ex: 'NW 17-18'
in pricat.csv
has to be 'Winter Collection 2017/2018'
in my DB. Here the csvs, first row in both are the headers:
ean;supplier;brand;catalog_code;collection;season;article_structure_code;article_number;article_number_2;article_number_3;color_code;size_group_code;size_code;size_name;currency;price_buy_gross;price_buy_net;discount_rate;price_sell;material;target_area
8719245200978;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;38;38;EUR;;58.5;;139.95;Aviation;Woman Shoes
8719245200985;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;39;39;EUR;;58.5;;139.95;Aviation;Woman Shoes
source;destination;source_type;destination_type
winter;Winter;season;season
summer;Summer;season;season
NW 17-18;Winter Collection 2017/2018;collection;collection
EU;European sizes;size_group_code;size_group
EU|36;European size 36;size_group_code|size_code;size
EU|37;European size 37;size_group_code|size_code;size
EU|38;European size 38;size_group_code|size_code;size
EU|39;European size 39;size_group_code|size_code;size
EU|40;European size 40;size_group_code|size_code;size
EU|41;European size 41;size_group_code|size_code;size
EU|42;European size 42;size_group_code|size_code;size
4;Boot;article_structure_code;article_structure
5;Sneaker;article_structure_code;article_structure
6;Slipper;article_structure_code;article_structure
7;Loafer;article_structure_code;article_structure
8;Mocassin;article_structure_code;article_structure
9;Sandal;article_structure_code;article_structure
10;Pump;article_structure_code;article_structure
1;Nero;color_code;color
2;Marrone;color_code;color
3;Brandy Nero;color_code;color
4;Indaco Nero;color_code;color
5;Fucile;color_code;color
6;Bosco Nero;color_code;color
In my models.py
in Django I have three models: Catalog
--> Article
--> Variation
the attributes of my models are manually named as mapping.csv
specifies, for ex: Variation
will not have a color_code
attribute but color
.
To populate the DB I've created a custom Django command which reads the rows in pricat.csv
and create istances like this:
x = Catalog.objects.get_or_create(brand=info[2], supplier=info[1], catalog_code=info[3],
collection=map_dict[info[4]],
season=map_dict[info[5]], size_group=map_dict[info[11]],
currency=info[14], target_area=info[20])
y = Article.objects.get_or_create(article_structure=map_dict[info[6]],
article_number=info[7], catalog=x[0])
z = Variation.objects.get_or_create(ean=info[0], article=y[0], size_code=info[12], color=map_col[info[10]],
material=info[19], price_buy_gross=info[15], price_buy_net=info[16],
discount_rate=info[17], price_sell=info[18], size=f'{map_dict[info[11]]} {info[12]}')
info
is a list of all the value in a pricat.csv
row and map_dict
and map_col
are two dictionaries I create with two func() from the mapping.csv
:
def mapping(map_file):
with open(map_file, 'r') as f:
f = [l.strip('\n') for l in f]
map_dict = {}
for l in f[1:19]:
info = l.strip().split(';')
source = info[0]
destination = info[1]
source_type = info[2]
destination_type = info[3]
map_dict[source] = destination
map_dict[source_type] = destination_type
return map_dict
def mapping_color(map_file):
with open(map_file, 'r') as f:
f = [l.strip('\n') for l in f]
map_dict = {}
for l in f[19:]:
info = l.strip().split(';')
source = info[0]
destination = info[1]
source_type = info[2]
destination_type = info[3]
map_dict[source] = destination
map_dict[source_type] = destination_type
return map_dict
map_dict = mapping('mapping.csv')
map_col = mapping_color('mapping.csv')
I had to create two dict because a single one would have duplicate keys.
The code works fine and the DB is populated as intended, but I feel the way I did the mapping is bad practice, also both my command and funcs relies on indeces so the values in my csvs have to be in that specific order to work. I would greatly appreciate any suggestion on how to improve my code or accomplish this task, I hope my explanation is clear.
EDIT:
class Catalog(models.Model):
brand = models.CharField(max_length=255)
supplier = models.CharField(max_length=255)
catalog_code = models.CharField(max_length=255, default=1, blank=True)
collection = models.CharField(max_length=255)
season = models.CharField(max_length=255)
size_group = models.CharField(max_length=2)
currency = models.CharField(max_length=3)
target_area = models.CharField(max_length=255)
def __str__(self):
return self.brand
def get_articles(self):
return Article.objects.filter(catalog=self.pk)
class Article(models.Model):
article_structure = models.CharField(max_length=255)
article_number = models.CharField(max_length=255)
catalog = models.ForeignKey(Catalog, on_delete=models.CASCADE)
def __str__(self):
return f'{self.article_number} | {self.article_structure}'
class Variation(models.Model):
ean = models.CharField(max_length=255)
article = models.ForeignKey(Article, on_delete=models.CASCADE)
size_code = models.IntegerField()
size = models.CharField(max_length=255, default=0)
color = models.CharField(max_length=255)
material = models.CharField(max_length=255)
price_buy_gross = models.CharField(max_length=255)
price_buy_net = models.FloatField()
discount_rate = models.CharField(max_length=255, default=0)
price_sell = models.FloatField()
def __str__(self):
return f'Ean: {self.ean}, article: {self.article}'
I've created a new mapping()
def mapping(map_file):
with open(map_file, 'r') as f:
f = [l.strip('\n') for l in f]
map_dict = {}
for l in f[1:]:
info = l.strip().split(';')
source = info[0]
destination = info[1]
source_type = info[2]
child_dict = {source: destination}
map_dict[source_type] = map_dict.get(source_type, {source: destination})
map_dict[source_type].update(child_dict)
return map_dict
It returns a nested dict, I'm trying to finda solution using this single nested dict instead of 2 dicts like before.
-
1\$\begingroup\$ Please show more of your code, particularly the model classes for Catalog --> Article --> Variation. \$\endgroup\$Reinderien– Reinderien2020年11月21日 17:13:29 +00:00Commented Nov 21, 2020 at 17:13
-
\$\begingroup\$ classes added @Reinderien \$\endgroup\$Adamantoisetortoise– Adamantoisetortoise2020年11月21日 17:24:41 +00:00Commented Nov 21, 2020 at 17:24
2 Answers 2
You can use the built-in csv.DictReader to easily create dictionaries from CSV files. How about this?
import csv
def create_mapping(map_file):
with open(map_file) as csvfile:
reader = csv.DictReader(csvfile, delimiter=';')
mapping = {row['source']: row['destination']
for row in reader
if row['source_type'] != 'color_code'}
return mapping
map_dict = create_mapping('mapping.csv')
We are using dictionary comprehension to create the dictionary.
You can do something similar for colors, then you want to have all the rows where source_type
equals color_code
(so ==
instead of !=
). But perhaps it is a better idea put the color mappings into a different file. Furthermore, if you process the pricat.csv
in a similar fashion:
with open('pricat.csv') as csvfile:
reader = csv.DictReader(csvfile, delimiter=';')
for row in reader:
# process row
You'll be able to use the rows as dictionaries:
{'ean': '8719245200985',
'supplier': 'Rupesco BV',
'brand': 'Via Vai',
'catalog_code': '',
'collection': 'NW 17-18',
'season': 'winter',
'article_structure_code': '10',
'article_number': '15189-02',
'article_number_2': '15189-02 Aviation Nero',
'article_number_3': 'Aviation',
'color_code': '1',
'size_group_code': 'EU',
'size_code': '39',
'size_name': '39',
'currency': 'EUR',
'price_buy_gross': '',
'price_buy_net': '58.5',
'discount_rate': '',
'price_sell': '139.95',
'material': 'Aviation',
'target_area': 'Woman Shoes'}
So you can do something like:
y = Article.objects.get_or_create(article_structure=map_dict[row['article_structure_code']],
article_number=row['article_number'], catalog=x[0])
This can still be refactored a bit, but now you are no longer dependent on the column numbers.
-
\$\begingroup\$ Hi! I changed my mapping() (let me know what you think about it) but your use of csv.DictReader gave me a great insight and I think I'm going to come up with a very nice solution soon, if it works I'll accept your answer. \$\endgroup\$Adamantoisetortoise– Adamantoisetortoise2020年11月22日 16:30:43 +00:00Commented Nov 22, 2020 at 16:30
class Command(BaseCommand):
help = 'Create a catalog, accept csv as argument'
def add_arguments(self, parser):
parser.add_argument('file', nargs='+', type=str)
parser.add_argument('map', nargs='+', type=str)
def handle(self, *args, **options):
map_dict = mapping(options['map'][0])
with open(options['file'][0], 'r') as f:
reader = csv.DictReader(f, delimiter=';')
for row in reader:
x = Catalog.objects.get_or_create(brand=row['brand'], supplier=row['supplier'],
catalog_code=row['catalog_code'],
collection=map_dict['collection'][row['collection']],
season=map_dict['season'][row['season']],
size_group=map_dict['size_group_code'][row['size_group_code']],
currency=row['currency'], target_area=row['target_area'])
if x[1]:
logger_catalog.info(f'Created Catalog instance {x[0]}')
y = Article.objects.get_or_create(article_structure=map_dict['article_structure_code'][row['article_structure_code']],
article_number=row['article_number'], catalog=x[0])
if y[1]:
logger_catalog.info(f'Created Article instance {y[0]}')
z = Variation.objects.get_or_create(ean=row['ean'], article=y[0], size_code=row['size_code'],
color=map_dict['color_code'][row['color_code']],
material=row['material'], price_buy_gross=row['price_buy_gross'],
price_buy_net=row['price_buy_net'],
discount_rate=row['discount_rate'], price_sell=row['price_sell'],
size=map_dict['size_group_code|size_code'][f"{row['size_group_code']}|{row['size_code']}"])
if z[1]:
logger_catalog.info(f'Created Variation instance {z[0]}')
Finally I remake my Command, now it's indipendent from indeces so it will correctly populate the database even if the colums in the csv are in a different order.
My mapping() func (see question) returns a nested dict, the keys of the parent dict are the columns names that need to be mapped and the values are dicts with this structure:
{value_presented_in_csv: how_value_should_be_presented_in_DB}
.
In my Command
I iterate through each row of pricat.csv
turning rows in dicts {colum_name: value_presented_in_csv}
, if the data don't need to be mapped I get the value from my row
dict like brand=row['brand']
, if the data need to be mapped I get the value from my nested dict map_dict
like this map_dict[column_name][value_presented_in_csv]
(this gives me the value of the child dict that is how_value_should_be_presented_in_DB
).
It is better because doesn't relies on indeces no more, my first implementation works correctly only if the columns in pricat.csv
are in that precise order;
with this new implementation the columns can be in any order and the DB would still be populate correctly.