Mapping column names and values of a csv using another csv

Question 1

I have two csv files, pricat.csv which contains objects I need to populate my DB with, and mapping.csv which specifies how the value in pricat.csv must be displayed in my DB, for ex: 'NW 17-18' in pricat.csv has to be 'Winter Collection 2017/2018' in my DB. Here the csvs, first row in both are the headers:

ean;supplier;brand;catalog_code;collection;season;article_structure_code;article_number;article_number_2;article_number_3;color_code;size_group_code;size_code;size_name;currency;price_buy_gross;price_buy_net;discount_rate;price_sell;material;target_area
8719245200978;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;38;38;EUR;;58.5;;139.95;Aviation;Woman Shoes
8719245200985;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;39;39;EUR;;58.5;;139.95;Aviation;Woman Shoes

source;destination;source_type;destination_type
winter;Winter;season;season
summer;Summer;season;season
NW 17-18;Winter Collection 2017/2018;collection;collection
EU;European sizes;size_group_code;size_group
EU|36;European size 36;size_group_code|size_code;size
EU|37;European size 37;size_group_code|size_code;size
EU|38;European size 38;size_group_code|size_code;size
EU|39;European size 39;size_group_code|size_code;size
EU|40;European size 40;size_group_code|size_code;size
EU|41;European size 41;size_group_code|size_code;size
EU|42;European size 42;size_group_code|size_code;size
4;Boot;article_structure_code;article_structure
5;Sneaker;article_structure_code;article_structure
6;Slipper;article_structure_code;article_structure
7;Loafer;article_structure_code;article_structure
8;Mocassin;article_structure_code;article_structure
9;Sandal;article_structure_code;article_structure
10;Pump;article_structure_code;article_structure
1;Nero;color_code;color
2;Marrone;color_code;color
3;Brandy Nero;color_code;color
4;Indaco Nero;color_code;color
5;Fucile;color_code;color
6;Bosco Nero;color_code;color

In my models.py in Django I have three models: Catalog --> Article --> Variation the attributes of my models are manually named as mapping.csv specifies, for ex: Variation will not have a color_code attribute but color. To populate the DB I've created a custom Django command which reads the rows in pricat.csv and create istances like this:

x = Catalog.objects.get_or_create(brand=info[2], supplier=info[1], catalog_code=info[3],
 collection=map_dict[info[4]],
 season=map_dict[info[5]], size_group=map_dict[info[11]],
 currency=info[14], target_area=info[20])
y = Article.objects.get_or_create(article_structure=map_dict[info[6]],
 article_number=info[7], catalog=x[0])
z = Variation.objects.get_or_create(ean=info[0], article=y[0], size_code=info[12], color=map_col[info[10]],
 material=info[19], price_buy_gross=info[15], price_buy_net=info[16],
 discount_rate=info[17], price_sell=info[18], size=f'{map_dict[info[11]]} {info[12]}')

info is a list of all the value in a pricat.csv row and map_dict and map_col are two dictionaries I create with two func() from the mapping.csv:

def mapping(map_file):
 with open(map_file, 'r') as f:
 f = [l.strip('\n') for l in f]
 map_dict = {}
 for l in f[1:19]:
 info = l.strip().split(';')
 source = info[0]
 destination = info[1]
 source_type = info[2]
 destination_type = info[3]
 map_dict[source] = destination
 map_dict[source_type] = destination_type
 return map_dict
def mapping_color(map_file):
 with open(map_file, 'r') as f:
 f = [l.strip('\n') for l in f]
 map_dict = {}
 for l in f[19:]:
 info = l.strip().split(';')
 source = info[0]
 destination = info[1]
 source_type = info[2]
 destination_type = info[3]
 map_dict[source] = destination
 map_dict[source_type] = destination_type
 return map_dict
map_dict = mapping('mapping.csv')
map_col = mapping_color('mapping.csv')

I had to create two dict because a single one would have duplicate keys.

The code works fine and the DB is populated as intended, but I feel the way I did the mapping is bad practice, also both my command and funcs relies on indeces so the values in my csvs have to be in that specific order to work. I would greatly appreciate any suggestion on how to improve my code or accomplish this task, I hope my explanation is clear.

EDIT:

class Catalog(models.Model):
 brand = models.CharField(max_length=255)
 supplier = models.CharField(max_length=255)
 catalog_code = models.CharField(max_length=255, default=1, blank=True)
 collection = models.CharField(max_length=255)
 season = models.CharField(max_length=255)
 size_group = models.CharField(max_length=2)
 currency = models.CharField(max_length=3)
 target_area = models.CharField(max_length=255)
 def __str__(self):
 return self.brand
 def get_articles(self):
 return Article.objects.filter(catalog=self.pk)
class Article(models.Model):
 article_structure = models.CharField(max_length=255)
 article_number = models.CharField(max_length=255)
 catalog = models.ForeignKey(Catalog, on_delete=models.CASCADE)
 def __str__(self):
 return f'{self.article_number} | {self.article_structure}'
class Variation(models.Model):
 ean = models.CharField(max_length=255)
 article = models.ForeignKey(Article, on_delete=models.CASCADE)
 size_code = models.IntegerField()
 size = models.CharField(max_length=255, default=0)
 color = models.CharField(max_length=255)
 material = models.CharField(max_length=255)
 price_buy_gross = models.CharField(max_length=255)
 price_buy_net = models.FloatField()
 discount_rate = models.CharField(max_length=255, default=0)
 price_sell = models.FloatField()
 def __str__(self):
 return f'Ean: {self.ean}, article: {self.article}'

I've created a new mapping()

def mapping(map_file):
 with open(map_file, 'r') as f:
 f = [l.strip('\n') for l in f]
 map_dict = {}
 for l in f[1:]:
 info = l.strip().split(';')
 source = info[0]
 destination = info[1]
 source_type = info[2]
 child_dict = {source: destination}
 map_dict[source_type] = map_dict.get(source_type, {source: destination})
 map_dict[source_type].update(child_dict)
 return map_dict

It returns a nested dict, I'm trying to finda solution using this single nested dict instead of 2 dicts like before.

Question 2

Please show more of your code, particularly the model classes for Catalog --> Article --> Variation.

Question 3

classes added @Reinderien

Question 4

You can use the built-in csv.DictReader to easily create dictionaries from CSV files. How about this?

import csv
def create_mapping(map_file):
 with open(map_file) as csvfile:
 reader = csv.DictReader(csvfile, delimiter=';')
 mapping = {row['source']: row['destination'] 
 for row in reader 
 if row['source_type'] != 'color_code'}
 return mapping
map_dict = create_mapping('mapping.csv')

We are using dictionary comprehension to create the dictionary. You can do something similar for colors, then you want to have all the rows where source_type equals color_code (so == instead of !=). But perhaps it is a better idea put the color mappings into a different file. Furthermore, if you process the pricat.csv in a similar fashion:

with open('pricat.csv') as csvfile:
 reader = csv.DictReader(csvfile, delimiter=';')
 for row in reader:
 # process row

You'll be able to use the rows as dictionaries:

{'ean': '8719245200985',
 'supplier': 'Rupesco BV',
 'brand': 'Via Vai',
 'catalog_code': '',
 'collection': 'NW 17-18',
 'season': 'winter',
 'article_structure_code': '10',
 'article_number': '15189-02',
 'article_number_2': '15189-02 Aviation Nero',
 'article_number_3': 'Aviation',
 'color_code': '1',
 'size_group_code': 'EU',
 'size_code': '39',
 'size_name': '39',
 'currency': 'EUR',
 'price_buy_gross': '',
 'price_buy_net': '58.5',
 'discount_rate': '',
 'price_sell': '139.95',
 'material': 'Aviation',
 'target_area': 'Woman Shoes'}

So you can do something like:

y = Article.objects.get_or_create(article_structure=map_dict[row['article_structure_code']],
 article_number=row['article_number'], catalog=x[0])

This can still be refactored a bit, but now you are no longer dependent on the column numbers.

Question 5

Hi! I changed my mapping() (let me know what you think about it) but your use of csv.DictReader gave me a great insight and I think I'm going to come up with a very nice solution soon, if it works I'll accept your answer.

Question 6

class Command(BaseCommand):
 help = 'Create a catalog, accept csv as argument'
 def add_arguments(self, parser):
 parser.add_argument('file', nargs='+', type=str)
 parser.add_argument('map', nargs='+', type=str)
 def handle(self, *args, **options):
 map_dict = mapping(options['map'][0])
 with open(options['file'][0], 'r') as f:
 reader = csv.DictReader(f, delimiter=';')
 for row in reader:
 x = Catalog.objects.get_or_create(brand=row['brand'], supplier=row['supplier'],
 catalog_code=row['catalog_code'],
 collection=map_dict['collection'][row['collection']],
 season=map_dict['season'][row['season']],
 size_group=map_dict['size_group_code'][row['size_group_code']],
 currency=row['currency'], target_area=row['target_area'])
 if x[1]:
 logger_catalog.info(f'Created Catalog instance {x[0]}')
 y = Article.objects.get_or_create(article_structure=map_dict['article_structure_code'][row['article_structure_code']],
 article_number=row['article_number'], catalog=x[0])
 if y[1]:
 logger_catalog.info(f'Created Article instance {y[0]}')
 z = Variation.objects.get_or_create(ean=row['ean'], article=y[0], size_code=row['size_code'],
 color=map_dict['color_code'][row['color_code']],
 material=row['material'], price_buy_gross=row['price_buy_gross'],
 price_buy_net=row['price_buy_net'],
 discount_rate=row['discount_rate'], price_sell=row['price_sell'],
 size=map_dict['size_group_code|size_code'][f"{row['size_group_code']}|{row['size_code']}"])
 if z[1]:
 logger_catalog.info(f'Created Variation instance {z[0]}')

Finally I remake my Command, now it's indipendent from indeces so it will correctly populate the database even if the colums in the csv are in a different order.

My mapping() func (see question) returns a nested dict, the keys of the parent dict are the columns names that need to be mapped and the values are dicts with this structure:
{value_presented_in_csv: how_value_should_be_presented_in_DB}.

In my Command I iterate through each row of pricat.csv turning rows in dicts {colum_name: value_presented_in_csv}, if the data don't need to be mapped I get the value from my row dict like brand=row['brand'], if the data need to be mapped I get the value from my nested dict map_dict like this map_dict[column_name][value_presented_in_csv] (this gives me the value of the child dict that is how_value_should_be_presented_in_DB).

It is better because doesn't relies on indeces no more, my first implementation works correctly only if the columns in pricat.csv are in that precise order; with this new implementation the columns can be in any order and the DB would still be populate correctly.

PythonSherpa PythonSherpa 1887 bronze badges · Accepted Answer · 2020-11-22 15:39:11Z

You can use the built-in csv.DictReader to easily create dictionaries from CSV files. How about this?

import csv
def create_mapping(map_file):
 with open(map_file) as csvfile:
 reader = csv.DictReader(csvfile, delimiter=';')
 mapping = {row['source']: row['destination'] 
 for row in reader 
 if row['source_type'] != 'color_code'}
 return mapping
map_dict = create_mapping('mapping.csv')

We are using dictionary comprehension to create the dictionary. You can do something similar for colors, then you want to have all the rows where source_type equals color_code (so == instead of !=). But perhaps it is a better idea put the color mappings into a different file. Furthermore, if you process the pricat.csv in a similar fashion:

with open('pricat.csv') as csvfile:
 reader = csv.DictReader(csvfile, delimiter=';')
 for row in reader:
 # process row

You'll be able to use the rows as dictionaries:

{'ean': '8719245200985',
 'supplier': 'Rupesco BV',
 'brand': 'Via Vai',
 'catalog_code': '',
 'collection': 'NW 17-18',
 'season': 'winter',
 'article_structure_code': '10',
 'article_number': '15189-02',
 'article_number_2': '15189-02 Aviation Nero',
 'article_number_3': 'Aviation',
 'color_code': '1',
 'size_group_code': 'EU',
 'size_code': '39',
 'size_name': '39',
 'currency': 'EUR',
 'price_buy_gross': '',
 'price_buy_net': '58.5',
 'discount_rate': '',
 'price_sell': '139.95',
 'material': 'Aviation',
 'target_area': 'Woman Shoes'}

So you can do something like:

y = Article.objects.get_or_create(article_structure=map_dict[row['article_structure_code']],
 article_number=row['article_number'], catalog=x[0])

This can still be refactored a bit, but now you are no longer dependent on the column numbers.

Hi! I changed my mapping() (let me know what you think about it) but your use of csv.DictReader gave me a great insight and I think I'm going to come up with a very nice solution soon, if it works I'll accept your answer.

Stack Exchange Network

Mapping column names and values of a csv using another csv

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Mapping column names and values of a csv using another csv

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions