"""
Data mappers for Person
"""

# Abstract class for mapper
class Mapper(object):
    def __init__(self, data):
        self.data = data

# Data mapper for format A, maps the fields from dict_a to Person
class MapperA(Mapper):
    def __init__(self, data):
        self.name = ' '.join(data.get('name', {}).get(key) for key in ('first_name', 'last_name'))
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        needed = {'name', 'workEmail'}
        return needed.issubset(set(data))

# Data mapper for format B, maps the fields from dict_b to Person
class MapperB(Mapper):
    def __init__(self, data):
        self.name = data.get('fullName')
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        needed = {'fullName', 'workEmail'}
        return needed.issubset(set(data))

# Creates a Person instance based on the input data mapping
def Person(data):
    for cls in Mapper.__subclasses__():
        if cls.is_mapper_for(data):
            return cls(data)
    raise NotImplementedError

if __name__ == '__main__':
    from database.connection import make_session
    from database.models import PersonModel

    # Sample data for example
    dict_a = {
        'name': {
            'first_name': 'John',
            'last_name': 'Doe'
        },
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }
    dict_b = {
        'fullName': 'John Doe',
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }

    # Map each input dict to a Person, then build the database model from its attributes
    persons = [PersonModel(**Person(data).__dict__) for data in (dict_a, dict_b)]

    with make_session() as session:
        session.add_all(persons)
        session.commit()
- For question 2, I found this question useful, but I would still like to know whether this approach is good in general.
- Added style improvement suggestions from @Reinderien
Sourcing data format from multiple different structures
I have limited experience in Python programming and I'm building my first scraper application for a data engineering project that needs to scale to storing hundreds of thousands of Persons from tens of different structures. I have two questions:
- Is this a good solution? What drawbacks and problems could appear down the line?
- Currently I've implemented different subclasses for the mapping, but I suppose it would be better if I could just pass any dictionary when instantiating a `Person` class. It would detect whether an appropriate mapping exists and apply it; otherwise it would raise `NotImplementedError` for an unknown mapping, or `KeyError` when an existing mapping has changed. Is there a convention or industry standard for these types of situations?
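
To make the second bullet concrete, here is a minimal sketch of the behaviour it describes. It is only an illustration under stated assumptions: the `register` decorator, the `_MAPPINGS` registry, and the `Person.from_dict` constructor are hypothetical names that do not exist in the code above, and `Person` is a dataclass here rather than a factory function. Required fields are read by indexing instead of `.get`, so a format whose structure has changed surfaces as a `KeyError`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Mapping, Optional, Tuple

Builder = Callable[[Mapping[str, Any]], Dict[str, Any]]

# Hypothetical registry of (identifying keys, builder) pairs; not in the original code.
_MAPPINGS: List[Tuple[FrozenSet[str], Builder]] = []


def register(*keys: str) -> Callable[[Builder], Builder]:
    """Register a builder for dictionaries that contain all of `keys`."""
    def decorator(func: Builder) -> Builder:
        _MAPPINGS.append((frozenset(keys), func))
        return func
    return decorator


@register('name', 'workEmail')
def from_format_a(data: Mapping[str, Any]) -> Dict[str, Any]:
    # Indexing (not .get) so a changed nested structure raises KeyError.
    name = data['name']
    return {
        'name': f"{name['first_name']} {name['last_name']}",
        'email': data['workEmail'],
        'age': data.get('age'),              # optional field
        'connected': data.get('connected'),  # optional field
    }


@register('fullName', 'workEmail')
def from_format_b(data: Mapping[str, Any]) -> Dict[str, Any]:
    return {
        'name': data['fullName'],
        'email': data['workEmail'],
        'age': data.get('age'),
        'connected': data.get('connected'),
    }


@dataclass
class Person:
    name: str
    email: str
    age: Optional[int] = None
    connected: Optional[bool] = None

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> 'Person':
        # The first registered mapping whose identifying keys are all present wins.
        for keys, build in _MAPPINGS:
            if keys.issubset(data):
                return cls(**build(data))
        raise NotImplementedError(f'no mapping for keys {sorted(data)}')
```

With this shape, `Person.from_dict(dict_a)` and `Person.from_dict(dict_b)` would both go through the same entry point, and supporting a new source format would mean adding one decorated function rather than a subclass. Whether a registry like this is preferable to `Mapper.__subclasses__()` is exactly the kind of trade-off the question is asking about.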