
Narrowed down the question


edited Jan 8, 2019 at 2:13

maivel

1. This is a good solution? What could be the drawbacks and problems down the line?
2. Currently I've implemented different subclasses for the mapping, but I suppose it would be better if I could just pass any dictionary when instantiating a Person class. It would detect if there is an appropriate mapping and apply it; alternatively it would raise NotImplementedError if it is an unknown mapping or KeyError when an existing mapping has been changed. Is there a convention or industry standard for these types of situations?

1. This is a good solution? What could be the drawbacks and problems down the line?
2. Currently I've implemented different subclasses for the mapping. Is there a convention or industry standard for these types of situations?

Minor edit


edited Jan 8, 2019 at 2:05

maivel

"""
Data mappers for Person
"""


# Abstract class for mapper
class Mapper(object):
    def __init__(self, data):
        self.data = data


# Data mapper for format A, maps the fields from dict_a to Person
class MapperA(Mapper):
    def __init__(self, data):
        self.name = ' '.join([data.get('name').get(key) for key in ['first_name', 'last_name']])
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        return all(key in data for key in ('name', 'workEmail'))


# Data mapper for format B, maps the fields from dict_b to Person
class MapperB(Mapper):
    def __init__(self, data):
        self.name = data.get('fullName')
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        return all(key in data for key in ('fullName', 'workEmail'))


# Creates a Person instance based on the input data mapping
def Person(data):
    for cls in Mapper.__subclasses__():
        if cls.is_mapper_for(data):
            return cls(data)
    raise NotImplementedError


if __name__ == '__main__':
    from database.connection import make_session
    from database.models import PersonModel

    # Sample data for example
    dict_a = {
        'name': {
            'first_name': 'John',
            'last_name': 'Doe'
        },
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }
    dict_b = {
        'fullName': 'John Doe',
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }

    # Instantiate Person from data
    persons = [Person(data) for data in [dict_a, dict_b]]

    # Store persons that fit the database model
    persons = [PersonModel(**person.__dict__) for person in persons]
    with make_session() as session:
        session.add_all(persons)
        session.commit()

For question 2, I found this question to be useful, but would still want to know if this approach in general is good.

"""
Data mappers for Person
"""


# Abstract class for mapper
class Mapper(object):
    def __init__(self, data):
        self.data = data


# Data mapper for format A, maps the fields from dict_a to Person
class MapperA(Mapper):
    def __init__(self, data):
        self.name = ' '.join(data.get('name', {}).get(key) for key in ('first_name', 'last_name'))
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        needed = {'name', 'workEmail'}
        return needed.issubset(set(data))


# Data mapper for format B, maps the fields from dict_b to Person
class MapperB(Mapper):
    def __init__(self, data):
        self.name = data.get('fullName')
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        needed = {'fullName', 'workEmail'}
        return needed.issubset(set(data))


# Creates a Person instance based on the input data mapping
def Person(data):
    for cls in Mapper.__subclasses__():
        if cls.is_mapper_for(data):
            return cls(data)
    raise NotImplementedError


if __name__ == '__main__':
    from database.connection import make_session
    from database.models import PersonModel

    # Sample data for example
    dict_a = {
        'name': {
            'first_name': 'John',
            'last_name': 'Doe'
        },
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }
    dict_b = {
        'fullName': 'John Doe',
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }

    # Instantiate Person from data and convert to the database model
    persons = [PersonModel(**Person(data).__dict__) for data in (dict_a, dict_b)]
    with make_session() as session:
        session.add_all(persons)
        session.commit()

For question 2, I found this question to be useful, but would still want to know if this approach in general is good.
Added style improvement suggestions from @Reinderien
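
The `Person` function above dispatches through `Mapper.__subclasses__()`, which lists only direct subclasses, and it returns the first match, so a dictionary that satisfies two mappers is resolved silently by definition order. A minimal sketch of one possible alternative follows, assuming Python 3.6+ and an explicit registry filled through `__init_subclass__`. The names `REQUIRED_KEYS`, `_registry`, and `select_mapper` are hypothetical and not part of the original code, and the field-mapping `__init__` methods are omitted for brevity.

```python
class Mapper:
    """Base class; every subclass is added to the registry when it is defined."""

    REQUIRED_KEYS = frozenset()  # hypothetical: top-level keys identifying a format
    _registry = []               # hypothetical: all known mapper classes

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Mapper._registry.append(cls)

    @classmethod
    def is_mapper_for(cls, data):
        # issubset accepts any iterable; iterating a dict yields its keys
        return cls.REQUIRED_KEYS.issubset(data)


class MapperA(Mapper):
    REQUIRED_KEYS = frozenset({'name', 'workEmail'})


class MapperB(Mapper):
    REQUIRED_KEYS = frozenset({'fullName', 'workEmail'})


def select_mapper(data):
    """Return the single mapper class that matches data, or raise."""
    matches = [cls for cls in Mapper._registry if cls.is_mapper_for(data)]
    if not matches:
        raise NotImplementedError('no mapper for keys: %s' % sorted(data))
    if len(matches) > 1:
        names = [cls.__name__ for cls in matches]
        raise ValueError('ambiguous input, matching mappers: %s' % names)
    return matches[0]
```

Raising on ambiguous input is a design choice of this sketch; with tens of source formats it may be preferable to fail loudly than to depend on class definition order.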

Tweeted twitter.com/StackCodeReview/status/1082336323699163137

occurred Jan 7, 2019 at 18:01

spelling and grammar


edited Jan 7, 2019 at 15:57

Reinderien

Sourcing data format from multiple different structures

I have limited experience in Python programming and I'm building my first scraper application for a data engineering project that needs to scale to storing hundreds of thousands of Persons from tens of different structures. I was wondering if:

1. This is a good solution? What could be the drawbacks and problems down the line?
2. Currently I've implemented different subclasses for the mapping, but I suppose it would be better if I could just pass any dictionary when instantiating a Person class. It would detect if there is an appropriate mapping and apply it; alternatively it would raise NotImplementedError if it is an unknown mapping or KeyError when an existing mapping has been changed. Is there a convention or industry standard for these types of situations?
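
The behaviour described in the second point could be sketched roughly as below. This is only an illustration under assumptions: the `MAPPINGS` table and the `lookup` and `to_person_fields` helpers are hypothetical names, and the two formats are modelled on the `dict_a` and `dict_b` sample shapes. A format is recognised by its top-level keys. An unrecognised shape raises `NotImplementedError`, while a recognised shape that lacks an expected key raises `KeyError`.

```python
# Hypothetical table: format name -> (identifying keys, field -> list of key paths).
MAPPINGS = {
    'format_a': (
        frozenset({'name', 'workEmail'}),
        {
            'name': [('name', 'first_name'), ('name', 'last_name')],
            'email': [('workEmail',)],
            'age': [('age',)],
            'connected': [('connected',)],
        },
    ),
    'format_b': (
        frozenset({'fullName', 'workEmail'}),
        {
            'name': [('fullName',)],
            'email': [('workEmail',)],
            'age': [('age',)],
            'connected': [('connected',)],
        },
    ),
}


def lookup(data, path):
    """Follow path through nested dicts; a missing key raises KeyError."""
    for key in path:
        data = data[key]
    return data


def to_person_fields(data):
    """Translate a source dict into Person fields using the first matching format."""
    for required, fields in MAPPINGS.values():
        if required.issubset(data):
            person = {}
            for field, paths in fields.items():
                values = [lookup(data, path) for path in paths]
                # several source values (first and last name) are joined with a space
                person[field] = values[0] if len(values) == 1 else ' '.join(values)
            return person
    raise NotImplementedError('unknown mapping: %s' % sorted(data))
```

Because `lookup` indexes with `data[key]` rather than `dict.get`, a renamed or missing source field fails immediately instead of storing `None`.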