Code Review

Narrowed down the question
Source Link
  1. This is a good solution? What could be the drawbacks and problems down the line?
  2. Currently I've implemented different subclasses for the mapping, but I suppose it would be better if I could just pass any dictionary when instantiating a Person class. It would detect if there is an appropriate mapping and apply it; alternatively it would raise NotImplementedError if it is an unknown mapping or KeyError when an existing mapping has been changed. Is there a convention or industry standard for these types of situations?
  1. This is a good solution? What could be the drawbacks and problems down the line?
  2. Currently I've implemented different subclasses for the mapping. Is there a convention or industry standard for these types of situations?
Minor edit
Source Link
"""
Data mappers for Person
"""

# Abstract class for mapper
class Mapper(object):
    def __init__(self, data):
        self.data = data


# Data mapper for format A, maps the fields from dict_a to Person
class MapperA(Mapper):
    def __init__(self, data):
        self.name = ' '.join([data.get('name').get(key) for key in ['first_name', 'last_name']])
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        return all(key in data for key in ('name', 'workEmail'))


# Data mapper for format B, maps the fields from dict_b to Person
class MapperB(Mapper):
    def __init__(self, data):
        self.name = data.get('fullName')
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        return all(key in data for key in ('fullName', 'workEmail'))


# Creates a Person instance base on the input data mapping
def Person(data):
    for cls in Mapper.__subclasses__():
        if cls.is_mapper_for(data):
            return cls(data)
    raise NotImplementedError


if __name__ == '__main__':
    from database.connection import make_session
    from database.models import PersonModel

    # Sample data for example
    dict_a = {
        'name': {
            'first_name': 'John',
            'last_name': 'Doe'
        },
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }
    dict_b = {
        'fullName': 'John Doe',
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }

    # Instantiate Person from data
    persons = [Person(data) for data in [dict_a, dict_b]]

    # Store persons that fit the database model
    persons = [PersonModel(**person.__dict__) for person in persons]

    with make_session() as session:
        session.add_all(persons)
        session.commit()
  • For question 2, I found this question to be useful, but would still want to know if this approach in general is good.
"""
Data mappers for Person
"""

# Abstract class for mapper
class Mapper(object):
    def __init__(self, data):
        self.data = data


# Data mapper for format A, maps the fields from dict_a to Person
class MapperA(Mapper):
    def __init__(self, data):
        self.name = ' '.join(data.get('name', {}).get(key) for key in ('first_name', 'last_name'))
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        needed = {'name', 'workEmail'}
        return needed.issubset(set(data))


# Data mapper for format B, maps the fields from dict_b to Person
class MapperB(Mapper):
    def __init__(self, data):
        self.name = data.get('fullName')
        self.email = data.get('workEmail')
        self.age = data.get('age')
        self.connected = data.get('connected')

    @classmethod
    def is_mapper_for(cls, data):
        needed = {'fullName', 'workEmail'}
        return needed.issubset(set(data))


# Creates a Person instance based on the input data mapping
def Person(data):
    for cls in Mapper.__subclasses__():
        if cls.is_mapper_for(data):
            return cls(data)
    raise NotImplementedError


if __name__ == '__main__':
    from database.connection import make_session
    from database.models import PersonModel

    # Sample data for example
    dict_a = {
        'name': {
            'first_name': 'John',
            'last_name': 'Doe'
        },
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }
    dict_b = {
        'fullName': 'John Doe',
        'workEmail': '[email protected]',
        'age': 50,
        'connected': False
    }

    # Instantiate Person from data and convert it to the database model
    persons = [PersonModel(**Person(data).__dict__) for data in (dict_a, dict_b)]

    with make_session() as session:
        session.add_all(persons)
        session.commit()
  • For question 2, I found this question to be useful, but would still want to know if this approach in general is good.
  • Added style improvement suggestions from @Reinderien
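One caveat with the `Mapper.__subclasses__()` lookup used in `Person`: it returns only the direct subclasses of `Mapper`, so a mapper that inherits from `MapperA` would never be tried. A minimal sketch of one alternative, assuming Python 3.6+ for `__init_subclass__`; the `registry` attribute, `MapperALegacy`, `find_mapper`, and the sample dictionary are hypothetical names introduced only for illustration:

```python
class Mapper:
    registry = []  # every subclass, direct or indirect, in definition order

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Mapper.registry.append(cls)


class MapperA(Mapper):
    @classmethod
    def is_mapper_for(cls, data):
        return {"name", "workEmail"}.issubset(data)


class MapperALegacy(MapperA):  # hypothetical second-level subclass
    @classmethod
    def is_mapper_for(cls, data):
        return {"name", "email"}.issubset(data)


def find_mapper(data, candidates):
    """Return the first candidate class that recognizes the data."""
    for cls in candidates:
        if cls.is_mapper_for(data):
            return cls
    raise NotImplementedError


# Hypothetical sample in a "legacy" variant of format A.
legacy = {
    "name": {"first_name": "John", "last_name": "Doe"},
    "email": "john@example.com",
}
```

Here `find_mapper(legacy, Mapper.registry)` finds `MapperALegacy`, while `find_mapper(legacy, Mapper.__subclasses__())` raises `NotImplementedError` because only `MapperA` is a direct subclass.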
Tweeted twitter.com/StackCodeReview/status/1082336323699163137
spelling and grammar
Source Link
Reinderien

Sourcing data fromat from multiple different structures

I have limited experience in python programming and I'm building my first scraper application for a data engineering project that needs to scale to storing hundreds of thousands of Persons from tens of different structures I was wondering if:

  1. This is a good solution? What could be the drawbacks and problems down the line?
  2. Currently I've implemented different subclasses for the mapping, but I suppose it would be better if I could just pass any dictionary when instantiating a Person class. It would detect if there is an appropriate mapping and apply it, alternatively it would raise NotImplementedError if it is an unknown mapping or KeyError when an existing mapping has been changed. Is there a convention or industry standard for these types of situations?

Sourcing data format from multiple different structures

I have limited experience in Python programming and I'm building my first scraper application for a data engineering project that needs to scale to storing hundreds of thousands of Persons from tens of different structures. I was wondering if:

  1. This is a good solution? What could be the drawbacks and problems down the line?
  2. Currently I've implemented different subclasses for the mapping, but I suppose it would be better if I could just pass any dictionary when instantiating a Person class. It would detect if there is an appropriate mapping and apply it; alternatively it would raise NotImplementedError if it is an unknown mapping or KeyError when an existing mapping has been changed. Is there a convention or industry standard for these types of situations?
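The behavior described in point 2 could be sketched roughly as follows. This is an illustration under stated assumptions, not an established convention: `MAPPINGS`, `person_fields`, and the two `_map_format_*` helpers are hypothetical names, and the mapping functions index the dictionary directly (rather than using `.get`) so that a field missing from a known format surfaces as `KeyError`.

```python
def _map_format_a(data):
    # Indexing (not .get) so a renamed or missing field raises KeyError.
    name = data["name"]
    return {
        "name": f"{name['first_name']} {name['last_name']}",
        "email": data["workEmail"],
        "age": data["age"],
        "connected": data["connected"],
    }


def _map_format_b(data):
    return {
        "name": data["fullName"],
        "email": data["workEmail"],
        "age": data["age"],
        "connected": data["connected"],
    }


# Hypothetical registry: (keys that identify a format, mapping function).
MAPPINGS = [
    ({"name", "workEmail"}, _map_format_a),
    ({"fullName", "workEmail"}, _map_format_b),
]


def person_fields(data):
    """Return normalized Person fields for the first matching format."""
    for signature, mapping in MAPPINGS:
        if signature.issubset(data):
            return mapping(data)
    raise NotImplementedError(f"no mapping for keys {sorted(data)}")
```

With this shape, supporting another source structure means adding one function and one `MAPPINGS` entry, and a dictionary with an unrecognized set of top-level keys fails loudly instead of producing a half-filled record.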
Post Reopened by mdfst13, Sᴀᴍ Onᴇᴌᴀ, Heslacher, Martin R, t3chb0t
Implemented solution from Update
Source Link
Found partial answer for question 2.
Source Link
added 5 characters in body
Source Link
mdfst13
Typos
Source Link
Updated the full solution under Implementation heading
Source Link
added 342 characters in body
Source Link
Post Closed as "Not suitable for this site" by Jamal
Source Link