I've got something really simple this time where I'm mapping pandas `Series` to dataclasses with a one-liner helper function (as I have several models):

    import pandas as pd
    from typing import Any, List
    from dataclasses import dataclass, fields

    def create_dataclass(data: pd.Series, factory: Any) -> Any:
        # Look up one Series entry per dataclass field and pass them as keyword arguments.
        return factory(**{f.name: data[f.name] for f in fields(factory)})

I call it like this:

    @dataclass
    class Person:
        first_name: str
        last_name: str

    @dataclass
    class Employee(Person):
        company: str

    def create_employees(data: pd.DataFrame) -> List[Employee]:
        return [create_dataclass(r, Employee) for i, r in data.iterrows()]

Do you think it could still be more Pythonic?
2 Answers
Looks very pythonic to me.
Thumbs up, LGTM, ship it!
Ok, fine, I have a few minor remarks.
Maybe the `Any` annotations could be finessed a bit to be more informative? Or maybe just drop the `-> Any`.
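One way the `Any` annotations might be finessed is with a `TypeVar`, so that a type checker can infer that passing `Employee` gives back an `Employee`. This is only a sketch: the name `T` is my own assumption rather than anything from the original code, and a strict checker may additionally want `T` restricted to dataclass types.

```python
from dataclasses import dataclass, fields
from typing import Type, TypeVar

import pandas as pd

# Hypothetical type variable: stands for "whatever class is passed as factory".
T = TypeVar("T")


def create_dataclass(data: pd.Series, factory: Type[T]) -> T:
    # Same body as the original helper; only the annotations changed.
    return factory(**{f.name: data[f.name] for f in fields(factory)})


@dataclass
class Person:
    first_name: str
    last_name: str


@dataclass
class Employee(Person):
    company: str


# Illustrative row; the values are made up.
row = pd.Series({"first_name": "Ada", "last_name": "Lovelace", "company": "Acme"})
employee = create_dataclass(row, Employee)  # a checker can now see this as Employee
```

The runtime behaviour is unchanged, so this only pays off if you actually run a type checker over the code.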

    return factory(**{f.name: data[f.name] for f in fields(factory)})

The `**` double star is as pythonic as it gets. But notice that what we really care about is `name`.
So perhaps

    from operator import attrgetter
    ...
        return factory(**{name: data[name] for name in map(attrgetter('name'), fields(factory))})

Hmmm, not sure that longer works out to a win. Prolly better to keep the code as-is.

    ... for _, r in data.iterrows()]

nit: Prefer `row` over `r`. Whatever.
Like I said, ship it.
This looks great, exactly what I need, I'm stealing it. :)
I'm using this in a base class `PandasClass`, with the functions that create dataclass instances as classmethods. In this way, any dataclass that inherits from `PandasClass` gets `create_employees` (and similar) "for free".
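
    import pandas as pd
    from dataclasses import dataclass, fields
    from typing import List, Self  # typing.Self requires Python 3.11+

    @dataclass
    class PandasClass:
        @classmethod
        def create_dataclass(cls, row: pd.Series) -> Self:
            # Build one instance from the row entries named after the dataclass fields.
            return cls(**{f.name: row[f.name] for f in fields(cls)})

        @classmethod
        def create_dataclass_list(cls, dataframe: pd.DataFrame) -> List[Self]:
            # One instance per DataFrame row.
            return [cls.create_dataclass(row) for _, row in dataframe.iterrows()]
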
So in the OP's example, if `Person` inherits from `PandasClass`

    class Person(PandasClass):
        ...

then we can call `Employee.create_dataclass_list(df)` in place of the original `create_employees(df)`.