My goal is to create an object that behaves the same as a Pandas DataFrame, but with a few extra methods of my own on top of it. As far as I understand, one approach would be to extend the class, which I first tried to do as follows:
    class CustomDF(pd.DataFrame):
        def __init__(self, filename):
            self = pd.read_csv(filename)
But I get errors when trying to view this object, saying: 'CustomDF' object has no attribute '_data'.
My second iteration was to not inherit from DataFrame, but rather to store a DataFrame in one of the object's attributes and have the methods work around it, like this:
    class CustomDF():
        def __init__(self, filename):
            self.df = pd.read_csv(filename)
        def custom_method_1(self,a,b,...):
            ...
        def custom_method_2(self,a,b,...):
            ...
This is fine, except that for all custom methods, I need to access the self.df attribute first to do anything on it, but I would prefer that my custom dataframe were just self.
Is there a way that this can be done? Or is this approach not ideal anyway?
4 Answers
The __init__ method is overridden in your first example, so the DataFrame's own initialization never runs. Use super and then add your custom code:
    class CustomDF(pd.DataFrame):
        def __init__(self, *args, **kw):
            super(CustomDF, self).__init__(*args, **kw)
            # Your code here
        def custom_method_1(self,a,b,...):
            ...
Thanks, I tried this though and didn't quite get what I expected. How would I put in my line where I want to assign pd.read_csv(filename) to self? – teepee, Nov 29, 2018 at 17:59
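One way to get the result of pd.read_csv into a subclass instance is sketched below. It is only a sketch under some assumptions: the `load_csv` classmethod and the `column_ranges` method are made-up names for illustration, not part of pandas. The `_constructor` property is the hook pandas documents for subclasses, so that operations returning a new frame hand back the subclass rather than a plain DataFrame.

```python
import io

import pandas as pd


class CustomDF(pd.DataFrame):
    """DataFrame subclass with extra methods (illustrative only)."""

    @property
    def _constructor(self):
        # pandas calls this when an operation needs to build a new frame,
        # so slices, copies, etc. come back as CustomDF.
        return CustomDF

    @classmethod
    def load_csv(cls, filepath_or_buffer, **kwargs):
        # Hypothetical helper: parse the CSV, then wrap the parsed frame.
        return cls(pd.read_csv(filepath_or_buffer, **kwargs))

    def column_ranges(self):
        # Hypothetical custom method: max minus min for each numeric column.
        numeric = self.select_dtypes("number")
        return numeric.max() - numeric.min()


# An in-memory buffer stands in for a real file path here.
csv_text = io.StringIO("a,b\n1,10\n2,20\n3,40\n")
cdf = CustomDF.load_csv(csv_text)
ranges = cdf.column_ranges()
head = cdf.head(2)
```

The classmethod is used instead of changing the `__init__` signature because, once `_constructor` returns the subclass, pandas itself calls `CustomDF(...)` with data rather than a filename, and an `__init__(self, filename)` would likely break on those internal calls.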
Is this what you were looking for?
    class CustomDF:
        def __init__(self, filename):
            # filename must be passed in; it was undefined before
            self.df = pd.read_csv(filename)
        def custom_method_1(self, *args, **kwargs):
            result_1 = do_custom_operations_on(self.df, *args, **kwargs)
            return result_1
        def custom_method_2(self, *args, **kwargs):
            result_2 = do_custom_operations_on(self.df, *args, **kwargs)
            return result_2
        ...
I would probably go with the decorator pattern here. The accepted answer for this post will put you on the right track.
I see that your first iteration would be really cool, but it seems to me you need to know quite a lot about Pandas' internals, e.g., that this _data attribute needs to be set in a certain way.
Cheers.
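For what it's worth, here is a rough sketch of what a wrapper along those lines might look like. It is an assumption about the general idea, not the code from the linked post: unknown attributes are forwarded to the wrapped DataFrame via `__getattr__`, so callers do not have to go through `.df` for ordinary DataFrame attributes. The `row_total` method is a made-up example.

```python
import io

import pandas as pd


class CustomDF:
    """Wrapper that forwards unknown attributes to a wrapped DataFrame."""

    def __init__(self, filepath_or_buffer):
        self._df = pd.read_csv(filepath_or_buffer)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so custom methods take priority.
        # The guard avoids infinite recursion if _df has not been set yet.
        if name == "_df":
            raise AttributeError(name)
        return getattr(self._df, name)

    def __getitem__(self, key):
        # Special methods are looked up on the type, not via __getattr__,
        # so indexing has to be forwarded explicitly.
        return self._df[key]

    def row_total(self):
        # Hypothetical custom method: sum across each row.
        return self._df.sum(axis=1)


# An in-memory buffer stands in for a real file path here.
wrapped = CustomDF(io.StringIO("a,b\n1,10\n2,20\n"))
shape = wrapped.shape            # forwarded to the DataFrame
col_a = wrapped["a"].tolist()    # forwarded through __getitem__
totals = wrapped.row_total().tolist()
```

One caveat with this design: DataFrame methods that return a new frame (for example `wrapped.head()`) return a plain DataFrame, not the wrapper, so the custom methods are not available on the result.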
In my project I did something similar and used decorators, like manu suggested. The decorator @property might work for you; it basically turns the method .df() into a property .df, so the file is only read when .df is accessed (and, as written, it is re-read on every access). But this only works on instances of the class.
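    class CustomDF:
        def __init__(self, filename):
            # Keep the path so the property can read the file later
            self.filename = filename

        @property
        def df(self):
            # Reads the CSV each time the property is accessed
            return pd.read_csv(self.filename)

        def custom_method_1(self, *args, **kwargs):
            result_1 = do_custom_operations_on(self.df, *args, **kwargs)
            return result_1

        def custom_method_2(self, *args, **kwargs):
            result_2 = do_custom_operations_on(self.df, *args, **kwargs)
            return result_2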
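Since a plain property runs its body on every access, the CSV above would be parsed again each time .df is used. If that matters, one possible variation is to cache the result with `functools.cached_property` (Python 3.8+). This is just a sketch; the class name `LazyCustomDF` and the `total` method are made up for illustration.

```python
import io
from functools import cached_property

import pandas as pd


class LazyCustomDF:
    """Reads the CSV on first access to .df and reuses it afterwards."""

    def __init__(self, source):
        # source may be a path or a file-like object (hypothetical name).
        self._source = source

    @cached_property
    def df(self):
        # Runs once; the result is stored on the instance after that.
        return pd.read_csv(self._source)

    def total(self, column):
        # Hypothetical custom method: sum of one column.
        return int(self.df[column].sum())


# An in-memory buffer stands in for a real file path here.
lazy = LazyCustomDF(io.StringIO("a,b\n1,10\n2,20\n"))
first = lazy.df
second = lazy.df   # same object, the file is not parsed again
a_total = lazy.total("a")
```

The trade-off is that the cached frame will not pick up later changes to the file unless the cached attribute is deleted (`del lazy.df`) so that it is read again.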
[...] super().__init__(...) in your overridden init so that the rest of the setup that occurs in pd.DataFrame.__init__() also happens in your custom class.
[...] pd.read_csv(filename) portion of this code?
[...] super(CustomDF, self).__init__(pd.read_csv(filename)). Would this be an appropriate solution, or does this introduce any problems?