I want to create a special type of object, based on the pandas.DataFrame
object, which will always be created based on a particular input file type.
I have been able to design a class that can be created the same way as a normal DataFrame, i.e.:
    class CustomDF(pd.DataFrame):
        ...

    Obj = CustomDF({'a':[1,2],'b':[3,4]})
But I want to change the initialization behaviour to accept a csv filename and import it. I know Pandas allows this using:
df = pd.read_csv(filename)
But I can't get it to work within my new class when I do:
    class CustomDF(pd.DataFrame):
        def __init__(self, filename):
            self = pd.read_csv(filename)
Although there is no error when I create an object with this class, I get the error 'CustomDF' object has no attribute '_data' when I try to access it.
I have tried changing self = pd.read_csv(filename) to self._data = pd.read_csv(filename) or self.data = pd.read_csv(filename), but this doesn't have any effect.
What is the proper way to accomplish this? Is there a better approach to doing this same thing?
1 Answer
I tried your code and the error I got was
main:4: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
This basically means that pandas has an issue with the following step:
    self.data = pd.read_csv(filename)
The reason is that your CustomDF is a dataframe. Think of a dataframe df. When you access df.someColumnName, you get the values of that column. When you try to do something like df.someColumnName = something, you are trying to create a new column, and pandas does not let you create a column that way.
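To make the difference concrete, here is a small sketch that is not taken from the original code. The dataframe and the names c and d are made up for illustration. It assigns a list through attribute access and then through item access, and captures the warning instead of printing it.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Attribute assignment: pandas stores a plain Python attribute, not a column
    df.c = [5, 6]

# Item assignment: this is the supported way to create a new column
df["d"] = [7, 8]

# 'c' is absent from the columns, 'd' is present
print(list(df.columns))
# Any warnings pandas raised for the attribute assignment
print([str(w.message) for w in caught])
```

So the value assigned through the attribute still exists on the object, but it is not part of the table.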
I removed the inheritance from pd.DataFrame in CustomDF and it works fine.
    import pandas as pd

    class CustomDF():
        def __init__(self, filename):
            self.data = pd.read_csv(filename)

    csdf = CustomDF("breast_cancer_wisconsin.csv")
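If the object itself has to be a DataFrame, one possible alternative is to keep the inheritance and leave __init__ untouched. Writing self = pd.read_csv(filename) inside __init__ only rebinds the local name self, so the instance that was already created never gets initialised. The sketch below is an assumption about what would suit the question, not the only way to do it. The classmethod name from_csv_file is hypothetical, and _constructor is the property pandas describes for subclasses so that operations return the subclass again. An in-memory buffer stands in for a real CSV file.

```python
import io

import pandas as pd


class CustomDF(pd.DataFrame):
    @property
    def _constructor(self):
        # Make pandas operations (slicing, head(), ...) return CustomDF again
        return CustomDF

    @classmethod
    def from_csv_file(cls, filepath_or_buffer, **kwargs):
        # Hypothetical helper name: parse with read_csv, then wrap the result
        return cls(pd.read_csv(filepath_or_buffer, **kwargs))


# An in-memory buffer stands in for a real CSV file here
csv_text = io.StringIO("a,b\n1,3\n2,4\n")
custom = CustomDF.from_csv_file(csv_text)

print(type(custom).__name__)
print(type(custom[["a"]]).__name__)
```

With this variant CustomDF({'a':[1,2],'b':[3,4]}) keeps working as before, and the CSV path goes through the separate constructor.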
3 Comments
If I do df_custom = CustomDF('./myfile.csv'), then df_custom is not my object like I would hope, but I would have to access it using df_custom.data instead, right?
The pd.DataFrame properties will be available in df_custom.data instead.
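If typing df_custom.data everywhere is the main annoyance, one option is to forward unknown attributes to the wrapped dataframe. This is only a sketch under that assumption, and the __getattr__ forwarding is not part of the answer above. Special methods such as len() or df_custom['a'] are not forwarded this way and would need their own methods.

```python
import io

import pandas as pd


class CustomDF:
    def __init__(self, filepath_or_buffer):
        self.data = pd.read_csv(filepath_or_buffer)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        if name == "data":
            # Avoid infinite recursion if 'data' has not been set yet
            raise AttributeError(name)
        return getattr(self.data, name)


# An in-memory buffer stands in for './myfile.csv'
df_custom = CustomDF(io.StringIO("a,b\n1,3\n2,4\n"))

print(df_custom.shape)          # forwarded to df_custom.data.shape
print(list(df_custom.columns))  # forwarded to df_custom.data.columns
```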