
Hello Stack Overflow folks, I hope this question has not already been answered. After half a day of googling, I resigned myself to asking here. My problem is the following:

I want to create a class which takes some information and processes this information:

# Class definition for an instance of raw data
class raw_data():
    def __init__(self, filename_rawdata, filename_metadata,
                 file_format, path, category, df_raw, df_meta):
        self.filename_rawdata = filename_rawdata
        self.filename_metadata = filename_metadata
        self.file_format = file_format
        self.path = path
        self.category = category
        self.df_raw = getDF(self.filename_rawdata)
        self.df_meta = getDF(self.filename_metadata)

    # generator
    def parse(self, path):
        g = gzip.open(path, 'rb')
        for l in g:
            yield eval(l)

    # function that returns a pandas dataframe with the data
    def getDF(self, filename):
        i = 0
        df = {}
        for d in self.parse(filename):
            df[i] = d
            i += 1
        return pd.DataFrame.from_dict(df, orient='index')

Now I have a problem with the __init__ method: I would like the methods below to run by default when the class is instantiated, but I cannot get this working. I have seen several other posts here, such as "Calling a class function inside of __init__" and "Python 3: Calling a class function inside of __init__", but I am still not able to do it. The approach from the first question did work for me, but I would like to access the instance variables after the constructor has run.

I tried this:

class raw_data():
    def __init__(self, filename_rawdata, filename_metadata,
                 file_format, path, category):
        self.filename_rawdata = filename_rawdata
        self.filename_metadata = filename_metadata
        self.file_format = file_format
        self.path = path
        self.category = category
        getDF(self.filename_rawdata)
        getDF(self.filename_metadata)

    # generator
    def parse(self, path):
        g = gzip.open(path, 'rb')
        for l in g:
            yield eval(l)

    # function that returns a pandas dataframe with the data
    def getDF(self, filename):
        i = 0
        df = {}
        for d in self.parse(filename):
            df[i] = d
            i += 1
        return pd.DataFrame.from_dict(df, orient='index')

But I get an error because getDF is not defined (obviously). I hope this question is not silly. I need to do it this way because afterwards I want to create around 50-60 instances, and I do not want to repeat Instance.getDF() ... for every instance; I would rather have it called automatically on construction.

asked Feb 26, 2019 at 15:19
  • self.getDF(...) Commented Feb 26, 2019 at 15:24
  • As an aside, raw_data.getDF can be reduced to return pd.DataFrame.from_dict(dict(enumerate(self.parse(filename))), orient='index') or (I think) return pd.DataFrame.from_records(list(self.parse(filename))). Commented Feb 26, 2019 at 15:29
  • Also, neither parse nor getDF need to be methods of your class; they could be defined as regular functions outside the class, or, if you really want to keep them in your class's namespace, be made static methods. Commented Feb 26, 2019 at 15:32
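
To illustrate the last comment, here is a minimal sketch of parse written as a plain module-level function. It assumes each line of the file holds one Python dict literal. The sample file name and the asin/overall fields are made up for the demonstration, and ast.literal_eval is swapped in for eval because it only accepts literals and does not run arbitrary expressions.

```python
import ast
import gzip
import os
import tempfile


def parse(path):
    """Yield one Python object per line of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # literal_eval accepts only literals (dicts, lists, strings,
            # numbers), so a malicious line cannot execute code.
            yield ast.literal_eval(line)


# Build a small throwaway input file so the sketch runs on its own.
# The field names below are hypothetical sample data.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as out:
        out.write("{'asin': 'A1', 'overall': 5.0}\n")
        out.write("{'asin': 'B2', 'overall': 3.0}\n")
    # Consume the generator while the temporary file still exists.
    rows = list(parse(sample_path))

print(rows)
```

Because parse never touches self, nothing is lost by moving it out of the class; a getDF function could then pass the resulting list of dicts to pandas.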

1 Answer


All you need to do is call getDF like any other method, using self as the object on which it is invoked.

self.df_raw = self.getDF(self.filename_rawdata)
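
The difference can be seen in a stripped-down sketch. The Broken and Fixed classes and their load method are hypothetical stand-ins for raw_data and getDF, kept free of file access so the example runs anywhere.

```python
class Broken:
    def __init__(self, name):
        # Bare call: Python searches the local, module and builtin
        # scopes for `load`, never the class, so this raises NameError.
        self.data = load(name)

    def load(self, name):
        return name.upper()


class Fixed:
    def __init__(self, name):
        # Calling through `self` looks the method up on the instance.
        self.data = self.load(name)

    def load(self, name):
        return name.upper()


try:
    Broken("reviews")
    error = None
except NameError as exc:
    error = str(exc)

fixed = Fixed("reviews")
print(error)
print(fixed.data)  # REVIEWS
```

Methods live in the class namespace, which is not part of the scope chain used inside a function body. That is why the lookup has to go through self (or the class name).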

That said, this class could be greatly simplified by making it a dataclass.

import gzip
from dataclasses import dataclass

import pandas as pd


@dataclass
class RawData:
    filename_rawdata: str
    filename_metadata: str
    path: str
    category: str

    def __post_init__(self):
        # Runs right after the generated __init__; load both frames here.
        self.df_raw = self.getDF(self.filename_rawdata)
        self.df_meta = self.getDF(self.filename_metadata)

    @staticmethod
    def parse(path):
        # Yield one evaluated record per line of the gzip file.
        with gzip.open(path, 'rb') as g:
            yield from map(eval, g)

    @staticmethod
    def getDF(filename):
        # One row per parsed record, indexed 0..n-1.
        return pd.DataFrame.from_records(list(RawData.parse(filename)))

The auto-generated __init__ method will set the four defined attributes for you. __post_init__ will be called after __init__, giving you the opportunity to call getDF on the two given file names.
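
The call order can be checked with a small sketch that needs no files. The Record class, its fields, and derive_stem are hypothetical; derive_stem stands in for an expensive loader such as getDF.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    filename: str
    category: str
    # Derived attribute: excluded from the generated __init__ signature
    # and filled in by __post_init__ instead.
    stem: str = field(init=False)

    def __post_init__(self):
        # Called automatically once the generated __init__ has assigned
        # filename and category.
        self.stem = self.derive_stem(self.filename)

    @staticmethod
    def derive_stem(filename):
        # Hypothetical stand-in for a loader such as getDF.
        return filename.split(".")[0]


rec = Record("reviews.json.gz", "Books")
print(rec.stem)  # reviews
```

Declaring the derived attribute with field(init=False) is optional, but it makes the attribute show up in the generated repr and comparison methods.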

answered Feb 26, 2019 at 15:42

1 Comment

Ah shit, I looked at the code x times but somehow did not see that I was missing the "self." before the function call :-( Sorry, this was stupid of me. I had never heard of dataclasses before; this was an eye-opener for me. Thank you so much!
