\$\begingroup\$

I have written the following function to read multiple CSV files into pandas dataframes. Depending on the use case, the user can pass optional resampling (frequency and method) and/or a date range (start/end). For both options I'd like to check if both keywords were given and raise errors if not.

My issue is that reading the CSV files can take a long time, and it is frustrating to get the ValueError five minutes after calling the function. I could duplicate the if statements at the top of the function. However, I'd like to know whether there is a more readable way, or a best practice, that avoids repeating the same if/else statements.

import pandas as pd

def file_loader(filep,freq:str = None, method: str =None ,d_debut: str =None,d_fin: str =None):
    df= pd.read_csv(filep,\
                index_col=0,infer_datetime_format=True,parse_dates=[0],\
                header=0,names=['date',filep.split('.')[0]])\
                .sort_index()
    if d_debut is not None:
        if d_fin is None:
            raise ValueError("Please provide an end timestamp!")
        else:
            df=df.loc[ (df.index >= d_debut) & (df.index <= d_fin)]

    if freq is not None:
        if method is None:
            raise ValueError("Please provide a resampling method for the given frequency eg. 'last' ,'mean'")
        else:
            ## getattr is used to call ...resample(freq).last() etc. with the kwargs given as strings, e.g. freq='1D' and method='mean'
            df= getattr(df.resample(freq), method)()
    return df
Toby Speight
asked Oct 6, 2021 at 13:19
\$\endgroup\$

1 Answer

\$\begingroup\$

It's normal to do the parameter checking first:

def file_loader(filep, freq: str = None, method: str = None,
                d_debut: str = None, d_fin: str = None):
    if d_debut is not None and d_fin is None:
        raise ValueError("Please provide an end timestamp!")
    if freq is not None and method is None:
        raise ValueError("Please provide a resampling method for the given frequency e.g. 'last' ,'mean'")

You might prefer to check that d_fin and d_debut are either both provided or both defaulted, rather than allowing d_fin without d_debut as at present:

    if (d_debut is None) != (d_fin is None):
        raise ValueError("Please provide both start and end timestamps!")
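
If several pairs of keywords follow the same "both or neither" rule, the comparison above can be pulled into a small helper so the rule is written only once. This is just a sketch: `require_together` is a hypothetical name, not something from pandas or the original code. Note that it is stricter than the original check for `freq`/`method`, because it also rejects `method` given without `freq`.

```python
def require_together(**params):
    """Raise ValueError unless the given keyword values are all None or all set.

    Hypothetical helper for illustration; call it before any slow work.
    """
    missing = [name for name, value in params.items() if value is None]
    if missing and len(missing) != len(params):
        given = [name for name in params if name not in missing]
        raise ValueError(
            f"{', '.join(given)} given without {', '.join(missing)}; "
            "provide all of them or none."
        )


# Typical use at the very top of the loader:
#     require_together(d_debut=d_debut, d_fin=d_fin)
#     require_together(freq=freq, method=method)
```

Because the helper only looks at the arguments, it costs nothing to run before the file is opened.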

Then after loading, the conditionals are simple:

    if d_debut is not None:
        assert d_fin is not None  # would have thrown earlier
        # Element-wise comparison needs & with parentheses; `and` raises on arrays
        df = df.loc[(df.index >= d_debut) & (df.index <= d_fin)]

    if freq is not None:
        assert method is not None  # would have thrown earlier
        df = getattr(df.resample(freq), method)()

The assertions are there to document something we know to be true. You could omit them safely, but they do aid understanding.
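
Putting the pieces together, a complete version might look like the sketch below. A few things here are my assumptions rather than part of the original: the `name` parameter (the original derives the column name from the file name, which does not work for an in-memory buffer), the explicit `pd.Timestamp` conversion, the omission of `infer_datetime_format`, and the tiny `SAMPLE_CSV` used for demonstration.

```python
import io

import pandas as pd

# Made-up data, only for demonstration.
SAMPLE_CSV = """date,x
2021-01-01 00:00,1
2021-01-01 12:00,3
2021-01-02 00:00,5
2021-01-03 00:00,7
"""


def file_loader(filep, freq=None, method=None, d_debut=None, d_fin=None,
                name="value"):
    # 1. Validate arguments before the (potentially slow) read.
    if (d_debut is None) != (d_fin is None):
        raise ValueError("Please provide both start and end timestamps!")
    if freq is not None and method is None:
        raise ValueError("Please provide a resampling method for the given "
                         "frequency, e.g. 'last', 'mean'")

    # 2. Load the data.
    df = pd.read_csv(filep, index_col=0, parse_dates=[0],
                     header=0, names=["date", name]).sort_index()

    # 3. Apply the options; the checks above guarantee the pairs are complete.
    if d_debut is not None:
        start, end = pd.Timestamp(d_debut), pd.Timestamp(d_fin)
        df = df.loc[(df.index >= start) & (df.index <= end)]
    if freq is not None:
        df = getattr(df.resample(freq), method)()
    return df


daily = file_loader(io.StringIO(SAMPLE_CSV), freq="1D", method="mean")
```

With this ordering, a call such as `file_loader("does_not_exist.csv", freq="1D")` fails with the `ValueError` about the missing method straight away, without ever touching the file.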


It may make sense to provide a useful default method rather than None. Similarly, could the (df.index >= d_debut) & (df.index <= d_fin) test be adapted to be happy with one or the other cutoff missing? For example:

    # Apply each cutoff independently, so either one may be omitted
    if d_debut is not None:
        df = df.loc[df.index >= d_debut]
    if d_fin is not None:
        df = df.loc[df.index <= d_fin]

Then we wouldn't need the parameter checks at all.
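
As a self-contained illustration of such an open-ended filter, here is a sketch that builds one boolean mask and narrows it for each bound that is present. The helper name `clip_dates` and the explicit `pd.Timestamp` conversion are my own choices for the example, not part of the original function.

```python
import numpy as np
import pandas as pd


def clip_dates(df, d_debut=None, d_fin=None):
    """Keep rows whose index lies in [d_debut, d_fin]; a bound of None is open."""
    mask = np.ones(len(df), dtype=bool)
    if d_debut is not None:
        mask &= df.index >= pd.Timestamp(d_debut)
    if d_fin is not None:
        mask &= df.index <= pd.Timestamp(d_fin)
    return df.loc[mask]


# Five daily rows, 2021-01-01 .. 2021-01-05, made up for demonstration.
demo = pd.DataFrame({"x": range(5)},
                    index=pd.date_range("2021-01-01", periods=5, freq="D"))
from_third = clip_dates(demo, d_debut="2021-01-03")
```

Both bounds are inclusive here, matching the `>=` and `<=` in the original code.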


Code style - please take more care with whitespace (consult PEP 8) for maximum readability. Lots of this code seems unnecessarily bunched-up.

answered Oct 6, 2021 at 14:20
\$\endgroup\$
  • \$\begingroup\$ Thanks for your answer. I'd rather the function force the user to understand exactly what data he is getting, hence the strict checks. I don't understand why you have assert statements. As you've commented, errors would have been raised at the top. And my bad for the styling. \$\endgroup\$ Commented Oct 6, 2021 at 14:29
  • \$\begingroup\$ The asserts are there to document something we know to be true. You could omit them safely, but they do aid understanding. \$\endgroup\$ Commented Oct 6, 2021 at 14:40
