Code Review

More specific title; minor spelling fixes
Toby Speight

Read CSV, with date filtering and resampling

I have written the following function to read multiple CSV files into pandas dataframes. Depending on the use case, the user can pass optional resampling (frequency and method) and/or a date range (start/end). For both options I'd like to check if both keywords were given and raise errors if not.

My issue is that reading the CSV files can potentially take quite a bit of time and it's quite frustrating to get the value error 5 minutes after you've called the function. I could duplicate the if statements at the top of the function. However I'd like to know if there is a more readable way or a best practice that avoids having the same if/else statements multiple times.

import pandas as pd

def file_loader(filep,freq:str = None, method: str =None ,d_debut: str =None,d_fin: str =None):
    df= pd.read_csv(filep,\
        index_col=0,infer_datetime_format=True,parse_dates=[0],\
        header=0,names=['date',filep.split('.')[0]])\
        .sort_index()
    if d_debut is not None:
        if d_fin is None:
            raise ValueError("Please provide an end timestamp!")
        else:
            df=df.loc[ (df.index >= d_debut) & (df.index <= d_fin)]

    if freq is not None:
        if method is None:
            raise ValueError("Please provide a resampling method for the given frequency eg. 'last' ,'mean'")
        else:
            ## getattr is used to call ...resample(freq).last() etc. with the kwargs given as strings, e.g. freq='1D' and method='mean'
            df= getattr(df.resample(freq), method)()
    return df
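
To make the "check first, read later" idea concrete, here is a minimal sketch of one way the paired-keyword checks might be pulled out so they run before the slow read. The names `require_together` and `validate_loader_args` are hypothetical and not part of the function above. Note one assumed difference from the posted code: this sketch also rejects `d_fin` without `d_debut` and `method` without `freq`, which the posted code silently accepts.

```python
def require_together(**options):
    """Raise ValueError if only some of the given options were supplied."""
    missing = [name for name, value in options.items() if value is None]
    # All missing or none missing is fine; a mix means an incomplete pair.
    if missing and len(missing) < len(options):
        given = [name for name in options if name not in missing]
        raise ValueError(
            f"{', '.join(given)} given without {', '.join(missing)}"
        )


def validate_loader_args(freq=None, method=None, d_debut=None, d_fin=None):
    # Hypothetical helper: intended to be the first statement of
    # file_loader, before pd.read_csv, so a bad call fails immediately.
    require_together(d_debut=d_debut, d_fin=d_fin)
    require_together(freq=freq, method=method)
```

With something like this at the top of `file_loader`, the later blocks would only need `if d_debut is not None:` and `if freq is not None:`, since an incomplete pair could no longer reach them.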

kubatucka
