
For an assignment, I am identifying the first quarter of the 2008 recession in the United States. The Excel data I'm using can be downloaded here: gdplev.xls. How can I improve this pandas code to make it more idiomatic or optimized?

import pandas as pd

def get_recession_start():
    '''Returns the year and quarter of the recession start time as a
    string value in a format such as 2005q3'''
    GDP_df = pd.read_excel("gdplev.xls",
                           names=["Quarter", "GDP in 2009 dollars"],
                           parse_cols = "E,G",
                           skiprows = 7)
    GDP_df = GDP_df.query("Quarter >= '2000q1'")
    GDP_df["Growth"] = GDP_df["GDP in 2009 dollars"].pct_change()
    GDP_df = GDP_df.reset_index(drop=True)
    # recession defined as two consecutive quarters of negative growth
    GDP_df["Recession"] = (GDP_df.Growth < 0) & (GDP_df.Growth.shift(-1) < 0)
    return GDP_df.iloc[GDP_df["Recession"].idxmax()]["Quarter"]

get_recession_start()
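To make the "two consecutive quarters of negative growth" rule concrete, here is a small sketch on made-up GDP numbers (the values and quarter labels are illustrative only, not taken from gdplev.xls). It shows that shift(-1) lines each quarter up with the following quarter's growth, so an isolated one-quarter dip is not flagged.

```python
import pandas as pd

# Made-up GDP values for illustration; NOT data from gdplev.xls.
gdp = pd.Series(
    [100.0, 101.0, 100.5, 101.5, 102.0, 101.0, 99.5, 98.0],
    index=["2007q1", "2007q2", "2007q3", "2007q4",
           "2008q1", "2008q2", "2008q3", "2008q4"],
)

growth = gdp.pct_change()  # first value is NaN: there is no previous quarter

# shift(-1) moves the next quarter's growth onto the current row, so a row is
# True only if this quarter AND the following one both shrank.
recession = (growth < 0) & (growth.shift(-1) < 0)

print(recession["2007q3"])  # a single negative quarter is not enough
print(recession.idxmax())   # label of the first True value
```

Here 2007q3 shrinks but 2007q4 grows again, so the first flagged quarter is 2008q2.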
Georgy
asked Feb 15, 2019 at 7:05

1 Answer


Your function does too many things: reading the Excel file, filtering the necessary rows, and calculating the recession start. My advice is to move the first two out of the function.

Also, pass the quarters and GDP values to the function as separate objects (pd.Series) instead of a DataFrame. This way you remove the hardcoded column names from your function and, more importantly, you get rid of the SettingWithCopyWarning that you probably get right now when adding the "Growth" column to the filtered DataFrame:

df = pd.read_excel('gdplev.xls',
                   names=['Quarter', 'GDP in 2009 dollars'],
                   usecols='E,G',
                   skiprows=7)
mask = df['Quarter'] >= '2000q1'
print(get_recession_start(quarters=df.loc[mask, 'Quarter'],
                          gdps=df.loc[mask, 'GDP in 2009 dollars']))

Note that I use usecols instead of parse_cols, as the latter is deprecated. Also, I replaced df.query with boolean masking and .loc.
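For reference, a minimal check on a made-up frame that the boolean mask selects the same rows as the original query call. It relies on the fact that labels in the YYYYqN form sort correctly as plain strings, because the four-digit year comes first and the quarter is a single digit. engine="python" is passed to query here only so the sketch does not depend on whether numexpr is installed.

```python
import pandas as pd

# Toy frame with the same column names as above; the values are made up.
df = pd.DataFrame({
    "Quarter": ["1999q3", "1999q4", "2000q1", "2000q2", "2010q1"],
    "GDP in 2009 dollars": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Plain string comparison works for "YYYYqN" labels (fixed-width year first).
mask = df["Quarter"] >= "2000q1"

via_query = df.query("Quarter >= '2000q1'", engine="python")
via_loc = df.loc[mask]

# Both keep the original index labels (2, 3, 4) and the same values.
print(via_loc["Quarter"].tolist())  # ['2000q1', '2000q2', '2010q1']
```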

Then, the function would look like this:

def get_recession_start(quarters: pd.Series,
                        gdps: pd.Series) -> str:
    """
    Returns the year and quarter of the recession start time
    as a string value in a format such as 2005q3
    """
    growth = gdps.pct_change()
    recession = (growth < 0) & (growth.shift(-1) < 0)
    recession = recession.reset_index(drop=True)
    return quarters.iloc[recession.idxmax()]
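A quick usage sketch of this function on in-memory data, in case gdplev.xls is not at hand. The numbers are made up, and the index deliberately starts at 10 to mimic the rows left over after masking. Because recession is reset to a 0-based index, idxmax returns a position that quarters.iloc can use even though quarters keeps its original labels.

```python
import pandas as pd


def get_recession_start(quarters: pd.Series,
                        gdps: pd.Series) -> str:
    """
    Returns the year and quarter of the recession start time
    as a string value in a format such as 2005q3
    """
    growth = gdps.pct_change()
    recession = (growth < 0) & (growth.shift(-1) < 0)
    recession = recession.reset_index(drop=True)
    return quarters.iloc[recession.idxmax()]


# Made-up data; the index starts at 10, like rows surviving the
# "Quarter >= '2000q1'" mask, so it is not a 0-based RangeIndex.
df = pd.DataFrame(
    {"Quarter": ["2008q1", "2008q2", "2008q3", "2008q4", "2009q1"],
     "GDP in 2009 dollars": [102.0, 103.0, 101.0, 99.0, 99.5]},
    index=range(10, 15),
)
start = get_recession_start(quarters=df["Quarter"],
                            gdps=df["GDP in 2009 dollars"])
print(start)  # 2008q3
```

One caveat worth checking: if no quarter qualifies, recession is all False and idxmax returns 0, so the function silently returns the first quarter. Checking recession.any() first is one way to guard against that.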

Here I also used triple double quotes for the docstring, as PEP 257 recommends, and added type hints. IMHO, this looks much cleaner.

It would probably also make sense to return only the recession.idxmax() index and get the corresponding quarter value outside of the function.
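A sketch of that variant, using a hypothetical name, get_recession_start_position. It takes only the GDP values and returns a 0-based position, leaving the quarter lookup to the caller. The ValueError for the no-recession case is an extra assumption of this sketch, not part of the code above.

```python
import pandas as pd


def get_recession_start_position(gdps: pd.Series) -> int:
    """Hypothetical helper: return the 0-based position of the first quarter
    that starts two consecutive quarters of negative GDP growth."""
    growth = gdps.pct_change()
    recession = (growth < 0) & (growth.shift(-1) < 0)
    if not recession.any():
        # Assumption of this sketch: fail loudly instead of returning 0.
        raise ValueError("no two consecutive quarters of negative growth")
    # argmax on a boolean array gives the position of the first True value.
    return int(recession.to_numpy().argmax())


# Made-up data; the caller maps the position back to a quarter label.
quarters = pd.Series(["2008q1", "2008q2", "2008q3", "2008q4", "2009q1"])
gdps = pd.Series([102.0, 103.0, 101.0, 99.0, 99.5])
pos = get_recession_start_position(gdps)
print(pos, quarters.iloc[pos])  # 2 2008q3
```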

answered Feb 15, 2019 at 13:01