I have the following df with data about the American quarterly GDP in billions of chained 2009 dollars, from 1947q1 to 2016q2:
df = pd.DataFrame(data = [1934
This is probably overkill for this particular example... In a nutshell this is a bit more complicated than @zaq's answer but also much faster (about 9x here, and the difference would be much bigger on larger datasets) because it's vectorized instead of looped. But for this very small dataset here, clearly you would go with the simpler answer since even the slower way is fast enough. Finally, it stores the data in the dataframe itself rather than as a tuple (which could be an advantage or disadvantage, depending on the situation).
Thanks to @zaq for pointing out that I misread the question initially. I believe this now gives the same answer as zaq's except we have different implicit assumptions about the initial state of the world (beginning in recession or not) which is indeterminate in the data provided.
df['change'] = df.diff() # change in GDP from prior quarter
start = (df.change<0) & (df.change.shift(-1)<0) # potential start
end = (df.change>0) & (df.change.shift(-1)>0) # potential end
df['recess' ] = np.nan
df.loc[ start, 'recess' ] = -1
df.loc[ end, 'recess' ] = 1
df['recess'] = df.recess.ffill() # if the current row doesn't fit the
# definition of a start or end, then
# fill it with the prior row value
df['startend'] = np.nan
df.loc[ (df.recess==-1) & (df.recess.shift()== 1), 'startend'] = -1 # start
df.loc[ (df.recess== 1) & (df.recess.shift()==-1), 'startend'] = 1 # end
df[df.startend.notnull()]
GDP change recess startend
quarter
1947q4 1960.7 30.4 1.0 1.0
1949q1 2007.5 -27.8 -1.0 -1.0
1950q1 2084.6 79.9 1.0 1.0
1953q3 2578.9 -14.6 -1.0 -1.0
1954q2 2530.7 2.7 1.0 1.0
1957q4 2846.4 -29.5 -1.0 -1.0
1958q2 2790.9 18.2 1.0 1.0
1969q4 4715.5 -20.6 -1.0 -1.0
1970q2 4715.4 8.3 1.0 1.0
1974q3 5378.7 -52.6 -1.0 -1.0
1975q2 5333.2 40.8 1.0 1.0
1980q2 6392.6 -132.3 -1.0 -1.0
1980q4 6501.2 118.3 1.0 1.0
1981q4 6585.1 -77.8 -1.0 -1.0
1982q4 6493.1 6.3 1.0 1.0
1990q4 8907.4 -76.5 -1.0 -1.0
1991q2 8934.4 68.8 1.0 1.0
2008q3 14891.6 -71.8 -1.0 -1.0
2009q3 14402.5 46.9 1.0 1.0
One issue with your code is that you are not tracking the current status of the economy. If there are 20 consecutive quarters of GDP decline, your code will report 18 recessions beginning. And if there are 20 quarters of growth, it will report 18 recessions ending even if there wasn't one to begin with.
So, I introduce a Boolean recession
to indicate whether we are in recession currently. Other changes: chained inequalities like a < b < c
work in Python as expected, and improve readability; also, your column name is so verbose that I used positional indexing iloc
instead, to have readable conditions in if-statements.
lst_start = []
lst_end = []
recession = False
for i in range(1, len(df)-1):
if not recession and (df.iloc[i-1, 0] > df.iloc[i, 0] > df.iloc[i+1, 0]):
recession = True
lst_start.append(df.index[i])
elif recession and (df.iloc[i-1, 0] < df.iloc[i, 0] < df.iloc[i+1, 0]):
recession = False
lst_end.append(df.index[i])
print(list(zip(lst_start, lst_end)))