This question is referenced from this SO question.
I want to perform some data analysis on a pandas DataFrame. I have one DataFrame like the one below:
What you can do is merge the dataframe into itself, after computing the month number (from the date) and the previous month's number as well.
Let's start by computing those two values. For convenience, I first converted the raw month
string value to datetime, which allowed me to use relativedelta to compute the previous month. This ensures the behaviour is correct, even across a change of year.
In [7]: df['month'] = pd.to_datetime(df['month'])
In [8]: df['month_num'] = df['month'].apply(lambda x: x.strftime('%Y-%m'))
In [9]: from dateutil.relativedelta import relativedelta
In [10]: df['previous_month_num'] = df['month'].apply(lambda x: (x + relativedelta(months=-1)).strftime('%Y-%m'))
In [11]: df
Out[11]:
city month person_count person_name person_symbol sir sport_name \
0 mumbai 2017-01-23 10 ramesh ram a football
1 mumbai 2017-01-23 14 ramesh mum a football
2 delhi 2017-01-23 25 ramesh mum a football
3 delhi 2017-01-23 20 ramesh ram a football
4 mumbai 2017-02-22 34 ramesh ram b football
5 mumbai 2017-02-22 23 ramesh mum b football
6 delhi 2017-02-22 43 ramesh mum b football
7 delhi 2017-02-22 34 ramesh ram b football
8 pune 2017-03-03 10 mahesh mah c basketball
9 nagpur 2017-03-03 20 mahesh mah c basketball
month_num previous_month_num
0 2017-01 2016-12
1 2017-01 2016-12
2 2017-01 2016-12
3 2017-01 2016-12
4 2017-02 2017-01
5 2017-02 2017-01
6 2017-02 2017-01
7 2017-02 2017-01
8 2017-03 2017-02
9 2017-03 2017-02
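As an alternative to `apply` with `relativedelta`, the same two keys can be computed with pandas monthly periods. This is only a sketch under assumptions: the three-row `sample` frame is made up for illustration (it is not the data above), and it includes a January date to show the change of year.

```python
import pandas as pd

# Hypothetical sample; only the 'month' column matters for this step.
sample = pd.DataFrame({"month": ["2017-01-23", "2017-02-22", "2018-01-05"]})
sample["month"] = pd.to_datetime(sample["month"])

# Convert each date to a monthly period, then step back one period.
# Period arithmetic crosses year boundaries: 2018-01 minus one month is 2017-12.
periods = sample["month"].dt.to_period("M")
sample["month_num"] = periods.astype(str)
sample["previous_month_num"] = (periods - 1).astype(str)

print(sample)
```

Both columns end up as plain `YYYY-MM` strings, so they can serve as merge keys in the same way as the `strftime` version.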
We can then merge the dataframe into itself, using the computed month values as merging keys:
In [12]: relevant_columns = ['city', 'person_symbol', 'sport_name']
In [13]: pd.merge(df, df,
    ...:          left_on=relevant_columns + ['previous_month_num'],
    ...:          right_on=relevant_columns + ['month_num'],
    ...:          how='left', suffixes=('', '_previous')
    ...:          )[list(df.columns) + ['person_count_previous']].fillna(0).drop(
    ...:          ['month_num', 'previous_month_num'], axis=1)
Out[13]:
city month person_count person_name person_symbol sir sport_name \
0 mumbai 2017-01-23 10 ramesh ram a football
1 mumbai 2017-01-23 14 ramesh mum a football
2 delhi 2017-01-23 25 ramesh mum a football
3 delhi 2017-01-23 20 ramesh ram a football
4 mumbai 2017-02-22 34 ramesh ram b football
5 mumbai 2017-02-22 23 ramesh mum b football
6 delhi 2017-02-22 43 ramesh mum b football
7 delhi 2017-02-22 34 ramesh ram b football
8 pune 2017-03-03 10 mahesh mah c basketball
9 nagpur 2017-03-03 20 mahesh mah c basketball
person_count_previous
0 0
1 0
2 0
3 0
4 10
5 14
6 25
7 20
8 0
9 0
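For anyone who wants to run the whole thing outside the IPython session, here is a self-contained sketch. It rebuilds only the columns needed for the merge from the table above (`person_name` and `sir` are left out), and it trims the right-hand side before merging instead of selecting columns afterwards, so it is a variation on the one-liner rather than the exact same call. The `validate` argument is an optional guard I added; it is not part of the original code.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Subset of the columns shown above, enough to reproduce the merge.
df = pd.DataFrame({
    "city": ["mumbai", "mumbai", "delhi", "delhi",
             "mumbai", "mumbai", "delhi", "delhi", "pune", "nagpur"],
    "month": ["2017-01-23"] * 4 + ["2017-02-22"] * 4 + ["2017-03-03"] * 2,
    "person_count": [10, 14, 25, 20, 34, 23, 43, 34, 10, 20],
    "person_symbol": ["ram", "mum", "mum", "ram",
                      "ram", "mum", "mum", "ram", "mah", "mah"],
    "sport_name": ["football"] * 8 + ["basketball"] * 2,
})

df["month"] = pd.to_datetime(df["month"])
df["month_num"] = df["month"].dt.strftime("%Y-%m")
df["previous_month_num"] = df["month"].apply(
    lambda x: (x + relativedelta(months=-1)).strftime("%Y-%m"))

keys = ["city", "person_symbol", "sport_name"]

# Right-hand side: the keys, the month label and the count to carry over.
# Renaming month_num makes it line up with the left side's previous month.
previous = df[keys + ["month_num", "person_count"]].rename(
    columns={"month_num": "previous_month_num",
             "person_count": "person_count_previous"})

# validate raises if one (keys, month) pair appears twice on the right,
# which would otherwise silently duplicate rows in the result.
result = df.merge(previous, on=keys + ["previous_month_num"],
                  how="left", validate="many_to_one")
result["person_count_previous"] = (
    result["person_count_previous"].fillna(0).astype(int))
result = result.drop(columns=["month_num", "previous_month_num"])

print(result)
```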
Some comments:
I used ['city', 'person_symbol', 'sport_name'] as the reference columns, but feel free to add some more, depending on what exactly you want to achieve.
The new column is named person_count_previous, but you can rename it if another name suits you better.
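If you expect to change the reference columns or the output name often, the steps can be wrapped in a small function. The helper below, `add_previous_month`, is hypothetical (the name, its parameters and the tiny `demo` frame are mine), and it assumes the `month` column is already a datetime.

```python
import pandas as pd

def add_previous_month(df, keys, value="person_count", new_name=None):
    """Hypothetical helper: attach `value` from the same keys one month earlier."""
    new_name = new_name or value + "_previous"
    period = df["month"].dt.to_period("M")

    # The left side carries the label of the month before each row.
    left = df.assign(_prev=(period - 1).astype(str))
    # The right side carries each row's own month label and the value to copy.
    right = (df[keys + [value]]
             .assign(_prev=period.astype(str))
             .rename(columns={value: new_name}))

    out = left.merge(right, on=keys + ["_prev"], how="left")
    out[new_name] = out[new_name].fillna(0)  # no previous month found
    return out.drop(columns="_prev")

# Made-up data, using 'sir' as an extra reference column.
demo = pd.DataFrame({
    "city": ["pune", "pune", "pune"],
    "sir": ["a", "a", "b"],
    "month": pd.to_datetime(["2016-12-10", "2017-01-05", "2017-01-20"]),
    "person_count": [5, 7, 9],
})
out = add_previous_month(demo, ["city", "sir"], new_name="last_month")
print(out)
```

Adding a column to `keys` makes the match stricter: in the demo, the third row finds no previous value because no row with `sir == 'b'` exists in December.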