How to keep track of previous date record column in pandas dataframe?

This question follows on from this SO question.

I want to perform some data analysis on a pandas DataFrame. I have a dataframe like the one below:


                      
1 Answer

猫巷女王i, 2021-01-25 11:56

You can merge the dataframe with itself, after computing two keys for each row: its month (derived from the date) and the month before it.

Let's start by computing those two values. For convenience, I first converted the raw month string to datetime, which let me use relativedelta to compute the previous month. This keeps the result correct across a change of year (the month before 2017-01 is 2016-12).

In [7]: df['month'] = pd.to_datetime(df['month'])

In [8]: df['month_num'] = df['month'].apply(lambda x: x.strftime('%Y-%m'))

In [9]: from dateutil.relativedelta import relativedelta

In [10]: df['previous_month_num'] = df['month'].apply(lambda x: (x + relativedelta(months=-1)).strftime('%Y-%m'))

In [11]: df
Out[11]:
     city      month person_count person_name person_symbol sir  sport_name  \
0  mumbai 2017-01-23           10      ramesh           ram   a    football
1  mumbai 2017-01-23           14      ramesh           mum   a    football
2   delhi 2017-01-23           25      ramesh           mum   a    football
3   delhi 2017-01-23           20      ramesh           ram   a    football
4  mumbai 2017-02-22           34      ramesh           ram   b    football
5  mumbai 2017-02-22           23      ramesh           mum   b    football
6   delhi 2017-02-22           43      ramesh           mum   b    football
7   delhi 2017-02-22           34      ramesh           ram   b    football
8    pune 2017-03-03           10      mahesh           mah   c  basketball
9  nagpur 2017-03-03           20      mahesh           mah   c  basketball

  month_num previous_month_num
0   2017-01            2016-12
1   2017-01            2016-12
2   2017-01            2016-12
3   2017-01            2016-12
4   2017-02            2017-01
5   2017-02            2017-01
6   2017-02            2017-01
7   2017-02            2017-01
8   2017-03            2017-02
9   2017-03            2017-02
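
As a possible alternative to `apply` with `relativedelta`, pandas periods can produce the same two keys in a vectorized way. The sketch below is not part of the original answer; it assumes a `month` column of parseable date strings and uses a three-row frame with dates taken from the table above.

```python
import pandas as pd

# Miniature frame for illustration; only the 'month' column matters here.
df = pd.DataFrame({'month': ['2017-01-23', '2017-02-22', '2017-03-03']})
df['month'] = pd.to_datetime(df['month'])

# Convert each timestamp to a monthly Period; subtracting 1 shifts it back
# by one month and rolls over the year boundary (2017-01 -> 2016-12).
period = df['month'].dt.to_period('M')
df['month_num'] = period.astype(str)
df['previous_month_num'] = (period - 1).astype(str)

print(df)
```

Because the keys end up as the same `'%Y-%m'` strings, they can be used for the merge in exactly the same way.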


We can then merge the dataframe into itself, using the computed month values as merging keys:

In [12]: relevant_columns = ['city', 'person_symbol', 'sport_name']

In [13]: (pd.merge(df, df,
    ...:           left_on=relevant_columns + ['previous_month_num'],
    ...:           right_on=relevant_columns + ['month_num'],
    ...:           how='left', suffixes=('', '_previous'))
    ...:  [list(df.columns) + ['person_count_previous']]
    ...:  .fillna(0)
    ...:  .drop(['month_num', 'previous_month_num'], axis=1))
Out[13]:
     city      month person_count person_name person_symbol sir  sport_name  \
0  mumbai 2017-01-23           10      ramesh           ram   a    football
1  mumbai 2017-01-23           14      ramesh           mum   a    football
2   delhi 2017-01-23           25      ramesh           mum   a    football
3   delhi 2017-01-23           20      ramesh           ram   a    football
4  mumbai 2017-02-22           34      ramesh           ram   b    football
5  mumbai 2017-02-22           23      ramesh           mum   b    football
6   delhi 2017-02-22           43      ramesh           mum   b    football
7   delhi 2017-02-22           34      ramesh           ram   b    football
8    pune 2017-03-03           10      mahesh           mah   c  basketball
9  nagpur 2017-03-03           20      mahesh           mah   c  basketball

  person_count_previous
0                     0
1                     0
2                     0
3                     0
4                    10
5                    14
6                    25
7                    20
8                     0
9                     0
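
For readers who want to run the idea outside an IPython session, here is a minimal self-contained sketch. It uses a subset of the rows shown above and a slight variation on the one-liner: the right-hand side is trimmed and renamed before merging instead of relying on `suffixes`. The names `keys`, `previous`, and `result` are chosen for this illustration only.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Five of the rows shown above, with only the columns the join needs.
df = pd.DataFrame({
    'city': ['mumbai', 'delhi', 'mumbai', 'delhi', 'pune'],
    'month': ['2017-01-23', '2017-01-23', '2017-02-22',
              '2017-02-22', '2017-03-03'],
    'person_count': [10, 20, 34, 34, 10],
    'person_symbol': ['ram', 'ram', 'ram', 'ram', 'mah'],
    'sport_name': ['football'] * 4 + ['basketball'],
})
df['month'] = pd.to_datetime(df['month'])

# Month key of each row, and the key of the month before it.
df['month_num'] = df['month'].dt.strftime('%Y-%m')
df['previous_month_num'] = df['month'].apply(
    lambda x: (x + relativedelta(months=-1)).strftime('%Y-%m'))

keys = ['city', 'person_symbol', 'sport_name']

# Right-hand side: each row's own month becomes the key that the
# following month's rows look up as their "previous" month.
previous = df[keys + ['month_num', 'person_count']].rename(
    columns={'month_num': 'previous_month_num',
             'person_count': 'person_count_previous'})

# Left join keeps every original row; unmatched rows get NaN.
result = df.merge(previous, on=keys + ['previous_month_num'], how='left')
result['person_count_previous'] = (
    result['person_count_previous'].fillna(0).astype(int))
result = result.drop(columns=['month_num', 'previous_month_num'])

print(result)
```

The February rows pick up the January counts for the same city, symbol, and sport, while rows with no earlier month fall back to 0.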


Some comments:


I used ['city', 'person_symbol', 'sport_name'] as the reference columns, but feel free to add more, depending on what exactly you want to achieve.
The new column is named person_count_previous, but you can rename it if another name suits you better.
By default, when a row has no match in the previous month, its value in the new column is NaN. I replaced those values with 0 using fillna.
I removed the "temporary" columns using drop, but feel free to keep them.
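
One thing to keep in mind with the comments above: calling `fillna(0)` on the whole frame also replaces missing values in every other column. The sketch below shows one way to limit the fill to the new column and rename it afterwards. The frame `merged` and the name `prev_count` are hypothetical, made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical merge result: 'sir' has a genuinely missing value, while
# 'person_count_previous' is NaN where no previous month matched.
merged = pd.DataFrame({
    'city': ['mumbai', 'mumbai', 'pune'],
    'sir': ['a', None, 'c'],
    'person_count_previous': [np.nan, 10.0, np.nan],
})

# Fill only the new column and restore an integer dtype.
merged['person_count_previous'] = (
    merged['person_count_previous'].fillna(0).astype(int))

# Rename the column; 'prev_count' is just an example name.
merged = merged.rename(columns={'person_count_previous': 'prev_count'})

print(merged)
```

The missing value in `sir` is left untouched, which would not be the case with a frame-wide `fillna(0)`.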

                                                                        
                                                        
            
            
              
                