How to resample a dataframe with different functions applied to each column?

后端未结

关注

 4  1345

I have a times series with temperature and radiation in a pandas dataframe. The time resolution is 1 minute in regular steps.

import datetime
im


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  予麋鹿        
                
              
                            
                2020-12-07 16:48
              
            
            
                                                                       
With pandas 0.18 the resample API changed (see the docs). 
So for pandas >= 0.18 the answer is:

In [31]: frame.resample('1H').agg({'radiation': np.sum, 'tamb': np.mean})
Out[31]: 
                         tamb   radiation
2012-04-05 08:00:00  5.161235  279.507182
2012-04-05 09:00:00  4.968145  290.941073
2012-04-05 10:00:00  4.478531  317.678285
2012-04-05 11:00:00  4.706206  335.258633
2012-04-05 12:00:00  2.457873    8.655838




Old Answer:

I am answering my question to reflect the time series related changes in pandas >= 0.8 (all other answers are outdated).

Using pandas >= 0.8 the answer is:

In [30]: frame.resample('1H', how={'radiation': np.sum, 'tamb': np.mean})
Out[30]: 
                         tamb   radiation
2012-04-05 08:00:00  5.161235  279.507182
2012-04-05 09:00:00  4.968145  290.941073
2012-04-05 10:00:00  4.478531  317.678285
2012-04-05 11:00:00  4.706206  335.258633
2012-04-05 12:00:00  2.457873    8.655838

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  臣服心动        
                
              
                            
                2020-12-07 16:51
              
            
            
                                                                       
You need to use groupby as such:

grouped = frame.groupby(lambda x: x.hour)
grouped.agg({'radiation': np.sum, 'tamb': np.mean})
# Same as: grouped.agg({'radiation': 'sum', 'tamb': 'mean'})


with the output being:

        radiation      tamb
key_0                      
8      298.581107  4.883806
9      311.176148  4.983705
10     315.531527  5.343057
11     288.013876  6.022002
12       5.527616  8.507670


So in essence I am splitting on the hour value and then calculating the mean of tamb and the sum of radiation and returning back the DataFrame (similar approach to R's ddply). For more info I would check the documentation page for groupby as well as this blog post.

Edit: To make this scale a bit better you could group on both the day and time as such:

grouped = frame.groupby(lambda x: (x.day, x.hour))
grouped.agg({'radiation': 'sum', 'tamb': 'mean'})
          radiation      tamb
key_0                        
(5, 8)   298.581107  4.883806
(5, 9)   311.176148  4.983705
(5, 10)  315.531527  5.343057
(5, 11)  288.013876  6.022002
(5, 12)    5.527616  8.507670

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2020-12-07 16:57
              
            
            
                                                                       
To tantalize you, in pandas 0.8.0 (under heavy development in the timeseries branch on GitHub), you'll be able to do:

In [5]: frame.convert('1h', how='mean')
Out[5]: 
                     radiation      tamb
2012-04-05 08:00:00   7.840989  8.446109
2012-04-05 09:00:00   4.898935  5.459221
2012-04-05 10:00:00   5.227741  4.660849
2012-04-05 11:00:00   4.689270  5.321398
2012-04-05 12:00:00   4.956994  5.093980


The above mentioned methods are the right strategy with the current production version of pandas.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  别那么骄傲        
                
              
                            
                2020-12-07 17:03
              
            
            
                                                                       
You can also downsample using the asof method of pandas.DateRange objects.

In [21]: hourly = pd.DateRange(datetime.datetime(2012, 4, 5, 8, 0),
...                          datetime.datetime(2012, 4, 5, 12, 0),
...                          offset=pd.datetools.Hour())

In [22]: frame.groupby(hourly.asof).size()
Out[22]: 
key_0
2012-04-05 08:00:00    60
2012-04-05 09:00:00    60
2012-04-05 10:00:00    60
2012-04-05 11:00:00    60
2012-04-05 12:00:00    1
In [23]: frame.groupby(hourly.asof).agg({'radiation': np.sum, 'tamb': np.mean})
Out[23]: 
                     radiation  tamb 
key_0                                
2012-04-05 08:00:00  271.54     4.491
2012-04-05 09:00:00  266.18     5.253
2012-04-05 10:00:00  292.35     4.959
2012-04-05 11:00:00  283.00     5.489
2012-04-05 12:00:00  0.5414     9.532

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复