How to resample a dataframe with different functions applied to each column?

旧巷少年郎 2020-12-07 16:04

I have a time series with temperature and radiation in a pandas dataframe. The time resolution is 1 minute in regular steps.

import datetime
im         
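
The original setup code is truncated here. The following is a minimal sketch of a comparable frame, assuming random placeholder values. The column names `tamb` and `radiation` and the span from 08:00 to 12:00 on 2012-04-05 (60 rows per hour plus one row at 12:00) are taken from the answers' output, not from the original setup.

```python
import datetime

import numpy as np
import pandas as pd

# Assumed setup: 1-minute steps from 08:00 to 12:00 on 2012-04-05,
# both endpoints included (241 rows), matching the index in the answers.
index = pd.date_range(datetime.datetime(2012, 4, 5, 8, 0),
                      datetime.datetime(2012, 4, 5, 12, 0),
                      freq='min')

# Hypothetical random values; the real measurements are not shown in the question.
rng = np.random.default_rng(seed=0)
frame = pd.DataFrame({'tamb': rng.random(len(index)) * 10.0,
                      'radiation': rng.random(len(index)) * 10.0},
                     index=index)

print(frame.shape)  # (241, 2)
```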


        
4 answers
  • 2020-12-07 16:48

    With pandas 0.18 the resample API changed (see the docs). So for pandas >= 0.18 the answer is:

    In [31]: frame.resample('1H').agg({'radiation': np.sum, 'tamb': np.mean})
    Out[31]: 
                             tamb   radiation
    2012-04-05 08:00:00  5.161235  279.507182
    2012-04-05 09:00:00  4.968145  290.941073
    2012-04-05 10:00:00  4.478531  317.678285
    2012-04-05 11:00:00  4.706206  335.258633
    2012-04-05 12:00:00  2.457873    8.655838
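
For readers on current pandas, here is a self-contained sketch of the same resample-then-aggregate pattern. It assumes a hypothetical random stand-in for `frame`. It passes the aggregations as strings ('sum', 'mean') because recent pandas releases (2.1+) may emit a FutureWarning when NumPy functions such as `np.sum` are handed to `agg`, and it uses the lowercase '1h' alias because pandas 2.2 deprecated the uppercase 'H'.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: 1-minute random data, 08:00-12:00.
index = pd.date_range('2012-04-05 08:00', '2012-04-05 12:00', freq='min')
rng = np.random.default_rng(seed=0)
frame = pd.DataFrame({'tamb': rng.random(len(index)) * 10.0,
                      'radiation': rng.random(len(index)) * 10.0},
                     index=index)

# One aggregation per column, given by its string name.
hourly = frame.resample('1h').agg({'radiation': 'sum', 'tamb': 'mean'})

# A column can also get several functions; this yields MultiIndex columns.
hourly_multi = frame.resample('1h').agg({'radiation': 'sum', 'tamb': ['mean', 'max']})

print(hourly)
```

In current pandas the result columns follow the order of the dict keys (radiation, then tamb), unlike the older output shown above.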
    

    Old Answer:

    I am answering my own question to reflect the time-series-related changes in pandas >= 0.8 (all other answers are outdated).

    Using pandas >= 0.8, the answer is:

    In [30]: frame.resample('1H', how={'radiation': np.sum, 'tamb': np.mean})
    Out[30]: 
                             tamb   radiation
    2012-04-05 08:00:00  5.161235  279.507182
    2012-04-05 09:00:00  4.968145  290.941073
    2012-04-05 10:00:00  4.478531  317.678285
    2012-04-05 11:00:00  4.706206  335.258633
    2012-04-05 12:00:00  2.457873    8.655838
    
  • 2020-12-07 16:51

    You need to use groupby, like this:

    grouped = frame.groupby(lambda x: x.hour)
    grouped.agg({'radiation': np.sum, 'tamb': np.mean})
    # Same as: grouped.agg({'radiation': 'sum', 'tamb': 'mean'})
    

    with the output being:

            radiation      tamb
    key_0                      
    8      298.581107  4.883806
    9      311.176148  4.983705
    10     315.531527  5.343057
    11     288.013876  6.022002
    12       5.527616  8.507670
    

    So in essence I am splitting on the hour value, then calculating the sum of radiation and the mean of tamb, and returning the resulting DataFrame (a similar approach to R's ddply). For more info, I would check the documentation page for groupby as well as this blog post.
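
A sketch of the same hour-of-day grouping written against current pandas, assuming a hypothetical random stand-in for `frame`. Passing `frame.index.hour` computes all the group keys at once instead of calling a lambda for each index label; on single-day data like this, the groups line up with hourly `resample` bins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame (random values).
index = pd.date_range('2012-04-05 08:00', '2012-04-05 12:00', freq='min')
rng = np.random.default_rng(seed=0)
frame = pd.DataFrame({'tamb': rng.random(len(index)) * 10.0,
                      'radiation': rng.random(len(index)) * 10.0},
                     index=index)

# Same split-apply-combine as the lambda version, but the hour keys are
# computed for the whole index in one vectorized step.
by_hour = frame.groupby(frame.index.hour).agg({'radiation': 'sum', 'tamb': 'mean'})

# On single-day data, hour-of-day groups match hourly resample bins.
resampled = frame.resample('1h').agg({'radiation': 'sum', 'tamb': 'mean'})
assert np.allclose(by_hour.to_numpy(), resampled.to_numpy())

print(by_hour)
```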

    Edit: To make this scale a bit better (grouping on the hour alone merges the same hour from different days), you could group on both the day and the hour, like this:

    grouped = frame.groupby(lambda x: (x.day, x.hour))
    grouped.agg({'radiation': 'sum', 'tamb': 'mean'})
              radiation      tamb
    key_0                        
    (5, 8)   298.581107  4.883806
    (5, 9)   311.176148  4.983705
    (5, 10)  315.531527  5.343057
    (5, 11)  288.013876  6.022002
    (5, 12)    5.527616  8.507670
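
Even the (day, hour) key collides when the data spans more than one month, because April 5 and May 5 share the same day number. Below is a sketch with hypothetical two-month data that compares that key with flooring each timestamp to the hour (`DatetimeIndex.floor`), which keeps the full date in the key.

```python
import numpy as np
import pandas as pd

# Hypothetical data spanning two months: one reading per minute during the
# 08:00 hour on April 5 and on May 5 (same day-of-month, same hour).
index = pd.date_range('2012-04-05 08:00', periods=60, freq='min').append(
    pd.date_range('2012-05-05 08:00', periods=60, freq='min'))
rng = np.random.default_rng(seed=0)
frame = pd.DataFrame({'tamb': rng.random(len(index)) * 10.0,
                      'radiation': rng.random(len(index)) * 10.0},
                     index=index)

aggs = {'radiation': 'sum', 'tamb': 'mean'}

# (day, hour) keys: April 5 08:xx and May 5 08:xx land in the same group.
by_day_hour = frame.groupby([frame.index.day, frame.index.hour]).agg(aggs)

# Flooring each timestamp to the hour keeps year, month and day in the key.
by_floor = frame.groupby(frame.index.floor('h')).agg(aggs)

print(len(by_day_hour), len(by_floor))  # 1 2
```

Unlike `resample('1h')`, which emits a row for every hour between the first and last stamp (721 rows here, with a sum of 0 and a mean of NaN for the empty hours), grouping on floored timestamps returns only the hours that contain data.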
    
  • 2020-12-07 16:57

    To tantalize you, in pandas 0.8.0 (under heavy development in the timeseries branch on GitHub), you'll be able to do:

    In [5]: frame.convert('1h', how='mean')
    Out[5]: 
                         radiation      tamb
    2012-04-05 08:00:00   7.840989  8.446109
    2012-04-05 09:00:00   4.898935  5.459221
    2012-04-05 10:00:00   5.227741  4.660849
    2012-04-05 11:00:00   4.689270  5.321398
    2012-04-05 12:00:00   4.956994  5.093980
    

    The above-mentioned methods are the right strategy with the current production version of pandas.
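
The `convert` call above previews an API that was still under development at the time. In current pandas releases, applying one function to every column can be written as `resample(...).mean()`. Here is a sketch with a hypothetical random stand-in for `frame`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame (random values).
index = pd.date_range('2012-04-05 08:00', '2012-04-05 12:00', freq='min')
rng = np.random.default_rng(seed=0)
frame = pd.DataFrame({'tamb': rng.random(len(index)) * 10.0,
                      'radiation': rng.random(len(index)) * 10.0},
                     index=index)

# When every column gets the same function, no per-column mapping is needed.
hourly_mean = frame.resample('1h').mean()

print(hourly_mean)
```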

  • 2020-12-07 17:03

    You can also downsample using the asof method of pandas.DateRange objects.

    In [21]: hourly = pd.DateRange(datetime.datetime(2012, 4, 5, 8, 0),
    ...                          datetime.datetime(2012, 4, 5, 12, 0),
    ...                          offset=pd.datetools.Hour())
    
    In [22]: frame.groupby(hourly.asof).size()
    Out[22]: 
    key_0
    2012-04-05 08:00:00    60
    2012-04-05 09:00:00    60
    2012-04-05 10:00:00    60
    2012-04-05 11:00:00    60
    2012-04-05 12:00:00     1

    In [23]: frame.groupby(hourly.asof).agg({'radiation': np.sum, 'tamb': np.mean})
    Out[23]: 
                         radiation  tamb 
    key_0                                
    2012-04-05 08:00:00  271.54     4.491
    2012-04-05 09:00:00  266.18     5.253
    2012-04-05 10:00:00  292.35     4.959
    2012-04-05 11:00:00  283.00     5.489
    2012-04-05 12:00:00  0.5414     9.532
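
`pd.DateRange` and `pd.datetools` no longer exist in current pandas. Below is a sketch of the same asof-based grouping using their present-day counterparts, assuming a hypothetical random stand-in for `frame`: `pd.date_range` builds the hourly index, and `Index.asof` maps each minute to the latest hourly stamp at or before it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame (random values).
index = pd.date_range('2012-04-05 08:00', '2012-04-05 12:00', freq='min')
rng = np.random.default_rng(seed=0)
frame = pd.DataFrame({'tamb': rng.random(len(index)) * 10.0,
                      'radiation': rng.random(len(index)) * 10.0},
                     index=index)

# pd.date_range replaces pd.DateRange; freq='h' replaces datetools.Hour().
hourly = pd.date_range('2012-04-05 08:00', '2012-04-05 12:00', freq='h')

# groupby calls hourly.asof on each index label; asof returns the latest
# hourly stamp <= that label, i.e. the start of the label's hour.
counts = frame.groupby(hourly.asof).size()
result = frame.groupby(hourly.asof).agg({'radiation': 'sum', 'tamb': 'mean'})

print(counts.tolist())  # [60, 60, 60, 60, 1]
```

Because `asof` is called once per row, this is slower than `resample` on large frames; its main appeal is that the hourly index can be replaced by any sorted set of bin edges.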
    