Average over a specific time period

清酒与你 2021-01-21 18:54

I have quite a large table in Python, loaded from a .h5 file. The start of the table looks something like this:

table =
                [WIND REL DIRECTION  [deg]]  [WIND S         

2 Answers
  • 2021-01-21 19:25

    resample is your friend.

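    import matplotlib.dates as pltd  # assumption: `pltd` is matplotlib.dates
    import numpy as np
    import pandas as pd
    # Convert the table's matplotlib date numbers to datetimes; random data stands in for the real columns
    idx = pltd.num2date(table.index)
    df = pd.DataFrame({'direction': np.random.randn(len(idx)),
                       'speed': np.random.randn(len(idx))},
                      index=idx)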
    
    >>> df
                                      direction     speed
    2014-05-28 08:53:59.971204+00:00   0.205429  0.699439
    2014-05-28 08:54:01.008002+00:00   0.383199 -0.392261
    2014-05-28 08:54:04.031995+00:00  -2.146569 -0.325526
    2014-05-28 08:54:04.982402+00:00   1.572352  1.289276
    2014-05-28 08:54:06.019200+00:00   0.880394 -0.440667
    2014-05-28 08:54:11.980795+00:00  -1.343758  0.615725
    2014-05-28 08:54:13.017603+00:00  -1.713043  0.552017
    2014-05-28 08:54:13.968000+00:00  -0.350017  0.728910
    2014-05-28 08:54:15.004798+00:00  -0.619273  0.286762
    2014-05-28 08:54:16.041596+00:00   0.459747  0.524788
    
    >>> df.resample('15s').mean()  # older pandas: df.resample('15S', how='mean')
                               direction     speed
    2014-05-28 08:53:45+00:00   0.205429  0.699439
    2014-05-28 08:54:00+00:00  -0.388206  0.289639
    2014-05-28 08:54:15+00:00  -0.079763  0.405775
    

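    On current pandas the how= keyword of resample no longer exists, so the aggregation is called as a method on the resampler. A minimal, self-contained sketch with a few made-up timestamps (the speed values are illustrative only, not from the question) shows how samples land in left-closed 15-second bins that are labelled by their start time:

```python
import pandas as pd

# Hypothetical irregular timestamps standing in for the converted .h5 index
idx = pd.to_datetime([
    "2014-05-28 08:53:59",
    "2014-05-28 08:54:01",
    "2014-05-28 08:54:14",
    "2014-05-28 08:54:15",
    "2014-05-28 08:54:29",
])
df = pd.DataFrame({"speed": [1.0, 2.0, 4.0, 6.0, 8.0]}, index=idx)

# Each bin covers [start, start + 15 s) and is labelled by its start time,
# so 08:54:15 opens a new bin instead of closing the previous one.
means = df.resample("15s").mean()
print(means)
```

    Here the 08:54:00 bin averages 2.0 and 4.0 to give 3.0, and the 08:54:15 bin averages 6.0 and 8.0 to give 7.0.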
    Performance is similar to the groupby/TimeGrouper method provided by @LondonRob. I tested with a DataFrame of 1 million rows:

    df = pd.DataFrame({'direction': np.random.randn(10**6), 'speed': np.random.randn(10**6)}, index=pd.date_range(start='2015-1-1', periods=10**6, freq='1s'))
    
    >>> %timeit df.resample('15s').mean()
    100 loops, best of 3: 15.6 ms per loop
    
    >>> %timeit df.groupby(pd.Grouper(freq='15s')).mean()
    100 loops, best of 3: 15.7 ms per loop
    
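    Both approaches should produce identical bins and values. A rough sketch for checking that outside IPython, assuming pandas 1.0 or later (where pd.Grouper replaces the removed pd.TimeGrouper) and using the standard-library timeit module. On recent pandas a bare df.resample('15s') only returns a lazy resampler, so .mean() has to be part of what is timed. Absolute timings will differ by machine and version:

```python
import timeit

import numpy as np
import pandas as pd

n = 1_000_000  # integer size: newer NumPy/pandas reject a float such as 1e6
df = pd.DataFrame(
    {"direction": np.random.randn(n), "speed": np.random.randn(n)},
    index=pd.date_range(start="2015-01-01", periods=n, freq="1s"),
)

via_resample = df.resample("15s").mean()
via_groupby = df.groupby(pd.Grouper(freq="15s")).mean()

# Same bin labels and (numerically) the same means
assert via_resample.index.equals(via_groupby.index)
assert np.allclose(via_resample.to_numpy(), via_groupby.to_numpy())

# Best-of-3 timing, 10 calls per repeat
for label, stmt in [
    ("resample", lambda: df.resample("15s").mean()),
    ("groupby ", lambda: df.groupby(pd.Grouper(freq="15s")).mean()),
]:
    best = min(timeit.repeat(stmt, number=10, repeat=3)) / 10
    print(f"{label}: {best * 1e3:.1f} ms per call")
```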
  • 2021-01-21 19:44

    I think this is the "right" way to do this (although it seems a little underdocumented to me; anyway, it works!).

    You need to do a groupby on your DataFrame with a time-based grouper: pd.TimeGrouper in older pandas, or pd.Grouper(freq=...) since TimeGrouper was removed in pandas 1.0.

    It works like this:

    import pandas as pd
    import numpy as np
    
    # Create a dataframe. You can ignore all this bit!
    periods = 60 * 60
    random_dates = pd.date_range('2015-12-25', periods=periods, freq='s')
    random_speeds = np.random.randint(100, size=periods)
    random_directions = np.random.random(periods)
    df = pd.DataFrame({'date': random_dates, 'wind_speed': random_speeds, 'wind_direction': random_directions})
    df = df.set_index('date')
    
    # Here's where the magic happens (older pandas spelled this pd.TimeGrouper):
    grouped15s = df.groupby(pd.Grouper(freq='15s'))
    averages_ws_15s = grouped15s.wind_speed.mean()
    

    Or, if you insist on having spaces in your column names, that last line will become:

    averages_ws_15s = grouped15s['Wind Speed'].mean()
    

    This results in the following:

    date
    2015-12-25 00:00:00    45.800000
    2015-12-25 00:00:15    48.466667
    2015-12-25 00:00:30    38.066667
    2015-12-25 00:00:45    54.866667
    2015-12-25 00:01:00    34.866667
    2015-12-25 00:01:15    37.000000
    2015-12-25 00:01:30    47.133333
    ...
    
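    The same grouper can average several columns in one pass. A small sketch, assuming hypothetical column names with spaces ('Wind Speed', 'Wind Direction'; the real headers in the question are cut off) and synthetic data. The count column shows how many samples went into each 15-second mean, which helps spot gaps in the logging:

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute series sampled every second
dates = pd.date_range("2015-12-25", periods=60, freq="s")
df = pd.DataFrame(
    {
        "Wind Speed": np.arange(60, dtype=float),  # 0.0, 1.0, ..., 59.0
        "Wind Direction": np.full(60, 90.0),       # constant, for illustration
    },
    index=dates,
)

grouped15s = df.groupby(pd.Grouper(freq="15s"))
# Mean and sample count per 15-second bin, for both columns at once
summary = grouped15s[["Wind Speed", "Wind Direction"]].agg(["mean", "count"])
print(summary)
```

    The result has two-level column labels, so a single series is selected with a tuple, e.g. summary[("Wind Speed", "mean")].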