Convert daily pandas stock data to monthly data using first trade day of the month

一生所求 2020-12-30 16:41

I have a set of calculated OHLCVA daily securities data in a pandas dataframe like this:

>>> type(data_dy)


        
3 Answers
  • 2020-12-30 16:51

    In recent versions of pandas you can use the time offset alias 'BMS', which stands for "business month start frequency", or 'BM', which stands for "business month end frequency" (pandas 2.2 and later spell the latter 'BME').

    The code in the first case would look like

    data_dy.resample('BMS', closed='right', label='right').apply(ohlc_dict)
    

    or, in the second case,

    data_dy.resample('BM', closed='right', label='right').apply(ohlc_dict)
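
    The snippets above rely on data_dy and ohlc_dict from the question. As a minimal, self-contained sketch, the following builds synthetic daily data (the values are made up, not real prices) and aggregates it with 'BMS'. It assumes each month should be labelled by its own first business day, so it keeps the default closed/label settings for 'BMS' rather than the 'right' settings shown above, and it uses .agg, which accepts the same dict as .apply here.

```python
import numpy as np
import pandas as pd

# Synthetic daily OHLCV data on weekdays (hypothetical values, not real prices).
idx = pd.bdate_range("2013-01-01", "2013-03-29")
rng = np.random.default_rng(0)
close = 100 + rng.standard_normal(len(idx)).cumsum()
data_dy = pd.DataFrame(
    {
        "Open": close - 0.5,
        "High": close + 1.0,
        "Low": close - 1.0,
        "Close": close,
        "Volume": rng.integers(1_000, 5_000, len(idx)),
    },
    index=idx,
)

# One aggregation rule per column.
ohlc_dict = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}

# 'BMS' bins are closed and labelled on the left by default, so every row of a
# month lands in the bin labelled with that month's first business day.
monthly = data_dy.resample("BMS").agg(ohlc_dict)
print(monthly)
```

    Note that 'BMS' only skips weekends. It knows nothing about market holidays, so a month that opens on a holiday is still labelled with that date.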
    
  • 2020-12-30 16:58

    Thank you, J Bradley; your solution worked perfectly. I did have to upgrade pandas from the official website, though, because the version installed via pip did not have CustomBusinessMonthBegin in pandas.tseries.offsets. My final code was:

    #----- imports -----
    import pandas as pd
    from pandas.tseries.offsets import CustomBusinessMonthBegin
    import pandas_datareader.data as web  # pandas.io.data moved to the separate pandas-datareader package
    #----- get sample data -----
    df = web.get_data_yahoo('SPY', '2012-12-01', '2013-12-31')
    #----- build custom calendar -----
    month_index = df.index.to_period('M')
    # earliest date present in each month ('Date' is the name of the index in the Yahoo data)
    min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['Date'].min())
    custom_month_starts = CustomBusinessMonthBegin(calendar=min_day_in_month_index)
    #----- convert daily data to monthly data -----
    ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum', 'Adj Close': 'last'}
    mthly_ohlcva = df.resample(custom_month_starts).agg(ohlc_dict)  # pandas < 0.18: df.resample(custom_month_starts, how=ohlc_dict)
    

    This yielded the following:

    >>> mthly_ohlcva
                    Volume  Adj Close    High     Low   Close    Open
    Date                                                             
    2012-12-03  2889875900     136.92  145.58  139.54  142.41  142.80
    2013-01-01  2587140200     143.92  150.94  144.73  149.70  145.11
    2013-02-01  2581459300     145.76  153.28  148.73  151.61  150.65
    2013-03-01  2330972300     151.30  156.85  150.41  156.67  151.09
    2013-04-01  2907035000     154.20  159.72  153.55  159.68  156.59
    2013-05-01  2781596000     157.84  169.07  158.10  163.45  159.33
    2013-06-03  3533321800     155.74  165.99  155.73  160.42  163.83
    2013-07-01  2330904500     163.78  169.86  160.22  168.71  161.26
    2013-08-01  2283131700     158.87  170.97  163.05  163.65  169.99
    2013-09-02  2226749600     163.90  173.60  163.70  168.01  165.23
    2013-10-01  2901739000     171.49  177.51  164.53  175.79  168.14
    2013-11-01  1930952900     176.57  181.75  174.76  181.00  176.02
    2013-12-02  2232775900     181.15  184.69  177.32  184.69  181.09
    
  • 2020-12-30 17:13

    Instead of M you can pass MS as the resample rule:

    df = pd.DataFrame(range(72), index=pd.date_range('1/1/2011', periods=72, freq='D'))
    
    #df.resample('MS', how = 'mean')    # pandas <0.18
    df.resample('MS').mean()  # pandas >= 0.18
    

    Updated to use the first business day of the month respecting US Federal Holidays:

    df = pd.DataFrame(range(200), index=pd.date_range('12/1/2012', periods=200, freq='D'))
    
    from pandas.tseries.offsets import CustomBusinessMonthBegin
    from pandas.tseries.holiday import USFederalHolidayCalendar
    bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
    
    df.resample(bmth_us).mean()
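
    To see which dates this offset actually produces, here is a small sketch that lists the month starts over the same period as the question's sample data. The date range is chosen for illustration only. Keep in mind that US federal holidays are not identical to exchange holidays (for example, markets close on Good Friday, which is not a federal holiday), so this calendar is an approximation of trading days.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessMonthBegin

# First business day of each month, skipping weekends and US federal holidays.
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())

# Generate the month starts between two dates (range chosen for illustration).
starts = pd.date_range("2012-12-01", "2013-12-31", freq=bmth_us)
print(starts)
```

    Here January 2013 starts on 2013-01-02 rather than New Year's Day, and September 2013 starts on 2013-09-03 rather than Labor Day.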
    

    If you want custom month starts based on the earliest date found in each month of the data, try this. (It isn't pretty, but it should work.)

    month_index = df.index.to_period('M')
    
    # 'level_0' is the column name reset_index gives the unnamed date index
    min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['level_0'].min())
    
    custom_month_starts = CustomBusinessMonthBegin(calendar=min_day_in_month_index)
    

    Pass custom_month_starts as the first argument to resample.
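
    As an alternative that does not depend on any calendar object, the sketch below groups by calendar month and then relabels each group with the earliest date actually present in the data. This is a different technique from the custom offset above, not the answer's method. The sample frame, its column names, and the choice to drop New Year's Day are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: weekdays in Jan-Feb 2013, with New Year's Day
# removed to mimic a market holiday.
idx = pd.bdate_range("2013-01-01", "2013-02-28").drop(pd.Timestamp("2013-01-01"))
df = pd.DataFrame(
    {
        "Open": np.arange(len(idx), dtype=float),
        "Close": np.arange(len(idx), dtype=float) + 0.5,
        "Volume": np.ones(len(idx), dtype=int),
    },
    index=idx,
)

ohlc_dict = {"Open": "first", "Close": "last", "Volume": "sum"}

# Group rows by calendar month and aggregate each column with its own rule.
months = df.index.to_period("M")
monthly = df.groupby(months).agg(ohlc_dict)

# Relabel each month with the first date that actually appears in the data.
first_days = df.index.to_series().groupby(months).min()
monthly.index = pd.DatetimeIndex(first_days.to_numpy(), name="Date")
print(monthly)
```

    Because the labels come from the data itself, a month that opens on a holiday is labelled with the first day that actually traded (2013-01-02 in this sample).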
