Convert daily pandas stock data to monthly data using first trade day of the month

I have a set of calculated OHLCVA daily securities data in a pandas dataframe like this:

>>> type(data_dy)


                      
3 answers

Answer by 北荒, 2020-12-30 16:51

I've seen that recent versions of pandas support the time offset alias 'BMS', which stands for "business month start frequency", and 'BM', which stands for "business month end frequency" (spelled 'BME' since pandas 2.2).

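The code in the first case would look like this (with a month-start rule the bins must be closed and labelled on the left, so that the first business day stays with the rest of its own month):

data_dy.resample('BMS', closed='left', label='left').agg(ohlc_dict)


or, in the second case,

data_dy.resample('BM', closed='right', label='right').agg(ohlc_dict)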
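A minimal sketch of the 'BMS' variant on synthetic data, assuming a weekday-only daily index with made-up prices and volumes; `ohlc_dict` here is a hypothetical mapping in the style of the one used later in this thread. It shows that each bin is labelled with the first business day of its month. Note that 'BMS' only knows Monday to Friday, not exchange holidays, which is why 2013-01-01 appears as a label.

```python
import numpy as np
import pandas as pd

# Synthetic daily OHLCV data on weekdays only (values are made up for illustration)
idx = pd.bdate_range("2013-01-01", "2013-03-29")
rng = np.random.default_rng(0)
close = 100 + rng.standard_normal(len(idx)).cumsum()
data_dy = pd.DataFrame(
    {
        "Open": close + 0.1,
        "High": close + 1.0,
        "Low": close - 1.0,
        "Close": close,
        "Volume": rng.integers(1_000, 5_000, len(idx)),
    },
    index=idx,
)

# How each column collapses into one monthly bar
ohlc_dict = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}

# 'BMS' bins are [first business day, next month's first business day), labelled on the left edge
monthly = data_dy.resample("BMS", closed="left", label="left").agg(ohlc_dict)
print(monthly)
```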

                                                                        
                                                        
            
            
              
                
Answer by 误落风尘, 2020-12-30 16:58

Thank you J Bradley, your solution worked perfectly.  I did have to upgrade my version of pandas from their official website though as the version installed via pip did not have CustomBusinessMonthBegin in pandas.tseries.offsets.  My final code was:

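#----- imports -----
import pandas as pd
from pandas.tseries.offsets import CustomBusinessMonthBegin
import pandas_datareader.data as web  # pandas.io.data was split out into the pandas-datareader package
#----- get sample data -----
# (the Yahoo source may no longer respond; any daily OHLCV frame indexed by 'Date' works here)
df = web.get_data_yahoo('SPY', '2012-12-01', '2013-12-31')
#----- build custom calendar -----
month_index = df.index.to_period('M')
# first date present in each month; after reset_index the dates sit in the 'Date' column
min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['Date'].min())
# NOTE: calendar= expects a holiday calendar such as USFederalHolidayCalendar(); a Series of
# dates has no .holidays() method, so it does not appear to be applied here
custom_month_starts = CustomBusinessMonthBegin(calendar=min_day_in_month_index)
#----- convert daily data to monthly data -----
ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum', 'Adj Close': 'last'}
mthly_ohlcva = df.resample(custom_month_starts).agg(ohlc_dict)  # how= was removed from resample; use .agg()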


This yielded the following:

>>> mthly_ohlcva
                Volume  Adj Close    High     Low   Close    Open
Date                                                             
2012-12-03  2889875900     136.92  145.58  139.54  142.41  142.80
2013-01-01  2587140200     143.92  150.94  144.73  149.70  145.11
2013-02-01  2581459300     145.76  153.28  148.73  151.61  150.65
2013-03-01  2330972300     151.30  156.85  150.41  156.67  151.09
2013-04-01  2907035000     154.20  159.72  153.55  159.68  156.59
2013-05-01  2781596000     157.84  169.07  158.10  163.45  159.33
2013-06-03  3533321800     155.74  165.99  155.73  160.42  163.83
2013-07-01  2330904500     163.78  169.86  160.22  168.71  161.26
2013-08-01  2283131700     158.87  170.97  163.05  163.65  169.99
2013-09-02  2226749600     163.90  173.60  163.70  168.01  165.23
2013-10-01  2901739000     171.49  177.51  164.53  175.79  168.14
2013-11-01  1930952900     176.57  181.75  174.76  181.00  176.02
2013-12-02  2232775900     181.15  184.69  177.32  184.69  181.09
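In the output above, the January and September rows are labelled 2013-01-01 and 2013-09-02, which were US market holidays (New Year's Day and Labor Day), so those labels are first business days rather than first trading days; this suggests the `calendar=` argument had no effect. A minimal alternative sketch, not the approach used in this thread: group by calendar month with `to_period('M')`, aggregate, then relabel each row with the first date that actually appears in the data. The small frame is made up for illustration and assumes the daily index already contains only trading days.

```python
import pandas as pd

# Made-up daily bars; 2013-01-01 is absent, as it would be in real market data
dates = pd.to_datetime(["2012-12-31", "2013-01-02", "2013-01-03", "2013-02-01", "2013-02-04"])
df = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0, 13.0, 14.0],
        "High": [10.5, 11.5, 12.5, 13.5, 14.5],
        "Low": [9.5, 10.5, 11.5, 12.5, 13.5],
        "Close": [10.2, 11.2, 12.2, 13.2, 14.2],
        "Volume": [100, 200, 300, 400, 500],
    },
    index=dates,
)
ohlc_dict = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}

months = df.index.to_period("M")
# One bar per calendar month (index: monthly Periods)
monthly = df.groupby(months).agg(ohlc_dict)
# First date observed in each month, i.e. the first trading day present in the data
first_days = df.index.to_series().groupby(months).min()
# Replace the Period labels with those first trading days
monthly.index = pd.DatetimeIndex(first_days.loc[monthly.index].to_numpy(), name="Date")
print(monthly)
```

Because the labels come from the data itself, no holiday calendar is needed; the trade-off is that the labels are only as good as the input index.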

                                                                        
                                                        
            
            
              
                
Answer by 情深已故, 2020-12-30 17:13

Instead of 'M' (month end) you can pass 'MS' (month start) as the resample rule:

import pandas as pd

df = pd.DataFrame(range(72), index=pd.date_range('1/1/2011', periods=72, freq='D'))

#df.resample('MS', how='mean')    # pandas < 0.18
df.resample('MS').mean()  # pandas >= 0.18
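A self-contained version of the 'MS' example for reference: the 72 daily values starting 2011-01-01 cover all of January, all of February and 1 to 13 March, so each monthly mean is the midpoint of that month's run of integers.

```python
import pandas as pd

df = pd.DataFrame(range(72), index=pd.date_range("2011-01-01", periods=72, freq="D"))

# 'MS' bins start on the 1st of each calendar month and are labelled with that date
monthly = df.resample("MS").mean()
print(monthly[0].tolist())  # [15.0, 44.5, 65.0]  (values 0..30, 31..58, 59..71)
```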


Updated to use the first business day of the month respecting US Federal Holidays:

from pandas.tseries.offsets import CustomBusinessMonthBegin
from pandas.tseries.holiday import USFederalHolidayCalendar

df = pd.DataFrame(range(200), index=pd.date_range('12/1/2012', periods=200, freq='D'))

# month-start offset that skips weekends and US federal holidays
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())

df.resample(bmth_us).mean()
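A quick check of the holiday-aware offset, on a hypothetical weekday-only index built with `CustomBusinessDay` and the same `USFederalHolidayCalendar`: because 1 January 2013 is a federal holiday, the January bin is labelled 2013-01-02. Keep in mind that US federal holidays are not the same as NYSE holidays (Good Friday is a market holiday but not a federal one, while the market trades on Columbus Day and Veterans Day), so this calendar only approximates trading days.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay, CustomBusinessMonthBegin

cal = USFederalHolidayCalendar()
# Weekdays that are not US federal holidays (an approximation of trading days)
days = pd.date_range("2013-01-01", "2013-03-31", freq=CustomBusinessDay(calendar=cal))
df = pd.DataFrame({"Close": range(len(days))}, index=days)

# Month bins that start on the first weekday of the month that is not a federal holiday
bmth_us = CustomBusinessMonthBegin(calendar=cal)
monthly = df.resample(bmth_us).last()
print(monthly.index.tolist())
```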


If you want custom month starts based on the earliest date found in each month of the data, try this (it isn't pretty, but it should work):

month_index = df.index.to_period('M')

min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['level_0'].min())

custom_month_starts = CustomBusinessMonthBegin(calendar=min_day_in_month_index)


Pass custom_month_starts as the first parameter of resample.
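The `set_index`/`reset_index` chain in this answer computes the first date present in each month. A shorter way to get the same Series, shown on a hypothetical frame with an unnamed daily index (so the reset column is called `level_0`, as in the answer); the one-liner is an alternative, not something the answer used:

```python
import pandas as pd

df = pd.DataFrame(range(200), index=pd.date_range("2012-12-01", periods=200, freq="D"))
month_index = df.index.to_period("M")

# The answer's expression: move the dates into a column, group by month, take the earliest date
verbose = pd.to_datetime(
    df.set_index(month_index, append=True)
    .reset_index(level=0)
    .groupby(level=0)["level_0"]
    .min()
)

# Equivalent one-liner: group the index values themselves by month
concise = df.index.to_series().groupby(month_index).min()

print(concise.head(3))
```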
                                                                        
                                                        
            
            
              
                