Compute a compounded return series in Python

Asked by 礼貌的吻别 on 2021-02-02 02:21

Greetings all, I have two series of data: daily raw stock price returns (positive or negative floats) and trade signals (buy=1, sell=-1, no trade=0).

The raw price retur

3 Answers
  • 2021-02-02 03:03

    The cumulative-return part of this question is covered in Wes McKinney's excellent 'Python for Data Analysis' (page 339), which uses pandas' cumprod() to build a rebased/indexed cumulative return from calculated price changes.

    Example from book:

    # NOTE: pandas.io.data has since been removed from pandas; the same
    # functions now live in the separate pandas_datareader package.
    import pandas.io.data as web

    # Adjusted daily closes for AAPL from 2011-01-01 onward
    price = web.get_data_yahoo('AAPL', '2011-01-01')['Adj Close']

    returns = price.pct_change()           # daily simple returns
    ret_index = (1 + returns).cumprod()    # rebased cumulative return index
    ret_index[0] = 1                       # first value is NaN; set it to 1
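
    Since pandas.io.data no longer ships with pandas (and the Yahoo endpoint has changed), here is a self-contained sketch of the same cumprod technique using a made-up price series instead of a download:

    ```python
    import pandas as pd

    # Hypothetical daily closing prices, standing in for the Yahoo download
    price = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0])

    returns = price.pct_change()           # daily simple returns
    ret_index = (1 + returns).cumprod()    # rebased cumulative return index
    ret_index.iloc[0] = 1                  # pct_change leaves the first value NaN
    ```

    As a sanity check on the rebasing, the final value of ret_index equals price.iloc[-1] / price.iloc[0].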
    
  • 2021-02-02 03:13

    There is a fantastic module called pandas, written by a guy at AQR (a hedge fund), that excels at calculations like this. What you need is a way to handle missing data. As someone mentioned above, the basics are the NaN (not a number) capabilities of scipy or numpy; however, even those libraries don't make financial calculations much easier. If you use pandas, you can mark the data you don't want to consider as NaN, and any subsequent calculation will skip it while operating normally on the rest of the data.

    I have been using pandas on my trading platform for about 8 months... I wish I had started using it sooner.

    Wes (the author) gave a talk at PyCon 2010 about the module's capabilities; see the slides and video on the PyCon 2010 webpage. In that video he demonstrates how to get daily returns, run thousands of linear regressions on a matrix of returns (in a fraction of a second), and timestamp and graph data, all with this module. Combined with Psyco, this is a beast of a financial analysis tool.

    The other great thing it handles is cross-sectional data... so you could grab daily close prices, their rolling means, etc... then timestamp every calculation, and get all this stored in something similar to a python dictionary (see the pandas.DataFrame class)... then you access slices of the data as simply as:

    close_prices['stdev_5d']
    

    See the pandas rolling moments documentation for more information on how to calculate the rolling stdev (it's a one-liner).
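
    In current pandas the rolling moments live on the .rolling accessor. A minimal sketch, assuming close_prices is a DataFrame with a 'close' column (the 'stdev_5d' name above is just a label):

    ```python
    import pandas as pd

    close_prices = pd.DataFrame({'close': [10.0, 11.0, 10.5, 10.8, 11.2, 11.0, 10.9]})

    # 5-period rolling standard deviation of the close -- the one-liner
    close_prices['stdev_5d'] = close_prices['close'].rolling(5).std()
    ```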

    Wes has gone out of his way to speed the module up with Cython, although I'll concede that I'm considering upgrading my server (an older Xeon) because of my analysis requirements.

    EDIT FOR STRIMP'S QUESTION: After converting your code to pandas data structures, it's still unclear to me how you're indexing your data in a pandas DataFrame, and what your compounding function requires for handling missing data (or, for that matter, the conditions for a 0.0 return, or whether you are using NaN in pandas). I'll demonstrate using my own data indexing. A day was picked at random; df is a DataFrame of ES futures quotes, indexed per second, with missing quotes filled in as numpy.nan. The DataFrame index consists of datetime objects localized with the pytz module's timezone objects.

    >>> df.info
    <bound method DataFrame.info of <class 'pandas.core.frame.DataFrame'>
    Index: 86400 entries, 2011-03-21 00:00:00-04:00 to 2011-03-21 23:59:59-04:00
    etf                                         18390  non-null values
    etfvol                                      18390  non-null values
    fut                                         29446  non-null values
    futvol                                      23446  non-null values
    ...
    >>> # ET is a pytz object...
    >>> et
    <DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
    >>> # To get the futures quote at 9:45, eastern time...
    >>> df.xs(et.localize(dt.datetime(2011,3,21,9,45,0)))['fut']
    1291.75
    >>>
    

    To give a simple example of how to calculate a column of continuous returns (in a pandas.TimeSeries), each referencing the quote from 10 minutes earlier (and filling in missing ticks), I would do this:

    >>> df['fut'].fillna(method='pad') / df['fut'].fillna(method='pad').shift(600)
    

    No lambda is required in that case, just dividing the column of values by itself 600 seconds ago. That .shift(600) part is because my data is indexed per-second.
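
    The same pad-then-shift idea, shrunk to a toy series (a shift of 2 instead of 600) so the mechanics are visible:

    ```python
    import numpy as np
    import pandas as pd

    # Toy quote series with missing ticks marked as NaN
    fut = pd.Series([1290.0, np.nan, 1291.0, np.nan, np.nan, 1292.5])

    padded = fut.ffill()               # carry the last quote forward over missing ticks
    ratio = padded / padded.shift(2)   # each quote relative to 2 ticks earlier
    ```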

    HTH, \mike

  • 2021-02-02 03:24

    Imagine I have a DataMatrix with closing prices, some indicator value, and a trade signal like this:

     >>> data_matrix
                            close          dvi            signal
     2008-01-02 00:00:00    144.9          0.6504         -1             
     2008-01-03 00:00:00    144.9          0.6603         -1             
     2008-01-04 00:00:00    141.3          0.7528         -1             
     2008-01-07 00:00:00    141.2          0.8226         -1             
     2008-01-08 00:00:00    138.9          0.8548         -1             
     2008-01-09 00:00:00    140.4          0.8552         -1             
     2008-01-10 00:00:00    141.3          0.846          -1             
     2008-01-11 00:00:00    140.2          0.7988         -1             
     2008-01-14 00:00:00    141.3          0.6151         -1             
     2008-01-15 00:00:00    138.2          0.3714         1   
    

    I use the signal to create a DataMatrix of returns based on the trade signal:

    >>> get_indicator_returns()
    
                       indicator_returns    
    2008-01-02 00:00:00    NaN            
    2008-01-03 00:00:00    0.000483       
    2008-01-04 00:00:00    0.02451        
    2008-01-07 00:00:00    0.0008492      
    2008-01-08 00:00:00    0.01615        
    2008-01-09 00:00:00    -0.01051       
    2008-01-10 00:00:00    -0.006554      
    2008-01-11 00:00:00    0.008069       
    2008-01-14 00:00:00    -0.008063      
    2008-01-15 00:00:00    0.02201 
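
    get_indicator_returns() itself isn't shown. One plausible (purely hypothetical) reconstruction that matches the sign convention in the table -- a short signal (-1) profits when the close falls -- multiplies the prior day's signal by the daily price change:

    ```python
    import pandas as pd

    # Hypothetical reconstruction; the real get_indicator_returns() is not shown
    close = pd.Series([144.9, 144.9, 141.3, 141.2, 138.9])
    signal = pd.Series([-1, -1, -1, -1, -1])

    indicator_returns = signal.shift(1) * close.pct_change()
    ```

    The values won't match the table exactly, because the closes displayed there are rounded.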
    

    What I ended up doing is this:

    from numpy import zeros

    def get_compounded_indicator_cumulative(self):
        indicator_dm = self.get_indicator_returns()
        dates = indicator_dm.index

        indicator_returns = indicator_dm['indicator_returns']
        compounded = zeros(len(indicator_returns))

        # Position 0 is NaN (no prior close), so start compounding at position 1
        compounded[1] = indicator_returns[1]

        for i in range(2, len(indicator_returns)):
            # Chain growth factors: (1 + running total) * (1 + period return) - 1
            compounded[i] = (1 + compounded[i-1]) * (1 + indicator_returns[i]) - 1

        data = {
            'compounded_returns': compounded
        }

        return DataMatrix(data, index=dates)
    

    For some reason I really struggled with this one...
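
    One way to sanity-check the loop: compounding growth factors one step at a time is the same as taking a cumulative product of (1 + r) and subtracting 1, which pandas can do in one line.

    ```python
    import numpy as np
    import pandas as pd

    # Toy per-period returns; NaN first entry, as pct_change would produce
    r = pd.Series([np.nan, 0.01, -0.02, 0.03])

    # Loop form, as in the answer above
    compounded = np.zeros(len(r))
    compounded[1] = r.iloc[1]
    for i in range(2, len(r)):
        compounded[i] = (1 + compounded[i - 1]) * (1 + r.iloc[i]) - 1

    # Vectorized equivalent: cumulative product of growth factors, minus 1
    vectorized = (1 + r.fillna(0)).cumprod() - 1
    ```

    Both forms produce the same series, so the one-liner can replace the loop once the data is in a Series.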

    I'm in the process of converting all my price series to PyTables. Looks promising so far.
