Relative Strength Index in python pandas

生来不讨喜 2020-12-07 17:29

I am new to pandas. What is the best way to calculate the relative strength part of the RSI indicator in pandas? So far I have the following:

from pylab impor         


        
  • 2020-12-07 17:45
    dUp= delta[delta > 0]
    dDown= delta[delta < 0]
    

    You also need something like:

    RolUp = RolUp.reindex_like(delta, method='ffill')
    RolDown = RolDown.reindex_like(delta, method='ffill')
    

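    Otherwise RS = RolUp / RolDown will not do what you want: dUp and dDown keep different index labels, and pandas aligns on the index when dividing, so the result is NaN wherever one of the two series has no value.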
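
    To see why, here is a small sketch with a made-up price series (the prices and the window n = 2 are arbitrary): the filtered gain and loss series keep disjoint index labels, so dividing their rolling means aligns on the union of the two indexes and every element of the result is NaN.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only
close = pd.Series([10.0, 11.0, 10.5, 11.5, 11.0, 12.0, 11.8, 12.5])
delta = close.diff()
n = 2

# Boolean filtering keeps only the rows that satisfy each condition
dUp = delta[delta > 0]
dDown = delta[delta < 0]

RolUp = dUp.rolling(n).mean()
RolDown = dDown.rolling(n).mean().abs()

# Division aligns on index labels; no label appears in both series,
# so every element of the result is NaN
RS = RolUp / RolDown
print(RS)
```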

    Edit: this seems to be a more accurate way to calculate RS:

    # dUp= delta[delta > 0]
    # dDown= delta[delta < 0]
    
    # dUp = dUp.reindex_like(delta, fill_value=0)
    # dDown = dDown.reindex_like(delta, fill_value=0)
    
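    # Keep every row, setting the opposite-sign changes to 0,
    # so both rolling windows cover the same n periods
    dUp, dDown = delta.copy(), delta.copy()
    dUp[dUp < 0] = 0
    dDown[dDown > 0] = 0
    
    # pd.rolling_mean was removed from pandas; use the .rolling() method instead
    RolUp = dUp.rolling(n).mean()
    RolDown = dDown.rolling(n).mean().abs()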
    
    RS = RolUp / RolDown
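
    The zero-filling in the edit above can also be written with Series.clip, which keeps every row (with a gain or loss of 0 on the other days), so both rolling windows cover the same n periods and share one index. A minimal sketch using the current .rolling() API; the price series and n = 3 are made up for illustration:

```python
import pandas as pd

# Hypothetical closing prices, for illustration only
close = pd.Series([44.0, 44.5, 44.2, 45.0, 45.5, 45.1, 46.0])
delta = close.diff()
n = 3

# Gains: positive changes, 0 elsewhere; losses: size of negative changes, 0 elsewhere
dUp = delta.clip(lower=0)
dDown = (-delta).clip(lower=0)

# Simple moving averages over the same n rows for both series
RolUp = dUp.rolling(n).mean()
RolDown = dDown.rolling(n).mean()

RS = RolUp / RolDown
RSI = 100.0 - (100.0 / (1.0 + RS))
print(RSI.round(2))
```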
    
  • 2020-12-07 17:45

    It is important to note that there are various ways of defining the RSI. It is commonly defined in at least two ways: using a simple moving average (SMA) as above, or using an exponential moving average (EMA). Here's a code snippet that calculates both definitions of RSI and plots them for comparison. I'm discarding the first row after taking the difference, since it is always NaN by definition.

    Note that when using EMA one has to be careful: since it has a memory going back to the beginning of the data, the result depends on where you start! For this reason, people typically prepend some extra data, say 100 time steps, and then discard the first 100 RSI values.
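
    A small sketch of that start-point effect, with a synthetic random-walk price series standing in for real data: the EMA of the gains computed on the full series and on the same series with its first 100 points dropped disagree at first, and the gap shrinks as the memory of the different starting point decays.

```python
import numpy as np
import pandas as pd

# Synthetic prices (a seeded random walk), for illustration only
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 400).cumsum())
up = close.diff().clip(lower=0)

window_length = 14
warmup = 100

# EMA of the gains over the full history vs. starting 100 steps later
full = up.ewm(span=window_length).mean()
late = up.iloc[warmup:].ewm(span=window_length).mean()

# Compare on the common index: the difference decays towards zero
gap = (full.loc[late.index] - late).abs()
print(gap.iloc[0], gap.iloc[-1])
```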

    In the plot produced by the code below, one can see the difference between the RSI calculated using SMA and EMA: the SMA one tends to be more sensitive. Note that the RSI based on EMA has its first finite value at the first time step (which is the second time step of the original period, due to discarding the first row), whereas the RSI based on SMA has its first finite value at the 14th time step. This is because, by default, rolling().mean() only returns a finite value once there are enough values to fill the window (min_periods defaults to the window length).

    import pandas
    import pandas_datareader.data as web
    import datetime
    import matplotlib.pyplot as plt
    
    # Window length for moving average
    window_length = 14
    
    # Dates
    start = '2010-01-01'
    end = '2013-01-27'
    
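    # Get data (note: Yahoo has changed its API several times, so the 'yahoo'
    # source may not work with every pandas_datareader version)
    data = web.DataReader('AAPL', 'yahoo', start, end)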
    # Get just the adjusted close
    close = data['Adj Close']
    # Get the difference in price from previous step
    delta = close.diff()
    # Get rid of the first row, which is NaN since it did not have a previous 
    # row to calculate the differences
    delta = delta[1:] 
    
    # Make the positive gains (up) and negative gains (down) Series
    up, down = delta.copy(), delta.copy()
    up[up < 0] = 0
    down[down > 0] = 0
    
    # Calculate the EWMA
    roll_up1 = up.ewm(span=window_length).mean()
    roll_down1 = down.abs().ewm(span=window_length).mean()
    
    # Calculate the RSI based on EWMA
    RS1 = roll_up1 / roll_down1
    RSI1 = 100.0 - (100.0 / (1.0 + RS1))
    
    # Calculate the SMA
    roll_up2 = up.rolling(window_length).mean()
    roll_down2 = down.abs().rolling(window_length).mean()
    
    # Calculate the RSI based on SMA
    RS2 = roll_up2 / roll_down2
    RSI2 = 100.0 - (100.0 / (1.0 + RS2))
    
    # Compare graphically
    plt.figure(figsize=(8, 6))
    RSI1.plot()
    RSI2.plot()
    plt.legend(['RSI via EWMA', 'RSI via SMA'])
    plt.show()
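
    To check the statement about where each RSI first becomes finite, first_valid_index() can be applied to both versions. A self-contained sketch with a made-up price series in place of the downloaded AAPL data, keeping window_length = 14 and dropping the first row as above:

```python
import numpy as np
import pandas as pd

# Made-up closing prices with a default integer index, for illustration only
rng = np.random.default_rng(1)
close = pd.Series(100 + rng.normal(0, 1, 60).cumsum())
window_length = 14

delta = close.diff()[1:]          # drop the NaN first row, as in the answer
up = delta.clip(lower=0)
down = (-delta).clip(lower=0)

rsi_ewma = 100.0 - 100.0 / (1.0 + up.ewm(span=window_length).mean()
                            / down.ewm(span=window_length).mean())
rsi_sma = 100.0 - 100.0 / (1.0 + up.rolling(window_length).mean()
                           / down.rolling(window_length).mean())

# 0-based positions (within delta) of the first finite RSI values
print(delta.index.get_loc(rsi_ewma.first_valid_index()))  # → 0 (first row)
print(delta.index.get_loc(rsi_sma.first_valid_index()))   # → 13 (14th row)
```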
    
  • 2020-12-07 17:47

    You can also use the following. The first average gain and loss are calculated differently (and properly) from the rest: the first is a simple mean over the first n periods, while later values use Wilder's recursive smoothing. At the end, all NaN values are replaced with blanks.

    This assumes you have already imported pandas and that your DataFrame is df. The only additional data required is a column of closing prices labeled Close. You can reference this column as df.Close; however, a column header may contain multiple words separated by spaces, which requires the df['word1 word2'] format. As a consistent practice, I always use the df['Close'] format.
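
    For example, with a hypothetical second column whose name contains a space, attribute access only works for the single-word name, while bracket access works for both:

```python
import pandas as pd

# Hypothetical frame with a one-word and a two-word column name
df = pd.DataFrame({'Close': [10.0, 10.5, 10.2], 'Adj Close': [9.9, 10.4, 10.1]})

# Attribute and bracket access return the same column for 'Close'
print(df.Close.equals(df['Close']))  # True

# `df.Adj Close` is not valid Python syntax, so brackets are required here
print(df['Adj Close'].tolist())      # [9.9, 10.4, 10.1]
```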

    import numpy as np
    
    # Calculate change in closing prices day over day
    # (Series.diff takes no axis argument)
    df['Delta'] = df['Close'].diff(periods=1)
    
    # Calculate if difference in close is Gain
    conditions = [df['Delta'] <= 0, df['Delta'] > 0]
    choices = [0, df['Delta']]
    df['ClGain'] = np.select(conditions, choices)
    
    # Calculate if difference in close is Loss
    conditions = [df['Delta'] >= 0, df['Delta'] < 0]
    choices = [0, -df['Delta']]
    df['ClLoss'] = np.select(conditions, choices)
    
    # Determine periods to calculate RSI over
    rsi_n = 9
    
    # Wilder's averages are recursive (each value depends on the previous one),
    # so a single np.select cannot build them; loop over row positions instead
    # (positions, not index labels, so any index type works)
    gain = df['ClGain'].to_numpy(dtype=float)
    loss = df['ClLoss'].to_numpy(dtype=float)
    avg_gain = np.full(len(df), np.nan)
    avg_loss = np.full(len(df), np.nan)
    
    # First average: simple mean of the first rsi_n changes (rows 1..rsi_n)
    avg_gain[rsi_n] = gain[1:rsi_n + 1].mean()
    avg_loss[rsi_n] = loss[1:rsi_n + 1].mean()
    
    # Later averages: ((previous average * (n - 1)) + current value) / n
    for i in range(rsi_n + 1, len(df)):
        avg_gain[i] = (avg_gain[i - 1] * (rsi_n - 1) + gain[i]) / rsi_n
        avg_loss[i] = (avg_loss[i - 1] * (rsi_n - 1) + loss[i]) / rsi_n
    
    df['AvgGain'] = avg_gain
    df['AvgLoss'] = avg_loss
    
    # Calculate RSI
    df['RSI'] = 100 - (100 / (1 + (df['AvgGain'] / df['AvgLoss'])))
    
    # Replace NaN cells with blanks
    df = df.replace(np.nan, "", regex=True)
    
    # (OPTIONAL) Remove columns used to create RSI
    del df['Delta']
    del df['ClGain']
    del df['ClLoss']
    del df['AvgGain']
    del df['AvgLoss']
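
    The recursive Wilder smoothing used in this answer can also be computed without Python-level recursion: after the first (simple-mean) average, each step avg = (prev * (n - 1) + value) / n is an exponential moving average with alpha = 1/n and adjust=False. The sketch below seeds that EMA with the simple mean; the helper name wilder_average and the sample prices are illustrative, not part of the answer.

```python
import numpy as np
import pandas as pd

def wilder_average(values: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: Wilder smoothing seeded with a simple mean.

    Position n holds the mean of values[1..n]; after that,
    avg[i] = (avg[i-1] * (n - 1) + values[i]) / n, which is an EMA
    with alpha = 1/n and adjust=False.
    """
    seeded = values.astype(float).copy()
    seed = seeded.iloc[1:n + 1].mean()
    seeded.iloc[:n] = np.nan      # nothing is defined before the seed
    seeded.iloc[n] = seed         # the EMA starts from the simple mean
    return seeded.ewm(alpha=1.0 / n, adjust=False).mean()

# Made-up closing prices, for illustration only
close = pd.Series([44.3, 44.1, 44.2, 43.6, 44.3, 44.8, 45.1, 45.4, 45.8, 46.1,
                   45.9, 46.2, 45.6, 46.3, 46.0])
rsi_n = 9
delta = close.diff()
avg_gain = wilder_average(delta.clip(lower=0), rsi_n)
avg_loss = wilder_average((-delta).clip(lower=0), rsi_n)
rsi = 100 - (100 / (1 + avg_gain / avg_loss))
print(rsi.round(2))
```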
    
  • 2020-12-07 17:55

    You can get a massive speed-up over Bill's answer by using numba: 100 loops over a 20k-row series took 113 seconds with the regular version and 0.28 seconds with numba. Numba excels at loops and arithmetic.

    import numpy as np
    import numba as nb
    
    @nb.jit(fastmath=True, nopython=True)   
    def calc_rsi( array, deltas, avg_gain, avg_loss, n ):
    
        # Use Wilder smoothing method
        up   = lambda x:  x if x > 0 else 0
        down = lambda x: -x if x < 0 else 0
        i = n+1
        for d in deltas[n+1:]:
            avg_gain = ((avg_gain * (n-1)) + up(d)) / n
            avg_loss = ((avg_loss * (n-1)) + down(d)) / n
            if avg_loss != 0:
                rs = avg_gain / avg_loss
                array[i] = 100 - (100 / (1 + rs))
            else:
                array[i] = 100
            i += 1
    
        return array
    
    def get_rsi( array, n = 14 ):   
    
        deltas = np.append([0],np.diff(array))
    
        avg_gain =  np.sum(deltas[1:n+1].clip(min=0)) / n
        avg_loss = -np.sum(deltas[1:n+1].clip(max=0)) / n
    
        array = np.empty(deltas.shape[0])
        array.fill(np.nan)
    
        array = calc_rsi( array, deltas, avg_gain, avg_loss, n )
        return array
    
    # 'prices' stands for your NumPy array or pandas Series of closing prices
    rsi = get_rsi(np.asarray(prices, dtype=float), 14)
    
  • 2020-12-07 17:57

    To add to the answers above, you can also do this with the finta package.

    ref: https://github.com/peerchemist/finta/tree/master/examples

    import pandas as pd
    from finta import TA
    import matplotlib.pyplot as plt
    
    ohlc = pd.read_csv("C:\\WorkSpace\\Python\\ta-lib\\intraday_5min_IBM.csv", index_col="timestamp", parse_dates=True)
    ohlc['RSI']= TA.RSI(ohlc)
    
  • 2020-12-07 18:01
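    def RSI(series):
        delta = series.diff()
        u = delta * 0
        d = u.copy()
        i_pos = delta > 0
        i_neg = delta < 0
        u[i_pos] = delta[i_pos]
        # Store losses as positive magnitudes; otherwise RS (and RSI) go negative
        d[i_neg] = -delta[i_neg]
        # pandas.stats.moments.ewma was removed; use the .ewm() method instead
        rs = u.ewm(span=27).mean() / d.ewm(span=27).mean()
        return 100 - 100 / (1 + rs)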
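
    The answer does not explain the choice of span=27. Whatever the reason, pandas converts span to a smoothing factor alpha = 2 / (span + 1), so span=27 gives alpha = 2/28 = 1/14, the same factor Wilder's 14-period smoothing uses (alpha = 1/n). A quick check on an arbitrary made-up series (note that the default adjust=True weighting still differs from Wilder's seeded recursion in the first values):

```python
import numpy as np
import pandas as pd

span = 27
alpha_from_span = 2 / (span + 1)
print(alpha_from_span == 1 / 14)  # True

# Both parameterisations give the same averages (made-up data)
s = pd.Series([1.0, 0.0, 2.0, 0.5, 0.0, 3.0, 1.5, 0.0, 0.25])
by_span = s.ewm(span=span).mean()
by_alpha = s.ewm(alpha=1 / 14).mean()
print(np.allclose(by_span, by_alpha))  # True
```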
    