I am new to pandas. What is the best way to calculate the relative strength part in the RSI indicator in pandas? So far I got the following:
from pylab impor
dUp= delta[delta > 0]
dDown= delta[delta < 0]
Also, you need something like:
RolUp = RolUp.reindex_like(delta, method='ffill')
RolDown = RolDown.reindex_like(delta, method='ffill')
Otherwise RS = RolUp / RolDown will not do what you want: dUp and dDown keep different rows of delta, so the division would align on mismatched index labels.
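To see why the alignment matters, here is a small sketch with made-up prices (the values and the window n = 2 are illustrative assumptions, not taken from the question). Filtering delta with a boolean mask keeps only the matching rows, so the up and down series share no index labels and their ratio comes out entirely NaN.

```python
import pandas as pd

# Made-up closing prices, for illustration only
close = pd.Series([10.0, 11.0, 10.5, 11.5, 11.0, 12.0, 11.8])
delta = close.diff().dropna()

# Boolean filtering drops rows, so the two series have disjoint indexes
dUp = delta[delta > 0]
dDown = delta[delta < 0]

n = 2
RolUp = dUp.rolling(n).mean()
RolDown = dDown.rolling(n).mean().abs()

# Division aligns on index labels; with no labels in common,
# every element of the result is NaN
RS = RolUp / RolDown
print(RS.isna().all())  # True
```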
Edit: this seems to be a more accurate way to calculate RS:
# dUp= delta[delta > 0]
# dDown= delta[delta < 0]
# dUp = dUp.reindex_like(delta, fill_value=0)
# dDown = dDown.reindex_like(delta, fill_value=0)
dUp, dDown = delta.copy(), delta.copy()
dUp[dUp < 0] = 0
dDown[dDown > 0] = 0
# Simple moving averages of gains and losses over n periods
RolUp = dUp.rolling(n).mean()
RolDown = dDown.rolling(n).mean().abs()
RS = RolUp / RolDown
It is important to note that there are various ways of defining the RSI. It is commonly defined in at least two ways: using a simple moving average (SMA) as above, or using an exponential moving average (EMA). Here's a code snippet that calculates both definitions of RSI and plots them for comparison. I'm discarding the first row after taking the difference, since it is always NaN by definition.
Note that when using EMA one has to be careful: since it includes a memory going back to the beginning of the data, the result depends on where you start! For this reason, typically people will add some data at the beginning, say 100 time steps, and then cut off the first 100 RSI values.
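The start-point dependence can be checked with a minimal sketch on synthetic data. The random-walk prices, the helper name rsi_ewm, and the 100-step warm-up are assumptions for illustration. The same EWMA-based RSI is computed once with extra history and once starting cold, and the two results are compared on the overlapping period.

```python
import numpy as np
import pandas as pd

def rsi_ewm(close, window_length=14):
    """RSI using exponentially weighted means of gains and losses."""
    delta = close.diff().iloc[1:]
    up = delta.clip(lower=0)
    down = -delta.clip(upper=0)
    rs = up.ewm(span=window_length).mean() / down.ewm(span=window_length).mean()
    return 100.0 - 100.0 / (1.0 + rs)

# Synthetic random-walk prices, for illustration only
rng = np.random.default_rng(0)
close = pd.Series(100.0 + rng.normal(0, 1, 400).cumsum())

warmup = 100
rsi_full = rsi_ewm(close)                # has history before the period of interest
rsi_late = rsi_ewm(close.iloc[warmup:])  # starts cold at the period of interest

# Absolute difference on the overlapping time steps
diff = (rsi_full - rsi_late).dropna().abs()
print(diff.iloc[:5])   # visible differences right after the cold start
print(diff.iloc[-5:])  # negligible once the early data has decayed away
```

The gap shrinks as the exponential weights on the missing history decay, which is why discarding a warm-up stretch makes the remaining values insensitive to the starting point.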
In the plot below, one can see the difference between the RSI calculated using SMA and EMA: the SMA one tends to be more sensitive. Note that the RSI based on EMA has its first finite value at the first time step (which is the second time step of the original period, due to discarding the first row), whereas the RSI based on SMA has its first finite value at the 14th time step. This is because by default rolling(window_length).mean() only returns a finite value once there are enough values to fill the window.
import pandas
import pandas_datareader.data as web
import datetime
import matplotlib.pyplot as plt
# Window length for moving average
window_length = 14
# Dates
start = '2010-01-01'
end = '2013-01-27'
# Get data
data = web.DataReader('AAPL', 'yahoo', start, end)
# Get just the adjusted close
close = data['Adj Close']
# Get the difference in price from previous step
delta = close.diff()
# Get rid of the first row, which is NaN since it did not have a previous
# row to calculate the differences
delta = delta[1:]
# Make the positive gains (up) and negative gains (down) Series
up, down = delta.copy(), delta.copy()
up[up < 0] = 0
down[down > 0] = 0
# Calculate the EWMA
roll_up1 = up.ewm(span=window_length).mean()
roll_down1 = down.abs().ewm(span=window_length).mean()
# Calculate the RSI based on EWMA
RS1 = roll_up1 / roll_down1
RSI1 = 100.0 - (100.0 / (1.0 + RS1))
# Calculate the SMA
roll_up2 = up.rolling(window_length).mean()
roll_down2 = down.abs().rolling(window_length).mean()
# Calculate the RSI based on SMA
RS2 = roll_up2 / roll_down2
RSI2 = 100.0 - (100.0 / (1.0 + RS2))
# Compare graphically
plt.figure(figsize=(8, 6))
RSI1.plot()
RSI2.plot()
plt.legend(['RSI via EWMA', 'RSI via SMA'])
plt.show()
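The point about where each average produces its first finite value can be checked without downloading anything. This is a small sketch on synthetic price changes (the random data is an assumption for illustration). It also shows the min_periods argument of rolling(), which lets the simple moving average start earlier by averaging over however many values are available.

```python
import numpy as np
import pandas as pd

window_length = 14

# Synthetic price changes, for illustration only (no download needed)
rng = np.random.default_rng(1)
delta = pd.Series(rng.normal(0, 1, 50))

up = delta.clip(lower=0)

roll_ewm = up.ewm(span=window_length).mean()
roll_sma = up.rolling(window_length).mean()
# min_periods=1 makes the rolling mean start immediately
roll_sma_early = up.rolling(window_length, min_periods=1).mean()

print(roll_ewm.first_valid_index())        # 0
print(roll_sma.first_valid_index())        # 13, i.e. the 14th value
print(roll_sma_early.first_valid_index())  # 0
```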
You can also use the following. The first average gain and average loss are calculated differently (as a simple mean) from the rest of the values, which are each derived from the previous average. In the end, all NaN values will be replaced with blanks.
This assumes you have already imported pandas and your dataframe is df. The only additional data required is a column of Close prices which is labeled as Close. You can reference this column as df.Close; however, a column header sometimes contains multiple words separated by spaces, which requires the df['word1 word2'] format. As a consistent practice I always use the df['Close'] format.
import numpy as np
# Calculate change in closing prices day over day
df['Delta'] = df['Close'].diff(periods=1)
# Calculate if difference in close is Gain
conditions = [df['Delta'] <= 0, df['Delta'] > 0]
choices = [0, df['Delta']]
df['ClGain'] = np.select(conditions, choices)
# Calculate if difference in close is Loss
conditions = [df['Delta'] >= 0, df['Delta'] < 0]
choices = [0, -df['Delta']]
df['ClLoss'] = np.select(conditions, choices)
# Determine periods to calculate RSI over
rsi_n = 9
# Calculate Avg Gain and Avg Loss over n periods (assumes len(df) > rsi_n).
# The first average (row rsi_n) is a simple mean of the first rsi_n changes;
# every later row depends on the previous average, so a loop is used.
gain, loss = df['ClGain'].to_numpy(), df['ClLoss'].to_numpy()
avg_gain, avg_loss = np.full(len(df), np.nan), np.full(len(df), np.nan)
avg_gain[rsi_n] = gain[1:rsi_n + 1].mean()
avg_loss[rsi_n] = loss[1:rsi_n + 1].mean()
for i in range(rsi_n + 1, len(df)):
    avg_gain[i] = (avg_gain[i - 1] * (rsi_n - 1) + gain[i]) / rsi_n
    avg_loss[i] = (avg_loss[i - 1] * (rsi_n - 1) + loss[i]) / rsi_n
df['AvgGain'] = avg_gain
df['AvgLoss'] = avg_loss
# Calculate RSI
df['RSI'] = 100-(100 / (1 + (df['AvgGain'] / df['AvgLoss'])))
# Replace NaN cells with blanks
df = df.replace(np.nan, "", regex=True)
# (OPTIONAL) Remove columns used to create RSI
del df['Delta']
del df['ClGain']
del df['ClLoss']
del df['AvgGain']
del df['AvgLoss']
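The averaging step used above (previous average times n - 1, plus the new value, divided by n) is an exponential moving average with alpha = 1/n. As a rough sketch, the same numbers can be obtained from pandas' ewm with adjust=False, provided the series is seeded with the initial simple mean. The helper names wilder_loop and wilder_ewm and the sample gains are made up for illustration, and the input here has no leading NaN row, so the first average sits at position n - 1.

```python
import numpy as np
import pandas as pd

def wilder_loop(values, n):
    """Wilder average: seed with the mean of the first n values, then recurse."""
    out = np.full(len(values), np.nan)
    out[n - 1] = values[:n].mean()
    for i in range(n, len(values)):
        out[i] = (out[i - 1] * (n - 1) + values[i]) / n
    return out

def wilder_ewm(values, n):
    """Same average via ewm: put the seed first, then smooth with alpha = 1/n."""
    seeded = pd.Series(np.concatenate(([values[:n].mean()], values[n:])))
    smoothed = seeded.ewm(alpha=1.0 / n, adjust=False).mean().to_numpy()
    out = np.full(len(values), np.nan)
    out[n - 1:] = smoothed
    return out

# Made-up daily gains, for illustration only
gains = np.array([0.0, 1.0, 0.0, 2.0, 0.5, 0.0, 0.0, 1.5, 0.0, 3.0, 0.0, 0.2])
n = 4
print(np.allclose(wilder_loop(gains, n), wilder_ewm(gains, n), equal_nan=True))  # True
```

The ewm version avoids the Python-level loop, at the cost of the slightly awkward seeding step.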
You can get a massive speed-up of Bill's answer by using numba. For 100 loops over a 20k-row series: regular = 113 seconds, numba = 0.28 seconds. Numba excels with loops and arithmetic.
import numpy as np
import numba as nb
@nb.jit(fastmath=True, nopython=True)
def calc_rsi( array, deltas, avg_gain, avg_loss, n ):
    # Use Wilder smoothing method
    up = lambda x: x if x > 0 else 0
    down = lambda x: -x if x < 0 else 0
    i = n+1
    for d in deltas[n+1:]:
        avg_gain = ((avg_gain * (n-1)) + up(d)) / n
        avg_loss = ((avg_loss * (n-1)) + down(d)) / n
        if avg_loss != 0:
            rs = avg_gain / avg_loss
            array[i] = 100 - (100 / (1 + rs))
        else:
            array[i] = 100
        i += 1
    return array

def get_rsi( array, n = 14 ):
    deltas = np.append([0],np.diff(array))
    avg_gain = np.sum(deltas[1:n+1].clip(min=0)) / n
    avg_loss = -np.sum(deltas[1:n+1].clip(max=0)) / n
    array = np.empty(deltas.shape[0])
    array.fill(np.nan)
    array = calc_rsi( array, deltas, avg_gain, avg_loss, n )
    return array
# prices is a placeholder name: a NumPy array or pandas Series of closing prices
rsi = get_rsi( prices, 14 )
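For checking results on small inputs, a plain-Python reference of the same recursion can be handy. The sketch below mirrors the indexing of the numba code above (first RSI value at position n + 1) and uses only the standard library. The function name rsi_reference and the sample prices are made up for illustration.

```python
import math

def rsi_reference(prices, n=14):
    """Plain-Python RSI with Wilder smoothing, same indexing as the code above."""
    deltas = [0.0] + [b - a for a, b in zip(prices, prices[1:])]
    rsi = [math.nan] * len(deltas)
    if len(deltas) <= n:
        return rsi
    # Seed averages: simple means over the first n price changes
    avg_gain = sum(max(d, 0.0) for d in deltas[1:n + 1]) / n
    avg_loss = sum(max(-d, 0.0) for d in deltas[1:n + 1]) / n
    for i in range(n + 1, len(deltas)):
        d = deltas[i]
        avg_gain = (avg_gain * (n - 1) + max(d, 0.0)) / n
        avg_loss = (avg_loss * (n - 1) + max(-d, 0.0)) / n
        if avg_loss != 0:
            rsi[i] = 100 - 100 / (1 + avg_gain / avg_loss)
        else:
            rsi[i] = 100.0  # no losses so far: avoid dividing by zero
    return rsi

# Made-up prices, for illustration only
rising = [float(p) for p in range(1, 11)]  # strictly rising, so no losses
print(rsi_reference(rising, n=3)[-1])      # 100.0

mixed = [10.0, 11.0, 10.0, 11.0, 10.0, 11.0]
print(rsi_reference(mixed, n=2))           # three NaNs, then about 75, 37.5, 68.75
```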
Just to add to the above, you can also do this with the finta package:
ref: https://github.com/peerchemist/finta/tree/master/examples
import pandas as pd
from finta import TA
import matplotlib.pyplot as plt
ohlc = pd.read_csv("C:\\WorkSpace\\Python\\ta-lib\\intraday_5min_IBM.csv", index_col="timestamp", parse_dates=True)
ohlc['RSI']= TA.RSI(ohlc)
def RSI(series):
    delta = series.diff()
    u = delta * 0
    d = u.copy()
    i_pos = delta > 0
    i_neg = delta < 0
    u[i_pos] = delta[i_pos]
    # Store losses as positive magnitudes so that rs is non-negative
    d[i_neg] = -delta[i_neg]
    # span=27 gives alpha = 2 / (27 + 1) = 1/14
    rs = u.ewm(span=27).mean() / d.ewm(span=27).mean()
    return 100 - 100 / (1 + rs)
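The span=27 in the function above is presumably chosen to match a 14-period Wilder-style smoothing factor: pandas defines alpha = 2 / (span + 1), so span = 2 * 14 - 1 = 27 gives alpha = 1/14. A quick sketch on made-up data confirms that the two parameterizations give the same result. Note that both calls use pandas' default adjust=True, which differs slightly from the plain recursive form in the early values.

```python
import numpy as np
import pandas as pd

n = 14
span = 2 * n - 1          # 27
alpha = 2.0 / (span + 1)  # 2 / 28 = 1 / 14

# Made-up series, for illustration only
rng = np.random.default_rng(2)
x = pd.Series(rng.normal(0, 1, 200))

by_span = x.ewm(span=span).mean()
by_alpha = x.ewm(alpha=1.0 / n).mean()
print(np.allclose(by_span, by_alpha))  # True
```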