Does Pandas calculate ewm wrong?

Submitted by 霸气de小男生 on 2019-12-28 02:04:21

Question


When I try to calculate the exponential moving average (EMA) of financial data in a DataFrame, Pandas' ewm approach seems to be incorrect.

The basics are well explained in the following link: http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages

According to the Pandas documentation, the approach taken is as follows (with the "adjust" parameter set to False):

   weighted_average[0] = arg[0];
   weighted_average[i] = (1-alpha) * weighted_average[i-1] + alpha * arg[i]

In my view, this is incorrect. "arg" should be (for example) the closing values; however, arg[0] ought to be the first average (i.e. the simple average of the first stretch of data, as long as the selected period), NOT the first closing value. arg[0] and arg[i] can therefore never come from the same data. Using the "min_periods" parameter does not seem to resolve this.

Can anyone explain to me how (or whether) Pandas can be used to calculate the EMA of data properly?


Answer 1:


There are several ways to initialize an exponential moving average, so I wouldn't say pandas is doing it wrong, just differently.
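To make the "different initialization" point concrete, here is a minimal sketch comparing the two seeds. The prices and the span are made up for illustration. Since both variants apply the same update rule, the gap between them should shrink by a factor of (1 - alpha) at every step, so the choice of seed matters less and less as the series gets longer.

```python
import numpy as np
import pandas as pd

# Made-up prices, used only for illustration.
prices = pd.Series([22.27, 22.19, 22.08, 22.17, 22.18,
                    22.13, 22.23, 22.43, 22.24, 22.29])
span = 4
alpha = 2 / (span + 1)

# pandas with adjust=False: the recursion is seeded with the first observation.
pandas_ema = prices.ewm(span=span, adjust=False).mean()

# Chart-school style: seeded with the simple average of the first `span` values,
# then the same recursion written out by hand.
sma_seeded = [prices.iloc[:span].mean()]
for price in prices.iloc[span:]:
    sma_seeded.append(alpha * price + (1 - alpha) * sma_seeded[-1])

# Gap between the two variants, from the seed position onward.
gap = pandas_ema.iloc[span - 1:].to_numpy() - np.array(sma_seeded)
print(gap)
```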

Here is a way to calculate it the way you want:

In [20]: s.head()
Out[20]: 
0    22.27
1    22.19
2    22.08
3    22.17
4    22.18
Name: Price, dtype: float64

In [21]: span = 10

In [22]: sma = s.rolling(window=span, min_periods=span).mean()[:span]

In [24]: rest = s[span:]

In [25]: pd.concat([sma, rest]).ewm(span=span, adjust=False).mean()
Out[25]: 
0           NaN
1           NaN
2           NaN
3           NaN
4           NaN
5           NaN
6           NaN
7           NaN
8           NaN
9     22.221000
10    22.208091
11    22.241165
12    22.266408
13    22.328879
14    22.516356
15    22.795200
16    22.968800
17    23.125382
18    23.275312
19    23.339801
20    23.427110
21    23.507635
22    23.533520
23    23.471062
24    23.403596
25    23.390215
26    23.261085
27    23.231797
28    23.080561
29    22.915004
Name: Price, dtype: float64



Answer 2:


You can compute the EWMA using either alpha or a coefficient (span) in the Pandas ewm function.

Formula for using alpha: (1 - alpha) * previous_val + alpha * current_val where alpha = 1 / period

Formula for using coeff: ((current_val - previous_val) * coeff) + previous_val where coeff = 2 / (period + 1)

Here is how you can use Pandas to compute the above formulas:

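# df: DataFrame, base: source column name, target: output column name,
# period: window length, alpha: boolean flag selecting the 1 / period smoothing.
con = pd.concat([df[:period][base].rolling(window=period).mean(), df[period:][base]])

if alpha:
    df[target] = con.ewm(alpha=1 / period, adjust=False).mean()
else:
    df[target] = con.ewm(span=period, adjust=False).mean()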
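As a quick check, the two formulas above are the same recursion written in two ways; what differs is the smoothing weight (1 / period versus 2 / (period + 1)). The sketch below runs both forms by hand and compares them with ewm. The sample values and the helper name recursive_ema are made up for illustration, and the recursion is seeded with the first value, which is what adjust=False does.

```python
import numpy as np
import pandas as pd

# Made-up values, used only for illustration.
values = pd.Series([1.0, 2.0, 3.0, 5.0, 8.0, 13.0])
period = 4


def recursive_ema(values: pd.Series, weight: float, difference_form: bool) -> np.ndarray:
    """Hypothetical helper: run the recursion by hand, seeded with the first value."""
    out = [values.iloc[0]]
    for current_val in values.iloc[1:]:
        previous_val = out[-1]
        if difference_form:
            out.append((current_val - previous_val) * weight + previous_val)
        else:
            out.append((1 - weight) * previous_val + weight * current_val)
    return np.array(out)


alpha = 1 / period
coeff = 2 / (period + 1)

blend_alpha = recursive_ema(values, alpha, difference_form=False)
diff_alpha = recursive_ema(values, alpha, difference_form=True)
diff_coeff = recursive_ema(values, coeff, difference_form=True)

# The same two weights expressed through ewm.
pandas_alpha = values.ewm(alpha=alpha, adjust=False).mean().to_numpy()
pandas_span = values.ewm(span=period, adjust=False).mean().to_numpy()

print(np.allclose(blend_alpha, diff_alpha))   # both forms, same weight
print(np.allclose(diff_coeff, pandas_span))   # span=period means weight 2 / (period + 1)
```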



Answer 3:


Here's an example of how Pandas calculates both adjusted and non-adjusted ewm:

import numpy as np
import pandas as pd

name = 'closing'
series = pd.Series([1, 2, 3, 5, 8, 13, 21, 34], name=name).to_frame()
period = 4
alpha = 2/(1+period)

series[name+'_ewma'] = np.nan
series.loc[0, name+'_ewma'] = series[name].iloc[0]

series[name+'_ewma_adjust'] = np.nan
series.loc[0, name+'_ewma_adjust'] = series[name].iloc[0]

for i in range(1, len(series)):
    series.loc[i, name+'_ewma'] = (1-alpha) * series.loc[i-1, name+'_ewma'] + alpha * series.loc[i, name]

    # Weights (1-alpha)**k for the current value and all earlier ones, normalized by their sum
    adjusted_weights = np.array([(1-alpha)**(i-t) for t in range(i+1)])
    series.loc[i, name+'_ewma_adjust'] = np.sum(series.iloc[0:i+1][name].values * adjusted_weights) / adjusted_weights.sum()

print(series)
print("diff adjusted=False -> ", np.sum(series[name+'_ewma'] - series[name].ewm(span=period, adjust=False).mean()))
print("diff adjusted=True -> ", np.sum(series[name+'_ewma_adjust'] - series[name].ewm(span=period, adjust=True).mean()))

The mathematical formula can be found at https://github.com/pandas-dev/pandas/issues/8861




Answer 4:


If you are calculating an EWM of an EWM (as in the MACD formula), you will get bad results, because the second and subsequent EWMs will take their seed from index positions 0 to period, where the input is still NaN. I use the following solution:

sma = df['Close'].rolling(period, min_periods=period).mean()
# Position of the first non-null input value: the SMA has (period - 1) more
# leading NaNs than its input, so subtract them back out.
idx_start = sma.isna().sum() + 1 - period
idx_end = idx_start + period
sma = sma[idx_start: idx_end]
# Use the same column as the SMA above
rest = df['Close'][idx_end:]
ema = pd.concat([sma, rest]).ewm(span=period, adjust=False).mean()
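The same idea can be wrapped in a function so that it can be chained. This is a minimal sketch, not a tested library routine: the name sma_seeded_ema and the sample data are made up, and it assumes the only NaNs in the input are leading ones (as in the output of a previous EMA).

```python
import numpy as np
import pandas as pd


def sma_seeded_ema(values: pd.Series, period: int) -> pd.Series:
    """Hypothetical helper: SMA-seeded EMA that tolerates leading NaNs."""
    sma = values.rolling(period, min_periods=period).mean()
    # The SMA has (period - 1) more leading NaNs than its input, so this is
    # the position of the first non-null input value.
    idx_start = sma.isna().sum() + 1 - period
    idx_end = idx_start + period
    seed = sma.iloc[idx_start:idx_end]   # only the last entry is non-NaN
    rest = values.iloc[idx_end:]
    ema = pd.concat([seed, rest]).ewm(span=period, adjust=False).mean()
    # Restore the rows that were sliced off at the front, as NaN.
    return ema.reindex(values.index)


# Made-up closing prices, used only for illustration.
close = pd.Series(np.linspace(10.0, 20.0, 30), name="Close")
period = 5

ema1 = sma_seeded_ema(close, period)
ema2 = sma_seeded_ema(ema1, period)   # EMA of EMA, seeded from the first valid ema1 values
print(pd.DataFrame({"close": close, "ema1": ema1, "ema2": ema2}).head(12))
```

With this layout the first EMA becomes valid at position period - 1, and the second one at position 2 * (period - 1), since it needs period valid values of the first EMA for its own seed.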


Source: https://stackoverflow.com/questions/37924377/does-pandas-calculate-ewm-wrong
