问题
I am trying to apply the Hurst Exponent on SPY closing prices on a rolling window. The below code (which I got from here: https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing) works well if I apply it on the closing prices column. However this gives me a static value. I would like to apply the Hurst Exponent on a rolling window considering the last 200 closing prices. My objective is to get a column in which the Hurst Exponent is updated in each row considering the last 200 closing prices.
from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn
import pandas_datareader as dr
from datetime import date
df = dr.data.get_data_yahoo('SPY',start='23-01-1991',end=date.today())
def hurst(ts):
"""Returns the Hurst Exponent of the time series vector ts"""
# Create the range of lag values
lags = range(2, 100)
# Calculate the array of the variances of the lagged differences
tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]
# Use a linear fit to estimate the Hurst Exponent
poly = polyfit(log(lags), log(tau), 1)
# Return the Hurst exponent from the polyfit output
return poly[0]*2.0
print ("Hurst(SPY): %s" % hurst(df['Close']))
## I've tried the next lines of code but unfortunately they are not working:
df['Hurst_Column']= [0]
for aRowINDEX in range( 1, 200 ):
df['Hurst_Column'][-aRowINDEX] = hurst (df[u'Close'][:-aRowINDEX])
I am very new at Python, and I have tried different things with no luck. Anyone could help me with it, please? Any help would be more than welcome. Thank you!
回答1:
Let me offer you a two step way forwards:
Step 1: a bit more robust Hurst Exponent implementation with test data
Step 2: a simple way to produce a "sliding-window"-alike calculation
Step 3: a bit more complicated way - if a ROLLING WINDOW is a MUST ...
Bonus: What should I write under the code of my question to have it done?
Step 1: a bit more robust Hurst Exponent implementation with test data :
Here, I will post a function implementation, taken from QuantFX
module, as-is ( Py2.7 will not make troubles in most places, yet any xrange()
ought be replaced with range()
in Py3.x ).
This code contains a few improvements and some sort of self-healing, if tests show, that there are problems with data-segment ( QuantFX
uses a convention of a natural flow of the time, where data[0]
is the "oldest" time-series cell and data[-1]
being the "most recent" one ).
Calling the HurstEXP()
without any parameter will yield a demo-run, showing some tests and explanations of the subject matter.
Also the print( HurstEXP.__doc__ )
is self-explanatory:
def HurstEXP( ts = [ None, ] ): # TESTED: HurstEXP() Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
""" __doc__
USAGE:
HurstEXP( ts = [ None, ] )
Returns the Hurst Exponent of the time series vector ts[]
PARAMETERS:
ts[,] a time-series, with 100+ elements
( or [ None, ] that produces a demo run )
RETURNS:
float - a Hurst Exponent approximation,
as a real value
or
an explanatory string on an empty call
THROWS:
n/a
EXAMPLE:
>>> HurstEXP() # actual numbers will vary, as per np.random.randn() generator used
HurstEXP( Geometric Browian Motion ): 0.49447454
HurstEXP( Mean-Reverting Series ): -0.00016013
HurstEXP( Trending Series ): 0.95748937
'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'
>>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
REF.s:
>>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
"""
#---------------------------------------------------------------------------------------------------------------------------<self-reflective>
if ( ts[0] == None ): # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:
gbm = np.log( 1000 + np.cumsum( np.random.randn( 100000 ) ) ) # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log( 1000 + rand + ... )]
mr = np.log( 1000 + np.random.randn( 100000 ) ) # a Mean-Reverting Series [log(1000 + rand), log(1000 + rand ), log(1000 + rand ),... log( 1000 + rand )]
tr = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) ) # a Trending Series [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]
# Output the Hurst Exponent for each of the above SYNTH series
print ( "HurstEXP( Geometric Browian Motion ): {0: > 12.8f}".format( HurstEXP( gbm ) ) )
print ( "HurstEXP( Mean-Reverting Series ): {0: > 12.8f}".format( HurstEXP( mr ) ) )
print ( "HurstEXP( Trending Series ): {0: > 12.8f}".format( HurstEXP( tr ) ) )
return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
""" # FIX:
===================================================================================================================
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
0.47537688039105963
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
-0.31081076640420308
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
nan
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.026867491053098096
"""
pass; too_short_list = 101 - len( ts ) # MUST HAVE 101+ ELEMENTS
if ( 0 < too_short_list ): # IF NOT:
ts = too_short_list * ts[:1] + ts # PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
#---------------------------------------------------------------------------------------------------------------------------
lags = range( 2, 100 ) # Create the range of lag values
tau = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ] # Calculate the array of the variances of the lagged differences
#oly = np.polyfit( np.log( lags ), np.log( tau ), 1 ) # Use a linear fit to estimate the Hurst Exponent
#eturn ( 2.0 * poly[0] ) # Return the Hurst exponent from the polyfit output
""" ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
|>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
ret, rcount, out=ret, casting='unsafe', subok=False)
QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] ) # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.028471879418359915
|
|
|# DATA:
|
|>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
memmap([ 1763.31005859, 1765.01000977, 1765.44995117, 1764.80004883,
1765.83996582, 1768.91003418, 1771.04003906, 1769.43994141,
1771.4699707 , 1771.61999512, 1774.76000977, 1769.55004883,
1773.4699707 , 1773.32995605, 1770.08996582, 1770.20996094,
1768.34997559, 1768.02001953, 1767.59997559, 1767.23999023,
1768.41003418, 1769.06994629, 1769.56994629, 1770.7800293 ,
1770.56994629, 1769.7800293 , 1769.90002441, 1770.44995117,
1770.9699707 , 1771.04003906, 1771.16003418, 1769.81005859,
1768.76000977, 1769.39001465, 1773.23999023, 1771.91003418,
1766.92004395, 1765.56994629, 1762.65002441, 1760.18005371,
1755. , 1756.67004395, 1753.48999023, 1753.7199707 ,
1751.92004395, 1745.44995117, 1745.44995117, 1744.54003906,
1744.54003906, 1744.84997559, 1744.84997559, 1744.34997559,
1744.34997559, 1743.75 , 1743.75 , 1745.23999023,
1745.23999023, 1745.15002441, 1745.31005859, 1745.47998047,
1745.47998047, 1749.06994629, 1749.06994629, 1748.29003906,
1748.29003906, 1747.42004395, 1747.42004395, 1746.98999023,
1747.61999512, 1748.79003906, 1748.79003906, 1748.38000488,
1748.38000488, 1744.81005859, 1744.81005859, 1736.80004883,
1736.80004883, 1735.43005371, 1735.43005371, 1737.9699707
], dtype=float32
)
|
|
| # CONVERTED .tolist() to avoid .memmap-type artifacts:
|
|>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
[1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.028471876494884543
===================================================================================================================
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
0.47537688039105963
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
-0.31081076640420308
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
nan
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.026867491053098096
"""
return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] ) # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )
Step 2: a simple way to produce a "sliding-window" calculation :
[ ( -i, HurstEXP( ts = df['Close'][:-i] ) ) for i in range( 1, 200 ) ] # should call the HurstEXP for the last 200 days
TEST-ME:
>>> df[u'Close']
Date
1993-01-29 43.937500
1993-02-01 44.250000
...
2019-07-17 297.739990
2019-07-18 297.429993
Name: Close, Length: 6665, dtype: float64
>>>
>>> [ ( -i,
HurstEXP( df[u'Close'][:-i] )
) for i in range( 1, 10 )
]
[ ( -1, 0.4489364467179827 ),
( -2, 0.4489306967683502 ),
( -3, 0.44892205577752986 ),
( -4, 0.448931424819551 ),
( -5, 0.44895272101162326 ),
( -6, 0.44896713741862954 ),
( -7, 0.44898211557287204 ),
( -8, 0.4489941656580211 ),
( -9, 0.4490116318052649 )
]
Step 3: a bit more complicated way - if a ROLLING WINDOW is a MUST ... :
While not much memory / processing efficient, the "rolling window" trick may get injected into the game, whereas there is no memory, the less a processing efficiency benefit from doing so ( you spend a lot on syntactically plausible code, yet the processing efficiency does not get here any plus from doing it right this way, as convolved nature of HurstEXP()
cannot help, without an attempt to re-vectorise also its internal code (why and what for ever?) any better from this ... just if professor or boss still wants you to do so ... ):
def rolling_window( aMatrix, aRollingWindowLENGTH ): #
""" __doc__
USAGE: rolling_window( aMatrix, aRollingWindowLENGTH )
PARAMS: aMatrix a numpy array
aRollingWindowLENGTH a LENGTH of a rolling window
RETURNS: a stride_trick'ed numpy array with rolling windows
THROWS: n/a
EXAMPLE: >>> x = np.arange( 10 ).reshape( ( 2, 5 ) )
>>> rolling_window( x, 3 )
array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
[[5, 6, 7], [6, 7, 8], [7, 8, 9]]])
>>> np.mean( rolling_window( x, 3 ), -1 )
array([[ 1., 2., 3.],
[ 6., 7., 8.]])
"""
new_shape = aMatrix.shape[:-1] + ( aMatrix.shape[-1] - aRollingWindowLENGTH + 1, aRollingWindowLENGTH )
new_strides = aMatrix.strides + ( aMatrix.strides[-1], )
return np.lib.stride_tricks.as_strided( aMatrix,
shape = new_shape,
strides = new_strides
)
>>> rolling_window( df[u'Close'], 100 ).shape
(6566, 100)
>>> rolling_window( df[u'Close'], 100 ).flags
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False <---------------- a VIEW, not a replica
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
You get an array of 6566 vectors with "rolling_window"-ed 100-day blocks of of SPY[Close]-s
>>> rolling_window( df[u'Close'], 100 )
array([[ 43.9375 , 44.25 , 44.34375 , ..., 44.5 , 44.59375 , 44.625 ],
[ 44.25 , 44.34375 , 44.8125 , ..., 44.59375 , 44.625 , 44.21875 ],
[ 44.34375 , 44.8125 , 45. , ..., 44.625 , 44.21875 , 44.8125 ],
...,
[279.14001465, 279.51998901, 279.32000732, ..., 300.6499939 , 300.75 , 299.77999878],
[279.51998901, 279.32000732, 279.20001221, ..., 300.75 , 299.77999878, 297.73999023],
[279.32000732, 279.20001221, 278.67999268, ..., 299.77999878, 297.73999023, 297.42999268]])
Q: What should I write under the code of my question to have it done?
for aRowINDEX in range( 1, 200 ):
df[u'HurstEXP_COLUMN'][-aRowINDEX] = HurstEXP( df[u'Close'][:-aRowINDEX] )
print( "[{0:>4d}]: DIFF( hurst() - HurstEXP() ) == {1:}".format( aRowINDEX,
( hurst( df[u'Close'][:-aRowINDEX] )
- HurstEXP( df[u'Close'][:-aRowINDEX] )
)
)
来源:https://stackoverflow.com/questions/57096574/how-to-apply-the-hurst-exponent-in-python-in-a-rolling-window