问题
I'm trying to find the fastest solution to iterate over each row of Open High Low Close data and count out how many rows exist between the current row's high and the next high that’s greater than or equal to the original. Here is a shortened code example which I think explains it well (I'd be trying to find the final value of _next
):
for i in df.index:
while df.loc[i, 'high'] > df.loc[_next, 'high']:
_next += 1
I think I need to use a lambda
function inside apply
but I can't seem to get it right. Been trying to figure this out for days and would love any advice!
Using this data, the function should return [2, 0, 0, 3, 1, 0, 0]
d = {'high': [1.2, 1.1, 1.1, 1.4, 1.3, 1.2, 1.3]}
df = pd.DataFrame(data=d)
high rows_to_higher_high
0 1.2 2
1 1.1 0
2 1.1 0
3 1.4 3
4 1.3 1
5 1.2 0
6 1.3 0
Edit: itertuples(name=None, index=True) is the fastest way to do it so far.
回答1:
using list comprehension
and get index to list
In [166]: d = {'high': [1.2, 1.1, 1.1, 1.4, 1.3, 1.2, 1.3]}
In [167]: df = pd.DataFrame(data=d)
In [168]: df['rows_to_higher_high']=[(df['high'].values[i+1:]>=val).tolist().index(True) if True in (df['high'].values[i+1:]>=val).tolist() else le
...: n((df['high'].values[i+1:]>=val).tolist()) for i, val in enumerate(df['high'].values)]
In [169]: df
Out[169]:
high rows_to_higher_high
0 1.2 2
1 1.1 0
2 1.1 0
3 1.4 3
4 1.3 1
5 1.2 0
6 1.3 0
回答2:
Base on my understanding, we can get the subtraction from current row by all value, then get the upper tri , and get the index max with argmax
when the value is more than 0
Sample :
s
0 1
1 2
2 1
3 3
4 1
5 1
6 5
dtype: int64
Solution :
np.clip(np.triu((s.values-s.values[:,None])>0,0).argmax(1)-np.arange(len(s)),a_min=0, a_max=None)
array([1, 2, 1, 3, 2, 1, 0])
来源:https://stackoverflow.com/questions/62963977/pandas-find-nearest-higher-high-ohlc