Getting the last non-nan index of a sorted numpy matrix or pandas dataframe

Anonymous (unverified), submitted 2019-12-03 01:03:01

Question:

Given a numpy array (or pandas dataframe) like this:

import numpy as np

a = np.array([
    [1,      1,      1,    0.5, np.nan, np.nan, np.nan],
    [1,      1,      1, np.nan, np.nan, np.nan, np.nan],
    [1,      1,      1,    0.5,   0.25,  0.125,  0.075],
    [1,      1,      1,   0.25, np.nan, np.nan, np.nan],
    [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [1,      1,    0.5,    0.5, np.nan, np.nan, np.nan]
])

I'm looking for the most efficient way to retrieve the index of the last non-nan value in each row, so in this situation I'd be looking for a function that returns something like this:

np.array([3, 2, 6, 3, 0, 3])

I can try np.argmin(a, axis=1) - 1, but this has at least two undesirable properties - it fails for rows not ending with nan (dealbreaker) and it doesn't "lazy-evaluate" and stop once it has reached the last non-nan value in a given row (this doesn't matter as much as the "it has to be right" condition).
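To make the first failure mode concrete, here is a small sketch using two rows from the array above (the variable name `naive` is only for illustration). It relies on the fact that np.argmin returns the position of the first nan when a row contains one, and the position of the smallest value otherwise.

```python
import numpy as np

a = np.array([
    [1, 1, 1, 0.5, np.nan, np.nan, np.nan],   # ends with nan
    [1, 1, 1, 0.5, 0.25, 0.125, 0.075],       # no nan at all
])

# Row 0: argmin lands on the first nan (index 4), so subtracting 1 gives 3.
# Row 1: argmin lands on the smallest value (index 6), so subtracting 1
# gives 5, although the last non-nan index is actually 6.
naive = np.argmin(a, axis=1) - 1
print(naive)  # [3 5]
```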

I imagine there's a way to do it with np.where, but in addition to evaluating all the elements of each row, I can't see an obvious elegant way to rearrange the output to get the last index in each row:

>>> np.where(np.isnan(a))
(array([0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5]),
 array([4, 5, 6, 3, 4, 5, 6, 4, 5, 6, 1, 2, 3, 4, 5, 6, 4, 5, 6]))

Answer 1:

pandas.Series has a last_valid_index method:

import pandas as pd

pd.DataFrame(a.T).apply(pd.Series.last_valid_index)

Out:
0    3
1    2
2    6
3    3
4    0
5    3
dtype: int64
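As a possible variation, the transpose can be avoided by applying the same method along axis=1, so that each row is passed as a Series. This is only a sketch on a two-row example; the names `df` and `last_idx` are illustrative. A row that is entirely nan would yield None rather than an integer, so it may need separate handling.

```python
import numpy as np
import pandas as pd

a = np.array([
    [1, 1, 1, 0.5, np.nan],
    [1, 0.5, np.nan, np.nan, np.nan],
])

df = pd.DataFrame(a)
# With axis=1 each row arrives as a Series indexed by the column labels,
# so last_valid_index returns the label of the last non-nan column.
last_idx = df.apply(pd.Series.last_valid_index, axis=1)
print(last_idx.tolist())  # [3, 1]
```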


Answer 2:

If all nan values have been sorted to the end of each row, you can do something like this:

(~np.isnan(a)).sum(axis=1) - 1
# array([3, 2, 6, 3, 0, 3])
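The sorted-to-the-end assumption matters here. A short sketch with a made-up array `b` shows what happens when a nan sits in the middle of a row: counting the valid entries no longer matches the position of the last one.

```python
import numpy as np

b = np.array([
    [1.0, np.nan, 0.5, np.nan],   # nan in the middle: last valid index is 2
    [1.0, 0.5, np.nan, np.nan],   # nans only at the end: last valid index is 1
])

# Counting non-nan entries gives 2 for both rows, hence index 1 for both.
# That is right for the second row but wrong for the first.
count_based = (~np.isnan(b)).sum(axis=1) - 1
print(count_based)  # [1 1]
```

Under the same formula, a row that is entirely nan comes out as -1.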


Answer 3:

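Build a mask of the non-nan entries, reverse the column order, and take the argmax along each row to find how far the last non-nan entry is from the right edge. Subtracting that offset from the index of the last column gives the result:

a.shape[1] - (~np.isnan(a))[:, ::-1].argmax(1) - 1
# array([3, 2, 6, 3, 0, 3])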
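One thing the one-liner does not cover is a row that is entirely nan: argmax returns 0 for an all-False row, so the expression reports the last column. The following is a minimal sketch of a wrapper that masks such rows; the function name `last_valid_index` and the `fill` sentinel of -1 are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def last_valid_index(arr, fill=-1):
    """Index of the last non-nan entry per row; `fill` for all-nan rows."""
    valid = ~np.isnan(arr)
    # Offset of the first valid entry when scanning each row from the right.
    from_right = valid[:, ::-1].argmax(axis=1)
    idx = arr.shape[1] - 1 - from_right
    # argmax returns 0 for an all-False row, so mask those rows explicitly.
    return np.where(valid.any(axis=1), idx, fill)

a = np.array([
    [1.0, 1.0, 0.5, np.nan],
    [np.nan, np.nan, np.nan, np.nan],
    [1.0, np.nan, 0.25, 0.125],
])
result = last_valid_index(a)
print(result)  # [ 2 -1  3]
```

Because it scans from the right, this version does not depend on the nans being sorted to the end of each row.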


Answer 4:

Well here is a way to do it. Probably not the most efficient though:

list(map(lambda x: [i for i, x_ in enumerate(x) if not np.isnan(x_)][-1], a)) 

Note that it raises an IndexError if any row is entirely nan, because indexing [-1] on an empty list fails.
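If all-nan rows can occur, a guarded variant of the same row-by-row idea might look like the sketch below. The helper name `last_valid_per_row` and the `fill` value of -1 are hypothetical choices, and np.flatnonzero is swapped in for the list comprehension.

```python
import numpy as np

def last_valid_per_row(arr, fill=-1):
    """Row-by-row last non-nan index, with `fill` for rows that are all nan."""
    out = []
    for row in arr:
        # Positions of the non-nan entries in this row, in ascending order.
        positions = np.flatnonzero(~np.isnan(row))
        out.append(int(positions[-1]) if positions.size else fill)
    return out

rows = np.array([
    [1.0, np.nan],
    [np.nan, np.nan],
    [1.0, 2.0],
])
per_row = last_valid_per_row(rows)
print(per_row)  # [0, -1, 1]
```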



Answer 5:

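This solution doesn't require the nans to be sorted to the end of each row. It returns the index of the last non-nan item along axis 1: the cumulative count of non-nan entries reaches its maximum at the last non-nan position, and argmax reports the first place that maximum occurs.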

(~np.isnan(a)).cumsum(1).argmax(1) 
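A small sketch on a made-up, unsorted array `c` illustrates the behaviour, including one edge case worth knowing: a row that is entirely nan yields 0, the same answer as a row whose only valid entry is in column 0. Checking `valid.any(axis=1)` is one way to tell the two apart.

```python
import numpy as np

c = np.array([
    [np.nan, 1.0, np.nan, 2.0, np.nan],            # last valid index is 3
    [3.0, np.nan, np.nan, np.nan, np.nan],         # last valid index is 0
    [np.nan, np.nan, np.nan, np.nan, np.nan],      # no valid entry at all
])

valid = ~np.isnan(c)
# Running count of valid entries per row, e.g. [0, 1, 1, 2, 2] for row 0.
running = valid.cumsum(axis=1)
# argmax picks the first position where the running count hits its maximum.
idx = running.argmax(axis=1)
print(idx)                 # [3 0 0]
print(valid.any(axis=1))   # [ True  True False]
```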

