
Question:

Given a numpy array (or pandas dataframe) like this:

import numpy as np

a = np.array([
    [1,      1,      1,    0.5, np.nan, np.nan, np.nan],
    [1,      1,      1, np.nan, np.nan, np.nan, np.nan],
    [1,      1,      1,    0.5,   0.25,  0.125,  0.075],
    [1,      1,      1,   0.25, np.nan, np.nan, np.nan],
    [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [1,      1,    0.5,    0.5, np.nan, np.nan, np.nan],
])

I'm looking for the most efficient way to retrieve the index of the last non-nan value in each row, so in this case I'd want a function that returns something like this:

np.array([3, 2, 6, 3, 0, 3])

I could try np.argmin(a, axis=1) - 1, but this has at least two undesirable properties: it fails for rows that don't end with nan (a dealbreaker), and it doesn't "lazy-evaluate", i.e. stop once it has reached the last non-nan value in a given row (this matters less than the "it has to be right" condition).
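A quick check of why the argmin approach breaks (a sketch reusing the array from the question, not part of the original thread): np.argmin returns the position of the first nan in a row, so "first nan minus one" is correct only when the row actually contains a nan. For the fully populated third row it returns the position of the row's minimum instead.

```python
import numpy as np

a = np.array([
    [1,      1,      1,    0.5, np.nan, np.nan, np.nan],
    [1,      1,      1, np.nan, np.nan, np.nan, np.nan],
    [1,      1,      1,    0.5,   0.25,  0.125,  0.075],
    [1,      1,      1,   0.25, np.nan, np.nan, np.nan],
    [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [1,      1,    0.5,    0.5, np.nan, np.nan, np.nan],
])

# argmin reports the first nan in each row, so subtracting 1 gives the last valid index...
naive = np.argmin(a, axis=1) - 1
# ...except for row 2, which has no nan: argmin finds its minimum (0.075 at index 6)
print(naive.tolist())  # [3, 2, 5, 3, 0, 3]
```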

I imagine there's a way to do it with np.where, but besides evaluating every element of each row, I can't see an obvious, elegant way to rearrange the output to get the last index in each row:

>>> np.where(np.isnan(a))
(array([0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5]),
 array([4, 5, 6, 3, 4, 5, 6, 4, 5, 6, 1, 2, 3, 4, 5, 6, 4, 5, 6]))
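One way to rearrange a where/nonzero result (a sketch, not from the original thread): take the positions of the non-nan entries instead of the nan ones. np.nonzero lists them in row-major order, so a per-row maximum of the column indices, computed with the unbuffered np.maximum.at, gives the last valid index. It still touches every element, and using -1 for all-nan rows is our own convention.

```python
import numpy as np

a = np.array([
    [1,      1,      1,    0.5, np.nan, np.nan, np.nan],
    [1,      1,      1, np.nan, np.nan, np.nan, np.nan],
    [1,      1,      1,    0.5,   0.25,  0.125,  0.075],
    [1,      1,      1,   0.25, np.nan, np.nan, np.nan],
    [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [1,      1,    0.5,    0.5, np.nan, np.nan, np.nan],
])

rows, cols = np.nonzero(~np.isnan(a))  # (row, column) pairs of every valid entry
last = np.full(a.shape[0], -1)         # -1 marks rows with no valid value (assumed sentinel)
np.maximum.at(last, rows, cols)        # keep the largest column index seen for each row
print(last.tolist())  # [3, 2, 6, 3, 0, 3]
```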

Answer 1:

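pandas.Series has a last_valid_index method, which returns the index label of the last non-NA value. Applying it to each column of the transposed array gives the per-row result: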

import pandas as pd

pd.DataFrame(a.T).apply(pd.Series.last_valid_index)
Out:
0    3
1    2
2    6
3    3
4    0
5    3
dtype: int64
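The transpose can also be avoided by applying the same method row-wise with axis=1 (a sketch, not from the original answer). Note that last_valid_index returns None for a Series with no valid values, so an all-nan row would come back as None rather than an integer.

```python
import numpy as np
import pandas as pd

a = np.array([
    [1,      1,      1,    0.5, np.nan, np.nan, np.nan],
    [1,      1,      1, np.nan, np.nan, np.nan, np.nan],
    [1,      1,      1,    0.5,   0.25,  0.125,  0.075],
    [1,      1,      1,   0.25, np.nan, np.nan, np.nan],
    [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [1,      1,    0.5,    0.5, np.nan, np.nan, np.nan],
])

df = pd.DataFrame(a)
# each row is passed to last_valid_index as a Series labelled 0..6
last = df.apply(pd.Series.last_valid_index, axis=1)
print(last.tolist())  # [3, 2, 6, 3, 0, 3]

# a Series with no valid values has no last valid label
empty = pd.Series([np.nan, np.nan]).last_valid_index()
print(empty)  # None
```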

Answer 2:

If all nan values have been sorted to the end of each row, you can count the non-nan values in each row and subtract one:

(~np.isnan(a)).sum(axis=1) - 1  # array([3, 2, 6, 3, 0, 3])
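To see why the sorting assumption matters, here is a small sketch with a hypothetical row that has a nan in the middle: the count of valid values no longer equals the position of the last one.

```python
import numpy as np

# hypothetical row with a nan inside the data, not only at the end
b = np.array([[1, np.nan, 1, np.nan]])
count_based = (~np.isnan(b)).sum(axis=1) - 1  # 2 valid values -> 2 - 1 = 1
print(count_based.tolist())  # [1], but the last valid value is at index 2
```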

Answer 3:

Check which entries are not nan, reverse the order of the columns, and take the argmax (the position of the first True in the reversed row); then subtract that from the number of columns minus one:

a.shape[1] - (~np.isnan(a))[:, ::-1].argmax(1) - 1  # array([3, 2, 6, 3, 0, 3])
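The reversed-argmax trick also works when nan values appear in the middle of a row, but a row that is entirely nan is silently reported as the last column, because argmax of an all-False row is 0. A minimal sketch that masks such rows, assuming -1 is an acceptable sentinel (the helper name last_non_nan_index is hypothetical):

```python
import numpy as np

def last_non_nan_index(arr, missing=-1):
    """Index of the last non-nan entry in each row; `missing` for all-nan rows."""
    valid = ~np.isnan(arr)
    idx = arr.shape[1] - valid[:, ::-1].argmax(axis=1) - 1
    # an all-False row has argmax 0, which would map to the last column, so mask it
    return np.where(valid.any(axis=1), idx, missing)

b = np.array([[1, np.nan, 2, np.nan],            # nan in the middle and at the end
              [np.nan, np.nan, np.nan, np.nan]])  # no valid value at all
print(last_non_nan_index(b).tolist())  # [2, -1]
```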

Answer 4:

Well, here is one way to do it, though probably not the most efficient:

list(map(lambda x: [i for i, x_ in enumerate(x) if not np.isnan(x_)][-1], a))

It will also fail if any row is entirely nan, because indexing an empty list with [-1] raises an IndexError.
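A pure-Python variant that avoids the empty-list problem (a sketch, not from the original answer; the helper name last_non_nan_py and the -1 default are assumptions): walk each row backwards and stop at the first valid value. This short-circuits the way the question asked for, although a Python-level loop is usually slower than the vectorized NumPy answers.

```python
import numpy as np

def last_non_nan_py(row, missing=-1):
    # scan from the end and stop at the first non-nan value; `missing` if none exists
    return next((i for i in range(len(row) - 1, -1, -1) if not np.isnan(row[i])), missing)

a = np.array([
    [1,      1,      1,    0.5, np.nan, np.nan, np.nan],
    [1,      1,      1, np.nan, np.nan, np.nan, np.nan],
    [1,      1,      1,    0.5,   0.25,  0.125,  0.075],
    [1,      1,      1,   0.25, np.nan, np.nan, np.nan],
    [1, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [1,      1,    0.5,    0.5, np.nan, np.nan, np.nan],
])

print([last_non_nan_py(row) for row in a])  # [3, 2, 6, 3, 0, 3]
```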

Answer 5:

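This solution doesn't require the nan values to be sorted to the end of each row. It returns the index of the last non-nan item along axis 1: the running count of non-nan values reaches its maximum at that position, and argmax returns the first position where the maximum occurs.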

(~np.isnan(a)).cumsum(1).argmax(1)
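A small sketch of how the cumulative-count approach behaves on hypothetical unsorted rows. One caveat: an all-nan row also yields 0, the same as a row whose only valid value is at index 0, so a separate any() check is needed to tell them apart.

```python
import numpy as np

b = np.array([[np.nan, 1, np.nan, 2, np.nan],          # nan values scattered through the row
              [1, np.nan, np.nan, np.nan, np.nan],      # only index 0 is valid
              [np.nan, np.nan, np.nan, np.nan, np.nan]])  # no valid value
valid = ~np.isnan(b)
counts = valid.cumsum(axis=1)   # row 0 -> [0, 1, 1, 2, 2]
last = counts.argmax(axis=1)    # first position where the running count peaks
print(last.tolist())  # [3, 0, 0]

# rows 1 and 2 both give 0; valid.any distinguishes "index 0" from "no valid value"
has_valid = valid.any(axis=1)
print(has_valid.tolist())  # [True, True, False]
```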

Source: Getting the last non-nan index of a sorted numpy matrix or pandas dataframe
