I have a very large dataset (roughly 200000x400); however, I have it filtered so that only a few hundred values remain, the rest being NaN. I would like to create a list of the indexes of those remaining values.
Assuming that your column names are of int dtype:
In [73]: df
Out[73]:
     0    1     2
0  NaN  NaN  1.20
1  NaN  NaN   NaN
2  NaN  1.1   NaN
3  NaN  NaN   NaN
4  1.4  NaN  1.01
In [74]: df.columns.dtype
Out[74]: dtype('int64')
In [75]: df.stack().reset_index().drop(0, axis=1).apply(tuple, axis=1).tolist()
Out[75]: [(0, 2), (2, 1), (4, 0), (4, 2)]
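As a side note, stack() already drops the NaN cells and labels the survivors with a (row, column) MultiIndex, so the tuples can be read straight off that index without the reset_index/apply round trip. A minimal self-contained sketch (the extra dropna() is an assumption to stay correct on newer pandas, where stack() no longer drops NaN by default):

import numpy as np
import pandas as pd

# Rebuild the example frame with int column labels.
df = pd.DataFrame(
    [[np.nan, np.nan, 1.20],
     [np.nan, np.nan, np.nan],
     [np.nan, 1.1,    np.nan],
     [np.nan, np.nan, np.nan],
     [1.4,    np.nan, 1.01]]
)

# The MultiIndex of the stacked Series is already the (row, col) list;
# dropna() keeps this correct on pandas versions where stack() retains NaN.
coords = df.stack().dropna().index.tolist()
print(coords)  # [(0, 2), (2, 1), (4, 0), (4, 2)]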
If your column names are of object dtype:
In [81]: df.columns.dtype
Out[81]: dtype('O')
In [83]: df.stack().reset_index().astype(int).drop(0, axis=1).apply(tuple, axis=1).tolist()
Out[83]: [(0, 2), (2, 1), (4, 0), (4, 2)]
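Note that the astype(int) step only works because the object labels here happen to be digit strings ('0', '1', '2'). If the column names are arbitrary strings and you still want positional column indices, one option (a sketch, not from the original answer; the small frame below is an invented example) is to translate labels to positions with Index.get_indexer:

import numpy as np
import pandas as pd

# Invented example frame with non-numeric column labels.
df = pd.DataFrame({"a": [np.nan, 1.4], "b": [1.2, np.nan]})

stacked = df.stack().dropna()             # non-NaN cells, (row, label) index
rows = stacked.index.get_level_values(0)  # row positions
cols = df.columns.get_indexer(stacked.index.get_level_values(1))  # label -> position
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)  # [(0, 1), (1, 0)]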
Timing for a 50K-row DataFrame:
In [89]: df = pd.concat([df] * 10**4, ignore_index=True)
In [90]: df.shape
Out[90]: (50000, 3)
In [91]: %timeit list(map(tuple, np.argwhere(~np.isnan(df.values))))
10 loops, best of 3: 144 ms per loop
In [92]: %timeit df.stack().reset_index().drop(0, axis=1).apply(tuple, axis=1).tolist()
1 loop, best of 3: 1.67 s per loop
Conclusion: Nickil Maveli's solution is roughly 12 times faster for this test DataFrame.
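For completeness, a self-contained script to reproduce the comparison outside IPython. Timings will vary with machine and library versions; to_numpy() is used as the modern spelling of .values, and on newer pandas (where stack() keeps NaN) a .dropna() would need to be added after stack():

import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, np.nan, 1.20],
     [np.nan, np.nan, np.nan],
     [np.nan, 1.1,    np.nan],
     [np.nan, np.nan, np.nan],
     [1.4,    np.nan, 1.01]]
)
big = pd.concat([df] * 10**4, ignore_index=True)  # 50000 x 3

def argwhere_way():
    # Positional (row, col) pairs of the non-NaN cells via NumPy.
    return list(map(tuple, np.argwhere(~np.isnan(big.to_numpy()))))

def stack_way():
    # The stack/reset_index/apply chain timed above.
    return big.stack().reset_index().drop(0, axis=1).apply(tuple, axis=1).tolist()

print("argwhere:", min(timeit.repeat(argwhere_way, number=10, repeat=3)), "s per 10 runs")
print("stack:   ", min(timeit.repeat(stack_way, number=10, repeat=3)), "s per 10 runs")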