Find Indexes of Non-NaN Values in Pandas DataFrame

我寻月下人不归 2020-12-21 04:14

I have a very large dataset (roughly 200000x400). After filtering it, only a few hundred values remain and the rest are NaN. I would like to create a list of indexes

2 Answers
  •  醉梦人生
    2020-12-21 04:30

    Assuming that your column names are of int dtype:

    In [73]: df
    Out[73]:
         0    1     2
    0  NaN  NaN  1.20
    1  NaN  NaN   NaN
    2  NaN  1.1   NaN
    3  NaN  NaN   NaN
    4  1.4  NaN  1.01
    
    In [74]: df.columns.dtype
    Out[74]: dtype('int64')
    
    In [75]: df.stack().reset_index().drop(columns=0).apply(tuple, axis=1).tolist()
    Out[75]: [(0, 2), (2, 1), (4, 0), (4, 2)]
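
For readers who want to run this outside an IPython session, here is a minimal, self-contained sketch of the same idea. It rebuilds the 5x3 frame shown above and reads the pairs straight from the stacked index rather than going through `reset_index()`. The explicit `dropna()` is an added safeguard, on the assumption that not every pandas version drops NaN while stacking. The variable names `stacked` and `pairs` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same 5x3 frame as in the answer; column labels are the integers 0, 1, 2.
df = pd.DataFrame(
    [
        [np.nan, np.nan, 1.20],
        [np.nan, np.nan, np.nan],
        [np.nan, 1.1, np.nan],
        [np.nan, np.nan, np.nan],
        [1.4, np.nan, 1.01],
    ]
)

# stack() moves the columns into a second index level; dropna() is added
# explicitly so that NaN cells are removed regardless of how stack() behaves.
stacked = df.stack().dropna()

# Each remaining index entry is a (row label, column label) pair.
# Sorting makes the order deterministic (row first, then column).
pairs = sorted((int(r), int(c)) for r, c in stacked.index)
print(pairs)
```

Note that these pairs are index labels, not positions; the two coincide here only because both axes are labelled 0..n-1.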
    

    If your column names are of object dtype:

    In [81]: df.columns.dtype
    Out[81]: dtype('O')
    
    In [83]: df.stack().reset_index().astype(int).drop(columns=0).apply(tuple, axis=1).tolist()
    Out[83]: [(0, 2), (2, 1), (4, 0), (4, 2)]
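
The `astype(int)` step only works when the object-dtype labels are numeric strings. As a hedged sketch for labels that are not numeric, one option is to compute positions from a boolean mask and map them back to labels afterwards. The column names `"a"` and `"b"` and the variables `mask`, `positions` and `labels` are hypothetical and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with non-numeric column labels.
df = pd.DataFrame(
    {"a": [np.nan, np.nan, 1.4], "b": [np.nan, 1.1, np.nan]}
)

# True where a cell holds a value, False where it is NaN.
mask = df.notna().to_numpy()

# np.argwhere returns one [row, column] position per True cell, in row-major order.
positions = [(int(r), int(c)) for r, c in np.argwhere(mask)]

# Translate positions back into (row label, column label) pairs.
labels = [(df.index[r], df.columns[c]) for r, c in positions]

print(positions)
print(labels)
```

Keeping positions and labels separate avoids any integer conversion of the column names.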
    

    Timing for a DataFrame with 50K rows:

    In [89]: df = pd.concat([df] * 10**4, ignore_index=True)
    
    In [90]: df.shape
    Out[90]: (50000, 3)
    
    In [91]: %timeit list(map(tuple, np.argwhere(~np.isnan(df.values))))
    10 loops, best of 3: 144 ms per loop
    
    In [92]: %timeit df.stack().reset_index().drop(columns=0).apply(tuple, axis=1).tolist()
    1 loop, best of 3: 1.67 s per loop
    

    Conclusion: Nickil Maveli's solution (the np.argwhere call timed in In [91]) is about 12 times faster on this test DataFrame (144 ms vs. 1.67 s).
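
To repeat the comparison on another machine, the following sketch uses the standard `timeit` module instead of the `%timeit` magic. The helper names `via_argwhere` and `via_stack` and the `copies` variable are illustrative assumptions. `copies` defaults to a smaller value than the answer's `10**4` so the script finishes quickly, and the stack variant adds an explicit `dropna()`. No timings are quoted here because they depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

small = pd.DataFrame(
    [
        [np.nan, np.nan, 1.20],
        [np.nan, np.nan, np.nan],
        [np.nan, 1.1, np.nan],
        [np.nan, np.nan, np.nan],
        [1.4, np.nan, 1.01],
    ]
)

copies = 10**3  # assumption: set to 10**4 to match the 50K-row test above
df = pd.concat([small] * copies, ignore_index=True)


def via_argwhere(frame):
    """Positional (row, column) pairs of non-NaN cells, computed with NumPy."""
    return list(map(tuple, np.argwhere(~np.isnan(frame.to_numpy()))))


def via_stack(frame):
    """(row label, column label) pairs, computed by stacking and applying tuple."""
    stacked = frame.stack().dropna().reset_index()
    # Column 0 holds the values themselves; only the two index columns are kept.
    return stacked.drop(columns=0).apply(tuple, axis=1).tolist()


# Both approaches should find the same cells on this frame, where
# labels and positions coincide.
same = sorted(via_argwhere(df)) == sorted(via_stack(df))
print("results agree:", same)

# One run each is enough to see the order-of-magnitude difference.
t_argwhere = timeit.timeit(lambda: via_argwhere(df), number=1)
t_stack = timeit.timeit(lambda: via_stack(df), number=1)
print(f"argwhere: {t_argwhere:.4f} s, stack: {t_stack:.4f} s")
```

The `np.isnan` call assumes an all-numeric frame; for mixed dtypes, `frame.notna().to_numpy()` would be the safer mask.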
