Find Indexes of Non-NaN Values in Pandas DataFrame

Asked by 我寻月下人不归 on 2020-12-21 04:14 · 2 answers · 1760 views

I have a very large dataset (roughly 200000x400). After filtering, only a few hundred values remain and the rest are NaN. I would like to create a list of the (row, column) indexes of those remaining non-NaN values.

2 Answers
  • 2020-12-21 04:27

    Convert the DataFrame to its equivalent NumPy array representation and locate the NaNs with numpy.isnan. Negate that mask (so True marks the non-null cells) and pass it to numpy.argwhere to get the (row, column) indices. Since the required output is a list of tuples, map tuple over the rows of the resulting array.

    >>> list(map(tuple, np.argwhere(~np.isnan(df.values))))
    [(0, 2), (2, 1), (4, 0), (4, 2)]
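
    For reference, here is a minimal self-contained sketch of this approach; the sample DataFrame below is an assumption, mirroring the one used in the other answer:

    import numpy as np
    import pandas as pd

    # hypothetical mostly-NaN sample frame, matching the other answer's example
    df = pd.DataFrame([[np.nan, np.nan, 1.20],
                       [np.nan, np.nan, np.nan],
                       [np.nan, 1.1,    np.nan],
                       [np.nan, np.nan, np.nan],
                       [1.4,    np.nan, 1.01]])

    # boolean mask of non-NaN cells -> integer (row, col) positions
    # (df.to_numpy() is the modern equivalent of df.values)
    coords = np.argwhere(~np.isnan(df.values))

    # cast NumPy integers to plain ints for clean tuples
    print([tuple(map(int, rc)) for rc in coords])  # [(0, 2), (2, 1), (4, 0), (4, 2)]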
    
  • 2020-12-21 04:30

    Assuming that your column names are of int dtype:

    In [73]: df
    Out[73]:
         0    1     2
    0  NaN  NaN  1.20
    1  NaN  NaN   NaN
    2  NaN  1.1   NaN
    3  NaN  NaN   NaN
    4  1.4  NaN  1.01
    
    In [74]: df.columns.dtype
    Out[74]: dtype('int64')
    
    In [75]: df.stack().reset_index().drop(0, 1).apply(tuple, axis=1).tolist()
    Out[75]: [(0, 2), (2, 1), (4, 0), (4, 2)]
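
    Note that stack() already drops NaNs and keeps the (row, column) pairs in its MultiIndex, so a shorter equivalent (a sketch under the same int-dtype assumption; the cell numbers below are mine) is:

    In [76]: df.stack().index.tolist()
    Out[76]: [(0, 2), (2, 1), (4, 0), (4, 2)]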
    

    If your column names are of object dtype:

    In [81]: df.columns.dtype
    Out[81]: dtype('O')
    
    In [83]: df.stack().reset_index().astype(int).drop(0, 1).apply(tuple, axis=1).tolist()
    Out[83]: [(0, 2), (2, 1), (4, 0), (4, 2)]
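
    A caveat for newer pandas (an aside, not part of the original answer): DataFrame.drop stopped accepting a positional axis argument in pandas 2.0, so the drop(0, 1) calls above need the keyword form there, e.g.:

    In [84]: df.stack().reset_index().astype(int).drop(columns=0).apply(tuple, axis=1).tolist()
    Out[84]: [(0, 2), (2, 1), (4, 0), (4, 2)]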
    

    Timing for a 50K-row DataFrame:

    In [89]: df = pd.concat([df] * 10**4, ignore_index=True)
    
    In [90]: df.shape
    Out[90]: (50000, 3)
    
    In [91]: %timeit list(map(tuple, np.argwhere(~np.isnan(df.values))))
    10 loops, best of 3: 144 ms per loop
    
    In [92]: %timeit df.stack().reset_index().drop(0, 1).apply(tuple, axis=1).tolist()
    1 loop, best of 3: 1.67 s per loop
    

    Conclusion: Nickil Maveli's NumPy solution is roughly 12 times faster on this test DataFrame, which makes sense: it operates directly on the underlying array, while the stack-based version builds intermediate pandas objects and applies tuple row by row at the Python level.
