Question
How do I iterate over a DataFrame like the following and return the locations of the non-NaN values as tuples? For example, given
df:
     0    1    2
0  NaN  NaN    1
1    1  NaN  NaN
2  NaN    2  NaN
I would expect an output of [(0, 1), (2, 0), (1, 2)]. Is a nested for loop the best way to do this, or does pandas offer an easier way that I'm unaware of?
Answer 1:
Assuming you don't need the results in a particular order, you can stack the non-null values and read the locations off the resulting index.
In [26]: list(df[df.notnull()].stack().index)
Out[26]: [(0L, '2'), (1L, '0'), (2L, '1')]
In [27]: df[df.notnull()].stack().index
Out[27]:
MultiIndex(levels=[[0, 1, 2], [u'0', u'1', u'2']],
labels=[[0, 1, 2], [2, 0, 1]])
Furthermore, in the pandas version shown here, stack drops NaN values by default (dropna=True), so the notnull mask is not needed.
In [28]: list(df.stack().index)
Out[28]: [(0L, '2'), (1L, '0'), (2L, '1')]
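Whether stack discards NaN on its own depends on the pandas version, so a slightly more defensive variant may be useful. The following is a minimal sketch, assuming integer column labels (the session above has string labels) and adding an explicit dropna() call that is not part of the original answer. The names stacked and locations are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (integer column labels assumed).
df = pd.DataFrame([[np.nan, np.nan, 1], [1, np.nan, np.nan], [np.nan, 2, np.nan]])

# stack() reshapes the frame into a Series indexed by (row label, column label).
# The explicit dropna() removes NaN entries whether or not stack() already did.
stacked = df.stack().dropna()

# The MultiIndex converts to a list of (row label, column label) tuples.
locations = stacked.index.tolist()
print(locations)
```

Note that this returns index labels rather than integer positions; the two only coincide when the frame uses the default 0..n-1 labels.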
Answer 2:
To get the non-null locations:
>>> import numpy as np
>>> np.argwhere(df.notnull().values).tolist()
[[0, 2], [1, 0], [2, 1]]
If you really want them as tuple pairs, just use a list comprehension:
>>> [tuple(pair) for pair in np.argwhere(df.notnull().values).tolist()]
[(0, 2), (1, 0), (2, 1)]
To get the null locations:
>>> np.argwhere(df.isnull().values).tolist()
[[0, 0], [0, 1], [1, 1], [1, 2], [2, 0], [2, 2]]
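np.argwhere works on the underlying array, so it reports integer positions, not index labels. If the frame has non-default labels, the positions can be mapped back through df.index and df.columns. A minimal sketch, assuming a hypothetical frame labelled a..c and x..z (these labels are not in the original question):

```python
import numpy as np
import pandas as pd

# Same values as the question, but with hypothetical string labels.
df = pd.DataFrame(
    [[np.nan, np.nan, 1], [1, np.nan, np.nan], [np.nan, 2, np.nan]],
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

# Integer (row, column) positions of every non-NaN cell.
positions = np.argwhere(df.notna().to_numpy())

# Translate each position into its (row label, column label) pair.
labels = [(df.index[i], df.columns[j]) for i, j in positions]

print(positions.tolist())  # [[0, 2], [1, 0], [2, 1]]
print(labels)              # [('a', 'z'), ('b', 'x'), ('c', 'y')]
```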
Answer 3:
A direct way:
import numpy as np
list(zip(*np.where(df.notnull())))
which gives
[(0, 2), (1, 0), (2, 1)]
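For comparison with the nested loop the question mentions, here is a rough sketch that puts a loop-based baseline next to the np.where approach and checks that they agree. The function names non_nan_positions_loop and non_nan_positions_vectorized are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, np.nan, 1], [1, np.nan, np.nan], [np.nan, 2, np.nan]])


def non_nan_positions_loop(frame):
    """Nested-loop baseline: visit every cell and keep the non-NaN ones."""
    mask = frame.notna().to_numpy()
    found = []
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j]:
                found.append((i, j))
    return found


def non_nan_positions_vectorized(frame):
    """Vectorized version: np.where returns the row and column position arrays."""
    rows, cols = np.where(frame.notna().to_numpy())
    # tolist() converts NumPy integers to plain Python ints.
    return list(zip(rows.tolist(), cols.tolist()))


loop_result = non_nan_positions_loop(df)
vectorized_result = non_nan_positions_vectorized(df)
print(loop_result == vectorized_result)
```

Both versions scan the mask in row-major order, so they produce the same list; the vectorized form simply avoids the Python-level loop over every cell.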
Source: https://stackoverflow.com/questions/36375939/how-to-get-row-column-indices-of-all-non-nan-items-in-pandas-dataframe