While working in Pandas in Python...
I'm working with a dataset that contains some missing values, and I'd like to return a dataframe which contains only those rows which have missing data. Is there a nice way to do this?
(My current method to do this is an inefficient "look to see what index isn't in the dataframe without the missing values, then make a df out of those indices.")
metersk
You can use isnull() followed by any(axis=1) to check for at least one True per row, then filter with boolean indexing:
null_data = df[df.isnull().any(axis=1)]
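As a small illustration of the line above, here is a sketch on made-up data (the column names and values are assumptions for the example, not taken from the question). The mask is built once and can be reused, for instance to get the complete rows as well:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows 1 and 3 each contain one missing value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": ["x", "y", "z", None],
})

# True for every row that has at least one missing value.
mask = df.isnull().any(axis=1)

null_data = df[mask]       # rows with missing data
complete_data = df[~mask]  # rows without missing data

print(null_data.index.tolist())
print(complete_data.index.tolist())
```

Inverting the mask with ~ selects the same rows as df.dropna(), so the two pieces together partition the original frame.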
Similar to metersk's answer, but using NumPy's logical_or.reduce on the underlying array:
null_data = df[np.logical_or.reduce(df.isnull().values, axis=1)]
Test
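import numpy as np
import pandas as pd

# Build a small frame with NaNs in both columns (the pattern is repeated n times)
n = 2
df = pd.DataFrame({'a': np.tile([0, 1, 2, 3, 4, np.nan], n),
                   'b': np.tile([0, 1, 2, 3, np.nan, 5], n)})
x = df[np.logical_or.reduce(df.isnull().values, axis=1)]  # NumPy reduction
y = df[df.isnull().any(axis=1)]                           # pandas any()
x.equals(y)  # True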
Source: https://stackoverflow.com/questions/30447083/python-pandas-return-only-those-rows-which-have-missing-values