I am using the pandas.DataFrame.dropna method to drop rows that contain NaN. This function returns a dataframe that excludes the dropped rows, as shown in the documentation.
I was going to leave a comment, but figured I'd write an answer as it started getting fairly complicated. Start with the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
columns=['col1', 'col2', 'col3'])
df
col1 col2 col3
0 a b NaN
1 NaN c c
2 c d a
And say we want to keep rows with Nans in the columns col2
and col3
One way to do this is the following: which is based on the answers from this post
df.loc[pd.isnull(df[['col2', 'col3']]).any(axis=1)]
col1 col2 col3
0 a b NaN
So this gives us the rows that would be dropped if we dropped rows with Nans in the columns of interest. To keep the columns we can run the same code, but use a ~
to invert the selection
df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)]
col1 col2 col3
1 NaN c c
2 c d a
this is equivalent to:
df.dropna(subset=['col2', 'col3'])
Which we can test:
df.dropna(subset=['col2', 'col3']).equals(df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)])
True
You can of course test this on your own larger dataframes but should get the same answer.