I am using the pandas.DataFrame.dropna method to drop rows that contain NaN. This method returns a DataFrame that excludes the dropped rows, as described in the documentation.
I was going to leave a comment, but figured I'd write an answer as it started getting fairly complicated. Start with the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                  columns=['col1', 'col2', 'col3'])
df

  col1 col2 col3
0    a    b  NaN
1  NaN    c    c
2    c    d    a
Say we want to find the rows with NaNs in the columns col2 and col3. One way to do this, based on the answers from this post, is the following:
df.loc[pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
0    a    b  NaN
So this gives us the rows that would be dropped if we dropped rows with NaNs in those columns. To keep the rows without NaNs instead, we can run the same code but prefix the mask with ~ to invert the selection:
df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
1  NaN    c    c
2    c    d    a
This is equivalent to:
df.dropna(subset=['col2', 'col3'])
Which we can test:
df.dropna(subset=['col2', 'col3']).equals(df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)])
True
You can of course test this on your own, larger DataFrames; you should get the same answer.
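As a side note, the same masks can be written with the DataFrame methods isna and notna (the method spelling of pd.isnull, available since pandas 0.21); notna().all(axis=1) also avoids the ~ inversion. A minimal sketch on the same sample frame:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                  columns=['col1', 'col2', 'col3'])

# Rows with a NaN in col2 or col3 -- same selection as pd.isnull(...).any(axis=1)
with_na = df.loc[df[['col2', 'col3']].isna().any(axis=1)]

# Rows with no NaN in col2 or col3 -- no inversion needed
no_na = df.loc[df[['col2', 'col3']].notna().all(axis=1)]

print(with_na)
print(no_na.equals(df.dropna(subset=['col2', 'col3'])))  # True
```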
You can do this by indexing the original DataFrame, using the unary ~ (invert) operator to select the rows that are not in the NA-free DataFrame.
na_free = df.dropna()
only_na = df[~df.index.isin(na_free.index)]
Another option is to use np.invert, the ufunc implementation of ~.
only_na = df[np.invert(df.index.isin(na_free.index))]
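Using the sample frame from the answer above, a self-contained sketch of this approach:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                  columns=['col1', 'col2', 'col3'])

na_free = df.dropna()                        # rows with no NaN in any column
only_na = df[~df.index.isin(na_free.index)]  # the rows dropna() removed

print(only_na)
```

One caveat: this relies on the index labels being unique, since index.isin matches by label; with duplicate labels, a row sharing a label with an NA-free row would be wrongly excluded from only_na.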