Pandas dropna - store dropped rows

旧时难觅i 2021-02-18 21:14

I am using the pandas.DataFrame.dropna method to drop rows that contain NaN. This function returns a dataframe that excludes the dropped rows, as shown in the documentation.

2 Answers
  •  一向 (original poster)
    2021-02-18 21:43

    I was going to leave a comment, but figured I'd write an answer as it started getting fairly complicated. Start with the following data frame:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                      columns=['col1', 'col2', 'col3'])
    df
      col1 col2 col3
    0    a    b  NaN
    1  NaN    c    c
    2    c    d    a
    

    Say we want to keep only the rows that have NaNs in the columns col2 and col3. One way to do this, based on the answers from this post, is the following:

    df.loc[pd.isnull(df[['col2', 'col3']]).any(axis=1)]
    
      col1 col2 col3
    0    a    b  NaN
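
    To actually store those rows rather than just display them, the boolean mask can be kept in a variable and reused for both halves of the split. This is a minimal sketch; the names mask, dropped and kept are my own and not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                  columns=['col1', 'col2', 'col3'])

# True for every row that has a NaN in col2 or col3
mask = df[['col2', 'col3']].isna().any(axis=1)

dropped = df.loc[mask]   # rows that dropna(subset=...) would remove
kept = df.loc[~mask]     # rows that dropna(subset=...) would keep
```

    Computing the mask once avoids evaluating the NaN check twice and keeps the two selections consistent with each other.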
    

    So this gives us the rows that would be dropped if we dropped rows with NaNs in the columns of interest. To keep the remaining rows instead, we can run the same code but use ~ to invert the selection:

    df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)]
    
      col1 col2 col3
    1  NaN    c    c
    2    c    d    a
    

    This is equivalent to:

    df.dropna(subset=['col2', 'col3'])
    

    Which we can test:

    df.dropna(subset=['col2', 'col3']).equals(df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)])
    
    True
    

    You can of course test this on your own, larger DataFrames; you should get the same answer.
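
    If you would rather call dropna first and recover the dropped rows afterwards, one possible approach is to compare index labels. The sketch below assumes the index labels are unique; split_dropna is a hypothetical helper name, not a pandas function. It also checks the result against the mask approach on a larger random frame.

```python
import numpy as np
import pandas as pd


def split_dropna(df, subset):
    """Hypothetical helper: return (kept, dropped) for df.dropna(subset=subset)."""
    kept = df.dropna(subset=subset)
    # Rows whose index label is missing from `kept` are the dropped ones.
    # This assumes the index labels of `df` are unique.
    dropped = df.loc[~df.index.isin(kept.index)]
    return kept, dropped


# Larger random frame with roughly 10% NaNs (seeded for repeatability)
rng = np.random.default_rng(0)
values = rng.random((1000, 3))
values[rng.random((1000, 3)) < 0.1] = np.nan
big = pd.DataFrame(values, columns=['col1', 'col2', 'col3'])

kept, dropped = split_dropna(big, ['col2', 'col3'])

# Compare against the boolean-mask selection used earlier
mask = big[['col2', 'col3']].isna().any(axis=1)
print(kept.equals(big.loc[~mask]), dropped.equals(big.loc[mask]))
```

    Both comparisons should print True. With duplicate index labels the isin check would no longer identify rows reliably, so the mask-based version is the safer choice in that case.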
