Question
Updated with a DataFrame that reproduces my exact issue.
I have an issue where NaN values appearing in my index lead to non-unique rows (since NaN != NaN). I need to drop all rows where NaN occurs in the index. My previous question had an example DataFrame with a single NaN row, but the original solution did not resolve my issue because it did not meet this poorly advertised requirement:
(Note that in the actual data I have thousands of such rows, including duplicate rows, since NaN != NaN and so duplicates are permissible in an index.)
(from my original post)
The Issue
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1,1,"a"],[1,2,"b"],[1,3,"c"],[1,np.nan,"x"],[1,np.nan,"x"],[1,np.nan,"x"],[2,1,"d"],[2,2,"e"],[np.nan,1,"x"],[np.nan,2,"x"],[np.nan,1,"x"]], columns=["a","b","c"]).set_index(["a","b"])
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
    NaN  x
    NaN  x
    NaN  x
2.0 1.0  d
    2.0  e
NaN 1.0  x
    2.0  x
    1.0  x
Note the duplicate rows: (1.0, NaN) and (NaN, 1.0).
Failed Solutions:
I've tried something simple like:
>>> df = df[pd.notnull(df.index)]
But this fails because notnull is not implemented for MultiIndex.
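(Not part of the original post: one workaround sketch for this limitation is to check for NaN per index level rather than on the MultiIndex itself, by converting the index to a DataFrame with Index.to_frame and building a plain boolean mask.)

```python
import numpy as np
import pandas as pd

# Build the same DataFrame as in the question, indexed on ["a", "b"].
df = pd.DataFrame(
    [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
     [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
     [2, 1, "d"], [2, 2, "e"],
     [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

# pd.notnull(df.index) is not supported for a MultiIndex, but the index
# can be expanded level-by-level into a DataFrame and checked there.
# .to_numpy() yields a plain boolean array, avoiding any index alignment.
mask = df.index.to_frame().notna().all(axis=1).to_numpy()
clean = df[mask]
print(clean)
```

This works even though the original index is non-unique, because the mask is applied positionally.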
Also one of the early answers suggested:
>>> df = df.reindex(df.index.dropna())
However this failed with the error:
Exception: cannot handle a non-unique multi-index!
Desired Output:
>>> df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e
(all NaN index rows are dropped, eliminating any non-unique rows)
Answer 1:
Option 1: reset_index, dropna, and set_index once more.
c = df.index.names
df = df.reset_index().dropna().set_index(c)
df
         c
a   b
1.0 1.0  a
    2.0  b
    3.0  c
2.0 1.0  d
    2.0  e
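(Added for completeness, not from the original answer: Option 1 end-to-end on the question's DataFrame. reset_index moves the index levels into ordinary columns, dropna then removes every row with NaN in any column, and set_index restores the original levels.)

```python
import numpy as np
import pandas as pd

# Reproduce the question's DataFrame, indexed on ["a", "b"].
df = pd.DataFrame(
    [[1, 1, "a"], [1, 2, "b"], [1, 3, "c"],
     [1, np.nan, "x"], [1, np.nan, "x"], [1, np.nan, "x"],
     [2, 1, "d"], [2, 2, "e"],
     [np.nan, 1, "x"], [np.nan, 2, "x"], [np.nan, 1, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

# Option 1: move the index levels into columns, drop rows containing NaN,
# then restore the original index levels by name.
c = df.index.names
result = df.reset_index().dropna().set_index(c)
print(result)
```

Note this also drops rows with NaN in the data columns, not just the index; here column "c" has no NaN, so only the NaN-index rows are removed.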
If your MultiIndex is unique, you can use...
Option 2: df.index.dropna and df.reindex
df = df.reindex(df.index.dropna())
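(A sketch not in the original answer: because reindex raises on a non-unique MultiIndex, as the question showed, this demonstrates Option 2 on a de-duplicated variant of the data. MultiIndex.dropna removes entries with NaN in any level, and reindex then keeps only those rows.)

```python
import numpy as np
import pandas as pd

# A unique-index variant of the question's data (Option 2's precondition).
df = pd.DataFrame(
    [[1, 1, "a"], [1, 2, "b"], [1, np.nan, "x"], [np.nan, 1, "x"]],
    columns=["a", "b", "c"],
).set_index(["a", "b"])

# df.index.dropna() drops index entries with NaN in any level;
# reindex keeps only those rows (valid because the index is unique).
result = df.reindex(df.index.dropna())
print(result)
```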
Source: https://stackoverflow.com/questions/46163674/removing-rows-with-nan-in-multiindex-with-duplicates