I\'m trying to run what I think is simple code to eliminate any columns with all NaNs, but can\'t get this to work (axis = 1
works just fine when eliminating ro
You can use dropna with axis=1
and thresh=1
:
In[19]:
df.dropna(axis=1, thresh=1)
Out[19]:
a b c
0 1.0 4.0 NaN
1 2.0 NaN 8.0
2 NaN 6.0 9.0
3 NaN NaN NaN
This will drop any column which doesn't have at least 1 non-NaN value which will mean any column with all NaN
will get dropped
The reason what you tried failed is because the boolean mask:
In[20]:
df.notnull().any(axis = 0)
Out[20]:
a True
b True
c True
d False
dtype: bool
cannot be aligned on the index which is what is used by default, as this produces a boolean mask on the columns
You need loc, because filter by columns:
print (df.notnull().any(axis = 0))
a True
b True
c True
d False
dtype: bool
df = df.loc[:, df.notnull().any(axis = 0)]
print (df)
a b c
0 1.0 4.0 NaN
1 2.0 NaN 8.0
2 NaN 6.0 9.0
3 NaN NaN NaN
Or filter columns and then select by []
:
print (df.columns[df.notnull().any(axis = 0)])
Index(['a', 'b', 'c'], dtype='object')
df = df[df.columns[df.notnull().any(axis = 0)]]
print (df)
a b c
0 1.0 4.0 NaN
1 2.0 NaN 8.0
2 NaN 6.0 9.0
3 NaN NaN NaN
Or dropna with parameter how='all'
for remove all columns filled by NaN
s only:
print (df.dropna(axis=1, how='all'))
a b c
0 1.0 4.0 NaN
1 2.0 NaN 8.0
2 NaN 6.0 9.0
3 NaN NaN NaN