pandas dataframe drop columns by number of nan


I have a dataframe with some columns containing NaN. I'd like to drop those columns that have a certain number of NaN. For example, in the following code, I'd like to drop any co

5 answers

暖寄归人

2021-01-04 02:09
`dropna` has a `thresh` parameter: the minimum number of non-NaN values a column must have to be kept. Pass the length of your df minus the number of NaN values you are willing to tolerate:
```
In [13]:

dff.dropna(thresh=len(dff) - 2, axis=1)
Out[13]:
          A         B
0  0.517199 -0.806304
1 -0.643074  0.229602
2  0.656728  0.535155
3       NaN -0.162345
4 -0.309663 -0.783539
5  1.244725 -0.274514
6 -0.254232       NaN
7 -1.242430  0.228660
8 -0.311874 -0.448886
9 -0.984453 -0.755416
```
So the above drops any column with fewer than `len(dff) - 2` non-NaN values, where `len(dff)` is the number of rows. In other words, it drops any column with more than 2 NaN.
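
To make the mapping between `thresh` and a NaN count concrete, here is a minimal sketch on a small made-up frame. The data and the helper name `drop_columns_with_nan` are assumptions for illustration only, not part of the question.

```python
import numpy as np
import pandas as pd

def drop_columns_with_nan(df, max_nan):
    """Keep only columns with at most `max_nan` NaN values (hypothetical helper)."""
    # thresh = minimum number of non-NA values a column needs in order to be kept
    return df.dropna(axis=1, thresh=len(df) - max_nan)

# Made-up example data: A has 1 NaN, B has 2, C has 3
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0, 5.0],
    "B": [np.nan, np.nan, 3.0, 4.0, 5.0],
    "C": [np.nan, np.nan, np.nan, 4.0, 5.0],
})

kept_two = list(drop_columns_with_nan(df, 2).columns)  # A and B survive
kept_one = list(drop_columns_with_nan(df, 1).columns)  # only A survives
```

Note that the limit is inclusive: with `max_nan=2`, a column holding exactly two NaN is kept. To drop columns with two or more NaN, pass `max_nan=1`.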
旧巷少年郎

2021-01-04 02:16
Say you have to drop columns having more than 70% null values.
```
# axis must be passed by keyword in recent pandas; isnull().mean() is the NaN fraction per column
data.drop(data.loc[:, 100 * data.isnull().mean() > 70].columns, axis=1)
```
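
The same idea can be spelled out step by step. This is only a sketch on made-up data; the frame contents and the names `nan_fraction` and `too_sparse` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up example data: the NaN fraction is 0.25 for A, 0.75 for B, 1.0 for C
data = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, np.nan],
    "B": [np.nan, np.nan, np.nan, 4.0],
    "C": [np.nan] * 4,
})

nan_fraction = data.isnull().mean()                   # per-column share of NaN, between 0 and 1
too_sparse = nan_fraction[nan_fraction > 0.7].index   # labels of columns above the 70% cutoff
result = data.drop(columns=too_sparse)                # returns a new frame; `data` is unchanged
```

Here `result` keeps only column `A`. Because `drop` returns a new frame, assign the result back if you want to replace `data`.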

南笙

2021-01-04 02:20

You can use a conditional list comprehension:

```
>>> dff[[c for c in dff if dff[c].isnull().sum() < 2]]
          A         B
0 -0.819004  0.919190
1  0.922164  0.088111
2  0.188150  0.847099
3       NaN -0.053563
4  1.327250 -0.376076
5  3.724980  0.292757
6 -0.319342       NaN
7 -1.051529  0.389843
8 -0.805542 -0.018347
9 -0.816261 -1.627026
```
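
An equivalent selection can be written with a boolean mask instead of a comprehension. The sketch below uses made-up data (the values are assumptions, not the frame from the question) and keeps columns with fewer than 2 NaN.

```python
import numpy as np
import pandas as pd

# Made-up example data: one NaN in A and in B, three in C
dff = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0],
    "B": [1.0, np.nan, 3.0, 4.0],
    "C": [np.nan, np.nan, np.nan, 4.0],
})

nan_counts = dff.isnull().sum()         # Series of NaN counts, indexed by column name
filtered = dff.loc[:, nan_counts < 2]   # boolean mask picks the columns to keep
```

The mask is computed once for all columns, which avoids calling `isnull()` separately on each column.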

野趣味

2021-01-04 02:27

Here is a possible solution:

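```
s = dff.isnull().sum()  # count the number of NaN in each column
print(s)
# A    1
# B    1
# C    3
# dtype: int64

# Iterate over a copy of the column labels, since columns are deleted inside the loop
for col in list(dff.columns):
    if s[col] >= 2:
        del dff[col]
```

Or, without the precomputed counts:

```
for c in list(dff.columns):
    if dff[c].isnull().sum() >= 2:
        dff.drop(c, axis=1, inplace=True)
```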

迷失自我

2021-01-04 02:31
I recommend the `drop` method. This is an alternative solution:
```
# As written, this drops columns with fewer than 2 non-NaN values
dff.drop(dff.columns[len(dff) - dff.isnull().sum() < 2], axis=1)
```