pandas dataframe drop columns by number of nan

忘了有多久 2021-01-04 02:08

I have a dataframe in which some columns contain NaN. I'd like to drop the columns that contain a certain number of NaN. For example, in the following code, I'd like to drop any co

5 Answers
  • 2021-01-04 02:09

    dropna has a thresh parameter, the minimum number of non-NaN values a column must contain to be kept. Pass the length of your df minus the number of NaN values you are willing to tolerate:

    In [13]:
    
    dff.dropna(thresh=len(dff) - 2, axis=1)
    Out[13]:
              A         B
    0  0.517199 -0.806304
    1 -0.643074  0.229602
    2  0.656728  0.535155
    3       NaN -0.162345
    4 -0.309663 -0.783539
    5  1.244725 -0.274514
    6 -0.254232       NaN
    7 -1.242430  0.228660
    8 -0.311874 -0.448886
    9 -0.984453 -0.755416
    

    The above drops any column with fewer than len(dff) - 2 non-NaN values (len(dff) is the number of rows), that is, any column with more than 2 NaN.
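
To make the thresh arithmetic concrete, here is a minimal sketch. The data, the seed, and the name max_nan are assumptions made for illustration; only dropna and its thresh and axis parameters come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 rows; A and B have one NaN each, C has three.
rng = np.random.default_rng(0)
dff = pd.DataFrame(rng.standard_normal((10, 3)), columns=list("ABC"))
dff.loc[3, "A"] = np.nan
dff.loc[6, "B"] = np.nan
dff.loc[[0, 4, 8], "C"] = np.nan

max_nan = 2  # hypothetical limit: keep columns with at most this many NaN

# thresh is the minimum number of non-NaN values a column needs to be kept,
# so a column survives only if it has at least len(dff) - max_nan values.
kept = dff.dropna(thresh=len(dff) - max_nan, axis=1)

print(dff.isna().sum())    # NaN count per column
print(list(kept.columns))  # columns that survived
```

Note the boundary: with thresh=len(dff) - 2, a column with exactly 2 NaN is kept. If the goal is to drop columns with 2 or more NaN, thresh=len(dff) - 1 would be the matching value.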

  • 2021-01-04 02:16

    Say you have to drop columns having more than 70% null values.

    # isnull().mean() gives the fraction of nulls in each column
    data.drop(columns=data.columns[data.isnull().mean() * 100 > 70])
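
A small self-contained sketch of the percentage-based rule may help. The frame data, its column names, and the cutoff name max_missing are hypothetical; the approach assumes the mean of a boolean mask is an acceptable way to get the missing fraction.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'x' has no NaN, 'y' is 50% NaN, 'z' is 80% NaN.
data = pd.DataFrame({
    "x": np.arange(10, dtype=float),
    "y": [1.0, np.nan] * 5,
    "z": [1.0, 2.0] + [np.nan] * 8,
})

max_missing = 0.7  # hypothetical cutoff, as a fraction rather than a percentage

# The mean of a boolean mask is the fraction of True values per column.
missing_fraction = data.isna().mean()

# Select the labels above the cutoff and drop them by keyword.
too_sparse = missing_fraction[missing_fraction > max_missing].index
result = data.drop(columns=too_sparse)

print(missing_fraction)
print(list(result.columns))
```

drop returns a new frame here, so data itself keeps all three columns.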
    
  • 2021-01-04 02:20

    You can use a conditional list comprehension:

    >>> dff[[c for c in dff if dff[c].isnull().sum() < 2]]
              A         B
    0 -0.819004  0.919190
    1  0.922164  0.088111
    2  0.188150  0.847099
    3       NaN -0.053563
    4  1.327250 -0.376076
    5  3.724980  0.292757
    6 -0.319342       NaN
    7 -1.051529  0.389843
    8 -0.805542 -0.018347
    9 -0.816261 -1.627026
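
The same filter can probably be written as a single boolean mask over the columns. The sketch below uses made-up data and the hypothetical names by_comprehension and by_mask to check that both forms select the same columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaN counts of 1, 1 and 3.
dff = pd.DataFrame({
    "A": [0.5, np.nan, 0.7, 0.1, 0.3],
    "B": [1.0, 2.0, np.nan, 4.0, 5.0],
    "C": [np.nan, np.nan, 1.0, np.nan, 2.0],
})

# Column-by-column filter, written as a list comprehension.
by_comprehension = dff[[c for c in dff if dff[c].isnull().sum() < 2]]

# Same filter expressed as one boolean mask passed to .loc.
by_mask = dff.loc[:, dff.isnull().sum() < 2]

# DataFrame.equals treats NaN in matching positions as equal.
print(by_mask.equals(by_comprehension))
```

The mask form computes all NaN counts in one pass, which tends to matter only for frames with many columns.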
    
  • 2021-01-04 02:27

    Here is a possible solution:

    s = dff.isnull().sum()  # count the number of NaN in each column
    print(s)
    # A    1
    # B    1
    # C    3
    # dtype: int64

    for col in list(dff.columns):  # iterate over a copy of the labels while deleting
        if s[col] >= 2:
            del dff[col]
    
    

    Or

    for c in list(dff.columns):  # iterate over a copy of the labels while dropping
        if dff[c].isnull().sum() >= 2:
            dff.drop(columns=c, inplace=True)
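
Both loops modify dff in place. If the original frame should stay untouched, one option is a small helper that returns a new frame. The function name drop_columns_with_nan and its min_nan parameter are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def drop_columns_with_nan(df: pd.DataFrame, min_nan: int) -> pd.DataFrame:
    """Return a new frame without the columns holding min_nan or more NaN."""
    nan_counts = df.isnull().sum()
    to_drop = nan_counts[nan_counts >= min_nan].index
    return df.drop(columns=to_drop)


# Hypothetical data: A and B have one NaN each, C has three.
dff = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0],
    "B": [np.nan, 2.0, 3.0, 4.0],
    "C": [np.nan, np.nan, np.nan, 4.0],
})

cleaned = drop_columns_with_nan(dff, min_nan=2)
print(list(cleaned.columns))  # columns that survived
print(list(dff.columns))      # the input frame still has all its columns
```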
    
  • 2021-01-04 02:31

    I recommend the drop method. This is an alternative solution:

    # pass the labels of the columns that hold 2 or more NaN
    dff.drop(columns=dff.columns[dff.isnull().sum() >= 2])
    