Winsorizing data by column in pandas with NaN

耶瑟儿~ 2020-12-21 08:23

I'd like to winsorize several columns of data in a pandas DataFrame. Each column has some NaN values, which affect the winsorization, so they need to be removed. The only way I …
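To illustrate why NaN matters here: NumPy's np.percentile propagates NaN, so a clipping bound computed from a column that contains NaN is itself NaN. By contrast, np.nanpercentile and pandas' Series.quantile skip missing values by default. This is a minimal sketch on a small made-up array, not the question's data:

```python
import numpy as np
import pandas as pd

# A small made-up column with one missing value
values = np.array([1.0, 2.0, np.nan, 4.0, 100.0])

# NumPy's percentile propagates NaN: the bound itself becomes NaN
print(np.percentile(values, 99))  # nan

# np.nanpercentile ignores NaN and uses only the observed values
print(np.nanpercentile(values, 99))

# pandas' Series.quantile also skips NaN by default (same linear interpolation)
print(pd.Series(values).quantile(0.99))
```

A NaN bound is what makes naive winsorization fail; a NaN-skipping quantile avoids having to drop data first.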

1 Answer
  • 2020-12-21 09:10

    As often happens, simply creating the MWE (minimal working example) helped clarify things. I need to use clip() in combination with quantile(), as below:

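    # per-column 1st/99th percentiles (quantile() skips NaN); axis=1 aligns them with the columns
    df2 = df.clip(lower=df.quantile(0.01), upper=df.quantile(0.99), axis=1)
    # inspect the winsorized distribution of each column
    df2.quantile([0, 0.01, 0.25, 0.5, 0.75, 0.99, 1])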
    

    Output:

                   one       two      three          four
    0.00  9.862626e-07  0.000974   0.975807   1003.814520
    0.01  9.862666e-07  0.000974   0.975816   1003.820092
    0.25  2.485043e-05  0.024975  25.200378  25099.994780
    0.50  4.975859e-05  0.049810  50.290946  50374.548980
    0.75  7.486737e-05  0.074842  74.794537  75217.343920
    0.99  9.897462e-05  0.098986  98.978245  98991.436977
    1.00  9.897463e-05  0.098986  98.978263  98991.438985
    
    In [384]: df2.count()
    Out[384]:
    one       90700
    two       91600
    three     63500
    four     100000
    dtype: int64
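The clip/quantile approach can be wrapped in a small helper and checked on synthetic data. This is a sketch under stated assumptions: the function name winsorize_columns, its default limits, and the random test data are all hypothetical, not from the question. Because clip() leaves NaN entries untouched, the non-missing counts should be identical before and after, and every column should sit inside its own quantile bounds.

```python
import numpy as np
import pandas as pd


def winsorize_columns(df: pd.DataFrame, lower: float = 0.01, upper: float = 0.99) -> pd.DataFrame:
    """Hypothetical helper: clip each column to its own [lower, upper] quantiles.

    DataFrame.quantile skips NaN by default, so each column's bounds come only
    from its observed values; clip() leaves NaN entries as NaN.
    """
    lo = df.quantile(lower)  # Series indexed by column name
    hi = df.quantile(upper)
    # axis=1 aligns the per-column bound Series with the DataFrame's columns
    return df.clip(lower=lo, upper=hi, axis=1)


# Made-up test data: two columns on very different scales, with some NaN
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.uniform(0, 1, 1000),
    "b": rng.uniform(0, 1000, 1000),
})
df.loc[rng.choice(1000, 100, replace=False), "a"] = np.nan
df.loc[rng.choice(1000, 300, replace=False), "b"] = np.nan

df2 = winsorize_columns(df)

print(df.count().equals(df2.count()))          # non-missing counts unchanged
print((df2.min() >= df.quantile(0.01)).all())  # nothing below the lower bound
print((df2.max() <= df.quantile(0.99)).all())  # nothing above the upper bound
```

Computing the bounds per column (rather than on a NaN-free subset of rows) is what lets every column keep all of its observed values.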
    

    The numbers differ from those in the question because I have kept all of the non-missing (non-NaN) data in each column.
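One way to see why keeping every column's observed values changes the numbers is to compare per-column statistics with those computed after dropna(), which removes every row containing any NaN. This sketch assumes the question's earlier (truncated) approach dropped whole rows, which the question does not confirm; the data are random and made up.

```python
import numpy as np
import pandas as pd

# Made-up data: missing values fall in different rows in each column
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(0, 100, size=(1000, 2)), columns=["x", "y"])
df.loc[rng.choice(1000, 200, replace=False), "x"] = np.nan
df.loc[rng.choice(1000, 400, replace=False), "y"] = np.nan

# Column-wise: each column keeps all of its own observed values
per_column_n = df.count()
per_column_q = df.quantile(0.99)

# Row-wise removal: a row is dropped if any column is NaN,
# which also discards observed values in the other column
complete = df.dropna()
row_wise_n = complete.count()
row_wise_q = complete.quantile(0.99)

print(per_column_n.to_dict(), row_wise_n.to_dict())
print(per_column_q.to_dict(), row_wise_q.to_dict())
```

With row-wise removal both columns end up with the same, smaller sample, so their quantiles (and any winsorization bounds built from them) shift.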
