I\'d like to winsorize several columns of data in a pandas Data Frame. Each column has some NaN, which affects the winsorization, so they need to be removed. The only way I
As often happens, simply creating the MWE helped clarify. I need to use clip() in combination with quantile() as below:
df2 = df.clip(lower=df.quantile(0.01), upper=df.quantile(0.99), axis=1)
df2.quantile([0, 0.01, 0.25, 0.5, 0.75, 0.99, 1])
Output:
one two three four
0.00 9.862626e-07 0.000974 0.975807 1003.814520
0.01 9.862666e-07 0.000974 0.975816 1003.820092
0.25 2.485043e-05 0.024975 25.200378 25099.994780
0.50 4.975859e-05 0.049810 50.290946 50374.548980
0.75 7.486737e-05 0.074842 74.794537 75217.343920
0.99 9.897462e-05 0.098986 98.978245 98991.436977
1.00 9.897463e-05 0.098986 98.978263 98991.438985
In [384]: df2.count()
Out[384]:
one 90700
two 91600
three 63500
four 100000
dtype: int64
The numbers are different from above because I have maintained all of the data in each column that is not missing (NaN).