Remove Outliers in Pandas DataFrame using Percentiles

自闭症患者 2021-01-30 09:28

I have a DataFrame df with 40 columns and many records.

df:

User_id | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 |...| Col39

For e

4 Answers
  •  闹比i (original poster)
    2021-01-30 10:12

    What you are describing is similar to the process of winsorizing, which clips values (for example, at the 5th and 95th percentiles) instead of eliminating them completely.

    Here's an example:

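    import pandas as pd
    from scipy.stats import mstats
    # In a Jupyter notebook, run `%matplotlib inline` first to display plots inline.

    # Sample data: the integers 0 through 29
    test_data = pd.Series(range(30))
    test_data.plot()

    # Winsorize: replace the lowest 5% and highest 5% of values
    # with the nearest remaining value (roughly the 5th and 95th percentiles)
    transformed_test_data = pd.Series(mstats.winsorize(test_data, limits=[0.05, 0.05]))
    transformed_test_data.plot()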
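
    The same idea can be applied to several DataFrame columns at once using only pandas. The following is a minimal sketch, not a definitive implementation: the helper names `clip_to_percentiles` and `drop_outside_percentiles` are hypothetical, the sample data is made up for illustration, and the 5th/95th percentile cutoffs are assumed defaults. The first helper clips values (winsorize-style); the second drops rows that fall outside the range in any of the chosen columns.

```python
import pandas as pd


def clip_to_percentiles(df, columns, lower=0.05, upper=0.95):
    """Clip each listed column to its own [lower, upper] quantile range."""
    lo = df[columns].quantile(lower)   # Series indexed by column name
    hi = df[columns].quantile(upper)
    out = df.copy()
    # axis=1 aligns the per-column thresholds with the column labels
    out[columns] = df[columns].clip(lower=lo, upper=hi, axis=1)
    return out


def drop_outside_percentiles(df, columns, lower=0.05, upper=0.95):
    """Keep only rows whose values lie within the range in every listed column."""
    lo = df[columns].quantile(lower)
    hi = df[columns].quantile(upper)
    within = df[columns].ge(lo, axis=1) & df[columns].le(hi, axis=1)
    # Note: rows with NaN in any listed column are dropped by this comparison
    return df[within.all(axis=1)]


# Illustrative data only (not from the original question)
df = pd.DataFrame({
    "User_id": range(1, 101),
    "Col1": range(100),
    "Col2": range(100, 0, -1),
})

clipped = clip_to_percentiles(df, ["Col1", "Col2"])
filtered = drop_outside_percentiles(df, ["Col1", "Col2"])
```

    Clipping keeps every row and only changes the extreme values, while filtering shrinks the DataFrame. Which one fits depends on whether the outlying records should remain in later analysis.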
    
