Remove Outliers in Pandas DataFrame using Percentiles

自闭症患者 2021-01-30 09:28

I have a DataFrame df with 40 columns and many records.

df:

User_id | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 |...| Col39

For e

4 Answers
  •  后悔当初
    2021-01-30 10:09

    Use this code and don't waste your time:

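    # Per-column quartiles and interquartile range (IQR)
    Q1 = df.quantile(0.25)
    Q3 = df.quantile(0.75)
    IQR = Q3 - Q1

    # Keep only rows with no value outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
    df = df[~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]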
    

    In case you want to check only specific columns:

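    cols = ['col_1', 'col_2']  # one or more column names

    # Quartiles and IQR for the selected columns only
    Q1 = df[cols].quantile(0.25)
    Q3 = df[cols].quantile(0.75)
    IQR = Q3 - Q1

    # Drop rows where any selected column falls outside the IQR fences
    df = df[~((df[cols] < (Q1 - 1.5 * IQR)) | (df[cols] > (Q3 + 1.5 * IQR))).any(axis=1)]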
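
The IQR filter above can be wrapped in a small helper and tried on toy data. This is only a sketch: the function name `remove_iqr_outliers`, the parameter `k`, the default of using all numeric columns, and the sample values are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def remove_iqr_outliers(df, cols=None, k=1.5):
    """Drop rows where any selected column lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if cols is None:
        # Assumption: default to every numeric column when none are given
        cols = df.select_dtypes(include="number").columns
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    # Boolean frame: True where a value lies beyond either fence
    is_outlier = (df[cols] < (q1 - k * iqr)) | (df[cols] > (q3 + k * iqr))
    # Keep rows that have no outlier in any selected column
    return df[~is_outlier.any(axis=1)]

# Made-up sample data: the last row has an extreme value in Col1
sample = pd.DataFrame({
    "User_id": ["a", "b", "c", "d", "e", "f"],
    "Col1": [10, 11, 12, 13, 14, 1000],
    "Col2": [1.0, 1.1, 0.9, 1.2, 1.0, 1.1],
})
cleaned = remove_iqr_outliers(sample, cols=["Col1", "Col2"])
print(cleaned)
```

Passing `cols` explicitly keeps identifier columns such as `User_id` out of the quantile computation, so an ID value is never treated as an outlier.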
    
