Remove Outliers in Pandas DataFrame using Percentiles

自闭症患者 2021-01-30 09:28

I have a DataFrame df with 40 columns and many records.

df:

User_id | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 |...| Col39

For e

4 Answers
  •  再見小時候
    2021-01-30 10:31

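    Use an inner join on the index: filter each column to its 5th to 95th percentile range, then keep only the rows that survive every filter. Something like this should work: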

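    import numpy as np

    cols = df.columns.tolist()
    cols.remove('User_id')  # exclude the ID column (capitalized 'User_id' in the question)

    # Keep rows strictly inside the 5th-95th percentile range of the first column
    P = np.percentile(df[cols[0]], [5, 95])
    new_df = df[(df[cols[0]] > P[0]) & (df[cols[0]] < P[1])]
    for col in cols[1:]:
        P = np.percentile(df[col], [5, 95])
        # Inner join on the index keeps only rows that also pass this column's filter;
        # [[]] selects zero columns so the join does not clash on overlapping column names
        new_df = new_df.join(df[(df[col] > P[0]) & (df[col] < P[1])][[]], how='inner')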
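Chaining inner joins on the index keeps exactly the rows that pass every column's filter. That is the same as AND-ing the per-column boolean masks, which pandas can do in one vectorized step.

Below is a minimal sketch of that idea. It assumes the ID column is named 'User_id' as in the question and that all other columns are numeric. The helper name remove_outliers, its parameters, and the toy data are hypothetical and only for illustration. The sketch swaps np.percentile for DataFrame.quantile, whose default linear interpolation matches NumPy's default.

```python
import pandas as pd


def remove_outliers(df: pd.DataFrame, id_col: str = "User_id",
                    lower: float = 0.05, upper: float = 0.95) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose every non-ID column lies strictly
    between that column's lower and upper quantile, with the bounds computed
    once on the full DataFrame. Assumes all non-ID columns are numeric."""
    cols = df.columns.drop(id_col)
    # Two rows (lower quantile, upper quantile), one column per feature
    bounds = df[cols].quantile([lower, upper])
    low, high = bounds.iloc[0], bounds.iloc[1]
    # Comparing a DataFrame with a Series aligns on column labels;
    # .all(axis=1) ANDs the per-column masks, like the chained inner joins
    mask = (df[cols].gt(low) & df[cols].lt(high)).all(axis=1)
    return df[mask]


# Hypothetical toy data using the question's column naming
df = pd.DataFrame({
    "User_id": range(1, 11),
    "Col1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    "Col2": [50, 10, 20, 30, 40, 60, 70, 80, 90, 100],
})
new_df = remove_outliers(df)
print(new_df["User_id"].tolist())  # -> [3, 4, 5, 6, 7, 8, 9]
```

Because the bounds come from the full DataFrame rather than from already-filtered rows, this keeps the same rows the join approach is meant to keep. Computing the percentiles after each filtering step would shift them.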
    
