Remove Outliers in Pandas DataFrame using Percentiles

 4  1874

自闭症患者 2021-01-30 09:28

I have a DataFrame df with 40 columns and many records.

df:

User_id | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 |...| Col39

For e

4 Answers

再見小時候 (OP)

2021-01-30 10:31

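Use an inner join. For each column, keep the rows that lie strictly between its 5th and 95th percentiles, then inner-join those rows on the index so that a row survives only if it is within range in every column. Something like this should work: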

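import numpy as np
import pandas as pd

cols = df.columns.tolist()
cols.remove('User_id')  # remove User_id from the list of columns

new_df = df
for col in cols:
    # bounds of the 5th to 95th percentile range for this column
    P = np.percentile(df[col], [5, 95])
    mask = (df[col] > P[0]) & (df[col] < P[1])
    # inner join on the index with a column-less frame of the passing rows:
    # keeps only rows in range for every column, without duplicating columns
    new_df = new_df.join(df.loc[mask, []], how='inner')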
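As an alternative to the join loop (not the code above), the same filter can probably be written as one boolean mask. This is a sketch under stated assumptions: `filter_percentiles` is a hypothetical helper name; it uses `DataFrame.quantile`, whose default linear interpolation matches the default of `np.percentile`; and it assumes a unique index and no missing values (`quantile` skips NaN, whereas `np.percentile` would return NaN bounds). Under those assumptions it should keep the same rows as the chain of inner joins.

```python
import pandas as pd


def filter_percentiles(df: pd.DataFrame, id_col: str,
                       lower: float = 5, upper: float = 95) -> pd.DataFrame:
    """Hypothetical helper: keep the rows whose values lie strictly between
    the lower and upper percentiles in every column except id_col.
    Assumes a unique index and no missing values."""
    cols = [c for c in df.columns if c != id_col]
    # One row of bounds per requested quantile, one column per data column.
    bounds = df[cols].quantile([lower / 100, upper / 100])
    lo = bounds.iloc[0]  # lower bound of each column
    hi = bounds.iloc[1]  # upper bound of each column
    # Comparing a DataFrame with a Series aligns on column labels,
    # so each column is tested against its own bounds.
    in_range = (df[cols] > lo) & (df[cols] < hi)
    # A row is kept only if it is in range in all columns,
    # the same condition the chain of inner joins enforces.
    return df[in_range.all(axis=1)]


if __name__ == "__main__":
    # Small made-up frame with the question's layout (User_id plus numeric columns).
    demo = pd.DataFrame({
        "User_id": range(1, 11),
        "Col1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
        "Col2": [5, 2, 3, 4, -50, 6, 7, 8, 9, 10],
    })
    # Keeps the rows with User_id 2, 3, 4, 6, 7, 8, 9.
    print(filter_percentiles(demo, "User_id"))
```

Here `in_range.all(axis=1)` plays the role of the successive inner joins: each join can only drop rows, and a row is dropped exactly when some column is out of range.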
