Row-wise outlier detection in Python

Submitted by …衆ロ難τιáo~ on 2021-02-07 10:19:55

Question


I have the CSV data as follows:

A_ID    P_ID    1429982904  1430370002  1430974801  1431579602  1432184403  1432789202  1435208402  1435308653
11Jgipc qjMakF  364             365             363             363             364             364             364             367
11Jgipc qxL8FJ  18              18              18              18              18              18              18              18
11Jgipc r0Bpnt  40              40              41              41              41              42              42              42
11Jgipc roLk4N  140             140             143             143             146             147             147             149
11Jgipc tOudhM  12              13              13              13              13              13              14              14
11Jgipc u-x6o8  678             678             688             688             689             690             692             695
11Jgipc u5HHmV  1778            1785            1811            1811            1819            1826            1834            1836
11Jgipc ufrVoP  67              67              67              67              67              67              67              67
11Jgipc vRqMK4  36              36              34              34              34              34              34              34
11Jgipc wbdj-C  31              33              35              35              36              36              36              37
11Jgipc xtRiw3  6               6               6               6               6               6               6               6

I want to find the outliers in each row.

About the data:

The column headers other than A_ID and P_ID are timestamps. Each row holds a set of values for one (A_ID, P_ID) pair, so each row can be treated as a time series.
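To make the "each row is a time series" reading concrete, here is a minimal sketch of loading such data with pandas. It assumes the file is whitespace-delimited (as the sample appears to be) and that the headers are Unix epoch seconds; the question only says they are timestamps. The `raw` string is an excerpt of the sample standing in for the real file.

```python
import io

import pandas as pd

# Excerpt of the sample data (two rows, first four timestamp columns).
raw = """A_ID P_ID 1429982904 1430370002 1430974801 1431579602
11Jgipc qjMakF 364 365 363 363
11Jgipc r0Bpnt 40 40 41 41
"""

# Split on runs of whitespace, since the sample is not comma-separated.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Index by the (A_ID, P_ID) pair so only the timestamp columns remain.
df = df.set_index(["A_ID", "P_ID"])

# Assumption: the headers are Unix epoch seconds.
df.columns = pd.to_datetime([int(c) for c in df.columns], unit="s")

# One row is now a Series indexed by time.
series = df.loc[("11Jgipc", "qjMakF")]
print(series)
```

With the real file, `io.StringIO(raw)` would be replaced by the file path.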

Expected Output:

For each row, probably the tuple(s) in the form [(A_ID, P_ID): (Value, ColumnHeader), ...]
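One way to produce output of that shape, sketched here under assumptions: given a boolean mask marking flagged cells, collect `(value, column_header)` tuples per `(A_ID, P_ID)` key. The helper name `collect_outliers` and the placeholder mask are hypothetical; the mask stands in for whatever detector is eventually chosen.

```python
import pandas as pd

# Small frame indexed by (A_ID, P_ID); columns are three of the timestamp headers.
df = pd.DataFrame(
    {"1429982904": [364, 18], "1430370002": [365, 18], "1435308653": [367, 18]},
    index=pd.MultiIndex.from_tuples(
        [("11Jgipc", "qjMakF"), ("11Jgipc", "qxL8FJ")], names=["A_ID", "P_ID"]
    ),
)

# Placeholder mask (not a real detector): True marks a flagged cell.
mask = df > 366


def collect_outliers(values, flags):
    """Map each (A_ID, P_ID) key to a list of (value, column_header) tuples."""
    result = {}
    for (key, row), (_, row_flags) in zip(values.iterrows(), flags.iterrows()):
        hits = [(int(row[col]), col) for col in values.columns if row_flags[col]]
        if hits:  # skip rows with nothing flagged
            result[key] = hits
    return result


outliers = collect_outliers(df, mask)
print(outliers)
```

Rows with no flagged cells are omitted from the result; returning an empty list for them would be an equally reasonable choice.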

What I have tried:

I have tried the suggestions given in this solution.

  • The simplest approach, computing the mean and standard deviation and then flagging values more than K standard deviations above the mean, did not work because the appropriate value of K differs for each row.
  • The moving average method also seems inappropriate here, because the constraint would differ for every row.
  • Setting such constraints manually is not an option, because the number of rows is large, and so is the number of files I want to check for outliers.
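For reference, the mean-and-standard-deviation rule from the first bullet can be sketched row-wise in pandas as follows. This is only an illustration of that rule, not a recommended solution: the single global `K = 2.0` is a hypothetical value, and having to pick one K for all rows is exactly the limitation described above.

```python
import pandas as pd

timestamps = [
    "1429982904", "1430370002", "1430974801", "1431579602",
    "1432184403", "1432789202", "1435208402", "1435308653",
]

# Two rows from the sample data, indexed by (A_ID, P_ID).
df = pd.DataFrame(
    [
        [364, 365, 363, 363, 364, 364, 364, 367],
        [18, 18, 18, 18, 18, 18, 18, 18],
    ],
    index=pd.MultiIndex.from_tuples(
        [("11Jgipc", "qjMakF"), ("11Jgipc", "qxL8FJ")], names=["A_ID", "P_ID"]
    ),
    columns=timestamps,
)

K = 2.0  # hypothetical threshold, applied to every row alike

row_mean = df.mean(axis=1)  # one mean per row
row_std = df.std(axis=1)    # sample standard deviation per row (ddof=1)

# Flag cells lying more than K standard deviations above their own row's mean.
upper = row_mean + K * row_std
flags = df.gt(upper, axis=0)

print(flags)
```

A constant row has zero standard deviation, so its threshold equals its mean and the strict comparison flags nothing there.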

Options that might work better, as far as I understand:

  • Using scikit-learn, as in "Outlier detection with several methods". If so, how can I do it?

  • Any other specific package, maybe pandas? If so, how can I do it?

Any example, help or suggestion would be much appreciated.

Source: https://stackoverflow.com/questions/33938398/row-wise-outlier-detection-in-python
