Question
I have the following CSV data:
A_ID P_ID 1429982904 1430370002 1430974801 1431579602 1432184403 1432789202 1435208402 1435308653
11Jgipc qjMakF 364 365 363 363 364 364 364 367
11Jgipc qxL8FJ 18 18 18 18 18 18 18 18
11Jgipc r0Bpnt 40 40 41 41 41 42 42 42
11Jgipc roLk4N 140 140 143 143 146 147 147 149
11Jgipc tOudhM 12 13 13 13 13 13 14 14
11Jgipc u-x6o8 678 678 688 688 689 690 692 695
11Jgipc u5HHmV 1778 1785 1811 1811 1819 1826 1834 1836
11Jgipc ufrVoP 67 67 67 67 67 67 67 67
11Jgipc vRqMK4 36 36 34 34 34 34 34 34
11Jgipc wbdj-C 31 33 35 35 36 36 36 37
11Jgipc xtRiw3 6 6 6 6 6 6 6 6
What I want to do is find the outliers in each row.
About the data:
The column headers apart from A_ID and P_ID are timestamps. So for each pair of A_ID and P_ID (i.e. in a row), a set of values is present, and each row can be considered a time series.
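To make the layout concrete, here is a minimal sketch of loading a few of the rows above with pandas so that each (A_ID, P_ID) pair indexes one time series. It assumes the file is whitespace-separated as displayed and that the column headers are Unix timestamps in seconds; both are assumptions, not something confirmed by the data itself.

```python
import io
import pandas as pd

# First three rows of the sample, assumed to be whitespace-separated.
raw = """A_ID P_ID 1429982904 1430370002 1430974801 1431579602 1432184403 1432789202 1435208402 1435308653
11Jgipc qjMakF 364 365 363 363 364 364 364 367
11Jgipc qxL8FJ 18 18 18 18 18 18 18 18
11Jgipc r0Bpnt 40 40 41 41 41 42 42 42
"""

# Use the (A_ID, P_ID) pair as the row index; the remaining columns are values.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col=["A_ID", "P_ID"])

# Assumption: the headers are Unix timestamps in seconds.
df.columns = pd.to_datetime([int(c) for c in df.columns], unit="s")

# One row is then one time series, indexed by timestamp.
series = df.loc[("11Jgipc", "r0Bpnt")]
print(series)
```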
Expected Output:
For each row, probably the tuple(s) in the form [(A_ID, P_ID): (Value, ColumnHeader), ...]
What I have tried:
I have tried the suggestions given in this solution.
- The simplest approach, computing the mean and standard deviation and then flagging values that lie K standard deviations or more above the mean, did not work because the appropriate value of K differs for each row.
- Even the moving average method does not seem appropriate for this case, because the constraint would differ for every row.
- Manually setting such a constraint is not an option, as the number of rows is large and so is the number of files I want to find outliers in.
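For reference, the first attempt can be sketched roughly as below. The function name `k_sigma_outliers`, the default `k=2.0`, and the symmetric test (deviation in either direction, not only above the mean) are illustrative assumptions. With three of the sample rows, a single fixed K flags 367 in the first row but nothing in the steadily rising `u5HHmV` row, which shows why one K does not suit every row.

```python
import pandas as pd

cols = ["1429982904", "1430370002", "1430974801", "1431579602",
        "1432184403", "1432789202", "1435208402", "1435308653"]
index = pd.MultiIndex.from_tuples(
    [("11Jgipc", "qjMakF"), ("11Jgipc", "qxL8FJ"), ("11Jgipc", "u5HHmV")],
    names=["A_ID", "P_ID"],
)
df = pd.DataFrame(
    [[364, 365, 363, 363, 364, 364, 364, 367],
     [18, 18, 18, 18, 18, 18, 18, 18],
     [1778, 1785, 1811, 1811, 1819, 1826, 1834, 1836]],
    index=index, columns=cols,
)

def k_sigma_outliers(df, k=2.0):
    """Flag values more than k standard deviations from their own row's mean."""
    mean = df.mean(axis=1)
    std = df.std(axis=1)  # sample standard deviation (ddof=1), per row
    # Constant rows have zero spread; turn that into NaN so nothing is flagged.
    z = df.sub(mean, axis=0).div(std.replace(0, float("nan")), axis=0)
    mask = (z.abs() > k).to_numpy()
    out = {}
    for key, values, flags in zip(df.index, df.to_numpy(), mask):
        hits = [(v, c) for v, c, f in zip(values, df.columns, flags) if f]
        if hits:
            out[key] = hits
    return out

result = k_sigma_outliers(df, k=2.0)
print(result)
```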
What could be better options, as per my understanding:
Using scikit-learn, as in "Outlier detection with several methods"? If so, how can I do it?
Any other specific package? Maybe Pandas? If so, how can I do it?
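One possible Pandas-only direction, given purely as a sketch, is a per-row modified z-score based on the median and the median absolute deviation (MAD). It rescales each row by its own spread, so a single threshold can be shared across rows. This is a swapped-in technique rather than anything from the linked solution; the function name `mad_outliers` is hypothetical, and the constant 0.6745 and cutoff 3.5 are the commonly cited defaults for this score. Rows whose MAD is zero (more than half the values identical) cannot be scored this way and are skipped here.

```python
import pandas as pd

cols = ["1429982904", "1430370002", "1430974801", "1431579602",
        "1432184403", "1432789202", "1435208402", "1435308653"]
index = pd.MultiIndex.from_tuples(
    [("11Jgipc", "qjMakF"), ("11Jgipc", "tOudhM")],
    names=["A_ID", "P_ID"],
)
df = pd.DataFrame(
    [[364, 365, 363, 363, 364, 364, 364, 367],
     [12, 13, 13, 13, 13, 13, 14, 14]],
    index=index, columns=cols,
)

def mad_outliers(df, threshold=3.5):
    """Flag values whose per-row modified z-score exceeds the threshold."""
    median = df.median(axis=1)
    abs_dev = df.sub(median, axis=0).abs()
    mad = abs_dev.median(axis=1)
    # A MAD of zero would divide by zero; use NaN so such rows are not flagged.
    score = 0.6745 * abs_dev.div(mad.replace(0, float("nan")), axis=0)
    mask = (score > threshold).to_numpy()
    out = {}
    for key, values, flags in zip(df.index, df.to_numpy(), mask):
        hits = [(v, c) for v, c, f in zip(values, df.columns, flags) if f]
        if hits:
            out[key] = hits
    return out

result = mad_outliers(df)
print(result)
```

The returned dictionary maps each (A_ID, P_ID) pair to a list of (Value, ColumnHeader) tuples, which is close to the output format described above.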
Any example, help or suggestion would be much appreciated.
Source: https://stackoverflow.com/questions/33938398/row-wise-outlier-detection-in-python