Simple algorithm for online outlier detection of a generic time series


I am working with a large number of time series. They are network measurements arriving every 10 minutes, and some of them are periodic (e.g. the bandwidth).

2 Answers

忘掉有多难

2021-01-31 06:49
I suggest the scheme below, which should be implementable in a day or so:

Training
- Collect as many samples as you can hold in memory
- Remove obvious outliers using the standard deviation for each attribute
- Calculate and store the correlation matrix and also the mean of each attribute
- Calculate and store the Mahalanobis distances of all your samples
Calculating "outlierness":

For a single sample whose "outlierness" you want to know:
- Retrieve the means, correlation matrix and Mahalanobis distances from training
- Calculate the Mahalanobis distance "d" for your sample
- Return the percentile in which "d" falls (using the Mahalanobis distances from training)
That will be your outlier score: 100% is an extreme outlier.

PS. When calculating the Mahalanobis distance, use the correlation matrix, not the covariance matrix. This is more robust when the measurements differ in unit and scale.
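The scheme above could be sketched roughly as follows with NumPy. This is one possible reading of it, not a definitive implementation: the function names `fit` and `outlierness`, the 3-sigma cut-off `z_cut`, and the choice to standardize each attribute before applying the correlation matrix are all assumptions made for illustration. The sketch also stores the per-attribute standard deviations, which the correlation-based distance needs, and it assumes no attribute is constant.

```python
import numpy as np

def fit(samples, z_cut=3.0):
    """Training: `samples` is an (n, p) array, one row per observation."""
    x = np.asarray(samples, dtype=float)
    # Remove obvious outliers: drop rows where any attribute lies more
    # than z_cut standard deviations from that attribute's mean.
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    x = x[(np.abs(x - mu) <= z_cut * sigma).all(axis=1)]
    # Recompute the statistics on the cleaned data and standardize.
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    z = (x - mu) / sigma
    # Correlation matrix of the attributes and its (pseudo-)inverse.
    corr = np.atleast_2d(np.corrcoef(z, rowvar=False))
    inv = np.linalg.pinv(corr)
    # Mahalanobis distance of every training sample, stored sorted.
    d2 = np.einsum("ij,jk,ik->i", z, inv, z)
    d = np.sqrt(np.maximum(d2, 0.0))
    return {"mu": mu, "sigma": sigma, "inv": inv, "d": np.sort(d)}

def outlierness(model, sample):
    """Percentile (0..100) of the sample's distance among training distances."""
    z = (np.asarray(sample, dtype=float) - model["mu"]) / model["sigma"]
    d = np.sqrt(max(float(z @ model["inv"] @ z), 0.0))
    # Count how many training distances are <= d.
    rank = np.searchsorted(model["d"], d, side="right")
    return 100.0 * rank / len(model["d"])
```

A sample that is farther from the mean than every training sample scores 100, and a sample equal to the training mean scores close to 0.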
孤街浪徒

2021-01-31 07:11

This is a big and complex subject, and the answer will depend on (a) how much effort you want to invest in this and (b) how effective you want your outlier detection to be. One possible approach is adaptive filtering, which is typically used for applications such as noise-cancelling headphones. You have a filter which constantly adapts to the input signal, effectively matching its coefficients to a hypothetical short-term model of the signal source and thereby minimizing the mean square error of the output. This gives you a low-level output signal (the residual error) except when an outlier arrives, which produces a spike that is easy to detect with a threshold. Read up on adaptive filtering, LMS filters, etc., if you're serious about this kind of technique.
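To make the idea concrete, here is a minimal sketch that uses a normalized LMS (NLMS) filter as a one-step-ahead predictor and thresholds its residual. NLMS is swapped in here as one common LMS variant; the answer does not prescribe a specific filter. The function names, the filter order, the step size `mu`, and the threshold rule (a multiple `k` of the median absolute residual seen so far) are all illustrative assumptions.

```python
import numpy as np

def nlms_residuals(x, order=8, mu=0.5, eps=1e-8):
    """Prediction error of an NLMS filter that predicts x[n] from past samples."""
    x = np.asarray(x, dtype=float)
    w = np.zeros(order)                 # filter coefficients, adapted online
    e = np.zeros(len(x))                # residuals (zero until enough history)
    for n in range(order, len(x)):
        u = x[n - order:n][::-1]        # last `order` samples, newest first
        e[n] = x[n] - w @ u             # error of the one-step-ahead prediction
        w += mu * e[n] * u / (eps + u @ u)   # normalized LMS update
    return e

def flag_spikes(e, k=6.0, warmup=50):
    """Flag residuals exceeding k times the median absolute residual so far."""
    e = np.asarray(e, dtype=float)
    flags = np.zeros(len(e), dtype=bool)
    for n in range(warmup, len(e)):
        scale = np.median(np.abs(e[:n])) + 1e-12
        flags[n] = abs(e[n]) > k * scale
    return flags
```

Note that the samples immediately after a spike may also be flagged, because the outlier enters the filter's input window and disturbs the coefficients for a short while. The running median is recomputed from scratch at every step to keep the sketch short; a real online version would use an incremental estimate.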
