Question
I tried to use AnomalyDetectionTs() from library(AnomalyDetection)
(see https://github.com/twitter/AnomalyDetection
and https://www.r-bloggers.com/anomaly-detection-in-r/)
on my data. My example data contains values that swing sharply away from the surrounding pattern, far more than they should, instead of following the curve's gradual drop. The function doesn't work for me: every point it flags as an anomaly is actually a correct, normal value.
This is the result from the function:
My example data: https://raw.githubusercontent.com/ieatbaozi/R-Practicing/master/example.csv
df <- read.csv(url("https://raw.githubusercontent.com/ieatbaozi/R-Practicing/master/example.csv"),header = TRUE,stringsAsFactors = FALSE)
df$DateTime <- as.POSIXct(df$DateTime)
library(AnomalyDetection)
ADtest <- AnomalyDetectionTs(df, max_anoms=0.1, direction='both', plot=TRUE)
ADtest$plot
Here is my expected result. How can I detect those abnormal data points?
How can I fix those values by filling in more suitable ones? I want to smooth them so that they plot close to the pattern around them, while the total of all the data stays the same after the fix.
My extra question: do you have any idea how to find the pattern? I can give you more information. Thank you so much for your help.
Answer 1:
Here is a possible solution.
1. Compute the mean values for small windows around each point (rolling mean).
2. Compute the difference between the actual value and the local mean.
3. Compute the standard deviation for all of the differences from step 2.
4. Flag as outliers those points that are more than X standard deviations from the local mean.
Using this method, I got the points that you are looking for, together with a few others - points that are in the transition from the very low values to the very high values. You may be able to filter those out.
Code
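library(zoo)  ## For the rolling mean function
WindowSize = 5
HalfWidth = (WindowSize - 1) / 2
## Centered rolling mean; it is WindowSize - 1 elements shorter than df$Val
LocalMean = rollmean(df$Val, WindowSize)
## Drop the first and last HalfWidth values so both vectors line up
Trimmed = df$Val[(HalfWidth + 1):(nrow(df) - HalfWidth)]
Diff = LocalMean - Trimmed
## Root mean square of the differences, used as the standard deviation
SD = sqrt(mean(Diff^2))
## Add HalfWidth to map the indices back to rows of df
Out = which(abs(Diff) > 2.95 * SD) + HalfWidth
plot(df, type = "l")
points(df[Out, ], pch = 16, col = "red")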
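The code above only detects the points. The question also asks how to replace them while keeping the total unchanged, so here is a minimal sketch of one possible approach, not something the answer itself proposes: interpolate linearly across the flagged indices, then rescale every value so the sum matches the original. The helper name fix_outliers and the small synthetic vector are assumptions made for illustration; with the real data you would pass df$Val and Out instead.

```r
## Hypothetical helper (not part of the original answer): replace flagged
## values by linear interpolation, then rescale to preserve the total.
fix_outliers <- function(val, out) {
  idx  <- seq_along(val)
  keep <- !(idx %in% out)          # TRUE for points that were not flagged
  ## Base R approx(): interpolate over the kept points at every index;
  ## rule = 2 reuses the nearest kept value if an endpoint was flagged.
  interp <- approx(idx[keep], val[keep], xout = idx, rule = 2)$y
  ## Spread the removed (or added) amount proportionally over all points.
  ## Assumes sum(interp) is not zero.
  interp * sum(val) / sum(interp)
}

## Synthetic stand-in for df$Val and Out (assumed values, for illustration)
val   <- c(10, 11, 12, 50, 13, 14, 15, 2, 16, 17)
out   <- c(4, 8)
fixed <- fix_outliers(val, out)
```

Proportional rescaling changes every value slightly, not just the flagged ones. If the normal points must stay exactly as they are, the difference would have to be redistributed some other way, for example only among the neighbours of each flagged point.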
Source: https://stackoverflow.com/questions/44713124/r-how-to-detect-and-fix-abnormal-values-on-plot