Question
The anomaly detection methods published and since abandoned by Twitter have been forked and maintained separately in the anomalize package and in the hrbrmstr/AnomalyDetection fork. Both have added 'tidy' features.
Working static versions
library(dplyr)
library(anomalize)

# Keep a single package's download series
tidyverse_cran_downloads %>%
  filter(package == "tidyr") %>%
  ungroup() %>%
  select(-package) -> one_package_only

# Decompose with the twitter method
one_package_only %>%
  anomalize::time_decompose(count,
                            merge = TRUE,
                            method = "twitter",
                            frequency = "7 days") -> one_package_only_decomp

# Anomaly detection on the remainder with anomalize
one_package_only_decomp %>%
  anomalize::anomalize(remainder, method = "iqr") %>%
  anomalize::time_recompose()

# Anomaly detection on the remainder with AnomalyDetection
one_package_only_decomp %>%
  select(date, remainder) %>%
  AnomalyDetection::ad_ts(max_anoms = 0.02,
                          direction = 'both')
These work as expected.
I would like to apply the Twitter anomaly detection process on a tiled window to my dataset, which is similar in structure to the anomalize::tidyverse_cran_downloads dataset: a regular series of over 100 observations of a value, grouped by a categorical variable.
The tsibble package (which replaces the old tibbletime) can apply a function with purrr-like syntax via slide, tile and stretch. This can include returning a full data-frame-like object nested inside another data-frame-like object, as purrr does. (What a sentence!)
I've gone through the window function vignette but haven't had much luck.
Attempt 1: slide2
The anomalize::decompose_twitter function takes two arguments, data and target:
tidyverse_cran_downloads %>%
  mutate(
    Monthly_MA = slide2_dfr(
      .x = .,
      .y = count,
      ~ anomalize::decompose_twitter,
      .size = 5
    )
  )
Error: Element 1 has length 3, not 1 or 425. Call rlang::last_error() to see a backtrace
Maybe I've misunderstood how the .x and .y syntax works?
Attempt 2: pmap
my_diag <- function(...) {
  data <- tibble(...)
  fit <- anomalize::decompose_twitter(data = data, target = count)
}
tidyverse_cran_downloads %>%
  nest(-package) %>%
  filter(package %in% c("tidyr", "lubridate")) %>% # just to make it quick
  mutate(diag = purrr::map(data, ~ pslide_dfr(., my_diag, .size = 7)))
Error in stats::stl(., s.window = "periodic", robust = TRUE) : series is not periodic or has less than two periods
It appears something is running, but the period between observations is off somehow, or is not being parsed?
Attempt 3: ad_ts
ad_ts only takes one argument, so, setting aside the fact that we have yet to find a way to calculate the remainder after decomposition, I should be able to use it via slide. It also expects its x to be:
Time series as a two column data frame where the first column consists of the timestamps and the second column consists of the observations.
So we shouldn't have to do much to the data after it's nested.
tidyverse_cran_downloads %>%
  nest(-package, .key = "my_data") %>%
  mutate(
    Daily_MA = slide_dfr(
      .f = AnomalyDetection::ad_ts,
      .x = my_data
    )
  )
Error in .f(.x[[i]], ...) : data must be a single data frame.
So the function is at least being called, but it's being called with more than a single data frame?
I want to:
- Apply a process of decomposition through the twitter algorithm, followed by anomaly detection on the remainder
- Use one of the two anomaly detection packages to do it, or a blend of the two
- Apply it to a window of time
- Over grouped categorical data
The only way my dataset differs is that I have half-hourly observations over a period of multiple months, and I only need the anomalies recalculated once a day (i.e. once every 48 observations), with each window looking back over the prior 30 days to decompose the series and detect anomalies.
(N.B. I would have tagged tsibble and anomalize, but I don't have the rep to make those tags)
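The windowing requirement stated above (recompute once per day, look back 30 days, 48 half-hourly observations per day) can be worked out as plain index arithmetic before any windowing package is involved. The following is only a base-R sketch under stated assumptions: the series is regular with no gaps and starts at the beginning of a day, and daily_lookback_windows and its arguments are hypothetical names introduced here, not part of tsibble or anomalize.

```r
# Hypothetical helper: for a regular series of n_obs observations, return the
# row ranges of trailing windows that end at the close of each full day.
# Assumes no gaps and that the series starts at the beginning of a day.
daily_lookback_windows <- function(n_obs, per_day = 48L, lookback_days = 30L) {
  window_len <- per_day * lookback_days
  if (n_obs < window_len) {
    # Not enough history for even one full lookback window.
    return(data.frame(start = integer(0), end = integer(0)))
  }
  # Window ends fall on day boundaries, starting once a full lookback exists.
  ends <- seq(from = window_len, to = n_obs, by = per_day)
  data.frame(start = ends - window_len + 1L, end = ends)
}

# 35 days of half-hourly data: the first window covers days 1 to 30,
# and one more window is produced for each later day.
windows <- daily_lookback_windows(n_obs = 48L * 35L)
print(windows)
```

Each row of windows could then be used to subset one group's rows before passing them to a decomposition and anomaly detection step.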
Answer 1:
Attempt 2 should work as expected. The error message comes from stl(), which requires at least two seasonal periods to estimate. For example, daily data with a weekly period needs at least 14 observations for stl() to run. Increasing the window size to .size = 7 * 3 works fine.
library(dplyr)
library(anomalize)

# Decompose one window of rows with the twitter method
my_decomp <- function(...) {
  data <- tibble(...)
  anomalize::decompose_twitter(data, count)
}

tidyverse_cran_downloads %>%
  group_by(package) %>%
  tidyr::nest() %>%
  mutate(diag = purrr::map(data, ~ tsibble::pslide_dfr(., my_decomp, .size = 7 * 3)))
#> # A tibble: 15 x 3
#>    package   data               diag
#>    <chr>     <list>             <list>
#>  1 tidyr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  2 lubridate <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  3 dplyr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  4 broom     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  5 tidyquant <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  6 tidytext  <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  7 ggplot2   <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  8 purrr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  9 glue      <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 10 stringr   <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 11 forcats   <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 12 knitr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 13 readr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 14 tibble    <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 15 tidyverse <tibble [425 × 2]> <tibble [8,506 × 5]>
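The window-size point can be checked without any of the packages above. What follows is a minimal base-R sketch on simulated daily data with an assumed weekly cycle (frequency = 7); the data are made up, and try_stl is a hypothetical helper introduced here. It only illustrates why a 7-observation window trips the stl() error while a 21-observation window does not.

```r
# Simulated daily counts with a weekly pattern (made-up data for illustration).
set.seed(1)
counts <- 100 + 10 * sin(2 * pi * seq_len(42) / 7) + rnorm(42)

# Hypothetical helper: run stl() on one window of daily values, assuming a
# weekly cycle (frequency = 7), and report whether the call succeeded.
try_stl <- function(values) {
  window_ts <- ts(values, frequency = 7)
  fit <- tryCatch(
    stats::stl(window_ts, s.window = "periodic", robust = TRUE),
    error = function(e) conditionMessage(e)  # keep the message, not the fit
  )
  inherits(fit, "stl")
}

short_ok <- try_stl(counts[1:7])   # one week of data
long_ok  <- try_stl(counts[1:21])  # three weeks of data

print(c(short = short_ok, long = long_ok))
```

The same check applies to any window passed to a decomposition that relies on stl(): the window has to span enough seasonal periods for the assumed frequency.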
Source: https://stackoverflow.com/questions/56238837/apply-timeseries-decomposition-and-anomaly-detection-over-a-sliding-tiled-wind