问题
I have a dataset that may contain MULTIPLE observations per date. So there could be 5 observations on date1, 2 observations on date2, and 1 observation on group3.
I want to calculate the moving average - by date - and importantly while not summarising/reducing' the number of rows. That is In this example above, I would still have 8 rows of data, and in a column next to it I would have that date's rolling average price I find this challenging because when I use a typical rolling function from ZOO package it goes line by-line and I dont know how to get it to skip by DATE
for example first step normally would be to:
df %>%
groupy_by(DATE) %>%
summarise(mean_daily_price = mean(price)) %>%
ungroup() %>%
arrange(Date) %>%
mutate( ra = rollapply(price, 2, mean, partial=T)
--- but the summarise makes me lose rows.
library(dplyr)
library(zoo)
DF = structure(list(Date = c("Jan-13", "Jan-13", "Jan-13", "Jan-13", "Jan-13", "Jul-14", "Jul-14", "Oct-16"), Price = c(100L, 200L, 300L, 1000L, 400L, 150L, 50L, 600L), Average.by.Date = c(400L, 400L, 400L, 400L, 400L, 100L, 100L, 600L), Moving_Average_Size_2 = c(NA, NA, NA, NA, NA, 250L, 250L, 350L)), .Names = c("Date", "Price", "Average.by.Date", "Moving_Average_Size_2"), class = "data.frame", row.names = c(NA,
-8L))
回答1:
I think that your safest approach will need to be a two step process -- calculate the rolling averages by Date
then merge them back in (still using dplyr
here)
rolledAvg <-
DF %>%
group_by(Date) %>%
summarise(mean_daily_price = mean(Price)) %>%
ungroup() %>%
arrange(Date) %>%
mutate( ra = rollapply(mean_daily_price
, 2
, mean
, partial=T
, fill = NA))
left_join(DF, rolledAvg)
gives:
Date Price Average.by.Date Moving_Average_Size_2 mean_daily_price ra
1 Jan-13 100 400 NA 400 250
2 Jan-13 200 400 NA 400 250
3 Jan-13 300 400 NA 400 250
4 Jan-13 1000 400 NA 400 250
5 Jan-13 400 400 NA 400 250
6 Jul-14 150 100 250 100 350
7 Jul-14 50 100 250 100 350
8 Oct-16 600 600 350 600 600
I see in a comment to another answer that you don't think the first ra
should be 250 -- if that is the case, change the calculation in your call to rollapply
. Right now, it appears to be behaving as expected/documented. So, if you want something different, you will need to explain the change you want (probably a separate question).
Another caveat, particularly important for any other readers that stumble on this: this current approach treats sequential date entries as equidistant, no matter how far apart they actually are. If this works for your need, great. But, in many use cases, you may need to take care of the actual amount of time between measurements.
Similarly, the current approach loses all information about the number of measurements taken, it may be worthwhile to consider an approach that weights by number of observations (unless you are happy trusting each day's calculated average).
回答2:
In the code below, we use mutate
instead of summarise
to add mean_daily_price
, so that we keep all rows of the data frame. Then, in the final mutate
we run rollapply
only on the unique values of mean_daily_price
, but then use table
and rep
to repeat the output of rollapply
by the number of rows for each Date
.
DF %>%
arrange(Date) %>%
group_by(Date) %>%
mutate(mean_daily_price = mean(Price)) %>%
ungroup() %>%
mutate(ra = rep(rollapply(unique(mean_daily_price), 2, mean, fill=NA, align="right"),
table(Date)[order(unique(Date))]))
Date Price Average.by.Date Moving_Average_Size_2 mean_daily_price ra 1 Jan-13 100 400 NA 400 NA 2 Jan-13 200 400 NA 400 NA 3 Jan-13 300 400 NA 400 NA 4 Jan-13 1000 400 NA 400 NA 5 Jan-13 400 400 NA 400 NA 6 Jul-14 150 100 250 100 250 7 Jul-14 50 100 250 100 250 8 Oct-16 600 600 350 600 350
来源:https://stackoverflow.com/questions/39921555/moving-average-by-unique-date-with-multiple-observations-per-date