Why is cummean(x) not equal to cumsum(x)/seq_along(x)?
set.seed(456)
x <- as.integer(runif(30)*300)
x
cummean(x)
cumsum(x)/seq_along(x)
This is actually a bug in the dplyr::cummean function as of dplyr 1.0.0 (see the linked GitHub issue). Romain Francois pushed a fix four days ago, so installing the development version of dplyr from GitHub should give the correct results. I will try that and update shortly.
Example that was used in the issue mentioned above:
library(tidyverse)
x <- 1:5
# long(er) way
cumsum(x) / seq_along(x)
#> [1] 1.0 1.5 2.0 2.5 3.0
# dplyr 0.8.5 cummean()
cummean(x)
#> [1] 1.0 1.5 2.0 2.5 3.0
# dplyr 1.0.0 cummean()
cummean(x)
#> [1] 1.000000 1.000000 1.333333 1.750000 2.200000
What caused the bug (quoted from the GitHub issue linked above):
It looks like the indexing is off by one for dplyr_cummean in /src/funs.cpp, causing the first index to be repeated twice (and the last index to be dropped). I'll submit a pull request with a slight change which I think makes it work as intended.
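To see why an off-by-one index produces exactly the output shown for dplyr 1.0.0, here is a rough base R imitation of the effect. This is only a sketch of the symptom, not the actual C++ code in /src/funs.cpp; the names `shifted` and `buggy` are made up for illustration.

```r
x <- 1:5

# Imitate the described symptom: the first element is used twice
# and the last element is never reached.
shifted <- c(x[1], x[-length(x)])

# Running sum of the shifted values, divided by the running count.
buggy <- cumsum(shifted) / seq_along(x)
buggy
#> [1] 1.000000 1.000000 1.333333 1.750000 2.200000
```

The result matches the dplyr 1.0.0 output above, which is consistent with the explanation in the issue.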
Update: The current development version on GitHub (1.0.0.9000) gives the correct result:
library(dplyr)
packageVersion("dplyr")
#[1] ‘1.0.0.9000’
set.seed(456)
x <- as.integer(runif(30)*300)
all(dplyr::cummean(x) == cumsum(x)/seq_along(x))
#[1] TRUE
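If you are stuck on an affected dplyr version and cannot update, a base R fallback is one option. The sketch below assumes a numeric or integer vector without NA values; `cummean_base` is a hypothetical helper name, not part of dplyr.

```r
# Hypothetical helper: cumulative mean using only base R.
cummean_base <- function(x) {
  cumsum(x) / seq_along(x)
}

set.seed(456)
x <- as.integer(runif(30) * 300)
res <- cummean_base(x)

# Sanity checks: the first value is x[1] and the last value is the overall mean.
# all.equal() is used because comparing doubles with == can be fragile.
first_ok <- isTRUE(all.equal(res[1], as.numeric(x[1])))
last_ok <- isTRUE(all.equal(res[length(res)], mean(x)))
c(first_ok, last_ok)
```

Both checks should come back TRUE, since the cumulative mean over the whole vector is by definition the ordinary mean.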