Why is dplyr::cummean(x) not equal to cumsum(x)/seq_along(x)?

误落风尘 2021-01-19 06:08

Why is cummean(x) not equal to cumsum(x)/seq_along(x)?

library(dplyr)

set.seed(456)
x <- as.integer(runif(30) * 300)
x

cummean(x)
cumsum(x) / seq_along(x)


        
1 answer
  • 2021-01-19 06:27

    This is a bug in dplyr::cummean() as of dplyr 1.0.0; see the corresponding GitHub issue. Romain Francois pushed a fix four days ago, so installing the development version of dplyr from GitHub should give the correct results. I will update this answer once I have confirmed that.

    Example that was used in the issue mentioned above:

    library(tidyverse)
    x <- 1:5
    
    # long(er) way
    cumsum(x) / seq_along(x)
    #> [1] 1.0 1.5 2.0 2.5 3.0
    
    # dplyr 0.8.5 cummean()
    cummean(x)
    #> [1] 1.0 1.5 2.0 2.5 3.0
    
    # dplyr 1.0.0 cummean()
    cummean(x)
    #> [1] 1.000000 1.000000 1.333333 1.750000 2.200000
    

    What caused the bug (quoted from the GitHub issue mentioned above):

    It looks like the indexing is off by one for dplyr_cummean in /src/funs.cpp, causing the first index to be repeated twice (and the last index to be dropped). I'll submit a pull request with a slight change which I think makes it work as intended.
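
    The off-by-one behaviour described in the quote can be imitated in plain R. This is only an illustrative sketch, not the actual C++ code from /src/funs.cpp, and the helper name buggy_cummean is hypothetical. It assumes the bug amounts to averaging over the input with its first element repeated and its last element dropped, which reproduces the dplyr 1.0.0 output shown above for x <- 1:5:

    ```r
    # Hypothetical helper (not part of dplyr): imitates the off-by-one result
    # by repeating the first element and dropping the last one.
    buggy_cummean <- function(x) {
      shifted <- c(head(x, 1), head(x, -1))
      cumsum(shifted) / seq_along(shifted)
    }

    x <- 1:5

    # Correct cumulative mean
    cumsum(x) / seq_along(x)
    #> [1] 1.0 1.5 2.0 2.5 3.0

    # Imitation of the dplyr 1.0.0 result
    buggy_cummean(x)
    #> [1] 1.000000 1.000000 1.333333 1.750000 2.200000
    ```

    Only the first element agrees between the two results, which is why the vectors in the question differ almost everywhere.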


    Update: the current development version on GitHub (1.0.0.9000) gives the correct result:

    library(dplyr)
    packageVersion("dplyr")
    #[1] ‘1.0.0.9000’
    
    set.seed(456)
    x <- as.integer(runif(30)*300)
    
    all(dplyr::cummean(x) == cumsum(x)/seq_along(x))
    #[1] TRUE
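
    Both sides of that comparison are doubles, so an exact == check could in principle report a mismatch caused by rounding alone. A hedged sketch of a tolerance-based check with base R's all.equal() follows; the helper name cummean_ref is hypothetical, and the dplyr comparison at the end only runs if dplyr is installed:

    ```r
    # Hypothetical helper: base-R reference for the cumulative mean.
    cummean_ref <- function(x) cumsum(x) / seq_along(x)

    set.seed(456)
    x <- as.integer(runif(30) * 300)

    # Running mean computed element by element, independently of cumsum().
    running <- vapply(seq_along(x), function(i) mean(x[seq_len(i)]), numeric(1))

    # all.equal() compares with a small numeric tolerance instead of exactly.
    isTRUE(all.equal(cummean_ref(x), running))
    #> [1] TRUE

    # Optional: the same check against dplyr, if it is available.
    if (requireNamespace("dplyr", quietly = TRUE)) {
      print(isTRUE(all.equal(dplyr::cummean(x), cummean_ref(x))))
    }
    ```

    With an affected dplyr version the last check would be expected to print FALSE, and with a fixed version TRUE.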
    