R - dplyr bootstrap issue

Submitted by 谁说胖子不能爱 on 2019-12-23 17:42:45

Question


I am having trouble understanding how to use the bootstrap function (it comes from broom, not dplyr itself) properly in a dplyr pipeline.

What I want is to generate a bootstrap distribution from two randomly assigned groups and compute the difference in means, for example:

library(dplyr) 
library(broom) 
data(mtcars) 

mtcars %>% 
  mutate(treat = sample(c(0, 1), n(), replace = TRUE)) %>% 
  group_by(treat) %>%
  summarise(m = mean(disp)) %>% 
  summarise(m = m[treat == 1] - m[treat == 0])

The issue is that I need to repeat this operation 100, 1000, or more times.

Using replicate, I can do:

# one random assignment into two groups, returning the difference in means
frep = function(data) data %>% 
  mutate(treat = sample(c(0, 1), n(), replace = TRUE)) %>% 
  group_by(treat) %>%
  summarise(m = mean(disp)) %>% 
  summarise(m = m[treat == 1] - m[treat == 0])

# repeat 1000 times and flatten the results into a numeric vector
replicate(1000, frep(mtcars), simplify = TRUE) %>% unlist()

and get the distribution.

I don't really get how to use bootstrap here. How should I start?

mtcars %>% 
  bootstrap(10) %>% 
  mutate(treat = sample(c(0, 1), 32, replace = T)) 

mtcars %>% 
  bootstrap(10) %>% 
  do(tidy(treat = sample(c(0, 1), 32, replace = T))) 

Neither attempt works. Where should I put the bootstrap step in the pipe?

Thanks.


Answer 1:


In the do step, we wrap each bootstrap replicate in data.frame and add the 'treat' column. We can then group by 'replicate' and 'treat', summarise to get the group means, and summarise again to get the difference in means for each replicate.

mtcars %>% 
    bootstrap(10) %>% 
    do(data.frame(., treat = sample(c(0,1), 32, replace=TRUE))) %>% 
    group_by(replicate, treat) %>% 
    summarise(m = mean(disp)) %>%
    summarise(m = m[treat == 1] - m[treat == 0])
    # or, as 0 occurs first and 1 second within each replicate, we can also use
    #summarise(m = last(m) - first(m))
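
For comparison, here is a minimal base R sketch of the same idea that does not rely on bootstrap(), which, as far as I know, has been deprecated in later broom releases and may not be available in a current installation. The helper name boot_diff, the seed, and the number of replicates are assumptions made for illustration; they are not part of the original question or answer. Each call resamples the rows of mtcars with replacement, assigns the resampled rows to two random groups, and returns the difference in mean disp.

```r
# Illustrative sketch only: boot_diff is a hypothetical helper, not a library function.
set.seed(123)  # assumption: fixed seed so the run is reproducible

boot_diff <- function(data) {
  # draw a bootstrap sample: rows resampled with replacement
  resampled <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]
  # randomly assign each resampled row to group 0 or 1
  treat <- sample(c(0, 1), nrow(resampled), replace = TRUE)
  # difference in means between the two groups
  mean(resampled$disp[treat == 1]) - mean(resampled$disp[treat == 0])
}

# repeat 1000 times to build up the distribution of differences
diffs <- replicate(1000, boot_diff(mtcars))

# summarise the resulting distribution
summary(diffs)
quantile(diffs, c(0.025, 0.975))
```

Because replicate returns a plain numeric vector here, the result can be passed directly to hist(diffs) or similar functions without unlist().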


Source: https://stackoverflow.com/questions/39548923/r-dplyr-bootstrap-issue
