Question
I have the following dataset (dat) with 8 unique treatment groups. I want to sample 3 points from each group and store their mean and variance, and I want to repeat this 1000 times (sampling with replacement), using a loop to store all the values in output. When I run the loop below, I keep getting this error: unexpected '=' in: "output[i] <- summarise(group_by(new_df[i], fertilizer,crop, level),mean[i]="
Any suggestions on how to fix it, or make it more
fertilizer <- c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P")
crop <- c("alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group","alone","group")
level <- c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low")
growth <- c(0,0,1,2,90,5,2,5,8,55,1,90,2,4,66,80,1,90,2,33,56,70,99,100,66,80,1,90,2,33,0,0,1,2,90,5,2,2,5,8,55,1,90,2,4,66,0,0)
dat <- data.frame(fertilizer, crop, level, growth)
library(dplyr)
for(i in 1:1000){
new_df[i] <- dat %>%
group_by(fertilizer, crop, level) %>%
sample_n(3)
output[i] <- summarise(
group_by(new_df[i], fertilizer, crop, level),
mean[i] = mean(growth),
var[i] = sd(growth) * sd(growth))
}
Answer 1:
I don't think you need a loop. You can do this faster by sampling 3*1000 values per group at once, assigning a sample_id to each consecutive set of 3 values, adding it to the grouping variables, and finally summarizing to get the desired values. This way each function is called only once:
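dat %>%
  group_by(fertilizer, crop, level) %>%
  # 3 draws x 1000 resamples per group, with replacement
  sample_n(3 * 1000, replace = TRUE) %>%
  # label each consecutive triple within a group as one resample
  mutate(sample_id = rep(1:1000, each = 3)) %>%
  # keep the existing groups (this argument was `add = TRUE` before dplyr 1.0.0)
  group_by(sample_id, .add = TRUE) %>%
  summarise(
    mean = mean(growth, na.rm = TRUE),
    var = sd(growth)^2
  ) %>%
  ungroup()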
# A tibble: 8,000 x 6
   fertilizer crop  level sample_id  mean      var
   <chr>      <chr> <chr>     <int> <dbl>    <dbl>
 1 N          alone high          1 30.7  2640.
 2 N          alone high          2  1       0
 3 N          alone high          3 60.3  2640.
 4 N          alone high          4  1.33    0.333
 5 N          alone high          5  1.33    0.333
 6 N          alone high          6 60.3  2640.
 7 N          alone high          7  1.33    0.333
 8 N          alone high          8 30.3  2670.
 9 N          alone high          9  1.33    0.333
10 N          alone high         10 60.7  2581.
# ... with 7,990 more rows
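If you would rather keep the explicit loop, here is a minimal sketch of one way to repair it. The parse error comes from `mean[i] = ...`: an argument name in a function call cannot be an indexed expression, so R stops at the `=`. The sketch also stores each result in a preallocated list with `[[i]]` rather than `[i]`. The small `dat` below is a hypothetical stand-in with the same column names (not the question's data), and `n_rep` is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for `dat`: 8 groups with 6 rows each
set.seed(1)
dat <- data.frame(
  fertilizer = rep(c("N", "P"), each = 24),
  crop       = rep(c("alone", "group"), times = 24),
  level      = rep(rep(c("low", "high"), each = 2), times = 12),
  growth     = sample(0:100, 48, replace = TRUE)
)

n_rep  <- 10                      # set to 1000 for the real run
output <- vector("list", n_rep)   # one list slot per resample

for (i in seq_len(n_rep)) {
  output[[i]] <- dat %>%
    group_by(fertilizer, crop, level) %>%
    sample_n(3) %>%                                      # 3 rows per group
    summarise(mean = mean(growth), var = var(growth)) %>%
    ungroup()
}

output[[1]]   # summary table for the first resample
```

Each element of `output` is a tibble with one row per treatment group, so the loop produces the same information as the vectorised version, just more slowly.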
Answer 2:
Try this:
replicate(2, {
  dat %>%
    group_by(fertilizer, crop, level) %>%
    sample_n(3) %>%
    summarize(mu = mean(growth), sigma2 = sd(growth)^2) %>%
    ungroup()
}, simplify = FALSE)
# [[1]]
# # A tibble: 8 x 5
#   fertilizer crop  level    mu sigma2
#   <fct>      <fct> <fct> <dbl>  <dbl>
# 1 N          alone high   1       1
# 2 N          alone low   30.7  2641.
# 3 N          group high  33.3  2408.
# 4 N          group low   56     553
# 5 P          alone high  22.7  1409.
# 6 P          alone low    2.33    2.33
# 7 P          group high  40.3  1336.
# 8 P          group low   23    1387
#
# [[2]]
# # A tibble: 8 x 5
#   fertilizer crop  level    mu sigma2
#   <fct>      <fct> <fct> <dbl>  <dbl>
# 1 N          alone high  30.3  2670.
# 2 N          alone low   52.7  2069.
# 3 N          group high  61.7  2408.
# 4 N          group low   20     925
# 5 P          alone high  35.3  3042.
# 6 P          alone low   19.7   990.
# 7 P          group high  14.3   270.
# 8 P          group low   32    2524.
(Replace 2 with your 1000.)
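Since `replicate(..., simplify = FALSE)` returns a list of tibbles, you may want a single table afterwards. A minimal sketch, assuming the same column names as in the question: `bind_rows()` with `.id` stacks the list and records which replicate each row came from. The small `dat` here is a hypothetical stand-in, and `res_list` and `res` are illustrative names.

```r
library(dplyr)

# Hypothetical stand-in for `dat`: 8 groups with 6 rows each
set.seed(42)
dat <- data.frame(
  fertilizer = rep(c("N", "P"), each = 24),
  crop       = rep(c("alone", "group"), times = 24),
  level      = rep(rep(c("low", "high"), each = 2), times = 12),
  growth     = sample(0:100, 48, replace = TRUE)
)

# One summary tibble per replicate, kept as a list
res_list <- replicate(5, {
  dat %>%
    group_by(fertilizer, crop, level) %>%
    sample_n(3) %>%
    summarize(mu = mean(growth), sigma2 = sd(growth)^2) %>%
    ungroup()
}, simplify = FALSE)

# Stack into one data frame; sample_id holds the list position as a string
res <- bind_rows(res_list, .id = "sample_id")
```

With 1000 replicates this gives one row per group per replicate, which is the same shape as the output in the first answer.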
Source: https://stackoverflow.com/questions/57582024/resample-and-looping-over-dplyr-functions-in-r