Question
I have a CSV file with data for different countries at different weeks of this year. I want to create a summary data frame (in R) that groups together the data for weeks 21-24 and weeks 37-41. The data is structured like the attached example:
I am a beginner and not sure where to start. Thanks!
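Since the data starts out in a CSV file, the first step is reading it into a data frame. A minimal sketch using base R's read.csv(); the file path is hypothetical, so to keep the snippet runnable it first writes a two-row sample (taken from the data in the answer) to a temporary file. With a real file you would pass its path directly.

```r
# Hypothetical path: replace with the location of your own CSV file
csv_path <- tempfile(fileext = ".csv")

# Write a tiny sample in the same shape as the question's data (demonstration only)
writeLines(c(
  "country,country_code,year_week,new_cases",
  "Austria,AT,2020-W21,267",
  "Austria,AT,2020-W37,3977"
), csv_path)

# Read the CSV into a data frame; stringsAsFactors = FALSE keeps text columns as character
df1 <- read.csv(csv_path, stringsAsFactors = FALSE)
str(df1)
```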
Answer 1:
We can use case_when to construct a grouping column from the week number in 'year_week', then group by 'country' and that column and summarise the sum of 'new_cases'.
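The grouping column hinges on turning strings like "2020-W21" into week numbers. Here is that step on its own, sketched with base R's sub() as a stand-in for stringr::str_remove() (both replace only the first match of the pattern); the sample values come from the data below.

```r
# Sample values in the same format as the 'year_week' column
year_week <- c("2020-W21", "2020-W24", "2020-W37", "2020-W41")

# ".*-W" matches everything up to and including "-W" ("2020-W"), so removing it leaves "21", "24", ...
week <- as.numeric(sub(".*-W", "", year_week))
week
# [1] 21 24 37 41

# %in% 21:24 is TRUE exactly for the weeks in the first range
week %in% 21:24
# [1]  TRUE  TRUE FALSE FALSE
```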
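library(dplyr)
library(stringr)

df1 %>%
  # extract the week number from e.g. "2020-W21" and label it by range;
  # the TRUE ~ branch assigns every week outside 21-24 to 'W37_W41'
  group_by(country,
           grp = case_when(as.numeric(str_remove(year_week, ".*-W")) %in% 21:24 ~ 'W21_W24',
                           TRUE ~ 'W37_W41')) %>%
  # total new cases per country and week group, then drop the grouping
  summarise(new_cases = sum(new_cases, na.rm = TRUE), .groups = 'drop')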
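The TRUE ~ 'W37_W41' branch puts every week that is not 21-24 into the second group. That is fine for this sample, but if the full file also contains other weeks they would be counted as W37_W41 too. A base R sketch (a swapped-in technique using aggregate(), no packages needed) that labels both ranges explicitly and leaves other weeks out; the data frame is a subset of the df1 shown below (Austria and Bulgaria only).

```r
# Subset of the answer's df1 (Austria and Bulgaria rows)
df1 <- data.frame(
  country = rep(c("Austria", "Bulgaria"), times = c(8, 6)),
  year_week = c(paste0("2020-W", c(21:24, 37:40)),
                paste0("2020-W", c(24, 37:41))),
  new_cases = c(267, 231, 184, 192, 3977, 4997, 4992, 5079,
                555, 937, 928, 1178, 1521, 2353),
  stringsAsFactors = FALSE
)

# Extract the week number from strings like "2020-W21"
week <- as.numeric(sub(".*-W", "", df1$year_week))

# Label only the two requested ranges; every other week becomes NA
df1$grp <- ifelse(week %in% 21:24, "W21_W24",
                  ifelse(week %in% 37:41, "W37_W41", NA))

# The formula method of aggregate() drops rows with NA in grp or new_cases
# (na.action = na.omit by default) and sums new_cases per country/grp pair
res <- aggregate(new_cases ~ country + grp, data = df1, FUN = sum)
res
```

The rows come out ordered by grp and then country; res[order(res$country), ] reorders them by country to match the dplyr output.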
Output:
# A tibble: 6 x 3
# country grp new_cases
# <chr> <chr> <dbl>
#1 Austria W21_W24 874
#2 Austria W37_W41 19045
#3 Belgium W21_W24 4231
#4 Belgium W37_W41 80918
#5 Bulgaria W21_W24 555
#6 Bulgaria W37_W41 6917
Data:
df1 <- structure(list(country = c("Austria", "Austria", "Austria", "Austria",
"Austria", "Austria", "Austria", "Austria", "Belgium", "Belgium",
"Belgium", "Belgium", "Belgium", "Belgium", "Belgium", "Belgium",
"Belgium", "Bulgaria", "Bulgaria", "Bulgaria", "Bulgaria", "Bulgaria",
"Bulgaria"), country_code = c("AT", "AT", "AT", "AT", "AT", "AT",
"AT", "AT", "BE", "BE", "BE", "BE", "BE", "BE", "BE", "BE", "BE",
"BG", "BG", "BG", "BG", "BG", "BG"), year_week = c("2020-W21",
"2020-W22", "2020-W23", "2020-W24", "2020-W37", "2020-W38", "2020-W39",
"2020-W40", "2020-W21", "2020-W22", "2020-W23", "2020-W24", "2020-W37",
"2020-W38", "2020-W39", "2020-W40", "2020-W41", "2020-W24", "2020-W37",
"2020-W38", "2020-W39", "2020-W40", "2020-W41"), new_cases = c(267,
231, 184, 192, 3977, 4997, 4992, 5079, 1516, 1170, 843, 702,
6012, 9947, 11192, 18368, 35399, 555, 937, 928, 1178, 1521, 2353
)), class = "data.frame", row.names = c(NA, -23L))
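The structure(...) block above is the kind of code dput() prints. Running dput() on your own data frame (or on head() of it) is a convenient way to share a reproducible sample when asking a question like this one. A small sketch, using a two-row stand-in data frame, showing that evaluating the printed code rebuilds an identical object.

```r
# A small data frame standing in for your own data
df <- data.frame(
  country = c("Austria", "Austria"),
  year_week = c("2020-W21", "2020-W22"),
  new_cases = c(267, 231),
  stringsAsFactors = FALSE
)

# dput() prints R code (a structure(...) call) that recreates the object
dput(df)

# Capture that code as text, evaluate it, and check that it rebuilds the same object
code <- paste(capture.output(dput(df)), collapse = "\n")
rebuilt <- eval(parse(text = code))
identical(rebuilt, df)
```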
Source: https://stackoverflow.com/questions/65159760/grouping-data-in-weeks-using-r