Question
I have a dataframe which looks like this:
df <- data.frame (
time = rep(c("2010", "2011", "2012", "2013", "2014"),4),
age = rep(c("40-44", "45-49", "50-54", "55-59", "60-64"),4),
weight = rep(c(0.38, 0.23, 0.19, 0.12, 0.08),4),
ethnic = rep(c(rep("M",5),rep("NM",5)),2),
gender = c(rep("M",10), rep("F",10)),
pop = round((runif(10, min = 10000, max = 99999)), digits = 0),
count = round((runif(10, min = 100, max = 999)), digits = 0)
)
df$rate = df$count / df$pop
I want to calculate direct age-standardised incidence rates (where incidence rate = count/pop), with confidence intervals, for each subgroup. So I would have a standardised rate for each combination of time, gender, ethnicity and age. Is there a way to do this in R?
I have tried using the function ageadjust.direct from the R package {epitools}, like so:
age_adjust_test <- ageadjust.direct(count = df$count, pop = df$pop,
rate = df$rate, stdpop = df$weight)
The output from this is an overall adjusted rate, confidence intervals, and crude rate. Is there a way to get this output by each sub-group?
Answer 1:
We can group by the subgroup columns, summarise the ageadjust.direct result into a list column, and then unnest the list components into separate columns:
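library(tidyverse)
library(epitools)  # provides ageadjust.direct
df %>%
  group_by(time, age, ethnic, gender) %>%
  # store each group's named result vector in a list column
  summarise(age_adjust = list(ageadjust.direct(count = count,
    pop = pop, rate = rate, stdpop = weight))) %>%
  # turn each named vector into a one-row data frame
  mutate(age_adjust = map(age_adjust, ~ as.data.frame(as.list(.x)))) %>%
  # spread the one-row data frames into regular columns
  unnest(cols = age_adjust)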
# A tibble: 20 x 8
# Groups: time, age, ethnic [10]
# time age ethnic gender crude.rate adj.rate lci uci
# <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
# 1 2010 40-44 M F 0.00763 0.00763 0.00709 0.00820
# 2 2010 40-44 M M 0.00763 0.00763 0.00709 0.00820
# 3 2010 40-44 NM F 0.0281 0.0281 0.0257 0.0306
# 4 2010 40-44 NM M 0.0281 0.0281 0.0257 0.0306
# 5 2011 45-49 M F 0.0145 0.0145 0.0136 0.0155
# 6 2011 45-49 M M 0.0145 0.0145 0.0136 0.0155
# 7 2011 45-49 NM F 0.0425 0.0425 0.0399 0.0453
# 8 2011 45-49 NM M 0.0425 0.0425 0.0399 0.0453
# 9 2012 50-54 M F 0.0116 0.0116 0.0109 0.0124
#10 2012 50-54 M M 0.0116 0.0116 0.0109 0.0124
#11 2012 50-54 NM F 0.00708 0.00708 0.00607 0.00821
#12 2012 50-54 NM M 0.00708 0.00708 0.00607 0.00821
#13 2013 55-59 M F 0.0251 0.0251 0.0232 0.0271
#14 2013 55-59 M M 0.0251 0.0251 0.0232 0.0271
#15 2013 55-59 NM F 0.00733 0.00733 0.00678 0.00792
#16 2013 55-59 NM M 0.00733 0.00733 0.00678 0.00792
#17 2014 60-64 M F 0.0101 0.0101 0.00944 0.0109
#18 2014 60-64 M M 0.0101 0.0101 0.00944 0.0109
#19 2014 60-64 NM F 0.00916 0.00916 0.00852 0.00984
#20 2014 60-64 NM M 0.00916 0.00916 0.00852 0.00984
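Note that crude.rate and adj.rate are identical in every row of the output above. Because age is one of the grouping variables, each group holds a single age band, so the weighted sum has only one term. Below is a minimal sketch, assuming the intent is one standardised rate per ethnic/gender combination with the five age bands as strata. time is left out only because, in the sample data, each age band occurs in exactly one year. The seed, the use of runif(20, ...) instead of runif(10, ...), and the name std_rates are assumptions made for this illustration.

```r
library(dplyr)
library(epitools)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(
  time   = rep(c("2010", "2011", "2012", "2013", "2014"), 4),
  age    = rep(c("40-44", "45-49", "50-54", "55-59", "60-64"), 4),
  weight = rep(c(0.38, 0.23, 0.19, 0.12, 0.08), 4),
  ethnic = rep(c(rep("M", 5), rep("NM", 5)), 2),
  gender = c(rep("M", 10), rep("F", 10)),
  pop    = round(runif(20, min = 10000, max = 99999)),
  count  = round(runif(20, min = 100, max = 999))
)

# Group WITHOUT age, so each group contains all five age bands as strata
std_rates <- df %>%
  group_by(ethnic, gender) %>%
  summarise(
    # an unnamed one-row data frame is unpacked into separate columns
    as.data.frame(as.list(
      ageadjust.direct(count = count, pop = pop, stdpop = weight)
    )),
    .groups = "drop"
  )

std_rates
```

With real data that has every age band in every year, time could be added back to group_by() to get one standardised rate per year, ethnicity and gender.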
Answer 2:
Simply use by to subset the data frame by one or more factors, then pass each subset into your function. Here, by returns a list holding the function's return value (a one-row data frame) for each combination of the factors, as shown on its docs page. Outside by, you can then bind all of these data frames into one final data frame with do.call(rbind, ...).
library(epitools)
# note: the column in df is named "ethnic", not "ethnicity"
age_adjust_test_list <- by(df, df[, c("time", "gender", "ethnic", "age")], function(sub) {
  tmp <- ageadjust.direct(count = sub$count, pop = sub$pop,
                          rate = sub$rate, stdpop = sub$weight)
  # grouping values are constant within a subset, so take the first one
  data.frame(time = sub$time[1],
             gender = sub$gender[1],
             ethnic = sub$ethnic[1],
             age = sub$age[1],
             crude_rate = tmp[[1]],
             adj_rate = tmp[[2]],
             lower_CI = tmp[[3]],
             upper_CI = tmp[[4]])
})
# bind the per-group one-row data frames into a single data frame
final_df <- do.call(rbind, age_adjust_test_list)
Combinations that are not represented in the data frame show up as NULL entries in the list. Filter them out as needed:
age_adjust_test_list <- Filter(function(x) !is.null(x), age_adjust_test_list)
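To make the NULL behaviour concrete, here is a small base R sketch on made-up data (the data frame d, its columns and the summing function are illustrative only, not part of the question). Only three of the four a/b combinations occur, so by leaves one empty slot.

```r
# Illustrative data: the combination (a = "y", b = "q") never occurs
d <- data.frame(a = c("x", "x", "y"),
                b = c("p", "q", "p"),
                v = 1:3)

res <- by(d, d[, c("a", "b")], function(sub) {
  data.frame(a = sub$a[1], b = sub$b[1], total = sum(sub$v))
})

# Count the empty slots left for combinations absent from the data
n_null <- sum(vapply(res, is.null, logical(1)))

# Drop the NULL entries, then bind the remaining one-row data frames
kept <- Filter(Negate(is.null), res)
out  <- do.call(rbind, kept)
```

Here n_null counts the combinations that were absent from d, and out has one row per combination that was present.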
Answer 3:
Here's a handy data.table approach; a single line does the grouped calculation.
library(data.table)
library(epitools)
# convert df to data.table
setDT(df)
# define subgroups
group_cols <- c('time', 'age', 'ethnic', 'gender')
# ageadjust.direct by subgroups. The trick is to include as.list()
df[, as.list(ageadjust.direct(count = count, pop = pop, rate = rate, stdpop = weight)), by = group_cols]
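To see why as.list() matters: ageadjust.direct returns a named numeric vector, and data.table stacks an atomic vector returned from j into a single column with one row per element, whereas a list becomes one column per element. A small sketch on made-up data (the table dt, its values and the names long and wide are illustrative, not from the question):

```r
library(data.table)
library(epitools)

# Illustrative data: two groups with three age strata each (values are made up)
dt <- data.table(
  grp    = rep(c("A", "B"), each = 3),
  count  = c(120, 340, 560, 150, 310, 600),
  pop    = c(20000, 30000, 40000, 25000, 28000, 45000),
  weight = rep(c(0.5, 0.3, 0.2), 2)
)

# Without as.list(): the four values are stacked in one column, four rows per group
long <- dt[, ageadjust.direct(count = count, pop = pop, stdpop = weight), by = grp]

# With as.list(): one row per group, one column per statistic
wide <- dt[, as.list(ageadjust.direct(count = count, pop = pop, stdpop = weight)), by = grp]
```

The wide form is usually the one you want for reporting, since crude.rate, adj.rate, lci and uci keep their names as column headers.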
Source: https://stackoverflow.com/questions/50322718/calculate-age-standardised-rates-by-sub-group-with-confidence-intervals-in-r