问题
I have a data set that has 3 columns , (ID , D , AE).
sample=data.frame(
ID=c(1,1,1,2,2,2),
D=c('a','b','c','a','c','c'),
AE=c('m','x','w','y','m','f')
)
I want to count the number of IDs where all possible combinations that consist of any combination between any two drugs within a certain ID and the AEs corresponding to that ID. please see the image to understand exactly what i mean enter image description here.
Someone could help me with a code that worked perfectly on the small dataset(sample). However the code would take several hours to work on the actual dataset that has (46000 unique IDs , 1600 unique D values and 3200 unique AE values). Actually i had to interrupt the session after 3 hours (while code execution) without getting any output.
here is the code that i want to improve its performance to suit huge dataset:
enter code here
library(tidyverse)
combinations <- sample %>%
mutate(D2 = D) %>%
group_by(ID) %>%
expand(crossing(D, D2, AE)) %>% # Get all D1, D2, AE combinations within-ID
filter(D2 > D) %>% # Deduplicate to unique combinations
rename(D1 = D) %>%
ungroup() %>%
distinct(D1, D2, AE) # Deduplicate across IDs
# For a given combination of D1, D2, AE; check how many IDs in sample have that combination
count_ids <- function(D1_val, D2_val, AE_val, data) {
data %>%
group_by(ID) %>%
mutate(
has_D1 = if_else(D1_val %in% D, "D1", "no D1"),
has_D2 = if_else(D2_val %in% D, "D2", "no D2"),
has_AE = if_else(AE_val %in% AE, "AE", "no AE")
) %>%
group_by(has_D1, has_D2, has_AE) %>%
summarise(n_IDs = n_distinct(ID), .groups = "drop") %>%
list(.)
}
combinations %>%
mutate(data = list(sample)) %>%
rowwise() %>%
mutate(data = count_ids(D1, D2, AE, data)) %>% # Get the ID counts for each combination
unnest(data) %>%
mutate(colname = str_c(has_D1, has_D2, has_AE, sep = ",")) %>% # Create a column name for each possibility of the combinations
select(-starts_with("has_")) %>%
pivot_wider(names_from = colname, values_from = n_IDs, values_fill = 0L) # Spread out to wide format
I would really appreciate your help. Thanks in advance
来源:https://stackoverflow.com/questions/64140017/accelerating-code-execution-in-r-to-provide-counting-of-all-possible-combination