This is a follow-up to a previous question about learning multiple models.
The use case is that I have multiple observations for each subject, and I want to train a model for each subject.
Just adding an answer here for completeness. To run this you will need to install multidplyr from Hadley's GitHub repo; there is more information in the package vignette:
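library(dplyr)
library(multidplyr)   # GitHub version; newer CRAN releases use new_cluster() instead
library(purrr)
library(fitdistrplus) # loaded locally too, so summary() dispatches to summary.fitdist
# Start 4 worker processes and make them the default cluster
cluster <- create_cluster(4)
set_default_cluster(cluster)
# Load fitdistrplus on every worker
cluster_library(cluster, "fitdistrplus")
# dt is a data frame; subject_id identifies the observations from each subject
by_subject <- partition(dt, subject_id)
# Fit a normal distribution to each subject's observations, in parallel
fits <- by_subject %>%
  do(fit = fitdist(.$observation, "norm"))
# Bring the fitted objects back to the main R session
collected_fits <- collect(fits)$fit
collected_summaries <- collected_fits %>% map(summary)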
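Because the multidplyr code above depends on a GitHub-only version of the package, it can help to check the per-subject fits against a plain sequential version first. The sketch below is an illustration, not part of the original answer: the toy `dt` (three subjects, 50 normal draws each) is hypothetical data made up so the snippet runs on its own, and it uses only base R's `split()`/`lapply()` plus `fitdist()` from fitdistrplus.

```r
library(fitdistrplus)

# Hypothetical toy data: 3 subjects with 50 normal observations each
set.seed(1)
dt <- data.frame(
  subject_id  = rep(c("a", "b", "c"), each = 50),
  observation = c(rnorm(50, 0, 1), rnorm(50, 5, 2), rnorm(50, -3, 0.5))
)

# Sequential split-apply: one fitdist() call per subject
seq_fits <- lapply(split(dt, dt$subject_id),
                   function(d) fitdist(d$observation, "norm"))

# Stack the estimated mean and sd of each subject into a matrix
# (rows are named after the subject ids)
estimates <- do.call(rbind, lapply(seq_fits, function(f) f$estimate))
estimates
```

The parallel versions should give the same estimates per subject; only the way the work is distributed changes.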
There is now also the furrr package, with which you can do something like:
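library(dplyr)
library(furrr)
library(fitdistrplus)
# multisession works on all platforms; plan(multiprocess) is deprecated in recent versions of future
plan(multisession)
dt %>%
  split(dt$subject_id) %>%                         # one data frame per subject
  future_map(~ fitdist(.x$observation, "norm"))    # fits run on the background workers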
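If neither multidplyr nor furrr is available, the same pattern can be sketched with base R's parallel package. This is a swapped-in alternative, not what either answer above uses: `clusterEvalQ()` plays the role of `cluster_library()`, and `parLapply()` plays the role of `future_map()`. The toy `dt` and the choice of 2 workers are assumptions for illustration only.

```r
library(parallel)
library(fitdistrplus)  # loaded locally so the returned fitdist objects print/summarise properly

# Hypothetical toy data: 4 subjects with 30 observations each
set.seed(1)
dt <- data.frame(
  subject_id  = rep(1:4, each = 30),
  observation = rnorm(120, mean = rep(1:4, each = 30))
)

cl <- makeCluster(2)                                # 2 worker processes (assumed)
invisible(clusterEvalQ(cl, library(fitdistrplus)))  # load fitdistrplus on every worker

# One numeric vector of observations per subject
by_subject <- split(dt$observation, dt$subject_id)

# Fit a normal distribution to each subject's data on the workers
par_fits <- parLapply(cl, by_subject, function(x) fitdist(x, "norm"))

stopCluster(cl)                                     # always release the workers

# One row per subject, columns "mean" and "sd"
par_estimates <- t(sapply(par_fits, `[[`, "estimate"))
par_estimates
```

Splitting only the `observation` column keeps the data sent to each worker small, which matters more as the number of subjects grows.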