R - Parallelizing multiple model learning (with dplyr and purrr)

眼角桃花 2021-02-05 12:19

This is a follow-up to a previous question about learning multiple models.

The use case is that I have multiple observations for each subject, and I want to train a mode

2 answers
  • 2021-02-05 12:30

    Adding an answer for completeness. You will need to install multidplyr from Hadley's repo to run this; see the vignette for more information:

    library(dplyr)
    library(multidplyr)
    library(purrr)
    
    cluster <- create_cluster(4)
    set_default_cluster(cluster)
    cluster_library(cluster, "fitdistrplus")
    
    # dt is a dataframe, subject_id identifies observations from each subject
    by_subject <- partition(dt, subject_id)
    
    fits <- by_subject %>% 
        do(fit = fitdist(.$observation, "norm"))
    
    collected_fits <- collect(fits)$fit
    collected_summaries <- collected_fits %>% map(summary)
    
  • 2021-02-05 12:33

    There is also the furrr package now. For example, something like:

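    library(dplyr)
    library(furrr)
    library(fitdistrplus)  # provides fitdist()
    # multiprocess is deprecated in recent versions of future;
    # multisession works on all platforms
    plan(multisession)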
    
    dt %>% 
        split(dt$subject_id) %>%
        future_map(~fitdist(.$observation, "norm"))
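
To try the furrr approach end to end, here is a minimal sketch on simulated data. The toy data frame, the three subject ids, the choice of two workers, and the names `fits` and `estimates` are assumptions made for illustration; only `subject_id`, `observation`, and the `fitdist(..., "norm")` call come from the thread. It assumes the furrr, purrr, dplyr, and fitdistrplus packages are installed.

```r
library(dplyr)
library(purrr)
library(furrr)
library(fitdistrplus)

# Start background R sessions; 2 workers is an arbitrary choice for the demo.
plan(multisession, workers = 2)

# Hypothetical toy data: 3 subjects, 50 observations each, different means.
set.seed(1)
dt <- data.frame(
  subject_id  = rep(c("a", "b", "c"), each = 50),
  observation = rnorm(150, mean = rep(c(0, 5, 10), each = 50))
)

# One normal fit per subject, evaluated in parallel across the workers.
fits <- dt %>%
  split(.$subject_id) %>%
  future_map(~ fitdist(.x$observation, "norm"))

# Collect the fitted parameters into a matrix: one row per subject,
# columns "mean" and "sd".
estimates <- do.call(rbind, map(fits, "estimate"))
print(estimates)

# Shut the workers down again.
plan(sequential)
```

For fits this cheap, the cost of starting workers and shipping data to them will likely outweigh any gain; the parallel version tends to pay off only when each per-subject model takes noticeably longer to fit.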
    