Rbind list of vectors with differing lengths

失恋的感觉 2021-01-25 05:31

I am new to R and I am trying to build a frequency/severity simulation. Everything is working fine except that it takes about 10 minutes to run 10,000 simulations for each of 700 locat

2 Answers
  • 2021-01-25 06:35

    We can pad each list element with NAs at the end so that all elements have the same length, and then rbind them:

    out <- do.call(rbind, lapply(obs, `length<-`, max(lengths(obs))))
    as.data.frame(out) # if we need a data.frame as output
    
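    A small illustration of the padding step, using a made-up toy list in place of the question's obs. Assigning a larger value to length() extends a vector with NA, so after the lapply call every element has the same length and rbind can stack them as rows.

```r
# Toy stand-in for the question's list of simulated vectors (made-up values)
obs <- list(c(1.5, 2.0), numeric(0), c(3.1, 4.2, 5.3))

# `length<-` extends a vector with NA when the new length is larger
x <- c(1.5, 2.0)
length(x) <- 3
print(x)  # values: 1.5, 2.0, NA

# Pad every element to the longest length, then stack them as rows
maxlen <- max(lengths(obs))
padded <- lapply(obs, `length<-`, maxlen)
out <- do.call(rbind, padded)
print(out)  # row 2 is all NA because that simulation drew no values
```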

    Or, using the tidyverse:

    library(tidyverse)
    obs %>%
       set_names(seq_along(.)) %>% 
       stack %>% 
       group_by(ind) %>% 
       mutate(Col = paste0("Col", row_number())) %>% 
       spread(Col, values)
    
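    The tidyverse pipeline works in three steps: stack() turns the named list into long format (columns values and ind), row_number() numbers each value within its group, and spread() widens the result. A base-R sketch of the same three steps on a made-up toy list may make the mechanics easier to see; it is an illustration of the idea, not a drop-in replacement for the pipeline's exact output (which also keeps the ind column).

```r
# Toy stand-in for the question's list (made-up values)
obs <- list(c(1.5, 2.0), c(3.1, 4.2, 5.3))

# Step 1: long format, one row per value; `ind` records the source element
names(obs) <- seq_along(obs)
long <- stack(obs)  # columns: values, ind

# Step 2: position of each value within its element (like row_number() per group)
long$pos <- ave(seq_along(long$ind), long$ind, FUN = seq_along)

# Step 3: wide format via matrix indexing; cells that are never filled stay NA
wide <- matrix(NA_real_, nrow = nlevels(long$ind), ncol = max(long$pos))
wide[cbind(as.integer(long$ind), long$pos)] <- long$values
colnames(wide) <- paste0("Col", seq_len(ncol(wide)))
print(wide)
```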
  • 2021-01-25 06:35

    Everything is working fine except that it takes [too long] to do [numsim] simulations

    If your real application uses rnorm or a similar function, you can draw all the values with a single call instead of one call per simulation:

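    library(data.table)
    
    # Example frequency and severity generators (the same ones used in the benchmark below)
    rN.D <- function(numsim) rpois(numsim, 4)
    rX.D <- function(numsim) rnorm(numsim, mean = 5, sd = 4)
    
    set.seed(1223)
    numsim = 3e5
    freqs = rN.D(numsim)              # number of values in each simulation
    maxlen = max(freqs)
    m = matrix(, maxlen, numsim)      # all NA, one simulation per column
    # fill rows 1..freqs[j] of column j, drawing every severity in one call
    m[row(m) <= freqs[col(m)]] <- rX.D(sum(freqs))
    
    res = as.data.table(t(m))         # transpose: one simulation per row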
    

    I am filling the data the "wrong way" (each simulation in a column instead of a row) and then transposing, because R fills matrices in column-major order.
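    To see why the fill works column by column, here is the same masking step on a tiny example. The frequencies are hand-picked and the values 1, 2, 3, ... stand in for the random severities; both are made up for illustration.

```r
freqs <- c(2, 0, 3)  # hand-picked; simulation 2 drew no values
maxlen <- max(freqs)
m <- matrix(NA_real_, maxlen, length(freqs))

# TRUE where a cell should hold a value: rows 1..freqs[j] of column j
mask <- row(m) <= freqs[col(m)]

# Logical indexing visits cells in column-major order, so the values
# fill column 1 first, then column 2 (empty here), then column 3
m[mask] <- seq_len(sum(freqs))
print(m)

# Transposing gives one simulation per row, padded with NA
res <- t(m)
print(res)  # rows: (1, 2, NA), (NA, NA, NA), (3, 4, 5)
```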


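    If a per-simulation list is still needed downstream, the single vectorized draw can be split back into one vector per simulation. A minimal sketch, assuming Poisson frequencies and normal severities with the parameters used in the benchmark below (numsim is kept small only for illustration):

```r
set.seed(1223)
numsim <- 5

# One call for all frequencies and one call for all severities
freqs <- rpois(numsim, 4)
draws <- rnorm(sum(freqs), mean = 5, sd = 4)

# Label each draw with its simulation index; listing every level keeps
# simulations that drew zero values as empty vectors instead of dropping them
sim_id <- factor(rep(seq_len(numsim), times = freqs), levels = seq_len(numsim))
obs <- split(draws, sim_id)

print(lengths(obs))
```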
    If you need to keep using lapply, here is a benchmark of the final step, combining the list into a table:

    set.seed(1223)
    
    library(dplyr); library(tidyr); library(purrr)
    library(data.table)
    
    numsim = 3e5
    
    rN.D <- function(numsim) rpois(numsim, 4) 
    rX.D <- function(numsim) rnorm(numsim, mean = 5, sd = 4)
    
    freqs <- rN.D(numsim)
    obs <- lapply(freqs, function(x) rX.D(x))
    
    system.time({
    tidyres = obs %>%
       set_names(seq_along(.)) %>% 
       stack %>% 
       group_by(ind) %>% 
       mutate(Col = paste0("Col", row_number())) %>% 
       spread(Col, values)
    })
    #    user  system elapsed 
    #   16.56    0.31   16.88     
    
    system.time({
        out <- do.call(rbind, lapply(obs, `length<-`, max(lengths(obs))))
        bres = as.data.frame(out)
    })
    #    user  system elapsed 
    #    0.50    0.05    0.55 
    
    system.time(
        dtres <- setDT(data.table::transpose(obs))  # explicit namespace: purrr also has a transpose()
    )
    #    user  system elapsed 
    #    0.03    0.01    0.05 
    

    The last approach (data.table's transpose) is the fastest of the three; the other two are from @akrun's answer.
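    If obs has already been built with lapply, the column-major trick from the matrix approach above can also handle the final step in plain base R, by passing unlist(obs) as the fill values. A sketch on a made-up toy list (not benchmarked here):

```r
# Toy stand-in for the list produced by lapply(freqs, function(x) rX.D(x))
obs <- list(c(1.5, 2.0), numeric(0), c(3.1, 4.2, 5.3))

lens <- lengths(obs)
m <- matrix(NA_real_, max(lens), length(obs))

# unlist() concatenates the vectors in list order, which matches the
# column-major order in which the logical mask selects cells
m[row(m) <= lens[col(m)]] <- unlist(obs)

# One simulation per row, padded with NA
res <- as.data.frame(t(m))
print(res)
```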

    A side note: I would recommend using only data.table or only the tidyverse, because mixing and matching gets messy very quickly. While setting up this example, I noticed that purrr has its own transpose function, so if you load the packages in a different order, code like this can silently give different results.
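    The masking behaviour can be reproduced without any packages. In this sketch, two hypothetical environments, envA and envB, each provide a function f: whichever is attached last is found first, and looking f up in a specific environment (the base-R counterpart of writing data.table::transpose) removes the dependence on load order.

```r
# Two hypothetical "packages" that both provide a function named f
attach(list(f = function() "from envA"), name = "envA")
attach(list(f = function() "from envB"), name = "envB")  # R reports that f is masked

# The environment attached last sits earlier on the search path, so it wins
latest <- f()
print(latest)  # "from envB"

# Looking f up in a specific environment does not depend on attach order
explicit <- get("f", envir = as.environment("envA"))()
print(explicit)  # "from envA"

# Remove both environments from the search path again
detach("envB")
detach("envA")
```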
