How to create missing values for repeated-measurement data?

鱼传尺愫 2021-01-06 03:40

I have a data set in which not every subject's observations were made at exactly the same time points, but I want to turn it into a data set in which everyone's observations wer

2 answers
  • 2021-01-06 03:47

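    We could do this using data.table. We convert the data.frame to a data.table by reference (setDT(m)), set the key columns (setkey), and join with the cross join (CJ) of the unique elements of 'id' and 'age'. Every 'id'/'age' combination then gets a row, and combinations that were not observed get NA for 'IQ'.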

    library(data.table)
    setkey(setDT(m), id, age)[CJ(unique(id), unique(age))]
    #    id age IQ
    # 1:  1   2  3
    # 2:  1   3  4
    # 3:  1   4  5
    # 4:  1   5  4
    # 5:  1   6 NA
    # 6:  1   8 NA
    # 7:  2   2 NA
    # 8:  2   3  6
    # 9:  2   4 NA
    #10:  2   5 NA
    #11:  2   6  5
    #12:  2   8 NA
    #13:  3   2  3
    #14:  3   3 NA
    #15:  3   4 NA
    #16:  3   5  8
    #17:  3   6 NA
    #18:  3   8 10
    

    In the devel version, i.e. v1.9.5, we can use unique=TRUE within CJ instead of calling unique() on each column (from @Frank's comment):

    setDT(m, key=c('id', 'age'))[CJ(id, age, unique=TRUE)]
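
    The cross-join idea can also be sketched in base R, which may help to see what the join is doing. This is only an illustration, not part of the original answer: the data frame m is reconstructed from the non-NA rows of the output above (the question's own data is cut off), and the names grid and full are made up for the example.

    ```r
    # Sample data reconstructed from the non-NA rows of the output above
    # (an assumption: the original question's data is not shown in full)
    m <- data.frame(
      id  = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
      age = c(2, 3, 4, 5, 3, 6, 2, 5, 8),
      IQ  = c(3, 4, 5, 4, 6, 5, 3, 8, 10)
    )

    # Every (id, age) combination: the base R counterpart of CJ()
    grid <- expand.grid(id = unique(m$id), age = unique(m$age))

    # Left join the observed rows onto the grid; unmatched pairs get IQ = NA
    full <- merge(grid, m, by = c("id", "age"), all.x = TRUE)

    # Order by id, then age, so the rows line up with the data.table result
    full <- full[order(full$id, full$age), ]
    rownames(full) <- NULL
    ```

    With 3 ids and 6 distinct ages, full has 18 rows, 9 of which have NA for IQ.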
    

    Benchmarks

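    library(tidyr)       # complete()
    library(data.table)  # as.data.table(), setkey(), CJ()

    # 10,000 subjects with 10 observations each at random ages
    set.seed(24)
    m1 <- data.frame(id=rep(1:10000, each=10),
                     age=sample(2:400, 10000*10, replace=TRUE),
                     IQ=rnorm(10000*10))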
    system.time(res1 <- complete(m1, id, age))
    # user  system elapsed 
    #18.888   0.000  16.258 
    
    
    system.time({ DT <- as.data.table(m1)
             res2 <- setkey(DT, id, age)[CJ(unique(id), unique(age))]})
    #  user  system elapsed 
    #  0.000   0.000   0.279 
    
    
    
    library(microbenchmark)
    jeremy <- function() complete(m1, id, age)
    akrun <- function() {DT <- as.data.table(m1)
       setkey(DT, id, age)[CJ(unique(id), unique(age))]}
    
    microbenchmark(jeremy(), akrun(), times=20L, unit='relative')
    #Unit: relative
    #   expr      min       lq   mean   median       uq      max neval cld
    #jeremy() 24.95042 30.84234 17.138 23.09175 12.16891 8.305394    20   b
    # akrun()  1.00000  1.00000  1.000  1.00000  1.00000 1.000000    20  a 
    
  • 2021-01-06 04:04

    Using tidyr, this is a one-liner. Use the complete function, which creates a row for each combination of the columns passed to it and fills the remaining columns of the new rows with NA:

    library(tidyr)
    complete(m, id, age)
    
    Source: local data frame [18 x 3]
    
          id   age    IQ
       (dbl) (dbl) (dbl)
    1      1     2     3
    2      1     3     4
    3      1     4     5
    4      1     5     4
    5      1     6    NA
    6      1     8    NA
    7      2     2    NA
    8      2     3     6
    9      2     4    NA
    10     2     5    NA
    11     2     6     5
    12     2     8    NA
    13     3     2     3
    14     3     3    NA
    15     3     4    NA
    16     3     5     8
    17     3     6    NA
    18     3     8    10
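
    Note that the result above has no rows for age 7, because nobody was observed at that age. If the target grid should cover every age in the range, one option is tidyr's full_seq helper. The sketch below is a variation on the answer, not part of it; it assumes ages are whole numbers one unit apart, and m is again reconstructed from the non-NA rows of the output.

    ```r
    library(tidyr)

    # Sample data reconstructed from the non-NA rows of the output above
    # (an assumption: the original question's data is not shown in full)
    m <- data.frame(
      id  = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
      age = c(2, 3, 4, 5, 3, 6, 2, 5, 8),
      IQ  = c(3, 4, 5, 4, 6, 5, 3, 8, 10)
    )

    # full_seq(age, 1) expands to every age from min(age) to max(age)
    # in steps of 1, so age 7 is included even though it never occurs
    res <- complete(m, id, age = full_seq(age, 1))
    ```

    Here res has 3 ids times 7 ages, i.e. 21 rows, with NA for IQ wherever no measurement exists.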
    