How to create missing values for repeated-measurement data?

Submitted by 扶醉桌前 on 2019-12-03 21:12:04

Question


I have a data set in which the subjects were not all observed at the same time points. I want to turn it into a data set in which every subject has a row for every time point, with missing values where no observation exists (so that I can use it in SAS proc traj).

For example, suppose I have dataset "m":

id   <- c(1,1,1,1,2,2,3,3,3)
age  <- c(2,3,4,5,3,6,2,5,8)
IQ   <- c(3,4,5,4,6,5,3,8,10)
m    <- data.frame(id,age,IQ)
> m
  id age IQ
1  1   2  3
2  1   3  4
3  1   4  5
4  1   5  4
5  2   3  6
6  2   6  5
7  3   2  3
8  3   5  8
9  3   8 10
> unique(age)
[1] 2 3 4 5 6 8

I want to turn m into m2, but so far I can only do that manually:

id2   <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
age2  <- c(2,3,4,5,6,8,2,3,4,5,6,8,2,3,4,5,6,8) 
IQ2   <- c(3,4,5,4,NA,NA,NA,6,NA,NA,5,NA,3,NA,NA,8,NA,10)
m2    <- data.frame(id2,age2,IQ2)
> m2
   id2 age2 IQ2
1    1    2   3
2    1    3   4
3    1    4   5
4    1    5   4
5    1    6  NA
6    1    8  NA
7    2    2  NA
8    2    3   6
9    2    4  NA
10   2    5  NA
11   2    6   5
12   2    8  NA
13   3    2   3
14   3    3  NA
15   3    4  NA
16   3    5   8
17   3    6  NA
18   3    8  10

Does anyone know a smarter way to do this?


Answer 1:


Using tidyr, this is a one-liner. The complete function creates a row for every combination of the columns passed to it and fills the remaining columns with NA wherever no observation exists:

library(tidyr)
complete(m, id, age)

Source: local data frame [18 x 3]

      id   age    IQ
   (dbl) (dbl) (dbl)
1      1     2     3
2      1     3     4
3      1     4     5
4      1     5     4
5      1     6    NA
6      1     8    NA
7      2     2    NA
8      2     3     6
9      2     4    NA
10     2     5    NA
11     2     6     5
12     2     8    NA
13     3     2     3
14     3     3    NA
15     3     4    NA
16     3     5     8
17     3     6    NA
18     3     8    10
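
For readers who prefer to avoid extra packages, the same "every combination, filled with NA" idea can be sketched in base R with expand.grid and merge. This is only an illustration and not part of the original answer; the names grid and m2 are chosen here for the example.

```r
# Example data from the question
m <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  age = c(2, 3, 4, 5, 3, 6, 2, 5, 8),
  IQ  = c(3, 4, 5, 4, 6, 5, 3, 8, 10)
)

# Every combination of an observed id with an observed age
grid <- expand.grid(id = unique(m$id), age = unique(m$age))

# Left join: combinations with no matching row in m get IQ = NA
m2 <- merge(grid, m, by = c("id", "age"), all.x = TRUE)

# Make the ordering explicit (by id, then age) and reset the row names
m2 <- m2[order(m2$id, m2$age), ]
rownames(m2) <- NULL
m2
```

The all.x = TRUE argument is what keeps the unmatched id/age combinations, which is where the NA values come from.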



Answer 2:


We could do this using data.table. We convert the data.frame to a data.table (setDT(m)), set the key columns (setkey), and join with the cross join (CJ) of the unique values of 'id' and 'age':

library(data.table)
setkey(setDT(m), id, age)[CJ(unique(id), unique(age))]
#    id age IQ
# 1:  1   2  3
# 2:  1   3  4
# 3:  1   4  5
# 4:  1   5  4
# 5:  1   6 NA
# 6:  1   8 NA
# 7:  2   2 NA
# 8:  2   3  6
# 9:  2   4 NA
#10:  2   5 NA
#11:  2   6  5
#12:  2   8 NA
#13:  3   2  3
#14:  3   3 NA
#15:  3   4 NA
#16:  3   5  8
#17:  3   6 NA
#18:  3   8 10

In the devel version, i.e. v1.9.5, CJ accepts unique=TRUE, so the explicit unique() calls are no longer needed (from @Frank's comment):

setDT(m, key=c('id', 'age'))[CJ(id, age, unique=TRUE)]
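
As a variation that leaves the original object unkeyed, the same join can be sketched with the on= argument instead of setkey. This assumes a data.table version that supports on= (1.9.6 or later); the names DT and res are chosen here for illustration and do not come from the answer.

```r
library(data.table)

# Example data from the question
m <- data.frame(
  id  = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  age = c(2, 3, 4, 5, 3, 6, 2, 5, 8),
  IQ  = c(3, 4, 5, 4, 6, 5, 3, 8, 10)
)

# Work on a copy so that m itself is not modified by reference
DT <- as.data.table(m)

# Join DT to the cross join of unique ids and ages;
# combinations with no match in DT get IQ = NA
res <- DT[CJ(id = id, age = age, unique = TRUE), on = .(id, age)]
res
```

Naming the CJ arguments explicitly (id = id, age = age) keeps the join columns unambiguous regardless of how a given version names unnamed CJ columns.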

Benchmarks

set.seed(24)
m1 <- data.frame(id=rep(1:10000, each=10), age=sample(2:400, 10000*10, 
         replace=TRUE), IQ=rnorm(10000*10))
system.time(res1 <- complete(m1, id, age))
# user  system elapsed 
#18.888   0.000  16.258 


system.time({ DT <- as.data.table(m1)
         res2 <- setkey(DT, id, age)[CJ(unique(id), unique(age))]})
#  user  system elapsed 
#  0.000   0.000   0.279 



library(microbenchmark)
jeremy <- function() complete(m1, id, age)
akrun <- function() {DT <- as.data.table(m1)
   setkey(DT, id, age)[CJ(unique(id), unique(age))]}

microbenchmark(jeremy(), akrun(), times=20L, unit='relative')
#Unit: relative
#   expr      min       lq   mean   median       uq      max neval cld
#jeremy() 24.95042 30.84234 17.138 23.09175 12.16891 8.305394    20   b
# akrun()  1.00000  1.00000  1.000  1.00000  1.00000 1.000000    20  a 


Source: https://stackoverflow.com/questions/32654706/how-to-create-missing-value-for-repeated-measurement-data
