问题
I have a dataset similar to this (real one is way bigger). It is in long format and I need to change it to wide format with one row per id. My problem is that there are a lot of different combinations of time, drug, unit and admin. Only a combination of time, drug, unit and admin will be unique and should only occur once pr id. I could not find a solution to this. I would like R to create unique combinations of columns so the data can be transformed to wide format. I have tried
melt.data.table(df, id.vars=c(id,time,drug,unit,admin), measure.vars = c(dose), na.rm=F)
and also a combination with
%>% expand(nesting(time, drug, unit, admin, dose), id)
but it doesn't work. Here is mock data:
id<-c(1492,1492,1492,1492,1493,1493)
time<-c("Pre-bypass","Post-bypass","Total","Post-bypass","Pre-OP","Pre-OP")
drug<-c("ACE","LEVO","LEVO","MIL","BB","BC")
unit<-c(NA,"ml/hr","ml","mg",NA,NA)
admin<-c(NA, "IV","IV","Inhale",NA,NA)
dose<-c(NA,50,40,5,NA,NA)
df<-rbind(id,time,drug,unit,admin,dose)
df<-t(df)
df<-as.data.table(df)
I would like my output to be something like this (the reason for the TRUE in Pre.bypass.Ace.unitNA.adminNA and Pre.OP columns is that dose and unit is missing here but because it is listed it is given in standard dose and unit:
id.new<-c(1492,1493)
Pre.OP.BB.unitNA.adminNA<-c(NA,TRUE)
Pre.OP.BC.unitNA.adminNA<-c(NA,TRUE)
Total.LEVO.ml.h.IV<-c(40,NA)
Pre.bypass.Ace.unitNA.adminNA<-c(TRUE,NA)
Post.bypass.LEVO.ml.h.IV<-c(50,NA)
Post.bypass.MIL.ml.h.IV<-c(5,NA)
df.new<-rbind(id.new,Post.bypass.MIL.ml.h.IV,Pre.OP.BB.unitNA.adminNA,Pre.OP.BC.unitNA.adminNA,Total.LEVO.ml.h.IV,Pre.bypass.Ace.unitNA.adminNA,Post.bypass.LEVO.ml.h.IV)
df.new<-t(df.new)
回答1:
I agree with the comments that long format is usually the better way to go. If you have to use wide format the using the tidyr
package you can do the following:
library(tidyr)
df %>%
unite(combination, time, drug, unit, admin) %>%
spread(key = combination, value = dose)
回答2:
library(data.table)
id <- c(1492, 1492, 1492, 1492, 1493, 1493)
time <- c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP")
drug <- c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC")
unit <- c(NA, "ml/hr", "ml", "mg", NA, NA)
admin <- c(NA, "IV", "IV", "Inhale", NA, NA)
dose <- c(NA, 50, 40, 5, NA, NA)
df <- rbind(id, time, drug, unit, admin, dose)
df <- t(df)
df <- as.data.table(df)
df
#> id time drug unit admin dose
#> 1: 1492 Pre-bypass ACE NA NA NA
#> 2: 1492 Post-bypass LEVO ml/hr IV 50
#> 3: 1492 Total LEVO ml IV 40
#> 4: 1492 Post-bypass MIL mg Inhale 5
#> 5: 1493 Pre-OP BB NA NA NA
#> 6: 1493 Pre-OP BC NA NA NA
Convert to wide with data.table package function dcast
data.table::dcast(df, id ~ ..., value.var = "dose")
#> id Post-bypass_LEVO_ml/hr_IV Post-bypass_MIL_mg_Inhale
#> 1: 1492 50 5
#> 2: 1493 NA NA
#> Pre-OP_BB_NA_NA Pre-OP_BC_NA_NA Pre-bypass_ACE_NA_NA Total_LEVO_ml_IV
#> 1: NA NA NA 40
#> 2: NA NA NA NA
来源:https://stackoverflow.com/questions/49909940/long-to-wide-format-with-several-duplicates-circumvent-with-unique-combo-of-col