Fastest way to reshape variable values as columns

春和景丽 2021-02-03 11:17

I have a dataset with about 3 million rows and the following structure:

PatientID| Year | PrimaryConditionGroup
---------------------------------------
1                 


        
2 Answers
  •  春和景丽
    2021-02-03 11:26

    There are probably more succinct ways of doing this, but for sheer speed, it's hard to beat a data.table-based solution:

    df <- read.table(text="PatientID Year  PrimaryConditionGroup
    1         Y1    TRAUMA
    1         Y1    PREGNANCY
    2         Y2    SEIZURE
    3         Y1    TRAUMA", header=TRUE)
    
    library(data.table)
    dt <- data.table(df, key=c("PatientID", "Year"))
    
    dt[ , list(TRAUMA =    sum(PrimaryConditionGroup=="TRAUMA"),
               PREGNANCY = sum(PrimaryConditionGroup=="PREGNANCY"),
               SEIZURE =   sum(PrimaryConditionGroup=="SEIZURE")),
       by = list(PatientID, Year)]
    
    #      PatientID Year TRAUMA PREGNANCY SEIZURE
    # [1,]         1   Y1      1         1       0
    # [2,]         2   Y2      0         0       1
    # [3,]         3   Y1      1         0       0
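
The call above names each condition by hand. As a hedged alternative, here is a minimal sketch that lets data.table's dcast() create one count column per distinct value of PrimaryConditionGroup. It assumes a reasonably recent data.table release in which dcast() accepts a data.table directly; the object name `wide` is made up for illustration, and the toy data is the same as in the answer.

```r
library(data.table)

# Same toy data as in the answer above.
dt <- data.table(
  PatientID = c(1L, 1L, 2L, 3L),
  Year = c("Y1", "Y1", "Y2", "Y1"),
  PrimaryConditionGroup = c("TRAUMA", "PREGNANCY", "SEIZURE", "TRAUMA")
)

# One output column per distinct PrimaryConditionGroup value;
# each cell counts the rows in that PatientID/Year group.
wide <- dcast(dt, PatientID + Year ~ PrimaryConditionGroup,
              fun.aggregate = length,
              value.var = "PrimaryConditionGroup")
print(wide)
```

This trades the explicit list of sum() calls for a formula, which may be more convenient when there are many condition groups. Whether it is as fast as the hand-written version on 3 million rows is not something this sketch establishes.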
    

    EDIT: aggregate() provides a 'base R' solution that might or might not be more idiomatic. (The sole complication is that aggregate() returns a data.frame whose third column is a matrix of counts, rather than one ordinary column per condition; the last line below fixes that up.)

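    # table() only reports every condition for every group if the column is a factor
    df$PrimaryConditionGroup <- factor(df$PrimaryConditionGroup)
    out <- aggregate(PrimaryConditionGroup ~ PatientID + Year, data=df, FUN=table)
    out <- cbind(out[1:2], data.frame(out[3][[1]]))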
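
To see why the cbind() step is needed, the sketch below inspects the shape of what aggregate() returns. It rebuilds the toy data inline and converts PrimaryConditionGroup to a factor up front, which is an assumption added here so that table() reports every condition for every group. The name `flat` is made up for illustration.

```r
# Toy data from the answer, with the condition stored as a factor.
df <- data.frame(
  PatientID = c(1L, 1L, 2L, 3L),
  Year = c("Y1", "Y1", "Y2", "Y1"),
  PrimaryConditionGroup = factor(c("TRAUMA", "PREGNANCY", "SEIZURE", "TRAUMA"))
)

out <- aggregate(PrimaryConditionGroup ~ PatientID + Year, data = df, FUN = table)

# The result has only three columns; the third one holds a matrix of counts.
str(out)
print(is.matrix(out$PrimaryConditionGroup))

# Spreading the matrix into ordinary columns gives the wide layout.
flat <- cbind(out[1:2], as.data.frame(out$PrimaryConditionGroup))
print(flat)
```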
    

    2nd EDIT: Finally, a succinct solution using the reshape package gets you to the same place.

    library(reshape)
    mdf <- melt(df, id=c("PatientID", "Year"))
    cast(PatientID + Year ~ value, data=mdf, fun.aggregate=length)
    
