Complicated reshaping

南方客 2020-12-25 14:01

I want to reshape my data frame from long to wide format, but I lose some data that I'd like to keep. For the following example:

df <- data.frame(Par1 =          


        
8 answers
  •  孤城傲影
    2020-12-25 14:46

    Late to the party, but here's another alternative using data.table:

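    library(data.table)
    # Key on the grouping columns; setting the key also sorts the rows by Par1, Par2
    dt <- data.table(df, key=c("Par1", "Par2"))
    # One output row per (Par1, Par2) group: mean and count of Val for each Type,
    # plus all ParD values of the group joined with "_"
    dt[, list(pre=mean(Val[Type == "pre"]), 
              post=mean(Val[Type == "post"]), 
              pre.num=length(Val[Type == "pre"]), 
              post.num=length(Val[Type == "post"]), 
              ParD = paste(ParD, collapse="_")), 
    by=list(Par1, Par2)]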
    
    #    Par1 Par2 pre post pre.num post.num        ParD
    # 1:    A    D  10   20       1        1     foo_bar
    # 2:    B    E  30   40       1        1     baz_qux
    # 3:    C    F  50   65       1        2 bla_xyz_meh
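
    The question's data frame is cut off above, so the call cannot be run exactly as posted. Here is a minimal reproducible sketch of the same aggregation. The rows of df are hypothetical, chosen only to be consistent with the printed result, and are not the asker's actual data; res is just an illustrative name.

```r
library(data.table)

# ASSUMPTION: hypothetical input rows, not the original (truncated) data.
df <- data.frame(
  Par1 = c("A", "A", "B", "B", "C", "C", "C"),
  Par2 = c("D", "D", "E", "E", "F", "F", "F"),
  Type = c("pre", "post", "pre", "post", "pre", "post", "post"),
  Val  = c(10, 20, 30, 40, 50, 60, 70),
  ParD = c("foo", "bar", "baz", "qux", "bla", "xyz", "meh"),
  stringsAsFactors = FALSE
)

# Convert to a data.table keyed (and therefore sorted) by the grouping columns.
dt <- data.table(df, key = c("Par1", "Par2"))

# Aggregate each (Par1, Par2) group into a single wide row.
res <- dt[, list(pre      = mean(Val[Type == "pre"]),
                 post     = mean(Val[Type == "post"]),
                 pre.num  = length(Val[Type == "pre"]),
                 post.num = length(Val[Type == "post"]),
                 ParD     = paste(ParD, collapse = "_")),
          by = list(Par1, Par2)]
print(res)
```

    With these rows the result has the same values as the table printed above. Note that a group with no "pre" (or no "post") rows would get NaN for that mean, because mean() of an empty vector is NaN.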
    

    [from Matthew] +1. Some minor improvements that avoid repeating the same == comparison and demonstrate local variables inside j.

    dt[, list(pre=mean(Val[.pre <- Type=="pre"]),     # save .pre
              post=mean(Val[.post <- Type=="post"]),  # save .post
              pre.num=sum(.pre),                      # reuse .pre
              post.num=sum(.post),                    # reuse .post
              ParD = paste(ParD, collapse="_")), 
    by=list(Par1, Par2)]
    
    #    Par1 Par2 pre post pre.num post.num        ParD
    # 1:    A    D  10   20       1        1     foo_bar
    # 2:    B    E  30   40       1        1     baz_qux
    # 3:    C    F  50   65       1        2 bla_xyz_meh
    
    dt[, { .pre <- Type=="pre"                  # or save .pre and .post up front 
           .post <- Type=="post"
           list(pre=mean(Val[.pre]), 
                post=mean(Val[.post]),
                pre.num=sum(.pre),
                post.num=sum(.post), 
                ParD = paste(ParD, collapse="_")) }
    , by=list(Par1, Par2)]
    
    #    Par1 Par2 pre post pre.num post.num        ParD
    # 1:    A    D  10   20       1        1     foo_bar
    # 2:    B    E  30   40       1        1     baz_qux
    # 3:    C    F  50   65       1        2 bla_xyz_meh
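
    Both variants rely on ordinary R evaluation rules rather than anything special to data.table, so they can be tried in plain R. The sketch below uses made-up vectors standing in for one group's Type and Val columns; the names pre_mean, pre_num and out are illustrative only.

```r
# ASSUMPTION: toy values standing in for a single group's columns.
Type <- c("pre", "post", "post")
Val  <- c(50, 60, 70)

# An assignment is an expression whose value is the assigned object, so the
# logical mask can be saved and used as a subscript in the same step.
pre_mean <- mean(Val[.pre <- Type == "pre"])
pre_num  <- sum(.pre)   # reuse the saved mask; TRUE counts as 1

# A braced block runs its statements in order and returns the last value,
# which is why j can be written as { ...; list(...) }.
out <- {
  .post <- Type == "post"
  list(post = mean(Val[.post]), post.num = sum(.post))
}

print(pre_mean)
print(pre_num)
print(out)
```

    Using sum() on the mask gives the same count as length(Val[mask]) as long as Type contains no NA values.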
    

    And if a list column is acceptable instead of a pasted string, this should be faster:

    dt[, { .pre <- Type=="pre"
           .post <- Type=="post"
           list(pre=mean(Val[.pre]), 
                post=mean(Val[.post]),
                pre.num=sum(.pre),
                post.num=sum(.post), 
                ParD = list(ParD)) }     # list() faster than paste()
    , by=list(Par1, Par2)]
    
    #    Par1 Par2 pre post pre.num post.num        ParD
    # 1:    A    D  10   20       1        1     foo,bar
    # 2:    B    E  30   40       1        1     baz,qux
    # 3:    C    F  50   65       1        2 bla,xyz,meh
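
    A list column keeps each group's ParD values as a separate vector, which changes how the column is used afterwards. Here is a small sketch of working with such a column, again on hypothetical rows rather than the asker's data; ParD.str is an illustrative column name.

```r
library(data.table)

# ASSUMPTION: hypothetical rows, shaped like the data used in this answer.
dt <- data.table(
  Par1 = c("A", "A", "C", "C", "C"),
  Par2 = c("D", "D", "F", "F", "F"),
  ParD = c("foo", "bar", "bla", "xyz", "meh")
)

# Wrapping the group's vector in list() stores it as one cell of a list column.
res <- dt[, list(ParD = list(ParD)), by = list(Par1, Par2)]

# Each cell is an ordinary character vector.
second_group <- res$ParD[[2]]     # values for group C/F
n_per_group  <- lengths(res$ParD) # number of values in each group

# Collapse to a single string later, only where a string is actually needed.
res[, ParD.str := vapply(ParD, paste, character(1), collapse = "_")]
print(res)
```

    The trade-off is that a list column cannot be compared or written to CSV as directly as a plain character column, so the paste() version may still be the better fit when the result is only meant for display or export.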
    
