Reshaping a data frame with more than one measure variable

独厮守ぢ 2020-12-14 17:33

I'm using a data frame similar to this one:

df<-data.frame(student=c(rep(1,5),rep(2,5)), month=c(1:5,1:5),  
      quiz1p1=seq(20,20.9,0.1),quiz1p2=seq(3         


        
3 Answers
  • 2020-12-14 18:08

    A very similar question was asked about half a year ago, and I wrote the following function for it:

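    # Melt a wide data frame, then split the numbers embedded in each
    # variable name (e.g. "quiz1p2" -> "1", "2") into their own columns.
    melt.wide = function(data, id.vars, new.names) {
      library(reshape2)  # melt(); library() stops with an error if missing
      library(stringr)   # str_extract_all()
      data.melt = melt(data, id.vars=id.vars)
      # One row per melted row, one column per number found in the name
      new.vars = data.frame(do.call(
        rbind, str_extract_all(as.character(data.melt$variable), "[0-9]+")))
      names(new.vars) = new.names
      cbind(data.melt, new.vars)
    }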
    

    You can use the function to "melt" your data as follows:

    dfL <- melt.wide(df, id.vars=1:2, new.names=c("Quiz", "Part"))
    head(dfL)
    #   student month variable value Quiz Part
    # 1       1     1  quiz1p1  20.0    1    1
    # 2       1     2  quiz1p1  20.1    1    1
    # 3       1     3  quiz1p1  20.2    1    1
    # 4       1     4  quiz1p1  20.3    1    1
    # 5       1     5  quiz1p1  20.4    1    1
    # 6       2     1  quiz1p1  20.5    1    1
    tail(dfL)
    #    student month variable value Quiz Part
    # 35       1     5  quiz2p2  90.4    2    2
    # 36       2     1  quiz2p2  90.5    2    2
    # 37       2     2  quiz2p2  90.6    2    2
    # 38       2     3  quiz2p2  90.7    2    2
    # 39       2     4  quiz2p2  90.8    2    2
    # 40       2     5  quiz2p2  90.9    2    2
    

    Once the data are in this form, it is much easier to use dcast() to get whatever layout you need. For example:

    head(dcast(dfL, student + month + Quiz ~ Part))
    #   student month Quiz    1    2
    # 1       1     1    1 20.0 30.0
    # 2       1     1    2 80.0 90.0
    # 3       1     2    1 20.1 30.1
    # 4       1     2    2 80.1 90.1
    # 5       1     3    1 20.2 30.2
    # 6       1     3    2 80.2 90.2
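
    The same long data can be cast the other way, with one column per quiz instead of one per part. The following is only a sketch: the question's data frame is truncated above, so `df` here is rebuilt from the values visible in the printed outputs (an assumption), and the quiz and part numbers are pulled out with base `sub()` rather than with melt.wide(), so the block runs with reshape2 alone.

    ```r
    library(reshape2)

    # ASSUMPTION: rebuilt from the printed outputs; the question's own
    # definition of df is cut off.
    df <- data.frame(student = c(rep(1, 5), rep(2, 5)), month = c(1:5, 1:5),
                     quiz1p1 = seq(20, 20.9, 0.1), quiz1p2 = seq(30, 30.9, 0.1),
                     quiz2p1 = seq(80, 80.9, 0.1), quiz2p2 = seq(90, 90.9, 0.1))

    # Long format: one row per student/month/variable
    long <- melt(df, id.vars = c("student", "month"))

    # Split "quiz<Q>p<P>" into its two numbers (base R stand-in for melt.wide)
    name <- as.character(long$variable)
    long$Quiz <- sub("^quiz([0-9]+)p([0-9]+)$", "\\1", name)
    long$Part <- sub("^quiz([0-9]+)p([0-9]+)$", "\\2", name)

    # One row per student/month/part, one column per quiz
    by_quiz <- dcast(long, student + month + Part ~ Quiz, value.var = "value")
    head(by_quiz)
    ```

    Passing `value.var` explicitly avoids relying on dcast() guessing which column holds the values.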
    
  • 2020-12-14 18:16

    I think this does what you want:

    #Break variable into two columns, one for the quiz and one for the part of the quiz
    dfL <- transform(dfL, quiz = substr(variable, 1,5), 
                     part = substr(variable, 6,7))
    
    #Adjust your dcast call:
    dcast(dfL, student + month + quiz ~ part)
    #-----
    #    student month  quiz   p1   p2
    # 1        1     1 quiz1 20.0 30.0
    # 2        1     1 quiz2 80.0 90.0
    # 3        1     2 quiz1 20.1 30.1
    # ...
    # 18       2     4 quiz2 80.8 90.8
    # 19       2     5 quiz1 20.9 30.9
    # 20       2     5 quiz2 80.9 90.9
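
    The substr() positions above work because every name has the form quiz<one digit>p<one digit>. If the real data had two-digit quiz numbers, a pattern-based split might be safer. A minimal sketch in base R, using made-up variable names (they are not from the question):

    ```r
    # Hypothetical variable names, including a two-digit quiz number
    variable <- c("quiz1p1", "quiz1p2", "quiz10p1", "quiz10p2")

    # Fixed positions cut "quiz10p1" in the wrong place
    substr(variable, 1, 5)

    # Pattern-based split: strip the trailing "p<digits>" to get the quiz,
    # strip the leading "quiz<digits>" to get the part
    quiz <- sub("p[0-9]+$", "", variable)
    part <- sub("^quiz[0-9]+", "", variable)
    data.frame(variable, quiz, part)
    ```

    The same two sub() calls could replace the substr() calls inside transform().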
    
  • 2020-12-14 18:20

    Here's how you could do this with reshape(), from base R:

    df2 <- reshape(df, direction="long",
                   idvar = 1:2, varying = list(c(3,5), c(4,6)),
                   v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
    
    ## Checking the output    
    rbind(head(df2, 3), tail(df2, 3))
    #           student month  time   p1   p2
    # 1.1.quiz1       1     1 quiz1 20.0 30.0
    # 1.2.quiz1       1     2 quiz1 20.1 30.1
    # 1.3.quiz1       1     3 quiz1 20.2 30.2
    # 2.3.quiz2       2     3 quiz2 80.7 90.7
    # 2.4.quiz2       2     4 quiz2 80.8 90.8
    # 2.5.quiz2       2     5 quiz2 80.9 90.9
    

    You can also use column names (instead of column numbers) for idvar and varying. It's more verbose, but seems like better practice to me, since the call keeps working if the columns are later reordered:

    ## The same operation as above, using just column *names*
    df2 <- reshape(df, direction="long", idvar=c("student", "month"),
                   varying = list(c("quiz1p1", "quiz2p1"), 
                                  c("quiz1p2", "quiz2p2")), 
                   v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
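
    If the default `time` column name and the composite row names are not wanted, reshape()'s `timevar` argument and a row-name reset can tidy the result. A sketch only: `df` is rebuilt here from the printed outputs because the question's definition is truncated, and the column name "quiz" is my choice, not something from the question.

    ```r
    # ASSUMPTION: rebuilt from the printed outputs; the question's own
    # definition of df is cut off.
    df <- data.frame(student = c(rep(1, 5), rep(2, 5)), month = c(1:5, 1:5),
                     quiz1p1 = seq(20, 20.9, 0.1), quiz1p2 = seq(30, 30.9, 0.1),
                     quiz2p1 = seq(80, 80.9, 0.1), quiz2p2 = seq(90, 90.9, 0.1))

    # Same long reshape as above, but the column that records which quiz a
    # row came from is named "quiz" instead of the default "time"
    df2 <- reshape(df, direction = "long", idvar = c("student", "month"),
                   varying = list(c("quiz1p1", "quiz2p1"),
                                  c("quiz1p2", "quiz2p2")),
                   v.names = c("p1", "p2"),
                   timevar = "quiz", times = c("quiz1", "quiz2"))

    # Replace the composite row names ("1.1.quiz1", ...) with 1..n
    rownames(df2) <- NULL
    head(df2, 3)
    ```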
    