Pivoting rows into columns

我寻月下人不归 2021-02-05 15:57

Suppose (to simplify) I have a table containing some control vs. treatment data:

Which, Color, Response, Count
Control, Red, 2, 10
Control, Blue, 3, 20
Treatment         


        
4 answers
  • 2021-02-05 16:12

    The cast function from the reshape package (not to be confused with the reshape function in base R) can do this and many other things. See here: http://had.co.nz/reshape/

  • 2021-02-05 16:15

    To add to the options (many years later)....
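
    The question's table is cut off above, so the examples in this answer need a `mydf` to run against. Here is a minimal sketch of one; the two Treatment rows are an assumption, inferred from the outputs printed further down in this answer rather than copied from the question.

    ```r
    # Example data. The Control rows come from the question; the Treatment
    # rows are reconstructed from the outputs shown in this answer (assumption).
    mydf <- data.frame(
      Which    = c("Control", "Control", "Treatment", "Treatment"),
      Color    = c("Red", "Blue", "Red", "Blue"),
      Response = c(2, 3, 1, 4),
      Count    = c(10, 20, 14, 21),
      stringsAsFactors = FALSE
    )
    print(mydf)
    ```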

    The typical approach in base R would involve the reshape function (which is generally unpopular because of the multitude of arguments that take time to master). It's a pretty efficient function for smaller datasets, but doesn't always scale well.

    reshape(mydf, direction = "wide", idvar = "Color", timevar = "Which")
    #   Color Response.Control Count.Control Response.Treatment Count.Treatment
    # 1   Red                2            10                  1              14
    # 2  Blue                3            20                  4              21
    

    Already covered are cast/dcast from the "reshape" and "reshape2" packages (and now dcast.data.table from "data.table", which is especially useful for large datasets). Also from the Hadleyverse, there's "tidyr", which works nicely with the "dplyr" package:

    library(tidyr)
    library(dplyr)
    mydf %>%
      gather(var, val, Response:Count) %>%  ## make a long dataframe
      unite(RN, var, Which) %>%             ## combine the var and Which columns
      spread(RN, val)                       ## make the results wide
    #   Color Count_Control Count_Treatment Response_Control Response_Treatment
    # 1  Blue            20              21                3                  4
    # 2   Red            10              14                2                  1
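
    In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a hedged sketch of the same reshaping in one step, assuming a recent tidyr is installed and using a `mydf` whose Treatment rows are inferred from the outputs above:

    ```r
    library(tidyr)

    # Example data; the Treatment rows are reconstructed from the outputs
    # shown in this answer, not taken from the question (assumption).
    mydf <- data.frame(
      Which    = c("Control", "Control", "Treatment", "Treatment"),
      Color    = c("Red", "Blue", "Red", "Blue"),
      Response = c(2, 3, 1, 4),
      Count    = c(10, 20, 14, 21),
      stringsAsFactors = FALSE
    )

    # One row per Color; each value column is spread across the levels of
    # Which, giving names like Response_Control and Count_Treatment.
    wide <- pivot_wider(
      mydf,
      names_from  = Which,
      values_from = c(Response, Count)
    )
    print(wide)
    ```

    This skips the intermediate long data frame, so there is no unite() step to build the combined column names.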
    

    When this answer was first written, a forthcoming version of "data.table" was expected to let dcast handle this without first melting the data. That has since happened: the data.table implementation of dcast can convert multiple columns to wide format directly, as follows:

    library(data.table)
    dcast(as.data.table(mydf), Color ~ Which, value.var = c("Response", "Count"))
    #    Color Response_Control Response_Treatment Count_Control Count_Treatment
    # 1:  Blue                3                  4            20              21
    # 2:   Red                2                  1            10              14
    
  • 2021-02-05 16:18

    Reshape does indeed work for pivoting a skinny data frame (e.g., from a simple SQL query) to a wide matrix, and it is very flexible, but it is slow, and for large amounts of data it is very slow. Fortunately, if you only want to pivot to a fixed shape, it's fairly easy to write a little C function that does the pivot fast.

    In my case, pivoting a skinny data frame with 3 columns and 672,338 rows took 34 seconds with reshape, 25 seconds with my R code, and 2.3 seconds with C. Ironically, the C implementation was probably easier to write than my (tuned for speed) R implementation.

    Here's the core C code for pivoting floating point numbers. Note that it assumes that you have already allocated a correctly sized result matrix in R before calling the C code, which causes the R-devel folks to shudder in horror:

    #include <R.h> 
    #include <Rinternals.h> 
    /* 
     * This mutates the result matrix in place.
     */
    SEXP
    dtk_pivot_skinny_to_wide(SEXP n_row, SEXP vi_1, SEXP vi_2, SEXP v_3, SEXP result)
    {
       int ii, max_i;
       unsigned int pos;
       int nr = *INTEGER(n_row);
       int * aa = INTEGER(vi_1);
       int * bb = INTEGER(vi_2);
       double * cc = REAL(v_3);
       double * rr = REAL(result);
       max_i = length(vi_2);
       /*
        * R stores matrices by column.  Do ugly pointer-like arithmetic to
        * map the matrix to a flat vector.  We are translating this R code:
        *    for (ii in 1:length(vi.2))
        *       result[((n.row * (vi.2[ii] -1)) + vi.1[ii])] <- v.3[ii]
        */
       for (ii = 0; ii < max_i; ++ii) {
          pos = ((nr * (bb[ii] -1)) + aa[ii] -1);
          rr[pos] = cc[ii];
          /* printf("ii: %d \t value: %g \t result index:  %d \t new value: %g\n", ii, cc[ii], pos, rr[pos]); */
       }
       return(result);
    }
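
    The index arithmetic in the C comment can be checked in plain R before committing to compiled code. The sketch below uses small made-up vectors (vi.1, vi.2 and v.3 here are hypothetical stand-ins for the three skinny columns) and fills the matrix twice: once with the flat column-major position the C loop computes, and once with base R matrix indexing, which should give the same result.

    ```r
    # Hypothetical skinny data: row index, column index, value.
    vi.1 <- c(1L, 2L, 1L, 2L)
    vi.2 <- c(1L, 1L, 2L, 2L)
    v.3  <- c(2, 3, 1, 4)
    n.row <- max(vi.1)
    n.col <- max(vi.2)

    # Flat position in a column-major matrix (1-based), as in the C comment.
    result <- matrix(NA_real_, nrow = n.row, ncol = n.col)
    pos <- n.row * (vi.2 - 1L) + vi.1
    result[pos] <- v.3

    # The same fill using a two-column (row, col) index matrix.
    result2 <- matrix(NA_real_, nrow = n.row, ncol = n.col)
    result2[cbind(vi.1, vi.2)] <- v.3

    stopifnot(identical(result, result2))
    print(result)
    ```

    Both fills are vectorized, so for many cases this may be fast enough without any C; that is not measured here.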
    
  • 2021-02-05 16:28

    Using the reshape2 package (the successor to reshape).

    First, melt your data.frame:

    library(reshape2)
    x <- melt(df, id.vars = c("Which", "Color"))
    

    Then cast:

    dcast(x, Color ~ Which + variable)
    

    Depending on which version of the package you're working with, the function is either cast() (reshape) or dcast() (reshape2).

    Voila.
