Group values with identical ID into columns without summarizing them in R

栀梦 2021-01-21 20:25

I have a data frame that looks like this, but with many more proteins:

Protein      z
  Irak4  -2.46
  Irak4  -0.13
    Itk  -0.49
    Itk   4.22
    Itk  -0.51
    Ras   1.53

2 Answers
  • 2021-01-21 20:43
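    library(data.table)

    # rowid(Protein) numbers the rows within each protein (1, 2, 1, 2, 3, 1 for the example data)
    dcast(setDT(df), rowid(Protein) ~ Protein, value.var = "z")

    #    Protein Irak4   Itk  Ras
    # 1:       1 -2.46 -0.49 1.53
    # 2:       2 -0.13  4.22   NA
    # 3:       3    NA -0.51   NA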
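
    To make the idea behind dcast() concrete, here is a minimal base R sketch of the same reshaping (a swapped-in technique, not part of this answer): each value's row position is its running count within its protein (ave() stands in for rowid()), its column position is the protein, and the values are dropped into an all-NA matrix. The names idx, proteins, col and wide are illustrative, and the data are the six example rows used in this thread.

    ```r
    # Example data (same values as DF in the other answer)
    df <- data.frame(
      Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
      z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53),
      stringsAsFactors = FALSE
    )

    # Row position: running count within each protein (base R analogue of rowid())
    idx <- ave(seq_along(df$Protein), df$Protein, FUN = seq_along)

    # Column position: one column per protein, in sorted order
    proteins <- sort(unique(df$Protein))
    col <- match(df$Protein, proteins)

    # Start from an all-NA matrix and place each z value in its (row, column) cell
    wide <- matrix(NA_real_, nrow = max(idx), ncol = length(proteins),
                   dimnames = list(NULL, proteins))
    wide[cbind(idx, col)] <- df$z
    wide <- as.data.frame(wide)
    ```

    Cells that receive no value stay NA, which is exactly the padding the wide table needs; no separate index column is kept.
    
    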
    

    In base R you can do:

    a <- unstack(df, z ~ Protein)
    data.frame(sapply(a, `length<-`, max(lengths(a))))
    #   Irak4   Itk  Ras
    # 1 -2.46 -0.49 1.53
    # 2 -0.13  4.22   NA
    # 3    NA -0.51   NA
    

    Or using reshape:

    reshape(transform(df, gr = ave(z, Protein, FUN = seq_along)),
            v.names = "z", timevar = "Protein", idvar = "gr", direction = "wide")
    #   gr z.Irak4 z.Itk z.Ras
    # 1  1   -2.46 -0.49  1.53
    # 2  2   -0.13  4.22    NA
    # 5  3      NA -0.51    NA
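
    If the helper column gr and the z. prefixes are unwanted, the reshape() result can be tidied afterwards. A small sketch, assuming the same six-row example data; the clean-up steps are an addition, not part of the answer above:

    ```r
    df <- data.frame(
      Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
      z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53),
      stringsAsFactors = FALSE
    )

    # gr numbers the rows within each protein: 1, 2, 1, 2, 3, 1
    long <- transform(df, gr = ave(z, Protein, FUN = seq_along))

    wide <- reshape(long, v.names = "z", timevar = "Protein", idvar = "gr",
                    direction = "wide")

    # Clean-up: drop the helper column, strip the "z." prefix, renumber the rows
    wide$gr <- NULL
    names(wide) <- sub("^z\\.", "", names(wide))
    rownames(wide) <- NULL
    ```

    The odd row names (1, 2, 5) come from reshape() keeping the first input row of each gr value; resetting them gives 1, 2, 3.
    
    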
    
  • 2021-01-21 21:06

    Here is an option with the tidyverse:

    library(tidyverse)
    DF %>% 
      group_by(Protein) %>% 
      mutate(idx = row_number()) %>% 
      spread(Protein, z) %>% 
      select(-idx)
    # A tibble: 3 x 3
    #   Irak4   Itk   Ras
    #   <dbl> <dbl> <dbl>
    #1  -2.46 -0.49  1.53
    #2  -0.13  4.22 NA   
    #3  NA    -0.51 NA 
    

    Before spreading, we need a unique identifier for each row within a protein (idx); otherwise spread() fails because several rows share the same Protein value.
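
    In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same pipeline using pivot_wider(), assuming tidyr 1.0.0 or later and dplyr are installed; the idx column is still needed so that each output row is identified uniquely:

    ```r
    library(dplyr)
    library(tidyr)

    DF <- data.frame(
      Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
      z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53),
      stringsAsFactors = FALSE
    )

    wide <- DF %>%
      group_by(Protein) %>%
      mutate(idx = row_number()) %>%   # running count within each protein
      ungroup() %>%
      pivot_wider(names_from = Protein, values_from = z) %>%  # one column per protein, NA where missing
      select(-idx)
    ```
    
    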


    In base R you could use unstack() first, which gives you a named list of vectors, one per protein, holding that protein's values from the z column.

    Use lapply() to iterate over that list and pad each vector with NAs via the `length<-` function, so that all vectors have the same length. Then call data.frame().

    lst <- unstack(DF, z ~ Protein)
    data.frame(lapply(lst, `length<-`, max(lengths(lst))))
    #  Irak4   Itk  Ras
    #1 -2.46 -0.49 1.53
    #2 -0.13  4.22   NA
    #3    NA -0.51   NA
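
    The padding step can also be written out without the `length<-` shortcut. A minimal sketch, assuming the same example data; it uses split(), which here should give the same named list of z vectors as unstack(), and the names lst, n, padded and wide are illustrative:

    ```r
    DF <- data.frame(
      Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
      z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53),
      stringsAsFactors = FALSE
    )

    # Named list with one vector of z values per protein
    lst <- split(DF$z, DF$Protein)

    # Length of the longest group; every vector is padded to this length
    n <- max(lengths(lst))

    # Setting length(v) larger than the vector fills the new positions with NA
    padded <- lapply(lst, function(v) {
      length(v) <- n
      v
    })

    wide <- as.data.frame(padded)
    ```
    
    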
    

    Data:

    DF <- structure(list(Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", 
    "Ras"), z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)), .Names = c("Protein", 
    "z"), class = "data.frame", row.names = c(NA, -6L))
    