Group values with identical ID into columns without summarizing them in R

Question

I have a data frame that looks like this, but with many more proteins:

Protein      z
  Irak4  -2.46
  Irak4  -0.13
    Itk  -0.49
    Itk   4.22
    Itk  -0.51
    Ras   1.53

For further operations I need the data grouped by protein name into columns, like this:

Irak4    Itk    Ras
-2.46  -0.49   1.53
-0.13   4.22     NA
   NA  -0.51     NA

I tried different packages, such as dplyr and reshape, but did not manage to transform the data into the desired format.

Is there any way to achieve this? I think the main problem is that some proteins have fewer data points than others, so the columns would have different lengths.

I am quite new to R, so my apologies if I am missing an obvious solution.

Answer 1:

Here is an option with the tidyverse:

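library(tidyverse)
DF %>% 
  group_by(Protein) %>% 
  mutate(idx = row_number()) %>%  # position of each value within its protein
  ungroup() %>% 
  spread(Protein, z) %>%          # one column per protein; gaps are filled with NA
  select(-idx)                    # drop the helper index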
# A tibble: 3 x 3
#   Irak4   Itk   Ras
#   <dbl> <dbl> <dbl>
#1  -2.46 -0.49  1.53
#2  -0.13  4.22 NA   
#3  NA    -0.51 NA

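Before we spread the data, we need a unique identifier for each row within a protein (`idx`, the position of each value inside its group). Without it, spread() fails with an error, because several rows would share the same Protein key and there is no way to tell which output row each value belongs to.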
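In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The same idea (a per-protein counter used as the row identifier) carries over directly. A minimal sketch, assuming dplyr and tidyr >= 1.0.0 are installed; the block re-creates the question's data so it runs on its own, and `wide` is just an illustrative name:

```r
# Assumes dplyr and tidyr (>= 1.0.0, for pivot_wider) are installed
library(dplyr)
library(tidyr)

# Same example data as in the question
DF <- data.frame(
  Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
  z       = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)
)

wide <- DF %>%
  group_by(Protein) %>%
  mutate(idx = row_number()) %>%  # 1, 2, ... within each protein
  ungroup() %>%
  pivot_wider(names_from = Protein, values_from = z) %>%  # missing cells become NA
  select(-idx)                    # the counter is no longer needed

print(wide)
```

The counter `idx` is what makes each (row, protein) cell unique; pivot_wider() then fills every combination that has no value with NA.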

In base R you could use unstack first. Because the proteins have different numbers of values, it returns a named list of vectors (one per protein) containing the values of the z column.
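To see what unstack() hands back, here is a small sketch on the question's data. It also shows an edge case worth knowing: when every group has the same number of values, unstack() already returns a data frame instead of a list. The `balanced` data frame is a made-up example, not part of the question.

```r
# Same example data as in the question
DF <- data.frame(
  Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
  z       = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)
)

lst <- unstack(DF, z ~ Protein)
str(lst)        # named list: one numeric vector per protein
lengths(lst)    # unequal lengths, which is why a list is returned

# Hypothetical balanced input: equal group sizes give a data frame directly
balanced <- data.frame(Protein = c("A", "A", "B", "B"), z = 1:4)
class(unstack(balanced, z ~ Protein))
```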

Use lapply to iterate over that list and pad the shorter vectors with NAs using the `length<-` function, so that all vectors have the same length. Then we can call data.frame.
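The padding step relies on the fact that increasing a vector's length with `length<-` fills the new positions with NA (a shorter value would truncate instead). A short sketch of that behaviour and of the lapply() call on a hand-written list that mirrors the unstack() result; `padded` and `wide` are illustrative names:

```r
# Growing a vector with length<- appends NA values
x <- c(-0.13, 1.53)
length(x) <- 4
print(x)

# The same replacement function called directly, as lapply() does
print(`length<-`(c(1, 2), 3))

# List shaped like the unstack() result for the question's data
lst <- list(Irak4 = c(-2.46, -0.13), Itk = c(-0.49, 4.22, -0.51), Ras = 1.53)
n <- max(lengths(lst))                 # length of the longest group
padded <- lapply(lst, `length<-`, n)   # every vector now has length n
wide <- data.frame(padded)
print(wide)
```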

lst <- unstack(DF, z ~ Protein)
data.frame(lapply(lst, `length<-`, max(lengths(lst))))
#  Irak4   Itk  Ras
#1 -2.46 -0.49 1.53
#2 -0.13  4.22   NA
#3    NA -0.51   NA

Data:

DF <- structure(list(Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", 
"Ras"), z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)), .Names = c("Protein", 
"z"), class = "data.frame", row.names = c(NA, -6L))

Answer 2:

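library(data.table)

# rowid(Protein) numbers the values 1, 2, ... within each protein;
# as.data.table() leaves DF itself unchanged (setDT() would convert it in place)
dcast(as.data.table(DF), rowid(Protein) ~ Protein, value.var = "z")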

   Protein Irak4   Itk  Ras
1:       1 -2.46 -0.49 1.53
2:       2 -0.13  4.22   NA
3:       3    NA -0.51   NA

In base R you can do:

data.frame(sapply(a <- unstack(DF, z ~ Protein), `length<-`, max(lengths(a))))
  Irak4   Itk  Ras
1 -2.46 -0.49 1.53
2 -0.13  4.22   NA
3    NA -0.51   NA

Or using reshape:

reshape(transform(DF, gr = ave(z, Protein, FUN = seq_along)),
        v.names = "z", timevar = "Protein", idvar = "gr", direction = "wide")
  gr z.Irak4 z.Itk z.Ras
1  1   -2.46 -0.49  1.53
2  2   -0.13  4.22    NA
5  3      NA -0.51    NA
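The reshape() result still carries the helper column `gr`, the `z.` prefix on each column name, and the original row names (1, 2, 5). A minimal cleanup sketch to get exactly the layout asked for in the question; the prefix-stripping regex assumes the default `sep = "."` of reshape(), and `wide` is an illustrative name:

```r
# Same example data as in the question
DF <- data.frame(
  Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
  z       = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)
)

# Counter 1, 2, ... within each protein; seq_along ignores the values of z
DF$gr <- ave(DF$z, DF$Protein, FUN = seq_along)

wide <- reshape(DF, v.names = "z", timevar = "Protein", idvar = "gr",
                direction = "wide")

wide$gr <- NULL                                # drop the helper counter
names(wide) <- sub("^z\\.", "", names(wide))   # "z.Irak4" -> "Irak4" (assumes sep = ".")
rownames(wide) <- NULL                         # renumber rows 1, 2, 3
print(wide)
```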

Source: https://stackoverflow.com/questions/52394652/group-values-with-identical-id-into-columns-without-summerizing-them-in-r

Tags

dataframe

reshape

missing-data