I have a data frame that looks like this, but with many more proteins:
Protein z
Irak4 -2.46
Irak4 -0.13
Itk -0.49
Itk 4.22
Itk -0.51
library(data.table)
dcast(setDT(df), rowid(Protein) ~ Protein, value.var = 'z')
Protein Irak4 Itk Ras
1: 1 -2.46 -0.49 1.53
2: 2 -0.13 4.22 NA
3: 3 NA -0.51 NA
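In case the rowid(Protein) part of the formula is unclear: it numbers the rows within each protein, and that counter is what ends up as the row key of the wide table. A small sketch, assuming data.table is installed (the names prot and ids are just for illustration):

```r
library(data.table)

prot <- c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras")

# rowid() counts occurrences within each distinct value, in order of appearance.
ids <- rowid(prot)
ids
```

For this vector the counter restarts at 1 whenever a new protein begins, so the first Irak4, Itk and Ras values all land in row 1 of the result.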
In base R you can do:
data.frame(sapply(a <- unstack(df, z ~ Protein), `length<-`, max(lengths(a))))
Irak4 Itk Ras
1 -2.46 -0.49 1.53
2 -0.13 4.22 NA
3 NA -0.51 NA
Or using reshape:
reshape(transform(df, gr = ave(z, Protein, FUN = seq_along)), v.names = 'z', timevar = 'Protein', idvar = 'gr', direction = 'wide')
gr z.Irak4 z.Itk z.Ras
1 1 -2.46 -0.49 1.53
2 2 -0.13 4.22 NA
5 3 NA -0.51 NA
Here is an option with tidyverse:
library(tidyverse)
DF %>%
  group_by(Protein) %>%
  mutate(idx = row_number()) %>%
  spread(Protein, z) %>%
  select(-idx)
# A tibble: 3 x 3
# Irak4 Itk Ras
# <dbl> <dbl> <dbl>
#1 -2.46 -0.49 1.53
#2 -0.13 4.22 NA
#3 NA -0.51 NA
Before we spread the data, we need to create unique identifiers.
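To make that step concrete, here is a minimal sketch that shows the intermediate idx column and then reshapes with pivot_wider(), which the tidyr documentation recommends over the superseded spread(). It assumes dplyr and tidyr (1.0.0 or later) are installed. The names with_idx and wide are only for illustration, and DF is rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(
  Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
  z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)
)

# Number the rows within each protein; idx becomes the row key of the wide table.
with_idx <- DF %>%
  group_by(Protein) %>%
  mutate(idx = row_number()) %>%
  ungroup()

# One column per protein; (idx, Protein) combinations with no value become NA.
wide <- with_idx %>%
  pivot_wider(names_from = Protein, values_from = z) %>%
  select(-idx)
```

Without idx, several z values for the same protein would have to share a single cell, which is why the reshape needs a key that is unique within each protein.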
In base R you could use unstack first, which gives you a named list of vectors containing the values in the z column. Use lapply to iterate over that list and pad the vectors with NAs using the `length<-` function, so that all vectors in the list have equal length. Then we can call data.frame.
lst <- unstack(DF, z ~ Protein)
data.frame(lapply(lst, `length<-`, max(lengths(lst))))
# Irak4 Itk Ras
#1 -2.46 -0.49 1.53
#2 -0.13 4.22 NA
#3 NA -0.51 NA
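If the `length<-` trick looks odd, this small sketch shows what it does to a single vector. The values are the two Irak4 entries from the example data, and the names x, y and padded are only for illustration:

```r
x <- c(-2.46, -0.13)

# Replacement form: grow a copy of x to length 3; the new slot is filled with NA.
y <- x
length(y) <- 3

# Functional form, which is what lapply() calls for each list element.
padded <- `length<-`(x, 3)
```

Both forms give the same padded vector; the functional form is just easier to pass to lapply together with the target length.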
Data:
DF <- structure(list(Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk",
"Ras"), z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)), .Names = c("Protein",
"z"), class = "data.frame", row.names = c(NA, -6L))