问题
I'm new to R, and I'm pretty sure this is something simple to accomplish, but I cannot figure out how to perform this action. I've tried the split function, utilizing a for loop, but cannot quite figure out how to get it right. As an example, this is what my original data frame looks like:
dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)), col2 = c(1, 3, 2, 4, 7, 8, 9))
col1 col2
red 1
red 3
red 2
red 4
blue 7
blue 8
blue 9
I want to create new columns for each unique value in col1 and assign it's corressponding value in col2 to the new data frame. And this is how I want my new data frame:
red blue
1 7
3 8
2 9
4 NA
I've gotten close with a list structure close to what I wanted, but I need a data frame to boxplot and dotplot the results. Any help would be appriciated. Thanks!
回答1:
I'm sure there's a more efficient solution, but here's one option
dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)), col2 = c(1, 3, 2, 4, 7, 8, 9))
dat
col1 col2
1 red 1
2 red 3
3 red 2
4 red 4
5 blue 7
6 blue 8
7 blue 9
ust <- unstack(dat, form = col2 ~ col1)
res <- data.frame(sapply(ust, '[', 1:max(unlist(lapply(ust, length)))))
res
blue red
1 7 1
2 8 3
3 9 2
4 NA 4
Edit: If you want the column order red then blue
res[, c("red", "blue")]
red blue
1 1 7
2 3 8
3 2 9
4 4 NA
回答2:
Here's an Hadleyverse possible solution
library(tidyr)
library(dplyr)
dat %>%
group_by(col1) %>%
mutate(n = row_number()) %>%
spread(col1, col2)
# Source: local data frame [4 x 3]
#
# n blue red
# 1 1 7 1
# 2 2 8 3
# 3 3 9 2
# 4 4 NA 4
Or using data.table
library(data.table)
dcast(setDT(dat)[, indx := 1:.N, by = col1], indx ~ col1, value.var = "col2")
# indx blue red
# 1: 1 7 1
# 2: 2 8 3
# 3: 3 9 2
# 4: 4 NA 4
回答3:
Just to show another option using base R *apply
and cbind
# split the data into list using col1 column
tmp.list = lapply(split(dat, dat$col1), function(x) x$col2)
# identify the length of the biggest list
max.length = max(sapply(tmp.list, length))
# combine the list elements, while filling NA for the missing values
data.frame(do.call(cbind,
lapply(tmp.list, function(x) c(x, rep(NA, max.length - length(x))))
))
# blue red
#1 7 1
#2 8 3
#3 9 2
#4 NA 4
来源:https://stackoverflow.com/questions/32236327/create-new-data-frame-with-column-names-from-unique-values-in-another-data-frame