问题
I have a data frame which I need to transform. I need to change the rows into unique columns based on the value of a column.
ex:
The Input DataFrame
| column_1 | column_2 |
-----------------------
| A | B |
| A | C |
| B | E |
| B | C |
| C | F |
| C | G |
The Output DataFrame
| column_1 | column_2 | column_3 |
----------------------------------
| A | B | C |
| B | E | C |
| C | F | G |
The final DataFrame should have all the unique values in column_1 and the values from column_2 from input DataFrame will be added as new columns in new DataFrame i.e. Column_2 and Column_3.
I have tried to use reshape and melt packages in R but I am getting erroneous data frame.
回答1:
We can use the dplyr
and cSplit
function from the splitstackshape
. It will also work for cases when there are more than two values per group.
library(dplyr)
library(splitstackshape)
dt2 <- dt %>%
group_by(column_1) %>%
summarise(column_2 = toString(column_2)) %>%
cSplit("column_2") %>%
setNames(paste0("column_", 1:ncol(.)))
dt2
column_1 column_2 column_3
1: A B C
2: B E C
3: C F G
Data
dt <- data.frame(column_1 = c("A", "A", "B", "B", "C", "C"),
column_2 = c("B", "C", "E", "C", "F", "G"),
stringsAsFactors = FALSE)
回答2:
Assuming there are always 2 lines for each value in column_1.
Extract the first line for each column_1 element in a first data.table, then the last line in a second data.table, finally merge them into a new data.table
library(data.table)
df <- data.frame(column_1=c("A","A","B","B","C","C"),column_2=c("B","C","E","C","F","G"))
df <- as.data.table(df)
setkey(df,column_1)
first_part <- df[J(unique(column_1)), mult = "first"]
second_part <- df[J(unique(column_1)), mult = "last"]
setnames(second_part,"column_2","column_3")
new_df <- merge(first_part,second_part, by="column_1")
column_1 column_2 column_3
1: A B C
2: B E C
3: C F G
回答3:
Here's a short solution with dplyr
and tidyr
:
library(dplyr)
library(tidyr)
df %>% mutate(col = c("column_2","column_3")[duplicated(column_1)+1]) %>%
spread(col,column_2)
# column_1 column_2 column_3
# 1 A B C
# 2 B E C
# 3 C F G
And a general solution :
df <- data.frame(column_1 = c("A", "A", "B", "B", "C", "C","A","B","C"),
column_2 = c("B", "C", "E", "C", "F", "G","X","Y","Z"),
stringsAsFactors = FALSE)
df %>% group_by(column_1) %>%
mutate(col=paste0("column_",row_number()+1)) %>%
spread(col,column_2) %>% ungroup
# # A tibble: 3 x 4
# column_1 column_2 column_3 column_4
# * <chr> <chr> <chr> <chr>
# 1 A B C X
# 2 B E C Y
# 3 C F G Z
来源:https://stackoverflow.com/questions/46191387/how-to-transform-a-dataframes-row-into-columns-in-r