问题
Continuing from my previous post, I am now having 1 more column of ID values that I need to use to pivot rows into columns.
NUM <- c(1,2,3,1,2,3,1,2,3,1)
ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
df1 <- data.frame(ID,NUM,Type,Points)
df1:
+------+-----+------+--------+
| ID | Num | Type | Points |
+------+-----+------+--------+
| DJ45 | 1 | A | 9.2 |
| DJ45 | 2 | F | 60.8 |
| DJ45 | 3 | C | 22.9 |
| DJ46 | 1 | B | 1012.7 |
| DJ46 | 2 | D | 18.7 |
| DJ46 | 3 | A | 11.1 |
| DJ47 | 1 | E | 67.2 |
| DJ47 | 2 | C | 63.1 |
| DJ47 | 3 | F | 16.7 |
| DJ48 | 1 | D | 58.4 |
+------+-----+------+--------+
My desired output is
+------+-----+------+--------+------+------+------+------+
| ID | Num | A | B | C | D | E | F |
+------+-----+------+--------+------+------+------+------+
| DJ45 | 1 | 9.2 | N/A | N/A | N/A | N/A | N/A |
| DJ45 | 2 | N/A | N/A | N/A | N/A | N/A | 60.8 |
| DJ45 | 3 | N/A | N/A | 22.9 | N/A | N/A | N/A |
| DJ46 | 1 | N/A | 1012.7 | N/A | N/A | N/A | N/A |
| DJ46 | 2 | N/A | N/A | N/A | 18.7 | N/A | N/A |
| DJ46 | 3 | 11.1 | N/A | N/A | N/A | N/A | N/A |
| DJ47 | 1 | N/A | N/A | N/A | N/A | 67.2 | N/A |
| DJ47 | 2 | N/A | N/A | 63.1 | N/A | N/A | N/A |
| DJ47 | 3 | N/A | N/A | N/A | N/A | N/A | 16.7 |
| DJ48 | 1 | N/A | N/A | N/A | 58.4 | N/A | N/A |
+------+-----+------+--------+------+------+------+------+
I am using spread
function in R but getting errors saying duplicate identifiers. This is because I have 2 columns now (ID & NUM) instead of one (NUM) that I had previously. Please let me know how I could do this.
回答1:
Not knowing what you've tried, I would suggest:
spread(df1, Type, Points)
# ID NUM A B C D E F
# 1 DJ45 1 9.2 NA NA NA NA NA
# 2 DJ45 2 NA NA NA NA NA 60.8
# 3 DJ45 3 NA NA 22.9 NA NA NA
# 4 DJ46 1 NA 1012.7 NA NA NA NA
# 5 DJ46 2 NA NA NA 18.7 NA NA
# 6 DJ46 3 11.1 NA NA NA NA NA
# 7 DJ47 1 NA NA NA NA 67.2 NA
# 8 DJ47 2 NA NA 63.1 NA NA NA
# 9 DJ47 3 NA NA NA NA NA 16.7
# 10 DJ48 1 NA NA NA 58.4 NA NA
If you are getting an error about duplicate identifiers, it is because the combination of "ID" and "Num" in your actual data have one or more duplicate entries (in your sample data, they don't).
If that is the case, you need to add another column to make them unique.
Adding dplyr
into the chain, it might be something like:
df1 %>%
group_by(ID, NUM) %>%
mutate(id2 = sequence(n())) %>%
spread(Type, Points)
Demo of assumed error:
df2 <- rbind(df1, df1[1:3, ]) ## Duplicate the first three rows
spread(df2, Type, Points)
# Error: Duplicate identifiers for rows (1, 11), (3, 13), (2, 12)
library(dplyr)
df2 %>%
group_by(ID, NUM) %>%
mutate(id2 = sequence(n())) %>%
spread(Type, Points)
# Source: local data frame [13 x 9]
#
# ID NUM id2 A B C D E F
# 1 DJ45 1 1 9.2 NA NA NA NA NA
# 2 DJ45 1 2 9.2 NA NA NA NA NA
# 3 DJ45 2 1 NA NA NA NA NA 60.8
# 4 DJ45 2 2 NA NA NA NA NA 60.8
# 5 DJ45 3 1 NA NA 22.9 NA NA NA
# 6 DJ45 3 2 NA NA 22.9 NA NA NA
# 7 DJ46 1 1 NA 1012.7 NA NA NA NA
# 8 DJ46 2 1 NA NA NA 18.7 NA NA
# 9 DJ46 3 1 11.1 NA NA NA NA NA
# 10 DJ47 1 1 NA NA NA NA 67.2 NA
# 11 DJ47 2 1 NA NA 63.1 NA NA NA
# 12 DJ47 3 1 NA NA NA NA NA 16.7
# 13 DJ48 1 1 NA NA NA 58.4 NA NA
来源:https://stackoverflow.com/questions/29952517/r-pivoting-using-spread-function