问题
There has been questions on this topic before here, but I am still struggling with spreading this. I would like so each state
to have its own column of temperatures values.
Here is a dput()
of my data. I'll call it df
structure(list(date = c("2018-01-21", "2018-01-21", "2018-01-20",
"2018-01-20", "2018-01-19", "2018-01-19", "2018-01-18", "2018-01-18",
"2018-01-17", "2018-01-17", "2018-01-16", "2018-01-16", "2018-01-15",
"2018-01-15", "2018-01-14", "2018-01-14", "2018-01-12", "2018-01-12",
"2018-01-11", "2018-01-11", "2018-01-10", "2018-01-10", "2018-01-09",
"2018-01-09", "2018-01-08", "2018-01-08", "2018-01-07", "2018-01-07",
"2018-01-06", "2018-01-06", "2018-01-05", "2018-01-05", "2018-01-04",
"2018-01-04", "2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03",
"2018-01-02", "2018-01-02"), tmin = c(24, 31, 31, 29, 44, 17,
32, 7, 31, 7, 31, 6, 30, 13, 30, 1, 43, 20, 33, 52, 42, 29, 30,
29, 26, 32, 33, -2, 29, 0, 23, 3, 19, 11, NA, -3, 22, -3, 24,
-4), state = c("UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH",
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT",
"OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH",
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH")), class = "data.frame", row.names = c(NA,
-40L), .Names = c("date", "tmin", "state"))
The code I run is
df %>% spread(state,tmin)
which I expected to give me the following format
date UT OH
... ... ...
but I get the error message
Error: Duplicate identifiers for rows (36, 38), (35, 37)
I have tried a few different things. One thing I tried was grouping by date. I was thinking that rows of the same date were causing a problem for spread
. I also tried making a new row with add_rownames()
then using spread(state,tmin)
, but that also failed to solve the issue.
回答1:
In order for spread
to work as intended, the resulting data frame must have uniquely identified rows and columns. In the case of your data, the "date" column is the only unique identifier after spreading. However, rows 36 and 38 are identical:
date tmin state
36 2018-01-03 -3 OH
38 2018-01-03 -3 OH
This puts tidyr in the impossible position of trying to resolve two data points to the same row and column. In addition, rows 35 and 37 both have the same date and state, once again creating the impossible situation of placing two different values into the same position in the new data frame:
date tmin state
35 2018-01-03 NA UT
37 2018-01-03 22 UT
The following data cleanup will make spreading possible:
df %>%
filter(!is.na(tmin)) %>% # remove NA values
unique %>% # remove duplicated rows
spread(state, tmin)
date OH UT
1 2018-01-02 -4 24
2 2018-01-03 -3 22
3 2018-01-04 11 19
4 2018-01-05 3 23
5 2018-01-06 0 29
...
来源:https://stackoverflow.com/questions/48383605/spread-with-duplicate-identifiers-for-rows