Spread with duplicate identifiers for rows [duplicate]

独自空忆成欢 提交于 2019-11-29 10:25:38

In order for spread to work as intended, the resulting data frame must have uniquely identified rows and columns. In the case of your data, the "date" column is the only unique identifier after spreading. However, rows 36 and 38 are identical:

         date tmin state
36 2018-01-03   -3    OH
38 2018-01-03   -3    OH

This puts tidyr in the impossible position of trying to resolve two data points to the same row and column. In addition, rows 35 and 37 both have the same date and state, once again creating the impossible situation of placing two different values into the same position in the new data frame:

         date tmin state
35 2018-01-03   NA    UT
37 2018-01-03   22    UT

The following data cleanup will make spreading possible:

df %>% 
  filter(!is.na(tmin)) %>% # remove NA values
  unique %>% # remove duplicated rows
  spread(state, tmin)

         date OH UT
1  2018-01-02 -4 24
2  2018-01-03 -3 22
3  2018-01-04 11 19
4  2018-01-05  3 23
5  2018-01-06  0 29
...
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!