问题
How do I change the following table from:
Type Name Answer n
TypeA Apple Yes 5
TypeA Apple No 10
TypeA Apple DK 8
TypeA Apple NA 20
TypeA Orange Yes 6
TypeA Orange No 11
TypeA Orange DK 8
TypeA Orange NA 23
Change to:
Type Name Yes No DK NA
TypeA Apple 5 10 8 20
TypeA Orange 6 11 8 23
I used the following codes to get the first table.
df_1 <-
df %>%
group_by(Type, Name, Answer) %>%
tally()
Then I tried to use the spread command to get to the 2nd table, but I got the following error message:
"Error: All columns must be named"
df_2 <- spread(df_1, Answer)
回答1:
Following on the comment from ayk, I'm providing an example. It looks to me like when you have a data_frame with a column of either a factor or character class that has values of NA, this cannot be spread without either removing them or re-classifying the data. This is specific to a data_frame (note the dplyr class with the underscore in the name), as this works in my example when you have values of NA in a data.frame. For example, a slightly modified version of the example above:
Here is the dataframe
library(dplyr)
library(tidyr)
df_1 <- data_frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
Answer = c("Yes", "No", NA, "No"),
n = 1:4)
df_1
Which gives a data_frame that looks like this
Source: local data frame [4 x 3]
Type Answer n
(chr) (chr) (int)
1 TypeA Yes 1
2 TypeA No 2
3 TypeB NA 3
4 TypeB No 4
Then, when we try to tidy it, we get an error message:
df_1 %>% spread(key=Answer, value=n)
Error: All columns must be named
But if we remove the NA's then it 'works':
df_1 %>%
filter(!is.na(Answer)) %>%
spread(key=Answer, value=n)
Source: local data frame [2 x 3]
Type No Yes
(chr) (int) (int)
1 TypeA 2 1
2 TypeB 4 NA
However, removing the NAs may not give you the desired result: i.e. you might want those to be included in your tidied table. You could modify the data directly to change the NAs to a more descriptive value. Alternatively, you could change your data to a data.frame and then it spreads just fine:
as.data.frame(df_1) %>% spread(key=Answer, value=n)
Type No Yes NA
1 TypeA 2 1 NA
2 TypeB 4 NA 3
回答2:
I think only tidyr is needed to get from df_1
to df_2
.
library(magrittr)
df_1 <- read.csv(text="Type,Name,Answer,n\nTypeA,Apple,Yes,5\nTypeA,Apple,No,10\nTypeA,Apple,DK,8\nTypeA,Apple,NA,20\nTypeA,Orange,Yes,6\nTypeA,Orange,No,11\nTypeA,Orange,DK,8\nTypeA,Orange,NA,23", stringsAsFactors=F)
df_2 <- df_1 %>%
tidyr::spread(key=Answer, value=n)
Output:
Type Name DK No Yes NA
1 TypeA Apple 8 10 5 20
2 TypeA Orange 8 11 6 23
来源:https://stackoverflow.com/questions/34684925/how-to-use-the-spread-function-properly-in-tidyr