Question
I have a data frame with an id column and a column containing Y and N values. I would like to create two columns, one with the total Y count and another with the total N count for each id. I tried doing this with the dplyr summarise function:
group_by(id) %>%
  summarise(total_not = count(column_y_e_n == "N"),
            total_yes = count(column_y_e_n == "Y"))
but I got the error message
Error in summarise_impl(.data, dots)
Any suggestions?
Answer 1:
Slight variation on original answer from Harro:
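library(dplyr)  # %>%, group_by(), summarize(), n()
library(tidyr)  # spread()
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  bool = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)
# Count the rows for each id/value pair, then spread the values into columns.
# fill = 0 gives a zero count where an id has no rows for a value.
dfrSummary <- dfr %>%
  group_by(id, bool) %>%
  summarize(count = n()) %>%
  spread(
    key = bool,
    value = count,
    fill = 0
  )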
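In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshaping with that function, not part of the original answer. It assumes a tidyr version recent enough to accept a scalar values_fill; the name dfrSummaryWide is made up for illustration.

```r
library(dplyr)
library(tidyr)

dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  bool = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

# dfrSummaryWide is a hypothetical name used only in this sketch.
dfrSummaryWide <- dfr %>%
  count(id, bool, name = "count") %>%  # one row per id/value pair
  pivot_wider(
    names_from = bool,     # one column per distinct value (N, Y)
    values_from = count,
    values_fill = 0        # zero where an id has no rows for a value
  )

dfrSummaryWide
```

As with spread(), the resulting columns are named after the values themselves (N and Y), so they still need renaming if names such as total_not and total_yes are wanted.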
Answer 2:
I replaced the count function with the sum function and it worked. The comparison returns a logical vector, and sum counts its TRUE values.
group_by(id) %>%
  summarise(total_not = sum(column_y_e_n == "N"),
            total_yes = sum(column_y_e_n == "Y"))
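For reference, here is a self-contained sketch of this approach. The data frame name df and the result name totals are assumptions for illustration, since the question does not show the original data; the sample values are borrowed from the other answers in this thread.

```r
library(dplyr)

# df and totals are hypothetical names; the data is sample data only.
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

totals <- df %>%
  group_by(id) %>%
  summarise(
    total_not = sum(column_y_e_n == "N"),  # TRUE counts as 1, FALSE as 0
    total_yes = sum(column_y_e_n == "Y")
  )

totals
```

If the column can contain NA, sum(column_y_e_n == "N", na.rm = TRUE) keeps the totals from becoming NA.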
Answer 3:
I would approach the problem using group_by and tally(). Or you can skip the middle step and use count directly.
library(tidyverse)
## Fake data
df <- tibble(
  id = rep(1:20, each = 10),
  column_y_e_n = sapply(1:200, function(i) sample(c("Y", "N"), 1))
)
## group_by() + tally()
df_2 <- df %>%
  group_by(id, column_y_e_n) %>%
  tally() %>%
  spread(column_y_e_n, n) %>%
  magrittr::set_colnames(c("id", "total_not", "total_yes"))
df_2
## Direct method
df_3 <- df %>%
  count(id, column_y_e_n) %>%
  spread(column_y_e_n, n) %>%
  magrittr::set_colnames(c("id", "total_not", "total_yes"))
df_3
The last two steps in each pipeline spread the counts into one column per value and rename the columns.
Answer 4:
I usually want to do everything in tidyverse. But in this case the base R solution seems appropriate:
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)
table(dfr)
gives you:
   column_y_e_n
id  N Y
  1 1 4
  2 3 2
  3 3 0
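If the counts are needed as data frame columns rather than as a printed table, the table can be indexed by column name. This is a rough sketch, not part of the original answer; the names tab and result are made up for illustration.

```r
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

# tab and result are hypothetical names used only in this sketch.
tab <- table(dfr)  # rows are ids, columns are the values N and Y

result <- data.frame(
  id = as.numeric(rownames(tab)),      # row names hold the ids as strings
  total_not = as.vector(tab[, "N"]),
  total_yes = as.vector(tab[, "Y"])
)

result
```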
Source: https://stackoverflow.com/questions/54719103/logical-value-count-with-summarise-r