Question
I have the following data frame, which I obtained from a count. I used dput to make the data frame available and then edited it so that there is a duplicate of A.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
print(df)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
Now I would like to take distinct on Procedure and keep only the first A.
df %>%
distinct(Procedure, .keep_all=TRUE)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
It does not work. Strange...
Answer 1:
If we print the Procedure column, we can see that there are duplicated levels for A, which is problematic for the distinct function.
df$Procedure
[1] D A A C
Levels: A A C D -1
Warning message:
In print.factor(x) : duplicated level [2] in factor
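The duplication can also be checked programmatically instead of reading it off the printed warning. Here is a minimal base R sketch using the df from the question; the names lv, dup_idx and codes are only illustrative. It also shows that the two A rows carry different integer codes, which is presumably why they are not treated as the same value.

```r
# Rebuild the data frame from the question; the label "A" appears twice.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))

# levels() returns the raw label vector, duplicates included.
lv <- levels(df$Procedure)

# Index of the first repeated label (0 would mean all labels are unique).
dup_idx <- anyDuplicated(lv)

# Underlying integer codes of the four rows; the two "A" rows differ here.
codes <- as.integer(df$Procedure)
```

If dup_idx is nonzero, the factor is malformed and should be rebuilt or converted before deduplicating.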
One way to fix this is to drop the duplicated factor levels. We can use the factor function to rebuild the factor and achieve this. Another way is to convert the Procedure column to character.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
library(tidyverse)
df %>%
mutate(Procedure = factor(Procedure)) %>%
distinct(Procedure, .keep_all=TRUE)
# # A tibble: 3 x 2
# Procedure n
# <fct> <int>
# 1 D 10717
# 2 A 4412
# 3 C 1480
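The second option mentioned above, converting Procedure to character, can be sketched in the same way. This assumes dplyr is installed and reuses the df from the question; result is just an illustrative name.

```r
library(dplyr)

# Same malformed data frame as in the question.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))

# Convert the factor to plain character labels, then deduplicate on Procedure.
result <- df %>%
  mutate(Procedure = as.character(Procedure)) %>%
  distinct(Procedure, .keep_all = TRUE)
```

Since the comparison is then made on the labels themselves, the two A rows should collapse into the first one, leaving D, A and C.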
Answer 2:
You have a duplicated value in the labels argument: .Label = c("A", "A", "C", "D", "-1"). That is the issue. By the way, your way of initializing a tibble seems very strange (I do not know exactly what your goal is, but still).
Why not use:
df <- tibble(
Procedure = c("D", "A", "A", "C"),
n = c(10717L, 4412L, 2058L, 1480L)
)
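With a character column built this way there are no factor levels that could be duplicated, so distinct should behave as expected. A short check, assuming the dplyr and tibble packages are installed (result is an illustrative name):

```r
library(dplyr)
library(tibble)

df <- tibble(
  Procedure = c("D", "A", "A", "C"),
  n = c(10717L, 4412L, 2058L, 1480L)
)

# Keep the first row for each Procedure value.
result <- df %>%
  distinct(Procedure, .keep_all = TRUE)
```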
Source: https://stackoverflow.com/questions/54705659/distinct-in-dplyr-does-not-work-sometimes