Is it possible to select all unique values from a column of a data.frame using the select function in the dplyr library?
Something like \
The dplyr select function selects specific columns from a data frame. To return the unique values in a particular column, you can use the group_by function followed by an empty summarise call. For example:
library(dplyr)
# Fake data
set.seed(5)
dat = data.frame(x=sample(1:10,100, replace=TRUE))
# Return the distinct values of x
dat %>%
  group_by(x) %>%
  summarise()
x
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
If you want to change the column name you can add the following:
dat %>%
  group_by(x) %>%
  summarise() %>%
  select(unique.x=x)
This both selects column x from among all the columns in the data frame that dplyr returns (and of course there's only one column in this case) and changes its name to unique.x.
You can also get the unique values directly in base R with unique(dat$x).
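As a rough sketch of the base R route: unique returns a plain vector in order of first appearance, so it needs sorting to match the ordering of the grouped output. The variable names ux, sorted_ux and ux_df are only illustrative.

```r
# Fake data, built the same way as in the example above
set.seed(5)
dat <- data.frame(x = sample(1:10, 100, replace = TRUE))

# unique() on a vector returns the distinct values in order of first appearance
ux <- unique(dat$x)

# Sort to get the same ordering that group_by() + summarise() produces
sorted_ux <- sort(ux)

# To keep a one-column data frame instead of a vector, use drop = FALSE
ux_df <- unique(dat[, "x", drop = FALSE])
```

The vector form is usually enough for a quick check; the data frame form is handy if the result feeds into further data frame operations.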
If you have multiple variables and want all unique combinations that appear in the data, you can generalize the above code as follows:
set.seed(5)
dat = data.frame(x=sample(1:10,100, replace=TRUE),
                 y=sample(letters[1:5], 100, replace=TRUE))
dat %>%
  group_by(x,y) %>%
  summarise() %>%
  select(unique.x=x, unique.y=y)
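As an alternative to the group_by/summarise pattern, dplyr also has a distinct function, which may be the more direct tool here. The sketch below assumes a dplyr version that exports distinct, arrange and rename. Note that distinct keeps rows in order of first appearance, so arrange is added to mimic the sorted grouped output. The names ux, uxy and uxy_renamed are only illustrative.

```r
library(dplyr)

# Fake data, built the same way as in the example above
set.seed(5)
dat <- data.frame(x = sample(1:10, 100, replace = TRUE),
                  y = sample(letters[1:5], 100, replace = TRUE))

# Unique values of a single column, returned as a one-column data frame
ux <- dat %>% distinct(x)

# Unique combinations of x and y, sorted to resemble the grouped output
uxy <- dat %>%
  distinct(x, y) %>%
  arrange(x, y)

# Rename the columns afterwards if desired
uxy_renamed <- uxy %>% rename(unique.x = x, unique.y = y)
```

Unlike the two-column summarise result, which can still carry a grouping, the data frame returned here is not grouped, since the input was never grouped.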