Select unique values with 'select' function in 'dplyr' library

野的像风 2021-01-30 10:08

Is it possible to select all unique values from a column of a data.frame using the select function in the dplyr library? Something like \

3 answers
  •  无人共我
    2021-01-30 10:54

    The dplyr select function picks specific columns from a data frame; it does not remove duplicate values. To return the unique values in a particular column, you can use group_by followed by an empty summarise(), which collapses the data to one row per group. For example:

    library(dplyr)
    
    # Fake data
    set.seed(5)
    dat = data.frame(x=sample(1:10,100, replace=TRUE))
    
    # Return the distinct values of x
    dat %>%
      group_by(x) %>%
      summarise() 
    
    #>     x
    #> 1   1
    #> 2   2
    #> 3   3
    #> 4   4
    #> 5   5
    #> 6   6
    #> 7   7
    #> 8   8
    #> 9   9
    #> 10 10
    

    If you want to change the column name you can add the following:

    dat %>%
      group_by(x) %>%
      summarise() %>%
      select(unique.x=x)
    

    This does two things at once: it selects column x from the data frame that dplyr returns (there is only one column in this case) and renames it to unique.x.

    You can also get the unique values directly in base R with unique(dat$x).
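
    A minimal sketch comparing the base R route with dplyr's own `distinct()` verb. Note that `distinct()` is not mentioned in the answer above; it is added here as an assumption about what a shorter dplyr version might look like, and the names `u_base` and `u_dplyr` are made up for illustration:

    ```r
    library(dplyr)

    # Same fake data as in the answer
    set.seed(5)
    dat <- data.frame(x = sample(1:10, 100, replace = TRUE))

    # Base R: returns a plain vector of the unique values
    u_base <- unique(dat$x)

    # dplyr: returns a one-column data frame with one row per unique value
    u_dplyr <- distinct(dat, x)

    # Both approaches should find the same set of values
    setequal(u_base, u_dplyr$x)
    ```

    The main practical difference is the return type: `unique(dat$x)` gives a vector, while `distinct()` keeps you inside a data frame, which is convenient in the middle of a pipeline.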

    If you have multiple variables and want all unique combinations that appear in the data, you can generalize the above code as follows:

    set.seed(5)
    dat = data.frame(x=sample(1:10,100, replace=TRUE), 
                     y=sample(letters[1:5], 100, replace=TRUE))
    
    dat %>% 
      group_by(x,y) %>%
      summarise() %>%
      select(unique.x=x, unique.y=y)
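
    For comparison, the same set of combinations can probably be obtained with `distinct()`, which the answer above does not use; treat this as a sketch rather than the answerer's method. The name `combos` is hypothetical, and the `arrange()` step is only there to mimic the sorted order that group_by plus summarise produces:

    ```r
    library(dplyr)

    # Same fake data as in the answer
    set.seed(5)
    dat <- data.frame(x = sample(1:10, 100, replace = TRUE),
                      y = sample(letters[1:5], 100, replace = TRUE))

    # One row per (x, y) combination that occurs in the data
    combos <- dat %>%
      distinct(x, y) %>%
      arrange(x, y)   # optional: sort by x, then y

    # Sanity check against base R's unique() on the same two columns
    nrow(combos) == nrow(unique(dat[, c("x", "y")]))
    ```

    Unlike the grouped version, the result of `distinct()` carries no grouping, so later verbs operate on the whole table.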
    
