Count unique values across columns in R

轻奢々 2021-01-27 00:42

I am trying to create a new variable with unique counts of string values from two different columns. So I have something like this, for example:

# A tibble: 4 x         
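The example table is cut off in this copy of the question. Going by the output the answers print, the input probably looked something like the sketch below. The object name `df` and the choice of character columns are assumptions: the answers' output shows `<fct>` columns, and the answers refer to the same data as `df1`, `dat`, and `df`.

```r
# Hypothetical reconstruction of the input, inferred from the answers' output.
df <- data.frame(
  names    = c("John", "Mary", "Charlie", "David"),
  partners = c("Mary, Ashley, John, Kate",
               "Charlie, John, Mary, John",
               "Kate, Marcy",
               "Mary, Claire"),
  stringsAsFactors = FALSE  # assumption: the original columns appear to be factors
)

# The answers use different names for the same data.
df1 <- df
dat <- df
```

The goal is one count per row: the number of distinct names among the row's `names` value and the entries in `partners`.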


        
3 Answers
  • 2021-01-27 01:24

    Here is a method using tidyverse, without explicit looping:

    library(tidyverse)
    df1 %>% 
       mutate(partners = str_c(names, partners, sep=", ")) %>%
       separate_rows(partners) %>%
       distinct %>% 
       count(names) %>% 
       right_join(df1)
    # A tibble: 4 x 3
    #  names       n partners                 
    #  <fct>   <int> <fct>                    
    #1 John        4 Mary, Ashley, John, Kate 
    #2 Mary        3 Charlie, John, Mary, John
    #3 Charlie     3 Kate, Marcy              
    #4 David       3 Mary, Claire         
    
  • 2021-01-27 01:35

    There's another way, in base R, with toString:

    dat$uniquecounts <- sapply(strsplit(apply(dat, 1, toString), ", "), 
                               function(x) length(unique(x)))
    
    dat
    #     names                  partners uniquecounts
    # 1    John  Mary, Ashley, John, Kate            4
    # 2    Mary Charlie, John, Mary, John            3
    # 3 Charlie               Kate, Marcy            3
    # 4   David              Mary, Claire            3
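
    To see why this works, it may help to walk through a single row by hand. This is only a sketch: the vector `row` below is a hand-written stand-in for what `apply(dat, 1, ...)` passes to `toString` for the second row, not something taken from the original answer.

    ```r
    # One row of the example data, written out by hand (assumed values).
    row <- c(names = "Mary", partners = "Charlie, John, Mary, John")

    # toString() pastes the elements together with ", " as the separator.
    joined <- toString(row)  # "Mary, Charlie, John, Mary, John"

    # Splitting on ", " puts the name and every partner in one character vector.
    parts <- strsplit(joined, ", ")[[1]]

    # Dropping duplicates leaves "Mary", "Charlie", "John".
    n_unique <- length(unique(parts))  # 3
    ```

    Because the name and the partners end up in the same vector before `unique()` is applied, a person who also appears in their own partner list is counted only once.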
    
  • 2021-01-27 01:37

    With tidyverse, first convert the factor columns to character. Then use map2_dbl to go through names and partners in parallel: split each partners string into a vector of individual names, combine it with the value from names, and count the distinct values with n_distinct.

    library(tidyverse)
    
    df %>%
      mutate_all(as.character) %>%
      mutate(uniquecounts = map2_dbl(names, partners, 
                           ~ n_distinct(c(.x, str_split(.y, ", ")[[1]]))))
    
    
    #    names                    partners uniquecounts
    #1    John  Mary, Ashley, John, Kate            4
    #2    Mary Charlie, John, Mary, John            3
    #3 Charlie               Kate, Marcy            3
    #4   David              Mary, Claire            3
    

    The same logic in base R:

    df[] <- lapply(df, as.character)
    as.numeric(mapply(function(x, y) length(unique(c(x, y))), 
              df$names, strsplit(df$partners, ", ")))
    #[1] 4 3 3 3
    