I am trying to create a new variable with unique counts of string values from two different columns. So I have something like this, for example:
# A tibble: 4 x
Here is a method using tidyverse without looping:
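library(tidyverse)
df1 %>%
  # prepend each person's own name to their partner list
  mutate(partners = str_c(names, partners, sep = ", ")) %>%
  # one row per individual name, then drop repeats within each person
  separate_rows(partners) %>%
  distinct() %>%
  # the number of remaining rows per person is the unique count
  count(names) %>%
  # bring back the original partners column
  right_join(df1, by = "names")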
# A tibble: 4 x 3
# names n partners
# <fct> <int> <fct>
#1 John 4 Mary, Ashley, John, Kate
#2 Mary 3 Charlie, John, Mary, John
#3 Charlie 3 Kate, Marcy
#4 David 3 Mary, Claire
There's another way, with toString:
dat$uniquecounts <- sapply(strsplit(apply(dat, 1, toString), ", "),
                           function(x) length(unique(x)))
dat
# names partners uniquecounts
# 1 John Mary, Ashley, John, Kate 4
# 2 Mary Charlie, John, Mary, John 3
# 3 Charlie Kate, Marcy 3
# 4 David Mary, Claire 3
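To try this on a reproducible input, here is a minimal sketch. The data frame `dat` is rebuilt by hand from the printed output above, so it is an assumption about what the original data looked like (character columns, partners separated by ", "), not the original poster's object.

```r
# Hypothetical reconstruction of the input, based on the printed output above
dat <- data.frame(
  names = c("John", "Mary", "Charlie", "David"),
  partners = c("Mary, Ashley, John, Kate",
               "Charlie, John, Mary, John",
               "Kate, Marcy",
               "Mary, Claire"),
  stringsAsFactors = FALSE
)

# Paste each row (name plus partners) into one comma-separated string
row_strings <- apply(dat, 1, toString)

# Split each string back into individual names and count the distinct ones
uniquecounts <- sapply(strsplit(row_strings, ", "),
                       function(x) length(unique(x)))

print(as.vector(uniquecounts))
# [1] 4 3 3 3
```

Computing `uniquecounts` before assigning it to `dat` keeps `apply()` from pulling an existing count column into the pasted string if the code is run a second time.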
With tidyverse, first convert the factor columns to character, then use map2 to split partners into an individual vector of strings and count the unique values, combined with names, using n_distinct.
library(tidyverse)
df %>%
  mutate_all(as.character) %>%
  mutate(uniquecounts = map2_dbl(names, partners,
                                 ~ n_distinct(c(.x, str_split(.y, ", ")[[1]]))))
# names partners uniquecounts
#1 John Mary, Ashley, John, Kate 4
#2 Mary Charlie, John, Mary, John 3
#3 Charlie Kate, Marcy 3
#4 David Mary, Claire 3
With the same logic in base R:
df[] <- lapply(df, as.character)
as.numeric(mapply(function(x, y) length(unique(c(x, y))),
                  df$names, strsplit(df$partners, ", ")))
#[1] 4 3 3 3
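The base R logic can also be wrapped in a small function so it is easy to reuse on other columns. This is only a sketch: the name `count_unique` is hypothetical, and the sample vectors are reconstructed from the printed output above rather than taken from the original data. It converts factors to character inside the function and splits on a comma followed by optional whitespace, which is an assumption about how the partner strings are formatted.

```r
# Hypothetical helper: number of distinct people per row,
# counting the row's own name together with its partners
count_unique <- function(names, partners) {
  names <- as.character(names)  # tolerate factor input
  partner_list <- strsplit(as.character(partners), ",\\s*")
  # pair each name with its own partner vector and count distinct values
  counts <- mapply(function(x, y) length(unique(c(x, y))),
                   names, partner_list)
  unname(counts)  # drop the names that mapply attaches
}

# Sample input reconstructed from the output shown above (an assumption)
people <- c("John", "Mary", "Charlie", "David")
partners <- c("Mary, Ashley, John, Kate",
              "Charlie, John, Mary, John",
              "Kate, Marcy",
              "Mary, Claire")

result <- count_unique(people, partners)
print(result)
# [1] 4 3 3 3
```

Because the conversion to character happens inside the function, the caller's data frame is left unchanged, unlike the `df[] <- lapply(df, as.character)` step, which overwrites every column.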