R: Generate a dummy variable based on the existence of one column's value in another column

Question

I have a data frame like this:

A                    B          
2012,2013,2014     2011
2012,2013,2014     2012
2012,2013,2014     2013
2012,2013,2014     2014
2012,2013,2014     2015

I want to create a dummy variable that indicates whether the value in column B exists in column A: 1 if it does, 0 if it does not. The result should look like this:

A                    B       dummy        
2012,2013,2014     2011        0
2012,2013,2014     2012        1
2012,2013,2014     2013        1
2012,2013,2014     2014        1
2012,2013,2014     2015        0

I have tried to use %in% to achieve this:

df$dummy <- ifelse(df$B %in% df$A, 1, 0)

but every value in the dummy column turned out to be 1.

The same thing happened when I tried another method, any():

df$dummy <- any(df$A==df$B)

Every value in the dummy column is TRUE.

Is there an efficient way to generate this dummy variable?

Many thanks!

Answer 1:

It looks like column A holds a single string of comma-separated numbers, so %in% is not appropriate here: it tests whether each value of B equals one of the whole strings in A. It would work if A were a vector of separate strings, or if A and B were both numeric. If your data frame is structured differently, please let me know (and feel free to edit your question).
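To illustrate the point about %in%, here is a minimal base R sketch. It assumes the data frame from the Data section of this answer, with both columns stored as character; the names whole_string_match and years are only illustrative. Splitting each A into a vector of years with strsplit makes %in% usable one row at a time.

```r
# Assumed data: same values as in the Data section, stored as character
df <- data.frame(
  A = rep("2012,2013,2014", 5),
  B = c("2011", "2012", "2013", "2014", "2015"),
  stringsAsFactors = FALSE
)

# %in% compares each B with the whole strings in A, so nothing matches here
whole_string_match <- df$B %in% df$A

# Split each A into a vector of years, then test membership row by row
years <- strsplit(df$A, ",", fixed = TRUE)
df$dummy <- as.integer(mapply(function(b, a) b %in% a, df$B, years))
```

Because each B is compared with complete years rather than with substrings, a partial value such as "201" would not count as a match.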

There are several ways to do this. An easy one is to apply grepl one row at a time to check whether the value in column B occurs in A.

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(dummy = +grepl(B, A))

Output

# A tibble: 5 x 3
  A              B     dummy
  <fct>          <fct> <int>
1 2012,2013,2014 2011      0
2 2012,2013,2014 2012      1
3 2012,2013,2014 2013      1
4 2012,2013,2014 2014      1
5 2012,2013,2014 2015      0

Data

df <- data.frame(
  A = c(rep("2012,2013,2014", 5)),
  B = c("2011", "2012", "2013", "2014", "2015")
)

Answer 2:

If you want to use base R:

df <- data.frame(A = rep("2012,2013,2014", 5), B = c("2011", "2012","2013","2014","2015"))

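# Start with a 0/1 column, then fill it in row by row
df$dummy <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  # fixed = TRUE matches the value of B literally rather than as a regex
  df$dummy[i] <- as.integer(grepl(df$B[i], df$A[i], fixed = TRUE))
}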
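If the explicit loop feels heavy, the same row-by-row check can be sketched with mapply, which calls grepl once per pair of B and A values. This is only an alternative under the same assumptions (character columns, literal matching), not a different result.

```r
df <- data.frame(
  A = rep("2012,2013,2014", 5),
  B = c("2011", "2012", "2013", "2014", "2015"),
  stringsAsFactors = FALSE
)

# Call grepl pairwise: pattern = B[i], x = A[i]; fixed = TRUE matches B literally
df$dummy <- as.integer(
  mapply(grepl, df$B, df$A, MoreArgs = list(fixed = TRUE))
)
```

as.integer converts the logical result to 0/1 and drops the names that mapply attaches.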

Answer 3:

Making a tab-separated file:

A   B          
2012,2013,2014  2011
2012,2013,2014  2012
2012,2013,2014  2013
2012,2013,2014  2014
2012,2013,2014  2015

Here's a way using str_detect from stringr:

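library(dplyr)
library(stringr)

read.table('test.txt', header = TRUE) %>%
  mutate(
    # Convert both columns to character so they can be compared as text
    A = as.character(A),
    B = as.character(B),
    dummy = case_when(
      # Search each A for the literal value of B from the same row
      str_detect(A, fixed(B)) ~ 1,
      TRUE ~ 0
    )
  )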
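For readers who want to try str_detect without creating test.txt, here is a small sketch on an in-memory data frame. It assumes B arrives as integers, as read.table would typically produce for this file. str_detect is vectorised over both the string and the pattern, so no row-wise step is needed.

```r
library(stringr)

df <- data.frame(
  A = rep("2012,2013,2014", 5),
  B = 2011:2015,               # integer column, as assumed above
  stringsAsFactors = FALSE
)

# Element i of B is searched for in element i of A;
# fixed() requests a literal match instead of a regular expression
df$dummy <- as.integer(str_detect(df$A, fixed(as.character(df$B))))
```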

Answer 4:

Here is another solution using the tidyverse. The main problem is that A is read as a single string. My solution first separates the numbers into different columns and then compares B to them.

library(tidyverse)

df %>%
  # Separate A into one column per number (assumes exactly three per row)
  separate(col = A,
           sep = ",",
           into = c("S1", "S2", "S3")) %>%
  # Compare B to the new columns within each row and fill dummy
  rowwise() %>%
  mutate(dummy = ifelse(B %in% c(S1, S2, S3), 1, 0)) %>%
  ungroup()
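Splitting into S1, S2 and S3 presumes that every A holds exactly three values. As a hedged sketch for rows with differing numbers of values, one option is tidyr's separate_rows, which reshapes to one row per value before collapsing back. The example data and the helper columns row_id and year are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data in which A holds a different number of years per row
df <- data.frame(
  A = c("2012,2013,2014", "2012,2013", "2015"),
  B = c("2011", "2013", "2015"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Keep A intact and split a copy, tagging each original row
  mutate(row_id = row_number(), year = A) %>%
  separate_rows(year, sep = ",") %>%
  # Collapse back to one row per original row: 1 if any year equals B
  group_by(row_id, A, B) %>%
  summarise(dummy = as.integer(any(year == B)), .groups = "drop") %>%
  select(-row_id)
```

Grouping by row_id keeps rows that happen to share the same A and B from being merged.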

Source: https://stackoverflow.com/questions/60133014/r-generate-a-dummy-variable-based-on-the-existence-of-one-column-value-in-anot

Tags

if-statement

dummy-variable

any