R: Generate a dummy variable based on the existence of one column's value in another column

◇◆丶佛笑我妖孽 submitted on 2021-02-08 03:37:16

Question


I have a data frame like this:

A                    B          
2012,2013,2014     2011
2012,2013,2014     2012
2012,2013,2014     2013
2012,2013,2014     2014
2012,2013,2014     2015

I want to create a dummy variable that indicates whether the value in column B exists in column A: 1 if it does, 0 if it does not. The desired result is:

A                    B       dummy        
2012,2013,2014     2011        0
2012,2013,2014     2012        1
2012,2013,2014     2013        1
2012,2013,2014     2014        1
2012,2013,2014     2015        0

I have tried to use %in% to achieve this:

df$dummy <- ifelse(df$B %in% df$A, 1, 0)

but every value in the dummy column turned out to be 1.

The same thing happened when I tried another approach, any():

df$dummy <- any(df$A==df$B)

Every value in the dummy column is TRUE.

Is there an efficient way to generate this dummy variable?

Many thanks!


Answer 1:


It looks like each value in column A is a single string of numbers separated by commas, so %in% is not appropriate: it tests whether a value equals a whole element of a vector, not whether it occurs inside a string. (It would work if you were checking B against a vector of separate strings, or against numbers if A and B were numeric.) If your data frame is structured differently, please let me know (and feel free to edit your question).
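As a minimal sketch of that difference (the vectors a and b are made-up stand-ins for one value of A and two values of B), %in% only matches whole elements, so the comma-separated string has to be split before it can be used:

```r
# One value of column A and two candidate values of column B (illustrative only)
a <- "2012,2013,2014"
b <- c("2011", "2012")

# %in% compares each element of b with the whole string in a
whole <- b %in% a

# Splitting a on commas gives a vector of individual years to match against
years <- strsplit(a, ",", fixed = TRUE)[[1]]
split_match <- b %in% years

print(whole)        # FALSE FALSE
print(split_match)  # FALSE  TRUE
```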

There are several ways to do this. An easy one is to use grepl one row at a time to check whether the value in column B occurs in A.

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(dummy = +grepl(B, A))

Output

# A tibble: 5 x 3
  A              B     dummy
  <fct>          <fct> <int>
1 2012,2013,2014 2011      0
2 2012,2013,2014 2012      1
3 2012,2013,2014 2013      1
4 2012,2013,2014 2014      1
5 2012,2013,2014 2015      0

Data

df <- data.frame(
  A = c(rep("2012,2013,2014", 5)),
  B = c("2011", "2012", "2013", "2014", "2015")
)
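One caveat, sketched below under the assumption that B could hold a value that is only part of a year: grepl does substring (pattern) matching, so a value such as "201" would be reported as present. Splitting A and comparing whole elements avoids that. The B values and the name parts are illustrative and not taken from the question.

```r
df <- data.frame(
  A = rep("2012,2013,2014", 3),
  B = c("2011", "2012", "201"),
  stringsAsFactors = FALSE
)

# Substring matching: "201" is found inside "2012"
substring_hit <- grepl("201", "2012,2013,2014", fixed = TRUE)

# Exact matching: split each A on commas, then test B against the pieces
parts <- strsplit(df$A, ",", fixed = TRUE)
df$dummy <- as.integer(mapply(function(b, a) b %in% a, df$B, parts))

print(df)
```

With exact matching, only the row where B is "2012" gets a 1.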



Answer 2:


If you want to use base R:

df <- data.frame(A = rep("2012,2013,2014", 5), B = c("2011", "2012","2013","2014","2015"))

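df$dummy <- integer(nrow(df))  # preallocate the result column
for (i in seq_len(nrow(df))) {
  # as.character() guards against factor columns; as.integer() gives 1/0 rather than TRUE/FALSE
  df$dummy[i] <- as.integer(grepl(as.character(df$B[i]), as.character(df$A[i])))
}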
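The same row-by-row check can be written without explicit indexing. This is a sketch rather than part of the original answer: mapply calls grepl once per pair of values, fixed = TRUE treats B as a literal string instead of a regular expression, and as.integer produces the 1/0 coding asked for in the question.

```r
df <- data.frame(
  A = rep("2012,2013,2014", 5),
  B = c("2011", "2012", "2013", "2014", "2015"),
  stringsAsFactors = FALSE
)

# grepl(pattern, x) is applied pairwise: pattern = B[i], x = A[i]
hits <- mapply(grepl, df$B, df$A, MoreArgs = list(fixed = TRUE))
df$dummy <- as.integer(hits)

print(df)
```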



Answer 3:


Starting from a tab-separated file, test.txt:

A   B          
2012,2013,2014  2011
2012,2013,2014  2012
2012,2013,2014  2013
2012,2013,2014  2014
2012,2013,2014  2015

Here's a way using str_detect from stringr:

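library(dplyr)
library(stringr)

read.table('test.txt', header = TRUE) %>% 
  mutate(
    # read.table parses B as integer, so convert it before matching
    B = as.character(B),
    dummy = case_when(
      # str_detect() takes the string first; fixed() wraps the pattern
      str_detect(string = as.character(A), pattern = fixed(B)) ~ 1,
      TRUE ~ 0
    )
  )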
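To make this reproducible, the file can be created from R. The sketch below writes the sample data to a temporary file (path is a stand-in for 'test.txt' and is an assumption of this example) and reads it back, which also shows why B needs as.character(): read.table parses it as an integer column.

```r
# Write the sample data to a temporary tab-separated file
# (`path` stands in for 'test.txt' and is an assumption of this sketch)
path <- tempfile(fileext = ".txt")
writeLines(
  c("A\tB", paste("2012,2013,2014", 2011:2015, sep = "\t")),
  path
)

# Read it back; A is kept as text, B is parsed as a number
dat <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)
```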



Answer 4:


Here is another solution using tidyverse. The main problem is that A is being read as a string. My solution first separates each number into different columns, and afterwards compares B to these numbers.

library(tidyverse)

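df %>%
  #Separate A into separate numbers
  separate(col = A,
           sep = ",",
           into = c("S1","S2","S3")) %>%
  #Work row by row; without rowwise(), c(S1,S2,S3) would pool the values from every row
  rowwise() %>%
  #Compare B to the new columns and fill dummy
  mutate(dummy = ifelse(B %in% c(S1,S2,S3), 1, 0)) %>%
  ungroup()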
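The into = c("S1","S2","S3") argument assumes every row of A holds exactly three values. If that might not be true, one possible variant (a sketch, not part of the original answer) splits A into a list column and compares each row's pieces with its own B using purrr::map2_int. The data frame df2 is made up for illustration.

```r
library(dplyr)
library(purrr)

# Rows with different numbers of years (made-up data)
df2 <- data.frame(
  A = c("2012,2013", "2014", "2011,2015,2016"),
  B = c("2012", "2013", "2016"),
  stringsAsFactors = FALSE
)

result <- df2 %>%
  mutate(
    # strsplit() returns one character vector per row; map2_int() pairs each with its B
    dummy = map2_int(strsplit(A, ",", fixed = TRUE), B, ~ as.integer(.y %in% .x))
  )

print(result)
```

This keeps column A intact and does not depend on the number of comma-separated values per row.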


Source: https://stackoverflow.com/questions/60133014/r-generate-a-dummy-variable-based-on-the-existence-of-one-column-value-in-anot
