Question
I have a data frame like this:
A B
2012,2013,2014 2011
2012,2013,2014 2012
2012,2013,2014 2013
2012,2013,2014 2014
2012,2013,2014 2015
I want to create a dummy variable that indicates whether the value in column B exists in column A: 1 if it exists, 0 if it does not. The result should look like this:
A B dummy
2012,2013,2014 2011 0
2012,2013,2014 2012 1
2012,2013,2014 2013 1
2012,2013,2014 2014 1
2012,2013,2014 2015 0
I have tried to use %in% to achieve this:
df$dummy <- ifelse(df$B %in% df$A, 1, 0)
but it turned out that every value in the dummy column is 1.
The same thing happened when I tried another method, any():
df$dummy <- any(df$A==df$B)
Every value in the dummy column is TRUE.
Is there an efficient way to generate this dummy variable?
Many thanks!
Answer 1:
It looks like column A is a string of numbers separated by commas, so %in% would not be appropriate (it would be helpful if, for example, you checked for B inside a vector of multiple strings, or numbers if A and B were numeric). If your data frame structure is different, please let me know (and feel free to edit your question).
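To illustrate the point about %in%, here is a minimal sketch in base R. The variables a and b are made up for this example and stand in for a single row of A and B.

```r
# Hypothetical single values standing in for one row of A and B
a <- "2012,2013,2014"
b <- "2012"

# %in% compares whole elements: "2012" is not equal to the full string in a
in_whole <- b %in% a

# Splitting a on commas gives a vector of separate years to test against
years <- strsplit(a, ",", fixed = TRUE)[[1]]
in_split <- b %in% years

print(c(in_whole = in_whole, in_split = in_split))
```

The first test is FALSE and the second is TRUE, because only the split vector contains "2012" as an element of its own.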
You could probably accomplish this in multiple ways. Perhaps an easy way is to use grepl one row at a time to identify whether column B is present in A.
library(tidyverse)
df %>%
  rowwise() %>%                  # evaluate grepl() one row at a time
  mutate(dummy = +grepl(B, A))   # unary + converts TRUE/FALSE to 1/0
Output
# A tibble: 5 x 3
A B dummy
<fct> <fct> <int>
1 2012,2013,2014 2011 0
2 2012,2013,2014 2012 1
3 2012,2013,2014 2013 1
4 2012,2013,2014 2014 1
5 2012,2013,2014 2015 0
Data
df <- data.frame(
  A = rep("2012,2013,2014", 5),
  B = c("2011", "2012", "2013", "2014", "2015")
)
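One thing to keep in mind is that grepl matches substrings, so a fragment of a year also counts as present. The sketch below uses a made-up variable a holding one value of column A. With the four-digit years in the question this makes no difference.

```r
a <- "2012,2013,2014"  # one value of column A

# grepl() looks for the pattern anywhere in the string
full_year <- grepl("2012", a, fixed = TRUE)  # a complete year from A
partial   <- grepl("201", a, fixed = TRUE)   # only a fragment of a year
absent    <- grepl("2015", a, fixed = TRUE)  # a year that is not in A

print(c(full_year = full_year, partial = partial, absent = absent))
```

Here full_year and partial are both TRUE, while absent is FALSE.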
Answer 2:
If you want to use base R:
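df <- data.frame(A = rep("2012,2013,2014", 5), B = c("2011", "2012", "2013", "2014", "2015"))
df$dummy <- integer(nrow(df))  # preallocate the result column
for (i in seq_len(nrow(df))) {
  # 1 if the value of B is found in A for row i, otherwise 0
  df$dummy[i] <- as.integer(grepl(df$B[i], df$A[i]))
}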
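A loop-free base R variant is also possible. The sketch below assumes that B should match one of the comma-separated values in A exactly, which the question does not state explicitly. has_year is a hypothetical helper name introduced only for this example.

```r
df <- data.frame(
  A = rep("2012,2013,2014", 5),
  B = c("2011", "2012", "2013", "2014", "2015"),
  stringsAsFactors = FALSE
)

# Split each A string into its comma-separated values (a list, one vector per row)
parts <- strsplit(df$A, ",", fixed = TRUE)

# has_year is a hypothetical helper: is b one of the values in p?
has_year <- function(b, p) b %in% p

# Apply the helper to each (B, parts) pair and store the result as 0/1
df$dummy <- as.integer(mapply(has_year, df$B, parts))

print(df)
```

Because each row of A is split separately, this also works when the rows of A hold different numbers of values.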
Answer 3:
First, save the data to a tab-separated file (test.txt in the code below):
A B
2012,2013,2014 2011
2012,2013,2014 2012
2012,2013,2014 2013
2012,2013,2014 2014
2012,2013,2014 2015
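Here's a way using str_detect from stringr:
library(dplyr)
library(stringr)

read.table('test.txt', header = TRUE) %>%
  mutate(
    # read.table() parses B as integer, and A may be a factor; convert both to character
    A = as.character(A),
    B = as.character(B),
    dummy = case_when(
      # fixed() marks B as a literal pattern to search for within A
      str_detect(A, fixed(B)) ~ 1,
      TRUE ~ 0
    )
  )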
Answer 4:
Here is another solution using tidyverse. The main problem is that A is being read as a string. My solution first separates each number into different columns, and afterwards compares B to these numbers.
library(tidyverse)
df %>%
  # Separate A into one column per number
  separate(col = A,
           sep = ",",
           into = c("S1", "S2", "S3")) %>%
  # Compare B to the new columns row by row and fill dummy
  rowwise() %>%
  mutate(dummy = ifelse(B %in% c(S1, S2, S3), 1, 0)) %>%
  ungroup()
Source: https://stackoverflow.com/questions/60133014/r-generate-a-dummy-variable-based-on-the-existence-of-one-column-value-in-anot