Question
I'm new to R. I would like to create a yes/no variable that checks whether the first name OR the last name fuzzy-matches the email address. If either does, the variable should be 'yes' for that row.
Data Example:
id firstname lastname email address match
1 patrick boyles patrickb@gmail.com yes
2 zeke cosmos zeke@gmail.com yes
3 foo foo abcd@gmail.com no
I understand that I need to use agrep. What confuses me is how to tell R to check two columns (first name and last name) and to compare only within the same row.
Thanks -The newbie
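Answer 1:
Here is something to start with. It compares the concatenated first and last name against the part of the email address before the @, using the longest-common-subsequence (LCS) distance from the stringdist package, and marks a row 'yes' when that distance is at most 7 (a cutoff you will probably need to tune for your data):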
library(stringdist)  # run install.packages("stringdist") first if it is not installed
df <- read.table(header = TRUE, text = "id firstname lastname emailaddress match
1 patrick boyles patrickb@gmail.com yes
2 zeke cosmos zeke@gmail.com yes
3 foo foo abcd@gmail.com no")
# LCS distance between "firstnamelastname" and the local part of the email
# (everything before the "@"), computed row by row
d <- with(df, stringdist(a = paste0(firstname, lastname),
                         b = sub("(.*)@.*", "\\1", emailaddress),
                         method = "lcs"))
# Rows within the cutoff count as a match
df$match2 <- ifelse(d <= 7, "yes", "no")
df
# id firstname lastname emailaddress match match2
# 1 1 patrick boyles patrickb@gmail.com yes yes
# 2 2 zeke cosmos zeke@gmail.com yes yes
# 3 3 foo foo abcd@gmail.com no no
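Since the question asks specifically about agrep and about checking first name OR last name within the same row, here is a minimal base R sketch of that variant. It is only one possible approach: the helper name fuzzy_in and the max.distance value of 0.1 are assumptions made for illustration, not something from the original post. agrepl() accepts a single pattern, so mapply() is used to pair each name with the email of its own row.

```r
df <- data.frame(
  id = 1:3,
  firstname = c("patrick", "zeke", "foo"),
  lastname = c("boyles", "cosmos", "foo"),
  emailaddress = c("patrickb@gmail.com", "zeke@gmail.com", "abcd@gmail.com"),
  stringsAsFactors = FALSE
)

# Local part of each address (everything before the "@")
local_part <- sub("@.*$", "", df$emailaddress)

# Hypothetical helper: for each row, does `name` approximately occur in `target`?
# agrepl() is not vectorised over the pattern, hence the mapply() loop over pairs.
fuzzy_in <- function(name, target) {
  mapply(
    function(p, x) agrepl(p, x, max.distance = 0.1, ignore.case = TRUE),
    name, target,
    USE.NAMES = FALSE
  )
}

# "yes" when the first name OR the last name fuzzy-matches the local part
hit <- fuzzy_in(df$firstname, local_part) | fuzzy_in(df$lastname, local_part)
df$match <- ifelse(hit, "yes", "no")
df
```

On the three example rows this gives yes, yes, no. Note that agrepl() looks for the name as an approximate substring of the email, so very short names can match by accident. You may want to tighten max.distance or skip names below some minimum length.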
Source: https://stackoverflow.com/questions/25169422/r-multiple-fuzzy-match-agrep-create-variable