Question
I am trying to use the stringr library to extract emails from a big, messy file.
str_match doesn't allow perl=TRUE, and I can't figure out the escape characters to get it to work.
Can someone recommend a relatively robust regex that would work in the context below?
c("larry@gmail.com", "larry-sally@sally.com", "larry@sally.larry.com")->emails
"SomeRegex"->regex
str_match(emails, regex)
Answer 1:
> "^[[:alnum:]._-]+@[[:alnum:].-]+$"->regex
> str_match(emails, regex)
     [,1]
[1,] "larry@gmail.com"
[2,] "larry-sally@sally.com"
[3,] "larry@sally.larry.com"
The @ sign does not need escaping in a regex. Inside a character class, "." is not special, and "-" is literal when it is the last character. If you want to require an ending such as ".com", ".co", ".edu", or ".org", you should specify how complete that list needs to be.
As pointed out by M42, this is not a sure-fire method. In fact, it is claimed that no sure-fire method exists; see the Stack Overflow question "Using a regular expression to validate an email address".
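To illustrate the point about required endings, here is a minimal sketch in base R. The `tlds` vector is a hypothetical, deliberately incomplete list, and the extra address "larry@sally.net" is made up for the demonstration; neither comes from the original answer.

```r
# Hypothetical list of accepted endings; extend it to suit your data.
tlds <- c("com", "co", "edu", "org")

# Local part, "@", domain text, then a literal dot and one of the listed endings.
pattern <- paste0(
  "^[[:alnum:]._-]+@[[:alnum:].-]+\\.(",
  paste(tlds, collapse = "|"),
  ")$"
)

emails <- c("larry@gmail.com", "larry-sally@sally.com",
            "larry@sally.larry.com", "larry@sally.net")

# TRUE where the address ends in one of the listed endings.
has_listed_tld <- grepl(pattern, emails)
print(has_listed_tld)
```

Any address whose ending is missing from the list is rejected, which is why the completeness of the list matters.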
Answer 2:
I found this regex worked better for me:
^[[:alnum:]._-]+@[[:alnum:].-]+$
A dash does have a special meaning in a character class unless it is the first or last character: it is the range operator, as in "A-Z".
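The effect of the dash position can be checked with a small sketch in base R. The test strings are made up, and `perl = TRUE` is an assumption chosen here so that the range is interpreted by code point; stringr itself is not used.

```r
x <- c("larry-sally", "larry_sally", "larry.sally", "LARRY")

# Dash between "." and "_": a range running from "." to "_",
# which does not contain "-" itself.
range_class <- "^[[:alnum:].-_]+$"

# Dash in last position: a literal "-".
literal_class <- "^[[:alnum:]._-]+$"

range_result   <- grepl(range_class, x, perl = TRUE)
literal_result <- grepl(literal_class, x, perl = TRUE)

# The range also admits characters that were never intended, such as "@".
at_in_range <- grepl("^[.-_]+$", "@", perl = TRUE)

print(range_result)
print(literal_result)
print(at_in_range)
```

With the dash in the middle, "larry-sally" is rejected while "@" is accepted, which is the opposite of what the pattern was meant to do.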
Answer 3:
Actually, I'd recommend a longer regex, since the solutions above accept an address like "test@test.com." with a trailing dot.
isMail <- function(x) {
  # TRUE for each element of x that matches the pattern
  grepl("^[[:alnum:]._-]+@[[:alnum:].-]+$", x)
}
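Note that the pattern inside isMail has the same character classes as the earlier answers, so it still accepts a trailing dot. One possible tightening is sketched below. `isMailStrict` is a hypothetical name, and its regex is an assumption about what a stricter pattern could look like, not the answer author's own.

```r
# Hypothetical stricter variant: the domain must be two or more labels
# separated by single dots, so it cannot end in "." or contain "..".
isMailStrict <- function(x) {
  grepl("^[[:alnum:]._-]+@[[:alnum:]-]+(\\.[[:alnum:]-]+)+$", x)
}

strict_result <- isMailStrict(c("test@test.com", "test@test.com.",
                                "larry@sally.larry.com"))
print(strict_result)
```

This variant also requires at least one dot in the domain, so an address such as "larry@localhost" would be rejected; whether that is desirable depends on the data.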
Source: https://stackoverflow.com/questions/19341554/regular-expression-in-base-r-regex-to-identify-email-address