Regular expression in base R: regex to identify email addresses

星月不相逢 2020-12-19 21:32

I am trying to use the stringr package to extract email addresses from a big, messy file.

str_match doesn't allow perl=TRUE, and I can't figure out the escape characters t…
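Because the goal is to pull addresses out of messy text rather than to check whole strings, note that anchored patterns (^ … $) only match when the entire string is an address. Below is a minimal base R sketch of extraction. It assumes the file has already been read into a character vector, and the sample lines and the pattern are illustrative assumptions, not taken from the question. It drops the anchors and uses regmatches() with gregexpr(), so perl = TRUE is not needed:

```r
# Illustrative sample lines (assumption); in practice something like
# readLines("messy.txt") (hypothetical file name) would supply this vector.
txt <- c("Contact larry@gmail.com or larry-sally@sally.com.",
         "no address on this line",
         "cc: larry@sally.larry.com; bad@address")

# Unanchored pattern (no ^ or $), so an address can appear anywhere in a line.
# It requires a final dot plus 2+ letters, so "bad@address" is skipped and a
# sentence-ending period is not pulled into the match.
pattern <- "[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"

# gregexpr() locates every match in each line; regmatches() returns the text.
found <- unlist(regmatches(txt, gregexpr(pattern, txt)))
print(found)
```

With stringr, str_extract_all(txt, pattern) returns the same matches as a list, one element per line.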

3 Answers
  • 2020-12-19 21:44

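    I'd recommend a longer regex, since the shorter patterns in the other answers also accept an address with a trailing dot, such as "test@test.com.".

    isMail <- function(x) {
      # Local part, "@", domain, then a final "." and at least two letters,
      # so an address ending in a dot (e.g. "test@test.com.") is rejected.
      grepl("^[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$", x)
    }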
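    To make the trailing-dot problem concrete, this sketch compares the shorter pattern used in the other answers with a stricter variant that additionally requires a final dot followed by at least two letters. The stricter pattern and the sample addresses are assumptions for illustration, not a complete email grammar.

    ```r
    # Shorter pattern: the domain class includes ".", so a trailing dot passes.
    short_re <- "^[[:alnum:]._-]+@[[:alnum:].-]+$"
    # Hypothetical stricter variant: the domain must end in "." plus 2+ letters.
    strict_re <- "^[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$"

    x <- c("test@test.com", "test@test.com.", "larry@sally.larry.com")
    short_ok <- grepl(short_re, x)
    strict_ok <- grepl(strict_re, x)
    print(short_ok)   # TRUE TRUE TRUE: the trailing dot slips through
    print(strict_ok)  # TRUE FALSE TRUE
    ```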
    
  • 2020-12-19 21:57
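    > library(stringr)
    > emails <- c("larry@gmail.com", "larry-sally@sally.com", "larry@sally.larry.com")
    > email_re <- "^[[:alnum:]._-]+@[[:alnum:].-]+$"
    > str_match(emails, email_re)
         [,1]                   
    [1,] "larry@gmail.com"      
    [2,] "larry-sally@sally.com"
    [3,] "larry@sally.larry.com"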
    

    The @ sign does not need escaping in a regex. Inside a character class, "." is literal, and "-" is literal only when it is the first or last character; anywhere else it forms a range. If you want to require a top-level domain such as ".com", ".co", ".edu" or ".org", you need to decide how complete that list must be.

    As M42 pointed out, this is not a sure-fire method. In fact, it has been argued that no regex can reliably validate every email address; see "Using a regular expression to validate an email address".
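    If you do want the top-level-domain requirement mentioned above, one option is to build an alternation from a whitelist. This is a sketch under assumptions: the four-entry list is hypothetical, and the domain part is tightened to dot-separated labels, which is a choice made here, not something from the answer.

    ```r
    # Hypothetical whitelist; deciding how complete it must be is the open question.
    tlds <- c("com", "co", "edu", "org")

    # Local part, "@", one or more "label." groups, then exactly one listed TLD.
    tld_re <- paste0("^[[:alnum:]._-]+@([[:alnum:]-]+\\.)+(",
                     paste(tlds, collapse = "|"), ")$")

    x <- c("larry@gmail.com", "larry@sally.larry.com", "x@site.co",
           "x@site.net", "x@site.com.")
    ok <- grepl(tld_re, x)
    print(ok)  # TRUE TRUE TRUE FALSE FALSE
    ```

    Any domain outside the list (".net" here) is rejected, so the list has to be kept in step with whatever addresses you expect.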

  • 2020-12-19 21:57

    I found this regex worked better for me:

    ^[[:alnum:]._-]+@[[:alnum:].-]+$
    

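    A dash does have a special meaning in a character class unless it is the first or last character: it is the range operator, as in "A-Z". So a class written as [[:alnum:].-_] treats ".-_" as a range from "." to "_", not as three literal characters.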
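    The range behaviour is easy to check. This sketch uses perl = TRUE so that ranges are compared by code point; R's documentation notes that with the default engine the interpretation of ranges is locale-dependent. The test characters are arbitrary choices.

    ```r
    chars <- c("@", "/", "A", "-", "a")

    # Dash in the middle: "." to "_" is a range (0x2E to 0x5F), which covers
    # "@", "/" and "A" but not "-" itself (0x2D) or "a" (0x61).
    mid_dash <- grepl("^[.-_]$", chars, perl = TRUE)

    # Dash last: a literal "-", so the class is just ".", "_" and "-".
    last_dash <- grepl("^[._-]$", chars, perl = TRUE)

    print(mid_dash)   # TRUE TRUE TRUE FALSE FALSE
    print(last_dash)  # FALSE FALSE FALSE TRUE FALSE
    ```

    This is also why [[:alnum:].-_] can reject a local part such as "larry-sally" in an engine that compares ranges by code point: the literal "-" is not inside the range.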
