How to detect email addresses within arbitrary strings

予麋鹿 2021-01-16 12:10

I'm using the following code to detect an email address in a string. It works fine except for addresses with a purely numeric prefix, such as "536264846@gmail.com". Is it pos

1 answer
  •  慢半拍i (original poster)
    2021-01-16 12:27

    What I did in the past:

    • tokenize the input, e.g. split it on spaces (most other common separators can legally appear inside an email address). This step may be unnecessary if the regular expression is not anchored, but I'm not sure how well the pattern would work without the "^" and "$" anchors (which I added to the version shown on the web site).

    • keep in mind that addresses may take the form '"string" <address>' as well as just a bare address

    • in each token, look for '@', as it's probably the best indicator you have that the token is an email address

    • run the token through the regular expression shown on this Email Detector comparison site (I found in testing that the one marked #1 as of 3/21/2013 worked best)
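
    The steps above can be sketched roughly as follows. This is a minimal illustration rather than the code I actually shipped: the function name EmailCandidates and the set of characters trimmed from each token are assumptions, and the regular expression is passed in so that any anchored pattern can be used. Because it splits on whitespace, it will not handle quoted local parts that contain spaces.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the whitespace-separated tokens of `text`
// that contain '@' and match the (anchored) `regex`.
static NSArray<NSString *> *EmailCandidates(NSString *text, NSRegularExpression *regex) {
    NSMutableArray<NSString *> *found = [NSMutableArray array];

    // Assumption: strip the angle brackets of the '"string" <address>' form
    // and a little trailing punctuation from both ends of each token.
    NSCharacterSet *wrappers = [NSCharacterSet characterSetWithCharactersInString:@"<>,;"];

    NSArray<NSString *> *tokens =
        [text componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    for (NSString *rawToken in tokens) {
        NSString *token = [rawToken stringByTrimmingCharactersInSet:wrappers];

        // Cheap pre-filter: skip tokens without '@' before running the regex.
        if ([token rangeOfString:@"@"].location == NSNotFound) {
            continue;
        }

        NSRange whole = NSMakeRange(0, token.length);
        if ([regex firstMatchInString:token options:0 range:whole] != nil) {
            [found addObject:token];
        }
    }
    return found;
}
```

    Called with a compiled NSRegularExpression and a string such as @"Contact \"Bob\" <536264846@gmail.com> today", it returns only the tokens that pass both the '@' check and the pattern.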

    What I did was put the regular expression in a text file, so I didn't need to escape it:

    ^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$

    Defined an ivar:

    NSRegularExpression *reg;
    

    Created the regular expression:

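    // Load the pattern from a bundled text file so it needs no string-literal escaping.
    NSString *fullPath = [[NSBundle mainBundle] pathForResource:@"EMailRegExp" ofType:@"txt"];
    NSError *error = nil;
    NSString *pattern = [NSString stringWithContentsOfFile:fullPath encoding:NSUTF8StringEncoding error:&error];
    // Trim the trailing newline many editors append; it would otherwise become part of the pattern.
    pattern = [pattern stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    reg = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:&error];
    // Cocoa convention: test the returned object, not the error pointer.
    assert(reg != nil);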
    

    Then wrote a method to do the comparison:

    - (BOOL)isValidEmail:(NSString *)string
    {
        // The pattern is anchored with ^ and $, so the whole string must be an address.
        NSTextCheckingResult *match = [reg firstMatchInString:string options:0 range:NSMakeRange(0, [string length])];
        return match != nil;
    }
    

    EDIT: I've turned the above into a project on GitHub

    EDIT2: for an alternative that is less rigorous but faster, see the comment section of this question
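
    To give an idea of what a less rigorous approach might look like, here is a minimal sketch. The pattern is a commonly seen simplified one that I am assuming for illustration; it is not the pattern from the comments and is far from RFC-complete. The function name LooseEmailMatches is hypothetical. Because the pattern has no "^" and "$" anchors, it can be run over the whole string without tokenizing first.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every substring of `text` that loosely
// resembles an email address.
static NSArray<NSString *> *LooseEmailMatches(NSString *text) {
    // Assumed, deliberately permissive pattern with no ^/$ anchors,
    // so it can match anywhere inside the string.
    NSString *pattern = @"[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}";

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return @[];
    }

    NSMutableArray<NSString *> *results = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        // Each result carries the range of one match in the original string.
        [results addObject:[text substringWithRange:match.range]];
    }
    return results;
}
```

    The trade-off is the usual one: a loose pattern like this accepts some strings that are not valid addresses and rejects some unusual valid ones, so it suits detection better than validation.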
