Regex to extract top level domain from email address

一向 2021-01-27 14:01

From email address like

xxx@site.co.uk
xxx@site.uk
xxx@site.me.uk

I want to write a regex that returns 'uk' in all of these cases.

4 Answers
  • 2021-01-27 14:18

    The regex to extract what you are asking for is:

    \.([^.\n\s]*)$  with /gm modifiers
    

    explanation:

        \. matches the character . literally
    1st Capturing group ([^.\n\s]*)
        [^.\n\s]* match a single character not present in the list below
            Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
            . the literal character .
            \n matches a line-feed (newline) character (ASCII 10)
            \s match any white space character [\r\n\t\f ]
    $ assert position at end of a line
    m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
    g modifier: global. Return all matches (don't stop at the first match)
    

    for your input example, it will be:

    import re

    # Sample input: one address per line
    data = 'xxx@site.co.uk\nxxx@site.uk\nxxx@site.me.uk'
    m = re.compile(r'\.([^.\n\s]*)$', re.M)
    f = m.findall(data)
    print(f)
    

    output:

    ['uk', 'uk', 'uk']
    

    Hope this helps.

  • 2021-01-27 14:21

    Wouldn't simply .*\.(\w+) do the job?

    You can add more validation for "@" to the regular expression if needed.
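
One way to add the "@" check mentioned above is to anchor the pattern on the domain part. This is only a minimal sketch, not a full email validator; the pattern and the helper name extract_tld are assumptions made for illustration.

```python
import re

# Hypothetical pattern: require an "@", then capture the text after the
# last dot of the domain part.
TLD_AFTER_AT = re.compile(r'@[^@\s]*\.(\w+)$')


def extract_tld(address):
    """Return the last dot-separated label of the domain, or None."""
    match = TLD_AFTER_AT.search(address)
    return match.group(1) if match else None


for address in ('xxx@site.co.uk', 'xxx@site.uk', 'xxx@site.me.uk'):
    print(address, '->', extract_tld(address))
```

Input without an "@", or with no dot after the "@", yields None rather than a partial match.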

  • 2021-01-27 14:34

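    Since myemail@com is a valid address (no dot in the domain), you can use the pattern below. Note the lazy .*?: a greedy .* would leave only the last character for the capture group.

    @.*?([^.]+)$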
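
Whether the .* before the capture group is greedy or lazy changes what the group captures. The sketch below compares the two variants on the addresses from the question plus myemail@com; the variable names are illustrative.

```python
import re

addresses = ['xxx@site.co.uk', 'xxx@site.uk', 'xxx@site.me.uk', 'myemail@com']

greedy = re.compile(r'@.*([^.]+)$')   # greedy .* leaves one character for the group
lazy = re.compile(r'@.*?([^.]+)$')    # lazy .*? lets the group take the whole last label

greedy_results = [greedy.search(a).group(1) for a in addresses]
lazy_results = [lazy.search(a).group(1) for a in addresses]

print(greedy_results)  # ['k', 'k', 'k', 'm']
print(lazy_results)    # ['uk', 'uk', 'uk', 'com']
```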
    
  • 2021-01-27 14:38

    You don't need regex. This would always give you 'uk' in your examples:

    >>> url = 'foo@site.co.uk'
    >>> url.split('.')[-1]
    'uk'
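
If the local part may contain dots and the domain may not (for example first.last@localhost), splitting the whole string on '.' returns the wrong piece. A small sketch that first isolates the domain, still without regex; the function name tld_without_regex is made up for this example.

```python
def tld_without_regex(address):
    """Return the last label of the domain part, using only str methods."""
    # Keep only what follows the last "@", so dots in the local part are ignored.
    domain = address.rpartition('@')[2]
    # Take the last dot-separated label; a dotless domain is returned whole.
    return domain.rsplit('.', 1)[-1]


print(tld_without_regex('foo@site.co.uk'))
print(tld_without_regex('first.last@localhost'))
```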
    