Regex to extract top level domain from email address


From email addresses like

xxx@site.co.uk
xxx@site.uk
xxx@site.me.uk

I want to write a regex that returns 'uk' in all of these cases.


4 Answers

时光说笑

2021-01-27 14:18

The regex to extract what you are asking for is:

\.([^.\n\s]*)$  with /gm modifiers

Explanation:

    \. matches the character . literally
1st Capturing group ([^.\n\s]*)
    [^.\n\s]* match a single character not present in the list below
        Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        . the literal character .
        \n matches a line-feed (newline) character (ASCII 10)
        \s match any white space character [\r\n\t\f ]
$ assert position at end of a line
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
g modifier: global. All matches

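For your example input, in Python:

import re

data = "xxx@site.co.uk\nxxx@site.uk\nxxx@site.me.uk"
# re.M makes $ match at the end of each line, not only at the end of the string
pattern = re.compile(r'\.([^.\n\s]*)$', re.M)
print(pattern.findall(data))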

Output:

['uk', 'uk', 'uk']
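
If the addresses arrive one at a time rather than as a multi-line block, the same pattern can be wrapped in a small helper. This is only a sketch: `last_label` and `TLD_PATTERN` are names made up for illustration, and the behaviour for inputs without a dot (returning `None`) is an assumption about what you would want.

```python
import re

# Same pattern as in the answer. Without re.M, $ only matches at the end of
# the string (or just before a single trailing newline).
TLD_PATTERN = re.compile(r'\.([^.\n\s]*)$')

def last_label(address):
    """Return the text after the final dot, or None if the pattern does not match."""
    match = TLD_PATTERN.search(address)
    return match.group(1) if match else None

for address in ['xxx@site.co.uk', 'xxx@site.uk', 'xxx@site.me.uk']:
    print(address, '->', last_label(address))
```

Because the group uses `*`, an address that ends in a dot yields an empty string rather than `None`; switching to `+` would reject such input instead.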

Hope this helps.


渐次进展

2021-01-27 14:21

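Wouldn't simply .*\.(\w+) do? The greedy .* runs up to the last dot, so the group captures whatever word characters follow it.

You can add further validation for the "@" to the regular expression if needed.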
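
One possible way to add that "@" check, as a rough sketch: the stricter pattern and the name `tld_with_at_check` below are illustrative assumptions, not a full email validator. It requires exactly one "@" with something on each side, and at least one dot in the domain.

```python
import re

# local part, "@", domain text, then a final dot followed by the captured label
EMAIL_TLD = re.compile(r'^[^@\s]+@[^@\s]+\.(\w+)$')

def tld_with_at_check(address):
    """Return the last domain label, or None if the input does not look like an email."""
    match = EMAIL_TLD.match(address)
    return match.group(1) if match else None

print(tld_with_at_check('xxx@site.co.uk'))   # matches: has "@" and a dotted domain
print(tld_with_at_check('not-an-email.uk'))  # rejected: no "@"
```

Note that this version deliberately rejects dotless domains such as `myemail@com`; whether that is acceptable depends on the input you expect.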

离开以前

2021-01-27 14:34
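As myemail@com is a valid address (the domain does not have to contain a dot), you can use:
```
@(?:.*\.)?([^.]+)$
```
The "everything up to the last dot" part is optional, so the group captures the whole domain when there is no dot and only the last label otherwise.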
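
A quick way to check a pattern of this kind against both dotted and dotless domains is sketched below. The pattern in the code wraps "everything up to the last dot" in an optional non-capturing group; a bare greedy `.*` placed directly before `([^.]+)$` would grab as much as it can and leave the group only the final character. The name `PATTERN` is just for illustration.

```python
import re

# Optional non-capturing group: "everything up to the last dot", if any dot exists.
PATTERN = re.compile(r'@(?:.*\.)?([^.]+)$')

for address in ['xxx@site.co.uk', 'xxx@site.me.uk', 'myemail@com']:
    match = PATTERN.search(address)
    print(address, '->', match.group(1) if match else None)
```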
轮回少年

2021-01-27 14:38
You don't need regex. This would always give you 'uk' in your examples:
```
>>> url = 'foo@site.co.uk'
>>> url.split('.')[-1]
'uk'
```
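
If the input might contain dots in the local part or no "@" at all, the same no-regex idea can be made slightly more defensive. This is a sketch under the assumption that everything after the last "@" is the domain; `tld_without_regex` is a hypothetical helper name.

```python
def tld_without_regex(address):
    """Return the last dot-separated label of the domain, or None if there is no domain."""
    # rpartition splits on the last "@"; `at` is empty when no "@" is present.
    _, at, domain = address.rpartition('@')
    if not at or not domain:
        return None
    # rsplit with maxsplit=1 only cuts at the final dot; a dotless domain is returned whole.
    return domain.rsplit('.', 1)[-1]

for address in ['foo@site.co.uk', 'myemail@com', 'first.last']:
    print(address, '->', tld_without_regex(address))
```

Splitting off the domain first means a string such as `first.last` is rejected rather than reported as having the label `last`.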