regex needed to strip out domain name

终归单人心

I need a regexp to strip out just the domain name part of a URL. For example, if I had the following URL:

http://www.website-2000.com

the bit I'd want the

5 Answers

不要未来只要你来

2020-12-20 01:55

This one should work. It may have flaws, but none that I can think of right now. If anyone wants to improve on it, feel free to do so.

/http:\/\/(?:www\.)?([a-z0-9\-]+)(?:\.[a-z\.]+[\/]?).*/i

http:\/\/            matches the "http://" part
(?:www\.)?           is a non-capturing group that matches zero or one "www."
([a-z0-9\-]+)        is a capturing group that matches character ranges a-z, 0-9
                     in addition to the hyphen. This is what you wanted to extract.
(?:\.[a-z\.]+[\/]?)  is a non-capturing group that matches the TLD part (i.e. ".com",
                     ".co.uk", etc) in addition to zero or one "/"
.*                   matches the rest of the url

http://rubular.com/r/ROz13NSWBQ
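To try the pattern outside Rubular, here is a minimal Python sketch. The function name `extract_name` and the extra test URLs are assumptions made for illustration; the `/i` flag is expressed as `re.IGNORECASE`. It also shows two limits of the pattern as written: a subdomain other than `www` is captured instead of the domain, and `https` URLs do not match at all.

```python
import re

# The pattern from the answer, with the /i flag expressed as re.IGNORECASE.
DOMAIN_RE = re.compile(
    r"http:\/\/(?:www\.)?([a-z0-9\-]+)(?:\.[a-z\.]+[\/]?).*",
    re.IGNORECASE,
)


def extract_name(url):
    """Return the first captured label, or None if the URL does not match.

    Hypothetical helper, not part of the original answer.
    """
    match = DOMAIN_RE.match(url)
    return match.group(1) if match else None


print(extract_name("http://www.website-2000.com"))   # website-2000
print(extract_name("http://blog.example.com/page"))  # blog (subdomain, not the domain)
print(extract_name("https://www.website-2000.com"))  # None (only http:// is handled)
```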


滥情空心

2020-12-20 02:01

http://www\.([^/]+)

~~No need to use regexp, use the urlparse module~~

~~>>> from urlparse import urlparse >>> '.'.join(urlparse("http://www.website-2000.com").netloc.split('.')[-2:]) 'website-2000.com'~~
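The struck-through snippet targets Python 2, where the function lives in the `urlparse` module. In Python 3 it is `urllib.parse.urlparse`. A rough equivalent, assuming Python 3, is sketched below; the helper name `last_two_labels` is hypothetical. Note that it returns `website-2000.com` rather than just `website-2000`, and that keeping the last two labels gives the wrong answer for multi-part suffixes such as `.co.uk`.

```python
from urllib.parse import urlparse


def last_two_labels(url):
    """Join the last two dot-separated labels of the URL's host.

    Hypothetical helper; a naive approach that ignores multi-part TLDs.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


print(last_two_labels("http://www.website-2000.com"))  # website-2000.com
print(last_two_labels("http://www.example.co.uk/"))    # co.uk (shows the limitation)
```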


盖世英雄少女心

2020-12-20 02:02
With this one you don't have to worry about the scheme in front (http, https, ftp, etc.), and it also captures all your subdomains.
```
/(?:www\.)?([a-z0-9\-.]+)(?:\.[a-z\.]+[\/]?).*/i
```
The only times it fails that I've found are:
- If a . precedes the domain/subdomain without any text before it, the . is included in the regex capture.
- Emails with . in them will not work. (Fix this by checking the passed domain for the @ symbol before running it through the regex.)
- Whitespace in the middle of the domain/subdomain.
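The behaviour described above, including the suggested `@` check, can be sketched in Python as follows. The function name `extract_with_subdomains` is an assumption for illustration, and `re.search` is used because the pattern is unanchored and has to skip past the scheme.

```python
import re

# The scheme-agnostic pattern from the answer; /i becomes re.IGNORECASE.
PATTERN = re.compile(
    r"(?:www\.)?([a-z0-9\-.]+)(?:\.[a-z\.]+[\/]?).*",
    re.IGNORECASE,
)


def extract_with_subdomains(text):
    """Return the captured domain (with subdomains), or None.

    Hypothetical helper. Rejects anything containing '@' first,
    as the answer suggests for email addresses.
    """
    if "@" in text:
        return None
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_with_subdomains("http://www.website-2000.com"))  # website-2000
print(extract_with_subdomains("ftp://sub.example.com/"))       # sub.example
print(extract_with_subdomains(".example.com"))                 # .example (leading dot kept)
print(extract_with_subdomains("user@example.com"))             # None
```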
旧时难觅i

2020-12-20 02:02
```
r/^[^:]+:\/\/[^/?#]+//
```
This worked for me.

It matches any scheme or protocol, and then, after the ://, any character that is not a /, ? or #. The first occurrence of any of these three characters in a URL signals the end of the domain, so that's where I end the match.
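As a rough illustration of the same idea in Python, the sketch below wraps the host part in a capture group so it can be read back directly. The capture group and the function name `extract_host` are additions for this example, not part of the original pattern. The result is the full host, so it still includes `www.` and any port or `user@` prefix.

```python
import re

# Scheme, then "://", then everything up to the first "/", "?" or "#".
# The parentheses around the host part are an addition for this sketch.
HOST_RE = re.compile(r"^[^:]+://([^/?#]+)")


def extract_host(url):
    """Return the host portion of the URL, or None if there is no scheme."""
    match = HOST_RE.match(url)
    return match.group(1) if match else None


print(extract_host("https://www.website-2000.com/path?x=1"))  # www.website-2000.com
print(extract_host("ftp://example.org#frag"))                 # example.org
print(extract_host("no-scheme.example.com"))                  # None
```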
长情又很酷

2020-12-20 02:12

Let me introduce you to a wonderful tool: txt2re, a regular expression generator.

Here you can experiment with regex and generate code in many languages.
