Perl Regex to get the root domain of a URL

How can I get part of a URL, specifically its root domain?

For example:

http://www.facebook.com/xxxxxxxxxxx
http://www.stackoverflow.com/yyyyyyyyyyyyyyyy

6 Answers

一向

2021-01-13 18:00

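use strict;
use warnings;

my $url = "http://www.stackoverflow.com/yyyyyyyyyyyyyyyy";
# Skip "//" and the first label (assumed to be a subdomain such as www.), then capture up to the next "/".
if ($url =~ m{//\w+\.([^/]+)}) { print "$1\n"; }
else                           { print "false\n"; }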

不知归路

2021-01-13 18:01
I like the URI answer. The OP requested a regex, so in honor of that request, and as a challenge, here is the answer I came up with. To be fair, it is sometimes not easy or feasible to install CPAN modules: I have worked on projects that are hardened around a very specific version of Perl, where only certain modules are allowed.

Here is my attempt at a regex answer. The www. prefix is optional, and subdomains such as mobile. are kept. The capture uses the negated character class [^:/]*, so it stops at the first / or :, which means a URL with directories on the end is parsed correctly. The regex does not depend on the scheme; it could be http, https, file, sftp or anything else. The domain is captured in $1.
```
^.*://(?:[wW]{3}\.)?([^:/]*).*$
```
Sample input:
```
http://WWW.facebook.com:80/
http://facebook.com/xxxxxxxxxxx/aaaaa
http://www.stackoverflow.com/yyyyyyyyyyyyyyyy/aaaaaaa
https://mobile.yahoo.com/yyyyyyyyyyyyyyyy/aaaaaaa
http://www.theregister.co.uk/
```
Sample output:
```
facebook.com
facebook.com
stackoverflow.com
mobile.yahoo.com
theregister.co.uk
```
EDIT: Thanks @ikegami for the extra challenge. :) Now it supports WWW in any mixed case and a port number like :80.
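A minimal sketch of calling the pattern from Perl code, run against the sample input above. The sub name `host_without_www` is hypothetical; the regex itself is copied unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the host with any leading "www." (any case)
# removed, using the regex from this answer, or undef if there is no "://".
sub host_without_www {
    my ($url) = @_;
    return $url =~ m{^.*://(?:[wW]{3}\.)?([^:/]*).*$} ? $1 : undef;
}

my @samples = (
    'http://WWW.facebook.com:80/',
    'http://facebook.com/xxxxxxxxxxx/aaaaa',
    'http://www.stackoverflow.com/yyyyyyyyyyyyyyyy/aaaaaaa',
    'https://mobile.yahoo.com/yyyyyyyyyyyyyyyy/aaaaaaa',
    'http://www.theregister.co.uk/',
);

print host_without_www($_), "\n" for @samples;
```

Because `^.*://` is greedy, a URL that contains another `://` later on (for example inside a query string) is matched at the last occurrence; starting the pattern with `^[^:/]+://` would anchor it to the real scheme instead.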
死守一世寂寞

2021-01-13 18:14

Just some simple regex stuff.

my $facebook = "www.facebook.com/xxxxxxxxxxx";

$facebook =~ s/www\.([^\/]*\.com).*/$1/; # keep what follows www. up to and including .com

print "$facebook\n";

Output:

facebook.com

You may also want to make this work for .net, .org, etc. Something like:

s/www\.([^\/]*\.(?:net|org|com)).*/$1/;
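A sketch that wraps the .net/.org/.com substitution in a reusable sub. The name `www_domain` is hypothetical, and the `(?=[\/:]|$)` lookahead is an addition not in the answer: it requires the TLD to end the host, so a host such as `www.example.community` is left unchanged instead of being cut down to `example.com`.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the .com/.net/.org substitution.
# [^\/]* cannot cross a "/", and the lookahead (an addition) requires the
# TLD to be followed by "/", ":" or the end of the string.
sub www_domain {
    my ($s) = @_;
    (my $d = $s) =~ s/www\.([^\/]*\.(?:net|org|com))(?=[\/:]|$).*/$1/;
    return $d;    # unchanged if the pattern did not match
}

for my $s ('www.facebook.com/xxxxxxxxxxx', 'www.perl.org/about', 'www.example.community/x') {
    print www_domain($s), "\n";
}
```

Like the original one-liner, this only rewrites from `www.` onward, so any scheme in front of it (such as `http://`) is kept.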

星月不相逢

2021-01-13 18:16

I found a way:

my @urls = qw( http://www.facebook.com http://www.sadas.com/ );
for my $url (@urls) {
   $url =~ s{^https?://(?:www\.)?}{}i;   # strip the scheme, "//" and an optional "www."
   $url =~ s{/.*}{};                      # strip the path
   print "$url\n";
}
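The same two substitutions can be wrapped in a sub that leaves the caller's string alone by using the `/r` (non-destructive) flag, available since Perl 5.14. This is a sketch: `host_of` is a hypothetical name, and stripping a `:port` as well as the path is an addition beyond the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: the two substitutions from this answer, but using the
# non-destructive /r flag (Perl 5.14+) so the caller's string is not modified.
sub host_of {
    my ($url) = @_;
    my $host = $url =~ s{^https?://(?:www\.)?}{}ir;   # drop scheme, "//" and optional www.
    $host =~ s{[/:].*}{}s;                             # drop port (an addition), path and query
    return $host;
}

my @urls = qw( http://www.facebook.com http://www.sadas.com/ HTTPS://WWW.Example.com:8080/x );
print host_of($_), "\n" for @urls;
```

In the loop above, `$url` is an alias for each element of `@urls`, so substituting on it in place rewrites the array itself; the `/r` form avoids that side effect.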

庸人自扰

2021-01-13 18:18

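This might be helpful. The domain is captured in $1; note that the pattern only matches URLs that begin with http:// or https:// followed by a lowercase www.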

^https?:\/\/www\.([\da-zA-Z\.-]+)

Sample Input:

http://www.banglanews24.com/detailsnews.php?nssl=763daee77dc90b1c1baf0a361be2ff3c&nttl=20130416072403189462

http://www.prothom-alo.com/detail/date/2013-04-20/news/3463

http://www.facebook.com/xxxxxxxxxxx

http://www.stackoverflow.com/yyyyyyyyyyyyyyy

Sample output:

banglanews24.com

prothom-alo.com

facebook.com

stackoverflow.com
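A minimal sketch that applies this regex and reports URLs it cannot handle; `www_root` is a hypothetical name. Because the pattern requires a lowercase `www.`, a URL such as `http://mobile.yahoo.com/` does not match at all.

```perl
use strict;
use warnings;

# Hypothetical helper: applies this answer's regex and returns $1,
# or undef when the URL does not start with http(s)://www.
sub www_root {
    my ($url) = @_;
    return $url =~ /^https?:\/\/www\.([\da-zA-Z\.-]+)/ ? $1 : undef;
}

for my $url ('http://www.prothom-alo.com/detail/date/2013-04-20/news/3463',
             'http://www.facebook.com/xxxxxxxxxxx',
             'http://mobile.yahoo.com/') {
    my $root = www_root($url);
    print defined $root ? "$root\n" : "no match: $url\n";
}
```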

悲哀的现实

2021-01-13 18:24

use strict;
use warnings;
use feature qw( say state );

use Domain::PublicSuffix qw( );
use URI                  qw( );

# Returns "domain.tld" for "subdomain.domain.tld". 
# Handles multi-level TLDs such as ".co.uk".
sub root_domain {
   my ($domain) = @_;
   state $parser = Domain::PublicSuffix->new();
   return $parser->get_root_domain($domain);
}

# Accepts urls as strings and as URI objects.
sub url_root_domain {
   my ($abs_url) = @_;
   my $domain = URI->new($abs_url)->host();
   return root_domain($domain);
}

say url_root_domain('http://www.facebook.com/');       # facebook.com
say url_root_domain('https://www.facebook.com/');      # facebook.com
say url_root_domain('http://mobile.google.com/');      # google.com
say url_root_domain('http://www.theregister.co.uk/');  # theregister.co.uk
say url_root_domain('http://www.com/');                # www.com
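If installing Domain::PublicSuffix is not an option, the idea behind `root_domain` can be approximated in core Perl. This is only a sketch under a big assumption: the hypothetical `%MULTI_LEVEL_SUFFIX` table below lists just a handful of two-label suffixes, whereas the real Public Suffix List has thousands of rules, so it will give wrong answers for suffixes it does not know. The host is also extracted with a simple regex instead of URI, so unusual URLs (IPv6 literals, for example) are not handled.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny suffix table. The real Public Suffix List
# (which Domain::PublicSuffix uses) has thousands of entries and wildcard rules.
my %MULTI_LEVEL_SUFFIX = map { $_ => 1 } qw( co.uk org.uk ac.uk com.au );

# Keep one label to the left of the suffix: "a.b.example.co.uk" -> "example.co.uk".
sub naive_root_domain {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    # Assume a one-label suffix ("com") unless the last two labels are listed.
    my $suffix_len = (@labels >= 2 && $MULTI_LEVEL_SUFFIX{ join '.', @labels[-2, -1] }) ? 2 : 1;
    # Nothing to the left of the suffix: return the (lower-cased) host as-is.
    return join '.', @labels if @labels <= $suffix_len;
    return join '.', @labels[ -($suffix_len + 1) .. -1 ];
}

# Hypothetical stand-in for URI->new($url)->host: scheme, optional userinfo, host.
sub naive_url_root_domain {
    my ($url) = @_;
    my ($host) = $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*://(?:[^/\@]*\@)?([^/:?#]+)}
        or return undef;
    return naive_root_domain($host);
}

print naive_url_root_domain($_), "\n" for
    'http://www.facebook.com/', 'http://mobile.google.com/',
    'http://www.theregister.co.uk/', 'http://www.com/';
```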
