How can I get part of a URL?
For example:
http://www.facebook.com/xxxxxxxxxxx
http://www.stackoverflow.com/yyyyyyyyyyyyyyyy
I tried:
$a="http://www.stackoverflow.com/yyyyyyyyyyyyyyyy";
if($a=~/\/\/\w+\.(.*)\// )
{ print $1; }
else
{ print "false"; }
I like the URI answer. The OP requested a regex, so in honor of the request and as a challenge, here is the answer I came up with. To be fair, it is sometimes not easy or feasible to install a CPAN module. I have worked on projects that were hardened around a very specific version of Perl, with only certain modules allowed.
Here is my attempt at the regex answer. Note that the www. prefix is optional. Sub-domains like mobile. are honored. The host capture stops at the first / or :, so a URL with directories on the end is parsed correctly. It does not depend on the protocol; it could be http, https, file, sftp, or whatever. The output is captured in $1.
^.*://(?:[wW]{3}\.)?([^:/]*).*$
Sample input:
http://WWW.facebook.com:80/
http://facebook.com/xxxxxxxxxxx/aaaaa
http://www.stackoverflow.com/yyyyyyyyyyyyyyyy/aaaaaaa
https://mobile.yahoo.com/yyyyyyyyyyyyyyyy/aaaaaaa
http://www.theregister.co.uk/
Sample output:
facebook.com
facebook.com
stackoverflow.com
mobile.yahoo.com
theregister.co.uk
EDIT: Thanks @ikegami for the extra challenge. :) Now it supports WWW in any mixed case and a port number like :80.
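As a rough sketch of how the pattern above might be applied from Perl, the snippet below wraps it in a small helper and runs it over the sample input. The helper name host_from_url is hypothetical and only there for illustration; the regex itself is the one given in this answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured host, or undef if the URL
# does not match. The pattern is the one from the answer above.
sub host_from_url {
    my ($url) = @_;
    return $url =~ m{^.*://(?:[wW]{3}\.)?([^:/]*).*$} ? $1 : undef;
}

my @urls = qw(
    http://WWW.facebook.com:80/
    http://facebook.com/xxxxxxxxxxx/aaaaa
    http://www.stackoverflow.com/yyyyyyyyyyyyyyyy/aaaaaaa
    https://mobile.yahoo.com/yyyyyyyyyyyyyyyy/aaaaaaa
    http://www.theregister.co.uk/
);

for my $url (@urls) {
    my $host = host_from_url($url);
    print defined $host ? "$host\n" : "no match: $url\n";
}
```

Because the capture is a negated character class, it simply stops at the first : or /, which is what keeps the port and the path out of $1.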
Just some simple regex stuff.
$facebook = "www.facebook.com/xxxxxxxxxxx";
$facebook =~ s/www\.(.*\.com).*/$1/; # get what is between www. and .com
print $facebook;
Returns
facebook.com
You may also want to make this work for .net, .org, etc. Something like:
s/www\.(.*\.(?:net|org|com)).*/$1/;
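A minimal sketch of that broader substitution in use, assuming the input starts at www. as in the example above. The helper name strip_to_domain and the example.org and example.net inputs are made up for illustration, and the list of TLDs is only the three mentioned here.

```perl
use strict;
use warnings;

# Hypothetical helper: works on a copy so the caller's string is untouched.
sub strip_to_domain {
    my ($url) = @_;
    (my $domain = $url) =~ s/www\.(.*\.(?:net|org|com)).*/$1/;
    return $domain;
}

print strip_to_domain('www.facebook.com/xxxxxxxxxxx'), "\n";  # facebook.com
print strip_to_domain('www.example.org/some/path'), "\n";     # example.org
print strip_to_domain('www.example.net'), "\n";               # example.net
```

Note that the pattern is unanchored and the .* inside the capture is greedy, so anything before www. is left in place and a path that itself contains .com, .net, or .org would be pulled into the result.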
I found a way:
my @urls = qw( http://www.facebook.com http://www.sadas.com/ );
for my $url (@urls) {
    # Strip the scheme and an optional leading "www.".
    $url =~ s{^https?://(?:www\.)?}{}i;
    # Drop the path, if any.
    $url =~ s{/.*}{};
    print "$url\n";
}
This might be helpful:
^https?:\/\/www\.([\da-zA-Z\.-]+)
Sample Input:
http://www.banglanews24.com/detailsnews.php?nssl=763daee77dc90b1c1baf0a361be2ff3c&nttl=20130416072403189462
http://www.prothom-alo.com/detail/date/2013-04-20/news/3463
http://www.facebook.com/xxxxxxxxxxx
http://www.stackoverflow.com/yyyyyyyyyyyyyyy
Sample output:
banglanews24.com
prothom-alo.com
facebook.com
stackoverflow.com
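One possible way to run that pattern from Perl is sketched below. The helper name www_host is hypothetical; the character class is the one from the regex above, written with m{} delimiters so the slashes need no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the host after "www.", or undef when the
# URL does not start with http(s)://www.
sub www_host {
    my ($url) = @_;
    return $url =~ m{^https?://www\.([\da-zA-Z.-]+)} ? $1 : undef;
}

my @urls = qw(
    http://www.prothom-alo.com/detail/date/2013-04-20/news/3463
    http://www.facebook.com/xxxxxxxxxxx
    http://www.stackoverflow.com/yyyyyyyyyyyyyyy
);

for my $url (@urls) {
    my $host = www_host($url);
    print defined $host ? "$host\n" : "no match: $url\n";
}
```

Unlike some of the other patterns in this thread, this one requires the www. prefix, so a URL such as http://facebook.com/ would not match.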
use strict;
use warnings;
use feature qw( say state );

use Domain::PublicSuffix qw( );
use URI                  qw( );

# Returns "domain.tld" for "subdomain.domain.tld".
# Handles multi-level TLDs such as ".co.uk".
sub root_domain {
    my ($domain) = @_;
    # Build the parser once and reuse it across calls.
    state $parser = Domain::PublicSuffix->new();
    return $parser->get_root_domain($domain);
}

# Accepts urls as strings and as URI objects.
sub url_root_domain {
    my ($abs_url) = @_;
    my $domain = URI->new($abs_url)->host();
    return root_domain($domain);
}

say url_root_domain('http://www.facebook.com/'); # facebook.com
say url_root_domain('https://www.facebook.com/'); # facebook.com
say url_root_domain('http://mobile.google.com/'); # google.com
say url_root_domain('http://www.theregister.co.uk/'); # theregister.co.uk
say url_root_domain('http://www.com/'); # www.com