Perl Regex to get the root domain of a URL

不思量自难忘° 2021-01-13 17:40

How can I get part of a URL?

For example:

http://www.facebook.com/xxxxxxxxxxx
http://www.stackoverflow.com/yyyyyyyyyyyyyyyy

I

6 answers
  •  不知归路
    2021-01-13 18:01

    I like the URI answer. The OP requested a regex, so in honor of that request, and as a challenge, here is the answer I came up with. To be fair, sometimes it is not easy or feasible to install a CPAN module. I have worked on hardened projects that pin a very specific version of Perl and allow only certain modules.

    Here is my attempt at a regex answer. A leading www. is optional and, when present, is stripped. Subdomains like mobile. are kept. The host capture uses the negated character class [^:/]*, so it stops at the first / or : and a URL with directories on the end is parsed correctly. The pattern does not depend on the scheme; it could be http, https, file, sftp, or anything else. The host is captured in $1.

    ^.*://(?:[wW]{3}\.)?([^:/]*).*$
    

    Sample input:

    http://WWW.facebook.com:80/
    http://facebook.com/xxxxxxxxxxx/aaaaa
    http://www.stackoverflow.com/yyyyyyyyyyyyyyyy/aaaaaaa
    https://mobile.yahoo.com/yyyyyyyyyyyyyyyy/aaaaaaa
    http://www.theregister.co.uk/
    

    Sample output:

    facebook.com
    facebook.com
    stackoverflow.com
    mobile.yahoo.com
    theregister.co.uk
    

    EDIT: Thanks @ikegami for the extra challenge. :) Now it supports www in any mix of upper and lower case, as well as a port number like :80.
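    A minimal Perl sketch of how the pattern might be applied to the sample inputs. The helper name root_domain and the variable $strict_re are hypothetical, introduced only for illustration. One caveat worth checking: because the leading ^.*:// is greedy, a URL that contains a second "://" (for example in a query string) matches at the LAST occurrence. The hypothetical $strict_re variant anchors on the first "://" instead; it is an assumption-level tweak, not part of the original answer.

```perl
use strict;
use warnings;

# The pattern from the answer. $1 holds the host with any leading
# "www." (any case) removed; a port such as ":80" is left out of $1.
my $host_re = qr{^.*://(?:[wW]{3}\.)?([^:/]*).*$};

# Hypothetical tightened variant (not from the answer): the scheme part
# may not contain ":", "/", "?" or "#", so the match anchors on the
# FIRST "://" instead of the last one the greedy ".*" can reach.
my $strict_re = qr{^[^:/?#]*://(?:[wW]{3}\.)?([^:/?#]*)};

# Hypothetical helper: returns the captured host, or undef when the
# string has no "scheme://" part. Defaults to the answer's pattern.
sub root_domain {
    my ($url, $re) = @_;
    $re //= $host_re;
    return $url =~ $re ? $1 : undef;
}

my @urls = (
    'http://WWW.facebook.com:80/',
    'http://facebook.com/xxxxxxxxxxx/aaaaa',
    'http://www.stackoverflow.com/yyyyyyyyyyyyyyyy/aaaaaaa',
    'https://mobile.yahoo.com/yyyyyyyyyyyyyyyy/aaaaaaa',
    'http://www.theregister.co.uk/',
);

# Print one host per line, as in the sample output.
print root_domain($_), "\n" for @urls;
```

    Run as-is, the loop prints the five hosts listed in the sample output. With a URL such as http://www.example.com/?next=http://other.org, the answer's pattern captures other.org, while the $strict_re variant captures example.com.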
