I need to be able to identify the domain name of any subdomain.
Examples:
For all of these I need to match only example.co / example.com
If you want an absolutely correct matcher, regular expressions are not the way to go.
Why?
Because both of these are valid domains + TLDs: goo.gl, t.co.
Because neither of these is (they're only TLDs): com.au, co.uk.
Any regex that you might create that would properly handle all of the above cases would simply amount to listing out the valid TLDs, which would defeat the purpose of using regular expressions in the first place.
Instead, just create/obtain a list of the current TLDs and see which one of them is present, then add the first segment before it.
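A minimal sketch of that lookup in Python, under some assumptions: the SUFFIXES set below is a tiny hand-written sample for illustration only (a real list has thousands of entries), and registrable_domain is a hypothetical helper name, not part of any library.

```python
# Illustrative subset only; a real TLD/suffix list is far longer.
SUFFIXES = {"com", "co", "gl", "au", "uk", "com.au", "co.uk"}

def registrable_domain(hostname, suffixes=SUFFIXES):
    """Return the longest known suffix plus the one label before it, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so "co.uk" is preferred over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                # The whole hostname is itself a suffix (e.g. "co.uk").
                return None
            return ".".join(labels[i - 1:])
    return None

print(registrable_domain("www.example.co.uk"))  # example.co.uk
print(registrable_domain("a.b.example.com"))    # example.com
print(registrable_domain("goo.gl"))             # goo.gl
print(registrable_domain("com.au"))             # None
```

Checking the longest suffix first is what lets goo.gl and example.co.uk both come out right without any pattern matching.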
This will match:
([0-9A-Za-z]{2,}\.[0-9A-Za-z]{2,3}\.[0-9A-Za-z]{2,3}|[0-9A-Za-z]{2,}\.[0-9A-Za-z]{2,3})$
as long as:
Basically what it does is match either of these two:
Short version:
(\w{2,}\.\w{2,3}\.\w{2,3}|\w{2,}\.\w{2,3})$
If you want it to match only whole lines, add ^ at the beginning.
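For illustration, here is one way the short version could be tried out in Python. The match_domain helper and the sample hostnames are assumptions made for this sketch; the pattern itself is the one given above.

```python
import re

# The short pattern from above, anchored at the end of the string.
DOMAIN_RE = re.compile(r"(\w{2,}\.\w{2,3}\.\w{2,3}|\w{2,}\.\w{2,3})$")

def match_domain(hostname):
    """Return the part of hostname matched by the pattern, or None."""
    m = DOMAIN_RE.search(hostname)
    return m.group(1) if m else None

print(match_domain("www.example.com"))    # example.com
print(match_domain("www.example.co.uk"))  # example.co.uk
print(match_domain("mail.example.co"))    # example.co
# A short name in front of a short TLD is taken as part of the domain:
print(match_domain("www.goo.gl"))         # www.goo.gl
```

The last case shows where a length-based pattern can overmatch, since it has no way to tell a short domain label from a second-level suffix.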
This is how I tested it:
This might be of some use. It separates them into dot notation.
Then it is a simple matter of splitting it.
[^/:"].[^/:"]