Question
Given this:
URL u = new URL("someURL");
How do I identify the top-level domain of the URL?
Answer 1:
So you want to have the top-level domain part only?
// parameter urlString: a String
// returns: a String holding the last label of the host (the TLD),
//          or null if urlString is null or malformed
private String getTldString(String urlString) {
    String tldString = null;
    try {
        URL url = new URL(urlString);
        // Split the host on literal dots and keep the last part.
        String[] domainNameParts = url.getHost().split("\\.");
        tldString = domainNameParts[domainNameParts.length - 1];
    } catch (MalformedURLException e) {
        // Malformed (or null) input: fall through and return null.
    }
    return tldString;
}
Let's test it!
@Test
public void identifyLocale() {
    String ukString = "http://www.amazon.co.uk/Harry-Potter-Sheet-Complete-Series/dp/0739086731";
    logger.debug("ukString TLD: {}", getTldString(ukString));
    String deString = "http://www.amazon.de/The-Essential-George-Gershwin/dp/B00008GEOT";
    logger.debug("deString TLD: {}", getTldString(deString));
    String ceShiString = "http://例子.测试";
    logger.debug("ceShiString TLD: {}", getTldString(ceShiString));
    String dokimeString = "http://παράδειγμα.δοκιμή";
    logger.debug("dokimeString TLD: {}", getTldString(dokimeString));
    String nullString = null;
    logger.debug("nullString TLD: {}", getTldString(nullString));
    String lolString = "lol, this is a malformed URL, amirite?!";
    logger.debug("lolString TLD: {}", getTldString(lolString));
}
Output:
ukString TLD: uk
deString TLD: de
ceShiString TLD: 测试
dokimeString TLD: δοκιμή
nullString TLD: null
lolString TLD: null
Answer 2:
According to the docs, the host part of the URL conforms to RFC 2732, so getHost() may return a literal IPv6 address enclosed in square brackets. This implies that simply splitting the string you get from
String host = u.getHost();
is not enough in general. You need to account for RFC 2732 when examining the host. Alternatively, if you can guarantee that all addresses are of the form server.com, you can search for the last . in the string and take everything after it as the TLD.
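The "search for the last ." idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name LastLabelSketch and the method lastLabel are made up for this example, and returning null for bracketed IPv6 literals, empty hosts, and hosts without a usable dot is one possible policy, not something the docs prescribe.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class; the name is an assumption for this sketch.
class LastLabelSketch {

    // Returns the text after the last '.' of the host, or null when the URL
    // is malformed, the host is empty, the host is a bracketed IPv6 literal
    // (RFC 2732 form), or the host has no dot followed by more characters.
    static String lastLabel(String urlString) {
        final String host;
        try {
            host = new URL(urlString).getHost();
        } catch (MalformedURLException e) {
            return null;
        }
        if (host.isEmpty() || host.startsWith("[")) {
            return null;
        }
        int lastDot = host.lastIndexOf('.');
        if (lastDot < 0 || lastDot == host.length() - 1) {
            return null;
        }
        return host.substring(lastDot + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastLabel("http://www.amazon.co.uk/dp/0739086731"));
        System.out.println(lastLabel("http://[::1]:8080/index.html"));
    }
}
```

Note that a dotted IPv4 host such as 192.168.0.1 would still yield its last number here, so this check alone is not a full guard against IP addresses.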
Answer 3:
Guava provides a nice utility for this. It works as follows:
InternetDomainName.from("someurl.co.uk").publicSuffix()
will get you co.uk
InternetDomainName.from("someurl.de").publicSuffix()
will get you de
Answer 4:
Use URL#getHost() and, if necessary, follow it with a String#split() on "\\.".
Update: if the host is actually an IP address, you will need to call InetAddress#getHostName() separately to resolve it to a host name first.
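One way to handle the IP address case is to check whether the host looks like an IP literal before splitting it, and only then fall back to a reverse lookup. The following is a minimal sketch, not a definitive implementation: the class IpHostSketch and both method names are hypothetical, the IPv4 pattern is deliberately loose, and the reverse lookup depends on the network and DNS setup of the machine running it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are assumptions for this sketch.
class IpHostSketch {

    // Loose dotted-quad check; it does not verify that each part is <= 255.
    private static final Pattern IPV4 =
            Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    // True when the host looks like an IP literal rather than a domain name.
    // URL#getHost() returns IPv6 literals enclosed in square brackets.
    static boolean looksLikeIpLiteral(String host) {
        return host.startsWith("[") || IPV4.matcher(host).matches();
    }

    // Reverse lookup of an IP literal. If no name can be found,
    // getHostName() returns the textual IP address instead.
    static String reverseLookup(String ipLiteral) throws UnknownHostException {
        return InetAddress.getByName(ipLiteral).getHostName();
    }
}
```

A caller could run looksLikeIpLiteral on the result of getHost() and split on dots only when it returns false.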
Source: https://stackoverflow.com/questions/2141224/how-to-identify-the-top-level-domain-of-a-url-object-using-java