How to identify the top-level domain of a URL object using Java?

Posted by 半城伤御伤魂 on 2019-12-22 05:14:48

Question


Given this:

URL u = new URL("someURL");

How do I identify the top-level domain of the URL?


Answer 1:


So you want to have the top-level domain part only?

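import java.net.MalformedURLException;
import java.net.URL;

// parameter urlString: a String
// returns: the last dot-separated label of the host of urlString,
//          or null if urlString is null or malformed
private String getTldString(String urlString) {
    String tldString = null;
    try {
        URL url = new URL(urlString);
        String[] domainNameParts = url.getHost().split("\\.");
        // split() yields an empty array for a host such as ".", so guard the index
        if (domainNameParts.length > 0) {
            tldString = domainNameParts[domainNameParts.length - 1];
        }
    } catch (MalformedURLException e) {
        // malformed (or null) input: fall through and return null
    }
    return tldString;
}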

Let's test it!

@Test 
public void identifyLocale() {
    String ukString = "http://www.amazon.co.uk/Harry-Potter-Sheet-Complete-Series/dp/0739086731";
    logger.debug("ukString TLD: {}", getTldString(ukString));

    String deString = "http://www.amazon.de/The-Essential-George-Gershwin/dp/B00008GEOT";
    logger.debug("deString TLD: {}", getTldString(deString));

    String ceShiString = "http://例子.测试";
    logger.debug("ceShiString TLD: {}", getTldString(ceShiString));

    String dokimeString = "http://παράδειγμα.δοκιμή";
    logger.debug("dokimeString TLD: {}", getTldString(dokimeString));

    String nullString = null;
    logger.debug("nullString TLD: {}", getTldString(nullString));

    String lolString = "lol, this is a malformed URL, amirite?!";
    logger.debug("lolString TLD: {}", getTldString(lolString));

}

Output:

ukString TLD: uk
deString TLD: de
ceShiString TLD: 测试
dokimeString TLD: δοκιμή
nullString TLD: null
lolString TLD: null



Answer 2:


According to the docs, the host part of the URL conforms to RFC 2732, so a literal IPv6 address is returned enclosed in square brackets. This implies that simply splitting the string you get from

  String host = u.getHost();

would not be enough. You need to handle the RFC 2732 form when searching the host, or, if you can guarantee that all addresses are of the form server.com, you can search for the last . in the string and take the TLD that follows it.
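A minimal sketch of the check described in this answer, assuming the caller passes in the value of u.getHost(). The class name HostTld and the method lastLabel are hypothetical names chosen for illustration. The sketch rejects bracketed IPv6 literals and single-label hosts, but it does not detect IPv4 literals, for which it would return the last octet.

```java
final class HostTld {
    // Hypothetical helper, not part of java.net: takes the value of u.getHost()
    // and returns the text after the last '.', or null when no such label exists.
    static String lastLabel(String host) {
        if (host == null || host.isEmpty()) {
            return null;                     // no host at all
        }
        if (host.startsWith("[")) {
            return null;                     // bracketed IPv6 literal (RFC 2732), no TLD
        }
        if (host.endsWith(".")) {
            // drop the trailing dot of a fully qualified name: "example.com." -> "example.com"
            host = host.substring(0, host.length() - 1);
        }
        int lastDot = host.lastIndexOf('.');
        if (lastDot < 0 || lastDot == host.length() - 1) {
            return null;                     // single label such as "localhost", or nothing after the dot
        }
        return host.substring(lastDot + 1);
    }
}
```

With the URL from the question, the call would look like HostTld.lastLabel(u.getHost()).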




Answer 3:


Guava provides a nice utility for this. It works as follows:

InternetDomainName.from("someurl.co.uk").publicSuffix() will get you co.uk
InternetDomainName.from("someurl.de").publicSuffix() will get you de




Answer 4:


Use URL#getHost() and, if necessary, String#split() on "\\." afterwards.

Update: if the host is actually an IP address, you need to use InetAddress#getHostName() separately.
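The following is a rough sketch of how these two steps might be combined, assuming the host string comes from URL#getHost(). The class SplitTld, its methods, and the dotted-quad regular expression are illustrative assumptions, not part of any library. InetAddress#getHostName() performs a reverse lookup, so the result for an IP address depends on the network and may simply be the address again, in which case the sketch returns null.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

final class SplitTld {
    // Assumption: a host made of four dot-separated groups of digits is an IPv4 literal.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static boolean isIpv4Literal(String host) {
        return IPV4.matcher(host).matches();
    }

    // Returns the last dot-separated label of the host, or null if none can be determined.
    static String tld(String host) {
        if (host == null || host.isEmpty()) {
            return null;
        }
        if (isIpv4Literal(host)) {
            try {
                // reverse lookup; falls back to the textual address when no name is found
                host = InetAddress.getByName(host).getHostName();
            } catch (UnknownHostException e) {
                return null;
            }
            if (isIpv4Literal(host)) {
                return null;                 // still an address, so there is no TLD
            }
        }
        String[] parts = host.split("\\.");
        // fewer than two labels (e.g. "localhost" or "[::1]") means no TLD to report
        return parts.length < 2 ? null : parts[parts.length - 1];
    }
}
```

For a name-based host, SplitTld.tld(u.getHost()) never touches the network; only IP-address hosts trigger the lookup.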



Source: https://stackoverflow.com/questions/2141224/how-to-identify-the-top-level-domain-of-a-url-object-using-java
