How to normalize a URL in Java?

孤独总比滥情好 2020-12-09 01:27

URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization p

8 answers
  • 2020-12-09 02:23

    Have you taken a look at the URI class?

    http://docs.oracle.com/javase/7/docs/api/java/net/URI.html#normalize()
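    As a minimal sketch of what that method does: `URI.normalize()` only resolves `.` and `..` segments in the path. It does not lowercase the scheme or host, and it does not drop default ports. The class name `UriNormalizeSketch`, the helper `normalizePath`, and the example.com URL are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriNormalizeSketch {
    // Hypothetical helper: collapses "." and ".." path segments and
    // leaves the scheme, host, port, query, and fragment as they were.
    static String normalizePath(String url) throws URISyntaxException {
        return new URI(url).normalize().toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // The "b/.." and "./" segments are removed from the path.
        System.out.println(normalizePath("http://example.com/a/b/../c/./d.html?x=1#top"));
    }
}
```

    If you need more than path cleanup (case folding, default ports, and so on), you have to apply those rules yourself on top of this.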

  • 2020-12-09 02:29

    In Java, you can also split a URL into its parts and normalize each part separately.

    Example of a URL: https://i0.wp.com:55/lplresearch.com/wp-content/feb.png?ssl=1&myvar=2#myfragment

    protocol:        https
    domain name:     i0.wp.com
    subdomain:       i0
    port:            55
    path:            /lplresearch.com/wp-content/feb.png
    query:           ?ssl=1&myvar=2
    parameters:      ssl=1, myvar=2
    fragment:        #myfragment
    

    Code to do the URL parsing:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class UrlRegex {
        // Returns group 1 if the whole URL matches the pattern, otherwise null.
        // matches() must run before group(); without it group() throws IllegalStateException.
        private static String firstGroup(String regex, String theUrl) {
            Matcher m = Pattern.compile(regex).matcher(theUrl);
            return m.matches() ? m.group(1) : null;
        }

        // Scheme, e.g. "https".
        public static String getProtocol(String theUrl) {
            return firstGroup("^(http|https|smtp|ftp|file|pop)://.*", theUrl);
        }

        // Query string including the leading "?", e.g. "?ssl=1&myvar=2".
        public static String getParameters(String theUrl) {
            return firstGroup(".*(\\?[-a-zA-Z0-9_.@!$&'()*+,;=]+)(#.*)*$", theUrl);
        }

        // Fragment including the leading "#", e.g. "#myfragment".
        public static String getFragment(String theUrl) {
            return firstGroup(".*(#.*)$", theUrl);
        }

        public static void main(String[] args) {
            String theUrl =
                "https://i0.wp.com:55/lplresearch.com/" +
                "wp-content/feb.png?ssl=1&myvar=2#myfragment";
            System.out.println(getProtocol(theUrl));
            System.out.println(getFragment(theUrl));
            System.out.println(getParameters(theUrl));
        }
    }
    

    Prints

    https
    #myfragment
    ?ssl=1&myvar=2
    

    You can then adjust the individual parts of the URL until they pass muster.
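    One way to do that adjusting, sketched with `java.net.URI` instead of the regexes above: lowercase the scheme and host, drop the default port for http and https, resolve `.` and `..` in the path, and put the pieces back together. The class `UrlPartsSketch` and its `normalize` helper are hypothetical, and the set of rules is only an assumption about what "normalized" should mean for your use case.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

class UrlPartsSketch {
    // Hypothetical helper applying a small, assumed set of normalization rules.
    static String normalize(String url) throws URISyntaxException {
        URI u = new URI(url).normalize();            // resolves "." and ".." in the path
        if (u.getScheme() == null || u.getHost() == null) {
            return u.toString();                     // relative or opaque URI: leave as is
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost().toLowerCase(Locale.ROOT);

        int port = u.getPort();                      // -1 when no port is given
        boolean defaultPort = ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);
        if (defaultPort) {
            port = -1;
        }

        String path = u.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";                              // treat an empty path as "/"
        }

        // Reassemble from the raw (still percent-encoded) components.
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://");
        if (u.getRawUserInfo() != null) {
            sb.append(u.getRawUserInfo()).append('@');
        }
        sb.append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(normalize(
            "HTTPS://I0.WP.com:443/lplresearch.com/wp-content/../wp-content/feb.png?ssl=1&myvar=2#myfragment"));
    }
}
```

    The example URL from this answer, with its non-default port 55, passes through unchanged. The sketch works on the raw components so that existing percent-encoding is not decoded and re-encoded along the way.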
