Full Link Extraction Using Java

感情败类 2021-01-16 04:45

My goal is to always get the same string (the URI, in my case) when reading the href attribute of a link. Example: suppose an HTML file has so many li

2 Answers
  •  别那么骄傲
    2021-01-16 05:18

    You can do this with a full-fledged HTML parser such as Jsoup. Its Node#absUrl() method does exactly what you want.

    package com.stackoverflow.q3394298;
    
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    
    public class Test {
        
        public static void main(String... args) throws Exception {
            // Jsoup.connect() takes the URL as a String.
            String url = "https://stackoverflow.com/questions/3394298/";
            Document document = Jsoup.connect(url).get();
            // First link with the CSS class "question-hyperlink" (the question title).
            Element link = document.select("a.question-hyperlink").first();
            System.out.println(link.attr("href"));   // href exactly as written in the HTML
            System.out.println(link.absUrl("href")); // href resolved against the document's base URI
        }
        
    }
    

    which prints (correctly) the following for the title link of your current question:

    /questions/3394298/full-link-extraction-using-java
    https://stackoverflow.com/questions/3394298/full-link-extraction-using-java
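    
    To see why the second line differs from the first, it may help to look at the resolution step on its own. The following is a minimal sketch using only java.net.URI from the standard library; it is not how Jsoup implements absUrl() internally, and the class name ResolveHref and the method toAbsolute are made up for illustration.
    
    ```java
    import java.net.URI;
    
    public class ResolveHref {
    
        // Resolve a possibly relative href against the URI of the page it was found on.
        // (Hypothetical helper, not part of Jsoup.)
        public static String toAbsolute(String baseUri, String href) {
            return URI.create(baseUri).resolve(href).toString();
        }
    
        public static void main(String[] args) {
            String base = "https://stackoverflow.com/questions/3394298/";
            // Root-relative href: scheme and host come from the base.
            System.out.println(toAbsolute(base, "/questions/3394298/full-link-extraction-using-java"));
            // Path-relative href: appended to the base path.
            System.out.println(toAbsolute(base, "full-link-extraction-using-java"));
            // Already absolute href: returned unchanged.
            System.out.println(toAbsolute(base, "https://example.com/other"));
        }
    }
    ```
    
    The first two calls should both yield the absolute URL shown above, which is the "same string every time" behaviour the question asks for.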
    

    Jsoup may have other (as yet undiscovered) advantages for your purpose as well.

    Related questions:

    • What are the pros and cons of the leading HTML parsers in Java?

    Update: if you want to select all links in the document, then do as follows:

            // Needs: import org.jsoup.select.Elements;
            Elements links = document.select("a");
            for (Element link : links) {
                System.out.println(link.attr("href"));
                System.out.println(link.absUrl("href"));
            }
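    
    Once all hrefs are collected, different spellings of the same target (for example "./" and "../3394298/") can still produce duplicates. As a rough sketch of one way to collapse them, assuming the hrefs have already been extracted into a list, the standard java.net.URI class can resolve and normalize each one. The class name UniqueLinks and the method absoluteLinks are hypothetical and not part of Jsoup.
    
    ```java
    import java.net.URI;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;
    
    public class UniqueLinks {
    
        // Resolve every href against the base URI, normalize "." and ".." segments,
        // and keep each resulting absolute URI once, in first-seen order.
        public static Set<String> absoluteLinks(String baseUri, List<String> hrefs) {
            URI base = URI.create(baseUri);
            Set<String> result = new LinkedHashSet<>();
            for (String href : hrefs) {
                try {
                    result.add(base.resolve(href.trim()).normalize().toString());
                } catch (IllegalArgumentException e) {
                    // Skip hrefs that are not syntactically valid URIs.
                }
            }
            return result;
        }
    
        public static void main(String[] args) {
            List<String> hrefs = List.of(
                    "/questions/3394298/",
                    "../3394298/",
                    "./",
                    "https://stackoverflow.com/questions/3394298/");
            System.out.println(absoluteLinks("https://stackoverflow.com/questions/3394298/", hrefs));
        }
    }
    ```
    
    With this input, all four hrefs should collapse into a single entry. Whether query strings or fragments should also be treated as equal is a separate decision that this sketch does not make.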
    
