Jsoup: how to get an image's absolute url?

难免孤独 2020-12-01 02:42

Is there a way in jsoup to extract an image's absolute URL, much like one can get a link's absolute URL?

Consider the following image element found in http://ww

4 answers
  • 2020-12-01 03:14

    Let's assume you are parsing http://www.example.com/index.html.

    Use jsoup to extract the img src attribute, which gives you: images/chicken.jpg

    You can then use the URI class to resolve this to an absolute path:

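    import java.net.URI;
    import java.net.URL;

    // new URL(...) and toURI() throw checked exceptions (MalformedURLException, URISyntaxException)
    URL url = new URL("http://www.example.com/index.html");
    URI uri = url.toURI();
    System.out.println(uri.resolve("images/chicken.jpg").toString());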
    

    This prints:

    http://www.example.com/images/chicken.jpg
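The same URI.resolve call also handles the other relative forms an img src can take. Below is a minimal JDK-only sketch; the page URL and image paths are made-up examples, and `resolve` is a hypothetical helper name.

```java
import java.net.URI;

public class ResolveDemo {

    // Resolves a possibly relative src value against the URL of the page it was found on.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/blog/post.html";
        // relative to the page's directory
        System.out.println(resolve(page, "images/chicken.jpg"));   // http://www.example.com/blog/images/chicken.jpg
        // root-relative
        System.out.println(resolve(page, "/images/chicken.jpg"));  // http://www.example.com/images/chicken.jpg
        // parent directory; resolve() normalizes away the ".."
        System.out.println(resolve(page, "../images/chicken.jpg")); // http://www.example.com/images/chicken.jpg
        // protocol-relative: keeps the page's scheme, takes the new host
        System.out.println(resolve(page, "//cdn.example.net/chicken.jpg")); // http://cdn.example.net/chicken.jpg
        // already absolute: returned unchanged
        System.out.println(resolve(page, "https://other.example.org/chicken.jpg"));
    }
}
```

URI.create throws IllegalArgumentException for strings that are not legal URIs (for example a src containing an unescaped space), so scraped values may need cleaning first.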
    
  • 2020-12-01 03:30

    If the image sits inside a div with a particular class, you can narrow the selector (ClassName is only a placeholder):

    // the tag name is "img"; the "abs:" prefix returns the resolved absolute URL
    System.out.println(doc.select("div.ClassName img").attr("abs:src"));
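A self-contained sketch of this kind of scoped selection. It parses a made-up HTML string instead of a live page; the markup, the `ClassName`/`header` classes and the `firstImageInDiv` helper are all hypothetical. Passing a base URI to `Jsoup.parse` is what lets the `abs:` prefix turn relative paths into absolute URLs, and the element name used in the selector is `img`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ScopedImageDemo {

    // Hypothetical page markup; "ClassName" stands in for the real wrapper class.
    static final String HTML =
        "<div class='header'><img src='/img/logo.png'></div>"
        + "<div class='ClassName'><img src='images/chicken.jpg'></div>";

    // Returns the absolute src of the first <img> inside div.<cssClass>, or "" if none matches.
    static String firstImageInDiv(String cssClass) {
        // the base URI lets jsoup resolve relative src values
        Document doc = Jsoup.parse(HTML, "http://www.example.com/index.html");
        return doc.select("div." + cssClass + " img").attr("abs:src");
    }

    public static void main(String[] args) {
        System.out.println(firstImageInDiv("ClassName")); // http://www.example.com/images/chicken.jpg
        System.out.println(firstImageInDiv("header"));    // http://www.example.com/img/logo.png
    }
}
```

`Elements.attr` reads the value from the first matched element that has the attribute, so an empty selection simply yields an empty string.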
    
  • 2020-12-01 03:36

    Once you have the image element, e.g.:

    Element image = document.select("img").first();
    String url = image.absUrl("src");
    // url = http://www.example.com/images/chicken.jpg
    

    Alternatively:

    String url = image.attr("abs:src");
    

    Jsoup has a built-in absUrl() method on all nodes that resolves an attribute to an absolute URL, using the base URL of the node (which could be different from the URL the document was retrieved from).

    See also the Working with URLs jsoup documentation.
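To illustrate the point about the node's base URL, here is a sketch that parses a hypothetical page containing a `<base href>` element. The markup, URLs and the `attrOf`/`absOf` helpers are made up for the example. Under that assumption, `absUrl("src")` resolves against the `<base>` URL rather than the URL passed to `Jsoup.parse`, and it returns an empty string (not null) for an `<img>` that has no `src`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BaseHrefDemo {

    // Hypothetical page whose <base href> points somewhere other than the page URL.
    static final String HTML =
        "<html><head><base href='http://cdn.example.com/static/'></head>"
        + "<body><img id='chicken' src='images/chicken.jpg'><img id='nosrc'></body></html>";

    static final String PAGE_URL = "http://www.example.com/index.html";

    // Returns attr(key) of the element with the given id (supports the "abs:" prefix).
    static String attrOf(String id, String key) {
        Document doc = Jsoup.parse(HTML, PAGE_URL);
        return doc.getElementById(id).attr(key);
    }

    // Returns absUrl(key) of the element with the given id.
    static String absOf(String id, String key) {
        Document doc = Jsoup.parse(HTML, PAGE_URL);
        return doc.getElementById(id).absUrl(key);
    }

    public static void main(String[] args) {
        System.out.println(attrOf("chicken", "src"));          // images/chicken.jpg (raw value)
        System.out.println(absOf("chicken", "src"));           // http://cdn.example.com/static/images/chicken.jpg
        System.out.println(attrOf("chicken", "abs:src"));      // same value as absUrl("src")
        System.out.println("[" + absOf("nosrc", "src") + "]"); // [] because there is no src attribute
    }
}
```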

  • 2020-12-01 03:36
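    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // jsoup needs a full URL including the scheme (http:// or https://)
    Document doc = Jsoup.connect("http://www.abc.com").get();
    Elements img = doc.getElementsByTag("img");
    for (Element el : img) {
        String src = el.absUrl("src"); // resolved against the page URL; "" if it cannot be resolved
        System.out.println("Image found!");
        System.out.println("src attribute is: " + src);
        getImages(src); // the poster's own helper, not shown here and not part of jsoup
    }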
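Below is a variant of the loop above that collects the resolved URLs instead of printing them. It skips `<img>` elements without a usable `src` and drops duplicates. It parses a hypothetical HTML string so it runs offline, and `imageUrls` is an illustrative name. In real use the `Document` would come from `Jsoup.connect(...).get()` instead.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AllImagesDemo {

    // Collects absolute image URLs in document order, without duplicates.
    static Set<String> imageUrls(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Set<String> urls = new LinkedHashSet<>();
        // "img[src]" skips <img> tags that have no src attribute at all
        for (Element img : doc.select("img[src]")) {
            String abs = img.absUrl("src");
            // absUrl returns "" when it cannot build an absolute URL
            if (!abs.isEmpty()) {
                urls.add(abs);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<img src='a.png'><img src='/b.png'><img src='a.png'><img>";
        for (String url : imageUrls(html, "http://www.abc.com/gallery/index.html")) {
            System.out.println(url);
        }
    }
}
```

A `LinkedHashSet` is used so the output keeps the order in which images appear on the page while ignoring repeats.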
    