Is there a way in jsoup to extract an image's absolute URL, much like one can get a link's absolute URL?
Consider the following image element found in http://ww
Let's assume you are parsing http://www.example.com/index.html.
Use jsoup to extract the img src, which gives you: images/chicken.jpg
You can then use the URI class to resolve this to an absolute URL:
URL url = new URL("http://www.example.com/index.html");
URI uri = url.toURI();
System.out.println(uri.resolve("images/chicken.jpg").toString());
prints
http://www.example.com/images/chicken.jpg
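The same resolve() call also copes with other forms a src attribute can take. The following is a minimal sketch using only java.net.URI; the class name ResolveImageUrl, the helper resolve(), the /blog/ page and the cdn.example.com host are made up for illustration.

```java
import java.net.URI;

public class ResolveImageUrl {

    // Resolves a (possibly relative) src value against the URL of the page it was found on.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, used only to show how the directory part matters.
        String page = "http://www.example.com/blog/index.html";

        // Relative to the page's directory
        System.out.println(resolve(page, "images/chicken.jpg"));
        // Relative to the site root
        System.out.println(resolve(page, "/images/chicken.jpg"));
        // Parent-directory reference
        System.out.println(resolve(page, "../images/chicken.jpg"));
        // Already absolute: returned unchanged
        System.out.println(resolve(page, "http://cdn.example.com/chicken.jpg"));
    }
}
```

Note that URI.create() throws an IllegalArgumentException for strings that are not valid URIs (for example, a src containing an unencoded space), so real-world input may need cleaning up first.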
The image might be inside a div with a class, so the code would look like this (as an example only):
System.out.println(doc.select("div.ClassName img").attr("src"));
Once you have the image element, e.g.:
Element image = document.select("img").first();
String url = image.absUrl("src");
// url = http://www.example.com/images/chicken.jpg
Alternatively:
String url = image.attr("abs:src");
Jsoup has a built-in absUrl() method on all nodes that resolves an attribute to an absolute URL, using the base URL of the node (which could be different from the URL the document was retrieved from).
See also the Working with URLs jsoup documentation.
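To illustrate the point about the node's base URL, here is a small sketch that parses HTML from a string instead of fetching it. It assumes jsoup is on the classpath; the class name AbsUrlDemo, the helper firstImageUrl() and the cdn.example.com base href are invented for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlDemo {

    // Parses the HTML with the given base URI and returns the absolute src of the first img.
    static String firstImageUrl(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element image = doc.select("img").first();
        return image == null ? "" : image.absUrl("src");
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/index.html";
        String plain = "<img src=\"images/chicken.jpg\">";
        // Hypothetical page whose <base href> points at a different host
        String withBase = "<html><head><base href=\"http://cdn.example.com/assets/\"></head>"
                + "<body><img src=\"images/chicken.jpg\"></body></html>";

        // Resolved against the URL passed to Jsoup.parse()
        System.out.println(firstImageUrl(plain, page));
        // Resolved against the <base href> found in the document
        System.out.println(firstImageUrl(withBase, page));
        // With no base URI at all, absUrl() returns an empty string
        System.out.println(firstImageUrl(plain, "").isEmpty());
    }
}
```

When the HTML comes from a string or a file, passing the base URI to Jsoup.parse() is what makes absUrl() and abs:src work; with Jsoup.connect(...).get() the base URI is set from the fetched URL.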
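// Jsoup.connect() needs a full URL, including the http:// or https:// protocol
Document doc = Jsoup.connect("http://www.abc.com").get();
Elements images = doc.getElementsByTag("img");
for (Element el : images) {
    // absUrl() resolves the src attribute against the element's base URI
    String src = el.absUrl("src");
    System.out.println("Image Found!");
    System.out.println("src attribute is : " + src);
    getImages(src); // getImages() is a helper defined elsewhere (not shown here)
}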