I am looking for all of the images on a given website.
For this purpose I need to find the ones that are referenced within the CSS, for example:
.gk-crop {
Jsoup doesn't parse CSS files. Have a look at this to know what Jsoup is responsible for.
You need a separate CSS parser to extract URLs from CSS files. Have a look at this
As Niranjan mentioned, Jsoup is for parsing HTML (and XML), not CSS. If you really need to extract images from CSS, you will need either a third-party library or a simple regex that grabs the URLs from the CSS file; it is still plain text, after all. This is not a flexible solution to your problem, but it would be the fastest one :)
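To make the regex idea concrete, here is a minimal sketch that uses only the standard library. The class name CssUrlExtractor, the method extractUrls, and the sample CSS are made up for illustration. The pattern covers the common url(...) forms (unquoted, single-quoted, double-quoted) and is not a full CSS parser, so it will trip on escaped parentheses or url(...) text inside comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: pulls url(...) values out of CSS text.
public class CssUrlExtractor {

    // url( optional-quote value same-quote ), case-insensitive.
    // Group 1 is the quote character (possibly empty), group 2 is the URL.
    private static final Pattern CSS_URL =
            Pattern.compile("url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    public static List<String> extractUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = CSS_URL.matcher(css);
        while (m.find()) {
            String url = m.group(2).trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample stylesheet text (assumed for the example).
        String css = ".banner { background: url(\"img/banner.png\") no-repeat; }\n"
                   + ".icon { background-image: url('/static/icon.gif'); }\n"
                   + ".plain { background: url(img/plain.jpg); }";
        for (String url : extractUrls(css)) {
            System.out.println(url);
        }
    }
}
```

The values come back exactly as written in the stylesheet, so relative ones still have to be resolved against the URL of the CSS file itself (for example with java.net.URI.resolve), not against the page URL.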
If you want to select the URLs of all the images on a website, you can select all the image tags and then get their absolute URLs.
Example:
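String url = "http://www.bbc.co.uk";
// Fetch and parse the page (get() throws IOException on network errors)
Document doc = Jsoup.connect(url).get();
// Select every <img> element in the document
Elements images = doc.select("img");
for (Element img : images) {
    // absUrl resolves a relative src against the page's base URL
    System.out.println(img.absUrl("src"));
}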
This grabs all the <img> elements and prints their absolute URLs, for example:
http://sa.bbc.co.uk/bbc/bbc/s?name=SET-COUNTER&pal_route=index&ml_name=barlesque&app_type=web&language=en-GB&ml_version=0.16.1&pal_webapp=wwhp&blq_s=3.5&blq_r=3.5&blq_v=default-worldwide
http://static.bbci.co.uk/frameworks/barlesque/2.50.2/desktop/3.5/img/blq-blocks_grey_alpha.png
http://static.bbci.co.uk/frameworks/barlesque/2.50.2/desktop/3.5/img/blq-search_grey_alpha.png
http://news.bbcimg.co.uk/media/images/69139000/jpg/_69139104_69139103.jpg
http://news.bbcimg.co.uk/media/images/69134000/jpg/_69134575_waynerooney1.jpg
If you only want the .jpg files, narrow the selector to src attributes that end in .jpg:
Elements images = doc.select("img[src$=.jpg]");
This selects only the <img> elements whose URL ends in .jpg.