问题
I'm creating a java program that will read a html document from a URL and display the sizes of the images in the code. I'm not sure how to go about achieving this though.
I wouldn't need to actually download and save the images, i just need the sizes and the order in which they appear on the webpage.
for example: a webpage has 3 images
<img src="dog.jpg" /> //which is 54kb
<img src="cat.jpg" /> //which is 75kb
<img src="horse.jpg"/> //which is 80kb
i would need the output of my java program to display
54kb
75kb
80kb
Any ideas where i should start?
p.s I'm a bit of a java newbie
回答1:
If you're new to Java you may want to leverage an existing library to make things a bit easier. Jsoup allows you to fetch an HTML page and extract elements using CSS-style selectors.
This is just a quick and very dirty example but I think it will show how easy Jsoup can make such a task. Please note that error handling and response-code handling was omitted, I merely wanted to pass on the general idea:
Document doc = Jsoup.connect("http://stackoverflow.com/questions/14541740/java-program-to-download-images-from-a-website-and-display-the-file-sizes").get();
Elements imgElements = doc.select("img[src]");
Map<String, String> fileSizeMap = new HashMap<String, String>();
for(Element imgElement : imgElements){
String imgUrlString = imgElement.attr("abs:src");
URL imgURL = new URL(imgUrlString);
HttpURLConnection httpConnection = (HttpURLConnection) imgURL.openConnection();
String contentLengthString = httpConnection.getHeaderField("Content-Length");
if(contentLengthString == null)
contentLengthString = "Unknown";
fileSizeMap.put(imgUrlString, contentLengthString);
}
for(Map.Entry<String, String> mapEntry : fileSizeMap.entrySet()){
String imgFileName = mapEntry.getKey();
System.out.println(imgFileName + " ---> " + mapEntry.getValue() + " bytes");
}
You might also consider looking at Apache HttpClient. I find it generally preferable over the raw URLConnection/HttpURLConnection approach.
回答2:
You should break you problem into 3 sub problems
- Download the HTML document
- Parse the HTML document and find the images
- Download the images and determine its size
回答3:
You can use regular expressions to find tag and get image URL. After that you'll need and HttpUrlConnection class to get image data and measure it's size.
回答4:
You can do this:
try {
URL urlConn = new URL("http://yoururl.com/cat.jpg");
URLConnection urlC = urlConn.openConnection();
System.out.println(urlC.getContentLength());
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
来源:https://stackoverflow.com/questions/14541740/java-program-to-download-images-from-a-website-and-display-the-file-sizes