I was wondering how one can use Selenium/WebDriver to download an image from a page. Assume that the user session is required to download the image, hence having pure URL i
If you need to check that the image exists and is available, you can do it like this:
protected boolean isResourceAvailableByUrl(String resourceUrl) {
    // remember the current URL so we can come back to it afterwards
    String currentUrl = webDriver.getCurrentUrl();
    try {
        // try to open the image by its URL
        webDriver.get(resourceUrl);
        // RESOURCE_NOT_FOUND is a By locator for an element that appears only on the error page;
        // if the "resource not found" message did not appear, the image exists
        return webDriver.findElements(RESOURCE_NOT_FOUND).isEmpty();
    } finally {
        // go back to the original page
        webDriver.get(currentUrl);
    }
}
Make sure, however, that loading currentUrl really returns you to the page you were on before this method ran. In my case it did. If it does not, you can try:
webDriver.navigate().back()
Unfortunately, WebDriver does not seem to offer any way to inspect the response status code. That is why you need to find some element specific to the NOT_FOUND page and check whether it appeared; if it did, the image does not exist.
This is just a workaround, because I found no official way to solve it.
NOTE: This solution is helpful when you need an authorized session to get the resource and cannot simply download it with ImageIO or directly with HttpClient.
I prefer this approach:
WebElement logo = driver.findElement(By.cssSelector(".image-logo"));
String logoSRC = logo.getAttribute("src");
URL imageURL = new URL(logoSRC);
BufferedImage saveImage = ImageIO.read(imageURL);
ImageIO.write(saveImage, "png", new File("logo-image.png"));
Try the following:
JavascriptExecutor js = (JavascriptExecutor) driver;
String base64string = (String) js.executeScript("var c = document.createElement('canvas');"
+ " var ctx = c.getContext('2d');"
+ "var img = document.getElementsByTagName('img')[0];"
+ "c.height=img.naturalHeight;"
+ "c.width=img.naturalWidth;"
+ "ctx.drawImage(img, 0, 0,img.naturalWidth, img.naturalHeight);"
+ "var base64String = c.toDataURL();"
+ "return base64String;");
String[] base64Array = base64string.split(",");
String base64 = base64Array[base64Array.length - 1];
byte[] data = java.util.Base64.getDecoder().decode(base64); // standard decoder, Java 8+
ByteArrayInputStream memstream = new ByteArrayInputStream(data);
BufferedImage saveImage = ImageIO.read(memstream);
ImageIO.write(saveImage, "png", new File("path"));
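For anyone doing the same from Python, the decoding step above (split the data URL at the comma, then Base64-decode the payload) might look roughly like this. It is only a sketch: `data_url_to_bytes` is a name I made up, and the sample string is a hand-made stand-in for what `driver.execute_script(...)` would return from the canvas script.

```python
import base64


def data_url_to_bytes(data_url: str) -> bytes:
    """Decode the payload of a base64 data URL, e.g. the result of canvas.toDataURL()."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("not a base64-encoded data URL")
    return base64.b64decode(payload)


# Hand-made data URL for illustration only (the payload is not a real image).
sample = "data:image/png;base64," + base64.b64encode(b"fake-png-bytes").decode("ascii")
image_bytes = data_url_to_bytes(sample)
```

The resulting bytes can be written straight to a file opened in "wb" mode. Keep in mind that toDataURL() re-encodes the picture as PNG by default, so you get a copy of what the browser rendered rather than the original file, and that the browser refuses to export a canvas that contains a cross-origin image.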
Another, mostly correct, solution is to download the image directly with a plain HTTP request.
You can reuse WebDriver's user session, because it stores the cookies.
In my example I only check the status code that is returned. If it is 200, the image exists and is available to show or download. If you actually need to download the file, you can read all the image data from the httpResponse entity (use it as a plain input stream).
// just look at your cookie's content (e.g. using browser)
// and import these settings from it
private static final String SESSION_COOKIE_NAME = "JSESSIONID";
private static final String DOMAIN = "domain.here.com";
private static final String COOKIE_PATH = "/cookie/path/here";
protected boolean isResourceAvailableByUrl(String resourceUrl) {
    HttpClient httpClient = new DefaultHttpClient();
    HttpContext localContext = new BasicHttpContext();
    BasicCookieStore cookieStore = new BasicCookieStore();
    // apply the JSESSIONID cookie if it exists
    cookieStore.addCookie(getSessionCookie());
    localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
    // resourceUrl is the URL that leads to the image
    HttpGet httpGet = new HttpGet(resourceUrl);
    try {
        HttpResponse httpResponse = httpClient.execute(httpGet, localContext);
        return httpResponse.getStatusLine().getStatusCode() == HttpStatus.SC_OK;
    } catch (IOException e) {
        return false;
    }
}

protected BasicClientCookie getSessionCookie() {
    Cookie originalCookie = webDriver.manage().getCookieNamed(SESSION_COOKIE_NAME);
    if (originalCookie == null) {
        return null;
    }
    // build a new Apache-style cookie based on WebDriver's one
    String cookieName = originalCookie.getName();
    String cookieValue = originalCookie.getValue();
    BasicClientCookie resultCookie = new BasicClientCookie(cookieName, cookieValue);
    resultCookie.setDomain(DOMAIN);
    resultCookie.setExpiryDate(originalCookie.getExpiry());
    resultCookie.setPath(COOKIE_PATH);
    return resultCookie;
}
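The same idea (copy the browser's cookies into a plain HTTP request) can be sketched in Python with only the standard library. This is an illustration under assumptions, not a drop-in solution: `cookie_header` and `build_image_request` are hypothetical helper names, the sample cookies and image URL are made up, and in real use the list would come from `driver.get_cookies()`. The sketch also sends every cookie regardless of its domain or path, which is a simplification.

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_image_request(url, cookies):
    """Create a GET request that carries the browser session's cookies."""
    request = urllib.request.Request(url)
    if cookies:
        request.add_header("Cookie", cookie_header(cookies))
    return request


# Hypothetical cookies, shaped like the dicts returned by driver.get_cookies().
sample_cookies = [
    {"name": "JSESSIONID", "value": "abc123", "domain": "domain.here.com", "path": "/"},
    {"name": "theme", "value": "dark", "domain": "domain.here.com", "path": "/"},
]
request = build_image_request("https://domain.here.com/images/logo.png", sample_cookies)
```

To actually fetch the image you would pass `request` to `urllib.request.urlopen` and read the response body; `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, which covers the "does the image exist" check from the Java version.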
Works for me:
# open the image in a new tab
driver.execute_script('''window.open("''' + wanted_url + '''","_blank");''')
sleep(2)
driver.switch_to.window(driver.window_handles[1])
sleep(2)
# make screenshot (Selenium always writes PNG data, so use a .png extension)
driver.save_screenshot("C://Folder/" + photo_name + ".png")
sleep(2)
# close the new tab
driver.execute_script('''window.close();''')
sleep(2)
# back to the original tab
driver.switch_to.window(driver.window_handles[0])
The only way I found to avoid downloading the image twice is to use the Chrome DevTools Protocol.
In Python, this gives:
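import base64
import pychrome

# Assumption: Selenium started Chrome with --remote-debugging-port=9222,
# `driver` is that Selenium WebDriver, and `tab` is a pychrome tab attached to it, e.g.:
#   browser = pychrome.Browser(url="http://127.0.0.1:9222")
#   tab = browser.list_tab()[0]

def save_image(file_content, file_name):
    try:
        file_content = base64.b64decode(file_content)
        with open("C:\\Crawler\\temp\\" + file_name, "wb") as f:
            f.write(file_content)
    except Exception as e:
        print(str(e))

# **kwargs absorbs any extra fields that newer Chrome versions add to the event
def response_received(requestId, loaderId, timestamp, type, response, frameId=None, **kwargs):
    if type == 'Image':
        url = response.get('url')
        print(f"Image loaded: {url}")
        response_body = tab.Network.getResponseBody(requestId=requestId)
        file_name = url.split('/')[-1].split('?')[0]
        if file_name:
            save_image(response_body['body'], file_name)

# register the callback for Network.responseReceived events
tab.Network.responseReceived = response_received
# start the tab
tab.start()
# enable network events
tab.Network.enable()
# load the target page with Selenium
driver.get("https://www.realtor.com/ads/forsale/TMAI112283AAAA")
# wait for the images to load (up to 50 seconds)
tab.wait(50)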
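One caveat about the file name logic above: `url.split('/')[-1].split('?')[0]` picks the wrong part when the query string itself contains a slash (for example `?next=/a/b`). A possible alternative using `urllib.parse` is sketched below. `file_name_from_url` and its `default` argument are my own hypothetical additions; unlike the original, it falls back to a default name instead of skipping the image.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def file_name_from_url(url: str, default: str = "image") -> str:
    """Return the last path segment of a URL, ignoring query string and fragment."""
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    return name or default


print(file_name_from_url("https://example.com/img/photo.jpg?next=/a/b"))
```

Inside `response_received` this would replace the `split` line, for instance `file_name = file_name_from_url(url)`.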