How to download an image using Selenium (any version)?

夕颜 2020-11-29 04:11

I was wondering how one can use Selenium/WebDriver to download an image from a page. Assume that the user session is required to download the image, hence having pure URL i

13 answers
  • 2020-11-29 04:29

    If you need to test that an image exists and is available, you can do it like this:

    protected boolean isResourceAvailableByUrl(String resourceUrl) {
        // remember the current URL so we can return to it afterwards
        String currentUrl = webDriver.getCurrentUrl();
        try {
            // try to open the image by its URL
            webDriver.get(resourceUrl);
            // RESOURCE_NOT_FOUND is a By locator for an element that appears only
            // on the "resource not found" page; if it is absent, the image exists
            return webDriver.findElements(RESOURCE_NOT_FOUND).isEmpty();
        } finally {
            // go back to the original page
            webDriver.get(currentUrl);
        }
    }
    

    But you need to be sure that loading currentUrl really returns you to the page you were on before this method ran. In my case it did. If not, you can try using:

    webDriver.navigate().back();
    

    Unfortunately, there seems to be no way to inspect the response status code. That is why you need to find some element specific to the NOT_FOUND page, check whether it appeared, and conclude from that whether the image exists.

    It is just a workaround, because I found no official way to solve this.

    NOTE: This solution is helpful when you need an authorized session to get the resource and can't just download it with ImageIO or directly with HttpClient.
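
For readers using the Python bindings, the same check might look roughly like the sketch below. It assumes `driver` behaves like a Selenium WebDriver; the function name and the `not_found_locator` pair are made up for illustration and are not part of Selenium.

```python
def is_resource_available(driver, resource_url, not_found_locator):
    """Return True if loading resource_url does not show the 'not found' element.

    driver is expected to offer current_url, get() and find_elements(), as a
    Selenium WebDriver does. not_found_locator is a hypothetical (by, value)
    pair such as ("css selector", "#not-found").
    """
    original_url = driver.current_url  # remember where we were
    try:
        driver.get(resource_url)
        by, value = not_found_locator
        # find_elements returns an empty list when nothing matches
        return len(driver.find_elements(by, value)) == 0
    finally:
        # always return to the original page, even if the lookup raised
        driver.get(original_url)
```

The `try`/`finally` mirrors the Java version: the browser ends up back on the starting URL whether or not the check succeeds.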

  • 2020-11-29 04:32

    I prefer doing it like this:

     WebElement logo = driver.findElement(By.cssSelector(".image-logo"));
     String logoSRC = logo.getAttribute("src");
    
     URL imageURL = new URL(logoSRC);
     BufferedImage saveImage = ImageIO.read(imageURL);
    
     ImageIO.write(saveImage, "png", new File("logo-image.png"));
    
  • 2020-11-29 04:34

    Try the following:

    JavascriptExecutor js = (JavascriptExecutor) driver;                              
    String base64string = (String) js.executeScript("var c = document.createElement('canvas');"
                           + " var ctx = c.getContext('2d');"
                           + "var img = document.getElementsByTagName('img')[0];"
                           + "c.height=img.naturalHeight;"
                           + "c.width=img.naturalWidth;"
                           + "ctx.drawImage(img, 0, 0,img.naturalWidth, img.naturalHeight);"
                           + "var base64String = c.toDataURL();"
                           + "return base64String;");
    String[] base64Array = base64string.split(",");
    
    String base64 = base64Array[base64Array.length - 1];
    
    // decode with the JDK's own decoder (java.util.Base64, Java 8+)
    byte[] data = java.util.Base64.getDecoder().decode(base64);
    
    ByteArrayInputStream memstream = new ByteArrayInputStream(data);
    BufferedImage saveImage = ImageIO.read(memstream);
    
    ImageIO.write(saveImage, "png", new File("path"));
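
The split-and-decode step above can also be sketched in Python, for the case where the data URL comes back from `driver.execute_script(...)`. The helper names here are hypothetical; only the standard library is used.

```python
import base64


def data_url_to_bytes(data_url):
    """Decode a 'data:<mime>;base64,<payload>' URL into raw bytes."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or ";base64" not in header:
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)


def save_data_url(data_url, path):
    """Write the decoded image bytes to path (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data_url_to_bytes(data_url))
```

Writing the decoded bytes straight to disk avoids re-encoding the image, since `toDataURL()` without arguments already produces PNG data.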
    
  • 2020-11-29 04:35

    Another mostly correct solution is to download it directly with a simple HTTP request.
    You can reuse WebDriver's user session, because it stores the cookies.
    In my example, I only check which status code is returned. If it is 200, the image exists and is available for display or download. If you really need to download the file itself, you can read the image data from the httpResponse entity (treat it as a plain input stream).

    // just look at your cookie's content (e.g. using browser)
    // and import these settings from it
    private static final String SESSION_COOKIE_NAME = "JSESSIONID";
    private static final String DOMAIN = "domain.here.com";
    private static final String COOKIE_PATH = "/cookie/path/here";
    
    protected boolean isResourceAvailableByUrl(String resourceUrl) {
        HttpClient httpClient = new DefaultHttpClient();
        HttpContext localContext = new BasicHttpContext();
        BasicCookieStore cookieStore = new BasicCookieStore();
        // apply jsessionid cookie if it exists
        cookieStore.addCookie(getSessionCookie());
        localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
        // resourceUrl - is url which leads to image
        HttpGet httpGet = new HttpGet(resourceUrl);
    
        try {
            HttpResponse httpResponse = httpClient.execute(httpGet, localContext);
            return httpResponse.getStatusLine().getStatusCode() == HttpStatus.SC_OK;
        } catch (IOException e) {
            return false;
        }
    }
    
    protected BasicClientCookie getSessionCookie() {
        Cookie originalCookie = webDriver.manage().getCookieNamed(SESSION_COOKIE_NAME);
    
        if (originalCookie == null) {
            return null;
        }
    
        // just build new apache-like cookie based on webDriver's one
        String cookieName = originalCookie.getName();
        String cookieValue = originalCookie.getValue();
        BasicClientCookie resultCookie = new BasicClientCookie(cookieName, cookieValue);
        resultCookie.setDomain(DOMAIN);
        resultCookie.setExpiryDate(originalCookie.getExpiry());
        resultCookie.setPath(COOKIE_PATH);
        return resultCookie;
    }
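
If you want the file itself rather than just the status code, a rough Python equivalent is to copy the browser's cookies into a plain HTTP request. This sketch assumes the cookies come from `driver.get_cookies()` (a list of dicts with `name` and `value` keys) and, as a simplification, sends all of them without checking domain or path. The function names are illustrative.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from driver.get_cookies()-style dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_image_request(url, selenium_cookies):
    """Create a GET request that carries the browser session's cookies."""
    request = urllib.request.Request(url)
    if selenium_cookies:
        request.add_header("Cookie", cookie_header(selenium_cookies))
    return request


def download_image(url, selenium_cookies, path):
    """Fetch url with the session cookies and write the body to path."""
    request = build_image_request(url, selenium_cookies)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request) as response, open(path, "wb") as f:
        f.write(response.read())
```

A call would then look like `download_image(image_url, driver.get_cookies(), "logo.png")`, where `image_url` is whatever `src` you read from the page.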
    
  • 2020-11-29 04:39

    Works for me:

    # open the image in a new tab
    driver.execute_script('''window.open("''' + wanted_url + '''","_blank");''')
    sleep(2)
    driver.switch_to.window(driver.window_handles[1])
    sleep(2)
    
    # make screenshot (Selenium always writes PNG data, so use a .png extension)
    driver.save_screenshot("C://Folder/" + photo_name + ".png")
    sleep(2)
    
    # close the new tab
    driver.execute_script('''window.close();''')
    sleep(2)
    
    #back to original tab
    driver.switch_to.window(driver.window_handles[0])
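
A possible refinement of the snippet above: wrap the open/switch/close steps in a context manager so the original tab is restored even if the screenshot fails, and pass the URL as a script argument instead of concatenating it into the JavaScript. `temporary_tab` is a hypothetical helper, not a Selenium API; waiting for the image to render is left to the caller.

```python
import time
from contextlib import contextmanager


@contextmanager
def temporary_tab(driver, url, timeout=5.0):
    """Open url in a new tab, switch to it, and always switch back afterwards."""
    original = driver.current_window_handle
    known = set(driver.window_handles)
    # arguments[0] is filled in by Selenium, so quotes in the URL cannot break the script
    driver.execute_script("window.open(arguments[0], '_blank');", url)

    # poll for the new window handle instead of sleeping a fixed time
    deadline = time.monotonic() + timeout
    while True:
        new_handles = [h for h in driver.window_handles if h not in known]
        if new_handles:
            break
        if time.monotonic() > deadline:
            raise TimeoutError("new tab did not appear")
        time.sleep(0.1)

    driver.switch_to.window(new_handles[0])
    try:
        yield driver
    finally:
        driver.close()  # closes the tab that currently has focus
        driver.switch_to.window(original)


# Intended usage (photo_name and wanted_url as in the snippet above):
#     with temporary_tab(driver, wanted_url):
#         driver.save_screenshot(photo_name + ".png")
```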
    
  • 2020-11-29 04:41

    The only way I found to avoid downloading the image twice is to use the Chrome DevTools Protocol.

    In Python, this gives:

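    import base64
    import pychrome

    # `tab` is assumed to be a pychrome tab attached to the same Chrome instance
    # that Selenium drives (Chrome started with --remote-debugging-port), e.g.:
    #   browser = pychrome.Browser(url="http://127.0.0.1:9222")
    #   tab = browser.list_tab()[0]

    def save_image(file_content, file_name):
        # the response body of an image arrives base64-encoded
        try:
            file_content = base64.b64decode(file_content)
            with open("C:\\Crawler\\temp\\" + file_name, "wb") as f:
                f.write(file_content)
        except Exception as e:
            print(str(e))

    # **kwargs absorbs any extra event fields sent by newer Chrome versions
    def response_received(requestId, loaderId, timestamp, type, response, frameId=None, **kwargs):
        if type == 'Image':
            url = response.get('url')
            print(f"Image loaded: {url}")
            response_body = tab.Network.getResponseBody(requestId=requestId)
            # last path segment, without the query string
            file_name = url.split('/')[-1].split('?')[0]
            if file_name:
                save_image(response_body['body'], file_name)


    # register the handler for Network.responseReceived events
    tab.Network.responseReceived = response_received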
    
    # start the tab
    tab.start()

    # enable network events
    tab.Network.enable()

    # load the target page with Selenium
    driver.get("https://www.realtor.com/ads/forsale/TMAI112283AAAA")

    # wait for the page to load (timeout in seconds)
    tab.wait(50)
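
Two details of the handler above can be isolated into small helpers. This is only a sketch with made-up names: it derives the file name with `urllib.parse` and honours the `base64Encoded` flag that `Network.getResponseBody` returns alongside `body`.

```python
import base64
from urllib.parse import urlsplit


def file_name_from_url(url, default="image"):
    """Use the last path segment of url as a file name, ignoring query and fragment."""
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    return name or default


def decode_response_body(result):
    """Turn a Network.getResponseBody result into bytes.

    The protocol returns 'body' plus a 'base64Encoded' flag: binary bodies
    such as images are base64-encoded, text bodies are sent as-is.
    """
    body = result["body"]
    if result.get("base64Encoded"):
        return base64.b64decode(body)
    return body.encode("utf-8")
```

Inside `response_received`, the bytes to write would then be `decode_response_body(response_body)`, which also covers text-based images such as SVG.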
    