Selenium: How to inject/execute JavaScript into a page before any of the page's other scripts are loaded/executed?

花落未央 2020-12-05 10:12

I'm using the Selenium Python WebDriver to browse some pages. I want to inject JavaScript code into a page before any of the page's other JavaScript code is loaded and executed.

4 answers
  • 2020-12-05 10:46

    So, I know it's been a few years, but I've found a way to do this without modifying the web page's content and without using a proxy! I'm using the Node.js version, but presumably the API is consistent for other languages as well. Here is what you want to do:

    const {Builder, Capabilities} = require('selenium-webdriver');

    (async () => {
      // Firefox capabilities; page load strategy options are 'eager', 'none', 'normal'
      const capabilities = Capabilities.firefox();
      capabilities.setPageLoadStrategy('eager');
      const driver = await new Builder().withCapabilities(capabilities).build();
      try {
        // With 'eager', get() returns once the DOM is parsed rather than after all resources load
        await driver.get('http://example.com');
        await driver.executeScript(`console.log('hello')`);
      } finally {
        await driver.quit();
      }
    })();
    

    That 'eager' option works for me. You may need to use the 'none' option. Documentation: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/lib/capabilities_exports_PageLoadStrategy.html

    EDIT: Note that, at the time of writing, the 'eager' option had not been implemented in Chrome.

  • 2020-12-05 10:55

    If you cannot modify the page content, you may use a proxy, or a content script in an extension installed in your browser. Doing it within Selenium, you would write some code that injects the script as a child of an existing element, but you won't be able to have it run before the page has loaded (which is when your driver's get() call returns).

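    // Requires org.openqa.selenium.JavascriptExecutor; "return" makes the script's value the call's result.
    String name = (String) ((JavascriptExecutor) driver).executeScript(
        "return (function () { ... })();" /* , script arguments ... */);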
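    With the Python bindings the asker uses, the rough equivalent is execute_script, which exposes any extra arguments to the script as arguments[0], arguments[1], and so on. The sketch below (the helper name append_inline_script is hypothetical) appends an inline script element to the already-loaded page. As the answer says, this can only happen after driver.get() returns, and a strict Content-Security-Policy on the page may block the inserted inline script.

```python
# Hypothetical helper: append an inline <script> element to the current page.
# Selenium's execute_script(script, *args) runs `script` in the page and makes
# the extra Python arguments available as arguments[0], arguments[1], ...
APPEND_INLINE_SCRIPT = """
var el = document.createElement('script');
el.textContent = arguments[0];
(document.head || document.documentElement).appendChild(el);
"""


def append_inline_script(driver, source: str) -> None:
    """Inject `source` as a new <script> child of <head>.

    The script runs as soon as it is inserted, which is necessarily after
    driver.get() has returned, i.e. after the page has loaded.
    """
    driver.execute_script(APPEND_INLINE_SCRIPT, source)
```

    For example, call driver.get("http://example.com") first and then append_inline_script(driver, "window.injected = true;").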
    

    The documentation leaves unspecified the moment at which the code starts executing. You would want it to run before the DOM starts loading, and that guarantee can probably only be met by the proxy or extension content-script route.
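    The extension route can be sketched in Python by writing a minimal unpacked Chrome extension to disk. This is an illustration under assumptions, not a tested recipe: it assumes Manifest V3, a Chromium version that supports the "world": "MAIN" content-script key (so the script runs in the page's own JavaScript context instead of the extension's isolated world), and the hypothetical names build_preload_extension and preload.js. A content script declared with "run_at": "document_start" is injected before any other DOM is constructed or any page script runs.

```python
import json
from pathlib import Path


def build_preload_extension(directory, js_source: str) -> Path:
    """Write a minimal unpacked extension whose content script runs at
    document_start in every frame of every page.

    Assumptions: Manifest V3, and a Chromium build that understands
    "world": "MAIN" (runs the script in the page's JavaScript context).
    """
    ext_dir = Path(directory)
    ext_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "manifest_version": 3,
        "name": "selenium-preload-injector",  # hypothetical name
        "version": "1.0",
        "content_scripts": [
            {
                "matches": ["<all_urls>"],
                "js": ["preload.js"],
                "run_at": "document_start",  # before the page's own scripts
                "all_frames": True,
                "world": "MAIN",  # page context, not the isolated world
            }
        ],
    }
    (ext_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    (ext_dir / "preload.js").write_text(js_source, encoding="utf-8")
    return ext_dir
```

    The directory could then be passed to Chrome, e.g. options = webdriver.ChromeOptions() followed by options.add_argument(f"--load-extension={path}"); whether a given Chrome build still honors that flag is worth checking before relying on it.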

    If you can instrument your page with a minimal harness, it can detect a special URL query parameter and load additional content, but it must do so from an inline script. Pseudocode:

     <html>
        <head>
           <script type="text/javascript">
           (function () {
           // Only inject when the test run added SELENIUM_TEST to the URL
           if (location && location.href && location.href.indexOf("SELENIUM_TEST") >= 0) {
              var injectScript = document.createElement("script");
              injectScript.setAttribute("type", "text/javascript");
    
              // A dynamically inserted <script src> is fetched asynchronously, so it may run
              // after the page's own scripts. If ordering matters, perform a synchronous XHR
              // instead and inject the source via textContent.
              injectScript.setAttribute("src", URL_OF_EXTRA_SCRIPT);
              document.documentElement.appendChild(injectScript);
    
              // Optional cleanup: removing the element does not cancel the fetch or
              // execution that started when it was inserted.
              document.documentElement.removeChild(injectScript);
           }
           })();
           </script>
        ...
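    On the Selenium side, the test then only has to navigate to a URL carrying the marker the harness looks for. A minimal standard-library sketch; the helper name and the "=1" value are illustrative assumptions (the harness above only checks that SELENIUM_TEST appears somewhere in location.href).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_selenium_marker(url: str, marker: str = "SELENIUM_TEST") -> str:
    """Return `url` with `marker` added as a query parameter.

    Existing query parameters and the fragment are preserved, and the marker
    is not added twice.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == marker for key, _ in query):
        query.append((marker, "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

    Usage: driver.get(with_selenium_marker("http://example.com/page")) loads http://example.com/page?SELENIUM_TEST=1.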
    
  • 2020-12-05 10:55

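    Since version 1.0.9, selenium-wire can modify responses. Below is an example that uses this to inject a script into a page before it reaches the browser. (It relies on the 1.x custom_response_handler option and header_overrides attribute; newer selenium-wire releases provide driver.response_interceptor for modifying responses.)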

    from seleniumwire import webdriver
    from gzip import compress, decompress
    
    from lxml import html
    from lxml.etree import ParserError
    from lxml.html import builder
    
    script_elem_to_inject = builder.SCRIPT('alert("injected")')
    
    def inject(req, req_body, res, res_body):
        # various checks to make sure we're only injecting the script on appropriate responses
        # we check that the content type is HTML, that the status code is 200, and that the encoding is gzip
        if res.headers.get_content_subtype() != 'html' or res.status != 200 or res.getheader('Content-Encoding') != 'gzip':
            return None
        try:
            parsed_html = html.fromstring(decompress(res_body))
        except ParserError:
            return None
        try:
            parsed_html.head.insert(0, script_elem_to_inject)
        except IndexError: # no head element
            return None
        return compress(html.tostring(parsed_html))
    
    drv = webdriver.Firefox(seleniumwire_options={'custom_response_handler': inject})
    drv.header_overrides = {'Accept-Encoding': 'gzip'} # ensure we only get gzip encoded responses
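    The handler's core steps (decompress, insert a script element at the start of the head, recompress) do not depend on selenium-wire itself. Below is a standard-library-only sketch of those steps; it swaps lxml parsing for a simple regular expression on the opening head tag, which is cruder (a head tag inside a comment could fool it), and the function name inject_into_html is hypothetical.

```python
import gzip
import re
from typing import Optional

# First opening <head> tag, with or without attributes; \b keeps <header> from matching.
_HEAD_OPEN = re.compile(rb"<head\b[^>]*>", re.IGNORECASE)


def inject_into_html(body: bytes, script: bytes,
                     content_encoding: Optional[str] = None) -> Optional[bytes]:
    """Return `body` with `script` inserted right after the opening <head> tag.

    Supports identity and gzip content encodings. Returns None, meaning
    "leave the response unchanged", for any other encoding or when the
    document has no <head> tag.
    """
    if content_encoding not in (None, "", "identity", "gzip"):
        return None
    raw = gzip.decompress(body) if content_encoding == "gzip" else body
    match = _HEAD_OPEN.search(raw)
    if match is None:
        return None
    patched = raw[:match.end()] + script + raw[match.end():]
    # Re-apply the original encoding so the Content-Encoding header stays valid.
    return gzip.compress(patched) if content_encoding == "gzip" else patched
```

    It could be called from whichever response hook your selenium-wire version provides; if that hook does not recompute it, a Content-Length header sent by the server will no longer match the modified body.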
    

    More generally, another way to control a browser remotely and inject a script before the page's content loads is to use a library based on a different protocol entirely, e.g. the DevTools Protocol. A Python implementation is available here: https://github.com/pyppeteer/pyppeteer2 (Disclaimer: I'm one of the main authors.)
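    Selenium itself can also speak the DevTools Protocol to Chromium-based browsers: their Python drivers (Chrome, Edge) expose execute_cdp_cmd(cmd, cmd_args). The CDP command Page.addScriptToEvaluateOnNewDocument registers a script that the browser evaluates in every new frame before the frame's own scripts load. A minimal sketch; the helper name add_preload_script is hypothetical, and this does not apply to Firefox.

```python
def add_preload_script(driver, source: str) -> str:
    """Register `source` to run in every new document before the page's own
    scripts, using the CDP command Page.addScriptToEvaluateOnNewDocument.

    `driver` is assumed to be a Selenium Chromium-based driver exposing
    execute_cdp_cmd(cmd, cmd_args). Returns the script identifier reported by
    the browser; passing it to Page.removeScriptToEvaluateOnNewDocument
    unregisters the script.
    """
    result = driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source}
    )
    return result["identifier"]
```

    The registration must happen before navigating, e.g. driver = webdriver.Chrome(), then add_preload_script(driver, "window.injected = true;"), then driver.get("http://example.com").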

  • 2020-12-05 11:01

    If you want to inject something into the HTML of a page before the browser parses and executes it, I would suggest using a proxy such as mitmproxy.
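    A mitmproxy addon for this can be quite small. The sketch below assumes mitmproxy's response(flow) addon hook and its flow.response.text attribute, which exposes the decoded body (mitmproxy handles content encodings such as gzip) and re-encodes it on assignment. The file name inject.py, the SNIPPET placeholder and the helper insert_after_head are hypothetical.

```python
# inject.py: sketch of a mitmproxy addon; run with `mitmdump -s inject.py`.
import re

# Placeholder snippet (hypothetical); replace with the script you want to inject.
SNIPPET = "<script>window.injectedByProxy = true;</script>"

# First opening <head> tag, with or without attributes; \b keeps <header> from matching.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def insert_after_head(html_text: str, snippet: str) -> str:
    """Return html_text with snippet inserted right after the first <head> tag
    (unchanged when there is no <head> tag)."""
    # A function replacement avoids backslash processing of the snippet.
    return _HEAD_OPEN.sub(lambda m: m.group(0) + snippet, html_text, count=1)


def response(flow):
    """mitmproxy hook, called once the full HTTP response has been read."""
    content_type = flow.response.headers.get("content-type", "")
    if flow.response.status_code != 200 or "text/html" not in content_type:
        return
    try:
        text = flow.response.text  # decoded body, or None when there is no content
    except ValueError:  # body could not be decoded as text
        return
    if text is not None:
        flow.response.text = insert_after_head(text, SNIPPET)
```

    mitmdump listens on port 8080 by default, so Chrome could be pointed at it with options.add_argument("--proxy-server=http://127.0.0.1:8080"). For HTTPS pages the browser must trust mitmproxy's CA certificate (or run with the acceptInsecureCerts capability), and a page's Content-Security-Policy may still block the injected inline script.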
