Can a website detect when you are using selenium with chromedriver?

情歌与酒 2020-11-21 05:41

I've been testing out Selenium with Chromedriver and I noticed that some pages can detect that you're using Selenium even though there's no automation at all. Even when I

19 answers
  • 2020-11-21 06:16

    It seems to me the simplest way to do it with Selenium is to intercept the XHR that sends back the browser fingerprint.

    But since this is a Selenium-only problem, it's better just to use something else. Selenium is supposed to make things like this easier, not way harder.

  • 2020-11-21 06:17

    I've found that changing the JavaScript "key" variable like this:

    // Fools the website into believing a human is navigating it
    ((JavascriptExecutor) driver).executeScript("window.key = \"blahblah\";");

    works for some websites when using Selenium WebDriver along with Google Chrome, since many sites check for this variable in order to avoid being scraped by Selenium.

  • 2020-11-21 06:18

    A lot has been analyzed and discussed about how a website detects that it is being driven by a Selenium-controlled ChromeDriver. Here are my two cents:

    According to the article Browser detection using the user agent, serving different webpages or services to different browsers is usually not among the best of ideas. The web is meant to be accessible to everyone, regardless of which browser or device a user is using. There are best practices for developing a website that progressively enhances itself based on feature availability rather than by targeting specific browsers.

    However, browsers and standards are not perfect, and there are still edge cases where websites detect the browser and whether it is driven by a Selenium-controlled WebDriver. Browsers can be detected in different ways; some commonly used mechanisms are as follows:

    • Implementing CAPTCHA / reCAPTCHA to detect automated bots.

    You can find a relevant detailed discussion in How does recaptcha 3 know I'm using selenium/chromedriver?

    • Detecting the term HeadlessChrome within the headless Chrome UserAgent

    You can find a relevant detailed discussion in Access Denied page with headless Chrome on Linux while headed Chrome works on windows using Selenium through Python

    • Using Bot Management service from Distil Networks

    You can find a relevant detailed discussion in Unable to use Selenium to automate Chase site login

    • Using Bot Manager service from Akamai

    You can find a relevant detailed discussion in Dynamic dropdown doesn't populate with auto suggestions on https://www.nseindia.com/ when values are passed using Selenium and Python

    • Using Bot Protection service from Datadome

    You can find a relevant detailed discussion in Website using DataDome gets captcha blocked while scraping using Selenium and Python

    However, while using the user-agent to detect the browser looks simple, doing it well is in fact a bit tougher.

    Note: At this point it's worth mentioning that it's very rarely a good idea to use user agent sniffing. There is almost always a better and more broadly compatible way to address a given issue.
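
    To illustrate why user-agent sniffing is fragile, here is a minimal sketch of the kind of naive check a site might run for the HeadlessChrome token. The function name and both user-agent strings are made up for illustration; real bot-management services are assumed to combine many more signals.

```python
HEADLESS_TOKEN = "HeadlessChrome"


def looks_headless(user_agent: str) -> bool:
    """Naive check: flag any user-agent string that contains the headless token."""
    return HEADLESS_TOKEN in user_agent


# Illustrative user-agent strings (not captured from a real browser session)
headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/83.0.4103.97 Safari/537.36"
)
# The same string after a client-side user-agent override
overridden_ua = headless_ua.replace("HeadlessChrome", "Chrome")

print(looks_headless(headless_ua))
print(looks_headless(overridden_ua))
```

    The check flags the first string and misses the second, which is why a token match alone is easy to defeat and easy to get wrong.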


    Considerations for browser detection

    The idea behind detecting the browser can be either of the following:

    • Trying to work around a specific bug in some specific variant or specific version of a web browser.
    • Trying to check for the existence of a specific feature that some browsers don't yet support.
    • Trying to provide different HTML depending on which browser is being used.

    Alternatives to browser detection through UserAgents

    Some of the alternatives of browser detection are as follows:

    • Implementing a test to detect how the browser implements the API of a feature and determining how to use it from that. An example was Chrome's unflagged experimental lookbehind support in regular expressions.
    • Adopting the design technique of Progressive enhancement, which involves developing a website in layers using a bottom-up approach: starting with a simpler layer and improving the capabilities of the site in successive layers, each using more features.
    • Adopting the top-down approach of Graceful degradation, in which we build the best possible site using all the features we want and then tweak it to make it work on older browsers.

    Solution

    To prevent the Selenium-driven WebDriver from getting detected, you can use any or all of the approaches mentioned below:

    • Rotating the UserAgent in every execution of your Test Suite using the fake_useragent module as follows:

      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.chrome.service import Service
      from fake_useragent import UserAgent

      options = Options()
      # Pick a random user-agent string for this run
      user_agent = UserAgent().random
      print(user_agent)
      options.add_argument(f'user-agent={user_agent}')
      # Selenium 4 takes the driver path via Service and the options via `options=`
      service = Service(executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
      driver = webdriver.Chrome(service=service, options=options)
      driver.get("https://www.google.co.in")
      driver.quit()

    You can find a relevant detailed discussion in Way to change Google Chrome user agent in Selenium?

    • Rotating the UserAgent in each of your Tests using Network.setUserAgentOverride through execute_cdp_cmd() as follows:

      from selenium import webdriver
      from selenium.webdriver.chrome.service import Service

      # Selenium 4 takes the driver path via Service
      driver = webdriver.Chrome(service=Service(executable_path=r'C:\WebDrivers\chromedriver.exe'))
      # User agent before the override
      print(driver.execute_script("return navigator.userAgent;"))
      # Setting user agent as Chrome/83.0.4103.97
      driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'})
      # User agent after the override
      print(driver.execute_script("return navigator.userAgent;"))

    You can find a relevant detailed discussion in How to change the User Agent using Selenium and Python

    • Changing the value of the navigator.webdriver property to undefined as follows:

      driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
          Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
          })
        """
      })
      

    You can find a relevant detailed discussion in Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

    • Changing the values of navigator.plugins, navigator.languages, WebGL, hairline feature, missing image, etc.

    You can find a relevant detailed discussion in Is there a version of selenium webdriver that is not detectable?

    • Changing the conventional Viewport

    You can find a relevant detailed discussion in How to bypass Google captcha with Selenium and python?
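
    The last two items in the list above can be sketched along the same lines as the navigator.webdriver snippet. This is only an illustration: the helper names, the language list, and the window size are assumptions, and `driver` is assumed to be a Chromium-based Selenium WebDriver that exposes execute_cdp_cmd() and set_window_size(). Note that set_window_size() changes the outer window size, which only approximates a viewport change.

```python
import json


def build_override_script(languages):
    """Return JavaScript that redefines navigator.languages (illustrative values only)."""
    return (
        "Object.defineProperty(navigator, 'languages', "
        "{get: () => %s});" % json.dumps(list(languages))
    )


def apply_overrides(driver, languages=("en-US", "en"), size=(1366, 768)):
    """Register the script for every new document, then resize the window.

    `driver` is assumed to be a Chromium-based Selenium WebDriver.
    """
    script = build_override_script(languages)
    # Runs the script before any page script in each newly loaded document
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": script})
    # Use a non-default window size (hypothetical values)
    driver.set_window_size(*size)
    return script
```

    Calling apply_overrides(driver) once, before the first driver.get(), should be enough for the script to apply to later navigations. Whether this is sufficient against any particular detection service is not something this sketch can promise.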


    Dealing with reCAPTCHA

    When dealing with 2captcha and recaptcha-v3, rather than clicking on the checkbox associated with the text I'm not a robot, it may be easier to get authenticated by extracting and using the data-sitekey.

    You can find a relevant detailed discussion in How to identify the 32 bit data-sitekey of ReCaptcha V2 to obtain a valid response programmatically using Selenium and Python Requests?
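
    As a rough sketch of the extraction step only, the data-sitekey attribute can be read out of the page markup with the standard library. The class name, function name, and sample HTML below are made up for illustration; with Selenium you would presumably pass driver.page_source instead of the sample string.

```python
from html.parser import HTMLParser


class SiteKeyParser(HTMLParser):
    """Collects the value of every data-sitekey attribute found in the markup."""

    def __init__(self):
        super().__init__()
        self.sitekeys = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-sitekey" and value:
                self.sitekeys.append(value)


def find_sitekeys(html):
    """Return all data-sitekey values in document order."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.sitekeys


# Placeholder markup with a placeholder key, not taken from a real site
sample = '<div class="g-recaptcha" data-sitekey="EXAMPLE_SITE_KEY"></div>'
print(find_sitekeys(sample))
```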

  • 2020-11-21 06:19

    Even if you are sending all the right data (e.g. Selenium doesn't show up as an extension, you have a reasonable resolution/bit-depth, etc.), there are a number of services and tools which profile visitor behaviour to determine whether the actor is a user or an automated system.

    For example, visiting a site then immediately going to perform some action by moving the mouse directly to the relevant button, in less than a second, is something no user would actually do.
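
    A minimal sketch of what less robotic input could look like, assuming you drive the pointer yourself: instead of jumping straight to the target, generate a path of slightly jittered intermediate points and a randomized pause. Both helper names and all default values are hypothetical, and nothing here is known to satisfy any particular behaviour profiler.

```python
import random


def jittered_steps(start, end, steps=20, jitter=2.0, rng=None):
    """Return intermediate (x, y) points from start to end with small random offsets."""
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        x = x0 + (x1 - x0) * t
        y = y0 + (y1 - y0) * t
        if i < steps:  # keep the final point exactly on the target
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((round(x), round(y)))
    return points


def random_pause(low=0.4, high=1.5, rng=None):
    """Return a pause length in seconds drawn uniformly from [low, high]."""
    rng = rng or random.Random()
    return rng.uniform(low, high)
```

    The differences between consecutive points could then be fed to Selenium's ActionChains.move_by_offset(), with time.sleep(random_pause()) before the click.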

    It might also be useful as a debugging tool to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it'll also help you verify whether there are any specific parameters that indicate you're running in Selenium.

  • 2020-11-21 06:26

    One more thing I found is that some websites use a platform that checks the User Agent. If the value contains "HeadlessChrome", the behavior can be weird when using headless mode.

    The workaround for that is to override the user agent value, for example in Java:

    chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
    
  • 2020-11-21 06:29

    Firefox is said to set window.navigator.webdriver === true when it is driven by a webdriver. That was according to one of the older specs (e.g.: archive.org), but I couldn't find it in the new one except for some very vague wording in the appendices.

    A test for it is in the Selenium code in the file fingerprint_test.js, where the comment at the end says "Currently only implemented in firefox", but I wasn't able to identify any code in that direction with some simple grepping, neither in the current (41.0.2) Firefox release tree nor in the Chromium tree.

    I also found a comment for an older commit regarding fingerprinting in the Firefox driver, b82512999938 from January 2015. That code is still in the Selenium Git master downloaded yesterday, at javascript/firefox-driver/extension/content/server.js, with a comment linking to the slightly differently worded appendix in the current W3C WebDriver spec.
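
    To see what a given browser and driver combination actually reports, a small check like the following can be run from Python. The helper name is made up, and `driver` is assumed to be any Selenium WebDriver instance that exposes execute_script().

```python
def reports_webdriver(driver):
    """Return True if the current page's navigator.webdriver is strictly true."""
    # The comparison happens in the page, so the script returns a plain boolean
    return driver.execute_script("return navigator.webdriver === true;") is True
```

    Running this before and after applying any workaround shows whether the flag is visible to page scripts in your setup.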
