Can a website detect when you are using selenium with chromedriver?

情歌与酒 2020-11-21 05:41

I've been testing out Selenium with Chromedriver and I noticed that some pages can detect that you're using Selenium even though there's no automation at all. Even when I

19 answers
  •  被撕碎了的回忆
    2020-11-21 06:18

    A lot has been analyzed and discussed about websites detecting that they are being driven by Selenium-controlled ChromeDriver. Here are my two cents:

    According to the article Browser detection using the user agent, serving different webpages or services to different browsers is usually not a good idea. The web is meant to be accessible to everyone, regardless of which browser or device a user is using. The article outlines best practices for developing a website that progressively enhances itself based on feature availability rather than by targeting specific browsers.

    However, browsers and standards are not perfect, and there are still edge cases where websites detect the browser, including whether it is driven by a Selenium-controlled WebDriver. Browsers can be detected in different ways; some commonly used mechanisms are as follows:

    • Implementing CAPTCHA / reCAPTCHA to detect automated bots.

    You can find a relevant detailed discussion in How does recaptcha 3 know I'm using selenium/chromedriver?

    • Detecting the token HeadlessChrome within the headless Chrome user agent string

    You can find a relevant detailed discussion in Access Denied page with headless Chrome on Linux while headed Chrome works on windows using Selenium through Python

    • Using Bot Management service from Distil Networks

    You can find a relevant detailed discussion in Unable to use Selenium to automate Chase site login

    • Using Bot Manager service from Akamai

    You can find a relevant detailed discussion in Dynamic dropdown doesn't populate with auto suggestions on https://www.nseindia.com/ when values are passed using Selenium and Python

    • Using Bot Protection service from Datadome

    You can find a relevant detailed discussion in Website using DataDome gets captcha blocked while scraping using Selenium and Python
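    As a rough illustration of the HeadlessChrome check from the list above, a server could inspect the User-Agent request header. The function name and the sample strings below are hypothetical, and real bot-management services combine many more signals than this single substring test.

```python
def is_headless_chrome(user_agent: str) -> bool:
    """Hypothetical server-side check: the default headless Chrome
    User-Agent contains the token 'HeadlessChrome'."""
    return "HeadlessChrome" in user_agent


# Sample User-Agent strings (illustrative only)
headless_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) HeadlessChrome/83.0.4103.97 Safari/537.36")
headed_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36")

print(is_headless_chrome(headless_ua))  # True
print(is_headless_chrome(headed_ua))    # False
```

    Overriding the user agent, as in the Solution section, defeats this one check, which is why detection services tend to rely on several signals at once.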

    However, while using the user agent to detect the browser looks simple, doing it well is in fact a bit tougher.

    Note: At this point it's worth mentioning that it's very rarely a good idea to use user agent sniffing. There is always a better and more broadly compatible way to address a given issue.
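    To illustrate why user agent sniffing is harder than it looks, here is a naive substring check next to a slightly more careful one. The Chrome and Edge strings are representative samples written for this sketch, and the token rules are a simplification, not a complete UA parser.

```python
import re

CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36")
EDGE_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.45")


def naive_is_chrome(ua: str) -> bool:
    # Wrong: Chromium-based browsers such as Edge also send "Chrome/..."
    return "Chrome" in ua


def careful_is_chrome(ua: str) -> bool:
    # Require a Chrome/<version> token and exclude tokens that other
    # Chromium-based browsers append (simplified list: Edge and Opera)
    has_chrome = re.search(r"\bChrome/\d+", ua) is not None
    other_brand = re.search(r"\b(Edg|OPR)/\d+", ua) is not None
    return has_chrome and not other_brand


print(naive_is_chrome(EDGE_UA), careful_is_chrome(EDGE_UA))      # True False
print(naive_is_chrome(CHROME_UA), careful_is_chrome(CHROME_UA))  # True True
```

    Even the "careful" version breaks as soon as another browser adds its own token, which is the maintenance burden the note above warns about.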


    Considerations for browser detection

    The reason for detecting the browser is usually one of the following:

    • Trying to work around a specific bug in some specific variant or version of a web browser.
    • Trying to check for the existence of a specific feature that some browsers don't yet support.
    • Trying to provide different HTML depending on which browser is being used.

    Alternatives to browser detection through user agents

    Some of the alternatives to browser detection are as follows:

    • Feature detection: implementing a test that checks how the browser implements a feature's API and deciding how to use it from that. An example was when Chrome unflagged experimental lookbehind support in regular expressions.
    • Adopting the design technique of progressive enhancement, which involves developing a website in layers using a bottom-up approach: starting with a simpler layer and improving the capabilities of the site in successive layers, each using more features.
    • Adopting the top-down approach of graceful degradation, in which we build the best possible site using all the features we want and then tweak it to make it work on older browsers.
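    The feature-detection idea from the first bullet can be sketched in Python as an analogue of the browser case: instead of asking which regex engine or version is running, try the feature and fall back if it is rejected. The helper names are hypothetical, and Python's re module stands in for a browser's JavaScript engine; in a page you would wrap new RegExp(...) in a try/catch instead.

```python
import re


def supports_regex(pattern: str) -> bool:
    """Feature detection: try to compile the pattern and report whether the
    regex engine accepts it, instead of checking the engine's version."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Fixed-width lookbehind: accepted by Python's built-in re module
HAS_LOOKBEHIND = supports_regex(r"(?<=\$)\d+")
# Variable-width lookbehind: rejected by re ("look-behind requires fixed-width pattern")
HAS_VARIABLE_LOOKBEHIND = supports_regex(r"(?<=a|bc)x")


def extract_prices(text: str) -> list:
    """Return the digits that follow a '$' sign."""
    if HAS_LOOKBEHIND:
        return re.findall(r"(?<=\$)\d+", text)
    # Fallback that avoids lookbehind by using a capture group
    return re.findall(r"\$(\d+)", text)


print(HAS_LOOKBEHIND, HAS_VARIABLE_LOOKBEHIND)  # True False
print(extract_prices("cost $12, tax $3"))        # ['12', '3']
```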

    Solution

    To prevent a Selenium-driven WebDriver from being detected, you can try any or all of the approaches below:

    • Rotating the user agent on every execution of your test suite using the fake_useragent module as follows:

      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.chrome.service import Service
      from fake_useragent import UserAgent
      
      options = Options()
      ua = UserAgent()
      user_agent = ua.random  # a random real-world User-Agent string
      print(user_agent)
      options.add_argument(f'--user-agent={user_agent}')
      # Selenium 4: pass the driver path through Service and the options through options=
      service = Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
      driver = webdriver.Chrome(service=service, options=options)
      driver.get("https://www.google.co.in")
      driver.quit()
      

    You can find a relevant detailed discussion in Way to change Google Chrome user agent in Selenium?

    • Rotating the user agent in each of your tests using Network.setUserAgentOverride through execute_cdp_cmd() as follows:

      from selenium import webdriver
      from selenium.webdriver.chrome.service import Service
      
      driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'))
      print(driver.execute_script("return navigator.userAgent;"))
      # Override the user agent (here: Chrome/83.0.4103.97) through the Chrome DevTools Protocol
      driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'})
      print(driver.execute_script("return navigator.userAgent;"))
      driver.quit()
      

    You can find a relevant detailed discussion in How to change the User Agent using Selenium and Python

    • Changing the value of the navigator.webdriver property to undefined as follows:

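      from selenium import webdriver
      
      driver = webdriver.Chrome()
      # Inject this script into every new document before the page's own scripts run
      driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
          Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
          })
        """
      })
      driver.get("https://www.google.co.in")
      print(driver.execute_script("return navigator.webdriver;"))  # JS undefined arrives as None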
      

    You can find a relevant detailed discussion in Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

    • Changing the values of navigator.plugins, navigator.languages, WebGL, hairline feature, missing image, etc.

    You can find a relevant detailed discussion in Is there a version of selenium webdriver that is not detectable?

    • Changing the default viewport size

    You can find a relevant detailed discussion in How to bypass Google captcha with Selenium and python?
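    The properties listed above are the kind of signals a detection script might collect in the browser and send back for scoring. The sketch below is a hypothetical, deliberately simplistic server-side scorer: the class, signal names, weights, and the 800x600 "suspicious" viewport are assumptions made up for illustration, and commercial services such as those mentioned earlier use far richer, undisclosed models.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BrowserSignals:
    """Values a page script could report back, e.g. navigator.webdriver,
    navigator.plugins.length, navigator.languages and the viewport size."""
    webdriver: Optional[bool]
    plugin_count: int
    languages: List[str]
    viewport: Tuple[int, int]


# Assumption: treat an 800x600 viewport (a common headless default) as a red flag
SUSPICIOUS_VIEWPORT = (800, 600)


def bot_score(s: BrowserSignals) -> int:
    """Sum of simple red flags; higher means more bot-like. Weights are made up."""
    score = 0
    if s.webdriver:              # navigator.webdriver is true under automation unless patched
        score += 3
    if s.plugin_count == 0:      # an empty navigator.plugins list
        score += 1
    if not s.languages:          # an empty navigator.languages list
        score += 1
    if s.viewport == SUSPICIOUS_VIEWPORT:
        score += 1
    return score


automated = BrowserSignals(webdriver=True, plugin_count=0, languages=[], viewport=(800, 600))
patched = BrowserSignals(webdriver=None, plugin_count=3,
                         languages=["en-US", "en"], viewport=(1366, 768))

print(bot_score(automated))  # 6
print(bot_score(patched))    # 0
```

    Each evasion technique in the list removes one red flag, which is why they are usually combined rather than used alone.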


    Dealing with reCAPTCHA

    When dealing with 2captcha and reCAPTCHA v3, rather than clicking the checkbox associated with the text I'm not a robot, it may be easier to get authenticated by extracting and using the data-sitekey.

    You can find a relevant detailed discussion in How to identify the 32 bit data-sitekey of ReCaptcha V2 to obtain a valid response programmatically using Selenium and Python Requests?
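    The data-sitekey is an ordinary HTML attribute on the reCAPTCHA container element, so it can be read from the page source. The sketch below uses Python's standard html.parser on a made-up snippet (the key value is a placeholder, not a real site key, and the class name is hypothetical); with Selenium you would typically call get_attribute("data-sitekey") on the located element instead.

```python
from html.parser import HTMLParser
from typing import List


class SiteKeyParser(HTMLParser):
    """Collect the value of every data-sitekey attribute in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.site_keys: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        for name, value in attrs:
            if name == "data-sitekey" and value:
                self.site_keys.append(value)


# Made-up page fragment; the key is a placeholder, not a real site key
html = '<form><div class="g-recaptcha" data-sitekey="PLACEHOLDER_SITE_KEY"></div></form>'

parser = SiteKeyParser()
parser.feed(html)
print(parser.site_keys)  # ['PLACEHOLDER_SITE_KEY']
```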
