Website navigates to no-access page using ChromeDriver and Chrome through Selenium probably Bot Protected

Submitted by 谁说胖子不能爱 on 2020-07-03 12:59:50

Question


My target site is https://www.nike.com/kr/ko_kr. When I open it with Selenium using webdriver.Chrome().get(), the page loads successfully.

But when I click an element, either by hand or through find_element_by_xpath(), I am redirected to a no-access page (probably bot protection), and after that I can't do anything else, such as opening the site's other sub-pages.

I changed the user-agent and the IP address, but I am still redirected to the no-access page. How can I get around this and access the site normally?

Snapshot of code trials:

[Image: changed the user-agent etc., but it did not work]


Answer 1:


You can try adding a pause of a few seconds between your actions so that the session behaves more "human-like".

Selenium offers two kinds of waits, implicit and explicit (source: Selenium Waits).

An explicit wait is code you define to wait for a certain condition to occur before proceeding further in the code. The extreme case of this is time.sleep(), which sets the condition to an exact time period to wait.

You are probably looking for the implicit way: An implicit wait tells WebDriver to poll the DOM for a certain amount of time when trying to find any element (or elements) not immediately available. The default setting is 0. Once set, the implicit wait is set for the life of the WebDriver object. Note that an implicit wait only takes effect while an element is being located; it does not pause between actions whose elements are found immediately.

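from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.implicitly_wait(10)  # seconds
driver.get("http://somedomain/url_that_delays_loading")
# find_element_by_id() was removed in Selenium 4; use find_element(By.ID, ...)
myDynamicElement = driver.find_element(By.ID, "myDynamicElement")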

Another way to wait for a few seconds is the time module from the standard library:

import time 
time.sleep(5) #wait 5 seconds
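
A fixed five-second sleep is itself a very regular pattern. As a minimal sketch, assuming that a randomized pause looks more "human-like" (an assumption that was not verified against this site), the delay could be drawn from a range. The helper name human_pause and its default bounds are hypothetical:

```python
import random
import time


def human_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it.

    human_pause is a hypothetical helper. The sleep argument is injectable
    so the function can be exercised without actually waiting.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

It would be called between two Selenium actions, for example human_pause(2, 5) before a click. Whether this changes how the site classifies the session is not known.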



Answer 2:


I made some tweaks to your code and executed the test as follows:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    # Selenium 4 takes the ChromeDriver path through a Service object
    driver = webdriver.Chrome(service=Service(executable_path=r'C:\WebDrivers\chromedriver.exe'), options=options)
    driver.get("https://www.naver.com/")
    # The overrides below only affect the document that is currently loaded
    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
    driver.execute_script("Object.defineProperty(navigator, 'plugins', {get: function() {return [1, 2, 3, 4, 5]}})")
    driver.execute_script("Object.defineProperty(navigator, 'languages', {get: function() {return ['ko-KR', 'ko']}})")
    # Keep a reference to the prototype method and call it with the right `this`
    driver.execute_script("const getParameter = WebGLRenderingContext.prototype.getParameter;WebGLRenderingContext.prototype.getParameter = function(parameter) {if (parameter === 37445) {return 'NVIDIA Corporation';} if (parameter === 37446) {return 'NVIDIA GeForce GTX 980 Ti OpenGL Engine';} return getParameter.call(this, parameter);};")
    driver.get("https://www.nike.com/kr/ko_kr/")
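
One caveat about the execute_script() calls: they run in the page that is loaded at that moment, so the overrides are gone once the browser navigates to the next URL. A minimal sketch of registering the same JavaScript for every new document through the DevTools command Page.addScriptToEvaluateOnNewDocument follows. The helper name install_init_script is hypothetical, and nothing here suggests that this gets past Bot Manager:

```python
# JavaScript taken from the navigator.webdriver override in this answer.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)


def install_init_script(driver, source=HIDE_WEBDRIVER_JS):
    """Ask Chrome to evaluate `source` at the start of every new document.

    install_init_script is a hypothetical helper. `driver` is assumed to be
    a Chromium-based Selenium driver, which exposes execute_cdp_cmd().
    Returns whatever the DevTools command returns.
    """
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source}
    )
```

With a real session this would be called once, as install_init_script(driver), before the first driver.get().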
    

Observation

Similar to your observation, I hit the same roadblock and was redirected to the No Access page.


Deep Dive

It seems that the Chrome Browsing Context initiated by the Selenium-driven ChromeDriver is getting detected as an automated bot.

Meanwhile, inspecting the DOM Tree of the webpage showed that some of the <script> tags contain the keyword akam. For example:

  • <script type="text/javascript" src="https://www.nike.com/akam/11/43465b03" defer=""></script>
  • <noscript><img src="https://www.nike.com/akam/11/pixel_43465b03?a=dD1kMjkzYzhlOTA4OWVmZTlhOGZhMjg2MjBmNjk5YWVjZmM0Y2U2NWY5JmpzPW9mZg==" style="visibility: hidden; position: absolute; left: -999px; top: -999px;" /></noscript>
  • <link id="dnsprefetchlink" rel="dns-prefetch" href="//gerxi63iifbfuxxtmreq-f-e92349eda-clientnsv4-s.akamaihd.net">

This is a clear indication that the website is protected by Bot Manager, an advanced bot detection service provided by Akamai, and that the response gets blocked.
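
The manual DOM inspection can be approximated in code. The following is a rough sketch that scans a page's HTML for the two markers seen in the snippets above (an /akam/ path and an akamaihd.net host). Treating these as Akamai indicators is a heuristic assumption, not an official detection method, and the function name find_akamai_markers is hypothetical:

```python
import re

# Patterns derived from the snippets quoted in this answer (heuristic only).
AKAMAI_PATTERNS = (
    re.compile(r"""(?:src|href)=["'][^"']*/akam/""", re.IGNORECASE),
    re.compile(r"""(?:src|href)=["'][^"']*akamaihd\.net""", re.IGNORECASE),
)


def find_akamai_markers(html):
    """Return the substrings of `html` that match the heuristic patterns."""
    markers = []
    for pattern in AKAMAI_PATTERNS:
        markers.extend(match.group(0) for match in pattern.finditer(html))
    return markers
```

With Selenium, the input could be driver.page_source; an empty list only means that these particular markers were not found.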


Bot Manager

As per the article Bot Manager - Foundations:

[Image: akamai_detection]


Conclusion

So it can be concluded that the request is detected as coming from a Selenium-driven WebDriver instance and the response is blocked.


References

Relevant documentation:

  • Bot Manager
  • Bot Manager : Foundations

tl; dr

A couple of relevant discussions:

  • Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
  • Can a website detect when you are using selenium with chromedriver?


Source: https://stackoverflow.com/questions/62550886/website-navigates-to-no-access-page-using-chromedriver-and-chrome-through-seleni
