Question
So I have created an automation bot to do some stuff for me on the internet, using Selenium with Python. After long and grueling coding sessions, days and nights of working on this project, I have finally completed it, only to be randomly greeted with an Error 1015: "You are being rate limited".
I understand this is to prevent DDoS attacks. But it is a major blow.
I have contacted the website to resolve the matter, but to no avail. However, the third-party security software they use says that the website can grant my IP an exclusion from rate limiting.
So I was wondering whether there is any other way to bypass this, maybe from a coding perspective. I don't think something like clearing cookies will resolve anything, or will it? It is my specific IP address that they are blocking.
Note: The terms and conditions of the website I am running my bot on don't say you can use automation software on it, but they don't say you can't either.
I don't mind coding some more to prevent the random access denials, which I think last for 24 hours. That can be detrimental, as the final stage of this build is to have my program run daily for long periods of time.
Do you think I could communicate with the third-party security provider and ask them to ask the website to grant me access? I have already tried resolving the matter with the website. All they said was that (A) on their side it says I am fine, and (B) the problem is most likely on my side: "Maybe some malicious software is trying to access our website". Malicious, no, but a bot, yes. That's what made me think maybe it would be better if I resolved the matter myself.
Do you think I may have to implement wait times between processes or something? I'm stuck.
Thanks for any help. And it's a single bot!
Answer 1:
If you are randomly greeted with Error 1015 ("You are being rate limited"), it implies that the site owner has implemented rate limiting that affects your visitor traffic.
Rate-limiting reason
Cloudflare can rate-limit the visitor traffic to counter a possible dictionary attack.
Rate-limit thresholds
In generic cases, Cloudflare rate-limits a visitor when the visitor's traffic crosses the rate-limit threshold, which is calculated as follows: divide 24 hours of uncached website requests by the unique visitors for the same 24 hours; then divide by the estimated average minutes of a visit; finally, multiply by 4 (or larger) to establish an estimated threshold per minute for the website. A value higher than 4 is fine, since most attacks are an order of magnitude above typical traffic rates.
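The threshold arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula as described in this answer; the function name and the sample traffic figures are made up for the example, and the real threshold is whatever the site owner configured.

```python
def estimate_rate_limit_threshold(
    uncached_requests_24h: float,
    unique_visitors_24h: float,
    avg_visit_minutes: float,
    multiplier: float = 4.0,
) -> float:
    """Estimate a requests-per-minute threshold using the formula described above.

    All inputs are figures the site owner would read from their own analytics;
    the name and signature of this helper are hypothetical.
    """
    requests_per_visitor = uncached_requests_24h / unique_visitors_24h
    requests_per_visitor_minute = requests_per_visitor / avg_visit_minutes
    return requests_per_visitor_minute * multiplier


# Made-up sample numbers: 100,000 uncached requests, 2,000 visitors, 5-minute visits.
threshold = estimate_rate_limit_threshold(100_000, 2_000, 5)
print(threshold)  # 100000 / 2000 = 50, 50 / 5 = 10, 10 * 4 = 40.0
```

As a visitor you cannot see these inputs, so the practical takeaway is only that a bot requesting pages several times faster than a typical human visit is likely to cross the threshold.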
Solution
In these cases, a potential solution would be to use undetected-chromedriver to initialize the Chrome browsing context.
undetected-chromedriver is an optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.
Code Block:
import undetected_chromedriver as uc
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
driver = uc.Chrome(options=options)
driver.get('https://bet365.com')
References
You can find a couple of relevant detailed discussions in:
- Selenium app redirect to Cloudflare page when hosted on Heroku
- Linkedin API throttle limit
Answer 2:
I see some possibilities for you here:
- Introduce wait time between requests to the site
- Reduce the requests you make
- Extend your bot to detect when it hits the limit and change your IP address (e.g. by restarting your router)
The last one is the least preferable I would assume and also the most time consuming one.
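The first option, together with detecting the limit, could look roughly like the sketch below: a random pause before every request, plus an exponentially growing pause whenever the rate-limit page shows up. Everything here is an assumption made for illustration: the helper names, the wait times, and the marker strings (taken from the error text quoted in the question) are not part of Selenium or Cloudflare.

```python
import random
import time


def looks_rate_limited(page_source: str) -> bool:
    """Heuristic check for the rate-limit page (marker strings are assumptions)."""
    text = page_source.lower()
    return "error 1015" in text or "you are being rate limited" in text


def backoff_delay(attempt: int, base: float = 30.0, cap: float = 3600.0) -> float:
    """Exponential backoff in seconds: base * 2**attempt, never more than cap."""
    return min(cap, base * (2 ** attempt))


def polite_get(fetch, url, min_wait=5.0, max_wait=15.0, max_retries=4, sleep=time.sleep):
    """Fetch url with a random pause before each request; back off if rate limited.

    fetch is any callable that takes a URL and returns the page source as a string.
    sleep is injectable so the function can be tried out without real waiting.
    """
    for attempt in range(max_retries + 1):
        sleep(random.uniform(min_wait, max_wait))  # wait time between requests
        source = fetch(url)
        if not looks_rate_limited(source):
            return source
        if attempt < max_retries:
            sleep(backoff_delay(attempt))  # wait longer after each rate-limited reply
    raise RuntimeError(f"still rate limited after {max_retries} retries: {url}")
```

With Selenium, `fetch` could be a small function that calls `driver.get(url)` and then returns `driver.page_source`. The default wait times are guesses; the right values depend on the threshold the site has configured.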
Answer 3:
First: read the Terms of Use of the website and, for example, look at its robots.txt, which is usually at the root of the website, like www.google.com/robots.txt. Note that going against the website owner's explicit terms may be illegal depending on jurisdiction and may result in the owner blocking your tool and/or IP.
https://www.robotstxt.org/robotstxt.html
This will let you know what the website owner explicitly allows for automation and scraping.
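A robots.txt check can be automated with Python's standard `urllib.robotparser` module. The sketch below parses a made-up robots.txt so that it runs offline; the sample rules, the `my-bot` user agent, and the example.com URLs are all placeholders, not the rules of any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
allowed_public = parser.can_fetch("my-bot", "https://example.com/public/page")
allowed_private = parser.can_fetch("my-bot", "https://example.com/private/data")
delay = parser.crawl_delay("my-bot")  # seconds between requests, or None if not set

print(allowed_public, allowed_private, delay)
```

For a live site, `parser.set_url("https://<site>/robots.txt")` followed by `parser.read()` fetches the real file instead. Keep in mind that robots.txt only covers crawling rules; the Terms of Use still have to be read separately.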
After you've reviewed the website's terms and understand what they allow, and they do not respond to you, and you've determined you are not breaking the website's terms of use, the only real other option would be to utilize proxies and/or VPSs that will give the system running the scripts different IPs.
Source: https://stackoverflow.com/questions/65128879/how-to-bypass-being-rate-limited-html-error-1015-using-python