Setting Scrapy proxy middleware to rotate on each request

太阳男子 2020-12-31 13:29

This question necessarily comes in two forms, because I don't know the better route to a solution.

A site I'm crawling kicks me to a redirected "User Blocked" pa

1 answer
  • 2020-12-31 14:15

    Yesterday I had a similar task involving a proxy and protection against DDoS (I was parsing a site). The idea is a random draw: every request has a chance of changing the IP. Scrapy goes through Tor, and the middleware talks to Tor's control port over telnet (telnetlib in the code). You need to configure a ControlPort password in Tor.
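
To make the "chance of changing" idea concrete, here is a minimal sketch of the random gate on its own, outside Scrapy. The helper name `should_rotate` and the seeded generator are assumptions for illustration only; `random.randint(1, 100)` is inclusive at both ends, so the threshold maps directly to a percentage.

```python
import random


def should_rotate(percent, rng=random):
    """Return True on roughly `percent` out of every 100 calls."""
    # randint(1, 100) yields 1..100 inclusive, so `<= percent` hits percent/100 of the time
    return rng.randint(1, 100) <= percent


rng = random.Random(0)  # seeded only so the demonstration is repeatable
trials = 10_000
hits = sum(should_rotate(15, rng) for _ in range(trials))
print(f"rotated on {hits / trials:.1%} of {trials} simulated requests")
```

Because each request is decided independently, rotations come in irregular bursts rather than on a fixed schedule, which is presumably the point when trying not to look like a bot.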

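    import logging
    import random
    import telnetlib  # removed from the standard library in Python 3.13
    import time

    from settings import USER_AGENT_LIST

    logger = logging.getLogger(__name__)


    # Ask Tor for a new identity (a new exit IP) on about 15% of requests
    class RetryChangeProxyMiddleware:
        def process_request(self, request, spider):
            # randint is inclusive, so this is true for 15 of 100 possible values
            if random.randint(1, 100) <= 15:
                logger.info('Changing proxy')
                # 9051 is Tor's ControlPort
                tn = telnetlib.Telnet('127.0.0.1', 9051)
                # Wait up to 2 seconds for a greeting before authenticating
                tn.read_until(b"Escape character is '^]'.", 2)
                tn.write(b'AUTHENTICATE "<PASSWORD HERE>"\r\n')
                tn.read_until(b"250 OK", 2)
                tn.write(b"signal NEWNYM\r\n")
                tn.read_until(b"250 OK", 2)
                tn.write(b"quit\r\n")
                tn.close()
                logger.info('>>>> Proxy changed. Sleep Time')
                # Note: time.sleep blocks the whole crawler, not just this request
                time.sleep(10)


    # Pick a random User-Agent on about 30% of requests
    class RandomUserAgentMiddleware:
        def process_request(self, request, spider):
            if random.randint(1, 100) <= 30:
                logger.info('Changing UserAgent')
                ua = random.choice(USER_AGENT_LIST)
                if ua:
                    # setdefault leaves an existing User-Agent header untouched
                    request.headers.setdefault('User-Agent', ua)
                    logger.info('>>>> UserAgent changed')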
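
Since telnetlib was removed from the standard library in Python 3.13, the same control-port exchange can also be done over a plain socket. What follows is only a rough sketch, not the answer's own code: the function names `build_newnym_commands` and `request_new_identity` are made up for illustration, the defaults mirror the host and port used above, and it assumes Tor answers each accepted command with a line starting with `250`, as the `read_until("250 OK")` calls above suggest.

```python
import socket


def build_newnym_commands(password):
    """Return the three control-port lines, as bytes, in the order they are sent."""
    # Assumption: the password is sent as a quoted string, so escape \ and "
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'AUTHENTICATE "{escaped}"\r\n'.encode("utf-8"),
        b"SIGNAL NEWNYM\r\n",
        b"QUIT\r\n",
    ]


def request_new_identity(password, host="127.0.0.1", port=9051, timeout=5.0):
    """Authenticate and send NEWNYM; return True only if both replies start with 250."""
    auth, newnym, quit_cmd = build_newnym_commands(password)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            for command in (auth, newnym):
                sock.sendall(command)
                # Read one reply line per command and stop at the first failure
                if not reader.readline().startswith(b"250"):
                    return False
            sock.sendall(quit_cmd)
            return True
```

Inside a middleware this would be called as `request_new_identity("<PASSWORD HERE>")` in place of the telnet lines. Checking the reply code, instead of waiting on a timeout, makes a wrong password or a closed control port show up as `False` right away.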
    