How can I get a new IP from Tor for every request in threads?

予麋鹿 2021-01-06 08:21

I am trying to use a Tor proxy for scraping. Everything works fine in one thread, but it is slow. I tried to do something simple:



        
2 Answers
  •  囚心锁ツ
    2021-01-06 08:56

    If you want different IPs for each connection, you can also use Stream Isolation over SOCKS by specifying a different proxy username:password combination for each connection.

    With this method, you only need one Tor instance and each requests client can use a different stream with a different exit node.

    To set this up, give each requests.Session object its own unique proxy credentials, like so: socks5h://username:password@localhost:9050

    import random
    from multiprocessing import Pool

    import requests  # SOCKS proxies need the extra: pip install requests[socks]


    def check_ip():
        # A fresh session with unique proxy credentials gets its own Tor stream.
        session = requests.Session()
        creds = str(random.randint(10000, 0x7fffffff)) + ":" + "foobar"
        proxy = 'socks5h://{}@localhost:9050'.format(creds)
        session.proxies = {'http': proxy, 'https': proxy}
        r = session.get('http://httpbin.org/ip')
        print(r.text)


    if __name__ == '__main__':  # guard needed where workers are spawned (e.g. Windows)
        with Pool(processes=8) as pool:
            for _ in range(9):
                pool.apply_async(check_ip)
            pool.close()
            pool.join()
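
Since the question asks about threads, the same idea can be sketched with a thread pool instead of processes. This is only a sketch under assumptions: the helper names `isolated_proxies`, `fetch_ip` and `fetch_ips` are hypothetical, Tor is assumed to listen on localhost:9050 as in the answer, and `requests` must be installed with SOCKS support.

```python
import secrets
from concurrent.futures import ThreadPoolExecutor

TOR_SOCKS = "localhost:9050"  # assumed Tor SOCKS address, as used in the answer


def isolated_proxies():
    """Build a proxies dict with fresh random SOCKS credentials."""
    creds = f"{secrets.token_hex(8)}:{secrets.token_hex(8)}"
    proxy = f"socks5h://{creds}@{TOR_SOCKS}"
    return {"http": proxy, "https": proxy}


def fetch_ip(_):
    """Fetch the visible IP through a session with its own credentials."""
    # Imported here so the helper above works without requests installed;
    # SOCKS support needs: pip install requests[socks]
    import requests

    with requests.Session() as session:
        session.proxies = isolated_proxies()
        return session.get("http://httpbin.org/ip", timeout=30).text


def fetch_ips(count=9, workers=8):
    """Run `count` lookups across `workers` threads and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch_ip, range(count)))
```

Calling `fetch_ips()` with Tor running should return one response body per request. Distinct credentials ask Tor for separate streams, but two circuits can still happen to share an exit node, so an occasional repeated IP is possible.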
    

    Tor Browser isolates streams on a per-domain basis by setting the credentials to firstpartydomain:randompassword, where randompassword is a random nonce for each unique first party domain.

    If you're crawling the same site and you want random IPs, then use a random username:password combination for each session. If you are crawling random domains and want to use the same circuit for all requests to a domain, use Tor Browser's method of domain:randompassword for credentials.
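
The per-domain variant could look roughly like the sketch below. It is an illustration, not Tor Browser's actual implementation: the function `proxies_for` and the cache `_domain_nonces` are hypothetical names, the full hostname stands in for the first-party domain, and Tor is assumed to listen on localhost:9050.

```python
import secrets
from urllib.parse import urlsplit

# Hypothetical cache: hostname -> random password chosen on first use.
_domain_nonces = {}


def proxies_for(url, tor_socks="localhost:9050"):
    """Return a proxies dict whose credentials are stable per hostname."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    # Reuse the nonce for a known host so its requests share credentials.
    nonce = _domain_nonces.setdefault(host, secrets.token_hex(8))
    proxy = f"socks5h://{host}:{nonce}@{tor_socks}"
    return {"http": proxy, "https": proxy}
```

Passing the result as `session.proxies` means every request to the same host reuses one set of credentials, while a different host gets its own. Clearing `_domain_nonces` would force fresh credentials for all hosts.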
