How can I get a new IP from Tor for every request in threads?

予麋鹿 2021-01-06 08:21

I'm trying to use a Tor proxy for scraping. Everything works fine in a single thread, but it is slow. I tried to do something simple:



        
2 Answers
  • 2021-01-06 08:55

    You only have one proxy, listening on port 9050. All 3 processes send their requests in parallel through that proxy, so they share the same IP.

    What is happening is:

    1. All 3 processes ask the proxy to get a new IP.
    2. The proxy either requests a new IP 3 times, receives 3 responses and applies the last one, or it recognizes that it is already waiting for a new IP and disregards 2 of the requests, answering all 3 together. Which of the two happens depends on the proxy implementation.
    3. The processes send their requests through the proxy, so they all come out with the same IP.
    4. The processes complete and another 3 processes start. Rinse and repeat.

    That is why the IPs are the same for every block of 3 requests.
    You'll need 3 independent proxies to have 3 different IPs at the same time.
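
One way to get 3 independent proxies is to run 3 Tor instances, each with its own ports and data directory. Below is a minimal sketch that renders one torrc per instance. It assumes the port pairs 9050/9051, 9060/9061 and 9070/9071. The helper name `render_torrc` and the `/tmp/tor` base directory are illustrative, and the password placeholder has to be replaced with the output of `tor --hash-password`:

```python
# Hypothetical helper, not part of Tor or stem: renders one torrc per instance.
INSTANCES = [(9050, 9051), (9060, 9061), (9070, 9071)]  # (SOCKS port, control port)

def render_torrc(socks_port, control_port, hashed_password, base_dir="/tmp/tor"):
    """Return torrc text for one instance; base_dir is an assumed location."""
    lines = [
        f"SocksPort {socks_port}",
        f"ControlPort {control_port}",
        # Each instance needs its own data directory, otherwise they collide.
        f"DataDirectory {base_dir}/instance-{socks_port}",
        f"HashedControlPassword {hashed_password}",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    for socks_port, control_port in INSTANCES:
        # "16:REPLACE_WITH_HASH" is a placeholder, not a working hash.
        print(render_torrc(socks_port, control_port, "16:REPLACE_WITH_HASH"))
```

Each rendered file would then be saved and passed to its own Tor process with `tor -f <file>`.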


    EDIT:

    Possible solution using locks, assuming 3 Tor proxies are running in the background (the control ports are driven through the stem library):

    import contextlib, threading, time
    from multiprocessing.pool import ThreadPool

    import requests  # SOCKS proxies need the requests[socks] extra
    from stem import Signal
    from stem.control import Controller

    _controller_ports = [
        # (controller lock, SOCKS port, control port)
        (threading.Lock(), 9050, 9051),
        (threading.Lock(), 9060, 9061),
        (threading.Lock(), 9070, 9071),
    ]

    def get_new_ip_for(port):
        # Ask the Tor instance behind this control port for a new identity.
        with Controller.from_port(port=port) as controller:
            controller.authenticate(password="password")
            controller.signal(Signal.NEWNYM)
            time.sleep(controller.get_newnym_wait())

    @contextlib.contextmanager
    def get_port_with_new_ip():
        # Poll until one of the proxies is free, then hold it exclusively.
        while True:
            for lock, con_port, manage_port in _controller_ports:
                if lock.acquire(blocking=False):
                    try:
                        get_new_ip_for(manage_port)
                        yield con_port
                    finally:
                        lock.release()  # released even if the request fails
                    return  # a context manager must yield exactly once
            time.sleep(1)

    def check_ip():
        with get_port_with_new_ip() as port:
            session = requests.session()
            session.proxies = {'http': f'socks5h://localhost:{port}', 'https': f'socks5h://localhost:{port}'}
            r = session.get('http://httpbin.org/ip')
            print(r.text)

    # threading.Lock is only shared between threads, so use a thread pool.
    with ThreadPool(processes=3) as pool:
        for _ in range(9):
            pool.apply_async(check_ip)
        pool.close()
        pool.join()
    
  • 2021-01-06 08:56

    If you want different IPs for each connection, you can also use Stream Isolation over SOCKS by specifying a different proxy username:password combination for each connection.

    With this method, you only need one Tor instance and each requests client can use a different stream with a different exit node.

    In order to set this up, add unique proxy credentials for each requests.session object like so: socks5h://username:password@localhost:9050

    import random
    from multiprocessing import Pool
    import requests  # SOCKS proxies need the requests[socks] extra

    def check_ip():
        session = requests.session()
        # A random username puts this session on its own isolated stream.
        creds = str(random.randint(10000, 0x7fffffff)) + ":" + "foobar"
        proxy = 'socks5h://{}@localhost:9050'.format(creds)
        session.proxies = {'http': proxy, 'https': proxy}
        r = session.get('http://httpbin.org/ip')
        print(r.text)


    if __name__ == "__main__":  # needed where worker processes are spawned, e.g. Windows
        with Pool(processes=8) as pool:
            for _ in range(9):
                pool.apply_async(check_ip)
            pool.close()
            pool.join()

    Tor Browser isolates streams on a per-domain basis by setting the credentials to firstpartydomain:randompassword, where randompassword is a random nonce for each unique first party domain.

    If you're crawling the same site and you want random IPs, use a random username:password combination for each session. If you are crawling random domains and want to reuse the same circuit for all requests to a given domain, use Tor Browser's method of domain:randompassword for the credentials.
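
The two credential strategies can be sketched as a pair of small helpers. The function names, the module-level nonce cache and the default host and port are assumptions made for illustration. The URL's hostname stands in for the first-party domain, which is only an approximation of what Tor Browser does, and the cache is per process, so separate worker processes would each pick their own nonce:

```python
import secrets
from urllib.parse import urlsplit

# Hypothetical cache: domain -> random password, kept for the life of the process.
_domain_nonces = {}

def random_proxy_url(host="localhost", port=9050):
    """Proxy URL with fresh random credentials, for one isolated stream per session."""
    user, password = secrets.token_hex(8), secrets.token_hex(8)
    return f"socks5h://{user}:{password}@{host}:{port}"

def domain_proxy_url(url, host="localhost", port=9050):
    """Proxy URL with domain:nonce credentials, so requests to one domain share them."""
    domain = urlsplit(url).hostname  # None if the URL has no scheme
    if domain is None:
        raise ValueError(f"cannot extract a hostname from {url!r}")
    nonce = _domain_nonces.setdefault(domain, secrets.token_hex(8))
    return f"socks5h://{domain}:{nonce}@{host}:{port}"
```

Either result can be assigned to a session the same way as above, e.g. `session.proxies = {'http': proxy, 'https': proxy}` with `proxy = domain_proxy_url(target_url)`.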
