Python Requests - Navigate a site by the server's IP

生来不讨喜 2021-01-03 02:20

I want to crawl a site, but Cloudflare is getting in the way. I was able to get the server's IP address, so Cloudflare won't bother me if I connect to it directly.

How can I utilize this in the re

4 answers
  • 2021-01-03 02:50

    Answer for HTTPS/SNI support: use the HostHeaderSSLAdapter in the requests_toolbelt module.

    Setting the Host header alone works fine with virtual hosts over unencrypted HTTP connections. For HTTPS you also need to pass SNI (Server Name Indication) in the TLS handshake, because some servers present a different SSL certificate depending on the name passed in via SNI. Also, the Python ssl libraries by default don't look at the Host: header when matching the server certificate at connection time.

    The module provides a straightforward way to add a transport adapter to requests that handles this for you.

    Example

    import requests
    
    from requests_toolbelt.adapters import host_header_ssl
    
    # Create a new requests session
    s = requests.Session()
    
    # Mount the adapter for https URLs
    s.mount('https://', host_header_ssl.HostHeaderSSLAdapter())
    
    # Send your request
    s.get("https://198.51.100.50", headers={"Host": "example.org"})
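To make the SNI point concrete, here is a minimal sketch that needs no network access. It starts a TLS handshake against in-memory buffers and inspects the raw ClientHello. The helper name client_hello is hypothetical, and the sketch uses only the standard ssl module rather than requests or requests_toolbelt. On CPython, a DNS name passed as server_hostname should appear in the ClientHello, while an IP address literal should not, since the ssl module only sends the SNI extension for non-IP hostnames.

```python
import ssl


def client_hello(server_hostname):
    """Return the raw ClientHello bytes produced for server_hostname.

    client_hello is a hypothetical helper for illustration only.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: no server is attached, so the handshake pauses
        # right after the ClientHello has been written to `outgoing`.
        pass
    return outgoing.read()


# A DNS name is carried in the SNI extension of the ClientHello.
print(b"example.org" in client_hello("example.org"))
# An IP literal is not sent as SNI.
print(b"198.51.100.50" in client_hello("198.51.100.50"))
```

This is why a request to https://198.51.100.50 with only a Host header may be answered with the server's default certificate instead of the one for example.org.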
    
  • 2021-01-03 02:55

    You will have to set a custom Host header with the value example.com, something like:

    requests.get('http://127.0.0.1/foo.php', headers={'host': 'example.com'})
    

    That should do the trick. To verify it, start a listener with netcat (nc -l -p 80) and then run the request above. The netcat window will show the request:

    GET /foo.php HTTP/1.1
    Host: example.com
    Connection: keep-alive
    Accept-Encoding: gzip, deflate
    Accept: */*
    User-Agent: python-requests/2.6.2 CPython/3.4.3 Windows/8
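If netcat is not available, the same check can be sketched in Python. The example below starts a throwaway local server that echoes back the Host header it received. The names HostEchoHandler and seen_host are hypothetical, and http.client stands in for requests so that the sketch has no third-party dependency; the requests.get call above can be pointed at the same kind of server.

```python
import http.client
import http.server
import threading


class HostEchoHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET with the Host header the client sent."""

    def do_GET(self):
        body = (self.headers.get("Host") or "").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def seen_host(host_header):
    """Send one request with a custom Host header; return what the server saw."""
    # Port 0 asks the OS for any free port, so no root privileges are needed.
    server = http.server.HTTPServer(("127.0.0.1", 0), HostEchoHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/foo.php", headers={"Host": host_header})
        reply = conn.getresponse().read().decode("ascii")
        conn.close()
        return reply
    finally:
        server.shutdown()
        server.server_close()


print(seen_host("example.com"))
```

The connection goes to 127.0.0.1, yet the server sees the overridden Host value, which is what a name-based virtual host uses to pick the site.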
    
  • 2021-01-03 02:59

    You'd have to tell requests to fake the Host header, and replace the hostname in the URL with the IP address:

    requests.get('http://123.45.67.89/foo.php', headers={'Host': 'www.example.com'})
    

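    The URL 'patching' can be done with the urllib.parse module (urlparse in Python 2):

    from urllib.parse import urlparse

    # url is the original URL; ipaddress is the server's IP as a string
    parsed = urlparse(url)
    hostname = parsed.hostname
    # note: replacing netloc also drops any explicit port or credentials
    parsed = parsed._replace(netloc=ipaddress)
    ip_url = parsed.geturl()
    
    response = requests.get(ip_url, headers={'Host': hostname})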
    

    Demo against Stack Overflow:

    >>> import socket
    >>> from urllib.parse import urlparse
    >>> import requests
    >>> url = 'http://stackoverflow.com/help/privileges'
    >>> parsed = urlparse(url)
    >>> hostname = parsed.hostname
    >>> hostname
    'stackoverflow.com'
    >>> ipaddress = socket.gethostbyname(hostname)
    >>> ipaddress
    '198.252.206.16'
    >>> parsed = parsed._replace(netloc=ipaddress)
    >>> ip_url = parsed.geturl()
    >>> ip_url
    'http://198.252.206.16/help/privileges'
    >>> response = requests.get(ip_url, headers={'Host': hostname})
    >>> response
    <Response [200]>
    

    In this case I looked up the IP address dynamically.
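The URL patching above can be wrapped in a small helper. This is a sketch under a few assumptions: to_ip_request is a hypothetical name (not part of requests), an explicit port is carried over to both the new URL and the Host header, IPv6 literals are bracketed, and any credentials in the URL are dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ip_request(url, ip):
    """Return (ip_url, headers) for sending `url` to a fixed IP address.

    to_ip_request is a hypothetical helper, not part of requests.
    Credentials embedded in the URL (user:pass@) are dropped.
    """
    parts = urlsplit(url)
    host_header = parts.hostname
    # Bracket IPv6 literals so the netloc stays parseable.
    netloc = f"[{ip}]" if ":" in ip else ip
    if parts.port is not None:
        # Keep a non-default port in both the URL and the Host header.
        netloc = f"{netloc}:{parts.port}"
        host_header = f"{host_header}:{parts.port}"
    ip_url = urlunsplit(parts._replace(netloc=netloc))
    return ip_url, {"Host": host_header}


ip_url, headers = to_ip_request("http://www.example.com:8080/foo.php?x=1", "123.45.67.89")
print(ip_url)
print(headers)
```

The result can then be passed straight to requests, for example requests.get(ip_url, headers=headers).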

  • 2021-01-03 03:15

    I think the best way to send HTTPS requests to a specific IP is to add a custom resolver that binds the domain name to the IP you want to hit. That way both SNI and the Host header are set correctly, and certificate verification succeeds just as it would in a web browser.

    Otherwise you will run into issues such as InsecureRequestWarning and SSLCertVerificationError, and SNI will always be missing from the Client Hello, no matter which combination of headers and verify arguments you try, for example:

    requests.get('https://1.2.3.4/foo.php', headers={"host": "example.com"}, verify=True)

    In addition, I tried:

      - requests_toolbelt
      - pip install requests[security]
      - forcediphttpsadapter
      - all the solutions mentioned in "using requests with TLS doesn't give SNI support"

    None of them set SNI when hitting https://IP directly.

    import socket

    import requests

    # mock /etc/hosts
    # guard it with a lock when multithreading (or use multiprocessing) if an endpoint is frequently rebound to different IPs
    etc_hosts = {}
    
    
    # decorate python built-in resolver
    def custom_resolver(builtin_resolver):
        def wrapper(*args, **kwargs):
            try:
                return etc_hosts[args[:2]]
            except KeyError:
                # fall back to builtin_resolver for endpoints not in etc_hosts
                return builtin_resolver(*args, **kwargs)
    
        return wrapper
    
    
    # monkey patching
    socket.getaddrinfo = custom_resolver(socket.getaddrinfo)
    
    
    def _bind_ip(domain_name, port, ip):
        '''
        resolve (domain_name,port) to a given ip
        '''
        key = (domain_name, port)
        # (family, type, proto, canonname, sockaddr)
        value = (socket.AddressFamily.AF_INET, socket.SocketKind.SOCK_STREAM, 6, '', (ip, port))
        etc_hosts[key] = [value]
    
    
    _bind_ip('example.com', 443, '1.2.3.4')
    # this sends the request to 1.2.3.4; the hostname and port must match the key bound above
    response = requests.get('https://example.com/foo.php', verify=True)
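Because the patch above replaces socket.getaddrinfo for the whole process, one variation is to scope the override with a context manager so the original resolver is restored afterwards. The sketch below is one possible way to do that, not the answer's own code: pinned_ip is a hypothetical name, and it assumes IPv4 and TCP, like the tuple used above.

```python
import contextlib
import socket


@contextlib.contextmanager
def pinned_ip(hostname, pinned_port, ip):
    """Temporarily resolve (hostname, pinned_port) to `ip`.

    pinned_ip is a hypothetical helper; IPv4 and TCP are assumed.
    """
    original = socket.getaddrinfo

    def resolver(host, port, *args, **kwargs):
        if (host, port) == (hostname, pinned_port):
            # (family, type, proto, canonname, sockaddr)
            return [(socket.AF_INET, socket.SOCK_STREAM,
                     socket.IPPROTO_TCP, "", (ip, pinned_port))]
        # Everything else falls back to the real resolver.
        return original(host, port, *args, **kwargs)

    socket.getaddrinfo = resolver
    try:
        yield
    finally:
        # Always undo the monkey patch, even if the body raised.
        socket.getaddrinfo = original


with pinned_ip("example.com", 443, "1.2.3.4"):
    print(socket.getaddrinfo("example.com", 443))
```

Inside the with block, a call such as requests.get('https://example.com/foo.php', verify=True) would connect to 1.2.3.4 while still using example.com for SNI and certificate checks. The patch is still process-wide while the block is active, so the locking caveat for multithreaded code still applies.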
    