How to stop the NodeJS "request" module from changing the request when using a proxy

小鲜肉 2021-02-19 22:49

Sorry if this comes off as confusing.

I have written a script using the NodeJS request module that performs an action on a website and then returns the data.

4 Answers
  • 2021-02-19 23:07

    You're using the http scheme for your request. If the web server redirects http to https and the proxy server is not configured to accept redirects (to https), the problem may simply be the scheme, that is, the URL you entered.

    So either the proxy has to be configured to accept redirects, or the URL has to be checked manually when a request fails and adjusted if a redirect is involved.
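
    As a rough illustration of the manual check, the sketch below inspects a response's status code and Location header and works out which URL to retry. `resolveRedirect` and `isHttpsUpgrade` are hypothetical helper names, not part of the request module, and example.com is a placeholder.

```javascript
// Hypothetical helper: given the URL that was requested and the response that
// came back, return the URL to retry with, or null if it is not a redirect.
function resolveRedirect(requestedUrl, statusCode, locationHeader) {
  const isRedirect = statusCode >= 300 && statusCode < 400;
  if (!isRedirect || !locationHeader) {
    return null;
  }
  // Location may be relative, so resolve it against the requested URL.
  return new URL(locationHeader, requestedUrl).href;
}

// Hypothetical helper: true when the redirect only upgrades http to https
// on the same host, which is the case described in this answer.
function isHttpsUpgrade(requestedUrl, redirectedUrl) {
  const from = new URL(requestedUrl);
  const to = new URL(redirectedUrl);
  return (
    from.protocol === 'http:' &&
    to.protocol === 'https:' &&
    from.host === to.host
  );
}
```

    If `isHttpsUpgrade` returns true for your target, requesting the https URL directly avoids relying on the proxy to handle the redirect.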

    Here you can read about redirects on one proxy-server (Apache Traffic Server), the scenario there includes more redirects than I described above:
    https://docs.trafficserver.apache.org/en/4.2.x/admin/reverse-proxy-http-redirects.en.html#handling-origin-server-redirect-responses

    If you still encounter problems the server-logs of the proxy-server would be helpful.

    EDIT:
    According to the page @Jannes Botis linked, there are further settings that might support or disrupt the desired functionality, so the whole issue is perhaps about configuring the request and the proxy server correctly. Here are a few options of the request module that are directly related to redirects:

    followRedirect - follow HTTP 3xx responses as redirects (default: true). This property can also be implemented as function which gets response object as a single argument and should return true if redirects should continue or false otherwise.
    followAllRedirects - follow non-GET HTTP 3xx responses as redirects (default: false)
    followOriginalHttpMethod - by default we redirect to HTTP method GET. you can enable this property to redirect to the original HTTP method (default: false)
    maxRedirects - the maximum number of redirects to follow (default: 10)
    removeRefererHeader - removes the referer header when a redirect happens (default: false). Note: if true, referer header set in the initial request is preserved during redirect chain.
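
    For reference, this is roughly how those options could be passed to request together with a proxy. It is only a sketch: `buildRequestOptions` and `fetchThroughProxy` are hypothetical helper names, the URLs are placeholders supplied by the caller, and the values chosen here are just one possible combination.

```javascript
// Hypothetical helper that collects the redirect-related options in one place.
function buildRequestOptions(targetUrl, proxyUrl) {
  return {
    url: targetUrl,
    proxy: proxyUrl,
    followRedirect: true,            // follow 3xx responses to GET requests
    followAllRedirects: true,        // also follow 3xx responses to non-GET requests
    followOriginalHttpMethod: false, // redirects are re-issued as GET
    maxRedirects: 10,
    removeRefererHeader: false,
  };
}

// Hypothetical wrapper. The third-party request module is loaded lazily, so
// it only needs to be installed (npm install request) when this is called.
function fetchThroughProxy(targetUrl, proxyUrl, callback) {
  const request = require('request');
  request(buildRequestOptions(targetUrl, proxyUrl), callback);
}
```

    The callback receives `(error, response, body)`, so logging `response.request.uri.href` there is one way to see where the redirect chain ended up.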
    

    It's quite possible that other settings of the proxy server also affect whether your scenario succeeds or fails.

  • 2021-02-19 23:09

    After deactivating my old account, I wanted to come back and give an actual answer to this question now that I fully understand it. What I was asking a year ago was not possible: the antibot was fingerprinting me through the TLS ClientHello (and even slightly at the TCP/frame level).

    To start, I wrote a wrapper called request-curl, which wrapped the libcurl/curl binaries into a single library with the same interface as request-promise. This gave me much more control over the request (preventing encoding, HTTP/2 and proxy support, and further session/TLS control), but it still only got me to a mediocre rank: the 687th most popular ClientHello (https://client.tlsfingerprint.io:8443/). It wasn't good enough.
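
    To give an idea of what ClientHello fingerprinting means, here is a sketch of a JA3-style hash. I don't know which scheme the antibot actually used; JA3 is just one publicly documented approach, which takes an MD5 over the TLS version, cipher suites, extensions, elliptic curves and point formats in the order the client sent them. `ja3Fingerprint` and the field names on `hello` are hypothetical, and parsing a real ClientHello is not shown.

```javascript
const { createHash } = require('crypto');

// JA3-style fingerprint: the five fields are joined with commas, the values
// inside each field with dashes, and the resulting string is hashed with MD5.
// `hello` is an already-parsed ClientHello (hypothetical shape).
function ja3Fingerprint(hello) {
  const fields = [
    String(hello.version),
    hello.cipherSuites.join('-'),
    hello.extensions.join('-'),
    hello.ellipticCurves.join('-'),
    hello.pointFormats.join('-'),
  ];
  return createHash('md5').update(fields.join(',')).digest('hex');
}

// Two clients offering the same cipher suites in a different order.
const clientA = {
  version: 771,
  cipherSuites: [4865, 4866, 4867],
  extensions: [0, 11, 10],
  ellipticCurves: [29, 23, 24],
  pointFormats: [0],
};
const clientB = { ...clientA, cipherSuites: [4867, 4866, 4865] };

// Order alone is enough to change the fingerprint.
const sameFingerprint = ja3Fingerprint(clientA) === ja3Fingerprint(clientB);
```

    Because the hash depends on values and ordering that the TLS stack chooses, changing HTTP headers or proxies in NodeJS does not change it.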

    I had to switch languages. NodeJS is too high-level to allow really deep control (I had to modify packets being sent from Layer 3). So, as the answer to my question:

    This is not yet possible in NodeJS, let alone with the now-unmaintained request.js library.

    For anyone reading this: if you want to forge perfect requests to bypass antibot security, you must move to a different language. I recommend utls in Golang or BouncyCastle in C#. Godspeed to you, as it took me a year to really know how to do this. Even then, these languages have further internal issues and features they do not yet support (Go doesn't support 'basic' header ordering, so you need to monkey-patch or modify internals; utls doesn't easily support proxies). The list goes on and on.

    If you're not already too deep into it, it's one hell of a rabbit hole and I recommend you do not enter it.

  • 2021-02-19 23:18

    According to the proxies documentation of the request module:

    By default, when proxying http traffic, request will simply make a standard proxied http request. This is done by making the url section of the initial line of the request a fully qualified url to the endpoint.

    Instead, you can use an HTTP tunnel by setting:

    tunnel : true
    

    in the request module proxy settings.

    It could be that in your case you are making a standard proxied HTTP request, whereas when using a proxy globally on your system or through a Chrome extension, an HTTP tunnel is created.
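
    To make the difference concrete, here is a small sketch of the first line the proxy sees in each mode, plus an options object with tunneling switched on. `proxyRequestLine` and `buildTunnelOptions` are hypothetical helpers and the URLs are placeholders; only `url`, `proxy` and `tunnel` are option names from the request documentation.

```javascript
// First line sent to the proxy for a GET to `targetUrl`.
// Standard proxying puts the fully qualified URL in the request line;
// tunneling asks the proxy to open a raw connection with CONNECT instead.
function proxyRequestLine(targetUrl, useTunnel) {
  const target = new URL(targetUrl);
  if (useTunnel) {
    const defaultPort = target.protocol === 'https:' ? '443' : '80';
    return `CONNECT ${target.hostname}:${target.port || defaultPort} HTTP/1.1`;
  }
  return `GET ${target.href} HTTP/1.1`;
}

// Options for the request module that force a tunnel even for http targets.
function buildTunnelOptions(targetUrl, proxyUrl) {
  return { url: targetUrl, proxy: proxyUrl, tunnel: true };
}
```

    With a standard proxied request the proxy can read and rewrite the whole request; inside a tunnel it only relays bytes, which is closer to what a system-wide proxy setting does for https sites.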

    From the documentation:

    Note that, when using a tunneling proxy, the proxy-authorization header and any headers from custom proxyHeaderExclusiveList are never sent to the endpoint server, but only to the proxy server.
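
    A simplified picture of that rule, as a sketch: `splitTunnelHeaders` is a hypothetical function, not the module's actual implementation, and it ignores the separate whitelist of headers that request also forwards to the proxy.

```javascript
// Split request headers into those that go only to the proxy (in the CONNECT
// request) and those that travel through the tunnel to the endpoint server.
function splitTunnelHeaders(headers, proxyHeaderExclusiveList = []) {
  const proxyOnly = new Set([
    'proxy-authorization',
    ...proxyHeaderExclusiveList.map((name) => name.toLowerCase()),
  ]);
  const toProxy = {};
  const toEndpoint = {};
  for (const [name, value] of Object.entries(headers)) {
    if (proxyOnly.has(name.toLowerCase())) {
      toProxy[name] = value;
    } else {
      toEndpoint[name] = value;
    }
  }
  return { toProxy, toEndpoint };
}
```

    So with a tunnel, anything that identifies you to the proxy stays at the proxy, and the endpoint only sees the headers you intended for it.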

  • 2021-02-19 23:18

    There are some scenarios that I can think of:

    • Proxy is actually adding some headers to the final request (in order to identify you to the server)
    • The website you're trying to reach has your proxy IPs blacklisted (public/paid ones?)

    It really depends on why you need to use that proxy:

    • Is it because of network restrictions?
    • Is it because you want to hide the original request address?

    Also, if you have control over the proxy server, can you log the requests being made to the final server?

    My suggestion

    Try writing your own proxy (a reverse one) and host it somewhere. Instead of requesting https://target.com, make a request to your http[s]://proxy.com/ and let the reverse proxy do the work. Also, remember to disable the X-Forwarded headers in the implementation, as they will change the request headers.
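
    A bare-bones version of that idea using only Node's built-in modules might look like the sketch below. `stripForwardingHeaders` and `createReverseProxy` are hypothetical names, the list of headers to drop is my own guess at what "X headers" covers, and a real deployment would need timeouts, logging and access control.

```javascript
const http = require('http');
const https = require('https');

// Headers that proxies commonly add and that reveal a hop in between.
const FORWARDING_HEADER = /^(x-forwarded-|x-real-ip$|via$|forwarded$)/i;

// Return a copy of `headers` without any forwarding-related entries.
function stripForwardingHeaders(headers) {
  const clean = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!FORWARDING_HEADER.test(name)) {
      clean[name] = value;
    }
  }
  return clean;
}

// Create (but do not start) a server that relays every request to targetOrigin.
function createReverseProxy(targetOrigin) {
  const target = new URL(targetOrigin);
  const client = target.protocol === 'https:' ? https : http;

  return http.createServer((clientReq, clientRes) => {
    const headers = stripForwardingHeaders(clientReq.headers);
    headers.host = target.host; // present the target's own Host header

    const upstream = client.request(
      {
        hostname: target.hostname,
        port: target.port || undefined, // fall back to the protocol default
        path: clientReq.url,
        method: clientReq.method,
        headers,
      },
      (upstreamRes) => {
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );

    upstream.on('error', () => {
      if (!clientRes.headersSent) {
        clientRes.writeHead(502);
      }
      clientRes.end();
    });

    clientReq.pipe(upstream);
  });
}
```

    Calling `createReverseProxy('https://target.com').listen(8080)` would start it on a port of your choice; your script then requests `http://localhost:8080/...` instead of the target directly.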

    Reference for node.js implementation:

    https://github.com/nodejitsu/node-http-proxy

    Note: please answer the questions I asked above in the comments.
