Python-Twisted: Reverse Proxy to HTTPS API: Could not connect

三世轮回 submitted on 2019-12-31 03:00:50

Question


I am trying to build a reverse proxy that talks to certain APIs (such as Twitter, GitHub, and Instagram), so that any client application I want can call those APIs through my reverse proxy (think of it as an API manager).

Also, I am using an LXC-container to do this.

For example, here is the simplest code I could hack together from the examples in the Twisted docs:

from twisted.internet import reactor
from twisted.web import proxy, server
from twisted.python.log import startLogging
from sys import stdout
startLogging(stdout)

site = server.Site(proxy.ReverseProxyResource('https://api.github.com/users/defunkt', 443, b''))
reactor.listenTCP(8080, site)
reactor.run()

When I run curl inside the container, I get a valid response (the appropriate JSON).

Here is the curl command I used:

curl https://api.github.com/users/defunkt

And here is the output I get:

{
  "login": "defunkt",
  "id": 2,
  "avatar_url": "https://avatars.githubusercontent.com/u/2?v=3",
  "gravatar_id": "",
  "url": "https://api.github.com/users/defunkt",
  "html_url": "https://github.com/defunkt",
  "followers_url": "https://api.github.com/users/defunkt/followers",
  "following_url": "https://api.github.com/users/defunkt/following{/other_user}",
  "gists_url": "https://api.github.com/users/defunkt/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/defunkt/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/defunkt/subscriptions",
  "organizations_url": "https://api.github.com/users/defunkt/orgs",
  "repos_url": "https://api.github.com/users/defunkt/repos",
  "events_url": "https://api.github.com/users/defunkt/events{/privacy}",
  "received_events_url": "https://api.github.com/users/defunkt/received_events",
  "type": "User",
  "site_admin": true,
  "name": "Chris Wanstrath",
  "company": "GitHub",
  "blog": "http://chriswanstrath.com/",
  "location": "San Francisco",
  "email": "chris@github.com",
  "hireable": true,
  "bio": null,
  "public_repos": 107,
  "public_gists": 280,
  "followers": 15153,
  "following": 208,
  "created_at": "2007-10-20T05:24:19Z",
  "updated_at": "2016-02-26T22:34:27Z"
}

However, when I try to reach the proxy from Firefox using:

http://10.5.5.225:8080/

I get: "Could not connect"

This is what my Twisted log looks like:

2016-02-27 [-] Log opened.
2016-02-27 [-] Site starting on 8080
2016-02-27 [-] Starting factory
2016-02-27 [-] Starting factory
2016-02-27 [-] "10.5.5.225" - - [27/Feb/2016: +0000] "GET / HTTP/1.1" 501 26 "-" "Mozilla/5.0 (X11; Debian; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0"
2016-02-27 [-] Stopping factory

How can I use Twisted to make an API call (most APIs are HTTPS nowadays anyway) and get the required response (basically, what the "200" response/JSON should be)?

I tried looking at this question: Convert HTTP Proxy to HTTPS Proxy in Twisted

But it didn't make much sense from a coding point-of-view (or mention anything about reverse-proxying).

Edit: I also tried swapping the HTTPS API call for a plain HTTP call, using:

curl http[colon][slash][slash]openlibrary[dot]org[slash]authors[slash]OL1A.json

(URL above has been formatted to avoid link-conflict issue)

However, I still get the same error in my browser (as mentioned above).

Edit 2: I have tried running your code, but I get this error:

[Error screenshot]

The screenshot shows the following error, raised when the code runs:

builtins.AttributeError: 'str' object has no attribute 'decode'


Answer 1:


If you read the API documentation for ReverseProxyResource, you will see that the signature of __init__ is:

def __init__(self, host, port, path, reactor=reactor):

and "host" is documented as "the host of the web server to proxy".

So you are passing a URI where Twisted expects a host.
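To make the difference concrete, here is a minimal sketch that splits a full URL into the separate host, port, and path values the constructor takes. It uses only the standard library; the helper name split_for_reverse_proxy is hypothetical and not part of Twisted.

```python
from urllib.parse import urlsplit


def split_for_reverse_proxy(url):
    """Split a full URL into a (host, port, path) triple.

    Hypothetical helper for illustration; it is not a Twisted API.
    """
    parts = urlsplit(url)
    # Fall back to the scheme's well-known port when the URL names none.
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port or default_port
    # Keep the path as bytes, like the b'' path argument in the question.
    path = parts.path.encode("ascii")
    return parts.hostname, port, path


host, port, path = split_for_reverse_proxy("https://api.github.com/users/defunkt")
print(host, port, path)
```

Only the host part ('api.github.com') belongs in the first argument; the scheme is not something the constructor understands.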

Worse yet, ReverseProxyResource is designed for local use on a web server, and doesn't quite support https:// URLs out of the box.

It does have a (very limited) extensibility hook though - proxyClientFactoryClass - and to apologize for ReverseProxyResource not having what you need out of the box, I will show you how to use that to extend ReverseProxyResource to add https:// support so you can use the GitHub API :).

from twisted.web import proxy, server
from twisted.logger import globalLogBeginner, textFileLogObserver
from twisted.protocols.tls import TLSMemoryBIOFactory
from twisted.internet import ssl, defer, task, endpoints
from sys import stdout
globalLogBeginner.beginLoggingTo([textFileLogObserver(stdout)])

class HTTPSReverseProxyResource(proxy.ReverseProxyResource, object):
    def proxyClientFactoryClass(self, *args, **kwargs):
        """
        Make all connections using HTTPS.
        """
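        host = self.host
        if isinstance(host, bytes):
            # On Python 2 a native string is bytes; TLS hostnames must be text.
            host = host.decode("ascii")
        return TLSMemoryBIOFactory(
            ssl.optionsForClientTLS(host), True,
            super(HTTPSReverseProxyResource, self)
            .proxyClientFactoryClass(*args, **kwargs))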
    def getChild(self, path, request):
        """
        Ensure that implementation of C{proxyClientFactoryClass} is honored
        down the resource chain.
        """
        child = super(HTTPSReverseProxyResource, self).getChild(path, request)
        return HTTPSReverseProxyResource(child.host, child.port, child.path,
                                         child.reactor)

@task.react
def main(reactor):
    import sys
    forever = defer.Deferred()
    myProxy = HTTPSReverseProxyResource('api.github.com', 443,
                                        b'/users/defunkt')
    myProxy.putChild(b"", myProxy)
    site = server.Site(myProxy)
    endpoint = endpoints.serverFromString(
        reactor,
        dict(enumerate(sys.argv)).get(1, "tcp:8080:interface=127.0.0.1")
    )
    endpoint.listen(site)
    return forever

If you run this, curl http://localhost:8080/ should do what you expect.
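One portability detail: a hostname written as a plain string literal is bytes on Python 2 but text on Python 3, and calling .decode on a Python 3 str is what produces the "'str' object has no attribute 'decode'" error quoted in the question's second edit. ssl.optionsForClientTLS wants the hostname as text, so a small normalizing helper covers both cases. This is a sketch only, and the name to_text is made up for illustration.

```python
def to_text(host):
    """Return the hostname as text, decoding ASCII bytes when needed.

    Hypothetical helper for illustration; it is not a Twisted API.
    """
    if isinstance(host, bytes):
        # Python 2 native strings (and explicit b"..." literals) land here.
        return host.decode("ascii")
    # Already text (a Python 3 str), so pass it through unchanged.
    return host


print(to_text(b"api.github.com"))
print(to_text("api.github.com"))
```

Passing the result of such a helper to ssl.optionsForClientTLS, instead of calling .decode unconditionally, should work on either Python version.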

I've taken the liberty of modernizing your Twisted code somewhat; endpoints instead of listenTCP, logger instead of twisted.python.log, and react instead of starting the reactor yourself.

The weird little putChild piece at the end there is because when we pass b"/users/defunkt" as the path, that means a request for / will result in the client requesting /users/defunkt/ (note the trailing slash), which is a 404 in GitHub's API. If we explicitly proxy the empty-child-segment path as if it did not have the trailing segment, I believe it will do what you expect.
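The trailing slash can be reproduced without running a proxy. The sketch below imitates, as an assumption about how ReverseProxyResource.getChild builds the child path (base path, a slash, then the URL-quoted segment), what happens for the empty segment that a request for / produces. The function name child_path is hypothetical.

```python
from urllib.parse import quote


def child_path(base_path, segment):
    """Append one URL segment to a base path.

    Illustrative stand-in for the path concatenation described above,
    not the actual Twisted implementation.
    """
    return base_path + b"/" + quote(segment, safe="").encode("utf-8")


# A request for "/" yields a single empty path segment.
print(child_path(b"/users/defunkt", b""))
# A request for "/repos" yields the segment b"repos".
print(child_path(b"/users/defunkt", b"repos"))
```

The empty segment gives b'/users/defunkt/', which is the path GitHub answers with a 404; registering the proxy itself as the child for the empty segment sidesteps that concatenation.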

PLEASE NOTE: proxying from plain-text HTTP to encrypted HTTPS can be extremely dangerous, so I've added a default listening interface here of localhost-only. If your bytes transit over an actual network, you should ensure that they are properly encrypted with TLS.
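The localhost-only default comes from the dict(enumerate(sys.argv)).get(1, ...) expression in main: it takes the first command-line argument as the endpoint description if one was given and otherwise falls back to the loopback interface. Here is that lookup on its own, as a sketch; the function name endpoint_description and the port 9000 value are made up for illustration.

```python
DEFAULT_ENDPOINT = "tcp:8080:interface=127.0.0.1"


def endpoint_description(argv, default=DEFAULT_ENDPOINT):
    """Pick the endpoint string from argv[1], or use the default.

    dict(enumerate(argv)) maps positions to arguments, so .get(1, ...)
    avoids an IndexError when no argument was passed.
    """
    return dict(enumerate(argv)).get(1, default)


# No extra argument: the loopback-only default is used.
print(endpoint_description(["proxy.py"]))
# An explicit description (hypothetical value) overrides the default.
print(endpoint_description(["proxy.py", "tcp:9000:interface=127.0.0.1"]))
```

Keeping interface=127.0.0.1 in any override preserves the localhost-only behavior described above.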



Source: https://stackoverflow.com/questions/35664007/python-twisted-reverse-proxy-to-https-api-could-not-connect
