HTTP Download of a Very Big File

借酒劲吻你 2021-02-04 07:47

I'm working on a web application in Python/Twisted.

I want the user to be able to download a very big file (> 100 MB). I don't want to load the entire file into memory (of the server).

4 Answers
  • 2021-02-04 07:48

    Yes, the Content-Length header will give you the progress bar you desire!
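The header value is simply the file's size on disk, in bytes, rendered as a string; in Twisted you would pass it to `request.setHeader('Content-Length', ...)`. A minimal sketch of computing it (the helper name is hypothetical):

```python
import os
import tempfile

def content_length_header(path):
    # Content-Length is the file's size in bytes, as a string.
    return str(os.stat(path).st_size)

# Demo: a 1000-byte file yields a header value of "1000".
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1000)
os.close(fd)
header = content_length_header(path)
os.remove(path)
```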

  • 2021-02-04 07:53

    If this really is text/plain content, you should seriously consider sending it with Content-Encoding: gzip whenever a client indicates they can handle it. You ought to see huge bandwidth savings. Additionally, if this is a static file, what you really want to do is use sendfile(2). As for browsers not doing what you expect in terms of downloading things, you might want to look at the Content-Disposition header. So anyhow, the logic goes like this:

    If the client indicates they can handle gzip encoding via the Accept-Encoding header (e.g. Accept-Encoding: compress;q=0.5, gzip;q=1.0 or Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0 or similar) then compress the file, cache the compressed result somewhere, write the correct headers for the response (Content-Encoding: gzip, Content-Length: n, Content-Type: text/plain, etc), and then use sendfile(2) (however that may or may not have been made available in your environment) to copy the content from the open file descriptor into your response stream.

    If they don't accept gzip, do the same thing, but without gzipping first.
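The compress-and-cache step above can be sketched with the standard library. `maybe_gzip` is a hypothetical helper, and the naive substring check on Accept-Encoding is a stand-in for a real parser, which must also honour q-values (e.g. reject `gzip;q=0`):

```python
import gzip
import os
import shutil
import tempfile

def maybe_gzip(path, accept_encoding):
    """Return (path_to_send, content_encoding) for a static file.

    If the client's Accept-Encoding mentions gzip, compress the file
    once, cache the result next to the original, and serve that.
    Naive check: a real parser must also honour q-values.
    """
    if 'gzip' in accept_encoding:
        gz_path = path + '.gz'
        if not os.path.exists(gz_path):  # compress once, reuse afterwards
            with open(path, 'rb') as src:
                with gzip.open(gz_path, 'wb') as dst:
                    shutil.copyfileobj(src, dst)  # streams; never loads the whole file
        return gz_path, 'gzip'
    return path, None  # client can't handle gzip: send the file as-is

# Demo round-trip on a small temp file.
tmpdir = tempfile.mkdtemp()
plain = os.path.join(tmpdir, 'big.txt')
with open(plain, 'wb') as f:
    f.write(b'hello world\n' * 100)
sent, enc = maybe_gzip(plain, 'gzip;q=1.0, identity; q=0.5')
```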

    Alternatively, if you have Apache, Lighttpd, or similar acting as a transparent proxy in front of your server, you could use the X-Sendfile header, which is exceedingly fast:

    import os

    response.setHeader('Content-Type', 'text/plain')
    response.setHeader(
      'Content-Disposition',
      'attachment; filename="' + os.path.basename(fileName) + '"'
    )
    # The front-end proxy intercepts X-Sendfile and serves the file itself
    response.setHeader('X-Sendfile', fileName)
    response.setHeader('Content-Length', os.stat(fileName).st_size)
    
  • 2021-02-04 08:04

    Here is an example of downloading a file in chunks using urllib2 (Python 2), which you could call from inside a Twisted application:

    import os
    import urllib2
    
    def downloadChunks(url):
        """Download a large file in chunks to a temp directory,
        printing the percentage downloaded as it goes.
        Returns the path of the downloaded file, or False on error.
        """
    
        baseFile = os.path.basename(url)
    
        # move the file to a more unique path
        os.umask(0002)
        temp_path = "/tmp/"
        try:
            # avoid shadowing the built-in name 'file'
            filePath = os.path.join(temp_path, baseFile)
    
            req = urllib2.urlopen(url)
            total_size = int(req.info().getheader('Content-Length').strip())
            downloaded = 0
            CHUNK = 256 * 10240
            with open(filePath, 'wb') as fp:
                while True:
                    chunk = req.read(CHUNK)
                    if not chunk:
                        break
                    fp.write(chunk)
                    downloaded += len(chunk)
                    # float arithmetic: integer division would always print 0
                    print int(downloaded * 100.0 / total_size)
        except urllib2.HTTPError, e:
            print "HTTP Error:", e.code, url
            return False
        except urllib2.URLError, e:
            print "URL Error:", e.reason, url
            return False
    
        return filePath
    
  • 2021-02-04 08:10

    Two big problems with the sample code you posted are that it is non-cooperative and it loads the entire file into memory before sending it.

    while r != '':
        r = fp.read(1024)
        request.write(r)
    

    Remember that Twisted uses cooperative multitasking to achieve any sort of concurrency. So the first problem with this snippet is that it is a while loop over the contents of an entire file (which you say is large). This means the entire file will be read into memory and written to the response before anything else can happen in the process. In this case, it happens that "anything" also includes pushing the bytes from the in-memory buffer onto the network, so your code will also hold the entire file in memory at once and only start to get rid of it when this loop completes.

    So, as a general rule, you shouldn't write code for use in a Twisted-based application that uses a loop like this to do a big job. Instead, you need to do each small piece of the big job in a way that will cooperate with the event loop. For sending a file over the network, the best way to approach this is with producers and consumers. These are two related APIs for moving large amounts of data around using buffer-empty events to do it efficiently and without wasting unreasonable amounts of memory.
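The pull-producer idea can be illustrated outside Twisted. Instead of a loop that drains the whole file at once, the event loop calls back into the producer each time the transport's buffer empties, so at most one chunk is in memory at a time (a plain-Python sketch of the pattern, NOT the real IPullProducer interface):

```python
import io

class ChunkProducer(object):
    """Sketch of a pull producer: the consumer asks for data one
    chunk at a time, so only one chunk is ever buffered."""

    def __init__(self, fp, consumer, chunk_size=16384):
        self.fp = fp
        self.consumer = consumer
        self.chunk_size = chunk_size
        self.done = False

    def resumeProducing(self):
        # Called whenever the consumer's outgoing buffer has drained.
        chunk = self.fp.read(self.chunk_size)
        if chunk:
            self.consumer.write(chunk)
        else:
            # Real Twisted code would call consumer.unregisterProducer()
            self.done = True

# Demo: here a while loop stands in for the reactor, which in Twisted
# drives the producer from buffer-empty events, interleaving other work.
source = io.BytesIO(b"a" * 40000)
sink = io.BytesIO()
producer = ChunkProducer(source, sink)
while not producer.done:
    producer.resumeProducing()
```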

    You can find some documentation of these APIs here:

    http://twistedmatrix.com/projects/core/documentation/howto/producers.html

    Fortunately, for this very common case, there is also a producer written already that you can use, rather than implementing your own:

    http://twistedmatrix.com/documents/current/api/twisted.protocols.basic.FileSender.html

    You probably want to use it sort of like this:

    from twisted.protocols.basic import FileSender
    from twisted.python.log import err
    from twisted.web.resource import Resource
    from twisted.web.server import NOT_DONE_YET
    
    class Something(Resource):
        ...
    
        def render_GET(self, request):
            request.setHeader('Content-Type', 'text/plain')
            fp = open(fileName, 'rb')
            # FileSender cooperates with the reactor, sending one chunk
            # at a time as the transport's buffer drains
            d = FileSender().beginFileTransfer(fp, request)
            def cbFinished(ignored):
                fp.close()
                request.finish()
            d.addErrback(err).addCallback(cbFinished)
            return NOT_DONE_YET
    

    You can read more about NOT_DONE_YET and other related ideas in the "Twisted Web in 60 Seconds" series on my blog, http://jcalderone.livejournal.com/50562.html (see the "asynchronous responses" entries in particular).
