Python seek on remote file using HTTP

遥遥无期 2020-12-05 08:41

How do I seek to a particular position on a remote (HTTP) file so I can download only that part?

Let's say the bytes of a remote file were: 1234567890

I wanna

4 answers
  • 2020-12-05 08:57

    If you are downloading the remote file over HTTP, you need to set the Range header.

    For example, to request everything from byte offset existSize to the end of the file:

    myUrlclass.addheader("Range","bytes=%s-" % (existSize))
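
The value passed to the Range header above follows a small grammar, and the following is a minimal sketch of the three common forms. The helper names `byte_range` and `suffix_range` are made up for illustration and are not part of any library. Offsets are zero-based and both ends are inclusive, so for the question's example body `1234567890`, `bytes=3-5` selects `456`.

```python
def byte_range(start, end=None):
    """Return a Range header value for bytes start..end (inclusive).

    With end omitted the range is open-ended: from start to the end of the file.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    if end is None:
        return "bytes=%d-" % start
    return "bytes=%d-%d" % (start, end)


def suffix_range(length):
    """Return a Range header value for the last `length` bytes of the file."""
    if length <= 0:
        raise ValueError("suffix length must be positive")
    return "bytes=-%d" % length
```

The resulting string can be passed as the header value in any of the HTTP clients mentioned in these answers.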
    

    EDIT: I just found a better implementation. This class is very simple to use, as can be seen in the docstring.

    # Python 2 only: urllib2 does not exist in Python 3.
    import urllib
    import urllib2

    class RangeError(IOError):
        """Raised when the server cannot satisfy the requested range."""

    class HTTPRangeHandler(urllib2.BaseHandler):
        """Handler that enables HTTP Range headers.

        This was extremely simple. The Range header is an HTTP feature to
        begin with, so all this class does is tell urllib2 that the
        "206 Partial Content" response from the HTTP server is what we
        expected.

        Example:
            import urllib2
            import byterange

            range_handler = byterange.HTTPRangeHandler()
            opener = urllib2.build_opener(range_handler)

            # install it
            urllib2.install_opener(opener)

            # create Request and set Range header
            req = urllib2.Request('http://www.python.org/')
            req.add_header('Range', 'bytes=30-50')
            f = urllib2.urlopen(req)
        """

        def http_error_206(self, req, fp, code, msg, hdrs):
            # 206 Partial Content response: wrap it and return it as a success
            r = urllib.addinfourl(fp, hdrs, req.get_full_url())
            r.code = code
            r.msg = msg
            return r

        def http_error_416(self, req, fp, code, msg, hdrs):
            # HTTP's Range Not Satisfiable error
            raise RangeError('Requested Range Not Satisfiable')
    

    Update: The "better implementation" has moved to GitHub: excid3/urlgrabber, in the byterange.py file.

  • 2020-12-05 08:58

    I highly recommend using the requests library. It is easily the best HTTP library I have ever used. In particular, to accomplish what you have described, you would do something like:

    import requests
    
    url = "http://www.sffaudio.com/podcasts/ShellGameByPhilipK.Dick.pdf"
    
    # Retrieve bytes between offsets 3 and 5 (inclusive).
    r = requests.get(url, headers={"range": "bytes=3-5"})
    
    # If a 4XX client error or a 5XX server error is encountered, we raise it.
    r.raise_for_status()
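
Note that `raise_for_status()` does not tell you whether the range was honoured: a server that ignores the Range header typically replies `200` with the full body, while a server that honours it replies `206` with a `Content-Range` header. As a rough sketch, the helper below (the name `parse_content_range` is an assumption for illustration, not a requests API) parses that header using only the standard library.

```python
import re

# Matches values such as "bytes 3-5/10" or "bytes 3-5/*" (total size unknown).
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (first, last, total) from a Content-Range header value.

    total is None when the server reports the complete length as "*".
    """
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range value: %r" % value)
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total
```

With the response `r` from the snippet above, one would check `r.status_code == 206` first and then call `parse_content_range(r.headers["Content-Range"])` to confirm which bytes were actually returned.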
    
  • 2020-12-05 09:03

    You can use httpio to access remote HTTP files as if they were local:

    pip install httpio
    
    import zipfile
    import httpio
    
    url = "http://some/large/file.zip"
    with httpio.open(url) as fp:
        zf = zipfile.ZipFile(fp)
        print(zf.namelist())
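
To illustrate the general idea of treating a remote file like a local one, here is a minimal sketch of a seekable reader built on Range requests. It is not httpio's implementation. `RangeReader`, `http_fetch`, and the `fetch` parameter are hypothetical names, the sketch assumes the file size is already known (for example from a `Content-Length` header), and it does no caching.

```python
import io
import urllib.request


def http_fetch(url, first, last):
    """Fetch bytes first..last (inclusive) of url with one Range request."""
    req = urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (first, last)})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise OSError("server did not honour the Range header")
        return resp.read()


class RangeReader:
    """Read-only, seekable view of a remote file whose size is already known."""

    def __init__(self, url, size, fetch=http_fetch):
        self.url = url
        self.size = size
        self._fetch = fetch  # hypothetical hook so the transport can be swapped
        self._pos = 0

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        # Seeking only moves the cursor; no request is sent until read().
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos,
                io.SEEK_END: self.size}[whence]
        if base + offset < 0:
            raise ValueError("negative seek position")
        self._pos = base + offset
        return self._pos

    def read(self, n=-1):
        # Clamp the request to the end of the file.
        stop = self.size if n is None or n < 0 else min(self._pos + n, self.size)
        if stop <= self._pos:
            return b""
        data = self._fetch(self.url, self._pos, stop - 1)
        self._pos += len(data)
        return data
```

Each `read()` costs one HTTP round trip here, so a real library would normally add block caching on top of this.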
    
  • 2020-12-05 09:06

    AFAIK, this is not possible using fseek() or similar. You need to use the HTTP Range header to achieve this. This header may or may not be supported by the server, so your mileage may vary.

    import urllib2  # Python 2 only

    # Ask for the first ten bytes (offsets 0 through 9, inclusive).
    myHeaders = {'Range': 'bytes=0-9'}

    req = urllib2.Request('http://www.promotionalpromos.com/mirrors/gnu/gnu/bash/bash-1.14.3-1.14.4.diff.gz', headers=myHeaders)

    partialFile = urllib2.urlopen(req)

    s2 = partialFile.read()
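
Since `urllib2` only exists in Python 2, here is a rough Python 3 equivalent using `urllib.request`, which also covers the case where the server does not support ranges. The function name `read_range` and its `opener` parameter are assumptions made for this sketch. The fallback simply slices the full body, which is correct but still downloads the whole file.

```python
import urllib.request


def read_range(url, start, end, opener=urllib.request.urlopen):
    """Return bytes start..end (inclusive) of url, even if Range is ignored."""
    req = urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (start, end)})
    with opener(req) as resp:
        body = resp.read()
        if resp.status == 206:
            # Partial Content: the server honoured the range.
            return body
        # Any other success (normally 200) means the whole file was sent,
        # so cut out the requested slice locally.
        return body[start:end + 1]
```

For example, `read_range(url, 0, 9)` corresponds to the `bytes=0-9` request above.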
    

    EDIT: This is of course assuming that by remote file you mean a file stored on an HTTP server...

    If the file you want is on an FTP server, FTP only allows you to specify a start offset, not a range. If this is what you want, then the following code should do it (not tested!)

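    import ftplib

    fileToRetrieve = 'somefile.zip'
    fromByte = 15

    ftp = ftplib.FTP('ftp.someplace.net')
    ftp.login()  # anonymous login; pass a user and password if the server requires them
    with open('partialFile', 'wb') as outFile:
        # rest= sends a REST command so the transfer starts at fromByte
        ftp.retrbinary('RETR ' + fileToRetrieve, outFile.write, rest=str(fromByte))
    ftp.quit()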
    