Download file using partial download (HTTP)

Submitted by 你离开我真会死。 on 2019-11-26 06:32:37

Question


Is there a way to download a huge, still-growing file over HTTP using the partial-download feature?

It seems that this code downloads the file from scratch every time it is executed:

import urllib
urllib.urlretrieve("http://www.example.com/huge-growing-file", "huge-growing-file")

I'd like:

  1. To fetch just the newly-written data
  2. To download from scratch only if the source file becomes smaller (for example, because it has been rotated).

Answer 1:


A partial download is possible using the Range header. The following requests a selected range of bytes:

import urllib2

# start and end are the first and last byte offsets to fetch (both inclusive)
req = urllib2.Request('http://www.python.org/')
req.headers['Range'] = 'bytes=%s-%s' % (start, end)
f = urllib2.urlopen(req)

For example:

>>> req = urllib2.Request('http://www.python.org/')
>>> req.headers['Range'] = 'bytes=%s-%s' % (100, 150)
>>> f = urllib2.urlopen(req)
>>> f.read()
'l1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.'

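Using this header you can resume partial downloads. In your case, all you have to do is keep track of the number of bytes already downloaded and request the range that starts there.

Keep in mind that the server needs to support this header for this to work. A server that honours it replies with status 206 (Partial Content); one that ignores it replies with 200 and the whole file.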
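
The resume idea can be sketched roughly as follows. This is only an illustration, written for Python 3 (where urllib2 became urllib.request); the helper names range_from and fetch_new_bytes are made up for this example, and it assumes the local file is an exact prefix of the remote one. It appends on a 206 response, overwrites when the server sends the whole file, and treats 416 as "nothing new". It does not by itself detect a rotated (smaller) source file.

```python
import os
import shutil
import urllib.error
import urllib.request


def range_from(offset):
    """Build a Range header value asking for every byte from offset onward."""
    return "bytes=%d-" % offset


def fetch_new_bytes(url, path):
    """Fetch bytes beyond the current local size; return how many were written.

    Assumption: the local file at `path` is a prefix of the remote file.
    """
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    request = urllib.request.Request(url, headers={"Range": range_from(offset)})
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # Range Not Satisfiable: the server has nothing at or past offset.
            return 0
        raise
    with response:
        # 206: the range was honoured, so append. Any other success status
        # (normally 200) means the whole file is being sent, so overwrite.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            shutil.copyfileobj(response, out)
    return os.path.getsize(path) - (offset if mode == "ab" else 0)
```

Calling fetch_new_bytes("http://www.example.com/huge-growing-file", "huge-growing-file") repeatedly would then transfer only the newly written tail each time, provided the server supports range requests.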




Answer 2:


This is quite easy to do using TCP sockets and raw HTTP. The relevant request header is "Range".

An example request might look like:

import socket

# XXXX is a placeholder for the byte offset to resume from
mysock = socket.create_connection(("www.example.com", 80))
mysock.sendall(
  b"GET /huge-growing-file HTTP/1.1\r\n"
  b"Host: www.example.com\r\n"
  b"Range: bytes=XXXX-\r\n"
  b"Connection: close\r\n\r\n")

Where XXXX represents the number of bytes you've already retrieved. Then you can read the response headers and any content from the server. If the server returns a header like:

Content-Length: 0

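you know you've got the entire file. Note that many servers instead answer with status 416 (Range Not Satisfiable) when the requested start offset equals the current file size, so it is worth checking the status line as well.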
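
Reading the response by hand might look something like the sketch below. The function names parse_head, remote_size and next_action are hypothetical, and the code assumes the whole response has already been read from the socket (which "Connection: close" makes straightforward) and that the body is not chunk-encoded. It uses the Content-Range header, which carries the total file size after the slash, to decide whether to append, start over because the remote file shrank, or simply wait.

```python
def parse_head(raw):
    """Split raw response bytes into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])  # e.g. "HTTP/1.1 206 Partial Content"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def remote_size(headers):
    """Return the total size from Content-Range, or None if it is not given."""
    value = headers.get("content-range", "")
    total = value.rpartition("/")[2]  # "bytes 6-10/11" or "bytes */11" -> "11"
    return int(total) if total.isdigit() else None


def next_action(status, headers, local_size):
    """Decide what to do with the local copy: 'append', 'restart' or 'wait'."""
    total = remote_size(headers)
    if total is not None and total < local_size:
        return "restart"  # remote file shrank, e.g. it was rotated
    if status == 206:
        return "append"
    if status == 200:
        return "restart"  # server ignored Range and is sending everything
    return "wait"  # e.g. 416: nothing new yet
```

For example, a 416 response carrying "Content-Range: bytes */4" while the local copy already holds 11 bytes would yield 'restart', which covers the rotated-file case from the question.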

If you want to be particularly nice as an HTTP client, you can look into "Connection: keep-alive". Perhaps there is a Python library that does everything I have described (perhaps even urllib2 does it!), but I'm not familiar with one.




Answer 3:


If I understand your question correctly, the file is not changing during download, but is updated regularly. If that is the question, rsync is the answer.

If the file is being updated continually, including during download, you'll need to modify rsync or a BitTorrent program. They split files into separate chunks and download or update the chunks independently. When you get to the end of the file from the first iteration, repeat to get the appended chunk; continue as necessary. With less efficiency, one could just repeatedly rsync.
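
The "repeat to get the appended chunk" step amounts to a polling loop, which could be sketched like this. The function follow and its parameters are invented for illustration; fetch stands for any callable that transfers whatever is new (a repeated rsync run, an HTTP range request, and so on) and returns the number of bytes it just downloaded. The loop gives up after a few consecutive polls that bring nothing new.

```python
import time


def follow(fetch, interval=5.0, max_idle_polls=3, sleep=time.sleep):
    """Call fetch() until it reports no new bytes max_idle_polls times in a row.

    fetch is any callable returning the number of bytes it just downloaded.
    Returns the total number of bytes downloaded.
    """
    total = 0
    idle = 0
    while idle < max_idle_polls:
        got = fetch()
        if got > 0:
            # New data arrived: count it and poll again straight away.
            total += got
            idle = 0
        else:
            idle += 1
            if idle < max_idle_polls:
                sleep(interval)  # wait before asking again
    return total
```

The sleep parameter exists only so the waiting can be replaced in tests; in normal use the default time.sleep is fine.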



Source: https://stackoverflow.com/questions/1798879/download-file-using-partial-download-http
