Question
I have a large file (500 MB to 1 GB) stored at an HTTP(S) location (say https://example.com/largefile.zip).
I have read/write access to an FTP server.
I have normal user permissions (no sudo).
Within these constraints I want to read the file from the HTTP URL via requests and send it to the FTP server without writing to disk first.
So normally, I would do:
import requests

response = requests.get('https://example.com/largefile.zip', stream=True)
with open("largefile_local.zip", "wb") as handle:
    for data in response.iter_content(chunk_size=4096):
        handle.write(data)
and then upload the local file to FTP. But I want to avoid the disk I/O. I cannot mount the FTP server as a FUSE filesystem because I don't have superuser rights.
Ideally I would do something like ftp_file.write() instead of handle.write(). Is that possible? The ftplib documentation seems to assume only local files will be uploaded, not response.content. So ideally I would like to do:
response = requests.get('https://example.com/largefile.zip', stream=True)
for data in response.iter_content(chunk_size=4096):
    ftp_send_chunk(data)
I am not sure how to write ftp_send_chunk().
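For reference, here is a rough sketch of the kind of thing I'm after (untested, assuming ftplib's transfercmd exposes the raw data socket; host, credentials and paths are placeholders):

import ftplib
import requests

def stream_http_to_ftp(http_url, ftp_host, ftp_user, ftp_passwd, remote_path):
    ftp = ftplib.FTP(ftp_host, ftp_user, ftp_passwd)
    conn = ftp.transfercmd("STOR " + remote_path)   # raw data connection
    response = requests.get(http_url, stream=True)
    response.raise_for_status()
    for data in response.iter_content(chunk_size=4096):
        conn.sendall(data)                          # effectively ftp_send_chunk(data)
    conn.close()
    ftp.voidresp()                                  # read the final transfer reply
    ftp.quit()

Alternatively, since FTP.storbinary accepts any object with a read() method, it may be possible to pass the HTTP response object to it directly, which is what the answer below does with urllib.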
There is a similar question here (Python - Upload a in-memory file (generated by API calls) in FTP by chunks). My use case requires retrieving a chunk from the HTTP URL and writing it to FTP.
P.S.: The solution provided in the answer (a wrapper around urllib.urlopen) works with Dropbox uploads as well. I had problems with my FTP provider, so I finally used Dropbox, which works reliably.
Note that Dropbox has an "add web upload" feature in its API which does the same thing (remote upload), but that only works with "direct" links. In my use case the http_url came from a streaming service that was IP-restricted, so this workaround became necessary. Here's the code:
import re
import dropbox

d = dropbox.Dropbox(<ACTION-TOKEN>)      # placeholder for the Dropbox access token
f = FileWithProgress(filehandle)         # FileWithProgress is defined in the answer below
filesize = filehandle.length
targetfile = '/' + fname
CHUNK_SIZE = 4 * 1024 * 1024

# Start an upload session with the first chunk, then append the rest.
# (Assumes filesize > CHUNK_SIZE, otherwise the session is never finished.)
upload_session_start_result = d.files_upload_session_start(f.read(CHUNK_SIZE))
num_chunks = 1
cursor = dropbox.files.UploadSessionCursor(
    session_id=upload_session_start_result.session_id,
    offset=CHUNK_SIZE * num_chunks)
commit = dropbox.files.CommitInfo(path=targetfile)

while CHUNK_SIZE * num_chunks < filesize:
    if (filesize - CHUNK_SIZE * num_chunks) <= CHUNK_SIZE:
        # Last chunk: finish the session and commit the file.
        print d.files_upload_session_finish(f.read(CHUNK_SIZE), cursor, commit)
    else:
        d.files_upload_session_append(f.read(CHUNK_SIZE), cursor.session_id, cursor.offset)
    num_chunks += 1
    cursor.offset = CHUNK_SIZE * num_chunks

# Create a shared link and turn it into a direct-download URL.
link = d.sharing_create_shared_link(targetfile)
url = link.url
dl_url = re.sub(r"\?dl\=0", "?dl=1", url)
dl_url = dl_url.strip()
print 'dropbox_url: ', dl_url
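In the snippet above, filehandle and fname are whatever HTTP response and target name you are working with; a minimal sketch of how they could be set up (placeholder URL and name, using urllib as in the answer below):

import urllib.request   # on Python 2: import urllib and use urllib.urlopen

http_url = 'https://example.com/largefile.zip'   # placeholder URL
fname = 'largefile.zip'                          # placeholder target name
filehandle = urllib.request.urlopen(http_url)    # exposes .read() and .length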
I think it should even be possible to do this with Google Drive via their Python API, but using credentials with their Python wrapper is too hard for me. Check this1 and this2.
Answer 1:
It should be easy with urllib.request.urlopen, as it returns a file-like object, which you can use directly with FTP.storbinary.
from ftplib import FTP
import urllib.request

ftp = FTP(host, user, passwd)
filehandle = urllib.request.urlopen(http_url)
ftp.storbinary("STOR /ftp/path/file.dat", filehandle)
If you want to monitor progress, implement a wrapper file-like object that delegates calls to the filehandle object, but also displays the progress:
class FileWithProgress:
    def __init__(self, filehandle):
        self.filehandle = filehandle
        self.p = 0

    def read(self, blocksize):
        r = self.filehandle.read(blocksize)
        self.p += len(r)
        print(str(self.p) + " of " + str(self.p + self.filehandle.length))
        return r
filehandle = urllib.request.urlopen(http_url)
ftp.storbinary("STOR /ftp/path/file.dat", FileWithProgress(filehandle))
For Python 2 use:
- urllib.urlopen instead of urllib.request.urlopen,
- filehandle.info().getheader('Content-Length') instead of str(self.p + self.filehandle.length).
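Putting those substitutions together, a Python 2 variant might look like this (a sketch adapted from the code above; host, user, passwd and http_url are placeholders as before):

import urllib
from ftplib import FTP

class FileWithProgress:
    def __init__(self, filehandle):
        self.filehandle = filehandle
        # Python 2's response object has no .length, so take the total
        # size from the Content-Length header instead.
        self.total = filehandle.info().getheader('Content-Length')
        self.p = 0

    def read(self, blocksize):
        r = self.filehandle.read(blocksize)
        self.p += len(r)
        print str(self.p) + " of " + str(self.total)
        return r

ftp = FTP(host, user, passwd)
filehandle = urllib.urlopen(http_url)
ftp.storbinary("STOR /ftp/path/file.dat", FileWithProgress(filehandle))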
Source: https://stackoverflow.com/questions/53544969/python-transfer-a-file-from-https-url-to-ftp-dropbox-without-disk-writing-c