Python - Transfer a file from HTTP(S) URL to FTP/Dropbox without disk writing (chunked upload)

Submitted by 生来就可爱ヽ(ⅴ<●) on 2020-01-02 09:53:57

Question


I have a large file (500 MB to 1 GB) stored at an HTTP(S) location (say https://example.com/largefile.zip).

I have read/write access to an FTP server.

I have normal user permissions (no sudo).

Within these constraints, I want to read the file from the HTTP URL via requests and send it to the FTP server without writing it to disk first.

Normally, I would do this:

import requests

response = requests.get("https://example.com/largefile.zip", stream=True)
with open("largefile_local.zip", "wb") as handle:
    for data in response.iter_content(chunk_size=4096):
        handle.write(data)

Then I would upload the local file to FTP. But I want to avoid the disk I/O. I cannot mount the FTP server as a FUSE filesystem because I don't have superuser rights.

Ideally, I would call something like ftp_file.write() instead of handle.write(). Is that possible? The ftplib documentation seems to assume that only local files will be uploaded, not something like response.content. So ideally I would like to do this:

response = requests.get("https://example.com/largefile.zip", stream=True)
for data in response.iter_content(chunk_size=4096):
    ftp_send_chunk(data)

I am not sure how to write ftp_send_chunk().
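ftplib can in fact be driven chunk by chunk. Here is a minimal sketch, assuming a plain ftplib.FTP connection. It opens the data connection yourself with FTP.transfercmd(), which returns a socket. It then calls sendall() for each chunk and finally reads the server's reply with FTP.voidresp(). This is essentially what storbinary() does internally; FTP_TLS additionally unwraps the SSL layer, which this sketch omits. The helper name upload_chunks, and the host, credentials, and paths in the comments, are hypothetical placeholders.

```python
from ftplib import FTP
from typing import Iterable


def upload_chunks(ftp: FTP, remote_path: str, chunks: Iterable[bytes]) -> int:
    """Stream an iterable of bytes chunks to remote_path over a single FTP
    data connection, without writing anything to disk. Returns bytes sent."""
    ftp.voidcmd("TYPE I")  # switch to binary mode, as storbinary() does
    conn = ftp.transfercmd("STOR " + remote_path)  # opens the data socket
    sent = 0
    try:
        for data in chunks:
            if data:  # skip empty chunks, if any
                conn.sendall(data)  # this is the "ftp_send_chunk" step
                sent += len(data)
    finally:
        conn.close()  # closing the data socket marks end-of-file for the server
    ftp.voidresp()  # expects a 2xx reply such as "226 Transfer complete"; raises otherwise
    return sent


# Hypothetical usage (placeholder host, credentials and paths):
# import requests
# with FTP("ftp.example.com", "user", "passwd") as ftp:
#     response = requests.get("https://example.com/largefile.zip", stream=True)
#     upload_chunks(ftp, "/ftp/path/largefile.zip",
#                   response.iter_content(chunk_size=64 * 1024))
```

Because the helper only uses voidcmd(), transfercmd(), and voidresp(), it accepts any object with those methods. That also makes it easy to exercise without a real server.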

There is a similar question here (Python - Upload a in-memory file (generated by API calls) in FTP by chunks), but my use case requires retrieving each chunk from the HTTP URL and writing it to FTP.

P.S.: The solution provided in the answer (a wrapper around urllib.urlopen) works with Dropbox uploads as well. I had problems with my FTP provider, so I finally used Dropbox, which is working reliably.

Note that the Dropbox API has an "add web upload" feature that does the same thing (remote upload), but it only works with "direct" links. In my use case, the http_url came from an IP-restricted streaming service, so this workaround became necessary. Here's the code:

import re
import urllib.request

import dropbox

CHUNK_SIZE = 4 * 1024 * 1024  # upload 4 MiB per request

# http_url and fname come from the surrounding script (not shown)
filehandle = urllib.request.urlopen(http_url)
f = FileWithProgress(filehandle)  # progress wrapper from the answer below
d = dropbox.Dropbox("<ACCESS-TOKEN>")
targetfile = "/" + fname

# Open an upload session with the first chunk
chunk = f.read(CHUNK_SIZE)
session = d.files_upload_session_start(chunk)
cursor = dropbox.files.UploadSessionCursor(session_id=session.session_id,
                                           offset=len(chunk))
commit = dropbox.files.CommitInfo(path=targetfile)

# Append full chunks; a short (or empty) read means the end was reached
while True:
    chunk = f.read(CHUNK_SIZE)
    if len(chunk) < CHUNK_SIZE:
        # Last piece: upload it and commit the file in one call
        print(d.files_upload_session_finish(chunk, cursor, commit))
        break
    d.files_upload_session_append_v2(chunk, cursor)
    cursor.offset += len(chunk)  # the offset must advance after every append

link = d.sharing_create_shared_link(targetfile)
# Turn the preview link (?dl=0) into a direct-download link (?dl=1)
dl_url = re.sub(r"\?dl=0", "?dl=1", link.url).strip()
print("dropbox_url:", dl_url)
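The re.sub() call above only rewrites a literal ?dl=0. Here is a slightly more robust sketch. It assumes, as the code above does, that the shared link's dl query parameter selects preview (0) versus direct download (1). It parses the query string with urllib.parse, so other parameters survive and a missing dl is added. The function name to_direct_download is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_direct_download(shared_url: str) -> str:
    """Return shared_url with its dl query parameter forced to 1."""
    parts = urlsplit(shared_url.strip())
    # Keep every query parameter except dl, then append dl=1
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With the code above, this would be called as to_direct_download(link.url) in place of the re.sub() line.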

I think it should even be possible to do this with Google Drive via its Python API, but setting up credentials with its Python wrapper is too hard for me. Check this1 and this2.


Answer 1:


It should be easy with urllib.request.urlopen, as it returns a file-like object, which you can use directly with FTP.storbinary.

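from ftplib import FTP
import urllib.request

ftp = FTP(host, user, passwd)  # host, user, passwd: your FTP server and credentials

filehandle = urllib.request.urlopen(http_url)  # file-like HTTP response, nothing saved to disk

# storbinary() calls filehandle.read(8192) repeatedly until it returns b""
ftp.storbinary("STOR /ftp/path/file.dat", filehandle)
ftp.quit()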
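storbinary() only needs an object with a read(blocksize) method, so the requests-based loop from the question can be adapted too. With requests, response.raw (for a request made with stream=True) is already such an object. Note that it returns the body as sent, without undoing any Content-Encoding such as gzip. The alternative below is a minimal sketch of a wrapper that turns iter_content() into a readable stream. The class name IterStream is hypothetical.

```python
from typing import Iterable, Iterator


class IterStream:
    """Read-only, file-like wrapper around an iterator of bytes chunks,
    e.g. requests' response.iter_content(). Only read() is implemented,
    which is all FTP.storbinary() calls."""

    def __init__(self, chunks: Iterable[bytes]):
        self._chunks: Iterator[bytes] = iter(chunks)
        self._buf = b""  # bytes pulled from the iterator but not yet returned

    def read(self, size: int = -1) -> bytes:
        # Pull chunks until `size` bytes are buffered (or everything, if size < 0)
        while size < 0 or len(self._buf) < size:
            try:
                self._buf += next(self._chunks)
            except StopIteration:
                break
        if size < 0:
            data, self._buf = self._buf, b""
        else:
            data, self._buf = self._buf[:size], self._buf[size:]
        return data  # b"" signals end of stream to storbinary()


# Hypothetical usage, reusing `ftp` from above:
# response = requests.get(http_url, stream=True)
# ftp.storbinary("STOR /ftp/path/file.dat",
#                IterStream(response.iter_content(chunk_size=64 * 1024)))
```

Buffering inside read() matters because iter_content() chunk sizes need not match the blocksize storbinary() asks for.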

If you want to monitor progress, implement a wrapper file-like object that delegates calls to the filehandle object but also displays the progress:

class FileWithProgress:

    def __init__(self, filehandle):
        self.filehandle = filehandle
        self.p = 0  # bytes read so far

    def read(self, blocksize):
        r = self.filehandle.read(blocksize)
        self.p += len(r)
        # HTTPResponse.length counts the bytes still unread, so p + length is the total size
        print(str(self.p) + " of " + str(self.p + self.filehandle.length))
        return r

filehandle = urllib.request.urlopen(http_url)

ftp.storbinary("STOR /ftp/path/file.dat", FileWithProgress(filehandle))

For Python 2, use:

  • urllib.urlopen instead of urllib.request.urlopen.
  • self.filehandle.info().getheader('Content-Length') instead of str(self.p + self.filehandle.length).
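The FileWithProgress wrapper relies on HTTPResponse.length, which is None when the server sends no Content-Length (for example, with chunked transfer encoding). In that case the addition inside its print() raises a TypeError. Below is a minimal sketch of a variant that takes the expected total as an optional argument and falls back to a plain byte count. ProgressReader and its parameters are hypothetical names.

```python
import sys
from typing import BinaryIO, Optional, TextIO


class ProgressReader:
    """Delegates read() to a file-like object and reports progress after each call."""

    def __init__(self, fileobj: BinaryIO, total: Optional[int] = None,
                 out: TextIO = sys.stderr):
        self._f = fileobj
        self.total = total  # expected size in bytes, or None if unknown
        self.done = 0       # bytes read so far
        self._out = out

    def read(self, size: int = -1) -> bytes:
        data = self._f.read(size)
        self.done += len(data)
        if self.total:  # known, non-zero total: show a percentage
            pct = 100.0 * self.done / self.total
            print(f"{self.done} of {self.total} bytes ({pct:.1f}%)", file=self._out)
        else:
            print(f"{self.done} bytes", file=self._out)
        return data


# Hypothetical usage with the urlopen() response from the answer:
# filehandle = urllib.request.urlopen(http_url)
# ftp.storbinary("STOR /ftp/path/file.dat",
#                ProgressReader(filehandle, total=filehandle.length))
```

The total is read from filehandle.length once, before any data is consumed. At that point the value is still the full Content-Length, or None if the server did not send one.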


Source: https://stackoverflow.com/questions/53544969/python-transfer-a-file-from-https-url-to-ftp-dropbox-without-disk-writing-c
