Transfer file from URL to Cloud Storage

不知归路 2020-11-30 15:58

I'm a Ruby dev trying my hand at Google Cloud Functions written in Python, and I've hit a wall transferring a remote file from a given URL to Google Cloud Storage (GCS).

2 Answers
  • 2020-11-30 16:23

    Directly transferring URLs into GCS is possible through the Storage Transfer Service, but setting up a transfer job for a single URL is a lot of overhead. That sort of solution is aimed at situations where millions of URLs need to become GCS objects.

    Instead, I recommend writing a job that pipes the stream read from the URL into a write stream to GCS, and running it somewhere on Google Cloud close to the bucket.
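
    A minimal sketch of that streaming approach, assuming the google-cloud-storage client library and the standard-library urllib; the URL, bucket name, object name, and the stream_url_to_gcs helper are placeholders, not anything from the question:

    from google.cloud import storage
    import urllib.request

    def stream_url_to_gcs(url, bucket_name, destination_blob_name):
        storage_client = storage.Client()
        bucket = storage_client.bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        # Setting a chunk_size makes the client use a resumable upload that
        # reads the source in pieces instead of buffering the whole body.
        blob.chunk_size = 5 * 1024 * 1024  # must be a multiple of 256 KB
        # The urlopen response is a readable file-like object, so it can be
        # handed straight to upload_from_file.
        with urllib.request.urlopen(url) as response:
            blob.upload_from_file(response)

    # Hypothetical usage with placeholder names:
    stream_url_to_gcs('http://example.com/file.jpg', 'my-bucket', 'file.jpg')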

  • 2020-11-30 16:34

    It is not possible to upload a file to Google Cloud Storage directly from a URL. Since you are running the script from a local environment, the contents that you want to upload need to be in that same environment first, either in memory or in a file.

    Here are two examples of how to do it, based on your code:

    Option 1: You can use the wget module, which fetches the URL and downloads its contents into a local file (similar to the wget CLI command). Note that this means the file is stored locally and then uploaded from disk. I added an os.remove line to delete the file once the upload is done.

    from google.cloud import storage
    import wget
    import os

    project_id = 'my-project'
    bucket_name = 'my-bucket'
    destination_blob_name = 'upload.test'
    storage_client = storage.Client.from_service_account_json('my_creds.json')

    source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'

    def upload_blob(bucket_name, source_file_name, destination_blob_name):
        # Download the URL to a local temporary file.
        filename = wget.download(source_file_name)

        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        # Upload the downloaded file, then clean it up.
        blob.upload_from_filename(filename, content_type='image/jpeg')
        os.remove(filename)

    upload_blob(bucket_name, source_file_name, destination_blob_name)
    

    Option 2: using the urllib module, which works similarly to the wget module but reads the contents into a variable instead of writing them to a file. Note that I wrote this example in Python 3; there are some differences if you plan to run your script in Python 2.x.

    from google.cloud import storage
    import urllib.request

    project_id = 'my-project'
    bucket_name = 'my-bucket'
    destination_blob_name = 'upload.test'
    storage_client = storage.Client.from_service_account_json('my_creds.json')

    source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'

    def upload_blob(bucket_name, source_file_name, destination_blob_name):
        # Open the URL; the response body is read into memory below.
        response = urllib.request.urlopen(source_file_name)

        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)

        # Read the whole response and upload it as the object's contents.
        blob.upload_from_string(response.read(), content_type='image/jpeg')

    upload_blob(bucket_name, source_file_name, destination_blob_name)
    