I'm a Ruby dev trying my hand at Google Cloud Functions written in Python and have hit a wall with transferring a remote file from a given URL to Google Cloud Storage (GCS).
Transferring URLs directly into GCS is possible through the Storage Transfer Service, but setting up a transfer job for a single URL is a lot of overhead. That sort of solution is aimed at situations where millions of URLs need to become GCS objects.
Instead, I recommend writing a job that reads the URL as an incoming stream, pumps it into a write stream to GCS, and runs somewhere in Google Cloud close to the bucket.
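A minimal sketch of that streaming approach, assuming a recent google-cloud-storage release that provides Blob.open and credentials that are picked up from the environment (as they would be inside a Cloud Function). The helper names copy_url_to_writer and stream_url_to_gcs, the 1 MiB chunk size and the default content type are my own choices for illustration, not part of any library:

```python
import shutil
import urllib.request


def copy_url_to_writer(url, writer, chunk_size=1024 * 1024):
    """Copy the response body of `url` into `writer` one chunk at a time."""
    with urllib.request.urlopen(url) as response:
        # copyfileobj reads at most chunk_size bytes per iteration, so the
        # whole file is never held in memory at once.
        shutil.copyfileobj(response, writer, chunk_size)


def stream_url_to_gcs(url, bucket_name, blob_name,
                      content_type="application/octet-stream"):
    """Stream `url` into the given bucket/blob without a local file."""
    # Imported here so copy_url_to_writer can be tried without the
    # client library installed.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    # Assumption: Blob.open("wb") returns a file-like writer that uploads
    # the data to GCS in chunks as it is written.
    with blob.open("wb", content_type=content_type) as writer:
        copy_url_to_writer(url, writer)
```

Calling stream_url_to_gcs(url, 'my-bucket', 'upload.test', content_type='image/jpeg') would then do the transfer in one pass. Keeping the copy loop separate from the GCS part also makes it easy to try against any other writable file object first.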
It is not possible to upload a file to Google Cloud Storage directly from a URL. Since you are running the script from a local environment, the file contents that you want to upload need to be in that same environment. This means that the contents of the URL need to be stored either in memory or in a file.
Here are two ways to do it, based on your code:
Option 1: You can use the wget module, which fetches the URL and downloads its contents into a local file (similar to the wget CLI command). Note that this means the file is stored locally first and then uploaded from that file. I added the os.remove line to delete the file once the upload is done.
from google.cloud import storage
import os
import wget

project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    # Download the URL to a local file; wget returns the file name it used
    filename = wget.download(source_file_name)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    # 'image/jpeg' is the registered MIME type for JPEG files
    blob.upload_from_filename(filename, content_type='image/jpeg')
    # Remove the local copy once the upload is done
    os.remove(filename)

upload_blob(bucket_name, source_file_name, destination_blob_name)
Option 2: The urllib module works similarly to the wget module, but instead of writing to a file it reads the contents into a variable. Note that I wrote this example in Python 3; there are some differences if you plan to run your script in Python 2.X.
from google.cloud import storage
import urllib.request

project_id = 'my-project'
bucket_name = 'my-bucket'
destination_blob_name = 'upload.test'
storage_client = storage.Client.from_service_account_json('my_creds.json')
source_file_name = 'http://www.hospiceofmontezuma.org/wp-content/uploads/2017/10/confused-man.jpg'

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    # Read the whole response body into memory, then upload the bytes
    with urllib.request.urlopen(source_file_name) as response:
        blob.upload_from_string(response.read(), content_type='image/jpeg')

upload_blob(bucket_name, source_file_name, destination_blob_name)
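On the Python 2 differences mentioned for Option 2: as far as I know, the main ones are that urlopen lives in urllib2 rather than urllib.request, and that the response object cannot be used directly in a with statement. A small sketch that works on both versions (read_url is a hypothetical helper name, not something from the original code):

```python
import contextlib

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def read_url(url):
    """Return the full response body as bytes (held in memory)."""
    # contextlib.closing calls response.close() on exit and works on both
    # versions, including Python 2 where the response is not a context manager.
    with contextlib.closing(urlopen(url)) as response:
        return response.read()
```

The bytes returned by read_url(source_file_name) can be passed to blob.upload_from_string in the same way as in Option 2.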