Question
Problem: I want to copy files from a folder in a Google Cloud Storage bucket (e.g. Folder1 in Bucket1) to another bucket (e.g. Bucket2). I can't find any Airflow operator for Google Cloud Storage that copies files.
Answer 1:
I know this is an old question, but I found myself dealing with this task too. Since I'm using Google Cloud Composer, GoogleCloudStorageToGoogleCloudStorageOperator was not available in the version I had, so I solved the issue with a simple BashOperator:
from datetime import timedelta

from airflow import models
from airflow.operators.bash_operator import BashOperator

with models.DAG(
        dag_name,
        schedule_interval=timedelta(days=1),
        default_args=default_dag_args) as dag:

    # Shell out to gsutil; -m runs the copy in parallel.
    copy_files = BashOperator(
        task_id='copy_files',
        bash_command='gsutil -m cp <Source Bucket> <Destination Bucket>'
    )
It's very straightforward, it can create folders if you need them, and it lets you rename your files.
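For the buckets in the question, the placeholder command could be filled in roughly like this (the -r flag makes gsutil recurse into the folder; the bucket and folder names are just the ones from the question):

    copy_files = BashOperator(
        task_id='copy_files',
        # recursively copy Folder1 from Bucket1 into Bucket2
        bash_command='gsutil -m cp -r gs://Bucket1/Folder1 gs://Bucket2/'
    )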
Answer 2:
I just found a new operator in contrib, uploaded 2 hours ago: https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/gcs_to_gcs.py. It is called GoogleCloudStorageToGoogleCloudStorageOperator and should copy an object from one bucket to another, with renaming if requested.
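A minimal sketch of how it might be wired up inside the DAG, assuming the parameter names used in that contrib module (source_bucket, source_object, destination_bucket, destination_object) and that a single * wildcard is allowed in the source object; the bucket and folder names are taken from the question:

    from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator

    copy_folder = GoogleCloudStorageToGoogleCloudStorageOperator(
        task_id='copy_folder',
        source_bucket='Bucket1',
        source_object='Folder1/*',        # wildcard: every object under Folder1
        destination_bucket='Bucket2',
        destination_object='Folder1/',    # keep the same folder name in the target bucket
        dag=dag
    )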
Source: https://stackoverflow.com/questions/47452879/copy-files-from-one-google-cloud-storage-bucket-to-other-using-apache-airflow