How can I download and access files using Cloud Composer?

…衆ロ難τιáo~ submitted on 2019-12-19 04:37:07

Question


I have a few file-related use cases, and I'm not sure how best to accomplish them with Cloud Composer:

1) I need to use a private key (.pem) file to access an SFTP server. Where should this file be stored, and how should it be accessed? In on-prem Airflow, I would just keep the file in a /keys/ folder in the same directory as /dags/.

2) I need to move files from an SFTP server to Cloud Storage. With on-prem Airflow, I download them from the SFTP server to a specific location on the Airflow worker instance and then upload them from there. Can I do something similar with Composer, or is there a workaround, given that I can't access the file system?


Answer 1:


1) Assuming the .pem file only needs to be accessed at task runtime (as opposed to when the DAG definition is parsed), you can put it in the data/ folder of the environment's Cloud Storage bucket. That folder is mounted with Cloud Storage FUSE at /home/airflow/gcs/data on the Airflow workers. You can upload files there with the Cloud Composer gcloud component (gcloud composer environments storage data import).
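A minimal sketch of the runtime lookup, assuming the key was uploaded to a keys/ subfolder of data/. The subfolder, the file name sftp_key.pem, and the function names are hypothetical. The point is that the path is resolved inside the task callable rather than at the top level of the DAG file, so nothing touches the key while the scheduler parses the DAG.

```python
from pathlib import Path

# Cloud Storage FUSE mount point of gs://<environment-bucket>/data on the
# Composer workers, as described in the answer above.
COMPOSER_DATA_DIR = Path("/home/airflow/gcs/data")


def resolve_key_path(filename: str, data_dir: Path = COMPOSER_DATA_DIR) -> Path:
    """Return the local path of a key file stored under data/keys/.

    Call this from inside a task callable, not at the top level of the DAG
    file, so the lookup happens at task runtime rather than at parse time.
    The keys/ subfolder is a hypothetical layout chosen for this example.
    """
    key_path = data_dir / "keys" / filename
    if not key_path.is_file():
        raise FileNotFoundError(f"Private key not found at {key_path}")
    return key_path


def sftp_task_callable(**context):
    # Hypothetical task body: the key path is resolved only when the task runs.
    key_path = resolve_key_path("sftp_key.pem")
    # Hand str(key_path) to your SFTP client here.
    return str(key_path)
```

The returned path could then be handed to an SSH/SFTP client, for example as key_filename to paramiko's SSHClient.connect, or set as key_file in the extras of an Airflow SSH/SFTP connection; check the hook documentation for your Airflow version before relying on either.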

2) There are two options here.

  1. Write the files from your SFTP server directly to /home/airflow/gcs/data, which is FUSE-mounted to your environment's Cloud Storage bucket. You can leave them there or use the GoogleCloudStorageToGoogleCloudStorageOperator to move them to their final location.

  2. If you want to copy to local disk first and then from local disk to Cloud Storage, you need to do both steps within the same task: Cloud Composer environments use the CeleryExecutor, so tasks within the same DAG aren't guaranteed to run on the same machine. You should be able to write to /home/airflow and /tmp.
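Both options can be sketched as plain Python task callables. This is an illustration under assumptions, not Composer's or Airflow's API: download(remote_path, local_path) and upload(local_path) are hypothetical callables standing in for whatever SFTP and Cloud Storage clients you use, and the function names are made up for the example.

```python
import os
import tempfile
from typing import Callable

# Cloud Storage FUSE mount of gs://<environment-bucket>/data (option 1).
COMPOSER_DATA_DIR = "/home/airflow/gcs/data"

# Hypothetical callable types: download copies a remote SFTP file to a local
# path; upload pushes a local file to Cloud Storage.
Download = Callable[[str, str], None]
Upload = Callable[[str], None]


def download_to_data_dir(download: Download, remote_path: str,
                         data_dir: str = COMPOSER_DATA_DIR) -> str:
    """Option 1: write straight into the FUSE-mounted data/ folder.

    Cloud Storage FUSE then makes the file visible in the bucket's data/ folder.
    """
    local_path = os.path.join(data_dir, os.path.basename(remote_path))
    download(remote_path, local_path)
    return local_path


def download_then_upload(download: Download, upload: Upload,
                         remote_path: str) -> None:
    """Option 2: stage on local disk and upload, all inside ONE task.

    Both steps must run in the same task because, with the CeleryExecutor,
    the next task may run on a different worker whose disk lacks the file.
    """
    # tempfile picks a writable temp dir (typically under /tmp); the directory
    # and the staged file are removed automatically when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(remote_path))
        download(remote_path, local_path)
        upload(local_path)
```

As an assumption to verify against your Airflow version: in Airflow 1.10, SFTPHook.retrieve_file(remote_full_path, local_full_path) has the same shape as download, and upload could be a small lambda around GoogleCloudStorageHook().upload(bucket, object, filename) with the bucket and object name filled in.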




Answer 2:


For question 2, according to the Cloud Composer documentation:

When you modify DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster. Cloud Composer synchronizes the dags/ and plugins/ folders uni-directionally by copying locally and synchronizes data/ and logs/ folders bi-directionally by using Cloud Storage FUSE.

You can write files to the local directory /home/airflow/gcs/data in your operators, and Cloud Composer will sync that directory bi-directionally with gs://bucket/data (where bucket is your environment's bucket).
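To make the bi-directional sync concrete, here is a hypothetical helper that maps a path under /home/airflow/gcs/data to the gs:// object it corresponds to, assuming the data/ folder layout described in the quoted documentation. The function name is made up, and my-bucket is a placeholder bucket name.

```python
import posixpath

# Local FUSE mount point that mirrors gs://<bucket>/data.
COMPOSER_DATA_DIR = "/home/airflow/gcs/data"


def gcs_uri_for_local_path(local_path: str, bucket: str,
                           data_dir: str = COMPOSER_DATA_DIR) -> str:
    """Map a file under the FUSE-mounted data dir to its gs:// object URI."""
    rel = posixpath.relpath(posixpath.normpath(local_path), data_dir)
    # Reject the mount point itself and anything outside it (e.g. dags/).
    if rel == "." or rel == ".." or rel.startswith("../"):
        raise ValueError(f"{local_path} is not inside {data_dir}")
    return f"gs://{bucket}/data/{rel}"
```

For example, a file an operator writes to /home/airflow/gcs/data/sftp/a.csv would show up as gs://my-bucket/data/sftp/a.csv, while paths under dags/ are rejected because that folder is synced one way only.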

For more details on how Cloud Composer interacts with Cloud Storage, see: https://cloud.google.com/composer/docs/concepts/cloud-storage



Source: https://stackoverflow.com/questions/50357560/how-can-i-download-and-access-files-using-cloud-composer
