google-cloud-composer

Any success story installing private dependency on GCP Composer Airflow?

假装没事ソ submitted on 2020-03-22 07:54:42
Question: Background info: Normally, within a container environment, I can easily install my private dependency with a requirements.txt like this: --index-url https://user:pass@some_repo.jfrog.io/some_repo/api/pypi/pypi/simple some-private-lib The package "some-private-lib" is the one I want to install. Issue: Within the GCP Composer environment, I tried the gcloud command (gcloud composer environments update ENV_NAME --update-pypi-packages-from-file ./requirements.txt --location LOCATION), but it…
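
One workaround that has been documented for Cloud Composer is to place a pip.conf containing the private index URL in the environment bucket's config/pip/ folder and then list the package by name only; the exact path should be verified against the current Composer docs. A minimal Python sketch of that upload, assuming the google-cloud-storage client and using placeholder project and bucket names:

    from google.cloud import storage

    # pip.conf pointing pip at the private repository; the credentials shown are
    # the same placeholders as in the question above.
    PIP_CONF = (
        "[global]\n"
        "extra-index-url = https://user:pass@some_repo.jfrog.io/some_repo/api/pypi/pypi/simple\n"
    )

    client = storage.Client(project="my-gcp-project")         # hypothetical project id
    bucket = client.bucket("us-central1-my-env-1234-bucket")  # the environment's GCS bucket
    bucket.blob("config/pip/pip.conf").upload_from_string(PIP_CONF)

With the file in place, the requirements file passed to gcloud composer environments update --update-pypi-packages-from-file can then list just some-private-lib without the --index-url line.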

Trigger Cloud Composer DAG with a Pub/Sub message

走远了吗. submitted on 2020-02-25 04:13:14
Question: I am trying to create a Cloud Composer DAG to be triggered via a Pub/Sub message. There is the following example from Google, which triggers a DAG every time a change occurs in a Cloud Storage bucket: https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf However, at the beginning it says you can trigger DAGs in response to events, such as a change in a Cloud Storage bucket or a message pushed to Cloud Pub/Sub. I have spent a lot of time trying to figure out how that can be…
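
The linked guide covers the Cloud Functions push approach for Cloud Storage events. For Pub/Sub specifically, one pull-based alternative is to let a frequently scheduled DAG gate its work on a PubSubPullSensor. A minimal sketch, assuming Airflow 1.10's contrib layout and placeholder project/subscription names:

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.sensors.pubsub_sensor import PubSubPullSensor
    from airflow.operators.dummy_operator import DummyOperator

    with DAG(dag_id='pubsub_gated_pipeline',
             start_date=datetime(2020, 1, 1),
             schedule_interval='*/10 * * * *',   # poll instead of a true push trigger
             catchup=False) as dag:

        # Succeeds only once at least one message is available on the subscription.
        wait_for_message = PubSubPullSensor(
            task_id='wait_for_message',
            project='my-gcp-project',            # hypothetical project id
            subscription='my-subscription',      # hypothetical pull subscription
            max_messages=1,
            ack_messages=True)

        process = DummyOperator(task_id='process')

        wait_for_message >> process

This trades the event-driven behaviour of the Cloud Functions approach for a polling interval, but it stays entirely inside Composer.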

Workflow scheduling on GCP Dataproc cluster

落爺英雄遲暮 submitted on 2020-02-24 03:56:08
Question: I have some complex Oozie workflows to migrate from on-prem Hadoop to GCP Dataproc. The workflows consist of shell scripts, Python scripts, Spark-Scala jobs, Sqoop jobs, etc. I have come across some potential solutions for my workflow-scheduling needs: Cloud Composer; Dataproc Workflow Templates with Cloud Scheduler; installing Oozie on a Dataproc auto-scaling cluster. Please let me know which option would be most efficient in terms of performance, cost, and migration complexity. Answer 1: All 3…
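
If the Cloud Composer option is chosen, the Oozie actions map fairly directly onto Airflow operators. A minimal sketch of that mapping, assuming Airflow 1.10's contrib layout and placeholder bucket, cluster and region values:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.contrib.operators.dataproc_operator import DataProcPySparkOperator

    with DAG(dag_id='oozie_migration_sketch',
             start_date=datetime(2020, 1, 1),
             schedule_interval='@daily',
             catchup=False) as dag:

        # A former Oozie shell action becomes a BashOperator.
        prepare = BashOperator(
            task_id='prepare_inputs',
            bash_command='gsutil ls gs://my-bucket/input/')   # hypothetical bucket

        # A former Oozie Spark action becomes a Dataproc job submission.
        spark_transform = DataProcPySparkOperator(
            task_id='spark_transform',
            main='gs://my-bucket/jobs/transform.py',          # hypothetical PySpark script
            cluster_name='my-dataproc-cluster',               # hypothetical existing cluster
            region='us-central1')

        prepare >> spark_transform

Spark-Scala jobs can be submitted the same way with DataProcSparkOperator; the cost and complexity comparison between the three options is unaffected by this sketch.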

Why shouldn't you run Kubernetes pods for longer than an hour from Composer?

偶尔善良 submitted on 2020-02-01 05:47:05
Question: The Cloud Composer documentation explicitly states: "Due to an issue with the Kubernetes Python client library, your Kubernetes pods should be designed to take no more than an hour to run." However, it doesn't provide any more context than that, and I can't find a definitively relevant issue on the Kubernetes Python client project. To test it, I ran a pod for two hours and saw no problems. What issue creates this restriction, and how does it manifest? Answer 1: I'm not deeply familiar with…

GCP composer creation failed with bad request

江枫思渺然 submitted on 2020-01-15 09:32:05
Question: Trying to create a GCP Composer environment with the gcloud CLI: gcloud composer environments create "jakub" --project "projectX" --location "us-central1" --zone "us-central1-a" --disk-size 50GB --node-count 3 --image-version composer-1.7.1-airflow-1.10.2 --machine-type n1-standard-2 --python-version 3 --labels env="test" After an hour I get the error: f7b3f4-6b95-4fb0-85e3-f39a2b11cec9] failed: Http error status code: 400 Http error message: BAD REQUEST…

Template_searchpath gives TemplateNotFound error in Airflow and cannot find the SQL script

瘦欲@ submitted on 2020-01-06 15:27:06
Question: I have a DAG described like this: tmpl_search_path = '/home/airflow/gcs/sql_requests/' with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag: create_table = bigquery_operator.BigQueryOperator(task_id='create_table', sql='create_table.sql', use_legacy_sql=False, destination_dataset_table=some_table) The task create_table calls a SQL script, create_table.sql. This SQL script is not in the same folder as the DAG folder: it is in a…
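
On Composer, only the bucket's dags/, plugins/ and data/ folders are made available to the workers (at /home/airflow/gcs/dags, .../plugins and .../data), so a sibling sql_requests/ folder at the bucket root is not visible to the Jinja template loader. A minimal sketch of a working layout, assuming the SQL files are moved under data/sql_requests/ and using a placeholder destination table:

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators import bigquery_operator

    default_args = {'start_date': datetime(2020, 1, 1)}

    # data/ in the environment bucket is mounted on workers at /home/airflow/gcs/data
    tmpl_search_path = '/home/airflow/gcs/data/sql_requests/'

    with DAG(dag_id='pipeline',
             default_args=default_args,
             schedule_interval=None,
             template_searchpath=[tmpl_search_path]) as dag:

        create_table = bigquery_operator.BigQueryOperator(
            task_id='create_table',
            sql='create_table.sql',     # resolved against template_searchpath
            use_legacy_sql=False,
            destination_dataset_table='my_project.my_dataset.my_table')  # placeholder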

No module named 'gcp_sql_operator' in cloud composer

梦想与她 submitted on 2020-01-06 08:00:19
Question: I am not able to import the statement from airflow.contrib.operators.gcp_sql_operator import CloudSqlQueryOperator. I want to use it in my DAG file, which will run on Cloud Composer with Airflow version 1.10.0 (not 1.9.0). Just to check, I tried to import gcs_to_gcs with from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator; I am able to import that one, but not gcp_sql_operator. Answer 1: The CloudSqlQueryOperator operator is released since…
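
Since availability depends on the Airflow version baked into the Composer image, a quick way to confirm whether the module exists in a given environment is to attempt the import at runtime, for example from a one-off PythonOperator. A small sketch:

    import importlib

    try:
        importlib.import_module('airflow.contrib.operators.gcp_sql_operator')
        print('gcp_sql_operator is available')
    except ImportError:
        import airflow
        print('gcp_sql_operator is not shipped with Airflow %s' % airflow.__version__)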

Google Cloud Composer with regional kubernetes cluster

孤街醉人 submitted on 2020-01-06 05:26:43
Question: I'm putting together a DR plan for zone failures in GCP. Currently, Composer runs in a single zone. Is there a way to make its Kubernetes cluster regional? Answer 1: At the moment, no. The Composer API requires a region when you create the environment, but it chooses a single zone within that region to create the cluster. GKE can clearly support regional and multi-zonal clusters, so I recommend filing a feature request to add this functionality to Composer. Source: https://stackoverflow.com/questions

Accessing Kubernetes Secret from Airflow KubernetesPodOperator

非 Y 不嫁゛ submitted on 2019-12-25 00:27:19
Question: I'm setting up an Airflow environment on Google Cloud Composer for testing. I've added some secrets to my namespace, and they show up fine: $ kubectl describe secrets/eric-env-vars Name: eric-env-vars Namespace: eric-dev Labels: <none> Annotations: <none> Type: Opaque Data ==== VERSION_NUMBER: 6 bytes I've referenced this secret in my DAG definition file (leaving out some code for brevity): env_var_secret = Secret(deploy_type='env', deploy_target='VERSION_NUMBER', secret='eric-env-vars', key…
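
For reference, a sketch of how such a secret is typically wired into a KubernetesPodOperator, assuming Airflow 1.10's contrib layout, that the pod is launched into the same namespace the Secret lives in, and placeholder image and DAG values:

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.kubernetes.secret import Secret
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    env_var_secret = Secret(
        deploy_type='env',               # expose the value as an environment variable
        deploy_target='VERSION_NUMBER',  # env var name inside the pod
        secret='eric-env-vars',          # Kubernetes Secret name
        key='VERSION_NUMBER')            # key within the Secret's data

    with DAG(dag_id='secret_demo',
             start_date=datetime(2019, 12, 1),
             schedule_interval=None,
             catchup=False) as dag:

        print_version = KubernetesPodOperator(
            task_id='print_version',
            name='print-version',
            namespace='eric-dev',                     # must be the namespace holding the Secret
            image='python:3.7-slim',                  # hypothetical image
            cmds=['sh', '-c', 'echo $VERSION_NUMBER'],
            secrets=[env_var_secret])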