Import/Export DataFusion pipelines

Submitted by 帅比萌擦擦* on 2020-06-26 04:07:16

Question


Does anyone know if it is possible to programmatically import/export Data Fusion pipelines (deployed or in draft status)?

The idea is to write a script that drops and recreates a Data Fusion instance, in order to avoid billing when it is not in use. Via the gcloud command line it is possible to provision a Data Fusion instance and to destroy it, but it would also be useful to automatically export and import all my pipelines.
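For context, the provisioning and teardown can be done roughly like this (the instance name and region are placeholders; depending on the SDK version the data-fusion commands may still sit under gcloud beta):

gcloud beta data-fusion instances create my-instance --location=us-central1
gcloud beta data-fusion instances delete my-instance --location=us-central1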

The official documentation, unfortunately, didn't help me...

Thanks!


Answer 1:


You could use the REST API to do this, though you would need a script that drives it against the instance URL. You can get the pipeline config from the application APIs: first get the list of pipelines from the application list API, then iterate through all pipelines and fetch the details of each individual pipeline, which include a property called configuration holding the pipeline's config JSON. You then have to build a new JSON with the name, description, and artifact information, along with a config property containing the configuration JSON you received from the backend.

A sample flow would look like this:

  1. In the cluster you are about to destroy, call the GET API that lists apps, with artifactName=cdap-data-pipeline,cdap-data-streams as a query parameter:
/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams
  2. Parse the response, iterate through the individual apps, and GET the details of each app:
/namespaces/default/apps/<app-name>

For each app, take the configuration property from the response and form your final JSON into something like:


{   
  "name": "Pipeline_1",
  "description": "Pipeline to do taskX",
  "artifact": {
    "name": "cdap-data-pipeline",
    "version": "6.1.0-SNAPSHOT",
    "scope": "USER"
  },
  "config": JSON.parse(<configuration-from-app-detailed-api>) 
} 
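Putting steps 1 and 2 together, a rough export loop might look like the sketch below. It is only an illustration: INSTANCE_URL stands for the <instance-url> used in the curl example further down, gcloud is assumed to be authenticated already, jq is assumed to be installed, and the exported-pipelines output directory is just an example name.

# Export sketch: list every pipeline, fetch its details, and write a
# deploy-ready JSON per app. INSTANCE_URL is a placeholder.
INSTANCE_URL="https://<instance-url>"
TOKEN="$(gcloud auth print-access-token)"
mkdir -p exported-pipelines

# Step 1: list batch (cdap-data-pipeline) and realtime (cdap-data-streams) pipelines
apps=$(curl -s -H "Authorization: Bearer ${TOKEN}" \
  "${INSTANCE_URL}/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams")

# Step 2: for each app, fetch the details and assemble the final JSON.
# The configuration property comes back as a JSON string, hence fromjson.
for name in $(echo "${apps}" | jq -r '.[].name'); do
  curl -s -H "Authorization: Bearer ${TOKEN}" \
    "${INSTANCE_URL}/namespaces/default/apps/${name}" |
    jq '{name: .name, description: .description, artifact: .artifact, config: (.configuration | fromjson)}' \
      > "exported-pipelines/${name}.json"
done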

  3. Then, in the new cluster you are about to create, just deploy the pipeline using the JSON you got in the previous step, as sketched below.
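A minimal sketch of that deploy step, assuming the JSON files produced by the export sketch above and a placeholder NEW_INSTANCE_URL for the new instance, would be a PUT of each file to the app endpoint (same path style as the calls above):

# Import sketch: deploy each exported pipeline JSON into the new instance.
NEW_INSTANCE_URL="https://<new-instance-url>"
TOKEN="$(gcloud auth print-access-token)"

for f in exported-pipelines/*.json; do
  name=$(jq -r '.name' "${f}")
  curl -s -X PUT \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d @"${f}" \
    "${NEW_INSTANCE_URL}/namespaces/default/apps/${name}"
done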

One thing to note: if you have set up schedules or triggers for pipelines in the old cluster, those won't be created in the new cluster. The rest of the pipeline should just work if you are only deploying and running it.

Hope this helps.

[UPDATE] 11/20

Just realized there are docs on accessing the REST API for Data Fusion. However, they don't fully cover HOW to make the REST API call. Here is an example of how to do it:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -w"\n" -X GET <instance-url>/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams?artifactName=cdap-data-pipeline,cdap-data-streams

Here we use gcloud to get an access token for that specific instance. A prerequisite for this is to be signed in with the gcloud SDK. Once authentication succeeds, this should return the list of apps in your instance.



Source: https://stackoverflow.com/questions/58839608/import-export-datafusion-pipelines
