Question
Does anyone know if it is possible to programmatically import/export Data Fusion pipelines (deployed or in draft status)?
The idea is to write a script that drops and re-creates a Data Fusion instance, in order to avoid billing when it's not used. Via the gcloud command line it's possible to provision a Data Fusion instance and to destroy it, but it would be interesting to automatically export and import all my pipelines too.
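For context, the provision/destroy part I already do with gcloud looks roughly like this (instance name, region and edition are just examples, and flags may vary by SDK version):

# Tear down the Data Fusion instance to stop billing
gcloud beta data-fusion instances delete my-instance --location=us-central1 --quiet
# Re-create it later when needed
gcloud beta data-fusion instances create my-instance --location=us-central1 --edition=basic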
The official documentation, unfortunately, didn't help me...
Thanks!
Answer 1:
You could use the REST API to do this, though you will probably need a script that automates the steps given the instance URL. You can get a pipeline's config from the application APIs (reference here): first get the list of pipelines (reference here), then iterate through all of them and fetch the details of each individual pipeline, which include a property called configuration holding the pipeline config JSON. You then have to build a new JSON with the name, description, and artifact information, plus a config property containing the configuration JSON you received from the backend.
A sample flow would look like this:
- In the cluster you are about to destroy, call the GET apps API with artifactName=cdap-data-pipeline,cdap-data-streams as a query parameter:
/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams
- Parse the response and iterate through the individual apps, getting each app's details from:
namespaces/default/apps/<app-name>
For each app, take the configuration property from the response and build your final JSON, something like:
{
  "name": "Pipeline_1",
  "description": "Pipeline to do taskX",
  "artifact": {
    "name": "cdap-data-pipeline",
    "version": "6.1.0-SNAPSHOT",
    "scope": "USER"
  },
  "config": JSON.parse(<configuration-from-app-detailed-api>)
}
- Then, in the new cluster you are about to create, deploy the pipeline using the JSON from the previous step (a scripted version of the whole loop is sketched right after this list).
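Not a definitive implementation, just a minimal sketch of that loop in bash. It assumes jq is installed, that the same gcloud access token works against both instances, and that deployment uses the standard CDAP "create application" call PUT /namespaces/default/apps/<app-name>; <old-instance-url> and <new-instance-url> are placeholders:

#!/usr/bin/env bash
# Sketch only: export every pipeline from the old instance and re-deploy it on the new one.
set -euo pipefail

OLD_URL="<old-instance-url>"
NEW_URL="<new-instance-url>"
TOKEN="$(gcloud auth print-access-token)"

# 1. List all batch and streaming pipelines in the old instance
apps=$(curl -s -H "Authorization: Bearer ${TOKEN}" \
  "${OLD_URL}/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams")

# Pipeline names are assumed to contain no spaces
for name in $(echo "${apps}" | jq -r '.[].name'); do
  # 2. Fetch the app details, which contain the configuration property (a JSON string)
  detail=$(curl -s -H "Authorization: Bearer ${TOKEN}" \
    "${OLD_URL}/namespaces/default/apps/${name}")

  # 3. Build the deployment JSON: name, description, artifact, plus the parsed configuration
  payload=$(echo "${detail}" | jq '{name: .name, description: .description, artifact: .artifact, config: (.configuration | fromjson)}')

  # 4. Deploy the pipeline on the new instance
  curl -s -X PUT -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d "${payload}" \
    "${NEW_URL}/namespaces/default/apps/${name}"
done

Saving each payload to a file before deploying also gives you a portable export you can keep in version control.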
One thing to note: if you have set up schedules or triggers for pipelines in the old cluster, those won't be created in the new cluster. The rest should just work if you are only deploying and running the pipelines.
Hope this helps.
[UPDATE] 11/20
Just realized there are docs on accessing the REST API for Data Fusion here. However, they don't entirely cover HOW to make the REST API call. Here is an example of how to do it:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -w"\n" -X GET <instance-url>/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams
Here we use gcloud to get an access token for that specific instance. A prerequisite is to be signed in with the gcloud SDK. Once authentication succeeds, this should return the list of apps in your instance.
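To find <instance-url> itself, one option (assuming the instance resource exposes its endpoint in the apiEndpoint field, which is what the Data Fusion API returns; instance name and region below are placeholders) is to ask gcloud for it:

# Print the API endpoint of the instance, which is the base of <instance-url> above
gcloud beta data-fusion instances describe my-instance --location=us-central1 --format='value(apiEndpoint)'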
Source: https://stackoverflow.com/questions/58839608/import-export-datafusion-pipelines