Question
I have two separate pipelines, say 'P1' and 'P2'. I need to run P2 only after P1 has completely finished executing, and I need to do the entire operation through a single template.
Basically, a template gets created the moment execution reaches a run() call, say p1.run().
So, as far as I can see, I would need to handle the two pipelines with two different templates, but that would not satisfy my requirement of strictly ordered pipeline execution.
Another approach I could think of is calling p1.run() inside a ParDo of P2, so that P2's run() waits until P1's run() has finished. I tried this but got stuck at the exception below (surfaced as an IllegalArgumentException).
java.io.NotSerializableException: PipelineOptions objects are not serializable and should not be embedded into transforms (did you capture a PipelineOptions object in a field or in an anonymous class?). Instead, if you're using a DoFn, access PipelineOptions at runtime via ProcessContext/StartBundleContext/FinishBundleContext.getPipelineOptions(), or pre-extract necessary fields from PipelineOptions at pipeline construction time.
Is it not possible at all to call a pipeline's run() inside a transform, say a ParDo, of another pipeline?
If that is the case, how can I satisfy my requirement of running two different pipelines in sequence from a single template?
Answer 1:
A template can contain only a single pipeline. In order to sequence the execution of two separate pipelines each of which is a template, you'll need to schedule them externally, e.g. via some workflow management system (such as what Anuj mentioned, or Airflow, or something else - you might draw some inspiration from this post for example).
We are aware of the need for better sequencing primitives in Beam within a single pipeline, but do not have a concrete design yet.
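To make the external sequencing idea concrete, here is a minimal sketch of a driver that launches one template, polls until the job reaches a terminal state, and launches the next one only if the previous job succeeded. The `TemplateLauncher` and `JobHandle` interfaces and the `JobState` values are hypothetical placeholders, not part of Beam or Dataflow; in a real setup they would wrap whatever job-launch and job-status API your runner or workflow system provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SequentialLauncher {

    /** Illustrative job states; a real runner reports its own set. */
    public enum JobState { RUNNING, DONE, FAILED, CANCELLED }

    /** Hypothetical handle to a launched job (stands in for a job-status API). */
    public interface JobHandle {
        JobState poll();
    }

    /** Hypothetical launcher that starts one job from a named template. */
    public interface TemplateLauncher {
        JobHandle launch(String templateName);
    }

    /** Polls the job until it leaves RUNNING, then returns its final state. */
    public static JobState waitUntilTerminal(JobHandle job, long pollIntervalMillis) {
        JobState state = job.poll();
        while (state == JobState.RUNNING) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and abort the wait.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for job", e);
            }
            state = job.poll();
        }
        return state;
    }

    /**
     * Launches the templates one at a time, in list order. Stops at the first
     * job that does not end in DONE, so later templates are never launched
     * after a failure. Returns true only if every job ended in DONE.
     */
    public static boolean runInOrder(TemplateLauncher launcher,
                                     List<String> templates,
                                     long pollIntervalMillis) {
        for (String template : templates) {
            JobState finalState =
                waitUntilTerminal(launcher.launch(template), pollIntervalMillis);
            if (finalState != JobState.DONE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // In-memory fake launcher for demonstration only:
        // every job reports RUNNING twice, then DONE.
        List<String> launched = new ArrayList<>();
        TemplateLauncher fake = name -> {
            launched.add(name);
            int[] polls = {0};
            return () -> polls[0]++ < 2 ? JobState.RUNNING : JobState.DONE;
        };
        boolean ok = runInOrder(fake, Arrays.asList("P1", "P2"), 10L);
        System.out.println("all succeeded: " + ok + ", launch order: " + launched);
    }
}
```

The point of the design is that the ordering lives outside both pipelines: each template still contains exactly one pipeline, and the driver (or a workflow tool playing the same role) decides when P2 may start.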
Source: https://stackoverflow.com/questions/46603536/unable-to-run-multiple-pipelines-in-desired-order-by-creating-template-in-apache