Question
I am trying to pass the BigTable tableId, instanceId, and projectId, which are defined as ValueProvider in the TemplateOptions class, at execution time, since they are runtime values, but the new values are not honored. The pipeline executes with the old values that were defined when the pipeline was constructed. What changes should I make so that it honors the values at runtime?
Pipeline p = Pipeline.create(options);

com.google.cloud.bigtable.config.BigtableOptions.Builder optionsBuilder =
    new com.google.cloud.bigtable.config.BigtableOptions.Builder()
        .setProjectId("my-project");

PCollection<com.google.bigtable.v2.Row> row = p.apply("filtered read",
    org.apache.beam.sdk.io.gcp.bigtable.BigtableIO.read()
        .withBigtableOptions(optionsBuilder)
        .withoutValidation()
        .withInstanceId(options.getInstanceId())
        .withProjectId(options.getProjectId())
        .withTableId(options.getTableId()));

PCollection<KV<Integer, String>> convertToKV = row.apply(ParDo.of(new ConvertToKV()));
My options interface looks like this:
@Default.String("my-project")
@Description("The Google Cloud project ID for the Cloud Bigtable instance.")
ValueProvider<String> getProjectId();
void setProjectId(ValueProvider<String> projectId);
@Default.String("my-instance")
@Description("The Google Cloud Bigtable instance ID .")
ValueProvider<String> getInstanceId();
void setInstanceId(ValueProvider<String> instanceId);
@Default.String("my-test")
@Description("The Cloud Bigtable table ID in the instance." )
ValueProvider<String> getTableId();
void setTableId(ValueProvider<String> tableId);
@Description("bucket name")
@Default.String("mybucket")
ValueProvider<String> getBucketName();
void setBucketName(ValueProvider<String> bucketName);
Any help would be really appreciated.
Answer 1:
I do believe that validating runtime parameters at construction time is an issue. What I don't understand, however, is why the runtime parameters passed when executing the pipeline from the template are not being honored.
How do you pass your runtime parameters? It should be something like this:
public interface WordCountOptions extends PipelineOptions {
  @Description("Path of the file to read from")
  @Default.String("gs://dataflow-samples/shakespeare/kinglear.txt")
  ValueProvider<String> getInputFile();
  void setInputFile(ValueProvider<String> value);
}

public static void main(String[] args) {
  WordCountOptions options =
      PipelineOptionsFactory.fromArgs(args).withValidation()
          .as(WordCountOptions.class);
  Pipeline p = Pipeline.create(options);
  // ... apply transforms and run the pipeline
}
See "create template" for details: https://cloud.google.com/dataflow/docs/templates/creating-templates
Once the template is constructed, you can execute the pipeline with runtime parameters. For example:
gcloud beta dataflow jobs run test-run1 \
--gcs-location gs://my_template/templates/DemoTemplate \
--parameters inputFile=/path/to/my-file
See "Execute templates" for details: https://cloud.google.com/dataflow/docs/templates/executing-templates
Note: If you don't pass runtime parameters when executing your pipeline, the parameters will either have default values or null.
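To make that concrete, here is a minimal sketch (the DoFn and its names are illustrative, not from the original answer) of how a ValueProvider parameter is resolved only while the job runs; get() reads the runtime value, and isAccessible() reports whether a value is available yet:

import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical DoFn: the provider, not its value, is captured when the template is built.
class LogTableIdFn extends DoFn<String, String> {
  private final ValueProvider<String> tableId;

  LogTableIdFn(ValueProvider<String> tableId) {
    this.tableId = tableId;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // isAccessible() is false at template-creation time and true once the job is running;
    // get() returns the value passed via --parameters, or the @Default if none was passed.
    String resolved = tableId.isAccessible() ? tableId.get() : "<no value yet>";
    c.output(c.element() + " -> table " + resolved);
  }
}

Because only the provider is serialized into the template, whatever you pass via --parameters at execution time is what get() returns inside the running job.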
Hope this helps!
Answer 2:
I believe that the --inputFile values are bundled in with the template when the template is created.
Please see this note [1]: "In addition to the template file, templated pipeline execution also relies on files that were staged and referenced at the time of template creation. If the staged files are moved or removed, your pipeline execution will fail."
This thread seems relevant as well [2].
Answer 3:
We were also facing the same exception. To fix it, we added dummy default values for the ValueProvider configs and did not pass those values at compile time, passing them only at run time, and it worked fine.
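An illustrative sketch of that workaround (the default string here is a placeholder, not from the answer): give each ValueProvider a dummy default so template creation succeeds, and supply the real value only via --parameters when running the template:

// inside the options interface:
// dummy default used only so the template can be created; the real value is
// supplied at run time, e.g. --parameters tableId=my-real-table
@Description("The Cloud Bigtable table ID.")
@Default.String("placeholder-table")
ValueProvider<String> getTableId();
void setTableId(ValueProvider<String> tableId);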
Source: https://stackoverflow.com/questions/49595921/valueprovider-type-parameters-not-getting-honored-at-the-template-execution-time