Question
Take the official document 'Creating Templates' as an example: https://cloud.google.com/dataflow/docs/templates/creating-templates
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.options.pipeline_options import PipelineOptions

class WordcountOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Use add_value_provider_argument for arguments to be templatable
        # Use add_argument as usual for non-templatable arguments
        parser.add_value_provider_argument(
            '--input',
            default='gs://dataflow-samples/shakespeare/kinglear.txt',
            help='Path of the file to read from')
        parser.add_argument(
            '--output',
            required=True,
            help='Output file to write results to.')

pipeline_options = PipelineOptions(['--output', 'some/output_path'])
p = beam.Pipeline(options=pipeline_options)
wordcount_options = pipeline_options.view_as(WordcountOptions)
lines = p | 'read' >> ReadFromText(wordcount_options.input)
wordcount_options.input is a RuntimeValueProvider. I want to use the value specified at runtime (when executing the template), so I need to use wordcount_options.input.value. However, it has no 'value' attribute when the template is created; it only has 'default_value'. I tried specifying a value when creating the template (so that I could use it both then and later), but no matter what value I specify at runtime, the pipeline only uses the value I specified when creating the template.
(Basically, my input is a pickle file, so I cannot use wordcount_options.input directly.)
Answer 1:
Just below the linked example is a section titled 'Using ValueProvider in your functions'. The documentation shows calling the .get() method on the ValueProvider parameter to retrieve the runtime value.
Note that the value cannot be used during pipeline construction, since it has not yet been injected from the template. You should only call ValueProvider.get() inside runtime methods such as DoFn.process().
Source: https://stackoverflow.com/questions/47417881/how-to-use-add-value-provider-argument-to-initialise-runtime-parameter