Passing Additional Parameters from Dataflow into a Beam Pipeline


Question


I'm working on Dataflow and have already built my custom pipeline with the Python SDK. I would like to pass parameters from the Dataflow UI into my custom pipeline using Additional Parameters, as described at https://cloud.google.com/dataflow/docs/guides/templates/creating-templates#staticvalue

Following the Google docs, I changed add_argument to add_value_provider_argument:

class CustomParams(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            "--input_topic",
            type=str,
        )
        parser.add_value_provider_argument(
            "--window_size",
            type=int,
            default=5,
        )

def run():
    pipeline_options = PipelineOptions(pipeline_args, .....)
    custom_param = pipeline_options.view_as(CustomParams)
    .....
    pipeline | "Read PubSub Message" >> beam.io.ReadFromPubSub(custom_param.input_topic)

After that, I tried to create a template on GCP. The upload script looks like:

  python custom_pipeline.py \
    --runner DataflowRunner \
    --project YOUR_PROJECT_ID \
    --staging_location gs://YOUR_BUCKET_NAME/staging \
    --temp_location gs://YOUR_BUCKET_NAME/temp \
    --template_location gs://YOUR_BUCKET_NAME/templates/YOUR_TEMPLATE_NAME

But I got this error when creating the template for upload to GCS:

TypeError: expected string or bytes-like object

at the line calling beam.io.ReadFromPubSub().

It looks like what I get back from add_value_provider_argument is a RuntimeValueProvider object, so I'm quite confused about what I have to do to fix this.
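For reference, a ValueProvider is designed to be dereferenced with .get() only while the job is running, typically inside a DoFn; a minimal sketch of that pattern (the AddWindowSize class is invented here purely for illustration):

import apache_beam as beam

class AddWindowSize(beam.DoFn):
    def __init__(self, window_size):
        # window_size is stored as a ValueProvider, not a plain int.
        self.window_size = window_size

    def process(self, element):
        # .get() only succeeds once the templated job is actually running.
        yield (element, self.window_size.get())

It would be wired up as messages | beam.ParDo(AddWindowSize(custom_param.window_size)): the provider object is stored as-is and only resolved inside process().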

I have tried a few things to fix this.

Casting the data type:

beam.io.ReadFromPubSub(str(custom_param.input_topic))

But I got this error:

ValueError: PubSub topic must be in the form "projects/<project>/topics/<topic>" (got "RuntimeValueProvider(option: input_topic, type: str, default_value: '...')").

So does anyone have a fix for this? I have no idea how to go on.


Answer 1:


As mentioned by @mk_sta

It seems that ReadFromPubSub module doesn't accept ValueProvider. Have you checked this Stack thread?

and as explained in that thread, ReadFromPubSub does not currently accept ValueProvider arguments in the Python SDK, because it is implemented as a native transform in Dataflow. That also explains the str() attempt above: at template-construction time the provider holds no value yet, so formatting it only yields its repr, never the topic string.
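For contrast, Python sources that do support runtime parameters accept the ValueProvider object directly and resolve it themselves; ReadFromText is one such source. A minimal sketch (the --input_file option is hypothetical, just to show the shape):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class FileParams(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Hypothetical option, for illustration only.
        parser.add_value_provider_argument("--input_file", type=str)

def build(pipeline, params):
    # ReadFromText accepts the ValueProvider itself; no .get(), no str().
    return pipeline | "Read Files" >> beam.io.ReadFromText(params.input_file)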

You can check the I/O methods that accept runtime parameters table for ValueProvider support across the different SDKs.

So at this moment, if you switch from the Python SDK to the Java SDK, PubsubIO.Read does support ValueProvider.
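If you stay on the Python SDK, one common workaround (a sketch, not part of the original answer) is to keep --input_topic as a plain add_argument, so its value is baked into the template when the template is created, and reserve add_value_provider_argument for options that are only read with .get() inside DoFns:

from apache_beam.options.pipeline_options import PipelineOptions

class CustomParams(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Plain argument: resolved when the template is built, so
        # ReadFromPubSub receives an ordinary string.
        parser.add_argument("--input_topic", type=str)
        # ValueProvider argument: still fine for values that are only
        # dereferenced with .get() inside a DoFn at run time.
        parser.add_value_provider_argument("--window_size", type=int, default=5)

The trade-off is that the topic is then frozen into the template: changing it means regenerating the template, whereas window_size can still vary per launch.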



Source: https://stackoverflow.com/questions/62295344/the-additional-paramates-at-dataflow-into-beam-pipeline
