Exception Handling in Apache Beam pipelines using Python

半阙折子戏 2021-01-19 10:01

I'm building a simple pipeline with Apache Beam in Python (on GCP Dataflow) that reads from Pub/Sub and writes to BigQuery, but I can't handle exceptions in the pipeline to create alte

2 answers
  •  清酒与你
    2021-01-19 10:48

    I've only been able to catch exceptions at the DoFn level, so something like this:

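    import apache_beam as beam
    from apache_beam import pvalue

    class MyPipelineStep(beam.DoFn):

        def process(self, element, *args, **kwargs):
            try:
                # do stuff... (placeholder: pass the element through unchanged)
                output_element = element
                yield pvalue.TaggedOutput('main_output', output_element)
            except Exception as e:
                # Route the failure to a separate tagged output instead of failing the step
                yield pvalue.TaggedOutput('exception', str(e))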
    

    However, WriteToBigQuery is a PTransform that wraps the DoFn BigQueryWriteFn.

    So you may need to do something like this:

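    from apache_beam.io.gcp.bigquery import BigQueryWriteFn, WriteToBigQuery

    class MyBigQueryWriteFn(BigQueryWriteFn):

        def process(self, *args, **kwargs):
            try:
                # super() must name the subclass here; super(BigQueryWriteFn, self)
                # would skip BigQueryWriteFn.process entirely
                return super(MyBigQueryWriteFn, self).process(*args, **kwargs)
            except Exception as e:
                # Do something here; `raise` is only a placeholder so the block parses
                raise

    class MyWriteToBigQuery(WriteToBigQuery):
        # Copy the source code of `WriteToBigQuery` here,
        # but replace `BigQueryWriteFn` with `MyBigQueryWriteFn`
        pass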
    

    https://beam.apache.org/releases/pydoc/2.9.0/_modules/apache_beam/io/gcp/bigquery.html#WriteToBigQuery
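
    One caveat about the `try: return super(...).process(...)` wrapper above: if the parent's `process` happens to be written as a generator, calling it only creates the generator, and any exception is raised later, while the runner iterates it, outside the `try` block. Whether that applies to `BigQueryWriteFn` depends on the Beam version and isn't confirmed here. The plain-Python sketch below uses a hypothetical `ParentFn` (not a Beam class) to show the difference between `return` and `yield from`:

```python
class ParentFn:
    """Hypothetical stand-in for a DoFn whose process() is a generator."""

    def process(self, element):
        if element < 0:
            raise ValueError(f"negative element: {element}")
        yield element * 2


class ReturnWrapper(ParentFn):
    def process(self, element):
        try:
            # Only creates the generator; the parent's body has not run yet,
            # so nothing can be raised (or caught) at this point.
            return super().process(element)
        except ValueError as e:
            return [("exception", str(e))]


class YieldFromWrapper(ParentFn):
    def process(self, element):
        try:
            # Iterates the parent's generator inside the try block,
            # so errors raised during iteration are caught here.
            yield from super().process(element)
        except ValueError as e:
            yield ("exception", str(e))


def consume(fn, element):
    """Drain process() as a caller would, reporting errors that escape it."""
    try:
        return list(fn.process(element))
    except ValueError as e:
        return f"uncaught: {e}"


print(consume(ReturnWrapper(), -1))     # the error escapes the wrapper
print(consume(YieldFromWrapper(), -1))  # the error is turned into an output
```

    If the parent's `process` is an ordinary function that does its work eagerly, the `return` form catches the error as intended; the `yield from` form is simply the safer choice when you can't be sure.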
