Exception Handling in Apache Beam pipelines using Python

半阙折子戏 2021-01-19 10:01

I'm building a simple pipeline with Apache Beam in Python (on GCP Dataflow) that reads from PubSub and writes to BigQuery, but I can't handle exceptions in the pipeline to create alte

2 answers
  • 2021-01-19 10:47

    You can also use the generator flavor of FlatMap.

    This is similar to the other answer in that a DoFn-like function takes the place of something else, e.g. a CombineFn, so that it produces no output when there is an exception or some other failed precondition:

    import logging
    from typing import Generator, List

    import apache_beam as beam


    def sum_values(values: List[int]) -> Generator[int, None, None]:
        # Emit nothing when the precondition fails, instead of raising.
        if not values or len(values) < 10:
            logging.error(f'received invalid inputs: {...}')
            return
        yield sum(values)


    # Use this in place of CombinePerKey; `inputs` is an existing PCollection.
    (inputs
      | 'WithKey' >> beam.Map(lambda x: (x.key, x))
      | 'GroupByKey' >> beam.GroupByKey()
      | 'Values' >> beam.Values()
      | 'MaybeSum' >> beam.FlatMap(sum_values))
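The same generator pattern also covers the "exception" case mentioned above, not only a failed precondition. A minimal sketch, assuming the messages are UTF-8 JSON payloads; `parse_json` and `parse_messages` are hypothetical names that do not appear in the original answer:

```python
import json
import logging
from typing import Any, Iterator


def parse_json(raw: bytes) -> Iterator[Any]:
    """Yields the decoded message, or nothing at all if decoding fails."""
    try:
        yield json.loads(raw.decode('utf-8'))
    except ValueError as exc:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        logging.error('dropping undecodable message %r: %s', raw, exc)


def parse_messages(messages):
    """Hypothetical wiring: `messages` is a PCollection of bytes payloads."""
    # Imported here so that parse_json can be unit-tested without Beam.
    import apache_beam as beam
    return messages | 'ParseJson' >> beam.FlatMap(parse_json)
```

Because FlatMap flattens whatever the generator yields, a failed element simply contributes zero outputs and the rest of the bundle carries on. The trade-off is that the bad input is only visible in the logs, not in the pipeline.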
    
  • 2021-01-19 10:48

    I've only been able to catch exceptions at the DoFn level, so something like this:

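    import apache_beam as beam
    from apache_beam import pvalue


    class MyPipelineStep(beam.DoFn):

        def process(self, element, *args, **kwargs):
            try:
                # do stuff...
                output_element = element  # placeholder for the real result
                yield pvalue.TaggedOutput('main_output', output_element)
            except Exception as e:
                # Route the failure to a separate output instead of crashing.
                yield pvalue.TaggedOutput('exception', str(e))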
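The DoFn above does not show how the tagged outputs are read back in the pipeline. A minimal sketch of one way to wire it with `with_outputs`, assuming JSON payloads; `error_record`, `ParseJson`, and `split_parsed_and_failed` are hypothetical names, not part of the original answer. In this variant the successful elements are yielded untagged, and `main='main_output'` only names the main output on the result object:

```python
import json
from typing import Any, Dict


def error_record(element: Any, exc: BaseException) -> Dict[str, str]:
    """Builds a dead-letter row that keeps the failing input next to the error."""
    return {
        'element': repr(element),
        'error_type': type(exc).__name__,
        'error': str(exc),
    }


def split_parsed_and_failed(messages):
    """Hypothetical wiring: returns (parsed, failed) PCollections."""
    # Imported here so that error_record can be unit-tested without Beam.
    import apache_beam as beam
    from apache_beam import pvalue

    class ParseJson(beam.DoFn):
        def process(self, element):
            try:
                yield json.loads(element)  # untagged, goes to the main output
            except Exception as exc:
                yield pvalue.TaggedOutput('exception', error_record(element, exc))

    results = messages | 'ParseJson' >> beam.ParDo(ParseJson()).with_outputs(
        'exception', main='main_output')
    return results.main_output, results.exception
```

The `failed` collection can then be written to a separate sink, which is probably closer to what the question asks for than logging alone. Keeping the original element in the record, rather than only `str(e)`, makes the failures replayable.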
    

    However, WriteToBigQuery is a PTransform that wraps the DoFn BigQueryWriteFn.

    So you may need to do something like this:

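    from apache_beam.io.gcp.bigquery import BigQueryWriteFn, WriteToBigQuery


    class MyBigQueryWriteFn(BigQueryWriteFn):

        def process(self, *args, **kwargs):
            try:
                # Name the subclass in super() so that
                # BigQueryWriteFn.process itself runs.
                return super(MyBigQueryWriteFn, self).process(*args, **kwargs)
            except Exception as e:
                # Do something here
                raise  # placeholder so the block stays valid Python


    class MyWriteToBigQuery(WriteToBigQuery):
        # Copy the source code of `WriteToBigQuery` here,
        # but replace `BigQueryWriteFn` with `MyBigQueryWriteFn`
        pass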
    

    https://beam.apache.org/releases/pydoc/2.9.0/_modules/apache_beam/io/gcp/bigquery.html#WriteToBigQuery
