I'm doing a simple pipeline using Apache Beam in Python (on GCP Dataflow) to read from PubSub and write to BigQuery, but I can't handle exceptions in the pipeline to create alternative flows.
I've only been able to catch exceptions at the `DoFn` level, with something like this:
```python
import apache_beam as beam
from apache_beam import pvalue


class MyPipelineStep(beam.DoFn):
    def process(self, element, *args, **kwargs):
        try:
            # do stuff...
            yield pvalue.TaggedOutput('main_output', output_element)
        except Exception as e:
            # Route the failure to a side output instead of failing the bundle
            yield pvalue.TaggedOutput('exception', str(e))
```
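The tagged outputs can then be split into separate PCollections with `with_outputs`. A minimal sketch, assuming an input PCollection named `events` (the names here are illustrative, not from the original pipeline):

```python
# Split the DoFn's tagged outputs into separate PCollections.
# `events` and the downstream handling are illustrative assumptions.
results = (
    events
    | 'ProcessElements' >> beam.ParDo(MyPipelineStep()).with_outputs(
        'exception', main='main_output'))

main_pcoll = results.main_output   # successfully processed elements
errors_pcoll = results.exception   # stringified exceptions, e.g. for alerting
```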
However, `WriteToBigQuery` is a `PTransform` that wraps the `DoFn` `BigQueryWriteFn`, so you may need to do something like this:
```python
from apache_beam.io.gcp.bigquery import BigQueryWriteFn, WriteToBigQuery


class MyBigQueryWriteFn(BigQueryWriteFn):
    def process(self, *args, **kwargs):
        try:
            # Delegate to the stock BigQueryWriteFn implementation
            return super(MyBigQueryWriteFn, self).process(*args, **kwargs)
        except Exception as e:
            # Do something here
            pass


class MyWriteToBigQuery(WriteToBigQuery):
    # Copy the source code of `WriteToBigQuery` here,
    # but replace `BigQueryWriteFn` with `MyBigQueryWriteFn`
    pass
https://beam.apache.org/releases/pydoc/2.9.0/_modules/apache_beam/io/gcp/bigquery.html#WriteToBigQuery
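Once the subclass exists, it can be dropped into the pipeline in place of the stock transform. A minimal usage sketch, assuming placeholder table, dataset, project, and schema values (none of these come from the original question):

```python
# Illustrative wiring only: the table/dataset/project/schema values
# below are placeholder assumptions.
(main_pcoll
 | 'WriteToBQ' >> MyWriteToBigQuery(
     table='my_table',
     dataset='my_dataset',
     project='my-gcp-project',
     schema='user:STRING,ts:TIMESTAMP',
     create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
     write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```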