Writing failed row inserts in a streaming job to BigQuery using the Apache Beam Java SDK?

Submitted by 白昼怎懂夜的黑 on 2021-01-29 02:40:03

Question


When running a streaming job, it is always good to keep a record of the rows that failed to insert into BigQuery. Catching those failures and writing them to another BigQuery table gives you an idea of what went wrong.

The steps below show one way to achieve this.


Answer 1:


Pre-requisites:

  • apache-beam >= 2.10.0 (or a later release)

Using the getFailedInsertsWithErr() function available in the SDK, you can catch the failed inserts and push them to another table for root-cause analysis (RCA). This is an important feature for debugging streaming pipelines, which run indefinitely.

BigQueryInsertError is the error type that BigQuery returns for each failed TableRow. It carries the following (see the accessor sketch after this list):

  • The failed row.
  • The error message payload and stack trace.
  • The table reference object.
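
For illustration, here is a minimal sketch of reading those fields off a BigQueryInsertError. The accessors (getRow(), getError(), getTable()) come from the Beam Java SDK; the InsertErrorUtil class and its describe() helper are hypothetical:

    import com.google.api.services.bigquery.model.TableDataInsertAllResponse;
    import com.google.api.services.bigquery.model.TableReference;
    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;

    public class InsertErrorUtil {
        // Hypothetical helper: flatten one failed insert into a loggable string.
        public static String describe(BigQueryInsertError err) {
            TableRow failedRow = err.getRow();          // the rejected row
            TableDataInsertAllResponse.InsertErrors details =
                err.getError();                         // error message payload from BigQuery
            TableReference table = err.getTable();      // destination table reference
            return table.getTableId() + ": " + details + " for row " + failedRow;
        }
    }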

These fields can be captured and pushed into another BigQuery table. An example schema for the error records:

    "fields": [{
            "name": "timestamp",
            "type": "TIMESTAMP",
            "mode": "REQUIRED"
        },
        {
            "name": "payloadString",
            "type": "STRING",
            "mode": "REQUIRED"
        },
        {
            "name": "errorMessage",
            "type": "STRING",
            "mode": "NULLABLE"
        },
        {
            "name": "stacktrace",
            "type": "STRING",
            "mode": "NULLABLE"
        }
    ]
}
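
Putting it together, below is a minimal sketch of the wiring, assuming rows is the PCollection<TableRow> your streaming job already inserts and that both tables exist (the error table with the schema above); the project, dataset, and table names are placeholders. Note that withExtendedErrorInfo() must be set on the write, or getFailedInsertsWithErr() will not be available:

    import com.google.api.services.bigquery.model.TableRow;
    import java.time.Instant;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
    import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
    import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TypeDescriptor;

    public class FailedInsertsExample {
        // rows: the PCollection<TableRow> your streaming pipeline already produces.
        public static void writeWithErrorCapture(PCollection<TableRow> rows) {
            WriteResult writeResult = rows.apply("WriteToBigQuery",
                BigQueryIO.writeTableRows()
                    .to("my-project:my_dataset.main_table")        // placeholder
                    .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                    .withWriteDisposition(WriteDisposition.WRITE_APPEND)
                    // Required, otherwise getFailedInsertsWithErr() is unavailable:
                    .withExtendedErrorInfo()
                    .withFailedInsertRetryPolicy(
                        InsertRetryPolicy.retryTransientErrors()));

            writeResult.getFailedInsertsWithErr()
                .apply("WrapInsertionErrors",
                    MapElements.into(TypeDescriptor.of(TableRow.class))
                        .via((BigQueryInsertError e) -> new TableRow()
                            .set("timestamp", Instant.now().toString())
                            .set("payloadString", e.getRow().toString())
                            .set("errorMessage", e.getError().toString())
                            .set("stacktrace", "")))   // schema field; left empty here
                .apply("WriteErrorsToBigQuery",
                    BigQueryIO.writeTableRows()
                        .to("my-project:my_dataset.error_records") // placeholder
                        .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                        .withWriteDisposition(WriteDisposition.WRITE_APPEND));
        }
    }

With retryTransientErrors(), transient failures are retried while permanent failures flow straight into the error table; bear in mind that the error-table write is itself a streaming insert and can fail for the same reasons.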




Source: https://stackoverflow.com/questions/57247531/writing-failed-row-inserts-in-a-streaming-job-to-bigquery-using-apache-beam-java
