Question
When running a streaming job, it is good practice to keep a log of the rows that failed to insert into BigQuery. Catching those rows and writing them to a separate BigQuery table gives you a record of what went wrong.
Below are the steps you can follow to achieve this.
Answer 1:
Pre-requisites:
- apache-beam 2.10.0 or later
Using the getFailedInsertsWithErr() method on the WriteResult returned by the BigQueryIO write (available when the write is configured with withExtendedErrorInfo()), you can catch the failed inserts and push them to another table for root cause analysis (RCA). This is an important feature for debugging streaming pipelines, which run indefinitely.
BigQueryInsertError is the error object emitted for each TableRow that BigQuery failed to insert. It contains the following:
- The row that failed (getRow()).
- The error details returned by BigQuery, including the error message payload (getError()).
- The table reference object of the destination table (getTable()).
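As a rough sketch of how these pieces fit together, the snippet below writes rows with streaming inserts, then routes the failures to a second table. The class name, method names, and table arguments are hypothetical, the retry policy is just one reasonable choice, and the error-row field names assume an error table shaped like the example schema in this answer. Treat it as a starting point rather than a definitive implementation.

```java
import com.google.api.services.bigquery.model.TableRow;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptor;

final class FailedInsertsSketch {

  // Hypothetical helper: flattens one failed insert into an error-table row.
  // failure.getTable() is also available if the destination table should be recorded.
  static TableRow toErrorRow(BigQueryInsertError failure) {
    return new TableRow()
        // Millisecond precision keeps the string within BigQuery's TIMESTAMP range of precision.
        .set("timestamp", Instant.now().truncatedTo(ChronoUnit.MILLIS).toString())
        .set("payloadString", String.valueOf(failure.getRow()))
        .set("errorMessage", String.valueOf(failure.getError()));
  }

  // Both table arguments are assumed to be specs such as "project:dataset.table"
  // that point at tables which already exist.
  static void writeWithErrorTable(
      PCollection<TableRow> rows, String mainTable, String errorTable) {
    WriteResult result =
        rows.apply(
            "WriteMain",
            BigQueryIO.writeTableRows()
                .to(mainTable)
                .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                // Retry only transient errors so permanently bad rows surface as failures.
                .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
                // Required, otherwise getFailedInsertsWithErr() cannot be used.
                .withExtendedErrorInfo()
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    result
        .getFailedInsertsWithErr()
        .apply(
            "ToErrorRow",
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((BigQueryInsertError failure) -> toErrorRow(failure)))
        .apply(
            "WriteErrors",
            BigQueryIO.writeTableRows()
                .to(errorTable)
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(WriteDisposition.WRITE_APPEND));
  }
}
```

Calling writeWithErrorTable(rows, "my-project:my_dataset.events", "my-project:my_dataset.events_errors") from the pipeline's main method would wire both writes; those table names are placeholders.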
The above parameters can be captured and pushed into another BigQuery table. An example schema for the error records:
{
  "fields": [
    {
      "name": "timestamp",
      "type": "TIMESTAMP",
      "mode": "REQUIRED"
    },
    {
      "name": "payloadString",
      "type": "STRING",
      "mode": "REQUIRED"
    },
    {
      "name": "errorMessage",
      "type": "STRING",
      "mode": "NULLABLE"
    },
    {
      "name": "stacktrace",
      "type": "STRING",
      "mode": "NULLABLE"
    }
  ]
}
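To make the REQUIRED and NULLABLE modes concrete, here is a small sketch that shapes one error record to match the schema above. It uses only java.util and java.time so it can be tried without Beam. The class and method names are hypothetical, and formatting the timestamp as an ISO-8601 string truncated to milliseconds is an assumption about how you want to send it.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class ErrorRecordSketch {

  // Builds one record keyed by the field names of the example error schema.
  static Map<String, Object> toErrorRecord(
      Instant failedAt, String payload, String errorMessage, String stacktrace) {
    // REQUIRED fields: fail fast instead of letting the error-table insert fail too.
    Objects.requireNonNull(failedAt, "timestamp is REQUIRED");
    Objects.requireNonNull(payload, "payloadString is REQUIRED");

    Map<String, Object> record = new LinkedHashMap<>();
    record.put("timestamp", failedAt.truncatedTo(ChronoUnit.MILLIS).toString());
    record.put("payloadString", payload);

    // NULLABLE fields: leave them out when there is nothing to store.
    if (errorMessage != null) {
      record.put("errorMessage", errorMessage);
    }
    if (stacktrace != null) {
      record.put("stacktrace", stacktrace);
    }
    return record;
  }
}
```

In a pipeline, the payload and error strings would typically be derived from BigQueryInsertError.getRow() and getError(), and the map entries could be copied into a TableRow with set(name, value) before writing to the error table.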
Source: https://stackoverflow.com/questions/57247531/writing-failed-row-inserts-in-a-streaming-job-to-bigquery-using-apache-beam-java