BigQuery: Some rows belong to different partitions rather than destination partition

Posted by 牧云@^-^@ on 2020-01-25 09:20:28

Question


I am running an Airflow DAG that moves data from GCS to BigQuery using the GoogleCloudStorageToBigQueryOperator. I am on Airflow version 1.10.2.

This task moves data from MySQL to BigQuery (a partitioned table). Until now the table was partitioned by ingestion time, and the incremental load covering the past three days worked fine when the data was loaded by the Airflow DAG.

We have now changed the partitioning type to "Date or timestamp" on a DATE column of the table, and since then we have been getting the error below. Because the incremental load contains data for the last three days from the MySQL table, I expected the BigQuery job to either append the new records or recreate the partition with 'WRITE_TRUNCATE'. I have tested both, and both fail with the following error message.

Exception: BigQuery job failed. Final error was: {'reason': 'invalid', 'message': 'Some rows belong to different partitions rather than destination partition 20191202'}.

I can't post the code, since all modules are called based on JSON parameters, but here is what I am passing to the operator for this table, along with the other regular parameters:

create_disposition='CREATE_IF_NEEDED',
time_partitioning={'field': 'entry_time', 'type': 'DAY'},
write_disposition='WRITE_APPEND',  # Also tried 'WRITE_TRUNCATE'
schema_update_options=('ALLOW_FIELD_ADDITION',
                       'ALLOW_FIELD_RELAXATION'),

I believe these are the parameters that might be causing the issue. Any help on this is appreciated.


Answer 1:


When using BigQuery tables partitioned by date or timestamp, you should specify the partition to load the data into, e.g.:

table_name$20160501

Also, the column value must match the partition. For example, if you create this table:

$ bq query --use_legacy_sql=false "CREATE TABLE tmp_elliottb.PartitionedTable (x INT64, y NUMERIC, date DATE) PARTITION BY date"

The column date is the partitioning column. If you then try to load the following row:

$ echo "1,3.14,2018-11-07" > row.csv
$ bq load "tmp_elliottb.PartitionedTable\$20181105" ./row.csv

You will get the following error, because you are loading data dated 2018-11-07 while targeting partition 20181105:

Some rows belong to different partitions rather than destination partition 20181105
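The rule behind this error can be checked locally before submitting a load. The following is only a minimal sketch: the helper name rows_outside_partition is hypothetical, and it assumes the partitioning column holds YYYY-MM-DD strings at a known position in each row, as in row.csv above. It does not call BigQuery; it just reports which rows would not fit the chosen partition.

```python
from datetime import datetime


def rows_outside_partition(rows, partition_id, date_index=2):
    """Return the rows whose date column does not fall in partition_id (YYYYMMDD).

    Hypothetical helper: assumes each row is a list of strings and that
    the partitioning column holds a YYYY-MM-DD date at position date_index.
    """
    mismatched = []
    for row in rows:
        # Convert the column value (YYYY-MM-DD) to the decorator format (YYYYMMDD).
        row_partition = datetime.strptime(row[date_index], "%Y-%m-%d").strftime("%Y%m%d")
        if row_partition != partition_id:
            mismatched.append(row)
    return mismatched


# The row from row.csv above, checked against two candidate partitions.
rows = [["1", "3.14", "2018-11-07"]]
print(rows_outside_partition(rows, "20181105"))  # the row is reported as a mismatch
print(rows_outside_partition(rows, "20181107"))  # no mismatches
```

An empty result means every row belongs to the partition named in the decorator.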

I suggest using the following destination_project_dataset_table value and verifying that the data matches the partition date.

destination_project_dataset_table='dataset.table$YYYYMMDD',
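Since the incremental load in the question spans three days, one possible way to apply this suggestion is to split the rows by day and derive one decorated destination per day. This is a rough sketch under assumptions: group_by_partition is a hypothetical helper, 'dataset.table' is the placeholder name used above, and each row is assumed to be a dict whose entry_time value is a Python date. How each group is then passed to the operator is not shown.

```python
from collections import defaultdict
from datetime import date


def group_by_partition(rows, table, date_key):
    """Group rows under their decorated destination, e.g. 'dataset.table$20191202'.

    Hypothetical helper: assumes row[date_key] is a datetime.date (or datetime).
    """
    groups = defaultdict(list)
    for row in rows:
        # Build the partition decorator (YYYYMMDD) from the row's own date.
        decorated = "{}${}".format(table, row[date_key].strftime("%Y%m%d"))
        groups[decorated].append(row)
    return dict(groups)


# Illustrative data only: three rows spread over two days.
rows = [
    {"id": 1, "entry_time": date(2019, 12, 2)},
    {"id": 2, "entry_time": date(2019, 12, 3)},
    {"id": 3, "entry_time": date(2019, 12, 2)},
]
for destination, group in sorted(group_by_partition(rows, "dataset.table", "entry_time").items()):
    print(destination, [r["id"] for r in group])
```

Each key can serve as a destination_project_dataset_table value for a load that contains only the rows in its group.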


Source: https://stackoverflow.com/questions/59182899/biqquery-some-rows-belong-to-different-partitions-rather-than-destination-parti
