Question
I am running an Airflow DAG (Airflow 1.10.2) that moves data from GCS to BigQuery using the GoogleCloudStorageToBigQueryOperator.
The task loads data from MySQL into a partitioned BigQuery table. Until now the table was partitioned by ingestion time, and the incremental load of the past three days worked fine when the data was loaded through the Airflow DAG.
We then changed the partitioning to column-based (date or timestamp) on a DATE column of the table, and since then we have been getting the error below. Because the incremental load pulls the last three days of data from the MySQL table, I expected the BigQuery job either to append the new records, or to recreate the partition with 'WRITE_TRUNCATE' (which I have tested earlier); both fail with the error message below.
Exception: BigQuery job failed. Final error was: {'reason': 'invalid', 'message': 'Some rows belong to different partitions rather than destination partition 20191202'}.
I won't be able to post the code, since all modules are called based on JSON parameters, but here is what I am passing to the operator for this table, along with the other regular parameters:
create_disposition='CREATE_IF_NEEDED',
time_partitioning={'field': 'entry_time', 'type': 'DAY'},
write_disposition='WRITE_APPEND',  # also tried 'WRITE_TRUNCATE'
schema_update_options=('ALLOW_FIELD_ADDITION', 'ALLOW_FIELD_RELAXATION')
I believe these are the fields that might be causing the issue; any help on this is appreciated.
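For context, here is a minimal sketch of how these parameters might be assembled for the operator. The bucket, object path, and table names are hypothetical placeholders, not taken from the actual DAG:

```python
# Hypothetical kwargs for GoogleCloudStorageToBigQueryOperator
# (Airflow 1.10.2). Bucket, object, and table names are placeholders;
# the dispositions and partitioning settings are from the question.
gcs_to_bq_kwargs = {
    "bucket": "my-staging-bucket",
    "source_objects": ["exports/my_table/{{ ds_nodash }}/*.json"],
    "destination_project_dataset_table": "my_dataset.my_table",
    "create_disposition": "CREATE_IF_NEEDED",
    "write_disposition": "WRITE_APPEND",  # 'WRITE_TRUNCATE' was also tried
    "schema_update_options": ["ALLOW_FIELD_ADDITION",
                              "ALLOW_FIELD_RELAXATION"],
    # Column-based partitioning on the DATE column 'entry_time'
    "time_partitioning": {"field": "entry_time", "type": "DAY"},
}
print(sorted(gcs_to_bq_kwargs))
```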
Answer 1:
When loading into a BigQuery table partitioned by date or timestamp, you should specify the target partition with a decorator, e.g.:
table_name$20160501
Also, the column values must match that partition. For example, if you create this table:
$ bq query --use_legacy_sql=false "CREATE TABLE tmp_elliottb.PartitionedTable (x INT64, y NUMERIC, date DATE) PARTITION BY date"
The date column is the partitioning column, and if you try to load the following row:
$ echo "1,3.14,2018-11-07" > row.csv
$ bq load "tmp_elliottb.PartitionedTable\$20181105" ./row.csv
you will get the following error, because you are loading data dated 2018-11-07 into the partition 20181105:
Some rows belong to different partitions rather than destination partition 20181105
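The mismatch check BigQuery applies can be illustrated with a small sketch: a row's DATE value, formatted as YYYYMMDD, must equal the partition decorator of the load destination (function names here are illustrative, not a BigQuery API):

```python
from datetime import date

def partition_suffix(d):
    # BigQuery day-partition decorators use the YYYYMMDD format
    return d.strftime("%Y%m%d")

def rows_match_partition(row_dates, suffix):
    # True only if every row's DATE value falls in the target partition
    return all(partition_suffix(d) == suffix for d in row_dates)

# The row in the example is dated 2018-11-07, but the load targets
# partition 20181105, so BigQuery rejects the job:
print(rows_match_partition([date(2018, 11, 7)], "20181105"))  # False
print(rows_match_partition([date(2018, 11, 7)], "20181107"))  # True
```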
I suggest using a destination_project_dataset_table value of the following form, and verifying that the data matches the partition date:
destination_project_dataset_table='dataset.table$YYYYMMDD',
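In an Airflow DAG, the YYYYMMDD suffix is typically derived from the execution date via Jinja templating, since destination_project_dataset_table is a templated field. A sketch of what the rendered value would look like (project/dataset/table names are hypothetical):

```python
# destination_project_dataset_table is templated in Airflow, so the
# partition decorator can follow the run's execution date:
# {{ ds_nodash }} renders as YYYYMMDD. Names below are placeholders.
destination = "my-project.my_dataset.my_table${{ ds_nodash }}"

# Simulating what Airflow's template engine produces for an
# execution date of 2019-12-02 (the partition from the error message):
rendered = destination.replace("{{ ds_nodash }}", "20191202")
print(rendered)  # my-project.my_dataset.my_table$20191202
```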
Source: https://stackoverflow.com/questions/59182899/biqquery-some-rows-belong-to-different-partitions-rather-than-destination-parti