Question
I have a pipeline that uploads Avro files to BigQuery. The configured schema seems to be OK, but BigQuery reads the field as an integer value rather than a date field. What can I do in this case?
Avro schema - date field:
{
  "name": "date",
  "type": {
    "type": "long",
    "logicalType": "timestamp-millis"
  },
  "doc": "the date when the transaction happened"
}
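For reference, a field declared this way is physically a long holding epoch milliseconds; the logicalType is only an annotation on top of it. Below is a minimal sketch of writing such a record (fastavro and the record name Transaction are assumptions for illustration, not something stated in the question):

import datetime
from fastavro import parse_schema, writer

# Illustration only: the field is serialized as a long (epoch milliseconds),
# with "timestamp-millis" recorded as a logicalType annotation.
schema = parse_schema({
    "type": "record",
    "name": "Transaction",  # hypothetical record name
    "fields": [
        {
            "name": "date",
            "type": {"type": "long", "logicalType": "timestamp-millis"},
            "doc": "the date when the transaction happened",
        }
    ],
})

records = [{"date": datetime.datetime(2019, 11, 28, 12, 0, tzinfo=datetime.timezone.utc)}]

with open("transactions.avro", "wb") as out:
    writer(out, schema, records)  # written as a long with the timestamp-millis annotation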
BigQuery table:
I tried using the code below, but it simply ignores the setting. Do you know the reason?
import gcloud
from gcloud import storage
from google.cloud import bigquery

def insert_bigquery_avro(target_uri, dataset_id, table_id):
    bigquery_client = bigquery.Client()
    dataset_ref = bigquery_client.dataset(dataset_id)
    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True
    job_config.source_format = bigquery.SourceFormat.AVRO
    job_config.use_avro_logical_types = True
    time_partitioning = bigquery.table.TimePartitioning()
    # time_partitioning = bigquery.table.TimePartitioning(type_=bigquery.TimePartitioningType.DAY, field="date")
    job_config.time_partitioning = time_partitioning
    uri = target_uri
    load_job = bigquery_client.load_table_from_uri(
        uri,
        dataset_ref.table(table_id),
        job_config=job_config
    )
    print('Starting job {}'.format(load_job.job_id))
    load_job.result()
    print('Job finished.')
Answer 1:
This is intended: by default, BigQuery ignores the logicalType attribute and uses the underlying Avro type instead. The Avro timestamp-millis logical type, for instance, is loaded as INTEGER in BigQuery.
To enable the conversion, set the --use_avro_logical_types flag to true with the command-line tool, or set the useAvroLogicalTypes property in the job resource when you call the jobs.insert method to create a load job. After that, your date field will be loaded as TIMESTAMP in BigQuery.
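In the Python client library you are already using, that property corresponds to LoadJobConfig.use_avro_logical_types. A minimal sketch of a load job with it enabled (the bucket, dataset and table names are placeholders, not taken from your question):

from google.cloud import bigquery

# Minimal load-job sketch: use_avro_logical_types makes BigQuery honor
# Avro logicalType annotations, so timestamp-millis becomes TIMESTAMP
# instead of INTEGER. The URI and table names below are placeholders.
client = bigquery.Client()

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.AVRO
job_config.use_avro_logical_types = True

load_job = client.load_table_from_uri(
    "gs://your-bucket/path/transactions.avro",            # placeholder URI
    client.dataset("your_dataset").table("your_table"),   # placeholder names
    job_config=job_config,
)
load_job.result()  # wait for the load to complete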
Take a look at the Avro logical types section of the BigQuery documentation to see all the Avro logical types that are ignored by default and what they are converted to once that flag is set. This will also help you decide on the best Avro logical type for each of your fields.
Hope this is helpful.
Source: https://stackoverflow.com/questions/59090735/schema-avro-is-in-timestamp-but-in-bigquery-comes-as-integer