Schema avro is in timestamp but in bigquery comes as integer

Submitted by 匆匆过客 on 2020-01-25 03:10:08

Question


I have a pipeline that loads Avro files into BigQuery. The configured schema looks correct, but BigQuery treats the field as an integer value rather than a date field. What can I do in this case?

Avro schema, date field:

{
  "name": "date",
  "type": {
    "type": "long",
    "logicalType": "timestamp-millis"
  },
  "doc": "the date where the transaction happend"
}

BigQuery table:

I tried using the code below, but BigQuery simply ignores it. Do you know why?

from google.cloud import bigquery

def insert_bigquery_avro(target_uri, dataset_id, table_id):
    bigquery_client = bigquery.Client()
    dataset_ref = bigquery_client.dataset(dataset_id)

    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True
    job_config.source_format = bigquery.SourceFormat.AVRO
    # Ask BigQuery to honor Avro logical types such as timestamp-millis.
    job_config.use_avro_logical_types = True
    # Default time partitioning; the commented-out line partitions on the "date" field instead.
    time_partitioning = bigquery.table.TimePartitioning()
    # time_partitioning = bigquery.table.TimePartitioning(type_=bigquery.TimePartitioningType.DAY, field="date")
    job_config.time_partitioning = time_partitioning

    load_job = bigquery_client.load_table_from_uri(
        target_uri,
        dataset_ref.table(table_id),
        job_config=job_config,
    )
    print('Starting job {}'.format(load_job.job_id))
    load_job.result()  # Block until the load job completes.
    print('Job finished.')

Answer 1:


This is expected: by default, BigQuery ignores the logicalType attribute and uses the underlying Avro type instead. The Avro timestamp-millis logical type, for instance, is loaded as INTEGER in BigQuery.
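
For context, a timestamp-millis value is stored as a long counting milliseconds since the Unix epoch (UTC), which is why it surfaces as a plain integer when the logical type is ignored. Here is a minimal sketch of what that integer represents; the helper name `millis_to_datetime` and the sample value are illustrative only and do not come from the question's data.

```python
from datetime import datetime, timedelta, timezone

# Avro timestamp-millis counts milliseconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def millis_to_datetime(millis):
    """Interpret an Avro timestamp-millis long as a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)

# Illustrative value only, not taken from the asker's files.
print(millis_to_datetime(1574899200000).isoformat())
```

The same conversion could be done in a query if the column has already been loaded as INTEGER, but fixing the load job is usually the cleaner route.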

To enable the conversion, set the --use_avro_logical_types flag to True when using the command-line tool, or set the useAvroLogicalTypes property in the job resource when you call the jobs.insert method to create a load job. After this, your date field will be loaded as a TIMESTAMP in BigQuery.
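
As a rough sketch of the jobs.insert route, the load configuration might look like the dictionary below. The function name `build_avro_load_job` and the project, dataset, table, and bucket values are hypothetical placeholders; only the shape of the request body is shown, not the authenticated API call itself.

```python
import json

def build_avro_load_job(project_id, dataset_id, table_id, source_uri):
    """Build a jobs.insert request body for an Avro load (sketch only)."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [source_uri],
                "sourceFormat": "AVRO",
                # Without this property, timestamp-millis is loaded as INTEGER.
                "useAvroLogicalTypes": True,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }

# Hypothetical identifiers, for illustration only.
job_body = build_avro_load_job(
    "my-project", "my_dataset", "transactions", "gs://my-bucket/transactions.avro"
)
print(json.dumps(job_body, indent=2))
```

In the Python client library, the equivalent setting is `job_config.use_avro_logical_types = True`, as already used in the question's code.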

Take a look at the Avro logical types section of the BigQuery documentation to see all the ignored Avro logical types and how they are converted once that flag is set. This will also help you decide on the best Avro logical type for your fields.
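
To illustrate the kind of mapping that documentation describes, here is a small lookup covering only the date and time logical types. The table is written from memory of the documentation and is deliberately incomplete, so treat it as an assumption to verify against the current docs; the function name `expected_bigquery_type` is hypothetical.

```python
# Partial mapping (assumption: verify against the BigQuery Avro docs).
AVRO_LOGICAL_TYPE_TO_BIGQUERY = {
    "date": "DATE",
    "time-millis": "TIME",
    "time-micros": "TIME",
    "timestamp-millis": "TIMESTAMP",
    "timestamp-micros": "TIMESTAMP",
}

def expected_bigquery_type(logical_type, use_avro_logical_types):
    """Return the column type expected for one of the listed logical types."""
    if logical_type not in AVRO_LOGICAL_TYPE_TO_BIGQUERY:
        raise KeyError("logical type not covered by this sketch: " + logical_type)
    if not use_avro_logical_types:
        # The underlying int/long is used when logical types are ignored.
        return "INTEGER"
    return AVRO_LOGICAL_TYPE_TO_BIGQUERY[logical_type]

print(expected_bigquery_type("timestamp-millis", use_avro_logical_types=False))
print(expected_bigquery_type("timestamp-millis", use_avro_logical_types=True))
```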

Hope this is helpful.



Source: https://stackoverflow.com/questions/59090735/schema-avro-is-in-timestamp-but-in-bigquery-comes-as-integer
