How to query for data in streaming buffer ONLY in BigQuery?

灰色年华 2021-01-05 22:11

We have a table partitioned by day in BigQuery, which is updated by streaming inserts.

The doc says: "when streaming to a partitioned table, data in the strea

2 Answers
  • 2021-01-05 23:01

    When you stream data into BigQuery, there is usually a "warm-up" period: the time it takes for the streamed rows to become available for operations such as querying, copying, and exporting.

    The doc states at the end that, after a period of up to 90 minutes, the pseudo-column _PARTITIONTIME receives a non-null value, which means the streamed data is fully ready for any operation you want to run on it (querying usually becomes possible within a few seconds).

    So instead of querying the partitioned table for rows where this field is null, you filter on the partition itself, like so:

    SELECT
      fields
    FROM
      `dataset.partitioned_table_name`
    WHERE
      _PARTITIONTIME = TIMESTAMP('2017-01-20') 
    

    This example queries only the data in the January 20, 2017 partition, which avoids a full table scan.

    You can also select a range of dates by changing the WHERE clause to:

    WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-20') AND TIMESTAMP('2017-01-22') 
    

    This queries three daily partitions (January 20, 21, and 22), because BETWEEN includes both endpoints.
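
    If the end date should be excluded, a half-open range is one option. The query below is a minimal sketch that reuses the placeholder table name from above and assumes an ingestion-time partitioned table; it counts rows per partition so you can check which partitions the filter actually matched.

    ```sql
    -- `dataset.partitioned_table_name` is the placeholder name used above.
    -- Half-open range: matches the Jan 20 and Jan 21 partitions, not Jan 22.
    SELECT
      _PARTITIONTIME AS pt,
      COUNT(*) AS row_count
    FROM
      `dataset.partitioned_table_name`
    WHERE
      _PARTITIONTIME >= TIMESTAMP('2017-01-20')
      AND _PARTITIONTIME < TIMESTAMP('2017-01-22')
    GROUP BY
      pt
    ORDER BY
      pt
    ```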

  • 2021-01-05 23:11

    Data in the streaming buffer has a NULL value in the _PARTITIONTIME pseudo-column, so you can filter on that:

    SELECT
      fields
    FROM
      `dataset.partitioned_table_name`
    WHERE
      _PARTITIONTIME IS NULL
    

    https://cloud.google.com/bigquery/docs/partitioned-tables#copying_to_partitioned_tables
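
    To see how many rows are still in the streaming buffer compared with rows already assigned to a partition, something like the following may help. It is only a sketch: it assumes an ingestion-time partitioned table, the alias `row_location` and its two labels are made up for illustration, and because it has no partition filter it scans the whole table.

    ```sql
    -- Rows with a NULL _PARTITIONTIME are treated as still being in the
    -- streaming buffer; everything else already belongs to a partition.
    SELECT
      IF(_PARTITIONTIME IS NULL, 'streaming_buffer', 'partitioned') AS row_location,
      COUNT(*) AS row_count
    FROM
      `dataset.partitioned_table_name`
    GROUP BY
      row_location
    ```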
