Scheduled queries and clustering

喜欢而已 提交于 2020-05-28 06:19:05

问题


It does not seem to be possible to schedule Queries in BigQuery that write to time partitioned and clustered target tables (using WRITE_TRUNCATE and a partition decorator): we are getting the error message:

Invalid value: Incompatible table partitioning specification. Expects partitioning specification interval(type:day) clustering(siteId,channelId), but input partitioning specification is interval(type:day)

I don't understand why this is happening, isn't the clustering specification just a part of the table definition? We also don't need to specify anything extra when performing dml inserts data in an already clustered table. Or is this related to us not using DML in the scheduled query?

EDIT: The scheduled query structure looks like this:

SELECT
  reportDate,
  channelId,
  siteId,
  pageType,
  [MORE_COLUMNS_SELECT),
  SUM(timeOnPage) AS timeOnPage_agg,
  ARRAY_AGG(STRUCT( sessionId,
      [MORE_COLUMNS_NESTED)
 ) AS Details
  ----
FROM `project.dataset.viewname` 
WHERE    reportDate >= TIMESTAMP_TRUNC(TIMESTAMP_ADD(@run_time, INTERVAL -1 DAY), DAY)
     AND reportDate < TIMESTAMP_TRUNC(@run_time, DAY)
GROUP BY
    reportDate,
  channelId,
  siteId,
  pageType,
  [MORE_COLUMNS_SELECT)

I am writing the results of this query to the target table like this: TARGET_TABLE_NAME${run_time-24h|"%Y%m%d"}

That table is time partitioned on _PARTITIONTIME (= Reportdate) and clustered on siteId, channelId


回答1:


As of 23-10-2018, it seems that the BigQuery scheduled query functionality does NOT support the WRITE_TRUNCATE loading pattern in combination with clustering.

What DOES work however, is writing to a clustered target table using a DML statement.



来源:https://stackoverflow.com/questions/52890124/scheduled-queries-and-clustering

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!