Dynamic table name when writing to BQ from dataflow pipelines

北海茫月 2020-12-10 23:28

As a followup question to the following question and answer:

https://stackoverflow.com/questions/31156774/about-key-grouping-with-groupbykey

I'd like to con

1 answer
  • 2020-12-10 23:41

    The BigQueryIO.Write transform does not support this. The closest thing you can do is to use per-window tables, and encode whatever information you need to select the table in the window objects by using a custom WindowFn.

    If you don't want to do that, you can make BigQuery API calls directly from your DoFn. That lets you set the table name to whatever your code computes: it could be looked up from a side input, or derived directly from the element the DoFn is currently processing. To avoid making too many small calls to BigQuery, buffer the rows as they arrive and send them in batches, flushing whatever is left in finishBundle().
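The buffering pattern described above could look roughly like the sketch below. It is plain Java with no Dataflow or BigQuery classes, so it only illustrates the control flow: `PerTableBatcher`, `tableNameFn`, `insertBatch` and `maxBatchSize` are all hypothetical names, and `insertBatch` stands in for whatever BigQuery insert call your DoFn would actually make.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Plain-Java stand-in for a DoFn that buffers rows per destination table.
// All names are hypothetical; nothing here is part of the Dataflow SDK.
class PerTableBatcher<T> {
    // Computes the destination table name from the element itself.
    private final Function<T, String> tableNameFn;
    // Stand-in for the real BigQuery insert call (assumption).
    private final BiConsumer<String, List<T>> insertBatch;
    private final int maxBatchSize;
    // Rows waiting to be sent, grouped by table name.
    private final Map<String, List<T>> buffers = new LinkedHashMap<>();

    PerTableBatcher(Function<T, String> tableNameFn,
                    BiConsumer<String, List<T>> insertBatch,
                    int maxBatchSize) {
        this.tableNameFn = tableNameFn;
        this.insertBatch = insertBatch;
        this.maxBatchSize = maxBatchSize;
    }

    // Analogue of processElement: buffer the row under its computed table,
    // and send early once that table's buffer reaches maxBatchSize.
    void processElement(T element) {
        String table = tableNameFn.apply(element);
        List<T> rows = buffers.computeIfAbsent(table, k -> new ArrayList<>());
        rows.add(element);
        if (rows.size() >= maxBatchSize) {
            flush(table);
        }
    }

    // Analogue of finishBundle: send everything that is still buffered.
    void finishBundle() {
        // Iterate over a copy of the keys because flush() mutates the map.
        for (String table : new ArrayList<>(buffers.keySet())) {
            flush(table);
        }
    }

    private void flush(String table) {
        List<T> rows = buffers.remove(table);
        if (rows != null && !rows.isEmpty()) {
            insertBatch.accept(table, rows);
        }
    }
}
```

In a real pipeline the two methods would map onto the DoFn's per-element and end-of-bundle hooks, and `insertBatch` would wrap a streaming insert such as the BigQuery API's `tabledata.insertAll`; error handling and retries are left out of this sketch.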

    You can see how the Dataflow runner does the streaming import here: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/util/BigQueryTableInserter.java
