How to get a list of elements out of a PCollection in Google Dataflow and use it in the pipeline to loop Write Transforms?

Asked 2020-12-21 16:01 by 悲&欢浪女

I am using Google Cloud Dataflow with the Python SDK.

I would like to:

  • Get a list of unique dates out of a master PCollection
  • Loop through those dates and use each one in a Write transform
1 Answer
  • Answered 2020-12-21 16:18

    It is not possible to get the contents of a PCollection directly - an Apache Beam or Dataflow pipeline is more like a query plan of what processing should be done, with PCollection being a logical intermediate node in the plan, rather than containing the data. The main program assembles the plan (pipeline) and kicks it off.
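    The same idea can be sketched in the Python SDK: rather than pulling the dates out into the main program, key each record by its date so that the "loop" over unique dates becomes a grouping step inside the pipeline. A minimal sketch using the DirectRunner; the record shape and values here are made up for illustration, and `assert_that` is Beam's testing utility for checking a PCollection's contents without materializing it:

    ```python
    import apache_beam as beam
    from apache_beam.testing.util import assert_that, equal_to


    def key_by_date(record):
        """Key each record by its date so grouping happens inside the pipeline."""
        return (record["date"], record["value"])


    records = [
        {"date": "2020-12-01", "value": 1},
        {"date": "2020-12-02", "value": 2},
        {"date": "2020-12-01", "value": 3},
    ]

    with beam.Pipeline() as p:
        per_date = (
            p
            | "Create" >> beam.Create(records)
            | "KeyByDate" >> beam.Map(key_by_date)
            # One output group per unique date -- the "loop" over dates is
            # expressed as a grouping step, not as main-program code.
            | "GroupByDate" >> beam.GroupByKey()
            | "SortValues" >> beam.MapTuple(lambda date, vals: (date, sorted(vals)))
        )
        # Check pipeline contents in-pipeline instead of extracting them.
        assert_that(per_date, equal_to([("2020-12-01", [1, 3]),
                                        ("2020-12-02", [2])]))
    ```

    A real pipeline would replace the final step with an I/O transform; the point is that per-date logic lives downstream of the grouping, inside the plan.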

    However, ultimately you're trying to write data to BigQuery tables sharded by date. This use case is currently supported only in the Java SDK and only for streaming pipelines.
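    If a per-date BigQuery destination is the goal, note that more recent Python SDK releases let `beam.io.WriteToBigQuery` take a callable for its `table` argument, evaluated per element. A hedged sketch assuming such a recent SDK; the project and dataset names and the record shape are placeholders, and the pipeline itself is commented out because running it requires real GCP credentials and tables:

    ```python
    import apache_beam as beam


    def table_for(row):
        """Route each row to a date-sharded table, e.g. events_20201221.
        The project and dataset names here are placeholders."""
        return "my-project:my_dataset.events_" + row["date"].replace("-", "")


    # Sketch only -- needs GCP credentials and a writable dataset to run.
    # with beam.Pipeline(options=pipeline_options) as p:
    #     (p
    #      | beam.Create([{"date": "2020-12-21", "value": 1}])
    #      | beam.io.WriteToBigQuery(
    #          table=table_for,  # callable: one destination table per element
    #          schema="date:STRING,value:INTEGER",
    #          write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    #          create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
    ```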

    For a more general treatment of writing data to multiple destinations depending on the data, follow BEAM-92.

    See also Creating/Writing to Partitioned BigQuery table via Google Cloud Dataflow
