How to get a list of elements out of a PCollection in Google Dataflow and use it in the pipeline to loop Write Transforms?

Asked 2020-12-21 16:01 by 悲&欢浪女

I am using Google Cloud Dataflow with the Python SDK.

I would like to:

  • Get a list of unique dates out of a master PCollection
  • Loop through those dates and use each one in a Write transform
1 Answer
  • Answered 2020-12-21 16:18

    It is not possible to get the contents of a PCollection directly - an Apache Beam or Dataflow pipeline is more like a query plan of what processing should be done, with PCollection being a logical intermediate node in the plan, rather than containing the data. The main program assembles the plan (pipeline) and kicks it off.
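    The same idea can be sketched in the Python SDK: rather than pulling the dates out into the main program, key each record by its date so that the "loop" over unique dates becomes a grouping step inside the pipeline. A minimal sketch using the DirectRunner; the record shape and values here are made up for illustration, and `assert_that` is Beam's testing utility for checking a PCollection's contents without materializing it:

    ```python
    import apache_beam as beam
    from apache_beam.testing.util import assert_that, equal_to


    def key_by_date(record):
        """Key each record by its date so grouping happens inside the pipeline."""
        return (record["date"], record["value"])


    records = [
        {"date": "2020-12-01", "value": 1},
        {"date": "2020-12-02", "value": 2},
        {"date": "2020-12-01", "value": 3},
    ]

    with beam.Pipeline() as p:
        per_date = (
            p
            | "Create" >> beam.Create(records)
            | "KeyByDate" >> beam.Map(key_by_date)
            # One output group per unique date -- the "loop" over dates is
            # expressed as a grouping step, not as main-program code.
            | "GroupByDate" >> beam.GroupByKey()
            | "SortValues" >> beam.MapTuple(lambda date, vals: (date, sorted(vals)))
        )
        # Check pipeline contents in-pipeline instead of extracting them.
        assert_that(per_date, equal_to([("2020-12-01", [1, 3]),
                                        ("2020-12-02", [2])]))
    ```

    A real pipeline would replace the final step with an I/O transform; the point is that per-date logic lives downstream of the grouping, inside the plan.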

    However, ultimately you're trying to write data to BigQuery tables sharded by date. This use case is currently supported only in the Java SDK and only for streaming pipelines.
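    If a per-date BigQuery destination is the goal, note that more recent Python SDK releases let `beam.io.WriteToBigQuery` take a callable for its `table` argument, evaluated per element. A hedged sketch assuming such a recent SDK; the project and dataset names and the record shape are placeholders, and the pipeline itself is commented out because running it requires real GCP credentials and tables:

    ```python
    import apache_beam as beam


    def table_for(row):
        """Route each row to a date-sharded table, e.g. events_20201221.
        The project and dataset names here are placeholders."""
        return "my-project:my_dataset.events_" + row["date"].replace("-", "")


    # Sketch only -- needs GCP credentials and a writable dataset to run.
    # with beam.Pipeline(options=pipeline_options) as p:
    #     (p
    #      | beam.Create([{"date": "2020-12-21", "value": 1}])
    #      | beam.io.WriteToBigQuery(
    #          table=table_for,  # callable: one destination table per element
    #          schema="date:STRING,value:INTEGER",
    #          write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    #          create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
    ```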

    For a more general treatment of writing data to multiple destinations depending on the data, follow BEAM-92.

    See also Creating/Writing to Partitioned BigQuery table via Google Cloud Dataflow
