Google Cloud Dataflow: Randomize WriteToBigQuery

独厮守ぢ 2021-01-15 23:31

I have successfully implemented a Dataflow pipeline that writes to BigQuery. This pipeline transforms data for a Cloud ML Engine job. However, I noticed that the rows th

1 Answer
  • 2021-01-15 23:39

    BigQuery tables have no concept of order or grouping; a table is just a bag of rows. If you need ordering or grouping, write a query with an ORDER BY or GROUP BY clause. If your code reads rows from BigQuery and needs them in random order, randomize at read time instead of at write time, for example with an approach like the one described in https://www.oreilly.com/learning/repeatable-sampling-of-data-sets-in-bigquery-for-machine-learning
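As a rough illustration of randomizing at read time, the sketch below builds BigQuery Standard SQL query strings in Python. It assumes each row has a unique key column; the table name `my-project.my_dataset.training_data` and the column `row_id` are hypothetical placeholders, and the helper names are illustrative only. Ordering by `FARM_FINGERPRINT` of the key gives a shuffled order that is the same on every run, while `RAND()` gives a different order each time. The sampling helper uses the same hash to pick a stable subset of rows.

```python
# Minimal sketch: build BigQuery Standard SQL that reads rows in a shuffled order.
# Assumption: `key_column` is unique per row (rows sharing a key would share a hash
# and therefore always end up adjacent in the "shuffled" order).


def shuffled_rows_query(table: str, key_column: str, repeatable: bool = True) -> str:
    """Return a query that reads every row of `table` in pseudo-random order."""
    if repeatable:
        # Hashing the key yields the same order on every run.
        order_expr = f"FARM_FINGERPRINT(CAST({key_column} AS STRING))"
    else:
        # RAND() yields a different order on every run.
        order_expr = "RAND()"
    return f"SELECT * FROM `{table}` ORDER BY {order_expr}"


def sampled_rows_query(table: str, key_column: str, percent: int) -> str:
    """Return a query that keeps a stable ~`percent`% subset of rows."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # MOD before ABS keeps the value small, so ABS cannot overflow on INT64's minimum.
    bucket = f"ABS(MOD(FARM_FINGERPRINT(CAST({key_column} AS STRING)), 100))"
    return f"SELECT * FROM `{table}` WHERE {bucket} < {percent}"


if __name__ == "__main__":
    table = "my-project.my_dataset.training_data"  # hypothetical table name
    print(shuffled_rows_query(table, "row_id"))
    print(sampled_rows_query(table, "row_id", 80))
```

The resulting string could then be passed to a reader such as `google.cloud.bigquery.Client().query(...)` or Beam's `ReadFromBigQuery(query=..., use_standard_sql=True)`. Sorting a very large table this way may be costly, so sampling first can help.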
