Google Cloud Dataflow Randomize WriteToBigQuery

前端未结


I have successfully implemented a Dataflow pipeline that writes to BigQuery. This pipeline is transforming data for a Cloud ML Engine job. However, I noticed that the rows th


1 answer

忘掉有多难

2021-01-15 23:39

BigQuery tables have no concept of row order or grouping; a table is just a bag of rows. If you need ordering or grouping, write a query with an ORDER BY or GROUP BY clause. If your code reads rows from BigQuery and needs them in random order, you can do something like the approach described at https://www.oreilly.com/learning/repeatable-sampling-of-data-sets-in-bigquery-for-machine-learning
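
To make that concrete, here is a minimal sketch in Python that only builds the query strings; it does not call BigQuery. The function names, the table `my_project.my_dataset.training_rows`, and the column `row_id` are hypothetical and not from the question. The first query uses ORDER BY RAND() to return the rows in a different random order on each run. The second follows the hashing idea for repeatable sampling (FARM_FINGERPRINT on a key column, then MOD to pick buckets), so the same rows are selected every time.

```python
def random_order_query(table: str) -> str:
    """Build a query that returns every row of `table` in random order.

    The order changes on every run because RAND() is re-evaluated.
    """
    return f"SELECT * FROM `{table}` ORDER BY RAND()"


def repeatable_sample_query(
    table: str, key_column: str, buckets: int = 10, keep: int = 8
) -> str:
    """Build a query that keeps roughly keep/buckets of the rows.

    Rows are assigned to buckets by hashing `key_column`, so the same
    rows are selected on every run (a repeatable split).
    """
    if not 0 < keep <= buckets:
        raise ValueError("keep must be between 1 and buckets")
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE MOD(ABS(FARM_FINGERPRINT(CAST({key_column} AS STRING))), "
        f"{buckets}) < {keep}"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    table = "my_project.my_dataset.training_rows"
    print(random_order_query(table))
    print(repeatable_sample_query(table, "row_id"))
```

Either string can be passed to whatever reads from BigQuery in your training code. Note that ORDER BY RAND() sorts the whole table, so it may be slow or costly on large tables; the hashed sample avoids the sort but does not shuffle the rows it returns.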
