Question
Is there any way to extract the first n elements of a Beam PCollection? The documentation doesn't seem to mention any such function. I think such an operation would first require assigning a global element number and then applying a filter. It would be nice to have this functionality.
I use the Google Dataflow Java SDK 2.2.0.
Answer 1:
PCollections are inherently unordered, so the notion of "first N elements" does not exist. However:
In case you need the top N elements by some criterion, you can use the Top transform.
In case you need any N elements, you can use Sample.
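The two options above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the class name FirstNSketch, the sample integers, and the step names "Largest3" and "Any3" are made up for the example, and it assumes the Beam Java SDK plus a runner (for instance the direct runner) are on the classpath.

```java
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Sample;
import org.apache.beam.sdk.transforms.Top;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class; not part of the original question or answer.
public class FirstNSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Hypothetical input; any PCollection of Comparable elements would do here.
    PCollection<Integer> numbers = pipeline.apply(Create.of(5, 1, 9, 3, 7));

    // Top N by a criterion: Top.largest(n) uses the natural ordering and yields
    // a PCollection with a single element, a List holding the n largest values.
    PCollection<List<Integer>> largestThree =
        numbers.apply("Largest3", Top.<Integer>largest(3));

    // Any N elements: Sample.any(n) yields a PCollection of up to n elements,
    // with no guarantee about which ones are chosen.
    PCollection<Integer> anyThree = numbers.apply("Any3", Sample.<Integer>any(3));

    // Requires a runner on the classpath (assumed here, e.g. the direct runner).
    pipeline.run().waitUntilFinish();
  }
}
```

For an ordering other than the natural one, Top.of(n, comparator) accepts a serializable comparator. Note that Top returns its result as a single List element, whereas Sample.any returns the selected elements as an ordinary PCollection.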
Source: https://stackoverflow.com/questions/48267159/beam-dataflow-2-2-0-extract-first-n-elements-from-pcollection