We want to build a Cloud Dataflow Streaming pipeline which ingests events from Pubsub and performs multiple ETL-like operations on each individual event. One of these operat
Here are a few things you can do:
- Do the work in a `DoFn` (you may find the `@Setup` and `@Teardown` methods useful).
- `GroupByKey` by the device id. Most of the time, at least with the Cloud Dataflow runner, the same key will then be processed by the same worker. Key assignments can change while the pipeline runs, but usually not very often. You'll probably want to set a windowing/triggering strategy with immediate triggering, though.
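
Here is a rough sketch of how the two suggestions might fit together with the Beam Java SDK. It is only an illustration under assumptions: the class name `PerDeviceSketch`, the `ProcessDeviceFn` DoFn, the `lookup` helper and the in-memory map are all hypothetical placeholders for whatever per-device operation and resource you actually need. The input is assumed to be already keyed by device id. The trigger shown, firing after every element in the global window, is one possible reading of "immediate triggering".

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class PerDeviceSketch {

  /** Hypothetical DoFn that handles the events grouped under one device id. */
  static class ProcessDeviceFn extends DoFn<KV<String, Iterable<String>>, String> {
    // Per-instance resource (a stand-in for a client, connection, or lookup table).
    // Marked transient because DoFn instances are serialized and shipped to workers.
    private transient Map<String, String> resource;

    @Setup
    public void setup() {
      // Called once per DoFn instance before it processes any elements.
      resource = new HashMap<>();
    }

    // Hypothetical per-device operation; the ":resolved" suffix is placeholder logic.
    String lookup(String deviceId) {
      return resource.computeIfAbsent(deviceId, id -> id + ":resolved");
    }

    @ProcessElement
    public void processElement(
        @Element KV<String, Iterable<String>> element, OutputReceiver<String> out) {
      String resolved = lookup(element.getKey());
      for (String event : element.getValue()) {
        out.output(resolved + "|" + event);
      }
    }

    @Teardown
    public void teardown() {
      // Called when the instance is discarded; release the resource here.
      resource = null;
    }
  }

  /** Groups events by device id with a fire-on-every-element trigger, then processes them. */
  static PCollection<String> apply(PCollection<KV<String, String>> eventsByDevice) {
    return eventsByDevice
        .apply("ImmediateTrigger",
            Window.<KV<String, String>>into(new GlobalWindows())
                .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
                .withAllowedLateness(Duration.ZERO)
                .discardingFiredPanes())
        .apply("GroupByDevice", GroupByKey.<String, String>create())
        .apply("ProcessPerDevice", ParDo.of(new ProcessDeviceFn()));
  }
}
```

Because the same key only usually lands on the same worker, anything kept in the `DoFn` instance should be treated as a best-effort, rebuildable resource rather than as authoritative per-device state.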