We are attempting to use fixed windows on an Apache Beam pipeline (using DirectRunner). Our flow is as follows:
It looks like the main issue was indeed a missing trigger: the window was opening, but nothing told it when to emit results. We simply wanted to window on processing time (not event time), so we did the following:
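.apply("Window", Window
    .into(new GlobalWindows())
    // Fire 5 seconds (processing time) after the first element of each pane, then re-arm.
    .triggering(Repeatedly
        .forever(AfterProcessingTime
            .pastFirstElementInPane()
            .plusDelayOf(Duration.standardSeconds(5))
        )
    )
    .withAllowedLateness(Duration.ZERO)
    // Each pane contains only the elements that arrived since the previous firing.
    .discardingFiredPanes()
)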
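Essentially this puts every element into the single global window and attaches a trigger that emits a pane 5 seconds (of processing time) after the first element of that pane is processed. The global window itself never closes; because of Repeatedly.forever the trigger re-arms after each firing, so the next pane starts as soon as another element arrives, and discardingFiredPanes means each pane holds only the elements received since the previous firing. Beam complained when we didn't have the withAllowedLateness piece. As far as I know, Duration.ZERO just tells it to drop any late data.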
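To make the pane behaviour concrete, here is a rough plain-Java simulation of how I understand the trigger to batch elements. It does not use Beam at all: the class name PaneSimulation, the method panes, and the rule that an element arriving at or after the firing time starts a new pane are my own assumptions for illustration, not Beam's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics "fire delayMillis after the first
// element in the pane, then discard and re-arm" on a list of arrival times.
public class PaneSimulation {

    // arrivalsMillis must be sorted ascending (processing-time timestamps).
    static List<List<Long>> panes(long[] arrivalsMillis, long delayMillis) {
        List<List<Long>> fired = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long fireAt = 0;
        for (long t : arrivalsMillis) {
            // Assumption: an element arriving at or after the firing time
            // finds the previous pane already emitted and discarded.
            if (!current.isEmpty() && t >= fireAt) {
                fired.add(current);
                current = new ArrayList<>();
            }
            // The first element of a pane sets the firing deadline.
            if (current.isEmpty()) {
                fireAt = t + delayMillis;
            }
            current.add(t);
        }
        // Emit the last open pane so the simulation shows every element.
        if (!current.isEmpty()) {
            fired.add(current);
        }
        return fired;
    }

    public static void main(String[] args) {
        long[] arrivals = {0, 1000, 4000, 7000, 8000, 20000};
        System.out.println(panes(arrivals, 5000));
    }
}
```

With a 5000 ms delay, the arrivals above fall into three panes: the elements at 0, 1000 and 4000 ms together, then 7000 and 8000 ms, then 20000 ms on its own. Note that nothing is emitted while no elements arrive, which matches what we saw in the pipeline.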
My understanding may be a bit off the mark here, but the above snippet has solved our problem!