Question
I'd like to create a custom WindowFn that assigns windows to my input elements based on another field of the element rather than on the element's timestamp. I know the predefined WindowFns in the Google Dataflow SDK use the timestamp as the criterion for window assignment.
More specifically, I'd like to create something like SlidingWindows, but with another field, rather than the timestamp, as the window assignment criterion.
How can I create my customised WindowFn? What should I consider when creating my own WindowFn?
Thanks.
Answer 1:
To create a new WindowFn, you just need to inherit from WindowFn or a subclass and override the various abstract methods.
In your case, you don't need window merging, so you can inherit from NonMergingWindowFn, and your code could look something like this:
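// Package names as in the Dataflow SDK 1.x; ElementT stands for your element type.
import java.util.Collection;
import com.google.cloud.dataflow.sdk.coders.Coder;
import com.google.cloud.dataflow.sdk.transforms.windowing.BoundedWindow;
import com.google.cloud.dataflow.sdk.transforms.windowing.IntervalWindow;
import com.google.cloud.dataflow.sdk.transforms.windowing.NonMergingWindowFn;
import com.google.cloud.dataflow.sdk.transforms.windowing.WindowFn;

public class MyWindowFn extends NonMergingWindowFn<ElementT, IntervalWindow> {
    @Override
    public Collection<IntervalWindow> assignWindows(AssignContext c) {
        // setOfWindowsElementShouldBeIn is a placeholder for your own logic,
        // e.g. deriving the windows from a field of the element.
        return setOfWindowsElementShouldBeIn(c.element());
    }
    @Override
    public boolean isCompatible(WindowFn<?, ?> other) {
        return other instanceof MyWindowFn;
    }
    @Override
    public Coder<IntervalWindow> windowCoder() {
        return IntervalWindow.getCoder();
    }
    @Override
    public IntervalWindow getSideInputWindow(final BoundedWindow window) {
        // You may not need this if you won't ever be using PCollections windowed
        // with this as side inputs. If that's the case, just throw.
        // Otherwise you'll need to figure out how to map the main input windows
        // into the windows generated by this WindowFn.
        throw new UnsupportedOperationException("MyWindowFn does not support side inputs");
    }
}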
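The setOfWindowsElementShouldBeIn placeholder above is where the sliding logic would live. As a rough sketch only, assuming the chosen field can be read as a long, the sliding-window arithmetic on that value might look like the following. The class name FieldSlidingWindows, the method windowsFor, and the size and period parameters are all hypothetical names made up for illustration; they are not part of the Dataflow SDK, and the code uses plain Java only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an SDK class: computes sliding ranges over an
// arbitrary long-valued field instead of over the element timestamp.
final class FieldSlidingWindows {
    private FieldSlidingWindows() {}

    /**
     * Returns the [start, end) range of every window of length `size`,
     * starting at multiples of `period`, that contains `value`.
     * Ranges are ordered from the latest start to the earliest.
     */
    static List<long[]> windowsFor(long value, long size, long period) {
        if (size <= 0 || period <= 0) {
            throw new IllegalArgumentException("size and period must be positive");
        }
        List<long[]> windows = new ArrayList<>();
        // Latest window start that is <= value; floorMod keeps this correct
        // for negative field values as well.
        long lastStart = value - Math.floorMod(value, period);
        // Step back one period at a time while the window still covers value,
        // i.e. while start + size > value.
        for (long start = lastStart; start > value - size; start -= period) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }
}
```

For example, windowsFor(25, 20, 10) yields the ranges [20, 40) and [10, 30). Inside assignWindows, each range could then be wrapped in an IntervalWindow by treating the field value as if it were a millisecond instant. Keep in mind that the window bounds would still be interpreted as instants by the rest of the pipeline, so it is worth checking how that interacts with your triggers and grouping.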
Source: https://stackoverflow.com/questions/37897452/how-to-create-a-personalised-windowfn-in-google-dataflow