Question
I have two streams of events:
- L = (l1, l3, l8, ...) - is sparser and represents user logins to an IP
- E = (e2, e4, e5, e9, ...) - is a stream of logs for that particular IP
The lower index represents a timestamp. If we joined the two streams together and sorted them by time we would get:
- l1, e2, l3, e4, e5, l8, e9, ...
Would it be possible to implement custom Window / Trigger functions to group the events into sessions (the time between logins of different users):
- l1 - l3 : e2
- l3 - l8 : e4, e5
- l8 - l14 : e9, e10, e11, e12, e13
- ...
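To make the desired grouping concrete, here is a minimal plain-Java sketch (no Flink involved) of the session assignment, assuming the two streams have already been merged and sorted by timestamp. The names `Ev`, `isLogin`, and `SessionSketch.group` are hypothetical and exist only for this illustration; they are not part of the question or of any Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SessionSketch {
    // Assumes `sorted` is already ordered by timestamp.
    // Each login opens a new session; every log event is assigned
    // to the most recent login seen before it.
    static Map<String, List<String>> group(List<Ev> sorted) {
        Map<String, List<String>> sessions = new LinkedHashMap<>();
        List<String> current = null;
        for (Ev ev : sorted) {
            if (ev.isLogin) {
                current = new ArrayList<>();
                sessions.put(ev.toString(), current);
            } else if (current != null) {
                current.add(ev.toString());
            }
            // Log events that arrive before the first login are dropped in this sketch.
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<Ev> merged = Arrays.asList(
                new Ev(1, true), new Ev(2, false), new Ev(3, true),
                new Ev(4, false), new Ev(5, false), new Ev(8, true), new Ev(9, false));
        System.out.println(group(merged));
    }
}

// Hypothetical event type: a timestamp plus a flag separating logins (l) from logs (e).
class Ev {
    final long timestamp;
    final boolean isLogin;

    Ev(long timestamp, boolean isLogin) {
        this.timestamp = timestamp;
        this.isLogin = isLogin;
    }

    @Override
    public String toString() {
        return (isLogin ? "l" : "e") + timestamp;
    }
}
```

For the merged sequence l1, e2, l3, e4, e5, l8, e9 this prints `{l1=[e2], l3=[e4, e5], l8=[e9]}`. The sketch only works because its input is sorted, which is exactly the precondition the rest of the question is about.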
The problem I see is that the two streams are not necessarily sorted. I thought about sorting the input stream by timestamp. Then it would be easy to implement the windowing using a GlobalWindow and a custom Trigger, yet it seems that it is not possible.
Am I missing something or is it definitely not possible to do so in current Flink (v1.3.2)?
Thanks
Answer 1:
Question: shouldn't E3 come before L4?
Sorting is pretty straightforward using a ProcessFunction. Something like this:
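import java.util.PriorityQueue;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimerService;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Buffers events in keyed state and emits them in timestamp order once the watermark passes them.
// It uses keyed state and timers, so it must be applied to a keyed stream. Event must be
// Comparable (ordered by timestamp) because the PriorityQueue relies on natural ordering.
public static class SortFunction extends ProcessFunction<Event, Event> {
    private ValueState<PriorityQueue<Event>> queueState = null;

    @Override
    public void open(Configuration config) {
        ValueStateDescriptor<PriorityQueue<Event>> descriptor = new ValueStateDescriptor<>(
                // state name
                "sorted-events",
                // type information of state
                TypeInformation.of(new TypeHint<PriorityQueue<Event>>() {
                }));
        queueState = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void processElement(Event event, Context context, Collector<Event> out) throws Exception {
        TimerService timerService = context.timerService();

        // Buffer only events that are not late, i.e. still ahead of the current watermark.
        if (context.timestamp() > timerService.currentWatermark()) {
            PriorityQueue<Event> queue = queueState.value();
            if (queue == null) {
                queue = new PriorityQueue<>(10);
            }
            queue.add(event);
            queueState.update(queue);
            // Fire once the watermark reaches this event's timestamp.
            timerService.registerEventTimeTimer(event.timestamp);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext context, Collector<Event> out) throws Exception {
        PriorityQueue<Event> queue = queueState.value();
        long watermark = context.timerService().currentWatermark();

        // Emit every buffered event the watermark has passed, earliest first.
        Event head = queue.peek();
        while (head != null && head.timestamp <= watermark) {
            out.collect(head);
            queue.poll();
            head = queue.peek();
        }
        // Write the shrunken queue back so the removals are persisted in state.
        queueState.update(queue);
    }
}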
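The buffering logic can be tried out without a Flink cluster. The following is a minimal plain-Java simulation of the same idea, not Flink code: `WatermarkSortSketch`, `onElement`, and `onWatermark` are hypothetical names, events are reduced to bare timestamps, and the watermark is advanced by hand instead of by the runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class WatermarkSortSketch {
    // Min-heap of buffered timestamps, standing in for the queue kept in Flink state.
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();
    private long watermark = Long.MIN_VALUE;
    // Collects what would be sent downstream, in emission order.
    final List<Long> emitted = new ArrayList<>();

    // Buffer an event unless the watermark has already passed it (a late event).
    void onElement(long timestamp) {
        if (timestamp > watermark) {
            buffer.add(timestamp);
        }
    }

    // Advance the watermark and release every buffered event at or before it, earliest first.
    void onWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            emitted.add(buffer.poll());
        }
    }

    public static void main(String[] args) {
        WatermarkSortSketch sorter = new WatermarkSortSketch();
        sorter.onElement(3);
        sorter.onElement(1);
        sorter.onElement(2);
        sorter.onWatermark(2); // releases 1 and 2; 3 stays buffered
        sorter.onElement(1);   // late: the watermark is already 2, so it is dropped
        sorter.onElement(5);
        sorter.onElement(4);
        sorter.onWatermark(5); // releases 3, 4, 5
        System.out.println(sorter.emitted);
    }
}
```

Running `main` prints `[1, 2, 3, 4, 5]`. The output is sorted only up to the watermark, so how well this works in a real job depends on how far out of order the events can be relative to the watermark delay.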
Source: https://stackoverflow.com/questions/47576408/sorting-union-of-streams-to-identify-user-sessions-in-apache-flink