Sorting union of streams to identify user sessions in Apache Flink

被撕碎了的回忆 2021-01-24 08:57

I have two streams of events

  • L = (l1, l3, l8, ...) is the sparser stream and represents user logins to an IP
  • E = (e2,
1 Answer
  •  不思量自难忘°
    2021-01-24 09:45

    Question: shouldn't E3 come before L4?

    Sorting is pretty straightforward using a ProcessFunction. Something like this:

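    import java.util.PriorityQueue;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.TimerService;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Uses keyed state and timers, so it must be applied to a keyed stream.
    // Event is assumed to implement Comparable<Event>, ordered by timestamp.
    public static class SortFunction extends ProcessFunction<Event, Event> {
      private ValueState<PriorityQueue<Event>> queueState = null;

      @Override
      public void open(Configuration config) {
        ValueStateDescriptor<PriorityQueue<Event>> descriptor = new ValueStateDescriptor<>(
            // state name
            "sorted-events",
            // type information of state
            TypeInformation.of(new TypeHint<PriorityQueue<Event>>() {
            }));
        queueState = getRuntimeContext().getState(descriptor);
      }

      @Override
      public void processElement(Event event, Context context, Collector<Event> out) throws Exception {
        TimerService timerService = context.timerService();

        // Buffer only events that are ahead of the watermark; late events are dropped.
        if (context.timestamp() > timerService.currentWatermark()) {
          PriorityQueue<Event> queue = queueState.value();
          if (queue == null) {
            queue = new PriorityQueue<>(10);
          }
          queue.add(event);
          queueState.update(queue);
          // Fires once the watermark reaches this event's timestamp.
          timerService.registerEventTimeTimer(event.timestamp);
        }
      }

      @Override
      public void onTimer(long timestamp, OnTimerContext context, Collector<Event> out) throws Exception {
        PriorityQueue<Event> queue = queueState.value();
        long watermark = context.timerService().currentWatermark();
        Event head = queue.peek();
        // Emit, in timestamp order, every buffered event the watermark has passed.
        while (head != null && head.timestamp <= watermark) {
          out.collect(head);
          queue.poll();
          head = queue.peek();
        }
        // Write the shrunken queue back so the removals are persisted in state.
        queueState.update(queue);
      }
    }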
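
    To see the buffering logic in isolation, here is a minimal plain-Java sketch of the same idea with no Flink dependency: events are held in a priority queue and released in timestamp order once a watermark passes them. The class name WatermarkSortSketch, its nested Event type, and the onEvent/onWatermark methods are hypothetical names made up for this illustration; in Flink the watermark is advanced by the runtime rather than by an explicit call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Plain-Java simulation of watermark-driven sorting (illustrative only, not Flink code). */
public class WatermarkSortSketch {

    /** Hypothetical event type: a label plus an event-time timestamp. */
    public static final class Event {
        public final String label;
        public final long timestamp;

        public Event(String label, long timestamp) {
            this.label = label;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return label + "@" + timestamp;
        }
    }

    // Min-heap ordered by timestamp, playing the role of the queue held in state.
    private final PriorityQueue<Event> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestamp));
    private long watermark = Long.MIN_VALUE;

    /** Buffers the event unless it is late; returns false when it is dropped. */
    public boolean onEvent(Event event) {
        if (event.timestamp <= watermark) {
            return false; // at or behind the watermark: dropped, as in processElement
        }
        buffer.add(event);
        return true;
    }

    /** Advances the watermark and returns, in timestamp order, the events it now covers. */
    public List<Event> onWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        List<Event> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    public static void main(String[] args) {
        WatermarkSortSketch sorter = new WatermarkSortSketch();
        // Events arrive out of order across the two (already unioned) streams.
        sorter.onEvent(new Event("l3", 3));
        sorter.onEvent(new Event("l1", 1));
        sorter.onEvent(new Event("e2", 2));
        System.out.println(sorter.onWatermark(2));  // releases l1 and e2, keeps l3
        sorter.onEvent(new Event("l8", 8));
        System.out.println(sorter.onWatermark(10)); // releases l3 and l8
    }
}
```

    The trade-off is the same as in the ProcessFunction: output is delayed until the watermark catches up, and anything that arrives behind the watermark is discarded rather than emitted out of order.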
    

    Update: see "How to sort an out-of-order event time stream using Flink" for a description of a generally better approach.
