Samza: Delay processing of messages until timestamp

谎友^ 2021-01-23 23:40

I'm processing messages from a Kafka topic with Samza. Some of the messages come with a timestamp in the future and I'd like to postpone the processing until after that timest

2 answers
  •  小鲜肉 (original poster)
    2021-01-24 00:14

    I think you could use Samza's key-value store to keep the state of your task instance, instead of an in-memory Set. It should look something like this:

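    import java.time.LocalDateTime;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.Entry;
    import org.apache.samza.storage.kv.KeyValueIterator;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.InitableTask;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskContext;
    import org.apache.samza.task.TaskCoordinator;
    import org.apache.samza.task.WindowableTask;

    public class MyTask implements StreamTask, WindowableTask, InitableTask {

      // Messages whose timestamp is still in the future, keyed by message id
      // (the id is assumed to be a String here).
      private KeyValueStore<String, MyMessage> waitingMessages;

      @SuppressWarnings("unchecked")
      @Override
      public void init(Config config, TaskContext context) throws Exception {
        this.waitingMessages = (KeyValueStore<String, MyMessage>) context.getStore("messages-store");
      }

      @Override
      public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector,
          TaskCoordinator taskCoordinator) throws Exception {
        byte[] message = (byte[]) incomingMessageEnvelope.getMessage();
        MyMessage parsedMessage = MyMessage.parseFrom(message);

        // getValidFromDateTime() is assumed to return a java.time.LocalDateTime
        if (parsedMessage.getValidFromDateTime().isBefore(LocalDateTime.now())) {
          // Do the processing
        } else {
          waitingMessages.put(parsedMessage.getId(), parsedMessage);
        }
      }

      @Override
      public void window(MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception {
        LocalDateTime now = LocalDateTime.now();
        List<String> dueIds = new ArrayList<>();
        KeyValueIterator<String, MyMessage> all = waitingMessages.all();
        try {
          while (all.hasNext()) {
            Entry<String, MyMessage> entry = all.next();
            if (entry.getValue().getValidFromDateTime().isBefore(now)) {
              // Do the processing
              dueIds.add(entry.getKey());
            }
          }
        } finally {
          all.close(); // store iterators hold resources and must be closed
        }
        // Remove the processed messages from the store
        for (String id : dueIds) {
          waitingMessages.delete(id);
        }
      }

    }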
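
    The core of the approach is the "is it due yet?" check, applied once when a message arrives and again on every window() call. Below is a minimal, Samza-free sketch of just that logic, so it can be run and tested on its own. The class DelayedMessages, its Message type, and the use of java.time.Instant are all made up for illustration; a plain HashMap stands in for the key-value store.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the store; illustrates only the due-check, not Samza itself. */
class DelayedMessages {

    /** Hypothetical message shape: an id plus the earliest time it may be processed. */
    static final class Message {
        final String id;
        final Instant validFrom;

        Message(String id, Instant validFrom) {
            this.id = id;
            this.validFrom = validFrom;
        }
    }

    private final Map<String, Message> waiting = new HashMap<>();

    /** Returns true if the message is already due; otherwise parks it and returns false. */
    boolean offer(Message message, Instant now) {
        if (!message.validFrom.isAfter(now)) {
            return true; // due: the caller processes it immediately
        }
        waiting.put(message.id, message);
        return false;
    }

    /** Removes and returns every parked message whose validFrom is not after now. */
    List<Message> drainDue(Instant now) {
        List<Message> due = new ArrayList<>();
        Iterator<Message> it = waiting.values().iterator();
        while (it.hasNext()) {
            Message m = it.next();
            if (!m.validFrom.isAfter(now)) {
                due.add(m);
                it.remove(); // drop it so it is not processed twice
            }
        }
        return due;
    }

    /** Number of messages still waiting. */
    int size() {
        return waiting.size();
    }

    public static void main(String[] args) {
        DelayedMessages buffer = new DelayedMessages();
        Instant now = Instant.now();
        boolean dueNow = buffer.offer(new Message("late", now.plusSeconds(60)), now);
        System.out.println("due immediately: " + dueNow);
        System.out.println("released after one minute: " + buffer.drainDue(now.plusSeconds(60)).size());
    }
}
```

    Passing the current time in as a parameter, rather than reading the clock inside the methods, keeps the check deterministic and easy to test.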
    

    If you redeploy your task, Samza should restore the state of the key-value store (Samza keeps the values in a dedicated Kafka changelog topic associated with the store). You do, of course, need to provide some extra configuration for your store (messages-store in the example above).
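
    As a rough sketch, the store configuration in the job's properties file could look like the fragment below. The window interval, the changelog stream name, the serde name my-message and the class com.example.MyMessageSerdeFactory are placeholders invented for this example, and a system named kafka is assumed to be defined elsewhere in the job config. You would have to write the serde for MyMessage yourself.

```properties
# How often window() is called, in milliseconds (value is illustrative)
task.window.ms=60000

# RocksDB-backed key-value store named "messages-store"
stores.messages-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.messages-store.key.serde=string
stores.messages-store.msg.serde=my-message

# Changelog stream used to restore the store after a redeploy (system.stream)
stores.messages-store.changelog=kafka.messages-store-changelog

# Serde registrations; the MyMessage serde factory is hypothetical
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
serializers.registry.my-message.class=com.example.MyMessageSerdeFactory
```

    Without the changelog line the store is still usable, but its contents would not survive the container being moved or redeployed.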

    You can read more about the key-value store in the Samza state management documentation (this link is for version 0.14): https://samza.apache.org/learn/documentation/0.14/container/state-management.html
