Samza: Delay processing of messages until timestamp

后端未结

关注

 2  754

I\'m processing messages from a Kafka topic with Samza. Some of the messages come with a timestamp in the future and I\'d like to postpone the processing until after that timest

相关标签:

2条回答

小鲜肉

2021-01-24 00:14

I think you could use key-value store of Samza to keep state of your task instance instead of in-memory Set. It should look something like:

public class MyTask implements StreamTask, WindowableTask, InitableTask {

  private KeyValueStore<String, MyMessage> waitingMessages;


  @SuppressWarnings("unchecked")
  @Override
  public void init(Config config, TaskContext context) throws Exception {
    this.waitingMessages = (KeyValueStore<String, MyMessage>) context.getStore("messages-store");
  }

  @Override
  public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector,
      TaskCoordinator taskCoordinator) {
    byte[] message = (byte[]) incomingMessageEnvelope.getMessage();
    MyMessage parsedMessage = MyMessage.parseFrom(message);

    if (parsedMessage.getValidFromDateTime().isBefore(LocalDate.now())) {
      // Do the processing
    } else {
      waitingMessages.put(parsedMessage.getId(), parsedMessage);
    }

  }

  @Override
  public void window(MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
    KeyValueIterator<String, MyMessage> all = waitingMessages.all();
    while(all.hasNext()) {
      MyMessage message = all.next().getValue();
      // Do the processing and remove the message from the set
    }
  }

}

If you redeploy you task Samza should recreate state of key-value store (Samza keeps values in special kafka topic related to key-value store). You need of course provide some extra configuration of your store (in above example for messages-store).

You could read about key-value store here (for the latest Samza version): https://samza.apache.org/learn/documentation/0.14/container/state-management.html

0 讨论(0)

Happy的楠姐

2021-01-24 00:21

It's important to keep in mind, when dealing with message queues, is that they perform a very specific function in a system: they hold messages while the processor(s) are busy processing preceding messages. It is expected that a properly-functioning message queue will deliver messages on demand. What this implies is that as soon as a message reaches the head of the queue, the next pull on the queue will yield the message.

Notice that delay is not a configurable part of the equation. Instead, delay is an output variable of a system with a queue. In fact, Little's Law offers some interesting insights into this.

So, in a system where a delay is necessary (for example, to join/wait for a parallel operation to complete), you should be looking at other methods. Typically a queryable database would make sense in this particular instance. If you find yourself keeping messages in a queue for a pre-set period of time, you're actually using the message queue as a database - a function it was not designed to provide. Not only is this risky, but it also has a high likelihood of hurting the performance of your message broker.

0 讨论(0)
发布评论:

提交评论
- 加载中...