End-of-window outer join with KafkaStreams

轻奢々 2021-01-03 09:05

I have a Kafka topic where I expect messages with two different key types, old and new, e.g. "1-new", "1-old", "2-new", "

3 answers
  • 2021-01-03 09:16

    If I understand your question correctly, you only want to report IDs as suspicious when there is an "old" without a corresponding "new" within the 2-minute window.

    If that's the case, you'll want to use a left join:

    val leftJoined = oldStream.leftJoin(newStream,...).filter(condition where value expected from "new" stream is null);
    

    HTH

  • 2021-01-03 09:19

    This looks like what you are looking for: Kafka Streams left outer join on timeout.

    It works around the lack of SQL-like left join semantics in the Kafka Streams framework. The implementation emits a left-join event only if no full join event occurred within the join window.

  • 2021-01-03 09:38

    The DSL might not give you what you want. However, you can use the Processor API. Having said this, the leftJoin can actually be used to do the "heavy lifting". Thus, after the leftJoin you can use .transform(...) with an attached state store to "clean up" the data further.

    For each old&null record you receive, put it into the store. If you later receive an old&new record, you can remove the entry from the store. Furthermore, you register a punctuation, and on each punctuation call you scan the store for entries that are "old enough" that you can be sure no later old&new join result will be produced. For those entries, you emit old&null and remove them from the store.
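
    The store-and-punctuate logic above can be sketched roughly as follows. This is a plain-Java model, not Kafka Streams code: the class `PendingOldBuffer`, its method names, and the `graceMs` parameter are all made up for illustration, a `LinkedHashMap` stands in for the state store, and `punctuate(nowMs)` stands in for whatever the punctuation callback would do.

    ```java
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical plain-Java model of the state store plus punctuation; no Kafka dependency.
    class PendingOldBuffer {
        private final long graceMs;                                       // how long to wait for a matching "new"
        private final Map<String, Long> pending = new LinkedHashMap<>();  // id -> time old&null was first seen

        PendingOldBuffer(long graceMs) {
            this.graceMs = graceMs;
        }

        // Called for every left-join result; newValue == null models an old&null record.
        void onJoinResult(String id, String newValue, long timestampMs) {
            if (newValue == null) {
                pending.putIfAbsent(id, timestampMs);  // remember the unmatched "old"
            } else {
                pending.remove(id);                    // an old&new result arrived, so drop the entry
            }
        }

        // Called periodically; returns (and forgets) the ids unmatched for at least graceMs.
        List<String> punctuate(long nowMs) {
            List<String> expired = new ArrayList<>();
            Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (nowMs - entry.getValue() >= graceMs) {
                    expired.add(entry.getKey());       // this is where old&null would be emitted
                    it.remove();
                }
            }
            return expired;
        }

        public static void main(String[] args) {
            PendingOldBuffer buffer = new PendingOldBuffer(120_000L);
            buffer.onJoinResult("1", null, 0L);
            buffer.onJoinResult("2", null, 10_000L);
            buffer.onJoinResult("1", "new-1", 30_000L);
            System.out.println(buffer.punctuate(130_000L));
        }
    }
    ```

    In a real topology the map would be a fault-tolerant state store attached to the transformer, and the grace period would need to cover the join window so that no later old&new result can still show up.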

    As an alternative, you can also omit the join and do everything in a single transform() with state. For this, you would need to KStream#merge() the old and new streams and call transform() on the merged stream.

    Note: instead of registering a punctuation, you can also put the "scan logic" into the transform and execute it each time you process a record.
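
    A rough sketch of the join-free variant, with the scan done on every record instead of in a punctuation. Again this is a plain-Java model with no Kafka dependency, and `MergedStreamDetector`, `process`, and `windowMs` are hypothetical names. It assumes keys of the form "<id>-old" / "<id>-new", that the "new" record arrives after the "old" one, and that record timestamps are roughly non-decreasing.

    ```java
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical plain-Java model of one stateful step over the merged old/new stream.
    class MergedStreamDetector {
        private final long windowMs;                                         // e.g. the 2-minute window
        private final Map<String, Long> pendingOld = new LinkedHashMap<>();  // id -> time of unmatched "old"

        MergedStreamDetector(long windowMs) {
            this.windowMs = windowMs;
        }

        // Processes one record and returns the ids whose "old" saw no "new" within windowMs.
        List<String> process(String key, long timestampMs) {
            // Scan first, so a "new" that arrives after the window cannot rescue an expired "old".
            List<String> expired = new ArrayList<>();
            Iterator<Map.Entry<String, Long>> it = pendingOld.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (timestampMs - entry.getValue() >= windowMs) {
                    expired.add(entry.getKey());
                    it.remove();
                }
            }
            // Then apply the current record to the store.
            int dash = key.lastIndexOf('-');
            if (dash > 0) {
                String id = key.substring(0, dash);
                String type = key.substring(dash + 1);
                if (type.equals("old")) {
                    pendingOld.putIfAbsent(id, timestampMs);  // wait for a matching "new"
                } else if (type.equals("new")) {
                    pendingOld.remove(id);                    // matched in time
                }
            }
            return expired;
        }

        public static void main(String[] args) {
            MergedStreamDetector detector = new MergedStreamDetector(120_000L);
            detector.process("1-old", 0L);
            detector.process("2-old", 1_000L);
            detector.process("1-new", 60_000L);
            System.out.println(detector.process("3-old", 125_000L));
        }
    }
    ```

    One trade-off of scanning per record: nothing is emitted while the input is idle, so an unmatched "old" is only reported once some later record arrives.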
