External system queries during Kafka Stream processing

一整个雨季 2020-12-05 21:14

I'm trying to design a streaming architecture for streaming analytics. Requirements:

  • RT and NRT streaming data input
  • Stream processors implementing s
1 Answer
  • 2020-12-05 22:07

    There are several ways to query external systems from a Kafka Streams application:

    1. Recommended: Use Kafka Connect to import your data from external systems into Kafka, and read those topics as KTables to do the KStream-KTable lookup join.
    2. You can implement your own custom lookup join within your UDF code. Depending on the details, you can use the KStream methods #mapValues() or #map(), or lower-level methods like #transform() or #process(). In this case, you manually open a connection to your external system and issue a lookup query for each record you process.
      • sync lookups: if you make synchronous calls to the external system, there is nothing else you need to consider (you can use #mapValues(), for example, to implement this).
      • async lookups: asynchronous calls to external systems are trickier to get right, and you should be quite careful: it's not a recommended pattern, because there is no library support at the moment. First, you need to remember all async calls you issue in a reliable way (i.e., you need to attach a state store and write each request into it before you actually fire it). Second, on each callback, you need to buffer the result and process it later, when the operator that issued the request is called again (it's not possible to produce a downstream result from an async callback handler, only from within UDF code). After emitting the result downstream, you can remove the request from the state. Third, when recovering from a failure, you need to check your state for unfinished requests and issue those requests again. Also keep in mind that this kind of async processing breaks some internal Streams assumptions, such as guaranteed processing order with regard to record topic offsets.
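
The three async steps (record the request, buffer the callback result, re-issue after recovery) can be sketched without any Kafka dependency. The following is only a plain-Java simulation of the bookkeeping, not Kafka Streams code: the class `AsyncLookupOperator`, its method names, and the `client` function are all hypothetical, and the `Map` passed to the constructor merely stands in for a fault-tolerant state store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

/** Library-free simulation of the async lookup bookkeeping (hypothetical names throughout). */
class AsyncLookupOperator {
    // Stand-in for a fault-tolerant state store: record offset -> lookup key of each unfinished request.
    private final Map<Long, String> pending;
    // Results handed over by callback threads, waiting for the processing thread to pick them up.
    private final ConcurrentLinkedQueue<Map.Entry<Long, String>> completed = new ConcurrentLinkedQueue<>();
    // Assumed async client for the external system: key -> future value.
    private final Function<String, CompletableFuture<String>> client;

    AsyncLookupOperator(Map<Long, String> state, Function<String, CompletableFuture<String>> client) {
        this.pending = state;
        this.client = client;
    }

    /** Called by the processing thread for each input record; returns the records to emit downstream. */
    List<String> process(long offset, String key) {
        pending.put(offset, key); // step 1: write the request to the state before firing it
        fire(offset, key);
        return drain();           // step 2: emit whatever has completed in the meantime
    }

    private void fire(long offset, String key) {
        // The callback only buffers the result; it never emits downstream itself.
        client.apply(key).thenAccept(value -> completed.add(Map.entry(offset, key + "=" + value)));
    }

    /** Emits buffered results on the processing thread, then removes their requests from the state. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        Map.Entry<Long, String> done;
        while ((done = completed.poll()) != null) {
            if (pending.containsKey(done.getKey())) { // ignore results for already-finished requests
                out.add(done.getValue());
                pending.remove(done.getKey());
            }
        }
        return out;
    }

    /** Step 3: after a restart, re-issue every request still recorded in the state. */
    void recover() {
        pending.forEach(this::fire);
    }
}
```

In this sketch, results are emitted in completion order rather than offset order, which illustrates the ordering caveat mentioned above. Only the processing thread touches `pending`; callback threads write exclusively to the concurrent queue.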

    For failure handling in Streams with regard to offset commits, compare this question: "How to handle error and don't commit when use Kafka Streams DSL"
