Aggregation and state store retention in Kafka Streams

情书的邮戳 2021-02-09 05:16

I have a use case like the following. For each incoming event, I want to look at a certain field to see if its status changed from A to B, and if so, send that event to an output topic.

1 Answer

    梦谈多话 2021-02-09 05:55

    1. By default, a persistent RocksDB store is used. If you want an in-memory store instead, pass in Materialized.as(Stores.inMemoryKeyValueStore(...)) (see the first sketch after this list).

    2. If you have an unbounded number of unique keys, you will eventually run out of main memory or disk and your application will die. Depending on your semantics, you can get a "TTL" by instead using a session-windowed aggregation with a large "gap" parameter, which lets old keys expire (see the second sketch after this list).

    3. The state is always restored before any new data is processed. If you use an in-memory store, restoration happens by consuming the underlying changelog topic; depending on the size of your state, this can take a while. If you use a persistent RocksDB store, the state is loaded from disk, so no restore is required and processing starts immediately. Only if you lose the state on local disk will a restore from the changelog topic happen in this case (a way to observe restoration is shown in the third sketch after this list).
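
    For point 1, here is a minimal sketch of an aggregation backed by an in-memory store. The topic name "events", the store name "status-store", and the String serdes are placeholder assumptions, not taken from the question:

        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.kstream.Consumed;
        import org.apache.kafka.streams.kstream.Materialized;
        import org.apache.kafka.streams.state.Stores;

        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               // keep the latest status seen per key
               .reduce((oldStatus, newStatus) -> newStatus,
                       // without this supplier, a persistent RocksDB store is used
                       Materialized.<String, String>as(
                               Stores.inMemoryKeyValueStore("status-store"))
                           .withKeySerde(Serdes.String())
                           .withValueSerde(Serdes.String()));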
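
    For point 2, a sketch of the session-window trick, written against the Kafka 2.x API (SessionWindows.with is deprecated in 3.0+ in favor of SessionWindows.ofInactivityGapWithNoGrace); the 30-day gap is an arbitrary assumption:

        import java.time.Duration;
        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.kstream.Consumed;
        import org.apache.kafka.streams.kstream.SessionWindows;

        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               // a key that stays inactive longer than the 30-day gap starts a
               // new session, and its old session eventually drops out of the
               // store once retention passes, giving a TTL-like effect
               .windowedBy(SessionWindows.with(Duration.ofDays(30)))
               .reduce((oldStatus, newStatus) -> newStatus);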
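
    For point 3, if you want to see when restoration runs and how long it takes, you can register a StateRestoreListener before starting the application; the logging below is only illustrative:

        import org.apache.kafka.common.TopicPartition;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.processor.StateRestoreListener;

        // 'streams' is the KafkaStreams instance built from your topology;
        // the listener must be registered before calling streams.start()
        streams.setGlobalStateRestoreListener(new StateRestoreListener() {
            @Override
            public void onRestoreStart(TopicPartition tp, String store,
                                       long startOffset, long endOffset) {
                System.out.printf("restore of %s started: offsets %d..%d%n",
                                  store, startOffset, endOffset);
            }

            @Override
            public void onBatchRestored(TopicPartition tp, String store,
                                        long batchEndOffset, long numRestored) {
                // invoked after each restored batch; no-op here
            }

            @Override
            public void onRestoreEnd(TopicPartition tp, String store,
                                     long totalRestored) {
                System.out.printf("restore of %s finished: %d records%n",
                                  store, totalRestored);
            }
        });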
