Question
I am trying to understand the stateful stream processor.
As I understand it, this type of stream processor maintains some sort of state using a State Store.
I came to know that one of the ways to implement a State Store is using RocksDB. Assume the following topology (with only one processor being stateful):
A -> B -> C; processor B is stateful, with a local state store and changelog enabled. I am using the low-level API.
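To pin down what "stateful" means in this A -> B -> C topology, here is a toy model in plain Java with no Kafka involved: B's output for a record depends on records it has already seen, and that remembered data is exactly what a state store holds. The names (ToyPipeline, CountingProcessorB) are hypothetical, and the HashMap only stands in for a real store such as RocksDB; it has no changelog and no fault tolerance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (no Kafka): A -> B -> C where only B is stateful.
final class ToyPipeline {

    // A: stateless, normalizes the incoming key.
    static String a(String key) {
        return key.trim().toLowerCase();
    }

    // B: stateful, keeps a running count per key in a local "store".
    static final class CountingProcessorB {
        private final Map<String, Long> store = new HashMap<>();

        String process(String key) {
            long count = store.merge(key, 1L, Long::sum);
            return key + "=" + count;
        }
    }

    // Runs every record through A, then B, and collects what C would receive.
    static List<String> run(List<String> input) {
        CountingProcessorB b = new CountingProcessorB();
        List<String> sink = new ArrayList<>(); // C: here it simply collects the output
        for (String record : input) {
            sink.add(b.process(a(record)));
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("x", "Y", "x", " y "))); // prints [x=1, y=1, x=2, y=2]
    }
}
```

The second "x" yields x=2 only because B remembered the first one; a stateless processor could not produce that output.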
Assume the sp (stream processor) listens on a single Kafka topic, say topic-1, with 10 partitions.
I observed that when the application is started (2 instances on different physical machines, with num.stream.threads = 5), it creates the following directory structure for the state store:
0_0, 0_1, 0_2, ..., 0_9 (each machine has five, so 10 partitions in total).
I was going through some online material which said that we should create a StoreBuilder and attach it to the topology using addStateStore() instead of creating a state store within a processor, like:
topology.addStateStore(storeBuilder, "processorName")
Ref also: org.apache.kafka.streams.state.Store
I don't understand the difference between attaching a StoreBuilder to the topology and actually creating a state store within the processor. What are the differences between them?
The second part: for the state store, it creates directories like 0_0, 0_1, etc. Who creates them, and how? Is there some sort of 1:1 mapping between the Kafka topics (which the sp is listening to) and the number of directories that get created for the State Store?
Answer 1:
I don't understand the difference between attaching a StoreBuilder to the topology and actually creating a state store within the processor. What are the differences between them?
In order to let Kafka Streams manage the store for you (fault tolerance, migration), Kafka Streams needs to be aware of the store. Thus, you give Kafka Streams a StoreBuilder, and Kafka Streams creates and manages the store for you.
If you just create a store inside your processor, Kafka Streams is not aware of the store and the store won't be fault-tolerant.
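The point about awareness can be illustrated with a deliberately simplified, Kafka-free model in plain Java. All names here (ToyRuntime, registerStore, openStore, restore, LoggedStore) are hypothetical stand-ins, not Kafka APIs: registering a store lets the runtime mirror every write into a changelog that survives a crash, which is roughly the role the changelog topic plays in Kafka Streams. A map created privately inside a processor has no such log, so nothing can rebuild it. In real Kafka Streams code the registration step is Stores.keyValueStoreBuilder(...) passed to topology.addStateStore(builder, "B"), and the processor looks the store up with context.getStateStore(...).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (no Kafka): a runtime can only restore stores it knows about.
// ToyRuntime, registerStore, openStore and restore are hypothetical names.
final class ToyRuntime {

    // One changelog per registered store; stands in for a changelog topic
    // that outlives any single application instance.
    private final Map<String, List<Map.Entry<String, Long>>> changelogs = new HashMap<>();

    // A key-value store whose writes are mirrored into its changelog.
    final class LoggedStore {
        private final Map<String, Long> data = new HashMap<>();
        private final List<Map.Entry<String, Long>> log;

        private LoggedStore(List<Map.Entry<String, Long>> log) {
            this.log = log;
        }

        void put(String key, long value) {
            data.put(key, value);
            log.add(Map.entry(key, value));
        }

        Long get(String key) {
            return data.get(key);
        }
    }

    // Registration: the runtime now owns a changelog for this store name.
    void registerStore(String name) {
        changelogs.putIfAbsent(name, new ArrayList<>());
    }

    // Returns an empty store handle bound to the registered store's changelog.
    LoggedStore openStore(String name) {
        List<Map.Entry<String, Long>> log = changelogs.get(name);
        if (log == null) {
            throw new IllegalStateException("store not registered: " + name);
        }
        return new LoggedStore(log);
    }

    // After a crash or migration: rebuild the store by replaying its changelog.
    LoggedStore restore(String name) {
        LoggedStore store = openStore(name);
        for (Map.Entry<String, Long> e : store.log) {
            store.data.put(e.getKey(), e.getValue()); // replay only, no re-logging
        }
        return store;
    }

    static Long demo() {
        ToyRuntime runtime = new ToyRuntime();
        runtime.registerStore("counts");
        runtime.openStore("counts").put("k", 3L); // registered: the write is logged

        Map<String, Long> privateStore = new HashMap<>(); // unregistered: invisible to the runtime
        privateStore.put("k", 3L);
        // "Crash": both in-memory maps are gone and only the changelog survives,
        // so only the registered store can be rebuilt.
        return runtime.restore("counts").get("k");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3
    }
}
```

The replay loop in restore is the toy counterpart of Kafka Streams restoring a store shard from its changelog when the owning task starts on a new instance.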
For the state store, it creates directories like 0_0, 0_1, etc. Who creates them, and how? Is there some sort of 1:1 mapping between the Kafka topics (which the sp is listening to) and the number of directories that get created for the State Store?
Yes, there is a mapping. The store is sharded based on the number of input topic partitions. You also get a "task" per partition, and the task directories are named y_z, with y being the sub-topology number and z being the partition number. Your simple topology has only one sub-topology, so all the directories you see have the same 0_ prefix.
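The naming rule is small enough to write down. The sketch below (plain Java; the class and method names are made up for illustration) produces one directory name per task for a given sub-topology id and input partition count. In Kafka Streams these task folders sit under the directory configured by state.dir, inside a folder named after the application.id.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "y_z" task-directory naming:
// y = sub-topology id, z = input partition number.
final class TaskDirectories {

    static String taskDir(int subTopology, int partition) {
        return subTopology + "_" + partition;
    }

    // One task (and one store shard) per input partition of a sub-topology.
    static List<String> names(int subTopology, int partitions) {
        List<String> dirs = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            dirs.add(taskDir(subTopology, p));
        }
        return dirs;
    }

    public static void main(String[] args) {
        // topic-1 has 10 partitions; A -> B -> C forms a single sub-topology with id 0.
        System.out.println(names(0, 10));
        // prints [0_0, 0_1, 0_2, 0_3, 0_4, 0_5, 0_6, 0_7, 0_8, 0_9]
    }
}
```

A topology with a repartition step would be split into more sub-topologies, and their tasks would then show up with prefixes 1_, 2_, and so on.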
Hence, your logical store has 10 physical shards. This sharding allows Kafka Streams to migrate state when the corresponding input topic partition is assigned to a different instance. Overall, you can run up to 10 instances, and each would process one partition and host one shard of your store.
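Below is a simplified model of how the 10 shards could be spread over instances, and of what moves when the instance count changes. Round-robin placement is an assumption made here for illustration only: Kafka Streams' real partition assignor is sticky and also balances tasks across stream threads, so actual placements will differ. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model: each task/shard "0_p" lives on exactly one instance.
// Round-robin is an assumption; the real Kafka Streams assignor is sticky.
final class ShardAssignment {

    static Map<Integer, List<String>> assign(int partitions, int instances) {
        Map<Integer, List<String>> byInstance = new TreeMap<>();
        for (int i = 0; i < instances; i++) {
            byInstance.put(i, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            byInstance.get(p % instances).add("0_" + p);
        }
        return byInstance;
    }

    // Instances that host no shard at all (anything beyond the partition count).
    static long idleInstances(int partitions, int instances) {
        return assign(partitions, instances).values().stream().filter(List::isEmpty).count();
    }

    // Shards whose host changes when scaling from `before` to `after` instances,
    // assuming instance ids 0..before-1 keep their ids. Each such shard must be
    // migrated, i.e. rebuilt from its changelog on the new host.
    static List<String> moved(int partitions, int before, int after) {
        List<String> result = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            if (p % before != p % after) {
                result.add("0_" + p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(10, 2));
        System.out.println("idle with 12 instances: " + idleInstances(10, 12));
        System.out.println("moved 2 -> 3: " + moved(10, 2, 3));
    }
}
```

With 2 instances each one hosts five shards, with 12 instances two of them sit idle, and going from 2 to 3 instances moves 6 of the 10 shards under naive round-robin. Keeping that number small is one reason the real assignor prefers to leave tasks where they already are.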
Source: https://stackoverflow.com/questions/61622414/kafka-stateful-stream-processor-with-statestore-behind-the-scenes