Kafka like offset on Kinesis Stream?

后端未结

关注

 1  1400

时光取名叫无心 2021-01-06 12:01

I have worked a bit with Kafka in the past and lately there is a requirement to port part of the data pipeline on AWS Kinesis Stream. Now I have read that Kinesis is effecti

1条回答

执笔经年 (楼主)

2021-01-06 12:43
Yes.

You can have multiple Kinesis Consumer Applications. Let's say you have 2.
1. First consumer application (I think it is "consumer group" in Kafka?) can be "first-app" and store it's positions in the DynamoDB "first-app-table". It can have as many nodes (ec2 instances) as you want.
2. Second consumer application can also work on the same stream, and store it's positions on another DynamoDB table let's say "second-app-table".
Each table will contain "what is the last processed position on shard X for app Y" information. So the 2 applications store checkpoints for the same shards in a different place, which makes them independent.

About the ingestion rate, there is a "idleTimeBetweenReadsInMillis" value in consumer applications using KCL, that is the polling interval for Amazon Kinesis API for Get operations. For example first application can have "2000" poll interval, so it will poll stream's shards every 2 seconds to see if any new record came.

I don't know Kafka well but as far as I remember; Kafka "partition" is "shard" in Kinesis, likewise Kafka "offset" is "sequence number" in Kinesis. Kinesis Consumer Library uses the term "checkpoint" for the stored sequences. Like you said, the concepts are similar.
0 讨论(0)
发布评论:

提交评论
- 加载中...