How to balance kinesis shards across several record processor?

泄露秘密 提交于 2019-12-13 17:52:47

问题


I am currently writing the simple Kinesis Client Library (KCL) in Golang version. One of the features that I want it for my simple KCL is load balancing shards across multiple record processors and EC2 instances. For example, I have two record processors (which will run in the separate EC2 instance) and four Kinesis shards. The load balancing feature will allow each record processors to process two Kinesis shards.

I read that Java KCL implemented this but I can't find the implementation in the library. My question is how am I going to implement this feature in Golang? Thank you.


回答1:


KCL already does load balancing for you.

Here's a basic description of how it works today (keep in mind this is just the basics and is subject to change as Amazon improves the logic):

  • When a worker (which may process multiple shards) starts up, it checks a central DynamoDB database for which shards are owned by workers (creating that database if necessary). This is the "leasing" table.
    • A "lease" is a relationship between a worker and a shard
    • Workers will process records for shards it owns an unexpired lease for
    • Leases expire if the worker hasn't emitted a "heartbeat" for the lease before it expires (typically every few seconds) - this heartbeat essentially updates the DDB record
  • It checks Kinesis stream for which shards are available, and updates the table if needed
  • If any leases are expired, the worker will try to take ownership of the lease - at database level, use shardId as key and write it's workerId there.
  • If a worker starts and all shards are already taken, it checks to see what the "balance" is - if it detects an imbalance (ie: "I own 0 shards and some other worker owns 10 shards"), it initiates a "steal shard" protocol - the old worker stops processing for that shard, and the new worker starts

You're of course free to examine the source code for KCL on github: https://github.com/awslabs/amazon-kinesis-client - hopefully this explanation gives you more context for how to understand KCL and adapt it to your needs.




回答2:


Before you start writing your own client...it looks like there are some people who have already done this:

  • https://github.com/nieksand/gokinesis
  • https://github.com/runtakun/kinesis-connector-go

Another option you have is the KCL MultiLangDaemon. You can install a small runner program that does all the balancing for you, and then you just listen to the messages the daemon sends you and commit them back.

https://github.com/awslabs/amazon-kinesis-client#amazon-kcl-support-for-other-languages




回答3:


For people interested in the topic, VMWare wrote the Go implementation of the Java lib: https://github.com/vmware/vmware-go-kcl/

At the time of writing, though, it doesn't support lease stealing: https://github.com/vmware/vmware-go-kcl/issues/4



来源:https://stackoverflow.com/questions/46159268/how-to-balance-kinesis-shards-across-several-record-processor

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!