Kafka + AWS Lambda

夕颜 2021-01-31 18:36

Is it possible to integrate AWS Lambda with Apache Kafka? I want to put a consumer in a Lambda function, so that the function executes whenever the consumer receives a message.

6 answers
  • 2021-01-31 19:19

    Yes, it is very much possible to have a Kafka consumer in an AWS Lambda function.

    However, note that you would not be able to invoke the Lambda through some sort of notification; you would have to poll the Kafka topic instead. The easiest way to do that is to use a scheduled Lambda.
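
    A minimal sketch of what such a scheduled poller might look like, assuming the kafka-python package is bundled with the function. The topic name, broker address and consumer group below are hypothetical placeholders, and `drain_topic` is an illustrative helper, not part of any library.

```python
from typing import Any, Callable


def drain_topic(consumer: Any, handle: Callable[[Any], None],
                max_polls: int = 5, poll_timeout_ms: int = 1000) -> int:
    """Poll a kafka-python style consumer a bounded number of times.

    Returns the number of records handled. Offsets are committed after
    each non-empty batch so the next scheduled run resumes from there.
    """
    handled = 0
    for _ in range(max_polls):
        # KafkaConsumer.poll returns a dict of {TopicPartition: [records]}.
        batches = consumer.poll(timeout_ms=poll_timeout_ms)
        if not batches:
            break  # nothing new; let the next scheduled run pick up
        for records in batches.values():
            for record in records:
                handle(record)
                handled += 1
        consumer.commit()
    return handled


def lambda_handler(event, context):
    # Assumption: kafka-python is packaged with the deployment artifact.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "my-topic",                           # hypothetical topic name
        bootstrap_servers=["broker-1:9092"],  # hypothetical broker address
        group_id="lambda-poller",             # hypothetical consumer group
        enable_auto_commit=False,             # commit manually after each batch
        auto_offset_reset="earliest",
    )
    try:
        count = drain_topic(consumer, lambda record: print(record.value))
    finally:
        consumer.close()
    return {"handled": count}
```

    Bounding the number of polls keeps each run well inside the function timeout; the schedule then decides how often the topic is checked.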

  • 2021-01-31 19:33

    If you are using managed Apache Kafka in AWS (MSK):

    Since August 2020 you can connect Amazon Managed Streaming for Apache Kafka (MSK) as an event source. This does not apply to a Kafka cluster you installed yourself, but if you already use AWS-managed Kafka it could be useful.

    More in the announcement: https://aws.amazon.com/about-aws/whats-new/2020/08/aws-lambda-now-supports-amazon-managed-streaming-for-apache-kafka-as-an-event-source/
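
    With MSK as the event source, Lambda does the polling and hands the function a batch of records. The sketch below shows one way a handler might unpack such an event. It assumes the documented event shape (a `records` map keyed by topic and partition, with base64-encoded values) and assumes the payloads are UTF-8 text; `decode_kafka_event` is a hypothetical helper name.

```python
import base64
from typing import Any, Dict, List


def decode_kafka_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a Kafka event-source payload into a list of decoded messages."""
    messages = []
    # "records" maps "<topic>-<partition>" to a list of record dicts.
    for records in event.get("records", {}).values():
        for record in records:
            raw = base64.b64decode(record["value"])
            messages.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                # Assumption: message values are UTF-8 encoded text.
                "value": raw.decode("utf-8"),
            })
    return messages


def lambda_handler(event, context):
    messages = decode_kafka_event(event)
    for message in messages:
        print(message["topic"], message["offset"], message["value"])
    return {"processed": len(messages)}
```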

    Screenshot from AWS Console:

  • 2021-01-31 19:36

    AWS now supports "self-hosted Apache Kafka as an event source for AWS Lambda"

    When you create a new Lambda, open the "Configuration" tab and click "Add trigger"; you can now select and configure your self-hosted Apache Kafka there.

    Feel free to read more here:

    https://aws.amazon.com/blogs/compute/using-self-hosted-apache-kafka-as-an-event-source-for-aws-lambda/

    https://docs.aws.amazon.com/lambda/latest/dg/kafka-smaa.html
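
    The same trigger can also be set up programmatically. As a rough sketch, the helper below builds the arguments one might pass to boto3's `create_event_source_mapping` for a self-hosted cluster reached over a VPC. The helper name and all identifiers passed to it are hypothetical, and the exact set of access configurations depends on how your cluster is secured, so check the fields against the boto3 documentation before relying on them.

```python
from typing import Any, Dict, Iterable


def self_managed_kafka_mapping(function_name: str, topic: str,
                               brokers: Iterable[str],
                               subnet_ids: Iterable[str],
                               security_group_id: str,
                               batch_size: int = 100) -> Dict[str, Any]:
    """Build keyword arguments for lambda.create_event_source_mapping."""
    # Network access is described as a list of typed URIs.
    access = [{"Type": "VPC_SUBNET", "URI": f"subnet:{subnet}"}
              for subnet in subnet_ids]
    access.append({"Type": "VPC_SECURITY_GROUP",
                   "URI": f"security_group:{security_group_id}"})
    return {
        "FunctionName": function_name,
        "Topics": [topic],
        "StartingPosition": "TRIM_HORIZON",  # start from the oldest record
        "BatchSize": batch_size,
        "SelfManagedEventSource": {
            "Endpoints": {"KAFKA_BOOTSTRAP_SERVERS": list(brokers)},
        },
        "SourceAccessConfigurations": access,
    }


# Usage (requires boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   kwargs = self_managed_kafka_mapping(
#       "my-function", "my-topic", ["broker-1:9092"],
#       ["subnet-aaaa"], "sg-bbbb")
#   boto3.client("lambda").create_event_source_mapping(**kwargs)
```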

  • 2021-01-31 19:39

    Continuing the point by Arafat: we have successfully built infrastructure to consume from Kafka using AWS Lambdas. Here are some gotchas:

    • Make sure to batch and commit consistently while consuming.
    • If you are storing the batches in S3, make sure to close your file descriptors.
    • If you are forwarding the batches to another service, make sure to clear the variables. Lambda reuses warm execution environments, so variables that stay cached between invocations might result in memory overflows.
    • It is a good idea to check how much time you have left from the context object in the Lambda and to give yourself some wiggle room to do something with the buffer you populated in your consumer, which might not be written to a file unless you call close().
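
    The points above might be sketched roughly as follows. This is an illustration under stated assumptions, not our production code: the consumer is assumed to be a kafka-python style object, `flush` is any callable you supply to persist a batch, and the 10-second margin is an arbitrary choice. `context.get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the context object.

```python
from typing import Any, Callable, List

SAFETY_MARGIN_MS = 10_000  # assumed wiggle room before the function times out


def consume_until_deadline(consumer: Any, context: Any,
                           flush: Callable[[List[Any]], None],
                           margin_ms: int = SAFETY_MARGIN_MS,
                           poll_timeout_ms: int = 1000) -> int:
    """Consume batches while enough invocation time remains."""
    total = 0
    while context.get_remaining_time_in_millis() > margin_ms:
        batches = consumer.poll(timeout_ms=poll_timeout_ms)
        if not batches:
            break
        # A fresh local buffer per iteration, so nothing accumulates
        # across batches or lingers in a warm execution environment.
        buffer = [record for records in batches.values() for record in records]
        flush(buffer)      # persist the batch first...
        consumer.commit()  # ...and only then commit the offsets
        total += len(buffer)
    return total


def write_batch(path: str, values: List[bytes]) -> None:
    """Append values to a file, one per line."""
    # Leaving the "with" block closes the descriptor, which also flushes
    # any bytes still sitting in the write buffer.
    with open(path, "ab") as handle:
        for value in values:
            handle.write(value + b"\n")
```

    Persisting before committing means a crash between the two steps replays a batch instead of losing it, so the downstream store should tolerate duplicates.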

    We are using Apache Airflow for scheduling. I hear CloudWatch can do that too.

  • 2021-01-31 19:39

    Here is the AWS article on scheduled Lambdas.
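
    For illustration, a schedule of this kind could be described with a CloudWatch Events rule. The helper below only builds the request arguments; the rule name, target id and function ARN are hypothetical, and the function name `schedule_requests` is made up for this sketch.

```python
from typing import Any, Dict, Tuple


def schedule_requests(rule_name: str, function_arn: str,
                      every_minutes: int = 1
                      ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build arguments for events.put_rule and events.put_targets."""
    # Rate expressions use the singular unit for a value of 1.
    unit = "minute" if every_minutes == 1 else "minutes"
    rule = {
        "Name": rule_name,
        "ScheduleExpression": f"rate({every_minutes} {unit})",
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [{"Id": "kafka-poller", "Arn": function_arn}],  # hypothetical id
    }
    return rule, targets


# Usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   rule, targets = schedule_requests("poll-kafka", "<function ARN>", 5)
#   events = boto3.client("events")
#   events.put_rule(**rule)
#   events.put_targets(**targets)
```

    The function also needs a resource policy that allows events.amazonaws.com to invoke it, which is not shown here.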

    Given that your Kafka installation will be running in a VPC, best practice is to configure your Lambda to run within the VPC as well; this will simplify the security group configuration for the EC2 instances running Kafka.

    Here is the AWS blog article on configuring Lambdas to run in a VPC.

  • 2021-01-31 19:42

    There is a community-provided Kafka Connector for AWS Lambda. This solution would require you to run the connector somewhere, such as on EC2 or ECS.
