How to set group.id for consumer group in kafka data source in Structured Streaming?

I want to use Spark Structured Streaming to read from a secure Kafka cluster. This means that I will need to force a specific group.id. However, as is stated in the documentation th…

4 Answers

旧巷少年郎

2020-12-06 03:29

The Structured Streaming guide seems to be quite explicit about it:

Note that the following Kafka params cannot be set and the Kafka source or sink will throw an exception:

group.id: Kafka source will create a unique group id for each query automatically.

auto.offset.reset: Set the source option startingOffsets to specify where to start instead.
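As a minimal sketch of a query that respects these restrictions (assuming the spark-sql-kafka-0-10 package is on the classpath; the broker list, topic, object name and method name are placeholders or hypothetical), the source leaves out group.id entirely and uses startingOffsets where auto.offset.reset would otherwise go:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StartingOffsetsExample {
  // Hypothetical helper: builds a streaming Kafka source without group.id
  // or auto.offset.reset, both of which the Kafka source rejects.
  def readFromEarliest(spark: SparkSession, servers: String, topic: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", servers)
      .option("subscribe", topic)
      // Used instead of auto.offset.reset=earliest. It only applies when a
      // query starts fresh; a restarted query resumes from its checkpoint.
      .option("startingOffsets", "earliest")
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StartingOffsetsExample").getOrCreate()
    // Placeholder broker list and topic name
    val df = readFromEarliest(spark, "host1:port1,host2:port2", "topic1")
    df.printSchema()
  }
}
```

Spark still generates its own unique group.id for this query behind the scenes.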

醉梦人生

2020-12-06 03:37

With Spark 3.0 you can now specify group.id for Kafka: https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#kafka-specific-configurations

醉话见心

2020-12-06 03:38
Since Spark 3.0.0

According to the Structured Streaming + Kafka Integration Guide, you can provide the consumer group as the option kafka.group.id:
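```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("KafkaGroupIdExample").getOrCreate()

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  // Requires Spark 3.0.0+; Spark 2.x throws an exception for this option
  .option("kafka.group.id", "myConsumerGroup")
  .load()
```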
However, Spark will not commit any offsets back to Kafka, so the offsets of your consumer group will not be stored in Kafka's internal topic __consumer_offsets but in Spark's checkpoint files.
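To make that concrete, here is a minimal sketch assuming Spark 3.0.0+, a console sink used purely for illustration, placeholder broker and topic names, and a hypothetical checkpoint path. The offsets this query consumes are tracked under its checkpointLocation rather than committed to Kafka:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.StreamingQuery

object CheckpointedKafkaQuery {
  // Reads with a fixed consumer group (Spark 3.0.0+ only).
  def source(spark: SparkSession): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
      .option("subscribe", "topic1")
      .option("kafka.group.id", "myConsumerGroup")
      .load()

  // Spark records processed offsets in checkpointDir; restarting the query
  // with the same directory resumes from there. Nothing is committed to
  // Kafka's __consumer_offsets topic.
  def start(df: DataFrame, checkpointDir: String): StreamingQuery =
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .option("checkpointLocation", checkpointDir)
      .start()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("CheckpointedKafkaQuery").getOrCreate()
    // Hypothetical path; use durable storage such as HDFS in practice
    val query = start(source(spark), "/tmp/checkpoints/kafka-group-example")
    query.awaitTermination()
  }
}
```

Deleting or changing the checkpoint directory therefore loses the query's position, even though the consumer group name stays the same.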

Being able to set group.id is meant to support Kafka's newer feature, Authorization using Role-Based Access Control, under which your consumer group usually needs to follow naming conventions.

A full example of a Spark 3.x application setting kafka.group.id is discussed and solved here.

死守一世寂寞

2020-12-06 03:42

Currently (v2.4.0) it is not possible.

You can check the following lines in the Apache Spark project:

https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L81 - generates the group.id

https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L534 - sets it in the properties that are used to create the KafkaConsumer

In the master branch you can find a modification that enables setting a prefix or a specific group.id:

https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L83 - generates the group.id based on a group ID prefix (groupIdPrefix)

https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L543 - sets the previously generated groupId if kafka.group.id wasn't passed in the properties
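In released versions (Spark 3.0.0 and later) the prefix is exposed as the source option groupIdPrefix. Below is a minimal sketch, assuming Spark 3.x, placeholder broker and topic names, and a hypothetical prefix value. Spark still generates a unique group.id for each query, but it begins with the given prefix instead of the default spark-kafka-source. Per the integration guide, the prefix is ignored when kafka.group.id is also set.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GroupIdPrefixExample {
  // Spark 3.0.0+: the generated group.id starts with `prefix` rather than
  // the default "spark-kafka-source". Ignored if kafka.group.id is set.
  def readWithPrefix(spark: SparkSession, prefix: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
      .option("subscribe", "topic1")
      .option("groupIdPrefix", prefix)
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("GroupIdPrefixExample").getOrCreate()
    // Hypothetical prefix value
    val df = readWithPrefix(spark, "myTeam-")
    df.printSchema()
  }
}
```

A prefix may fit setups where access to consumer groups is granted by name prefix, whereas kafka.group.id fits rules that name one exact group.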
