apache-kafka

What is the difference between Kafka partitions and Kafka replicas?

Submitted by 时光毁灭记忆、已成空白 on 2021-02-05 09:21:46
Question: I created a 3-broker Kafka setup with broker ids 20, 21, and 22, then created this topic:

bin/kafka-topics.sh --zookeeper localhost:2181 \
  --create --topic zeta --partitions 4 --replication-factor 3

which resulted in:

When a producer sends the message "hello world" to topic zeta, to which partition does Kafka first write the message? Does the "hello world" message get replicated in all 4 partitions? Does each of the 3 brokers contain all 4 partitions? How is that related to replica
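A message is written to exactly one of the 4 partitions; replication-factor 3 then copies that single partition's log to 3 brokers, so replicas duplicate a partition across brokers rather than spreading a message across partitions. A minimal producer sketch illustrating partition choice, assuming kafka-python and a placeholder broker address (neither appears in the question):

```python
from kafka import KafkaProducer

# Assumed broker address; the question's brokers have ids 20, 21, 22.
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# No key: the default partitioner spreads records across partitions 0-3,
# but each record still lands in only ONE partition.
producer.send("zeta", b"hello world")

# With a key: hash(key) % num_partitions pins the record to one partition,
# so all records sharing a key stay together.
producer.send("zeta", key=b"user-42", value=b"hello world")
producer.flush()
```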

Spark structured streaming with kafka leads to only one batch (Pyspark)

Submitted by 扶醉桌前 on 2021-02-05 08:47:26
Question: I have the following code and I'm wondering why it generates only one batch:

df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "IP").option("subscribe", "Topic").option("startingOffsets", "earliest").load()
# groupby on sliding windows
query = slidingWindowsDF.writeStream.queryName("bla").outputMode("complete").format("memory").start()

The application is launched with the following parameters:

spark.streaming.backpressure.initialRate 5
spark.streaming.backpressure
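One plausible explanation, offered as a hedged sketch rather than as the post's accepted answer: the spark.streaming.backpressure.* settings apply only to the old DStream API and are ignored by Structured Streaming, so with startingOffsets set to "earliest" the first micro-batch reads the entire backlog at once. In Structured Streaming, batch size from Kafka is capped with maxOffsetsPerTrigger:

```python
# Same reader as the question, plus maxOffsetsPerTrigger; "IP" and "Topic"
# are the question's own placeholders.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "IP")
      .option("subscribe", "Topic")
      .option("startingOffsets", "earliest")
      .option("maxOffsetsPerTrigger", 10000)  # at most 10k offsets per micro-batch
      .load())
```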

ClickHouse JSON parse exception: Cannot parse input: expected ',' before

Submitted by 别等时光非礼了梦想. on 2021-02-05 08:34:06
Question: I'm trying to add JSON data to ClickHouse from Kafka. Here's the simplified JSON:

{ ... "sendAddress": { "sendCommChannelTypeId": 4, "sendCommChannelTypeCode": "SMS", "sendAddress": "789345345945" }, ... }

Here are the steps: create a table in ClickHouse, create another table using the Kafka engine, create a MATERIALIZED VIEW connecting the two tables, and connect ClickHouse with Kafka. Creating the first table:

CREATE TABLE tab ( ... sendAddress Tuple (sendCommChannelTypeId Int32,
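One possible workaround, purely my assumption rather than anything from the post: flatten the nested sendAddress object on the producer side, so the Kafka-engine table only parses flat JSONEachRow fields instead of a Tuple. The client library (kafka-python), topic name, and flattened field names below are all hypothetical:

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # placeholder broker

def flatten(event: dict) -> dict:
    """Lift the nested sendAddress object into top-level scalar fields."""
    addr = event.pop("sendAddress", {})
    event["sendCommChannelTypeId"] = addr.get("sendCommChannelTypeId")
    event["sendCommChannelTypeCode"] = addr.get("sendCommChannelTypeCode")
    event["sendAddressValue"] = addr.get("sendAddress")
    return event

event = {"sendAddress": {"sendCommChannelTypeId": 4,
                         "sendCommChannelTypeCode": "SMS",
                         "sendAddress": "789345345945"}}
producer.send("ch_topic", json.dumps(flatten(event)).encode("utf-8"))
producer.flush()
```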

How to distribute data evenly in Kafka producing messages through Spark?

Submitted by 依然范特西╮ on 2021-02-05 08:10:45
Question: I have a streaming job that writes data into Kafka, and I've noticed that one of the Kafka partitions (#3) takes more data than the others:

+-----------+-----------+-----------------+-------------+
| partition | messages  | earliest offset | next offset |
+-----------+-----------+-----------------+-------------+
| 1         | 166522754 | 5861603324      | 6028126078  |
| 2         | 152251127 | 6010226633      | 6162477760  |
| 3         | 382935293 | 6332944925      | 6715880218  |
| 4         | 188126274 | 6171311709      | 6359437983  |
| 5         | 188270700 | 6100140089      |
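A common cause of this skew is an uneven message key, since the default partitioner maps each key hash to a fixed partition. A hedged PySpark sketch of one remedy (the broker, topic, and choice of a null key are my assumptions, not the post's answer): records written with a null key are spread across partitions by Kafka's partitioner instead of being pinned by key.

```python
# Write each row as JSON with a null key so Kafka balances partitions itself.
# Broker and topic names are placeholders.
(df.selectExpr("CAST(null AS STRING) AS key", "to_json(struct(*)) AS value")
   .write
   .format("kafka")
   .option("kafka.bootstrap.servers", "broker1:9092")
   .option("topic", "events")
   .save())
```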

Kafka 1.1.0 keeps getting partition leader epoch warnings

Submitted by 巧了我就是萌 on 2021-02-05 07:29:04
Question: I have a problem with Kafka. I upgraded Kafka from version 0.11.0.1 to 1.1.0. After the upgrade, I keep getting the warning below:

[2018-06-19 13:34:45,377] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:350280659}, Current: {epoch:4, offset:126401625} for Partition: __consumer_offsets-48 (kafka.server.epoch.LeaderEpochFileCache)
[2018-06-19 13:34:45,386] WARN Received a

What does a dash represent in CURRENT-OFFSET?

Submitted by 帅比萌擦擦* on 2021-02-04 21:15:47
Question: Referring to the consumer-group description in the screenshot below, I am trying to understand what "-" means for CURRENT-OFFSET. Does it mean that no messages have been consumed from partitions 1 and 3, even though those partitions are allocated to a consumer? The LOG-END-OFFSET for partitions 1 and 3 is 281 and 277, respectively.

Answer 1: CURRENT-OFFSET means the current max offset of the consumed messages of the partition for this consumer instance, whereas LOG-END-OFFSET is the offset of the latest message in
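In practice, "-" appears when the group has not yet committed an offset for that partition. The same check can be made in code; this sketch assumes kafka-python and placeholder group/topic names, whereas the post only uses the CLI:

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(group_id="my-group",            # placeholder group
                         bootstrap_servers="localhost:9092",
                         enable_auto_commit=False)
tp = TopicPartition("my-topic", 1)                       # placeholder topic
# committed() returns None until the group commits an offset for partition 1,
# which is exactly when the CLI shows "-" in CURRENT-OFFSET.
print(consumer.committed(tp))
```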

Consuming multiple Kafka topics with a regex

Submitted by 女生的网名这么多〃 on 2021-02-04 19:31:26
Question:

consumer.subscribe(Pattern.compile(".*"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> clctn) {
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> clctn) {
    }
});

How do I consume all topics with a regex in apache/kafka? I tried the code above, but it didn't work.

Answer 1: For regex use the following signature: KafkaConsumer.subscribe(Pattern pattern, ConsumerRebalanceListener listener). E.g. the following code snippet enables the
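For comparison, the same regex subscription in Python; kafka-python, the broker address, and the group id are my substitutions for the question's Java client:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers="localhost:9092",  # placeholder broker
                         group_id="all-topics-group")         # placeholder group
consumer.subscribe(pattern=".*")  # matches every topic the group may read

for record in consumer:
    print(record.topic, record.partition, record.offset)
```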

How to stream data from Kafka topic to Delta table using Spark Structured Streaming

Submitted by 纵饮孤独 on 2021-02-04 18:09:05
Question: I'm trying to understand Databricks Delta and am planning a POC using Kafka. The plan is to consume data from Kafka and insert it into a Databricks Delta table. These are the steps that I did:

Create a Delta table on Databricks:

%sql
CREATE TABLE hazriq_delta_trial2 (
  value STRING
)
USING delta
LOCATION '/delta/hazriq_delta_trial2'

Consume data from Kafka:

import org.apache.spark.sql.types._
val kafkaBrokers = "broker1:port,broker2:port,broker3:port"
val kafkaTopic = "kafkapoc"
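A hedged sketch of the remaining step, written in PySpark rather than the post's Scala; the checkpoint path is a placeholder I introduced, while the broker and topic values come from the post:

```python
# Stream the raw Kafka value column straight into the Delta location.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:port,broker2:port,broker3:port")
      .option("subscribe", "kafkapoc")
      .load()
      .selectExpr("CAST(value AS STRING) AS value"))

(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/delta/hazriq_delta_trial2/_checkpoints")  # assumed path
   .start("/delta/hazriq_delta_trial2"))
```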

Spark Structured Streaming with Confluent Cloud Kafka connectivity issue

Submitted by 落爺英雄遲暮 on 2021-02-04 16:41:16
Question: I am writing a Spark Structured Streaming application in PySpark to read data from Kafka in Confluent Cloud. The documentation for the Spark readStream() function is shallow and doesn't say much about the optional parameters, especially the auth mechanism. I am not sure which parameter is wrong and breaks the connectivity. Can anyone with experience in Spark help me establish this connection?

Required Parameter

> Consumer({'bootstrap.servers':
> 'cluster.gcp.confluent.cloud:9092
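A hedged sketch of the options Confluent Cloud typically needs (the topic name and credentials are placeholders, and this is a common pattern rather than necessarily the post's accepted answer): Kafka client settings pass through the "kafka." option prefix, with SASL_SSL/PLAIN configured via a JAAS string.

```python
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "cluster.gcp.confluent.cloud:9092")
      .option("subscribe", "my-topic")  # placeholder topic
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      # API key/secret from the Confluent Cloud console go here.
      .option("kafka.sasl.jaas.config",
              'org.apache.kafka.common.security.plain.PlainLoginModule required '
              'username="<API_KEY>" password="<API_SECRET>";')
      .load())
```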