kafka-python

Python kafka consumer group id issue

半世苍凉 submitted on 2019-12-04 21:05:26
AFAIK, the concept of partitions and (consumer) groups in Kafka was introduced to implement parallelism. I am working with Kafka through Python. I have a topic with (say) 2 partitions, which means that if I start a consumer group with 2 consumers in it, they will be mapped (subscribed) to different partitions. But using the kafka-python library, I came across a weird issue: I started 2 consumers with the same group-id and started threads for them to consume messages, yet every message in the Kafka stream is being consumed by both of them! This seems ridiculous to…
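A common item on the checklist for this behaviour (a sketch of the usual setup, not a confirmed diagnosis of the question above) is that group coordination only takes effect when every consumer passes the identical group_id against a 0.9+ broker; otherwise each KafkaConsumer independently reads all partitions. A minimal sketch, assuming a broker on localhost:9092 and a two-partition topic named my-topic:

    from threading import Thread
    from kafka import KafkaConsumer

    def consume(name):
        # Both consumers pass the identical group_id, so the broker's group
        # coordinator assigns each of them a disjoint subset of the partitions.
        consumer = KafkaConsumer(
            'my-topic',
            group_id='my-group',
            bootstrap_servers='localhost:9092',
            auto_offset_reset='earliest',
        )
        for msg in consumer:
            print(name, msg.partition, msg.offset, msg.value)

    threads = [Thread(target=consume, args=(name,)) for name in ('consumer-1', 'consumer-2')]
    for t in threads:
        t.start()
    for t in threads:
        t.join()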

kafka-python consumer start reading from offset (automatically)

倖福魔咒の submitted on 2019-12-04 14:21:31
Question: I'm trying to build an application with kafka-python where a consumer reads data from a range of topics. It is extremely important that the consumer never reads the same message twice, but also never misses a message. Everything seems to be working fine, except when I turn off the consumer (e.g. on failure) and try to resume reading from the last offset. I can only read all the messages from the topic (which creates double reads) or listen for new messages only (and miss messages that were emitted during…
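The usual pattern for this requirement (a sketch only; topic, group and broker address are placeholders, and process() stands for whatever handling the application does) is to disable auto-commit, commit an offset only after the message has been handled, and let auto_offset_reset act purely as the fallback for when the group has no committed offset yet:

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        'my-topic',
        group_id='my-group',
        bootstrap_servers='localhost:9092',
        enable_auto_commit=False,      # nothing is committed until we say so
        auto_offset_reset='earliest',  # only used when no committed offset exists
    )

    for msg in consumer:
        process(msg)        # hypothetical handler; must finish before the commit
        consumer.commit()   # after a restart, reading resumes from here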

kafka-python - How do I commit a partition?

ε祈祈猫儿з submitted on 2019-12-04 10:46:45
Using kafka-python-1.0.2. If I have a topic with 10 partitions, how do I go about committing a particular partition while looping through the various partitions and messages? I just can't seem to find an example of this anywhere, in the docs or otherwise. From the docs, I want to use: consumer.commit(offset=offsets). Specifically, how do I create the partition and OffsetAndMetadata dictionary required for offsets (dict, optional) – {TopicPartition: OffsetAndMetadata}? I was hoping the function call would just be something like consumer.commit(partition, offset), but this does not seem to be the…
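A sketch of what that dictionary can look like, assuming kafka-python's OffsetAndMetadata namedtuple of (offset, metadata) (importable from kafka.structs in recent versions); the topic, group and broker address below are placeholders:

    from kafka import KafkaConsumer, TopicPartition
    from kafka.structs import OffsetAndMetadata

    consumer = KafkaConsumer(
        'my-topic',
        group_id='my-group',
        bootstrap_servers='localhost:9092',
        enable_auto_commit=False,
    )

    for msg in consumer:
        tp = TopicPartition(msg.topic, msg.partition)
        # Commit the position *after* this message, only for its own partition.
        consumer.commit({tp: OffsetAndMetadata(msg.offset + 1, None)})

Committing msg.offset + 1 matters: the committed value is the next offset the group should read, not the offset of the message just processed.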

multiprocessing in kafka-python

十年热恋 submitted on 2019-12-04 05:34:11
I have been using the kafka-python module to consume from a Kafka broker. I want to consume from the same topic with 'x' number of partitions in parallel. The documentation has this: # Use multiple consumers in parallel w/ 0.9 kafka brokers # typically you would run each on a different server / process / CPU consumer1 = KafkaConsumer('my-topic', group_id='my-group', bootstrap_servers='my.server.com') consumer2 = KafkaConsumer('my-topic', group_id='my-group', bootstrap_servers='my.server.com') Does this mean I can create a separate consumer for each process that I spawn? Also, will there be an…
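A minimal multiprocessing sketch along the lines of that documentation snippet (the server name and topic are taken from the excerpt; everything else is a placeholder): one KafkaConsumer per process, all sharing the same group_id so the topic's partitions are divided between them:

    from multiprocessing import Process
    from kafka import KafkaConsumer

    def consume(worker_id):
        # Each process builds its own consumer; instances are never shared across
        # processes. The common group_id makes the broker split the partitions.
        consumer = KafkaConsumer(
            'my-topic',
            group_id='my-group',
            bootstrap_servers='my.server.com',
        )
        for msg in consumer:
            print(worker_id, msg.partition, msg.offset)

    if __name__ == '__main__':
        workers = [Process(target=consume, args=(i,)) for i in range(2)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()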

How to pass data from Kafka to Spark Streaming?

微笑、不失礼 submitted on 2019-12-03 17:47:38
Question: I am trying to pass data from Kafka to Spark Streaming. This is what I've done till now: installed both Kafka and Spark; started Zookeeper with the default properties config; started the Kafka server with the default properties config; started a Kafka producer; started a Kafka consumer; sent a message from the producer to the consumer, which works fine; wrote kafka-spark.py to receive messages from Kafka in Spark. When I try running ./bin/spark-submit examples/src/main/python/kafka-spark.py I get an error. kafka-spark.py - from _…

How to pass data from Kafka to Spark Streaming?

扶醉桌前 submitted on 2019-12-03 07:33:32
I am trying to pass data from Kafka to Spark Streaming. This is what I've done till now: installed both Kafka and Spark; started Zookeeper with the default properties config; started the Kafka server with the default properties config; started a Kafka producer; started a Kafka consumer; sent a message from the producer to the consumer, which works fine; wrote kafka-spark.py to receive messages from Kafka in Spark. When I try running ./bin/spark-submit examples/src/main/python/kafka-spark.py I get an error. kafka-spark.py - from __future__ import print_function import sys from pyspark.streaming import StreamingContext from pyspark…
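For reference, a minimal receiver-based sketch of what kafka-spark.py typically contains (the Zookeeper address, topic and group below are placeholders, and it assumes the old pyspark.streaming.kafka 0-8 connector). A frequent cause of errors at the spark-submit step is that the matching spark-streaming-kafka assembly jar was not shipped with the job via --jars or --packages:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaSparkSketch")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    # Receiver-based stream: (zookeeper quorum, consumer group, {topic: #threads})
    stream = KafkaUtils.createStream(ssc, "localhost:2181", "spark-group", {"test": 1})
    stream.map(lambda kv: kv[1]).pprint()  # each record is a (key, message) pair

    ssc.start()
    ssc.awaitTermination()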

how to properly use pyspark to send data to kafka broker?

混江龙づ霸主 submitted on 2019-12-02 21:14:47
I'm trying to write a simple pyspark job that would receive data from a Kafka broker topic, do some transformation on that data, and put the transformed data on a different Kafka broker topic. I have the following code, which reads data from a Kafka topic but has no effect when running the sendkafka function: from pyspark import SparkConf, SparkContext from operator import add import sys from pyspark.streaming import StreamingContext from pyspark.streaming.kafka import KafkaUtils import json from kafka import SimpleProducer, KafkaClient def sendkafka(messages): kafka = KafkaClient("localhost:9092")…
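One common shape for the "write back" half (a sketch under assumptions: output topic 'transformed', broker on localhost:9092, dummy input data; KafkaProducer is used in place of the deprecated SimpleProducer/KafkaClient pair) is to create the producer inside foreachPartition so it runs on the executors rather than the driver:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from kafka import KafkaProducer

    sc = SparkContext(appName="SendToKafkaSketch")
    ssc = StreamingContext(sc, 5)  # 5-second micro-batches

    def send_partition(records):
        # Runs on an executor for each partition of the RDD.
        producer = KafkaProducer(bootstrap_servers='localhost:9092')
        for record in records:
            producer.send('transformed', str(record).encode('utf-8'))
        producer.flush()
        producer.close()

    # "stream" stands for whatever DStream the transformation steps produce;
    # a queueStream of dummy data keeps this sketch self-contained.
    stream = ssc.queueStream([sc.parallelize(['a', 'b', 'c'])])
    stream.foreachRDD(lambda rdd: rdd.foreachPartition(send_partition))

    ssc.start()
    ssc.awaitTermination()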

KafkaTimeoutError('Failed to update metadata after 60.0 secs.')

北慕城南 submitted on 2019-12-02 05:04:34
Question: I am writing a Kafka producer using Python 3.6. The kafka-python client version is 1.4.4, and the Kafka broker versions I tried are 2.1.0 and 1.1.1, but when I write a message to the producer, it throws this error: KafkaTimeoutError('Failed to update metadata after 60.0 secs.') This is my client code: producer = KafkaProducer( bootstrap_servers=['mq-server:9092'], api_version = (0,10,2,0) # solve no broker error ) producer.send("dolphin-test".encode('utf-8'),b"test") This is the server config I modified:

NoBrokersAvailable: NoBrokersAvailable-Kafka Error

99封情书 submitted on 2019-12-02 00:43:52
Question: I have already started to learn Kafka and am trying basic operations on it. I am stuck on a point about the 'Brokers'. My Kafka is running, but I get an error when I want to create a partition: from kafka import TopicPartition (ERROR THERE) consumer = KafkaConsumer(bootstrap_servers='localhost:1234') consumer.assign([TopicPartition('foobar', 2)]) msg = next(consumer) Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.7/dist-packages/kafka/consumer/group.py", line 284, in…
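NoBrokersAvailable usually just means nothing is listening at the address the client was given; the snippet above points the consumer at localhost:1234, while Kafka's default listener port is 9092. A minimal sketch of the same assign-and-read flow against the default port (the port is an assumption about this particular setup):

    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
    consumer.assign([TopicPartition('foobar', 2)])  # topic/partition from the question
    msg = next(consumer)                            # blocks until a record arrives
    print(msg.offset, msg.value)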

KafkaTimeoutError('Failed to update metadata after 60.0 secs.')

雨燕双飞 submitted on 2019-12-01 23:06:48
I am writing a Kafka producer using Python 3.6. The kafka-python client version is 1.4.4, and the Kafka broker versions I tried are 2.1.0 and 1.1.1, but when I write a message to the producer, it throws this error: KafkaTimeoutError('Failed to update metadata after 60.0 secs.') This is my client code: producer = KafkaProducer( bootstrap_servers=['mq-server:9092'], api_version = (0,10,2,0) # solve no broker error ) producer.send("dolphin-test".encode('utf-8'),b"test") This is the server config I modified: listeners=PLAINTEXT://10.142.0.2:9092 advertised.listeners=PLAINTEXT://10.142.0.2:9092 When using…
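Two things are worth checking here (a sketch, not a confirmed fix for this exact setup): the topic argument to send() should be a plain str rather than bytes (only the value and optional key need encoding), and advertised.listeners must advertise an address that the client machine can actually reach, since the "Failed to update metadata" timeout typically means the client never receives usable broker metadata. A hedged sketch of the client side:

    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=['mq-server:9092'])

    # Topic name as str; value (and optional key) as bytes.
    future = producer.send('dolphin-test', b'test')
    record_metadata = future.get(timeout=60)  # raises instead of failing silently
    print(record_metadata.topic, record_metadata.partition, record_metadata.offset)
    producer.flush()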