kafka-python

reading only specific messages from kafka topic

♀尐吖头ヾ submitted on 2020-03-05 04:56:06
Question: Scenario: I am writing JSON object data into a Kafka topic, and while reading I want to consume only a specific set of messages, based on a value present in the message. I am using the kafka-python library. Sample messages:

    {flow_status: "completed", value: 1, active: yes}
    {flow_status: "failure", value: 2, active: yes}

Here I want to read only the messages having flow_status of "completed".

Answer 1: In Kafka it's not possible to do something like that. The consumer consumes messages one by one, one after the
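Since the broker cannot filter, the standard workaround is to filter client-side after deserializing. A minimal sketch of that idea, assuming JSON-encoded values and hypothetical topic/broker names:

    import json
    from kafka import KafkaConsumer

    # Topic and broker names are illustrative.
    consumer = KafkaConsumer(
        "flows",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for record in consumer:
        # Kafka has no broker-side filtering, so skip unwanted records here.
        if record.value.get("flow_status") == "completed":
            print(record.value)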

kafka consumer seek is not working: AssertionError: Unassigned partition

杀马特。学长 韩版系。学妹 submitted on 2020-02-25 05:47:07
Question: The Kafka consumer con defined below works perfectly fine when I try to receive messages from my topic; however, it gives me trouble when I try to change the offset using the seek method or any of its variations, i.e. seek_to_beginning, seek_to_end:

    from kafka import KafkaConsumer, TopicPartition

    con = KafkaConsumer(my_topic, bootstrap_servers=my_bootstrapservers, group_id=my_groupid)
    p = con.partitions_for_topic(my_topic)
    my_partition = p.pop()
    tp = TopicPartition(topic=my_topic,
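The usual cause of "AssertionError: Unassigned partition" is seeking on a partition the consumer does not own: passing the topic to the constructor only subscribes the consumer, and seek() is valid only once partitions are actually assigned. A sketch of the common fix, using explicit assignment (names are illustrative):

    from kafka import KafkaConsumer, TopicPartition

    con = KafkaConsumer(bootstrap_servers="localhost:9092", group_id="my_group")
    tp = TopicPartition("my_topic", 0)

    con.assign([tp])           # assign explicitly instead of subscribing
    con.seek_to_beginning(tp)  # the partition is now assigned, so seek works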

Why does Kafka care about the hostname?

一笑奈何 submitted on 2020-01-30 11:07:27
Question: I did a test with the code below to send data to a topic. The Kafka version is kafka_2.12-1.1.0. The code is:

    import kafka
    print(kafka.version.__version__)

    from kafka import KafkaProducer
    producer = KafkaProducer(
        bootstrap_servers=['172.25.44.238:9092'],
        sasl_mechanism="PLAIN",
        api_version=(0, 10),
        retries=2
    )
    f = producer.send("test", "some")
    f.get()

If I change the server config like this:

    listeners=PLAINTEXT://172.25.44.238:9092

then my code can send data to my topic. If I change the server config like
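The hostname matters because the broker hands the address from its listener configuration back to clients in metadata responses, and clients then connect to that returned address; if it is unresolvable or unreachable from the client, sends fail even though the bootstrap connection worked. A hedged server.properties sketch of the usual split (the IP is the one from the question; the layout is an assumption):

    # Bind on all interfaces locally...
    listeners=PLAINTEXT://0.0.0.0:9092
    # ...but advertise an address clients can actually resolve and reach;
    # this is what the broker returns in its metadata responses.
    advertised.listeners=PLAINTEXT://172.25.44.238:9092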

Python Producer can send via shell, but not .py

雨燕双飞 submitted on 2020-01-25 00:26:06
Question: I have a running and tested Kafka cluster, and am trying to use a Python script to send messages to the brokers. This works when I use the Python3 shell and call the producer method; however, when I put these same commands into a Python file and execute it, the script seems to hang. I am using the kafka-python library for the consumer and producer. When I use the Python3 shell I can see the messages appear in the topic using Kafka GUI tool 2.0.4. I've tried various loops and statements in the
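A plausible explanation, though the excerpt is truncated: kafka-python's send() is asynchronous and only enqueues the record for a background I/O thread, so a short script can reach the end of the file and exit before anything is transmitted, while an interactive shell stays alive long enough for delivery. A minimal sketch of the usual remedy (broker and topic names are illustrative):

    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("test", b"hello from a script")

    # send() only enqueues; block until outstanding records are delivered
    # before the interpreter exits.
    producer.flush()
    producer.close()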

kafka-python read from last produced message after a consumer restart

元气小坏坏 submitted on 2020-01-24 17:00:55
Question: I am using kafka-python to consume messages from a Kafka queue (Kafka version 0.10.2.0). In particular, I am using the KafkaConsumer type. If the consumer stops and is restarted after a while, I would like to restart from the latest produced message, that is, drop all the messages produced during the time the consumer was down. How can I achieve this? Thanks.

Answer 1: You will need to seekToEnd() to the end of the log. Keep in mind that you first need to subscribe to a topic before you can seek. Also,
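A sketch of how that looks in kafka-python, where the method is spelled seek_to_end(); one assumption here is that a poll() is needed after subscribing so the group assigns partitions before seeking (names are illustrative):

    from kafka import KafkaConsumer

    con = KafkaConsumer("my_topic",
                        bootstrap_servers="localhost:9092",
                        group_id="my_group")

    con.poll(timeout_ms=0)  # join the group and receive a partition assignment
    con.seek_to_end()       # with no arguments, seeks every assigned partition

    for msg in con:
        print(msg.offset, msg.value)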

how to properly use pyspark to send data to kafka broker?

↘锁芯ラ submitted on 2020-01-22 06:45:15
Question: I'm trying to write a simple pyspark job that would receive data from a Kafka broker topic, do some transformation on that data, and put the transformed data on a different Kafka broker topic. I have the following code, which reads data from a Kafka topic, but running the sendkafka function has no effect:

    from pyspark import SparkConf, SparkContext
    from operator import add
    import sys
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils
    import json
    from
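A hedged sketch of the usual pattern for producing to Kafka from a DStream, using the same pyspark.streaming.kafka API the question imports; topic, group, and broker names are illustrative. The key point is creating the producer inside foreachPartition, since producers hold sockets and threads and cannot be pickled on the driver:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils
    from kafka import KafkaProducer

    sc = SparkContext(appName="KafkaForwarder")
    ssc = StreamingContext(sc, 10)  # 10-second batches

    stream = KafkaUtils.createStream(ssc, "localhost:2181",
                                     "forwarder-group", {"input_topic": 1})

    def send_partition(records):
        # One producer per partition, created on the executor.
        producer = KafkaProducer(bootstrap_servers="localhost:9092")
        for _, value in records:  # records arrive as (key, value) pairs
            producer.send("output_topic", value.encode("utf-8"))
        producer.flush()
        producer.close()

    stream.foreachRDD(lambda rdd: rdd.foreachPartition(send_partition))

    ssc.start()
    ssc.awaitTermination()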

Kafka delivering duplicate messages

女生的网名这么多〃 submitted on 2020-01-15 03:58:08
Question: We are using Kafka (0.9.0.0) for orchestrating command messages between different microservices. We are seeing an intermittent issue where duplicate messages get delivered to a particular topic. The logs that occur when this issue happens are given below. Can someone help to understand this issue?

    Wed, 21-Sep-2016 09:19:07 - WARNING Coordinator unknown during heartbeat -- will retry
    Wed, 21-Sep-2016 09:19:07 - WARNING Heartbeat failed; retrying
    Wed, 21-Sep-2016 09:19:07 - WARNING
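These warnings typically mean the consumer lost contact with the group coordinator, triggering a rebalance that redelivers messages consumed but not yet committed; that is expected behaviour under Kafka's at-least-once delivery. A hedged sketch of two common mitigations, with illustrative parameter values and a hypothetical handler:

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "commands",                   # hypothetical topic
        bootstrap_servers="localhost:9092",
        group_id="orchestrator",
        session_timeout_ms=30000,     # more headroom before the group evicts us
        heartbeat_interval_ms=10000,  # heartbeat well inside the session timeout
        enable_auto_commit=False,     # commit only after processing succeeds
    )

    seen = set()  # toy in-memory dedup store; use durable storage in practice
    for msg in consumer:
        key = (msg.topic, msg.partition, msg.offset)
        if key not in seen:  # make processing idempotent across redeliveries
            seen.add(key)
            handle(msg)      # hypothetical application handler
        consumer.commit()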

How to get latest offset for a partition for a kafka topic?

僤鯓⒐⒋嵵緔 submitted on 2019-12-31 08:58:07
Question: I am using the Python high-level consumer for Kafka and want to know the latest offsets for each partition of a topic. However, I cannot get it to work.

    from kafka import TopicPartition
    from kafka.consumer import KafkaConsumer

    con = KafkaConsumer(bootstrap_servers=brokers)
    ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]
    con.assign(ps)
    for p in ps:
        print "For partition %s highwater is %s" % (p.partition, con.highwater(p))
    print "Subscription = %s" % con.subscription()
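One likely issue is that highwater() is only populated after records have actually been fetched for a partition. A sketch of an alternative that queries the broker directly, assuming a kafka-python version new enough to have end_offsets() (broker/topic names are illustrative):

    from kafka import KafkaConsumer, TopicPartition

    con = KafkaConsumer(bootstrap_servers="localhost:9092")
    parts = [TopicPartition("my_topic", p)
             for p in con.partitions_for_topic("my_topic")]

    # end_offsets() asks the broker for each partition's log-end offset,
    # so no assignment or fetching is required first.
    for tp, offset in con.end_offsets(parts).items():
        print("Partition %d: latest offset %d" % (tp.partition, offset))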

PySpark Processing Stream data and saving processed data to file

坚强是说给别人听的谎言 submitted on 2019-12-25 08:04:31
Question: I am trying to replicate a device that is streaming its location coordinates, then process the data and save it to a text file. I am using Kafka and Spark Streaming (on pyspark). This is my architecture:

1 - The Kafka producer emits data to a topic named test in the following string format: "LG float LT float", for example: LG 8100.25191107 LT 8406.43141483

Producer code:

    from kafka import KafkaProducer
    import random

    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    for i in range(0
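A hedged sketch of the consuming side under the same architecture: parse the "LG float LT float" strings and persist each batch with saveAsTextFiles(). The Zookeeper address, group name, and output prefix are assumptions; the topic name test comes from the question:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="CoordinateLogger")
    ssc = StreamingContext(sc, 5)  # 5-second batches

    stream = KafkaUtils.createStream(ssc, "localhost:2181",
                                     "coord-group", {"test": 1})

    def parse(line):
        # "LG 8100.25191107 LT 8406.43141483" -> (8100.25191107, 8406.43141483)
        parts = line.split()
        return float(parts[1]), float(parts[3])

    coords = stream.map(lambda kv: kv[1]).map(parse)

    # Writes one directory of part files per batch: processed-<timestamp>.txt/
    coords.saveAsTextFiles("processed", "txt")

    ssc.start()
    ssc.awaitTermination()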

Zookeeper - Kafka: ConnectException - Connection refused

寵の児 submitted on 2019-12-24 00:57:25
Question: I am trying to set up 3 Kafka brokers on Ubuntu EC2 machines, but I am getting a ConnectException while starting ZooKeeper. All the ports in the security group of my EC2 instances are already open. Below is the stack trace:

    [2016-03-03 07:37:12,040] ERROR Exception while listening (org.apache.zookeeper.server.quorum.QuorumCnxManager)
    java.net.BindException: Cannot assign requested address
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.AbstractPlainSocketImpl.bind
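On EC2, "Cannot assign requested address" from QuorumCnxManager commonly means a node is trying to bind its quorum port on an address the instance does not own, e.g. when zoo.cfg lists public or elastic IPs. A hedged zoo.cfg sketch with illustrative private addresses, where each machine replaces its own entry with 0.0.0.0:

    # zoo.cfg on server 1 (addresses are hypothetical):
    server.1=0.0.0.0:2888:3888     # this machine binds on all interfaces
    server.2=10.0.0.12:2888:3888
    server.3=10.0.0.13:2888:3888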