kafka-python (1.0.0) throws an error while connecting to the broker, even though /usr/bin/kafka-console-producer and /usr/bin/kafka-console-consumer work fine.
Install kafka-python with pip:
pip install kafka-python
Steps to create a Kafka data pipeline:
1. Run ZooKeeper with the shell commands below, or install zookeeperd:
sudo apt-get install zookeeperd
This runs ZooKeeper as a daemon, listening on port 2181 by default.
Here are the commands to start ZooKeeper and then the Kafka broker (run each server in its own terminal, since both stay in the foreground):
cd kafka-directory
./bin/zookeeper-server-start.sh ./config/zookeeper.properties
./bin/kafka-server-start.sh ./config/server.properties
Now that ZooKeeper and the Kafka server are running, run the producer.py and consumer.py scripts.
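Before running the scripts, it can help to confirm that both servers are actually accepting connections. The following is a minimal sketch using only the standard library; the helper name port_is_open is made up for illustration, and it assumes ZooKeeper on port 2181 and the broker on port 9092 (the port used by the scripts in this answer). A successful TCP connect only shows that something is listening, not that the broker is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Assumed ports: 2181 for ZooKeeper, 9092 for the Kafka broker.
    for name, port in [("zookeeper", 2181), ("kafka", 9092)]:
        state = "reachable" if port_is_open("localhost", port) else "NOT reachable"
        print("%s on localhost:%d is %s" % (name, port, state))
```

If the broker port is not reachable here, kafka-python will not be able to connect either, so check the listener settings in config/server.properties first.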
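producer.py:
from kafka import KafkaProducer
import time  # only needed if you add a delay between sends

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])
topic = 'test'
lines = ["1", "2", "3", "4", "5", "6", "7", "8"]
for line in lines:
    try:
        # send() is asynchronous; get() blocks until the broker acknowledges
        # the message or the 10-second timeout expires
        producer.send(topic, line.encode("utf-8")).get(timeout=10)
    except Exception as e:
        # a failed or timed-out send surfaces here as an exception
        print(e)
        continue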
consumer.py:
from kafka import KafkaConsumer

topic = 'test'
consumer = KafkaConsumer(topic, bootstrap_servers=['localhost:9092'])
for message in consumer:
    # message value and key are raw bytes -- decode if necessary!
    # e.g., for unicode: `message.value.decode('utf-8')`
    # print ("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
    #                                       message.offset, message.key,
    #                                       message.value))
    print(message)
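As the comments in consumer.py say, the key and value arrive as raw bytes. A small sketch of decoding them before printing is shown here; format_record is a hypothetical helper, and the Record namedtuple is only a stand-in so the example runs without a broker (real messages from KafkaConsumer expose the same topic, partition, offset, key, and value attributes).

```python
from collections import namedtuple


def format_record(message):
    """Build a readable line from a consumer message, decoding bytes as UTF-8."""
    # key is None when the producer did not set one, as in producer.py
    key = message.key.decode("utf-8") if message.key is not None else None
    value = message.value.decode("utf-8") if message.value is not None else None
    return "%s:%d:%d: key=%s value=%s" % (
        message.topic, message.partition, message.offset, key, value)


# Stand-in for a consumed message, for illustration only.
Record = namedtuple("Record", ["topic", "partition", "offset", "key", "value"])

print(format_record(Record("test", 0, 5, None, b"1")))
# prints: test:0:5: key=None value=1
```

Inside the consumer loop this would be used as print(format_record(message)) in place of print(message).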
Now run producer.py and consumer.py in separate terminals to see the live data. Start the consumer first: by default it only reads messages that arrive after it has started.
Note: The producer.py script above sends its messages once and exits. To keep it running, wrap the send loop in a while loop and pause between rounds using the time module.
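One way the note above could look in code is sketched here. The function send_forever and its parameters interval_s and max_rounds are made up for this example; it accepts any producer object with a KafkaProducer-style send(topic, value) that returns a future supporting get(timeout=...), so it can be tried without a broker.

```python
import time


def send_forever(producer, topic, lines, interval_s=1.0, max_rounds=None):
    """Send every line to `topic` repeatedly, sleeping between rounds.

    max_rounds=None loops until interrupted (Ctrl+C); a number stops after
    that many rounds. Returns the count of successfully sent messages.
    """
    sent = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for line in lines:
            try:
                # Block until the send is acknowledged or 10 seconds pass.
                producer.send(topic, line.encode("utf-8")).get(timeout=10)
                sent += 1
            except Exception as e:
                # Log the failure and move on to the next line.
                print(e)
        rounds += 1
        time.sleep(interval_s)
    return sent
```

With a running broker, this would be called with the producer from producer.py, for example send_forever(producer, 'test', lines), which keeps resending the eight lines once per second until the script is stopped.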