kafka-python

Does Kafka guarantee message ordering within a single partition with ANY config param values?

寵の児 submitted on 2019-12-23 12:09:14

Question: If I set these Kafka producer config parameters:

1. retries = 3
2. max.in.flight.requests.per.connection = 5

then it is likely that messages within one partition will not stay in send order. Does Kafka take any extra step to make sure that messages within a partition remain in sent order, or is it possible with the above configuration to have out-of-order messages within a partition?

Answer 1: Unfortunately, no. With your current configuration, there is a chance messages will arrive unordered because of
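The trade-off can be sketched with kafka-python's `KafkaProducer` keyword arguments (the broker address is an assumption). With `retries > 0` and several requests in flight, a failed batch can succeed on retry *after* a later batch, reordering the partition; capping in-flight requests at 1 removes that window:

```python
# Producer settings under which send order within a partition is preserved.
ordering_safe = {
    "bootstrap_servers": "localhost:9092",  # assumed broker address
    "retries": 3,
    # With only one outstanding request per connection, a retried batch
    # can never overtake a later one, so per-partition order is kept.
    "max_in_flight_requests_per_connection": 1,
    "acks": "all",
}

# from kafka import KafkaProducer   # requires a running broker
# producer = KafkaProducer(**ordering_safe)
```

The cost is throughput: with one in-flight request the producer waits for each acknowledgement before sending the next batch.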

How to stop Python Kafka Consumer in program?

陌路散爱 submitted on 2019-12-23 09:18:12

Question: I am writing a Python Kafka consumer (trying to use kafka.consumer.SimpleConsumer or kafka.consumer.simple.SimpleConsumer from http://kafka-python.readthedocs.org/en/latest/apidoc/kafka.consumer.html). When I run the following piece of code, it runs forever, even after all messages have been consumed. I want the consumer to stop once it has consumed all the messages. How can I do that? I also have no idea how to use the stop() function (defined in the base class kafka.consumer.base.Consumer). UPDATE I used signal
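One way to make the loop terminate is the newer `KafkaConsumer` API's `consumer_timeout_ms` option: after that many milliseconds with no new message, the iterator ends instead of blocking forever. A minimal sketch (topic and server names are assumptions; the drain loop itself is testable without a broker):

```python
def drain(consumer):
    """Collect messages until the consumer's iterator is exhausted.

    With consumer_timeout_ms set, KafkaConsumer's iterator stops after
    that long without a new message, so this plain for-loop returns
    instead of blocking forever."""
    messages = []
    for message in consumer:
        messages.append(message)
    return messages

# from kafka import KafkaConsumer
# consumer = KafkaConsumer('my-topic',
#                          bootstrap_servers=['localhost:9092'],
#                          consumer_timeout_ms=5000)  # stop after 5 s idle
# backlog = drain(consumer)
```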

Python kafka consumer group id issue

喜欢而已 submitted on 2019-12-22 01:18:07

Question: AFAIK, the concept of partitions and (consumer) groups in Kafka was introduced to implement parallelism. I am working with Kafka through Python. I have a topic with (say) 2 partitions. This means that if I start a consumer group with 2 consumers in it, they will be mapped (subscribed) to different partitions. But, using the kafka library in Python, I came across a weird issue. I started 2 consumers with essentially the same group id and started threads for them to consume messages.
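The expected behaviour can be illustrated with a small stand-in for the group coordinator's assignor: every partition is owned by exactly one member of a group, so two consumers sharing a group id should never both receive the same message. This is a rough model of a round-robin assignor, not kafka-python code; note also that group coordination requires a 0.9+ broker and the `KafkaConsumer` API, not the old `SimpleConsumer`:

```python
def assign_round_robin(members, partitions):
    """Rough model of a round-robin partition assignor: each partition
    goes to exactly one group member, which is what gives a consumer
    group parallelism without duplicate delivery."""
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment
```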

multiprocessing in kafka-python

佐手、 submitted on 2019-12-21 12:27:42

Question: I have been using the kafka-python module to consume from a Kafka broker. I want to consume from the same topic with 'x' number of partitions in parallel. The documentation has this:

```python
# Use multiple consumers in parallel w/ 0.9 kafka brokers
# typically you would run each on a different server / process / CPU
consumer1 = KafkaConsumer('my-topic', group_id='my-group',
                          bootstrap_servers='my.server.com')
consumer2 = KafkaConsumer('my-topic', group_id='my-group',
                          bootstrap_servers='my.server.com')
```
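A process-per-consumer skeleton might look like the following; the Kafka-specific part is left as comments because it needs a live 0.9+ broker, and the topic/group/server names are taken from the doc snippet above. The key point is that each process builds its own client, since kafka-python clients are not safe to share across processes:

```python
import multiprocessing

def consumer_worker(index, topic, results):
    # A real worker would create its own consumer here, e.g.:
    #   from kafka import KafkaConsumer
    #   consumer = KafkaConsumer(topic, group_id='my-group',
    #                            bootstrap_servers='my.server.com')
    #   for message in consumer:
    #       ...process message...
    # With the same group_id in every process, the group coordinator
    # hands each process a disjoint subset of the topic's partitions.
    results.put((index, topic))  # stand-in for the consume loop

def run_workers(topic, num_workers):
    """Spawn one worker process per desired consumer."""
    results = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=consumer_worker,
                                args=(i, topic, results))
        for i in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sorted(results.get() for _ in workers)
```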

Kafka optimal retention and deletion policy

烂漫一生 submitted on 2019-12-18 12:34:15

Question: I am fairly new to Kafka, so forgive me if this question is trivial. I have a very simple setup for timing tests, as follows:

Machine A -> writes to topic 1 (broker) -> Machine B reads from topic 1
Machine B -> writes the message just read to topic 2 (broker) -> Machine A reads from topic 2

Now I am sending messages of roughly 1400 bytes in an infinite loop, filling up the space on my small broker very quickly. I'm experimenting with setting different values for log.retention.ms, log
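One detail often bites on a small broker: retention is enforced per closed log segment, so with the default 1 GB segment size nothing is deleted until a segment fills and rolls, no matter how small log.retention.ms is. A hypothetical server.properties fragment for a cramped test broker (values are placeholders, not recommendations):

```
# server.properties -- hypothetical values for a small test broker
log.cleanup.policy=delete
log.retention.ms=60000                 # keep data for ~1 minute...
log.retention.bytes=104857600          # ...or at most ~100 MB per partition
log.segment.bytes=16777216             # small segments so old ones can roll
log.retention.check.interval.ms=30000  # how often the cleaner checks
```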

Consumer consuming the same message twice at the start only

让人想犯罪 __ submitted on 2019-12-13 03:18:12

Question: On the very first run, my consumer consumes the same message twice; this happens only at the first consumption, after that each message is consumed only once. The consumer configuration code is attached below; please check it for corrections.

```python
def __init__(self, group_id, topic='default',
             bootstrap_servers=['localhost:9092']):
    self.topic = topic
    self.bootstrap_servers = bootstrap_servers
    self.group_id = group_id
    self.consumer = KafkaConsumer(
        self.topic,
        bootstrap_servers=self.bootstrap_servers,
```
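Kafka is at-least-once by default: a rebalance or unclean shutdown just after startup can redeliver the last message that was delivered but not yet committed, which matches "duplicate only on the first consume". Independent of any config fix, a defensive consumer can drop duplicates by remembering (topic, partition, offset), which uniquely identifies a message. A broker-free sketch (the record tuples stand in for kafka-python's ConsumerRecord fields):

```python
def dedupe(records, seen=None):
    """Drop records whose (topic, partition, offset) was already
    processed. Offsets are unique within a partition, so this triple
    is a safe idempotency key under at-least-once delivery."""
    seen = set() if seen is None else seen
    fresh = []
    for topic, partition, offset, value in records:
        key = (topic, partition, offset)
        if key not in seen:
            seen.add(key)
            fresh.append(value)
    return fresh
```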

How to force a consumer to read a specific partition in kafka

非 Y 不嫁゛ submitted on 2019-12-10 18:19:33

Question: I have an application that downloads specific web content from a stream of URLs generated by one Kafka producer. I've created a topic with 5 partitions, and there are 5 Kafka consumers. However, the timeout for a webpage download is 60 seconds. While one of the URLs is being downloaded, the server assumes the message is lost and resends the data to a different consumer. I've tried everything mentioned in "Kafka consumer configuration / performance issues" and https://github.com/spring
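The redelivery described is usually a group rebalance: a consumer that spends 60 s downloading does not poll in time, the broker declares it dead, and its partition (with the unacknowledged message) moves to another member. One way out is to skip group management entirely and pin each worker to one partition with `assign()`. A deterministic URL-to-partition router keeps each URL on the same worker (the helper name is made up; the partition count of 5 comes from the question):

```python
import hashlib

def partition_for(url, num_partitions=5):
    """Route a URL deterministically to one of num_partitions, so the
    same URL is always handled by the worker pinned to that partition.
    (Illustrative helper -- not part of kafka-python.)"""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Worker side: pin one partition instead of joining a consumer group,
# so long downloads cannot trigger a rebalance (needs a live broker):
# from kafka import KafkaConsumer, TopicPartition
# consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'],
#                          enable_auto_commit=False)
# consumer.assign([TopicPartition('urls', 3)])  # this worker owns partition 3
```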

Python: how to mock a kafka topic for unit tests?

点点圈 submitted on 2019-12-10 13:19:37

Question: We have a message scheduler that generates a hash key from a message's attributes before placing it on a Kafka topic queue with that key. This is done for de-duplication purposes. However, I am not sure how I could test this de-duplication without actually setting up a local cluster and checking that it performs as expected. Searching online for tools for mocking a Kafka topic queue has not helped, and I am concerned that I am perhaps thinking about this the wrong way. Ultimately,
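For a unit test, the topic does not need to be Kafka at all: a tiny in-memory fake that records (key, value) pairs is enough to assert that the scheduler's hash keys collide when they should. A sketch (class and method names are made up; `latest_per_key` loosely mimics what log compaction would leave behind):

```python
class FakeTopic:
    """In-memory stand-in for a keyed Kafka topic, for unit tests only."""

    def __init__(self):
        self.messages = []  # append-only log of (key, value) pairs

    def send(self, key, value):
        self.messages.append((key, value))

    def latest_per_key(self):
        # Compaction-like view: only the last value per key survives,
        # which is what key-based de-duplication relies on.
        latest = {}
        for key, value in self.messages:
            latest[key] = value
        return latest
```

In tests, hand this object to the scheduler in place of its real producer (dependency injection), then assert against `latest_per_key()`.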

how to send JSON object to kafka from python client

偶尔善良 submitted on 2019-12-08 06:33:45

Question: I have a simple JSON object like the following:

```python
d = {
    'tag ': 'blah',
    'name': 'sam',
    'score': {'row1': 100, 'row2': 200}
}
```

The following is my Python code which sends messages to Kafka:

```python
from kafka import SimpleProducer, KafkaClient
import json

# To send messages synchronously
kafka = KafkaClient('10.20.30.12:9092')
producer = SimpleProducer(kafka)
jd = json.dumps(d)
producer.send_messages(b'message1', jd)
```

I see in the Storm logs that the message is being received, but it is throwing
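A likely culprit: json.dumps returns a str, while kafka-python payloads must be bytes. Encoding before sending (or installing the encoder as a `value_serializer` on the newer `KafkaProducer` API) fixes it; the broker address below is copied from the question:

```python
import json

def encode_json(payload):
    """json.dumps returns str; kafka-python sends bytes, so encode
    explicitly before handing the payload to the producer."""
    return json.dumps(payload).encode("utf-8")

# Newer-API sketch (requires a running broker):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers='10.20.30.12:9092',
#                          value_serializer=encode_json)
# producer.send('message1', {'tag': 'blah', 'name': 'sam'})
```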

Kafka python consumer reading all the messages when started

喜欢而已 submitted on 2019-12-07 01:34:10

Question: I am using the code below to read messages from a topic, and I am facing two issues. Whenever I start the consumer, it reads all the messages in the queue. How do I read only the unread messages?

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer('my-topic', group_id='my-group',
                         bootstrap_servers=['localhost:9092'])
for message in consumer:
    consumer.commit()
    # message value and key are raw bytes -- decode if necessary!
    # e.g., for unicode: `message.value.decode('utf-8')`
    print ("%s:%d:%d:
```
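Whether a (re)started consumer replays the log is controlled by its committed offsets and `auto_offset_reset`. Once a group has committed offsets, a restart resumes from them; for a brand-new group id, `auto_offset_reset` decides, and 'latest' skips the backlog. A sketch of the relevant settings (keyword names are kafka-python `KafkaConsumer` arguments; broker address from the question):

```python
unread_only = {
    "group_id": "my-group",
    "bootstrap_servers": ["localhost:9092"],
    # Used only when the group has no committed offset yet:
    # 'latest' starts at the end of the log (skip old messages),
    # 'earliest' would replay the whole topic.
    "auto_offset_reset": "latest",
    # Commit positions periodically so the next restart resumes
    # where this run stopped.
    "enable_auto_commit": True,
}

# from kafka import KafkaConsumer   # requires a running broker
# consumer = KafkaConsumer('my-topic', **unread_only)
```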