Reading only specific messages from a Kafka topic

Posted by ♀尐吖头ヾ on 2020-03-05 04:56:06

Question


Scenario:

I am writing JSON objects to a Kafka topic. When reading, I want to consume only a specific set of messages, selected by a value present in the message. I am using the kafka-python library.

Sample messages:

{"flow_status": "completed", "value": 1, "active": "yes"}
{"flow_status": "failure", "value": 2, "active": "yes"}

Here I want to read only the messages whose flow_status is completed.


Answer 1:


Kafka does not support this. A consumer reads messages one by one, in order, starting from the last committed offset (or from the beginning, or from an offset it seeks to). Depending on your use case, you could change the flow: the message describing the work to do goes into one topic, and the application that processes it writes the result (completed or failed) to one of two different topics. That way the completed results are kept separate from the failed ones. Another option is a Kafka Streams application that does the filtering, but keep in mind that this is only syntactic sugar: the streams application still reads every message, it just makes filtering them easy.
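The two-topic flow described above could be sketched roughly as follows with kafka-python. This is only an illustration under assumptions: the topic names `flow-completed` and `flow-failed`, the broker address, and the helper names `topic_for`, `publish_result` and `run_against_broker` are all hypothetical and do not come from the question.

```python
import json

# Hypothetical topic names, chosen for this sketch only.
COMPLETED_TOPIC = "flow-completed"
FAILED_TOPIC = "flow-failed"


def topic_for(result: dict) -> str:
    """Pick the destination topic from the flow_status field."""
    if result.get("flow_status") == "completed":
        return COMPLETED_TOPIC
    return FAILED_TOPIC


def publish_result(producer, result: dict) -> None:
    """Send the result to the topic that matches its status.

    `producer` is any object with a kafka-python style
    send(topic, value=...) method, e.g. a KafkaProducer.
    """
    payload = json.dumps(result).encode("utf-8")
    producer.send(topic_for(result), value=payload)


def run_against_broker() -> None:
    """Example wiring; needs kafka-python and a reachable broker."""
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")  # assumed address
    publish_result(producer, {"flow_status": "completed", "value": 1, "active": "yes"})
    publish_result(producer, {"flow_status": "failure", "value": 2, "active": "yes"})
    producer.flush()  # block until the buffered messages are sent
```

A consumer that only cares about completed flows would then subscribe to the completed topic alone, so no filtering is needed on the reading side.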




Answer 2:


You can create two different topics: one for the completed status and another for the failure status. Then read messages only from the completed topic to handle them.

Otherwise, if you want them in a single topic and want to read only the completed ones, I believe you need to read them all and skip the failed ones with a simple if-else condition.
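The single-topic approach, where everything is read and the unwanted messages are skipped, might look something like this with kafka-python. It is a minimal sketch, not a definitive implementation: the topic name `flows`, the broker address, and the helper names `is_completed`, `completed_only` and `consume_completed` are assumptions made for illustration.

```python
import json


def is_completed(raw_value) -> bool:
    """Return True when the payload is a JSON object with flow_status == "completed"."""
    try:
        message = json.loads(raw_value)
    except (ValueError, TypeError):
        return False  # not valid JSON (or no payload at all): skip it
    return isinstance(message, dict) and message.get("flow_status") == "completed"


def completed_only(records):
    """Yield decoded messages for completed flows.

    Every record is still fetched from the broker; the filtering
    happens here, on the client side.
    """
    for record in records:
        if is_completed(record.value):
            yield json.loads(record.value)


def consume_completed(topic="flows", servers="localhost:9092"):
    """Example wiring; needs kafka-python and a reachable broker."""
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        auto_offset_reset="earliest",  # start from the beginning if no offset is committed
    )
    for message in completed_only(consumer):
        print(message)
```

Keeping the check in a small function of its own means the filter can be exercised without a broker, and the consumer loop stays a plain iteration over the records.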




Answer 3:


The Kafka consumer does not support this kind of functionality out of the box. You have to consume all events sequentially, pick out the events whose status is completed, and put them somewhere. Alternatively, consider a Kafka Streams application, which reads the data as a stream, filters the events where flow_status = "completed", and publishes them to an output topic or some other destination.

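Example:

KStream<String, JsonNode> inputStream = builder.stream(inputTopic);
// filter takes a (key, value) predicate; compare the field's text, not the JsonNode itself
KStream<String, JsonNode> completedFlowStream = inputStream.filter(
        (key, value) -> value != null && "completed".equals(value.path("flow_status").asText()));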

P.S. Kafka has no official Python API for Kafka Streams, but there is an open-source project: https://github.com/wintoncode/winton-kafka-streams




Answer 4:


As of today this cannot be done on the broker side. There is an open Jira feature request for Apache Kafka to implement it, which you can track here; I hope it gets implemented in the near future: https://issues.apache.org/jira/browse/KAFKA-6020

I think the best approach is to use the RecordFilterStrategy interface (Java/Spring) and filter on the consumer side.



Source: https://stackoverflow.com/questions/54742208/reading-only-specific-messages-from-kafka-topic
