Kafka Streams - Send on different topics depending on Streams Data


Question


I have a Kafka Streams application waiting for records to be published on the topic user_activity. It receives JSON data and, depending on the value of a certain key, I want to push that stream into different topics.

This is my Streams app code:

KStream<String, String> source_user_activity = builder.stream("user_activity");
source_user_activity.flatMapValues(new ValueMapper<String, Iterable<String>>() {
    @Override
    public Iterable<String> apply(String value) {
        System.out.println("value: " + value);
        ArrayList<String> keywords = new ArrayList<String>();
        try {
            JSONObject send = new JSONObject();
            JSONObject received = new JSONObject(value);

            send.put("current_date", getCurrentDate().toString());
            send.put("activity_time", received.get("CreationTime"));
            send.put("user_id", received.get("UserId"));
            send.put("operation_type", received.get("Operation"));
            send.put("app_name", received.get("Workload"));
            keywords.add(send.toString());
            // apply regex to value and for each match add it to keywords
        } catch (Exception e) {
            // TODO: handle exception
            System.err.println("Unable to convert to json");
            e.printStackTrace();
        }
        return keywords;
    }
}).to("user_activity_by_date");

In this code, I want to check the operation type and, depending on it, push the records to the relevant topic.

How can I achieve this?

EDIT:

I have updated my code to this:

final StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> source_o365_user_activity = builder.stream("o365_user_activity");
KStream<String, String>[] branches = source_o365_user_activity.branch( 
      (key, value) -> (value.contains("Operation\":\"SharingSet") && value.contains("ItemType\":\"File")),
      (key, value) -> (value.contains("Operation\":\"AddedToSecureLink") && value.contains("ItemType\":\"File")),
      (key, value) -> true
     );

branches[0].to("o365_sharing_set_by_date");
branches[1].to("o365_added_to_secure_link_by_date");
branches[2].to("o365_user_activity_by_date");

Answer 1:


You can use the branch method to split your stream. This method takes predicates that split the source stream into several streams.

The code below is taken from kafka-streams-examples:

KStream<String, OrderValue>[] forks = ordersWithTotals.branch(
    (id, orderValue) -> orderValue.getValue() >= FRAUD_LIMIT,
    (id, orderValue) -> orderValue.getValue() < FRAUD_LIMIT);

forks[0].mapValues(
    orderValue -> new OrderValidation(orderValue.getOrder().getId(), FRAUD_CHECK, FAIL))
    .to(ORDER_VALIDATIONS.name(), Produced
        .with(ORDER_VALIDATIONS.keySerde(), ORDER_VALIDATIONS.valueSerde()));

forks[1].mapValues(
    orderValue -> new OrderValidation(orderValue.getOrder().getId(), FRAUD_CHECK, PASS))
    .to(ORDER_VALIDATIONS.name(), Produced
        .with(ORDER_VALIDATIONS.keySerde(), ORDER_VALIDATIONS.valueSerde()));
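
Applied to the JSON payload in the question, a minimal sketch could look like the following. The topic and field names are taken from the snippets above, and operationOf is a hypothetical helper that parses the value with the same org.json JSONObject the question already uses:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.json.JSONObject;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("o365_user_activity");

KStream<String, String>[] branches = source.branch(
        (key, value) -> "SharingSet".equals(operationOf(value)),
        (key, value) -> "AddedToSecureLink".equals(operationOf(value)),
        (key, value) -> true);   // default branch: everything else

branches[0].to("o365_sharing_set_by_date");
branches[1].to("o365_added_to_secure_link_by_date");
branches[2].to("o365_user_activity_by_date");

// Hypothetical helper: returns the "Operation" field, or "" if the value is not valid JSON.
private static String operationOf(String value) {
    try {
        return new JSONObject(value).optString("Operation");
    } catch (Exception e) {
        return "";
    }
}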



Answer 2:


The original KStream.branch method is inconvenient because it mixes arrays and generics and forces you to use 'magic numbers' to extract the right branch from the result (see e.g. the KAFKA-5488 issue). Starting from spring-kafka 2.2.4, the KafkaStreamBrancher class will be available. With it, more convenient branching is possible:

new KafkaStreamBrancher<String, String>()
    .branch((key, value) -> value.contains("A"), ks -> ks.to("A"))
    .branch((key, value) -> value.contains("B"), ks -> ks.to("B"))
    .defaultBranch(ks -> ks.to("C"))
    // onTopOf returns the provided stream, so we can continue with method chaining
    // and do something more with the original stream
    .onTopOf(builder.stream("source"));

There is also KIP-418, so there is a chance that such a class will appear in Kafka itself.




Answer 3:


Another possibility is routing the event dynamically using a TopicNameExtractor:

https://www.confluent.io/blog/putting-events-in-their-place-with-dynamic-routing

You would need to have created the topics in advance, though:

val outputTopic: TopicNameExtractor[String, String] = (_, value: String, _) => defineOutputTopic(value)

builder
  .stream[String, String](inputTopic)
  .to(outputTopic)

Here, defineOutputTopic can return one of a defined set of topics based on the value (or the key or the record context, for that matter). P.S.: sorry for the Scala code; the linked post has a Java example.
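
For reference, a rough Java equivalent of the same idea, as a sketch only: the topic names come from the question's edit, defineOutputTopic mirrors the helper named in the Scala snippet, and all output topics are assumed to exist already:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.processor.TopicNameExtractor;

TopicNameExtractor<String, String> outputTopic =
        (key, value, recordContext) -> defineOutputTopic(value);

StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream("o365_user_activity").to(outputTopic);

// Hypothetical helper: maps a record value to one of a fixed set of pre-created topics.
private static String defineOutputTopic(String value) {
    if (value.contains("\"Operation\":\"SharingSet\"")) {
        return "o365_sharing_set_by_date";
    }
    if (value.contains("\"Operation\":\"AddedToSecureLink\"")) {
        return "o365_added_to_secure_link_by_date";
    }
    return "o365_user_activity_by_date";
}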



Source: https://stackoverflow.com/questions/48950580/kafka-streams-send-on-different-topics-depending-on-streams-data
