How to split records from one topic into different streams?

Posted by ◇◆丶佛笑我妖孽 on 2021-02-05 12:07:18

Question


I have a single source CSV file containing records of different sizes; every record is pushed into one source topic. I want to split the records from that source topic into different KStreams/KTables.

I have a pipeline for one table load: I push the records from the source topic into stream1 in delimited format, then into another stream in Avro format, which is then pushed into a JDBC sink connector that writes the records into a MySQL database. The pipeline needs to stay the same. But I want to push records belonging to different tables into the one source topic and then split them into different streams based on one value. Is this possible?

I tried searching for ways to do that but could not find any. Can I improve the pipeline somehow, use a KTable instead of KStreams, or make any other modifications?

My current flow: one source CSV file (source.csv) -> source topic (name: sourcetopic, containing test1 records) -> stream 1 (delimited value format) -> stream 2 (Avro value format) -> end topic (name: sink-db-test1) -> JDBC sink connector -> MySQL DB (name: test1)

I have a different MySQL table, test2, with a different schema, and the records for this table are also present in the source.csv file. Since the schema is different, I cannot follow the current test1 pipeline to insert data into the test2 table.

Example - in the CSV source file:

line 1 - 9,atm,mun,ronaldo
line 2 - 10,atm,mun,bravo,num2
line 3 - 11,atm,sign,bravo,sick

In this example, the value on which the records are to be split is column 4 (ronaldo or bravo); these records should be loaded into table 1, table 2, and table 3 respectively. The key is column 4:

if col4 == ronaldo, go to table 1
if col4 == bravo and col3 == mun, go to table 2
if col4 == bravo and col3 == sign, go to table 3

I am very new to Kafka; I only started Kafka development last week.


Answer 1


You can write a separate Kafka Streams application to split records from the input topic into different KStreams or output topics using the KStream#branch() operator (note that branch() is defined on KStream, not on StreamsBuilder):

KStream<K, V>[] branches = streamsBuilder
        .stream("your_input_topic", Consumed.with(keySerde, valueSerde))
        .branch(
                (key, value) -> {filter logic for topic 1 here},
                (key, value) -> {filter logic for topic 2 here},
                (key, value) -> true // catch-all: gets every remaining message
        );

// branches[0]: records matching logic 1
// branches[1]: records matching logic 2
// branches[2]: all remaining records
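As an aside, in newer Kafka Streams versions (2.8 and later) branch() is deprecated in favor of KStream#split(), which returns named branches. A minimal sketch of the same idea, where the branch names ("one", "two", "rest") and the "split-" prefix are made up for illustration, and inputKStream is the stream created from the input topic as in the snippet further below:

Map<String, KStream<K, V>> namedBranches = inputKStream
        .split(Named.as("split-"))
        .branch((key, value) -> true /* filter logic for topic 1 here */, Branched.as("one"))
        .branch((key, value) -> true /* filter logic for topic 2 here */, Branched.as("two"))
        .defaultBranch(Branched.as("rest")); // everything that matched no predicate

// Map keys are prefix + branch name, e.g. namedBranches.get("split-one")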

Or you could manually branch your KStream like this:

KStream<K, V> inputKStream = streamsBuilder.stream("your_input_topic", Consumed.with(keySerde, valueSerde));

inputKStream
        .filter((key, value) -> {filter logic for topic 1 here})
        .to("your_1st_output_topic");

inputKStream
        .filter((key, value) -> {filter logic for topic 2 here})
        .to("your_2nd_output_topic");
...
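Applied to the example in the question, a rough sketch of this filter-based approach could look like the following. It assumes the record value is the raw comma-delimited line as a plain String; the output topic names sink-db-test2 and sink-db-test3 are made up to mirror the question's sink-db-test1:

KStream<String, String> input = streamsBuilder.stream("sourcetopic",
        Consumed.with(Serdes.String(), Serdes.String()));

// col4 == ronaldo -> table 1
input.filter((key, value) -> {
    String[] cols = value.split(",");
    return cols.length >= 4 && "ronaldo".equals(cols[3]);
}).to("sink-db-test1");

// col4 == bravo and col3 == mun -> table 2
input.filter((key, value) -> {
    String[] cols = value.split(",");
    return cols.length >= 4 && "bravo".equals(cols[3]) && "mun".equals(cols[2]);
}).to("sink-db-test2");

// col4 == bravo and col3 == sign -> table 3
input.filter((key, value) -> {
    String[] cols = value.split(",");
    return cols.length >= 4 && "bravo".equals(cols[3]) && "sign".equals(cols[2]);
}).to("sink-db-test3");

Each output topic can then feed its own JDBC sink connector, keeping the rest of the existing pipeline unchanged.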



Answer 2


I was able to split the data using KSQL, with the approach I am sharing below:

1. An input stream is created with value_format='JSON' and a single STRING column named payload.
2. The payload contains the whole record as a STRING.
3. The record is then split into different streams using the LIKE operator in the WHERE clause, writing the payload into different streams as per the requirement. Here I used the SPLIT operator of KSQL to extract the individual fields from the comma-delimited payload.
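A minimal KSQL sketch of that approach, where the stream and column names are made up for illustration, and assuming each Kafka message is a JSON object with a single payload field holding the CSV line:

-- Input stream: the whole CSV record arrives as one STRING column
CREATE STREAM source_stream (payload STRING)
  WITH (KAFKA_TOPIC='sourcetopic', VALUE_FORMAT='JSON');

-- Route matching records to their own stream, splitting the
-- comma-delimited payload into individual columns
-- (array indexing on SPLIT's result is 1-based in recent ksqlDB versions)
CREATE STREAM test1_stream AS
  SELECT SPLIT(payload, ',')[1] AS col1,
         SPLIT(payload, ',')[2] AS col2,
         SPLIT(payload, ',')[3] AS col3,
         SPLIT(payload, ',')[4] AS col4
  FROM source_stream
  WHERE payload LIKE '%ronaldo%';

Analogous CREATE STREAM statements with the other WHERE conditions would produce the streams for table 2 and table 3.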



Source: https://stackoverflow.com/questions/61182673/how-to-split-records-into-different-streams-from-one-topic-to-different-streams
