Question
I have a single source CSV file containing records of different sizes, and every record is pushed into one source topic. I want to split the records from that source topic into different KStreams/KTables. I have a pipeline for a single table load: records from the source topic are pushed into stream1 in delimited format, then into another stream in AVRO format, which is then pushed to a JDBC sink connector that writes the records into a MySQL database. The pipeline needs to stay the same, but I want to push records belonging to different tables into the one source topic and then split them into different streams based on one value. Is this possible? I tried searching for ways to do this but could not find any. Can I also improve the pipeline somehow, for example by using KTables instead of KStreams, or with some other modification?
My current flow:
one source CSV file (source.csv) -> source topic (name: sourcetopic, containing test1 records) -> stream 1 (delimited value format) -> stream 2 (AVRO value format) -> end topic (name: sink-db-test1) -> JDBC sink connector -> MySQL DB (name: test1)
I have a different MySQL table test2 with a different schema, and the records for this table are also present in the source.csv file. Since the schema is different, I cannot follow the current test1 pipeline to insert data into the test2 table.
Example - in the CSV source file:
line 1 - 9,atm,mun,ronaldo
line 2 - 10,atm,mun,bravo,num2
line 3 - 11,atm,sign,bravo,sick
Here the value on which the records are to be split is column 4 (ronaldo or bravo), and these records should be loaded into table 1, table 2, and table 3 respectively.
The key is column 4:
if col4 == ronaldo, go to table 1
if col4 == bravo and col3 == mun, go to table 2
if col4 == bravo and col3 == sign, go to table 3
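For clarity, these routing rules expressed over one delimited record would look roughly like this (a minimal Java sketch, assuming 0-based indexing after splitting the line on commas):

String[] f = value.split(",");                                   // e.g. "10,atm,mun,bravo,num2"
boolean toTable1 = f[3].equals("ronaldo");                       // col4 == ronaldo
boolean toTable2 = f[3].equals("bravo") && f[2].equals("mun");   // col4 == bravo, col3 == mun
boolean toTable3 = f[3].equals("bravo") && f[2].equals("sign");  // col4 == bravo, col3 == sign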
I am very new to Kafka; I only started Kafka development last week.
Answer 1:
You can write a separate Kafka Streams application to split records from the input topic into different KStreams or output topics using the KStream#branch() operator:
KStream<K, V> inputKStream = streamsBuilder.stream("your_input_topic");
KStream<K, V>[] branches = inputKStream.branch(
        (key, value) -> /* filter logic for topic 1 here */,
        (key, value) -> /* filter logic for topic 2 here */,
        (key, value) -> true  // catch-all: all remaining messages land in this branch
);
// branches[0]: records matching logic 1
// branches[1]: records matching logic 2
// branches[2]: all remaining records (logic 3)
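Each branch can then be written to its own output topic so the rest of the pipeline stays unchanged. For example (sink-db-test2 and sink-db-test3 are assumed names following the question's sink-db-test1 convention):

branches[0].to("sink-db-test1");  // e.g. records for table 1
branches[1].to("sink-db-test2");  // e.g. records for table 2
branches[2].to("sink-db-test3");  // e.g. records for table 3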
Or you could manually branch your KStream like this:
KStream<K, V> inputKStream = streamsBuilder.stream("your_input_topic", Consumed.with(keySerde, valueSerde));

inputKStream
        .filter((key, value) -> /* filter logic for topic 1 here */)
        .to("your_1st_output_topic");

inputKStream
        .filter((key, value) -> /* filter logic for topic 2 here */)
        .to("your_2nd_output_topic");
...
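Note the difference between the two approaches: branch() sends each record only to the first branch whose predicate matches, while the filter() approach evaluates every record against every filter independently, so a record can end up in more than one output topic if several predicates match.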
Answer 2:
I was able to split the data; I used KSQL for the approach, which I am sharing below.
1. An input stream is created with value_format='JSON' and a single STRING column named payload.
2. The payload column contains the whole record as a STRING.
3. The records are then split into different streams using the LIKE operator in the WHERE clause, writing the payload into different streams as per the requirement. Here I used the SPLIT function of KSQL to extract the individual fields from the comma-delimited payload.
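A minimal sketch of what those steps could look like (the stream names and column aliases here are my assumptions, not from the original answer, and the array index base differs between KSQL and ksqlDB versions):

-- steps 1 and 2: input stream capturing each record as one STRING column
CREATE STREAM source_stream (payload STRING)
  WITH (KAFKA_TOPIC='sourcetopic', VALUE_FORMAT='JSON');

-- step 3: route records per table with LIKE, extracting fields with SPLIT
CREATE STREAM table1_stream AS
  SELECT SPLIT(payload, ',')[1] AS col1,
         SPLIT(payload, ',')[2] AS col2,
         SPLIT(payload, ',')[3] AS col3,
         SPLIT(payload, ',')[4] AS col4
  FROM source_stream
  WHERE payload LIKE '%ronaldo%';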
Source: https://stackoverflow.com/questions/61182673/how-to-split-records-into-different-streams-from-one-topic-to-different-streams