I have a single source CSV file containing records of different sizes that pushes every record into one source topic. I want to split the records into different KStreams/KTables
You can write a separate Kafka Streams application that splits records from the input topic into different KStreams or output topics using the KStream#branch() operator:
KStream<K, V> inputKStream = streamsBuilder.stream("your_input_topic", Consumed.with(keySerde, valueSerde));

KStream<K, V>[] branches = inputKStream.branch(
    (key, value) -> /* predicate for records of type 1 */,
    (key, value) -> /* predicate for records of type 2 */,
    (key, value) -> true // catch-all for everything else
);
// branches[0]: records matching predicate 1
// branches[1]: records matching predicate 2
// branches[2]: all remaining records
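Note that branch() routes each record to the first predicate that matches, so the catch-all predicate must come last. A minimal plain-Java sketch of that first-match semantics (no Kafka dependencies; the record values and predicates here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BranchDemo {
    // Route each record to the FIRST list whose predicate matches,
    // mirroring the exclusive semantics of KStream#branch().
    static List<List<String>> branch(List<String> records, List<Predicate<String>> predicates) {
        List<List<String>> branches = new ArrayList<>();
        for (int i = 0; i < predicates.size(); i++) branches.add(new ArrayList<>());
        for (String record : records) {
            for (int i = 0; i < predicates.size(); i++) {
                if (predicates.get(i).test(record)) {
                    branches.get(i).add(record);
                    break; // first match wins; a record matching no predicate is dropped
                }
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        List<String> records = List.of("A,1", "B,2", "C,3");
        List<List<String>> branches = branch(records, List.of(
                r -> r.startsWith("A"), // branch 0
                r -> r.startsWith("B"), // branch 1
                r -> true               // branch 2: catch-all
        ));
        System.out.println(branches); // [[A,1], [B,2], [C,3]]
    }
}
```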
Or you could manually branch your KStream like this:
KStream<K, V> inputKStream = streamsBuilder.stream("your_input_topic", Consumed.with(keySerde, valueSerde));
inputKStream
    .filter((key, value) -> /* predicate for the 1st output topic */)
    .to("your_1st_output_topic");

inputKStream
    .filter((key, value) -> /* predicate for the 2nd output topic */)
    .to("your_2nd_output_topic");
...
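Unlike branch(), these independent filter() calls are not exclusive: every filter sees the full input stream, so a record that satisfies several predicates is written to several output topics. A small stdlib-only Java sketch of that difference (hypothetical records and predicates, no Kafka):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FilterDemo {
    // Each filter scans the whole input, so one record may land in several outputs,
    // unlike the first-match-wins behavior of branch().
    static List<String> filter(List<String> records, Predicate<String> p) {
        List<String> out = new ArrayList<>();
        for (String r : records) {
            if (p.test(r)) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = List.of("A,1", "AB,2", "B,3");
        List<String> topic1 = filter(records, r -> r.contains("A"));
        List<String> topic2 = filter(records, r -> r.contains("B"));
        System.out.println(topic1); // [A,1, AB,2]  -- "AB,2" appears in both outputs
        System.out.println(topic2); // [AB,2, B,3]
    }
}
```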
I was able to split the data using KSQL; I am sharing the approach below.
1. An input stream is created with value_format='JSON' and a single column, payload, of type STRING.
2. The payload column holds the whole record as a string.
3. The record is then routed into different streams using the LIKE operator in the WHERE clause, writing the payload into different streams as required. The SPLIT function of KSQL is used to extract the individual fields from the comma-delimited payload.
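The steps above can be sketched as KSQL statements. This is only an illustrative sketch: the stream, topic, column names, and the LIKE pattern are made up and must be adapted to your schema:

```sql
-- 1) Raw stream: the whole CSV record lands in a single STRING column
CREATE STREAM raw_stream (payload STRING)
  WITH (kafka_topic = 'source_topic', value_format = 'JSON');

-- 3) Route records by a pattern in the payload and pull out the
--    comma-delimited fields with SPLIT. Note: array indexing is 1-based
--    in recent ksqlDB versions but was 0-based in older KSQL releases.
CREATE STREAM type_a_stream AS
  SELECT SPLIT(payload, ',')[1] AS field1,
         SPLIT(payload, ',')[2] AS field2
  FROM raw_stream
  WHERE payload LIKE 'A%';
```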