Question
I'm using a materialized KTable in a left join with my KStream (the stream is the left side).
However, it seems to start processing immediately, without waiting for the current version of the KTable to load.
The KTable's source topic contains a lot of values, and when I start the application, many joins fail (well, not really, since it is a left join).
Can I delay the start so that it waits for the initial topic load?
Answer 1:
Processing is time-synchronized in Kafka Streams: the table input topic and the stream input topic are processed in record timestamp order. This is semantically sound, because in a stream-table join you don't want to join a stream record with an older or a newer version of the KTable, but with the right version based on the stream record's timestamp.
If your data is not properly timestamped, you can try to specify a custom timestamp extractor for the table via builder.table(..., Consumed.with(...)) that returns timestamps that ensure proper behavior (i.e., maybe smaller than the timestamp of the first stream record?).
- https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-timestamp-extractor
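To make the timestamp-order argument concrete, here is a minimal sketch in plain Java. It is a toy model, not Kafka Streams code: the `Rec` class, the `leftJoin` method, and the rule that the table record goes first on equal timestamps are all assumptions made for illustration. It always consumes whichever input has the smaller head timestamp, which is roughly the behavior described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Kafka Streams classes.
class TimestampSyncSketch {

    // Hypothetical record type: key, value, and event timestamp.
    static final class Rec {
        final String key;
        final String value;
        final long ts;

        Rec(String key, String value, long ts) {
            this.key = key;
            this.value = value;
            this.ts = ts;
        }
    }

    // Consumes both inputs in timestamp order and returns the left-join
    // results formatted as "key:streamValue+tableValue".
    static List<String> leftJoin(List<Rec> tableInput, List<Rec> streamInput) {
        Deque<Rec> table = new ArrayDeque<>(tableInput);
        Deque<Rec> stream = new ArrayDeque<>(streamInput);
        Map<String, String> tableState = new HashMap<>();
        List<String> results = new ArrayList<>();

        while (!stream.isEmpty()) {
            // Assumption of this sketch: on equal timestamps the table record goes first.
            boolean tableNext = !table.isEmpty() && table.peek().ts <= stream.peek().ts;
            if (tableNext) {
                Rec update = table.poll();
                tableState.put(update.key, update.value);
            } else {
                Rec event = stream.poll();
                // Left join: a missing table entry shows up as "null".
                results.add(event.key + ":" + event.value + "+" + tableState.get(event.key));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Rec> stream = Arrays.asList(
                new Rec("k1", "x", 5), new Rec("k1", "y", 20), new Rec("k1", "z", 40));

        // Table records carrying their original timestamps.
        List<Rec> table = Arrays.asList(new Rec("k1", "A", 10), new Rec("k1", "B", 30));
        System.out.println(leftJoin(table, stream));

        // Same table records, as if an extractor had returned 0 for each of them.
        List<Rec> early = Arrays.asList(new Rec("k1", "A", 0), new Rec("k1", "B", 0));
        System.out.println(leftJoin(early, stream));
    }
}
```

With the original timestamps, the first stream record (timestamp 5) arrives before any table record and joins with null, while the later ones see versions A and B. With all table timestamps forced to 0, the whole table is applied first and every stream record joins with B. That is the effect a custom extractor returning small timestamps would aim for, at the cost of no longer joining against the version that was current at the stream record's time.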
Note that proper timestamp synchronization requires Kafka Streams 2.1. Older versions synchronize time in a best-effort manner only and may not provide the behavior you want. For more details, see KIP-353.
- https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization
Answer 2:
You could use a GlobalKTable instead. It is fully loaded from its source topic before processing starts.
Source: https://stackoverflow.com/questions/56556270/can-kafka-streams-be-configured-to-wait-for-ktable-to-load