Question
I am wondering whether it is possible, with Flink windowing, to introduce a 10-minute delay between the moment data enters the pipeline and the moment it is written to a table in Cassandra.
My initial intention was to write each transaction to a table in Cassandra and query the table using a range key at the web layer, but because of the volume of data I am looking at options to delay the write by N seconds. This means that my table will only ever contain data that is at least 10 minutes old.
The small diagram below shows 10-minute windows that roll every minute. As time moves on, I only want to write data to Cassandra that is older than 10 minutes (the parts in green). Is this even possible with Flink?
I could create 11-minute windows that roll every minute, but I would end up throwing 90% of the data away, which seems a waste.
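Where the "90%" comes from: when an 11-minute window covering [t - 11 min, t) fires at time t, only its oldest minute is at least 10 minutes old, so 10/11 (about 91%) of its contents would be dropped. Equivalently, every element is copied into size / slide = 11 windows. The plain-Java sketch below counts those windows for one element, assuming event-time windows aligned to the epoch with zero offset (the default alignment for Flink's time windows). `SlidingWindowCount` and `assignWindows` are hypothetical names for illustration, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SlidingWindowCount {

    // Hypothetical helper: returns the [start, end) windows of length `size`,
    // starting every `slide` ms (aligned to the epoch, offset 0), that contain `timestamp`.
    static List<long[]> assignWindows(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        // Start of the most recent window that contains the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long eventTime = 37 * minute + 12_345L; // arbitrary event timestamp in ms
        List<long[]> windows = assignWindows(eventTime, 11 * minute, minute);

        // Only the oldest minute of each 11-minute window is at least 10 minutes old,
        // so 10 of every 11 minutes of window contents would be discarded.
        double discarded = 100.0 * (windows.size() - 1) / windows.size();
        System.out.println("windows per element: " + windows.size()); // 11
        System.out.println(String.format(Locale.ROOT, "share discarded: %.1f%%", discarded)); // 90.9%
    }
}
```

The duplication grows linearly with size / slide, which is why a pure windowing approach to "hold data back for 10 minutes" is so wasteful compared with simply delaying each record.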
Final Solution
I created my own flavour of FlinkKafkaConsumer09 called DelayedKafkaConsumer. The main reason for this is to override the creation of the KafkaFetcher.
// Subclass of the Kafka 0.9 consumer whose only change is to plug in a delaying fetcher.
public class DelayedKafkaConsumer<T> extends FlinkKafkaConsumer09<T> {

    // Handed on to DelayedKafkaFetcher; its definition is not shown.
    private ConsumerRecordFunction applyDelayAction;

    // .............

    @Override
    protected AbstractFetcher<T, ?> createFetcher(
            SourceContext<T> sourceContext,
            Map<KafkaTopicPartition, Long> assignedPartitionsWithInitialOffsets,
            SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic,
            SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
            StreamingRuntimeContext runtimeContext,
            OffsetCommitMode offsetCommitMode) throws Exception {
        // Build the custom fetcher, passing applyDelayAction after the usual fetcher arguments.
        return new DelayedKafkaFetcher<>(
                sourceContext, assignedPartitionsWithInitialOffsets, watermarksPeriodic, watermarksPunctuated,
                runtimeContext.getProcessingTimeService(), runtimeContext.getExecutionConfig().getAutoWatermarkInterval(),
                runtimeContext.getUserCodeClassLoader(), runtimeContext.getTaskNameWithSubtasks(),
                runtimeContext.getMetricGroup(), this.deserializer, this.properties, this.pollTimeout, useMetrics, applyDelayAction);
    }
}
The DelayedKafkaFetcher has a small piece of code in its runFetchLoop that, when a record is still younger than the configured delay, sleeps for the remaining n milliseconds before emitting it.
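// Blocks the fetch loop until the message is at least the configured delay old.
// nowMinusDelay is the current time minus the delay, so a message whose
// transaction time is later than that is still too recent to emit.
// (format is presumably a static import of java.text.MessageFormat.format, given the {0} placeholders.)
private void delayMessage(long msgTransactTime, long nowMinusDelay) throws InterruptedException {
    if (msgTransactTime > nowMinusDelay) {
        // Sleep exactly until the message reaches the required age.
        long sleepTimeout = msgTransactTime - nowMinusDelay;
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug(format("Message with transaction time {0}ms is not older than {1}ms. Sleeping for {2}ms",
                    msgTransactTime, nowMinusDelay, sleepTimeout));
        }
        TimeUnit.MILLISECONDS.sleep(sleepTimeout);
    }
}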
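Since the body of runFetchLoop itself is not shown, here is a plain-Java sketch of how a per-record check with the same rule as delayMessage could sit inside a poll-and-emit loop, assuming records arrive roughly in transaction-time order. `DelayedEmitLoop`, `Msg`, `Sleeper` and `emitDelayed` are hypothetical names; the clock and the sleep are injected so the example runs instantly instead of blocking for minutes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class DelayedEmitLoop {

    // Hypothetical stand-in for a deserialized Kafka record; only the transaction time matters.
    static final class Msg {
        final String key;
        final long transactTime;
        Msg(String key, long transactTime) { this.key = key; this.transactTime = transactTime; }
    }

    // Injected sleep so time can be faked; real code could pass TimeUnit.MILLISECONDS::sleep.
    interface Sleeper { void sleep(long millis) throws InterruptedException; }

    // Emits each polled record only once it is at least delayMillis old, blocking in between.
    static List<Msg> emitDelayed(List<Msg> polled, long delayMillis, LongSupplier clock, Sleeper sleeper)
            throws InterruptedException {
        List<Msg> emitted = new ArrayList<>();
        for (Msg m : polled) {
            long nowMinusDelay = clock.getAsLong() - delayMillis;
            if (m.transactTime > nowMinusDelay) {
                sleeper.sleep(m.transactTime - nowMinusDelay); // same rule as delayMessage
            }
            emitted.add(m); // stands in for handing the record on to Flink
        }
        return emitted;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] fakeNow = {1_000_000L};
        List<Long> sleeps = new ArrayList<>();
        LongSupplier clock = () -> fakeNow[0];
        // "Sleeping" just records the duration and advances the fake clock.
        Sleeper fakeSleep = ms -> { sleeps.add(ms); fakeNow[0] += ms; };
        long delay = TimeUnit.MINUTES.toMillis(10); // 600_000 ms

        List<Msg> polled = List.of(new Msg("a", 300_000L), new Msg("b", 500_000L), new Msg("c", 900_000L));
        List<Msg> out = emitDelayed(polled, delay, clock, fakeSleep);
        System.out.println(out.size() + " emitted, sleeps = " + sleeps); // 3 emitted, sleeps = [100000, 400000]
    }
}
```

Record "a" is already old enough and goes straight through; "b" and "c" each wait only for their remaining time. Because the sleep happens on the fetch thread, any record queued behind a young one is held up too, but it can only ever be emitted late, never early, so the "at least 10 minutes old" guarantee still holds.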
Source: https://stackoverflow.com/questions/44159256/apache-flink-using-windows-to-induce-a-delay-before-writing-to-sink