Apache Flink Using Windows to induce a delay before writing to Sink

廉价感情. Submitted on 2019-12-08 05:13:17

Question


I am wondering whether it is possible, with Flink windowing, to induce a 10-minute delay between the time data enters the pipeline and the time it is written to a table in Cassandra.

My initial intention was to write each transaction to a table in Cassandra and query the table using a range key at the web layer, but due to the volume of data, I am looking at options to delay the write by N seconds. This means that my table will only ever contain data that is at least 10 minutes old.

The small diagram below shows 10-minute windows that slide every minute. As time moves on, I only want to write data to Cassandra that is older than 10 minutes (the parts in green). Is this even possible with Flink?

I could create 11-minute windows that slide every minute, but I would end up throwing 90% of the data away, which seems wasteful.
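To see where the 90% figure comes from: with sliding windows, every record is assigned to size/slide windows, so an 11-minute window sliding every minute emits each record 11 times, and keeping only one copy discards 10/11 of the output (about 91%). The plain-Java sketch below is modelled on how sliding window assigners compute window start times; the class and method names are hypothetical, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowMath {
    // Start times of every sliding window [start, start + size) that contains timestamp t.
    // Align t down to the slide boundary, then step back one slide at a time.
    public static List<Long> windowStarts(long t, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = t - Math.floorMod(t, slide);
        for (long start = lastStart; start > t - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long t = 37 * minute + 12_345L; // an arbitrary event time in milliseconds
        System.out.println(windowStarts(t, 10 * minute, minute).size()); // 10
        System.out.println(windowStarts(t, 11 * minute, minute).size()); // 11
    }
}
```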

Final Solution

I created my own flavour of FlinkKafkaConsumer09 called DelayedKafkaConsumer. The main reason for this is to override the creation of the KafkaFetcher.

public class DelayedKafkaConsumer<T> extends FlinkKafkaConsumer09<T> {

    // Author-defined hook that the fetcher calls to apply the delay to each record
    private ConsumerRecordFunction applyDelayAction;

    .............

    // Override the fetcher factory so this consumer runs a DelayedKafkaFetcher
    // instead of the default fetcher, passing the delay action along.
    @Override
    protected AbstractFetcher<T, ?> createFetcher(SourceContext<T> sourceContext, Map<KafkaTopicPartition, Long> assignedPartitionsWithInitialOffsets,
                                                  SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic,
                                                  SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
                                                  StreamingRuntimeContext runtimeContext, OffsetCommitMode offsetCommitMode) throws Exception {
        return new DelayedKafkaFetcher<>(
            sourceContext, assignedPartitionsWithInitialOffsets, watermarksPeriodic, watermarksPunctuated,
            runtimeContext.getProcessingTimeService(), runtimeContext.getExecutionConfig().getAutoWatermarkInterval(),
            runtimeContext.getUserCodeClassLoader(), runtimeContext.getTaskNameWithSubtasks(),
            runtimeContext.getMetricGroup(), this.deserializer, this.properties, this.pollTimeout, useMetrics, applyDelayAction);
    }
}
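The key mechanism here is a factory-method hook: the base consumer asks createFetcher() for the object that emits records, so a subclass can substitute its own fetcher without touching the rest of the consumer. The plain-Java sketch below shows only that pattern; BaseConsumer, DelayedConsumer and Fetcher are hypothetical stand-ins, not Flink classes, and the "delay" is reduced to a string tag so the override is visible.

```java
public class FetcherHookDemo {
    // Hypothetical stand-in for a fetcher: turns an incoming record into what goes downstream.
    public interface Fetcher { String emit(String record); }

    public static class BaseConsumer {
        // Factory hook: subclasses override this to supply a different fetcher.
        protected Fetcher createFetcher() { return record -> record; }

        // The base class owns the run logic and only asks the hook for a fetcher.
        public String run(String record) { return createFetcher().emit(record); }
    }

    public static class DelayedConsumer extends BaseConsumer {
        @Override
        protected Fetcher createFetcher() {
            Fetcher plain = super.createFetcher();
            // Wrap the default behaviour; a real version would wait here instead of tagging.
            return record -> "delayed(" + plain.emit(record) + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseConsumer().run("tx-1"));    // tx-1
        System.out.println(new DelayedConsumer().run("tx-1")); // delayed(tx-1)
    }
}
```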

The DelayedKafkaFetcher has a small piece of code in its runFetchLoop that sleeps for n milliseconds before emitting the record.

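// Blocks until a record stamped msgTransactTime is at least the configured delay old.
// nowMinusDelay is the current time minus the delay (e.g. 10 minutes), in milliseconds.
// format(...) is presumably a static import of java.text.MessageFormat.format ({0}-style placeholders).
private void delayMessage(long msgTransactTime, long nowMinusDelay) throws InterruptedException {
    if (msgTransactTime > nowMinusDelay) {
        // Record is too recent: wait exactly until it crosses the delay threshold
        long sleepTimeout = msgTransactTime - nowMinusDelay;
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug(format("Message with transaction time {0}ms is not older than {1}ms. Sleeping for {2}ms",
                msgTransactTime, nowMinusDelay, sleepTimeout));
        }
        TimeUnit.MILLISECONDS.sleep(sleepTimeout);
    }
}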
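The delay arithmetic can be pulled out and exercised without Kafka or Flink. This is a minimal sketch assuming the delay is measured against a per-record transaction timestamp in milliseconds; DelayingEmitter, Sleeper and waitMillis are hypothetical names, and the clock and the sleep are injected so the example runs without actually waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.ToLongFunction;

public class DelayingEmitter<T> {
    // Injectable sleep so the demo can record the wait instead of blocking.
    public interface Sleeper { void sleep(long millis) throws InterruptedException; }

    private final long delayMillis;
    private final LongSupplier clock;            // e.g. System::currentTimeMillis in real use
    private final ToLongFunction<T> timestampOf; // extracts the record's transaction time
    private final Sleeper sleeper;
    private final Consumer<T> downstream;

    public DelayingEmitter(long delayMillis, LongSupplier clock, ToLongFunction<T> timestampOf,
                           Sleeper sleeper, Consumer<T> downstream) {
        this.delayMillis = delayMillis;
        this.clock = clock;
        this.timestampOf = timestampOf;
        this.sleeper = sleeper;
        this.downstream = downstream;
    }

    // Milliseconds to wait so that a record stamped msgTime is at least delayMillis old.
    public static long waitMillis(long msgTime, long now, long delayMillis) {
        long nowMinusDelay = now - delayMillis;
        return msgTime > nowMinusDelay ? msgTime - nowMinusDelay : 0L;
    }

    // Waits until the record is old enough, then forwards it downstream.
    public void emit(T record) throws InterruptedException {
        long wait = waitMillis(timestampOf.applyAsLong(record), clock.getAsLong(), delayMillis);
        if (wait > 0) {
            sleeper.sleep(wait);
        }
        downstream.accept(record);
    }

    public static void main(String[] args) throws InterruptedException {
        long tenMinutes = 10 * 60_000L;
        List<Long> sleeps = new ArrayList<>();
        List<Long> emitted = new ArrayList<>();
        // Fixed clock at t = 1,000,000 ms; each record is simply its own timestamp.
        DelayingEmitter<Long> emitter = new DelayingEmitter<>(
            tenMinutes, () -> 1_000_000L, ts -> ts, sleeps::add, emitted::add);
        emitter.emit(1_000_000L - tenMinutes - 5L); // already older than 10 minutes: no wait
        emitter.emit(1_000_000L - 60_000L);         // one minute old: waits nine more minutes
        System.out.println(sleeps);  // [540000]
        System.out.println(emitted); // [399995, 940000]
    }
}
```

Because the wait happens on the thread that emits records, nothing downstream of that source instance receives data while it sleeps, which is what makes the output trail the input by roughly the configured delay.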

Source: https://stackoverflow.com/questions/44159256/apache-flink-using-windows-to-induce-a-delay-before-writing-to-sink
