Retention time in Kafka local state store / changelog

大憨熊 submitted on 2021-02-07 10:10:33

Question


I'm using Kafka and Kafka Streams as part of Spring Cloud Stream. The data flowing through my Kafka Streams app is aggregated and materialized over fixed time windows:

Materialized<String, ErrorScore, WindowStore<Bytes, byte[]>> oneHour =
        Materialized.<String, ErrorScore, WindowStore<Bytes, byte[]>>as("one-hour-store")
                .withLoggingEnabled(topicConfig);
events
        .map(getStringSensorMeasurementKeyValueKeyValueMapper())
        .groupByKey()
        .windowedBy(TimeWindows.of(oneHourStore.getTimeUnit()))
        .reduce((aggValue, newValue) -> getMaxErrorScore(aggValue, newValue), oneHour);

As designed, the materialized data is also backed by a changelog topic.

Our app also has a REST endpoint that queries the state store like this:

// The value type must match the store's value type (ErrorScore), not Double.
ReadOnlyWindowStore<String, ErrorScore> windowStore =
        queryableStoreRegistry.getQueryableStoreType("one-hour-store", QueryableStoreTypes.windowStore());
WindowStoreIterator<ErrorScore> iter = windowStore.fetch(key, from, to);

The settings of the changelog topic that was created read:

min.insync.replicas 1
cleanup.policy delete
retention.ms 5259600000
retention.bytes -1

I would assume that the local state store keeps the data for at least 61 days (~2 months). However, it seems that only about the last day of data remains in the stores.

What could cause the data to be removed so soon?

Update with solution: Kafka Streams 2.0.1 does not contain the Materialized.withRetention method. For this particular version I was able to set the retention time of the state stores with the following code, which solved my problem:

// Chain until() and keep its return value rather than discarding it.
TimeWindows timeWindows = TimeWindows.of(windowSizeMs).until(retentionMs);

so that my code now reads:

...
.groupByKey()
.windowedBy(timeWindows)
.reduce((aggValue, newValue) -> getMaxErrorScore(aggValue, newValue), oneHour);
...

Answer 1:


For windowed KTables there is a local store retention time and there is a changelog retention time. You can set the local store retention time via Materialized.withRetention(...); the default value is 24h.
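
To make the gap between the two retention times concrete, here is a minimal sketch in plain Java (no Kafka dependency) that converts the changelog's retention.ms from the question and the 24h default store retention into days. The class name RetentionCheck and its helper toDays are illustrative only and not part of any Kafka API.

```java
import java.time.Duration;

final class RetentionCheck {
    // retention.ms of the changelog topic, as reported in the question
    static final Duration CHANGELOG_RETENTION = Duration.ofMillis(5_259_600_000L);

    // default local window store retention named in this answer
    static final Duration DEFAULT_STORE_RETENTION = Duration.ofHours(24);

    // Converts a duration to fractional days (86,400,000 ms per day).
    static double toDays(Duration d) {
        return d.toMillis() / 86_400_000.0;
    }

    public static void main(String[] args) {
        System.out.println("changelog retention (days): " + toDays(CHANGELOG_RETENTION));
        System.out.println("store retention (days):     " + toDays(DEFAULT_STORE_RETENTION));
    }
}
```

The changelog value works out to 60.875 days against 1 day for the store, which is consistent with only about the last day of data being available from the local store.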

For older Kafka releases, the local store retention time is set via Windows#until().

If a new application is created, its changelog topics are created with the same retention time as the local store retention time. However, if you manually increase the log retention time, this won't affect your store retention time; you need to update your code accordingly. The same holds when the changelog topic already exists: if you change the local store retention time, the changelog topic config is not updated automatically.

There is a Jira for this as well: https://issues.apache.org/jira/browse/KAFKA-7591



Source: https://stackoverflow.com/questions/54685352/retention-time-in-kafka-local-state-store-changelog
