Question
I'm using Kafka and Kafka Streams as part of Spring Cloud Stream. The data flowing through my Kafka Streams app is aggregated and materialized over time windows:
Materialized<String, ErrorScore, WindowStore<Bytes, byte[]>> oneHour = Materialized.as("one-hour-store");
oneHour.withLoggingEnabled(topicConfig);
events
.map(getStringSensorMeasurementKeyValueKeyValueMapper())
.groupByKey()
.windowedBy(TimeWindows.of(oneHourStore.getTimeUnit()))
.reduce((aggValue, newValue) -> getMaxErrorScore(aggValue, newValue),
(oneHour));
As designed, the information being materialized is also backed by a changelog topic.
Our app also has a REST endpoint that queries the state store like this:
ReadOnlyWindowStore<String, ErrorScore> windowStore = queryableStoreRegistry.getQueryableStoreType("one-hour-store", QueryableStoreTypes.windowStore());
WindowStoreIterator<ErrorScore> iter = windowStore.fetch(key, from, to);
The settings of the changelog topic that gets created read:
min.insync.replicas 1
cleanup.policy delete
retention.ms 5259600000
retention.bytes -1
I would assume that the local state store would keep the information for at least 61 days (~2 months). However, it seems that only about the last day of data remains in the stores.
What could cause the data to be removed so soon?
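For reference, the arithmetic behind the "61 days" figure can be checked with a small sketch. The class and method names here (RetentionMath, toDays) are made up for illustration; only the retention.ms value comes from the topic settings above.

```java
import java.time.Duration;

public class RetentionMath {
    // Value of retention.ms reported for the changelog topic above.
    static final long CHANGELOG_RETENTION_MS = 5_259_600_000L;

    // Converts a retention given in milliseconds into fractional days.
    static double toDays(long retentionMs) {
        return retentionMs / (double) Duration.ofDays(1).toMillis();
    }

    public static void main(String[] args) {
        // 5,259,600,000 ms / 86,400,000 ms per day
        System.out.println(toDays(CHANGELOG_RETENTION_MS)); // prints 60.875
    }
}
```

So the changelog topic is configured for just under 61 days, while the observed behaviour of the store looks much closer to a single day.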
Update with solution: Kafka Streams 2.0.1 does not contain the Materialized.withRetention method. For this particular version I was able to set the retention time of the state stores with the following code, which solves my problem:
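// Chain until() and keep the instance it returns, so the retention is applied whether or not until() mutates in place.
TimeWindows timeWindows = TimeWindows.of(windowSizeMs).until(retentionMs);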
so my code now reads:
...
.groupByKey()
.windowedBy(timeWindows)
.reduce((aggValue, newValue) -> getMaxErrorScore(aggValue, newValue),
(oneHour));
...
Answer 1:
For windowed KTables there is a local retention time and there is the changelog retention time. You can set the local store retention time via Materialized#withRetention(...); the default value is 24h.
For older Kafka releases, the local store retention time is set via Windows#until().
If a new application is created, changelog topics are created with the same retention time as the local store retention time. However, if you manually increase the log retention time, this does not affect your store retention time; you need to update your code accordingly. The same holds when the changelog topic already exists: if you change the local store retention time, the changelog topic config is not updated automatically.
There is a Jira for this as well: https://issues.apache.org/jira/browse/KAFKA-7591
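To illustrate why the two settings have to be kept in sync by hand, here is a toy model in plain Java (no Kafka dependency). It is only a sketch under a simplifying assumption: a window counts as retained while its start time is no older than the current stream time minus the retention. Real window stores expire data segment by segment, so the exact cutoff differs. The names RetentionModel and retainedWindowStarts are hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class RetentionModel {
    // Simplified rule (assumption): keep a window while
    // windowStart >= streamTime - retention.
    static List<Long> retainedWindowStarts(List<Long> windowStartsMs,
                                           long streamTimeMs,
                                           Duration retention) {
        long cutoff = streamTimeMs - retention.toMillis();
        List<Long> kept = new ArrayList<>();
        for (long start : windowStartsMs) {
            if (start >= cutoff) {
                kept.add(start);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        long streamTime = 72 * hour; // three days of data have arrived

        // One-hour windows starting at hour 0, 1, ..., 71.
        List<Long> starts = new ArrayList<>();
        for (long h = 0; h < 72; h++) {
            starts.add(h * hour);
        }

        Duration storeRetention = Duration.ofDays(1);                    // default local retention
        Duration changelogRetention = Duration.ofMillis(5_259_600_000L); // retention.ms from the question

        // The local store only serves what its own retention allows.
        System.out.println(retainedWindowStarts(starts, streamTime, storeRetention).size());     // 24
        // The changelog topic still holds everything, but queries never see it.
        System.out.println(retainedWindowStarts(starts, streamTime, changelogRetention).size()); // 72
    }
}
```

In this model, raising only the topic's retention.ms changes the second number and leaves the first untouched, which matches the symptom of seeing roughly one day of data through the REST endpoint.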
Source: https://stackoverflow.com/questions/54685352/retention-time-in-kafka-local-state-store-changelog