apache-kafka-streams

KTable unable to fetch data from Materialized view

Submitted by 試著忘記壹切 on 2020-01-14 06:07:11
Question: I am using Kafka Streams with Spring Boot. In my use case, when I receive a customer event I need to store it in the customer-store materialized view, and when I receive an order event I need to join the customer and the order, then store the result in the customer-order materialized view.

StoreBuilder customerStateStore = Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("customer-store"), Serdes.String(), customerSerde)
    .withLoggingEnabled(new HashMap<>());
streamsBuilder.stream("customer", Consumed.with…
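A minimal sketch of the topology being described, assuming Kafka 2.5+ (for toTable); Customer, Order, CustomerOrder and their serdes are hypothetical application types, not taken from the question:

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class CustomerOrderTopology {
    // Customer, Order, CustomerOrder are assumed domain types.
    public static void build(StreamsBuilder builder,
                             Serde<Customer> customerSerde,
                             Serde<Order> orderSerde,
                             Serde<CustomerOrder> customerOrderSerde) {
        // Customers become a queryable table materialized as "customer-store".
        KTable<String, Customer> customers = builder.table("customer",
            Consumed.with(Serdes.String(), customerSerde),
            Materialized.<String, Customer, KeyValueStore<Bytes, byte[]>>as("customer-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(customerSerde));

        // Each order is joined with its customer and the result is
        // materialized as the "customer-order" view (toTable is Kafka 2.5+).
        builder.stream("order", Consumed.with(Serdes.String(), orderSerde))
            .join(customers, (order, customer) -> new CustomerOrder(customer, order))
            .toTable(Materialized.<String, CustomerOrder, KeyValueStore<Bytes, byte[]>>as("customer-order")
                .withKeySerde(Serdes.String())
                .withValueSerde(customerOrderSerde));
    }
}
```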

External processing using Kafka Streams

Submitted by 守給你的承諾、 on 2020-01-13 05:58:48
Question: There are several questions regarding message enrichment using external data, and the recommendation is almost always the same: ingest the external data using Kafka Connect and then join the records using state stores. Although this fits most cases, there are several other use cases in which it does not, such as IP-to-location and user-agent detection, to name a few. Enriching a message with an IP-based location usually requires a lookup by a range of IPs, but currently there is no built-in…
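A minimal sketch of the inline alternative the question hints at; lookupLocation is a hypothetical stand-in for any range-based IP resolver (e.g. a local GeoIP database), and the topic names are illustrative:

```java
import java.util.function.Function;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class IpEnrichmentTopology {
    public static void build(StreamsBuilder builder,
                             Function<String, String> lookupLocation) {
        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
            // The lookup runs inline, outside the state-store machinery, so
            // it is not restored/replayed the way a KTable join would be.
            .mapValues(ip -> ip + "|" + lookupLocation.apply(ip))
            .to("events-enriched", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```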

What are the internal topics used in Kafka?

Submitted by 江枫思渺然 on 2020-01-06 08:02:08
Question: We are using the Kafka Streams API for aggregation, in which we also use group by. We also use a state store where it saves the input topics' data. What I notice is that Kafka internally creates three kinds of topics:

Changelog-<storeid>-<partition>
Repartition-<storeid>-<partition>
<topicname>-<partition>

What I am not able to understand is: why does it create a changelog topic when I have all the data in <topic>-<partition>? Does the repartition topic contain data after grouping? And I see that the size of…
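For orientation, a minimal sketch of a topology that produces both kinds of internal topics; topic and store names are illustrative:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class InternalTopicsExample {
    public static void build(StreamsBuilder builder) {
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
            // Changing the key forces a repartition topic so that equal keys
            // end up on the same task before the aggregation runs.
            .selectKey((key, value) -> value)
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            // The store behind count() is backed by a changelog topic: it holds
            // the aggregated state rather than the raw input, so the store can
            // be restored on failover without recomputing the whole input topic.
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"))
            .toStream()
            .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
    }
}
```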

Kafka Streams: Any guarantees on ordering of saves to state stores when using at_least_once?

Submitted by 一个人想着一个人 on 2020-01-06 06:18:30
Question: We have a Kafka Streams Java topology built with the Processor API. In the topology, we have a single processor that saves to multiple state stores. As we use at_least_once, we would expect to see some inconsistencies between the state stores - e.g. an incoming record results in writes to both state store A and state store B, but a crash between the saves results in only the save to store A getting written to the Kafka changelog topic. Are we guaranteed that the order in which we save will also be the…
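For context, a minimal Processor API sketch of the shape of topology being asked about; the store names are hypothetical:

```java
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class DualStoreProcessor extends AbstractProcessor<String, String> {
    private KeyValueStore<String, String> storeA;
    private KeyValueStore<String, String> storeB;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        storeA = (KeyValueStore<String, String>) context.getStateStore("store-a");
        storeB = (KeyValueStore<String, String>) context.getStateStore("store-b");
    }

    @Override
    public void process(String key, String value) {
        // Under at_least_once, a crash between these two puts can leave the
        // changelog for store A ahead of the changelog for store B on restart.
        storeA.put(key, value);
        storeB.put(key, value);
    }
}
```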

How does Kafka Streams send a final aggregation with KTable#suppress?

Submitted by 牧云@^-^@ on 2020-01-06 05:27:07
Question: What I'd like to do is this:

Consume records from a topic
Count the values for each 1-second window
Detect windows whose record count is < 4
Send the FINAL result to another topic

I use suppress to send the final result, but I got an error like this:

09:18:07,963 ERROR org.apache.kafka.streams.processor.internals.ProcessorStateManager - task [1_0] Failed to flush state store KSTREAM-AGGREGATE-STATE-STORE-0000000002: java.lang.ClassCastException: org.apache.kafka.streams.kstream.Windowed cannot be cast to…
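A hedged sketch of how such a pipeline is commonly written, with explicit serdes on the window store (missing or mismatched serdes are one frequent source of this ClassCastException) and untilWindowCloses for final results; topic and store names are illustrative:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

public class FinalWindowCount {
    public static void build(StreamsBuilder builder) {
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            .windowedBy(TimeWindows.of(Duration.ofSeconds(1)).grace(Duration.ZERO))
            // Explicit serdes on the windowed store.
            .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("window-counts")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.Long()))
            // Emit each window exactly once, after it closes.
            .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
            .toStream()
            .filter((windowedKey, count) -> count < 4)
            // Unwrap the windowed key before producing to the output topic.
            .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count))
            .to("final-output", Produced.with(Serdes.String(), Serdes.Long()));
    }
}
```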

Kafka Streams (Suppress): Closing a TimeWindow by timeout

Submitted by ≡放荡痞女 on 2020-01-06 04:40:07
Question: I have the following piece of code to aggregate data hourly based on event time:

KStream<Windowed<String>, SomeUserDefinedClass> windowedResults = inputStream
    .groupByKey(Grouped.with(Serdes.String(), new SomeUserDefinedSerde<>()))
    .windowedBy(TimeWindows.of(Duration.ofMinutes(60)).grace(Duration.ofMinutes(15)))
    .aggregate(
        // do some aggregation
    )
    .suppress(Suppressed.untilTimeLimit(Duration.ofMinutes(75), Suppressed.BufferConfig.unbounded()))
    .toStream();

The issue is that I am unable to…
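Worth noting: suppress() advances on stream time, not wall-clock time, so windows on an idle input never close on their own. A rough sketch of a common workaround, a wall-clock punctuator that periodically emits buffered results itself (store wiring omitted; names are illustrative):

```java
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;

public class WallClockEmitter implements Transformer<String, Long, KeyValue<String, Long>> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        // Fires on wall-clock time even when no new records arrive.
        context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME,
            timestamp -> {
                // Scan a window store here and context.forward(...) the
                // results whose windows have passed their close time.
            });
    }

    @Override
    public KeyValue<String, Long> transform(String key, Long value) {
        return KeyValue.pair(key, value); // pass records through unchanged
    }

    @Override
    public void close() {}
}
```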

Update KTable based on partial data attributes

Submitted by 依然范特西╮ on 2020-01-05 17:57:51
Question: I am trying to update a KTable with partial data of an object. E.g. the User object is {"id":1, "name":"Joe", "age":28}. The object is streamed into a topic and grouped by key into a KTable. Now the user object is updated partially as follows: {"id":1, "age":33} and streamed into the table. But the updated table looks as follows: {"id":1, "name":null, "age":28}. The expected output is {"id":1, "name":"Joe", "age":33}. How can I use Kafka Streams and Spring Cloud Stream to achieve the expected…
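A minimal sketch of one common fix: fold each (possibly partial) event into the previous aggregate instead of letting the new record replace it. The User POJO here is an assumed stand-in for the application's domain type:

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class PartialUpdateTopology {
    // Minimal assumed POJO; in the real app this is the domain model.
    public static class User {
        public String name;
        public Integer age;
    }

    public static void build(StreamsBuilder builder, Serde<User> userSerde) {
        builder.stream("user-events", Consumed.with(Serdes.String(), userSerde))
            .groupByKey()
            .aggregate(
                User::new,                    // initializer: empty user
                (id, partial, current) -> {
                    // Copy only the fields present in the partial event.
                    if (partial.name != null) current.name = partial.name;
                    if (partial.age != null) current.age = partial.age;
                    return current;
                },
                Materialized.<String, User, KeyValueStore<Bytes, byte[]>>as("user-store")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(userSerde)
            );
    }
}
```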

InvalidStateStoreException: the state store is not open in Kafka Streams

Submitted by 谁说胖子不能爱 on 2020-01-05 05:31:31
Question:

StreamsBuilder builder = new StreamsBuilder();
Map<String, ?> serdeConfig = Collections.singletonMap(SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
Serde keySerde = getSerde(keyClass);
keySerde.configure(serdeConfig, true);
Serde valueSerde = getSerde(valueClass);
valueSerde.configure(serdeConfig, false);
StoreBuilder<KeyValueStore<K, V>> store = Stores.keyValueStoreBuilder(
    Stores.persistentKeyValueStore("mystore"), keySerde, valueSerde).withCachingEnabled();
builder.addGlobalStore(store, …
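A minimal sketch of the usual remedy (using the Kafka 2.5+ StoreQueryParameters API): the store is only open once the instance has finished starting and restoring, so retry the lookup until it becomes queryable:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreLookup {
    // Blocks until the store is open; a crude retry, adequate for a sketch.
    public static <K, V> ReadOnlyKeyValueStore<K, V> waitForStore(
            KafkaStreams streams, String storeName) throws InterruptedException {
        while (true) {
            try {
                return streams.store(StoreQueryParameters.fromNameAndType(
                        storeName, QueryableStoreTypes.keyValueStore()));
            } catch (InvalidStateStoreException e) {
                // Thrown while the instance is starting, rebalancing, or
                // restoring; the store is simply not open yet.
                Thread.sleep(100);
            }
        }
    }
}
```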

Kafka Streams rebalancing latency spikes on high throughput kafka-streams services

Submitted by 旧街凉风 on 2020-01-04 06:50:48
Question: We are starting to work with Kafka Streams; our service is a very simple stateless consumer. We have tight requirements on latency, and we are facing too-high latency problems when the consumer group is rebalancing. In our scenario, rebalancing will happen relatively often: rolling updates of the code, scaling the service up/down, containers being shuffled by the cluster scheduler, containers dying, hardware failing. One of the first tests we have done is having a small consumer group with 4…
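A hedged sketch of configuration commonly used to soften rebalances, chiefly static membership (brokers and clients 2.3+); the values here are illustrative, not tuned recommendations:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class RebalanceTuning {
    public static Properties props(String instanceId) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "latency-sensitive-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Static membership: a bounced container that rejoins with the same
        // instance id does not trigger a full rebalance.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG),
                  instanceId); // must be stable and unique per container
        // Give a restarting instance time to return before it is evicted.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG),
                  "30000");
        return props;
    }
}
```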