apache-kafka-streams

Kafka Streams topology stuck with num.stream.threads=50 and 100 partitions

Submitted by 笑着哭i on 2020-01-04 05:54:22
Question: There is a topology:

    kStreamBuilder.stream(kafkaProperties.getInboundTopicName(), consumed)
        .filterNot((k, v) -> Objects.isNull(v))
        .transform(() -> new CustomTransformer(...))
        .transform(() -> new AnotherTransformer(...))
        .to(kafkaProperties.getOutTopicName(), resultProduced);

with num.stream.threads configured to 50. On startup the application gets stuck, constantly logging messages (I'm not 100% sure it is stuck, but after 20 minutes there is no change in its state, and CPU and network usage are very high)
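For context, a minimal sketch (not the asker's actual code) of how num.stream.threads is set alongside such a topology; the application id, broker address, and topic names below are placeholder values:

    import java.util.Objects;
    import java.util.Properties;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class InboundProcessor {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "inbound-processor");  // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 50);               // the setting in question

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("inbound-topic")        // placeholder topic
                   .filterNot((k, v) -> Objects.isNull(v))
                   // the asker's two .transform(...) steps would sit here
                   .to("out-topic");                               // placeholder topic

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

For a single application instance, Streams creates one task per input partition, so the 100 partitions would be spread across the 50 threads, roughly two tasks per thread.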

Kafka Streams table transformations

Submitted by 梦想的初衷 on 2020-01-03 01:59:07
Question: I've got a table in SQL Server that I'd like to stream to a Kafka topic; the structure is as follows: (UserID, ReportID). This table is going to be continuously changed (records added/inserted, no updates). I'd like to transform it into the following kind of structure and put it into Elasticsearch:

    { "UserID": 1, "Reports": [1, 2, 3, 4, 5, 6] }

The examples I've seen so far are logs or click-streams and do not work in my case. Is this kind of use case possible at all? I could always just look at UserID
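For illustration, a minimal sketch of how such a per-user collection might look in the Streams DSL, assuming the change feed arrives keyed by UserID with the ReportID as the value; the topic names are placeholders and a plain comma-separated String stands in for a proper list/JSON serde:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;

    public class UserReportsAggregation {
        static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.<Integer, Integer>stream("user-reports")             // placeholder: key = UserID, value = ReportID
                   .groupByKey(Grouped.with(Serdes.Integer(), Serdes.Integer()))
                   .aggregate(
                       () -> "",                                          // start with an empty report list
                       (userId, reportId, reports) ->                     // append each new ReportID
                           reports.isEmpty() ? reportId.toString() : reports + "," + reportId,
                       Materialized.with(Serdes.Integer(), Serdes.String()))
                   .toStream()
                   .to("user-reports-aggregated",                         // placeholder topic an ES sink would read
                       Produced.with(Serdes.Integer(), Serdes.String()));
            return builder.build();
        }
    }

A real implementation would aggregate into a list or JSON document with a custom serde rather than a delimited String; the shape of the DSL calls stays the same.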

Kafka Streams reprocessing old messages on rebalancing

Submitted by 拈花ヽ惹草 on 2020-01-02 23:14:17
Question: I have a Kafka Streams application which reads data from a few topics, joins the data, and writes it to another topic. This is the configuration of my Kafka cluster: 5 Kafka brokers; Kafka topics with 15 partitions and a replication factor of 3. My Kafka Streams applications are running on the same machines as my Kafka brokers. A few million records are consumed/produced per hour. Whenever I take a broker down, the application goes into the rebalancing state, and after rebalancing many times it starts

Kafka Processor API: Different key for Source and StateStore?

Submitted by 北战南征 on 2020-01-02 05:45:07
Question: We are currently implementing a process (using the Kafka Processor API) where we need to combine information from 2 correlated events (messages) on a topic and then forward that combined information. The events originate from IoT devices, and since we want to keep them in order, the source topic uses a device identifier as its key. The events also contain a correlation ID:

    Key     { deviceId: "..." }
    Message { deviceId: "...", correlationId: "...", data: ... }

Our first approach was to create a
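For illustration, a rough sketch of a transformer in which the source stays keyed by deviceId while the state store is keyed by correlationId; the Event record, the "pending-events" store name, and the merge step are placeholders, not the asker's actual code:

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class CorrelatingTransformer
            implements Transformer<String, CorrelatingTransformer.Event, KeyValue<String, CorrelatingTransformer.Event>> {

        // Minimal stand-in for the asker's message type.
        public record Event(String deviceId, String correlationId, String data) { }

        private KeyValueStore<String, Event> pending;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            // the store is registered elsewhere via StreamsBuilder#addStateStore and named in transform(...)
            pending = (KeyValueStore<String, Event>) context.getStateStore("pending-events");
        }

        @Override
        public KeyValue<String, Event> transform(String deviceId, Event event) {
            Event earlier = pending.get(event.correlationId());   // look up by correlation ID, not by the record key
            if (earlier == null) {
                pending.put(event.correlationId(), event);         // first event of the pair: buffer it
                return null;                                       // nothing to emit yet
            }
            pending.delete(event.correlationId());                 // second event: combine and forward downstream
            Event combined = new Event(deviceId, event.correlationId(),
                                       earlier.data() + "|" + event.data());  // placeholder merge
            return KeyValue.pair(deviceId, combined);
        }

        @Override
        public void close() { }
    }

Keying the store by correlationId is only safe here as long as both correlated events arrive on the same partition, which is why the deviceId partitioning in the question matters.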

Aggregation and state store retention in Kafka Streams

Submitted by 穿精又带淫゛_ on 2020-01-01 09:38:09
Question: I have a use case like the following. For each incoming event, I want to look at a certain field to see if its status changed from A to B and, if so, send that to an output topic. The flow is like this: an event with key "xyz" comes in with status A, and some time later another event comes in with key "xyz" with status B. I have this code using the high-level DSL:

    final KStream<String, DomainEvent> inputStream....

    final KStream<String, DomainEvent> outputStream = inputStream
        .map((k, v) ->
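For illustration, a sketch of one way such a status-change check is often expressed with a state store; the DomainEvent record, its status() accessor, and the "last-status" store name are placeholders, not the asker's actual types:

    import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class StatusChangeDetector
            implements ValueTransformerWithKey<String, StatusChangeDetector.DomainEvent, StatusChangeDetector.DomainEvent> {

        // Minimal stand-in for the asker's DomainEvent type.
        public record DomainEvent(String key, String status) { }

        private KeyValueStore<String, String> lastStatus;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            // "last-status" is a placeholder store name, registered elsewhere via addStateStore
            lastStatus = (KeyValueStore<String, String>) context.getStateStore("last-status");
        }

        @Override
        public DomainEvent transform(String key, DomainEvent event) {
            String previous = lastStatus.get(key);
            lastStatus.put(key, event.status());                 // remember the latest status per key
            boolean flipped = "A".equals(previous) && "B".equals(event.status());
            return flipped ? event : null;                       // nulls get filtered out downstream
        }

        @Override
        public void close() { }
    }

It would typically be wired in via transformValues(...) with the store name, followed by a filter that drops the null results before writing to the output topic.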

Kafka Streams with state stores - Reprocessing of messages on app restart

Submitted by 六月ゝ 毕业季﹏ on 2020-01-01 07:22:21
Question: We have the following topology with two transformers, and each transformer uses a persistent state store:

    kStreamBuilder.stream(inboundTopicName)
        .transform(() -> new FirstTransformer(FIRST_STATE_STORE), FIRST_STATE_STORE)
        .map((key, value) -> ...)
        .transform(() -> new SecondTransformer(SECOND_STATE_STORE), SECOND_STATE_STORE)
        .to(outboundTopicName);

and the Kafka settings have auto.offset.reset: latest. After the app was launched, I see that two internal compacted topics were created (as expected):
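For context, a minimal sketch of how a persistent (changelog-backed) store is typically registered before being named in transform(); the store name and String serdes are placeholders, not the asker's actual configuration:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;

    public class StoreRegistration {
        // Sketch only: returns a builder with one persistent store registered.
        static StreamsBuilder builderWithStores() {
            StoreBuilder<KeyValueStore<String, String>> firstStore =
                Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore("FIRST_STATE_STORE"),   // backed by an internal compacted changelog topic
                    Serdes.String(), Serdes.String());

            StreamsBuilder builder = new StreamsBuilder();
            builder.addStateStore(firstStore);
            // the second store would be registered the same way, and each transform(...)
            // call then names the store(s) it uses, as in the asker's topology above
            return builder;
        }
    }

Those internal compacted changelog topics are what back each persistent store, which is why the asker sees two of them appear after launch.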

Why does my Kafka Streams topology not replay/reprocess correctly?

Submitted by 末鹿安然 on 2019-12-31 06:22:36
Question: I have a topology that looks like this:

    KTable<ByteString, User> users = topology.table(USERS);

    KStream<ByteString, JoinRequest> joinRequests = topology.stream(JOIN_REQUESTS)
        .mapValues(entityTopologyProcessor::userNew)
        .to(USERS);

    topology.stream(SETTINGS_CONFIRM_REQUESTS)
        .join(users, entityTopologyProcessor::userSettingsConfirm)
        .to(USERS);

    topology.stream(SETTINGS_UPDATE_REQUESTS)
        .join(users, entityTopologyProcessor::userSettingsUpdate)
        .to(USERS);

At runtime this topology works fine.

Kafka Streams exactly-once delivery

Submitted by 家住魔仙堡 on 2019-12-31 04:47:35
Question: My goal is to consume from topic A, do some processing, and produce to topic B, as a single atomic action. To achieve this I see two options:

1. Use a spring-kafka @KafkaListener and a KafkaTemplate, as described here.
2. Use the Streams EOS (exactly-once) functionality.

I have successfully verified option #1. By successfully, I mean that if my processing fails (an IllegalArgumentException is thrown), the consumed message from topic A keeps being consumed by the KafkaListener. This is what I expect, as the
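For reference, a minimal sketch of the Streams configuration that enables option 2; the application id and broker address are placeholder values, and the constant shown is the one available in clients of that era (newer clients offer an EXACTLY_ONCE_V2 variant):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class EosConfig {
        // Sketch only: the single setting that turns on Streams' exactly-once processing.
        static Properties streamsProps() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "a-to-b-processor");   // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
            return props;
        }
    }

With this guarantee enabled, the consume-process-produce cycle for topic A to topic B is committed as one transaction.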

Set timestamp in output with Kafka Streams

Submitted by 倖福魔咒の on 2019-12-30 10:40:11
Question: I'm getting CSVs in a Kafka topic "raw-data"; the goal is to transform them by sending each line to another topic "data" with the right timestamp (different for each line). Currently, I have 2 streamers:
- one to split the lines in "raw-data" and send them to an "internal" topic (no timestamp);
- one with a TimestampExtractor that consumes "internal" and sends them to "data".
I'd like to remove the use of this "internal" topic by setting the timestamp directly, but I couldn't find a way (the
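For illustration, a rough sketch (assuming Kafka 2.0 or later, where the Processor API's To.withTimestamp() is available) of a transformer that forwards each line with an explicit timestamp; parseTimestamp() and the assumption that the first CSV column holds epoch millis are placeholders, not the asker's actual format:

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.To;

    public class LineTimestamper implements Transformer<String, String, KeyValue<String, String>> {
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
        }

        @Override
        public KeyValue<String, String> transform(String key, String line) {
            long ts = parseTimestamp(line);                       // extract the line's own timestamp
            context.forward(key, line, To.all().withTimestamp(ts));
            return null;                                          // everything was already forwarded above
        }

        private long parseTimestamp(String line) {
            return Long.parseLong(line.split(",")[0]);            // placeholder: assumes the first column is epoch millis
        }

        @Override
        public void close() { }
    }

Such a transformer would sit directly on the "raw-data" stream and write to "data", which is the kind of setup that could make the "internal" topic unnecessary.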