apache-kafka-streams

Kafka Streams topology stuck with num.stream.threads=50 and 100 partitions

Submitted by 笑着哭i on 2020-01-04 05:54:22
Question: There is a topology:

    kStreamBuilder.stream(kafkaProperties.getInboundTopicName(), consumed)
        .filterNot((k, v) -> Objects.isNull(v))
        .transform(() -> new CustomTransformer(...))
        .transform(() -> new AnotherTransformer(...))
        .to(kafkaProperties.getOutTopicName(), resultProduced);

with num.stream.threads configured to 50. On startup the application gets stuck, constantly logging messages (I'm not 100% sure it is stuck, but after 20 minutes there is no change in its state, and CPU and network usage are very high)
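For context, a minimal sketch (not the asker's actual code) of how num.stream.threads is set alongside such a topology; the application id, broker address, and topic names below are placeholder values:

    import java.util.Objects;
    import java.util.Properties;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class InboundProcessor {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "inbound-processor");  // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 50);               // the setting in question

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("inbound-topic")        // placeholder topic
                   .filterNot((k, v) -> Objects.isNull(v))
                   // the asker's two .transform(...) steps would sit here
                   .to("out-topic");                               // placeholder topic

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

For a single application instance, Streams creates one task per input partition, so the 100 partitions would be spread across the 50 threads, roughly two tasks per thread.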

Kafka Streams table transformations

Submitted by 梦想的初衷 on 2020-01-03 01:59:07
Question: I've got a table in SQL Server that I'd like to stream to a Kafka topic; the structure is as follows: (UserID, ReportID). This table is going to be continuously changed (records added/inserted, no updates). I'd like to transform it into the following kind of structure and put it into Elasticsearch:

    { "UserID": 1, "Reports": [1, 2, 3, 4, 5, 6] }

The examples I've seen so far are logs or click-streams and do not work in my case. Is this kind of use case possible at all? I could always just look at UserID
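For illustration, a minimal sketch of how such a per-user collection might look in the Streams DSL, assuming the change feed arrives keyed by UserID with the ReportID as the value; the topic names are placeholders and a plain comma-separated String stands in for a proper list/JSON serde:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;

    public class UserReportsAggregation {
        static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.<Integer, Integer>stream("user-reports")             // placeholder: key = UserID, value = ReportID
                   .groupByKey(Grouped.with(Serdes.Integer(), Serdes.Integer()))
                   .aggregate(
                       () -> "",                                          // start with an empty report list
                       (userId, reportId, reports) ->                     // append each new ReportID
                           reports.isEmpty() ? reportId.toString() : reports + "," + reportId,
                       Materialized.with(Serdes.Integer(), Serdes.String()))
                   .toStream()
                   .to("user-reports-aggregated",                         // placeholder topic an ES sink would read
                       Produced.with(Serdes.Integer(), Serdes.String()));
            return builder.build();
        }
    }

A real implementation would aggregate into a list or JSON document with a custom serde rather than a delimited String; the shape of the DSL calls stays the same.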

Kafka Streams reprocessing old messages on rebalancing

Submitted by 拈花ヽ惹草 on 2020-01-02 23:14:17
Question: I have a Kafka Streams application which reads data from a few topics, joins the data, and writes it to another topic. This is the configuration of my Kafka cluster: 5 Kafka brokers; Kafka topics with 15 partitions and a replication factor of 3. My Kafka Streams applications are running on the same machines as my Kafka brokers. A few million records are consumed/produced per hour. Whenever I take a broker down, the application goes into the rebalancing state, and after rebalancing many times it starts

Kafka Processor API: Different key for Source and StateStore?

Submitted by 北战南征 on 2020-01-02 05:45:07
Question: We are currently implementing a process (using the Kafka Processor API) where we need to combine information from 2 correlated events (messages) on a topic and then forward that combined information. The events originate from IoT devices, and since we want to keep them in order, the source topic uses a device identifier as its key. The events also contain a correlation ID:

    Key     { deviceId: "..." }
    Message { deviceId: "...", correlationId: "...", data: ... }

Our first approach was to create a
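For illustration, a rough sketch of a transformer in which the source stays keyed by deviceId while the state store is keyed by correlationId; the Event record, the "pending-events" store name, and the merge step are placeholders, not the asker's actual code:

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class CorrelatingTransformer
            implements Transformer<String, CorrelatingTransformer.Event, KeyValue<String, CorrelatingTransformer.Event>> {

        // Minimal stand-in for the asker's message type.
        public record Event(String deviceId, String correlationId, String data) { }

        private KeyValueStore<String, Event> pending;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            // the store is registered elsewhere via StreamsBuilder#addStateStore and named in transform(...)
            pending = (KeyValueStore<String, Event>) context.getStateStore("pending-events");
        }

        @Override
        public KeyValue<String, Event> transform(String deviceId, Event event) {
            Event earlier = pending.get(event.correlationId());   // look up by correlation ID, not by the record key
            if (earlier == null) {
                pending.put(event.correlationId(), event);         // first event of the pair: buffer it
                return null;                                       // nothing to emit yet
            }
            pending.delete(event.correlationId());                 // second event: combine and forward downstream
            Event combined = new Event(deviceId, event.correlationId(),
                                       earlier.data() + "|" + event.data());  // placeholder merge
            return KeyValue.pair(deviceId, combined);
        }

        @Override
        public void close() { }
    }

Keying the store by correlationId is only safe here as long as both correlated events arrive on the same partition, which is why the deviceId partitioning in the question matters.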

Aggregation and state store retention in Kafka Streams

Submitted by 穿精又带淫゛_ on 2020-01-01 09:38:09
Question: I have a use case like the following. For each incoming event, I want to look at a certain field to see if its status changed from A to B and, if so, send that to an output topic. The flow is like this: an event with key "xyz" comes in with status A, and some time later another event comes in with key "xyz" with status B. I have this code using the high-level DSL:

    final KStream<String, DomainEvent> inputStream....

    final KStream<String, DomainEvent> outputStream = inputStream
        .map((k, v) ->
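For illustration, a sketch of one way such a status-change check is often expressed with a state store; the DomainEvent record, its status() accessor, and the "last-status" store name are placeholders, not the asker's actual types:

    import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class StatusChangeDetector
            implements ValueTransformerWithKey<String, StatusChangeDetector.DomainEvent, StatusChangeDetector.DomainEvent> {

        // Minimal stand-in for the asker's DomainEvent type.
        public record DomainEvent(String key, String status) { }

        private KeyValueStore<String, String> lastStatus;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            // "last-status" is a placeholder store name, registered elsewhere via addStateStore
            lastStatus = (KeyValueStore<String, String>) context.getStateStore("last-status");
        }

        @Override
        public DomainEvent transform(String key, DomainEvent event) {
            String previous = lastStatus.get(key);
            lastStatus.put(key, event.status());                 // remember the latest status per key
            boolean flipped = "A".equals(previous) && "B".equals(event.status());
            return flipped ? event : null;                       // nulls get filtered out downstream
        }

        @Override
        public void close() { }
    }

It would typically be wired in via transformValues(...) with the store name, followed by a filter that drops the null results before writing to the output topic.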

Kafka Streams with state stores - Reprocessing of messages on app restart

Submitted by 六月ゝ 毕业季﹏ on 2020-01-01 07:22:21
Question: We have the following topology with two transformers, and each transformer uses a persistent state store:

    kStreamBuilder.stream(inboundTopicName)
        .transform(() -> new FirstTransformer(FIRST_STATE_STORE), FIRST_STATE_STORE)
        .map((key, value) -> ...)
        .transform(() -> new SecondTransformer(SECOND_STATE_STORE), SECOND_STATE_STORE)
        .to(outboundTopicName);

and the Kafka settings have auto.offset.reset: latest. After the app was launched, I see that two internal compacted topics were created (as expected):
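For context, a minimal sketch of how a persistent (changelog-backed) store is typically registered before being named in transform(); the store name and String serdes are placeholders, not the asker's actual configuration:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;

    public class StoreRegistration {
        // Sketch only: returns a builder with one persistent store registered.
        static StreamsBuilder builderWithStores() {
            StoreBuilder<KeyValueStore<String, String>> firstStore =
                Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore("FIRST_STATE_STORE"),   // backed by an internal compacted changelog topic
                    Serdes.String(), Serdes.String());

            StreamsBuilder builder = new StreamsBuilder();
            builder.addStateStore(firstStore);
            // the second store would be registered the same way, and each transform(...)
            // call then names the store(s) it uses, as in the asker's topology above
            return builder;
        }
    }

Those internal compacted changelog topics are what back each persistent store, which is why the asker sees two of them appear after launch.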

Why does my Kafka Streams topology not replay/reprocess correctly?

Submitted by 末鹿安然 on 2019-12-31 06:22:36
Question: I have a topology that looks like this:

    KTable<ByteString, User> users = topology.table(USERS);

    KStream<ByteString, JoinRequest> joinRequests = topology.stream(JOIN_REQUESTS)
        .mapValues(entityTopologyProcessor::userNew)
        .to(USERS);

    topology.stream(SETTINGS_CONFIRM_REQUESTS)
        .join(users, entityTopologyProcessor::userSettingsConfirm)
        .to(USERS);

    topology.stream(SETTINGS_UPDATE_REQUESTS)
        .join(users, entityTopologyProcessor::userSettingsUpdate)
        .to(USERS);

At runtime this topology works fine.

Kafka Streams exactly-once delivery

Submitted by 家住魔仙堡 on 2019-12-31 04:47:35
Question: My goal is to consume from topic A, do some processing, and produce to topic B, as a single atomic action. To achieve this I see two options:

1. Use a spring-kafka @KafkaListener and a KafkaTemplate, as described here.
2. Use the Streams EOS (exactly-once) functionality.

I have successfully verified option #1. By successfully, I mean that if my processing fails (an IllegalArgumentException is thrown), the consumed message from topic A keeps being consumed by the KafkaListener. This is what I expect, as the
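For reference, a minimal sketch of the Streams configuration that enables option 2; the application id and broker address are placeholder values, and the constant shown is the one available in clients of that era (newer clients offer an EXACTLY_ONCE_V2 variant):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class EosConfig {
        // Sketch only: the single setting that turns on Streams' exactly-once processing.
        static Properties streamsProps() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "a-to-b-processor");   // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
            return props;
        }
    }

With this guarantee enabled, the consume-process-produce cycle for topic A to topic B is committed as one transaction.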

Set timestamp in output with Kafka Streams

Submitted by 倖福魔咒の on 2019-12-30 10:40:11
Question: I'm getting CSVs in a Kafka topic "raw-data"; the goal is to transform them by sending each line to another topic "data" with the right timestamp (different for each line). Currently, I have 2 streamers:
- one to split the lines in "raw-data" and send them to an "internal" topic (no timestamp);
- one with a TimestampExtractor that consumes "internal" and sends them to "data".
I'd like to remove the use of this "internal" topic by setting the timestamp directly, but I couldn't find a way (the
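For illustration, a rough sketch (assuming Kafka 2.0 or later, where the Processor API's To.withTimestamp() is available) of a transformer that forwards each line with an explicit timestamp; parseTimestamp() and the assumption that the first CSV column holds epoch millis are placeholders, not the asker's actual format:

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.To;

    public class LineTimestamper implements Transformer<String, String, KeyValue<String, String>> {
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
        }

        @Override
        public KeyValue<String, String> transform(String key, String line) {
            long ts = parseTimestamp(line);                       // extract the line's own timestamp
            context.forward(key, line, To.all().withTimestamp(ts));
            return null;                                          // everything was already forwarded above
        }

        private long parseTimestamp(String line) {
            return Long.parseLong(line.split(",")[0]);            // placeholder: assumes the first column is epoch millis
        }

        @Override
        public void close() { }
    }

Such a transformer would sit directly on the "raw-data" stream and write to "data", which is the kind of setup that could make the "internal" topic unnecessary.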