apache-kafka

Cassandra Sink for PySpark Structured Streaming from Kafka topic

Submitted by 久未见 on 2021-02-04 16:34:14

Question: I want to write Structured Streaming data into Cassandra using the PySpark Structured Streaming API. My data flow is: REST API -> Kafka -> Spark Structured Streaming (PySpark) -> Cassandra. Source and versions:

Spark version: 2.4.3
DataStax DSE: 6.7.6-1

Initializing Spark:

spark = SparkSession.builder \
    .master("local[*]") \
    .appName("Analytics") \
    .config("kafka.bootstrap.servers", "localhost:9092") \
    .config("spark.cassandra.connection.host", "localhost:9042") \
    .getOrCreate()
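Spark 2.4 has no built-in Cassandra sink for Structured Streaming in PySpark, so the usual workaround is foreachBatch, which hands each micro-batch to the batch (non-streaming) writer from the DataStax spark-cassandra-connector. A minimal sketch, assuming that connector is on the classpath; the keyspace and table names here are hypothetical:

```python
# Sketch only: assumes pyspark plus the DataStax spark-cassandra-connector
# are available at runtime; "analytics"/"events" are hypothetical names.

def write_batch_to_cassandra(batch_df, batch_id):
    """Write one micro-batch using the connector's batch write path."""
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .mode("append")
        .options(keyspace="analytics", table="events")
        .save())

def start_cassandra_sink(parsed_df):
    """parsed_df: the streaming DataFrame already parsed from the Kafka topic."""
    return (parsed_df.writeStream
        .foreachBatch(write_batch_to_cassandra)
        .outputMode("update")
        .start())
```

foreachBatch sidesteps the missing streaming sink by reusing the well-supported batch write path once per trigger.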

Kafka Streams limiting off-heap memory

Submitted by 99封情书 on 2021-02-04 08:27:32

Question: We are running Kafka Streams applications and frequently run into off-heap memory issues. Our applications are deployed in Kubernetes pods and they keep restarting. While investigating, I found that we can limit the off-heap memory by implementing RocksDBConfigSetter, as in the following example:

public static class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
    // See #1 below
    private static org.rocksdb.Cache cache = new org.rocksdb.LRUCache(TOTAL_OFF_HEAP
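The key detail in that pattern is that the cache is a static field: every RocksDB instance in the process shares one cache, so off-heap usage stays bounded by a single budget instead of growing with the number of state stores and partitions. A toy calculation of the difference (plain Python arithmetic, not Kafka Streams code; the 512 MiB budget is illustrative):

```python
def off_heap_bytes(num_state_stores, cache_bytes, shared_cache):
    """Rough upper bound on RocksDB block-cache memory for one process."""
    if shared_cache:
        # One static cache shared by every store: bounded regardless of count.
        return cache_bytes
    # A cache per store: grows with the number of stores (and partitions).
    return num_state_stores * cache_bytes

budget = 512 * 1024 * 1024  # illustrative 512 MiB budget
print(off_heap_bytes(24, budget, shared_cache=True))   # 536870912
print(off_heap_bytes(24, budget, shared_cache=False))  # 12884901888
```

That second number (12 GiB for 24 stores) is exactly the kind of unbounded growth that gets a pod OOM-killed.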

Does Kafka support ELB in front of broker cluster?

Submitted by 醉酒当歌 on 2021-02-04 04:33:08

Question: I have a question about Kafka broker clusters on AWS. Right now there is an AWS ELB in front of the cluster, but when I set the "bootstrap.servers" property of my producer or consumer to the "A" record (and correct port number) of my ELB, the producer and consumer fail to produce and consume messages, respectively. I have turned off all SSL on my brokers and am connecting through the PLAINTEXT 9092 port; the ELB forwards port 1234 to 9092. So in my producer configs
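A likely explanation lies in how Kafka clients connect: the bootstrap address (here, the ELB) is only used for the initial metadata request; after that, the client connects directly to each broker's advertised.listeners address, bypassing the load balancer entirely. A toy model of that flow (plain Python, hypothetical hostnames):

```python
def produce_connections(bootstrap_addr, advertised_listeners):
    """Addresses a Kafka client actually produces/consumes against.

    The bootstrap address is contacted once, for metadata; every
    subsequent connection goes to a broker's advertised listener.
    """
    return list(advertised_listeners.values())

advertised = {0: "broker-0.internal:9092", 1: "broker-1.internal:9092"}
conns = produce_connections("my-elb.example.amazonaws.com:1234", advertised)
print(conns)  # the ELB address never appears here
```

So unless each broker advertises an address that is itself reachable through the load balancer (for example per-broker listeners with distinct LB ports), produce/consume traffic will fail even though the bootstrap metadata request succeeds.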

Whole cluster failing if one kafka node goes down?

Submitted by 两盒软妹~` on 2021-01-29 22:54:54

Question: I have a 3-node Kafka cluster, each node running ZooKeeper and Kafka. If I explicitly kill the leader node (both ZooKeeper and Kafka), the whole cluster stops accepting incoming data and waits for the node to come back. The topic was created with:

kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 min.insync.replicas=2 --partitions 6 --topic logs

Node 1 server.properties:

broker.id=0
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://10.0.2.4:9092
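For reference, with replication factor 3 and min.insync.replicas=2, losing one broker should still leave an ISR of 2 and thus allow acks=all writes, and a 3-node ZooKeeper ensemble keeps quorum with 2 nodes alive. So a correctly configured cluster should survive one node failure; a plain-Python sanity check of that arithmetic:

```python
def kafka_writes_accepted(isr_size, min_insync_replicas):
    """acks=all writes succeed only while the ISR meets min.insync.replicas."""
    return isr_size >= min_insync_replicas

def zk_has_quorum(alive_nodes, ensemble_size):
    """ZooKeeper needs a strict majority of the ensemble."""
    return alive_nodes > ensemble_size // 2

# 3 brokers, 1 killed: ISR shrinks to 2, and 2 of 3 ZooKeeper nodes survive.
print(kafka_writes_accepted(isr_size=2, min_insync_replicas=2))  # True
print(zk_has_quorum(alive_nodes=2, ensemble_size=3))             # True
```

If the whole cluster still stalls, the settings probably did not take effect; note that topic configs normally need to be passed as --config min.insync.replicas=2 rather than as a bare argument.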

How to reset the retry count in Spring Kafka consumer when the exception thrown in the first retry is different from the second retry?

Submitted by 懵懂的女人 on 2021-01-29 22:34:48

Question: I am trying to implement a Kafka retry consumer in Spring Boot, using SeekToCurrentErrorHandler for the retries. I have set the back-off policy to allow 5 retry attempts. My question is: say on the first retry attempt the exception was "database not available", and on the second attempt the DB was available but another step failed, e.g. with a timeout. In this case, will the retry count go back to zero and start fresh, or will it continue to retry only for the remaining
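As a mental model: SeekToCurrentErrorHandler tracks the failure count per record (topic, partition, offset), not per exception type, so a different exception on a later attempt does not normally reset the count. A simplified simulation of that bookkeeping (plain Python, not Spring code):

```python
class PerRecordRetryTracker:
    """Counts delivery failures per record, regardless of exception type."""

    def __init__(self, max_attempts):
        self.max_attempts = max_attempts
        self.failures = {}  # (topic, partition, offset) -> failure count

    def record_failure(self, topic, partition, offset, exc):
        key = (topic, partition, offset)
        self.failures[key] = self.failures.get(key, 0) + 1
        # True -> seek back and retry; False -> attempts exhausted, recover
        return self.failures[key] < self.max_attempts

tracker = PerRecordRetryTracker(max_attempts=5)
rec = ("logs", 0, 42)
print(tracker.record_failure(*rec, ConnectionError("database not available")))  # True
print(tracker.record_failure(*rec, TimeoutError("timeout")))                    # True
print(tracker.failures[rec])  # 2: the count carried over despite the new exception
```

The count resets only once the record is processed successfully (or the recoverer, e.g. a dead-letter publisher, consumes it).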

Axon Event Published Multiple Times Over EventBus

Submitted by 别等时光非礼了梦想. on 2021-01-29 21:51:01

Question: I just want to confirm the intended behavior of Axon versus what I'm seeing in my application. We have a customized Kafka publisher integrated with the Axon framework, as well as a custom Cassandra-backed event store. The issue I'm seeing is as follows: (1) I publish a command (e.g. CreateServiceCommand), which hits the constructor of the ServiceAggregate; then (2) a ServiceCreatedEvent is applied to the aggregate. (3) We see the domain event persisted in the backend and published over the
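One thing worth ruling out when an event shows up multiple times: Axon's tracking event processors, like Kafka itself, provide at-least-once delivery, so duplicates after restarts or rebalances are expected, and downstream handlers are usually made idempotent. A minimal deduplication sketch (plain Python; the event identifier field is a hypothetical stand-in for Axon's event identifier):

```python
class IdempotentHandler:
    """Processes each event identifier at most once, turning
    at-least-once delivery into effectively-once processing."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def handle(self, event_id, payload):
        if event_id in self.seen:
            return False  # duplicate delivery: skip
        self.seen.add(event_id)
        self.processed.append(payload)
        return True

h = IdempotentHandler()
print(h.handle("evt-1", "ServiceCreatedEvent"))  # True
print(h.handle("evt-1", "ServiceCreatedEvent"))  # False (redelivery ignored)
print(len(h.processed))  # 1
```

If the event is genuinely published multiple times on every run (not just after restarts), the custom publisher is the more likely culprit than the delivery guarantee.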

Spring Kafka Consumer, rewind consumer offset to go back 'n' records

Submitted by 守給你的承諾、 on 2021-01-29 20:37:25

Question: I'm consuming messages from a Kafka topic "programmatically" using org.springframework.kafka.listener.ConcurrentMessageListenerContainer. I'm wondering if there's a "Spring" way of rewinding the offsets of specific partitions of a topic to go back 'n' messages. I'd like to know the cleanest way of doing this (programmatically, not via the CLI).

Answer 1: If you want to reset the offsets during application startup, use a ConsumerAwareRebalanceListener and perform the seeks on the
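Whatever hook performs the seek (a rebalance listener at startup, or a ConsumerSeekAware callback), the offset arithmetic itself is simple: clamp the current position minus n at the partition's beginning offset. A plain-Python sketch of that calculation:

```python
def rewind_target(current_offset, n, beginning_offset=0):
    """Offset to seek to in order to re-consume the last n records
    of one partition, never seeking before the partition's start."""
    return max(beginning_offset, current_offset - n)

print(rewind_target(current_offset=100, n=10))  # 90
print(rewind_target(current_offset=5, n=10))    # 0 (clamped at the start)
```

In Spring Kafka the resulting value is then passed to the consumer's seek for that TopicPartition inside the callback.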

Problems joining 2 kafka streams (using custom timestampextractor)

Submitted by 允我心安 on 2021-01-29 16:42:45

Question: I'm having problems joining 2 Kafka streams while extracting the date from the fields of my events. The join works fine when I do not define a custom TimestampExtractor, but when I do, the join no longer works. My topology is quite simple:

val builder = new StreamsBuilder()
val couponConsumedWith = Consumed.`with`(Serdes.String(), getAvroCouponSerde(schemaRegistryHost, schemaRegistryPort))
val couponStream: KStream[String, Coupon] = builder.stream(couponInputTopic, couponConsumedWith)
val
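A common cause: a custom TimestampExtractor that returns timestamps in the wrong unit (seconds instead of epoch milliseconds) or from the wrong field places the two streams' events far apart in stream time, so nothing falls inside the join window. A plain-Python model of a windowed stream-stream join to illustrate:

```python
def windowed_join(left_ts, right_ts, window_ms):
    """Pairs of (l, r) timestamps within window_ms of each other,
    mimicking a KStream-KStream join window on extracted event time."""
    return [(l, r) for l in left_ts for r in right_ts if abs(l - r) <= window_ms]

left  = [1_609_459_200_000]  # extractor returns epoch milliseconds
right = [1_609_459_200_500]  # 500 ms later: joins within a 5 s window
wrong = [1_609_459_200]      # same instant, but mistakenly in seconds

print(windowed_join(left, right, window_ms=5_000))  # one joined pair
print(windowed_join(left, wrong, window_ms=5_000))  # [] -- no join at all
```

Printing the extractor's output for a few records of each stream usually reveals the unit or field mismatch immediately.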

ClickHouse: Usage of hash and internal_replication in Distributed & Replicated tables

Submitted by 风格不统一 on 2021-01-29 16:11:48

Question: I have read the following in the Distributed engine documentation about the internal_replication setting:

"If this parameter is set to 'true', the write operation selects the first healthy replica and writes data to it. Use this alternative if the Distributed table 'looks at' replicated tables. In other words, if the table where data will be written is going to replicate them itself. If it is set to 'false' (the default), data is written to all replicas."

In essence, this means that the Distributed table
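The two modes can be modelled directly: with internal_replication=true the Distributed table writes each block to one healthy replica per shard and lets the underlying Replicated*MergeTree table replicate it further; with false, the Distributed engine writes the block to every replica itself. A plain-Python sketch of that fan-out decision (hypothetical replica names):

```python
def write_targets(replicas, internal_replication):
    """Replicas a Distributed-table write goes to, for one shard.

    replicas: ordered list of (replica_name, is_healthy) pairs.
    """
    if internal_replication:
        # First healthy replica only; ReplicatedMergeTree propagates the rest.
        for name, healthy in replicas:
            if healthy:
                return [name]
        return []  # no healthy replica: the write fails or is queued
    # internal_replication=false: the Distributed engine writes everywhere.
    return [name for name, _ in replicas]

shard = [("replica-a", False), ("replica-b", True), ("replica-c", True)]
print(write_targets(shard, internal_replication=True))   # ['replica-b']
print(write_targets(shard, internal_replication=False))  # all three replicas
```

This is why internal_replication=false combined with replicated tables causes duplicated work: the Distributed engine and the replication mechanism would both copy the same data.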