apache-kafka

PySpark 2.x: Programmatically adding Maven JAR Coordinates to Spark

Submitted by 冷暖自知 on 2021-02-07 19:42:06
Question: The following is my PySpark startup snippet, which is pretty reliable (I've been using it for a long time). Today I added the two Maven coordinates shown in the spark.jars.packages option (effectively "plugging in" Kafka support). Normally that triggers dependency downloads, which Spark performs automatically:

    import sys, os, multiprocessing
    from pyspark.sql import DataFrame, DataFrameStatFunctions, DataFrameNaFunctions
    from pyspark.conf import SparkConf
    from pyspark.sql import SparkSession
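Since the excerpt is cut off before the SparkConf call, here is a minimal sketch of building the comma-separated string that spark.jars.packages expects; the coordinates below are illustrative assumptions, not the ones from the original snippet.

```python
# Build the comma-separated value for spark.jars.packages.
# The coordinates are hypothetical examples (groupId:artifactId:version).
coords = [
    "org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5",
    "org.apache.kafka:kafka-clients:2.4.1",
]
packages = ",".join(coords)

# With pyspark installed, the string would then be applied like this:
# from pyspark.conf import SparkConf
# conf = SparkConf().set("spark.jars.packages", packages)
```

Setting this on the SparkConf before the SparkSession is created is what lets Spark resolve and download the JARs automatically.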

Using onErrorResume to handle problematic payloads posted to Kafka using Reactor Kafka

Submitted by 我只是一个虾纸丫 on 2021-02-07 19:23:40
Question: I am using Reactor Kafka to send Kafka messages and to receive and process them. While receiving the Kafka payload I do some deserialization, and if there is an exception I want to just log that payload (by saving it to Mongo) and then continue receiving other payloads. For this I am using the approach below:

    @EventListener(ApplicationStartedEvent.class)
    public void kafkaReceiving() {
        for (Flux<ReceiverRecord<String, Object>> flux : kafkaService.getFluxReceives()) {
            flux.delayUntil(//some
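The excerpt is truncated before the error handling itself, so as a language-agnostic sketch (plain Python, not Reactor Kafka), the recovery pattern being described — deserialize each payload, divert failures to a side store, keep consuming — looks like this:

```python
# Sketch of the "log the bad payload and resume" pattern: failed
# deserializations go to a dead-letter list (standing in for Mongo)
# while good payloads keep flowing.
import json

def process_stream(payloads, dead_letter):
    results = []
    for raw in payloads:
        try:
            results.append(json.loads(raw))   # the deserialization step
        except ValueError as exc:             # on error: record it, resume
            dead_letter.append((raw, str(exc)))
    return results
```

In Reactor terms this corresponds to handling the error inside the per-record pipeline (so the outer Flux is never terminated), rather than letting the exception propagate to the receiver.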

Are there any problems with this way of starting an infinite loop in a Spring Boot application?

Submitted by 流过昼夜 on 2021-02-07 14:24:11
Question: I have a Spring Boot application and it needs to process some Kafka streaming data. I added an infinite loop to a CommandLineRunner class that runs on startup. Inside it is a Kafka consumer that can be woken up. I added a shutdown hook with

    Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

Will I run into any problems? Is there a more idiomatic way of doing this in Spring? Should I use @Scheduled instead? The code below is stripped of specific Kafka-implementation stuff
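The loop shape being described can be sketched in plain Python (not Spring; names and the poll/handle callables are illustrative): poll until a shutdown flag is set, where a sentinel from poll() stands in for the consumer being woken up.

```python
# Minimal sketch of a wakeable consumer loop: run until the shutdown
# flag is set; poll() returning None stands in for consumer.wakeup().
import threading

shutdown = threading.Event()   # a shutdown hook would call shutdown.set()

def consume_loop(poll, handle):
    out = []
    while not shutdown.is_set():
        batch = poll()
        if batch is None:      # woken up / stream closed: exit cleanly
            break
        out.extend(handle(rec) for rec in batch)
    return out

def make_poll(batches):
    """Test helper: yields each batch once, then None."""
    it = iter(batches)
    return lambda: next(it, None)
```

The key property is that shutdown is cooperative: the hook only flips a flag (or wakes the consumer), and the loop itself decides when to stop, so in-flight records are not dropped mid-handle.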

Kafka Structured Streaming KafkaSourceProvider could not be instantiated

Submitted by 橙三吉。 on 2021-02-07 11:38:13
Question: I am working on a streaming project where I have a Kafka stream of ping statistics like so:

    64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=1 ttl=62 time=0.913 ms
    64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=2 ttl=62 time=0.936 ms
    64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=3 ttl=62 time=0.980 ms
    64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=4 ttl=62 time=0.889 ms

I am trying to read this as a structured stream in
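Independent of the KafkaSourceProvider instantiation problem (which in Spark setups is typically a packaging/version question), parsing one of these ping lines into structured fields is a small, testable step; the regex below assumes exactly the format shown in the excerpt.

```python
# Parse one ping output line into structured fields (seq, ttl, latency in ms).
# The pattern assumes the line format shown in the stream above.
import re

PING_RE = re.compile(r"icmp_seq=(?P<seq>\d+) ttl=(?P<ttl>\d+) time=(?P<ms>[\d.]+) ms")

def parse_ping(line):
    m = PING_RE.search(line)
    if not m:
        return None   # unparseable line: let the caller decide what to do
    return {
        "seq": int(m.group("seq")),
        "ttl": int(m.group("ttl")),
        "ms": float(m.group("ms")),
    }
```

A function like this can be applied per-record once the stream is read from Kafka as plain strings.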

Kafka not starting up if zookeeper.set.acl is set to true

Submitted by 佐手、 on 2021-02-07 10:50:36
Question: I have a setup of Kerberized ZooKeeper and Kerberized Kafka which works fine with zookeeper.set.acl set to false. When I try to start Kafka with the parameter set to true, I get this in the ZooKeeper logs:

    Nov 12 13:36:26 <zk host> docker:zookeeper_corelinux_<zk host>[1195]: [2019-11-12 13:36:26,625] INFO Client attempting to establish new session at /<kafka ip>:54272 (org.apache.zookeeper.server.ZooKeeperServer)
    Nov 12 13:36:26 <zk host> docker:zookeeper_corelinux_<zk host>[1195]: [2019-11
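One common cause in this situation (stated here as an assumption, since the log excerpt is truncated) is that the broker JVM authenticates to ZooKeeper anonymously because no JAAS Client section is supplied, so with zookeeper.set.acl=true the broker cannot create SASL-protected znodes. A sketch of the relevant JAAS fragment, with placeholder principal and keytab paths:

```
// kafka_server_jaas.conf -- Client section governs the ZooKeeper connection
Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka.keytab"
    principal="kafka/broker.example.com@EXAMPLE.COM";
};
```

The file is passed to the broker via -Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf.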

Design Kafka consumers and producers for scalability

Submitted by Deadly on 2021-02-07 10:50:30
Question: I want to design a solution for sending different kinds of e-mails through several providers. The general overview: I have several upstream providers (Sendgrid, Zoho, Mailgun, etc.) that will be used to send the e-mails. For example:

    E-mail for registering a new user
    E-mail for removing a user
    E-mail for the space quota limit

(in general, around 6 types of e-mails). Every type of e-mail should be generated by a producer, converted into a serialized Java object, and sent to the appropriate Kafka consumer
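One way to keep the consumers scalable is to route each e-mail type to its own topic, so the consumer group for each type can be sized independently. A hypothetical sketch of that routing (topic and event-type names are illustrative, not from the original design):

```python
# Hypothetical routing table: one topic per e-mail event type, so the
# consumers for each type can scale (add partitions/instances) separately.
EMAIL_TOPICS = {
    "register_user": "emails.register",
    "remove_user": "emails.remove",
    "quota_limit": "emails.quota",
}

def route(event_type):
    # Unknown types go to a catch-all topic instead of being dropped.
    return EMAIL_TOPICS.get(event_type, "emails.unrouted")
```

A design note: a language-neutral wire format (JSON, Avro) is usually preferred over serialized Java objects, since it does not couple every consumer to the producer's class definitions.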

Enabling SSL between Apache spark and Kafka broker

Submitted by 你说的曾经没有我的故事 on 2021-02-07 10:43:15
Question: I am trying to enable SSL between my Apache Spark 1.4.1 and Kafka 0.9.0.0. I am using the spark-streaming-kafka_2.10 JAR to connect to Kafka, and I am using the KafkaUtils.createDirectStream method to read data from a Kafka topic. Initially I got an OOM issue, which I resolved by increasing the driver memory; after that I saw the issue below. I have done a little reading and found out that spark-streaming-kafka_2.10 uses the Kafka 0.8.2.1 API, which doesn't support SSL (Kafka supports
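For reference, once a client API that supports SSL is in use (Kafka 0.9+, i.e. the 0.10-based Spark integration rather than spark-streaming-kafka_2.10), the client-side settings look like the fragment below; the paths and passwords are placeholders:

```
security.protocol=SSL
ssl.truststore.location=/var/private/ssl/client.truststore.jks
ssl.truststore.password=changeit
ssl.keystore.location=/var/private/ssl/client.keystore.jks
ssl.keystore.password=changeit
```

No amount of client configuration helps while the 0.8.2.1-based consumer API is on the classpath, since that protocol version predates SSL support.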

Develop Apache Kafka producer and load testing using JMeter

Submitted by 霸气de小男生 on 2021-02-07 10:41:25
Question: Is it possible to use JMeter to push messages to Apache Kafka? How do I implement a producer (in Java) to push messages to Kafka? Regards, Anand

Answer 1: I thought there was an answer earlier; maybe not. Have you taken a look at these? I'm using the original kafkameter myself.

    https://github.com/BrightTag/kafkameter
    https://github.com/EugeneYushin/new-api-kafkameter

and tutorials on kafkameter:

    http://www.technix.in/load-testing-apache-kafka-using-kafkameter
    http://codyaray.com/2014/07/custom-jmeter
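Outside of JMeter, the producer side reduces to serializing a payload and handing it to a Kafka client. A minimal sketch of the serialization half (the actual producer calls are commented out because they require a live broker, and kafka-python is an assumed client library, not one named in the question):

```python
# Serialize a payload the way a producer's value_serializer would.
import json

def serialize(value):
    return json.dumps(value).encode("utf-8")

# With a broker available, this would plug into kafka-python like so:
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9092",
#                          value_serializer=serialize)
# producer.send("load-test-topic", {"seq": 1})
```

For load testing specifically, the kafkameter plugins linked in the answer wrap this kind of producer so JMeter thread groups can drive the message rate.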