spark-streaming-kafka

Does PySpark support the spark-streaming-kafka-0-10 lib?

徘徊边缘 submitted on 2020-07-08 02:03:39

Question: My Kafka cluster version is 0.10.0.0, and I want to use PySpark streaming to read Kafka data. But the Spark Streaming + Kafka Integration Guide, http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html, has no Python code example. So can PySpark use spark-streaming-kafka-0-10 to integrate with Kafka? Thank you in advance for your help!

Answer 1: I also use Spark Streaming with a Kafka 0.10.0 cluster. After adding the following line to your code, you are good to go. spark.jars.packages org
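The DStream-based spark-streaming-kafka-0-10 integration never shipped a Python API; from Python the practical route is Structured Streaming with the spark-sql-kafka-0-10 package, which is what the truncated answer's spark.jars.packages line points at. A minimal sketch in Scala follows (the PySpark calls mirror it almost line for line); the package coordinate, broker address, and topic name are assumptions to adapt to your cluster.

    // Submit with the Kafka source on the classpath, e.g.:
    //   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 app.jar
    // (match the Scala and Spark versions to your cluster).
    import org.apache.spark.sql.SparkSession

    object KafkaReadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-read-sketch").getOrCreate()

        val df = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
          .option("subscribe", "my-topic")                      // assumed topic
          .load()

        // Key and value arrive as binary; cast them to strings for inspection.
        df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
          .writeStream
          .format("console")
          .start()
          .awaitTermination()
      }
    }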

Array of JSON to DataFrame in Spark, received from Kafka

这一生的挚爱 submitted on 2020-06-25 07:10:08

Question: I'm writing a Spark application in Scala using Spark Structured Streaming that receives JSON-formatted data from Kafka. The application can receive either a single JSON object or several, formatted in this way:

    [{"key1":"value1","key2":"value2"},{"key1":"value1","key2":"value2"},...,{"key1":"value1","key2":"value2"}]

I tried to define a StructType like:

    var schema = StructType(Array(
      StructField("key1", DataTypes.StringType),
      StructField("key2", DataTypes.StringType)))

But it
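A StructType alone only matches a single top-level object. For a payload that is a JSON array of objects, the usual approach is to wrap the element schema in an ArrayType, parse with from_json, and explode one row per element. A sketch, assuming the two string keys shown above; broker and topic names are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode, from_json}
    import org.apache.spark.sql.types._

    object JsonArraySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("json-array-sketch").getOrCreate()

        // Schema of one element, wrapped in ArrayType so [{...},{...}] parses.
        val elementSchema = StructType(Array(
          StructField("key1", StringType),
          StructField("key2", StringType)))
        val arraySchema = ArrayType(elementSchema)

        val parsed = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
          .option("subscribe", "json-topic")                    // assumed topic
          .load()
          .select(from_json(col("value").cast("string"), arraySchema).as("items"))
          .select(explode(col("items")).as("item")) // one row per JSON object
          .select(col("item.key1"), col("item.key2"))

        parsed.writeStream.format("console").start().awaitTermination()
      }
    }

A single bare object can be served by the same schema if the producer wraps it in a one-element array; otherwise the plain struct schema can be tried as a fallback.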

How to define Kafka (data source) dependencies for Spark Streaming?

对着背影说爱祢 submitted on 2020-05-08 08:11:14

Question: I'm trying to consume a Kafka 0.8 topic using Spark Streaming 2.0.0, and I'm trying to identify the required dependencies. I have tried using these dependencies in my build.sbt file:

    libraryDependencies += "org.apache.spark" %% "spark-streaming_2.11" % "2.0.0"

When I run sbt package I get unresolved dependencies for all three of these jars, but the jars do exist: https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8_2.11/2.0.0 Please help in debugging this issue, I'm new
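The unresolved dependencies follow from mixing %% with an explicit Scala suffix: %% appends _2.11 automatically, so "spark-streaming_2.11" resolves to the nonexistent spark-streaming_2.11_2.11. A build.sbt sketch, assuming Scala 2.11 and Spark 2.0.0:

    scalaVersion := "2.11.8"

    // Either %% with the bare artifact name...
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming"           % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
    )

    // ...or single % with the suffix spelled out:
    // "org.apache.spark" % "spark-streaming-kafka-0-8_2.11" % "2.0.0"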

Log4j2 Kafka appender does not work with Spark Streaming Kafka Consumer

笑着哭i submitted on 2020-02-02 17:39:08

Question: When I use the log4j2 Kafka appender with my Spark Streaming code, it throws the error below when a spark.sql task is executed.

    Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.serialization.ByteArraySerializer is not an instance of org.apache.kafka.common.serialization.Serializer
        at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstance(AbstractConfig.java:372)

The Spark Streaming code is written in Scala. My build.gradle.kts has the dependencies below. I am using
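"X is not an instance of Y" for a class that plainly implements that interface usually means two copies of kafka-clients sit on the runtime classpath, each loaded by a different classloader, so the appender's ByteArraySerializer implements a different Serializer than the one Kafka checks against. As a sketch (an assumption about this build, not a confirmed diagnosis), forcing the application classpath first keeps a single copy visible:

    spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      ...

Pinning one kafka-clients version shared by the log4j2 appender and the Spark Kafka integration in build.gradle.kts removes the conflict at the source, without classloader flags.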

Failed to find leader for topics; java.lang.NullPointerException at org.apache.kafka.common.utils.Utils.formatAddress

坚强是说给别人听的谎言 submitted on 2020-01-28 03:03:44

Question: When we try to stream data from an SSL-enabled Kafka topic, we face the error below. Can you please help us with this issue?

    19/11/07 13:26:54 INFO ConsumerFetcherManager: [ConsumerFetcherManager-1573151189884] Added fetcher for partitions ArrayBuffer()
    19/11/07 13:26:54 WARN ConsumerFetcherManager$LeaderFinderThread: [spark-streaming-consumer_dvtcbddc101.corp.cox.com-1573151189725-d40a510f-leader-finder-thread], Failed to find leader for Set([inst_monitor_status_test,2], [inst
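The ConsumerFetcherManager and LeaderFinderThread names in the log belong to Kafka's old 0.8-era high-level consumer, which has no SSL support, so against an SSL-only listener it can never find a leader. A sketch of the 0-10 direct stream, which does accept the standard SSL client properties; the broker address, keystore paths, and passwords below are placeholders:

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object SslDirectStreamSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("ssl-kafka-sketch"), Seconds(10))

        val kafkaParams = Map[String, Object](
          ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "broker:9093", // assumed SSL port
          ConsumerConfig.GROUP_ID_CONFIG -> "spark-streaming-consumer",
          ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          // Standard SSL client properties; paths and passwords are placeholders.
          "security.protocol" -> "SSL",
          "ssl.truststore.location" -> "/path/truststore.jks",
          "ssl.truststore.password" -> "changeit",
          "ssl.keystore.location" -> "/path/keystore.jks",
          "ssl.keystore.password" -> "changeit")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](
            Seq("inst_monitor_status_test"), kafkaParams))

        stream.map(_.value).print()
        ssc.start()
        ssc.awaitTermination()
      }
    }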

Reading avro messages from Kafka in spark streaming/structured streaming

Deadly submitted on 2020-01-15 10:07:09

Question: I am using PySpark for the first time. Spark version: 2.3.0. Kafka version: 2.2.0. I have a Kafka producer which sends nested data in Avro format, and I am trying to write code in Spark Streaming / Structured Streaming in PySpark which will deserialize the Avro coming from Kafka into a DataFrame, do transformations, and write it in Parquet format to S3. I was able to find Avro converters in Spark/Scala, but support in PySpark has not yet been added. How do I convert the same in PySpark? Thanks.

Answer 1:
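Built-in Avro functions only appear in the external spark-avro module from Spark 2.4 (and reach PySpark only in 3.0), so on 2.3 there is nothing native to call. If an upgrade is an option, a Scala sketch using from_avro looks like the following; the writer schema, broker, topic, and S3 paths are placeholders:

    // Needs the spark-avro package, e.g.:
    //   spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.5 ...
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.avro.from_avro // package location in Spark 2.4.x
    import org.apache.spark.sql.functions.col

    object AvroFromKafkaSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("avro-kafka-sketch").getOrCreate()

        // Placeholder writer schema; use the producer's real Avro schema.
        val avroSchema =
          """{"type":"record","name":"Event","fields":[
            |  {"name":"id","type":"string"},
            |  {"name":"amount","type":"double"}]}""".stripMargin

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
          .option("subscribe", "avro-topic")                    // assumed topic
          .load()
          .select(from_avro(col("value"), avroSchema).as("event"))
          .select("event.*")

        events.writeStream
          .format("parquet")
          .option("path", "s3a://bucket/events/")            // assumed sink
          .option("checkpointLocation", "s3a://bucket/chk/") // required for streaming
          .start()
          .awaitTermination()
      }
    }

Staying on 2.3, a common workaround is to deserialize the bytes in a plain Python UDF with a library such as fastavro.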

spark-submit failed with Spark Streaming wordcount Python code

旧街凉风 submitted on 2019-12-21 23:53:36

Question: I just copied the Spark Streaming wordcount Python code and used spark-submit to run it on the Spark cluster, but it shows the following errors:

    py4j.protocol.Py4JJavaError: An error occurred while calling o23.loadClass.
    : java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged
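org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper lives in the external spark-streaming-kafka-0-8 jar, which is not bundled with Spark, so the submit command has to pull it in. A sketch, assuming the stock example's script name and arguments and a Spark 2.0.0 / Scala 2.11 build (adjust the coordinate to match yours):

    spark-submit \
      --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
      kafka_wordcount.py localhost:2181 test

Passing the pre-built spark-streaming-kafka assembly jar with --jars works as well.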

Kafka Producer - org.apache.kafka.common.serialization.StringSerializer could not be found

两盒软妹~` submitted on 2019-12-19 02:28:09

Question: I am creating a simple Kafka producer and consumer. I am using kafka_2.11-0.9.0.0. Here is my producer code:

    public class KafkaProducerTest {
        public static String topicName = "test-topic-2";

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("acks", "all");
            props.put("retries", 0);
            props.put("batch.size", 16384);
            props.put("linger.ms", 1);
            props.put("buffer.memory",
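The "could not be found" error usually means the serializer class named in the config cannot be resolved at runtime, i.e. kafka-clients is missing from the classpath the producer actually runs with. A Scala sketch (the Java setup is identical) that references the class directly, so a classpath problem surfaces at compile time instead; broker and topic are the ones from the question:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object ProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        // Referencing the class, not a string literal, fails the build early
        // if kafka-clients is absent from the compile classpath.
        props.put("key.serializer", classOf[StringSerializer].getName)
        props.put("value.serializer", classOf[StringSerializer].getName)

        val producer = new KafkaProducer[String, String](props)
        producer.send(new ProducerRecord[String, String]("test-topic-2", "key", "value"))
        producer.close()
      }
    }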

Deserializing Spark structured stream data from Kafka topic

筅森魡賤 submitted on 2019-12-13 00:34:29

Question: I am working with Kafka 2.3.0 and Spark 2.3.4. I have already built a Kafka connector which reads a CSV file and posts each line from the CSV to the relevant Kafka topic. A line looks like this: "201310,XYZ001,Sup,XYZ,A,0,Presales,6,Callout,0,0,1,N,Prospect". The CSV contains thousands of such lines. The connector posts them to the topic successfully, and I am also able to get the messages in Spark. I am not sure how I can deserialize such a message to my schema. Note that the messages are
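On Spark 2.3/2.4 (from_csv only arrives in 3.0), one way is to cast the Kafka value to a string, split on commas, and index into the resulting array. A sketch; the column names are placeholders for the real CSV schema, and the broker and topic are assumed:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, split}

    object CsvValueSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("csv-value-sketch").getOrCreate()

        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
          .option("subscribe", "csv-topic")                     // assumed topic
          .load()

        // Split the comma-separated value and name each position.
        val fields = split(col("value").cast("string"), ",")
        val parsed = raw.select(
          fields.getItem(0).as("period"),     // e.g. "201310"
          fields.getItem(1).as("product_id"), // e.g. "XYZ001"
          fields.getItem(2).as("type"))       // ...and so on for the remaining columns

        parsed.writeStream.format("console").start().awaitTermination()
      }
    }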