spark-streaming-kafka

Does PySpark support the spark-streaming-kafka-0-10 lib?

徘徊边缘 submitted on 2020-07-08 02:03:39

Question: My Kafka cluster version is 0.10.0.0, and I want to use PySpark streaming to read Kafka data. But the Spark Streaming + Kafka Integration Guide, http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html, has no Python code example. So can PySpark use spark-streaming-kafka-0-10 to integrate with Kafka? Thank you in advance for your help!

Answer 1: I also use Spark Streaming with a Kafka 0.10.0 cluster. After adding the following line to your code, you are good to go. spark.jars.packages org
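The DStream-based spark-streaming-kafka-0-10 integration never shipped a Python API; from Python the practical route is Structured Streaming with the spark-sql-kafka-0-10 package, which is what the truncated answer's spark.jars.packages line points at. A minimal sketch in Scala follows (the PySpark calls mirror it almost line for line); the package coordinate, broker address, and topic name are assumptions to adapt to your cluster.

    // Submit with the Kafka source on the classpath, e.g.:
    //   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 app.jar
    // (match the Scala and Spark versions to your cluster).
    import org.apache.spark.sql.SparkSession

    object KafkaReadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-read-sketch").getOrCreate()

        val df = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
          .option("subscribe", "my-topic")                      // assumed topic
          .load()

        // Key and value arrive as binary; cast them to strings for inspection.
        df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
          .writeStream
          .format("console")
          .start()
          .awaitTermination()
      }
    }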

Array of JSON to DataFrame in Spark, received from Kafka

这一生的挚爱 submitted on 2020-06-25 07:10:08

Question: I'm writing a Spark application in Scala using Spark Structured Streaming that receives JSON-formatted data from Kafka. The application can receive either a single JSON object or several, formatted in this way:

    [{"key1":"value1","key2":"value2"},{"key1":"value1","key2":"value2"},...,{"key1":"value1","key2":"value2"}]

I tried to define a StructType like:

    var schema = StructType(Array(
      StructField("key1", DataTypes.StringType),
      StructField("key2", DataTypes.StringType)))

But it
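A StructType alone only matches a single top-level object. For a payload that is a JSON array of objects, the usual approach is to wrap the element schema in an ArrayType, parse with from_json, and explode one row per element. A sketch, assuming the two string keys shown above; broker and topic names are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode, from_json}
    import org.apache.spark.sql.types._

    object JsonArraySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("json-array-sketch").getOrCreate()

        // Schema of one element, wrapped in ArrayType so [{...},{...}] parses.
        val elementSchema = StructType(Array(
          StructField("key1", StringType),
          StructField("key2", StringType)))
        val arraySchema = ArrayType(elementSchema)

        val parsed = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
          .option("subscribe", "json-topic")                    // assumed topic
          .load()
          .select(from_json(col("value").cast("string"), arraySchema).as("items"))
          .select(explode(col("items")).as("item")) // one row per JSON object
          .select(col("item.key1"), col("item.key2"))

        parsed.writeStream.format("console").start().awaitTermination()
      }
    }

A single bare object can be served by the same schema if the producer wraps it in a one-element array; otherwise the plain struct schema can be tried as a fallback.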

How to define Kafka (data source) dependencies for Spark Streaming?

对着背影说爱祢 submitted on 2020-05-08 08:11:14

Question: I'm trying to consume a Kafka 0.8 topic using Spark Streaming 2.0.0, and I'm trying to identify the required dependencies. I have tried using these dependencies in my build.sbt file:

    libraryDependencies += "org.apache.spark" %% "spark-streaming_2.11" % "2.0.0"

When I run sbt package I get unresolved dependencies for all three of these jars, but the jars do exist: https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8_2.11/2.0.0 Please help in debugging this issue, I'm new
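The unresolved dependencies follow from mixing %% with an explicit Scala suffix: %% appends _2.11 automatically, so "spark-streaming_2.11" resolves to the nonexistent spark-streaming_2.11_2.11. A build.sbt sketch, assuming Scala 2.11 and Spark 2.0.0:

    scalaVersion := "2.11.8"

    // Either %% with the bare artifact name...
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming"           % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
    )

    // ...or single % with the suffix spelled out:
    // "org.apache.spark" % "spark-streaming-kafka-0-8_2.11" % "2.0.0"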

Log4j2 Kafka appender does not work with Spark Streaming Kafka Consumer

笑着哭i submitted on 2020-02-02 17:39:08

Question: When I use the log4j2 Kafka appender with my Spark Streaming code, it throws the error below when a spark.sql task is executed.

    Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.serialization.ByteArraySerializer is not an instance of org.apache.kafka.common.serialization.Serializer
        at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstance(AbstractConfig.java:372)

The Spark Streaming code is written in Scala. My build.gradle.kts has the dependencies below. I am using
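"X is not an instance of Y" for a class that plainly implements that interface usually means two copies of kafka-clients sit on the runtime classpath, each loaded by a different classloader, so the appender's ByteArraySerializer implements a different Serializer than the one Kafka checks against. As a sketch (an assumption about this build, not a confirmed diagnosis), forcing the application classpath first keeps a single copy visible:

    spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      ...

Pinning one kafka-clients version shared by the log4j2 appender and the Spark Kafka integration in build.gradle.kts removes the conflict at the source, without classloader flags.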

Failed to find leader for topics; java.lang.NullPointerException at org.apache.kafka.common.utils.Utils.formatAddress

坚强是说给别人听的谎言 submitted on 2020-01-28 03:03:44

Question: When we try to stream data from an SSL-enabled Kafka topic, we face the error below. Can you please help us with this issue?

    19/11/07 13:26:54 INFO ConsumerFetcherManager: [ConsumerFetcherManager-1573151189884] Added fetcher for partitions ArrayBuffer()
    19/11/07 13:26:54 WARN ConsumerFetcherManager$LeaderFinderThread: [spark-streaming-consumer_dvtcbddc101.corp.cox.com-1573151189725-d40a510f-leader-finder-thread], Failed to find leader for Set([inst_monitor_status_test,2], [inst
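The ConsumerFetcherManager and LeaderFinderThread names in the log belong to Kafka's old 0.8-era high-level consumer, which has no SSL support, so against an SSL-only listener it can never find a leader. A sketch of the 0-10 direct stream, which does accept the standard SSL client properties; the broker address, keystore paths, and passwords below are placeholders:

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object SslDirectStreamSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("ssl-kafka-sketch"), Seconds(10))

        val kafkaParams = Map[String, Object](
          ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "broker:9093", // assumed SSL port
          ConsumerConfig.GROUP_ID_CONFIG -> "spark-streaming-consumer",
          ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          // Standard SSL client properties; paths and passwords are placeholders.
          "security.protocol" -> "SSL",
          "ssl.truststore.location" -> "/path/truststore.jks",
          "ssl.truststore.password" -> "changeit",
          "ssl.keystore.location" -> "/path/keystore.jks",
          "ssl.keystore.password" -> "changeit")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](
            Seq("inst_monitor_status_test"), kafkaParams))

        stream.map(_.value).print()
        ssc.start()
        ssc.awaitTermination()
      }
    }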

Reading avro messages from Kafka in spark streaming/structured streaming

Deadly submitted on 2020-01-15 10:07:09

Question: I am using PySpark for the first time. Spark version: 2.3.0. Kafka version: 2.2.0. I have a Kafka producer which sends nested data in Avro format, and I am trying to write code in Spark Streaming / Structured Streaming in PySpark which will deserialize the Avro coming from Kafka into a DataFrame, do transformations, and write it in Parquet format to S3. I was able to find Avro converters in Spark/Scala, but support in PySpark has not yet been added. How do I convert the same in PySpark? Thanks.

Answer 1:
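Built-in Avro functions only appear in the external spark-avro module from Spark 2.4 (and reach PySpark only in 3.0), so on 2.3 there is nothing native to call. If an upgrade is an option, a Scala sketch using from_avro looks like the following; the writer schema, broker, topic, and S3 paths are placeholders:

    // Needs the spark-avro package, e.g.:
    //   spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.5 ...
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.avro.from_avro // package location in Spark 2.4.x
    import org.apache.spark.sql.functions.col

    object AvroFromKafkaSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("avro-kafka-sketch").getOrCreate()

        // Placeholder writer schema; use the producer's real Avro schema.
        val avroSchema =
          """{"type":"record","name":"Event","fields":[
            |  {"name":"id","type":"string"},
            |  {"name":"amount","type":"double"}]}""".stripMargin

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
          .option("subscribe", "avro-topic")                    // assumed topic
          .load()
          .select(from_avro(col("value"), avroSchema).as("event"))
          .select("event.*")

        events.writeStream
          .format("parquet")
          .option("path", "s3a://bucket/events/")            // assumed sink
          .option("checkpointLocation", "s3a://bucket/chk/") // required for streaming
          .start()
          .awaitTermination()
      }
    }

Staying on 2.3, a common workaround is to deserialize the bytes in a plain Python UDF with a library such as fastavro.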

spark-submit failed with Spark Streaming wordcount Python code

旧街凉风 submitted on 2019-12-21 23:53:36

Question: I just copied the Spark Streaming wordcount Python code and used spark-submit to run it on the Spark cluster, but it shows the following errors:

    py4j.protocol.Py4JJavaError: An error occurred while calling o23.loadClass.
    : java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged
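org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper lives in the external spark-streaming-kafka-0-8 jar, which is not bundled with Spark, so the submit command has to pull it in. A sketch, assuming the stock example's script name and arguments and a Spark 2.0.0 / Scala 2.11 build (adjust the coordinate to match yours):

    spark-submit \
      --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
      kafka_wordcount.py localhost:2181 test

Passing the pre-built spark-streaming-kafka assembly jar with --jars works as well.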

Kafka Producer - org.apache.kafka.common.serialization.StringSerializer could not be found

两盒软妹~` submitted on 2019-12-19 02:28:09

Question: I am creating a simple Kafka producer and consumer. I am using kafka_2.11-0.9.0.0. Here is my producer code:

    public class KafkaProducerTest {
        public static String topicName = "test-topic-2";

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("acks", "all");
            props.put("retries", 0);
            props.put("batch.size", 16384);
            props.put("linger.ms", 1);
            props.put("buffer.memory",
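The "could not be found" error usually means the serializer class named in the config cannot be resolved at runtime, i.e. kafka-clients is missing from the classpath the producer actually runs with. A Scala sketch (the Java setup is identical) that references the class directly, so a classpath problem surfaces at compile time instead; broker and topic are the ones from the question:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object ProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        // Referencing the class, not a string literal, fails the build early
        // if kafka-clients is absent from the compile classpath.
        props.put("key.serializer", classOf[StringSerializer].getName)
        props.put("value.serializer", classOf[StringSerializer].getName)

        val producer = new KafkaProducer[String, String](props)
        producer.send(new ProducerRecord[String, String]("test-topic-2", "key", "value"))
        producer.close()
      }
    }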

Deserializing Spark structured stream data from Kafka topic

筅森魡賤 submitted on 2019-12-13 00:34:29

Question: I am working with Kafka 2.3.0 and Spark 2.3.4. I have already built a Kafka connector which reads a CSV file and posts each line from the CSV to the relevant Kafka topic. A line looks like this: "201310,XYZ001,Sup,XYZ,A,0,Presales,6,Callout,0,0,1,N,Prospect". The CSV contains thousands of such lines. The connector posts them to the topic successfully, and I am also able to get the messages in Spark. I am not sure how I can deserialize such a message to my schema. Note that the messages are
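On Spark 2.3/2.4 (from_csv only arrives in 3.0), one way is to cast the Kafka value to a string, split on commas, and index into the resulting array. A sketch; the column names are placeholders for the real CSV schema, and the broker and topic are assumed:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, split}

    object CsvValueSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("csv-value-sketch").getOrCreate()

        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
          .option("subscribe", "csv-topic")                     // assumed topic
          .load()

        // Split the comma-separated value and name each position.
        val fields = split(col("value").cast("string"), ",")
        val parsed = raw.select(
          fields.getItem(0).as("period"),     // e.g. "201310"
          fields.getItem(1).as("product_id"), // e.g. "XYZ001"
          fields.getItem(2).as("type"))       // ...and so on for the remaining columns

        parsed.writeStream.format("console").start().awaitTermination()
      }
    }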