Unable to read Kafka topic data using Spark

Asked by -上瘾入骨i · 2021-01-16 18:13

I have data like the below in a topic I created named "sampleTopic":

sid,Believer  

Where the first field is the user name and the second is the song name.

2 Answers
  • 2021-01-16 19:02

Add the spark-sql-kafka library to your build file, as shown below.

    build.sbt

    libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0"
    // change to match your Spark version
    

    pom.xml

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
        <version>2.3.0</version> <!-- change to match your Spark version -->
    </dependency>
    
    

    Then update your code as below:

        package com.sparkKafka

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions._

        object SparkKafkaTopic {

          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("SparkKafka")
              .master("local[*]")
              .getOrCreate()
            import spark.implicits._

            val df = spark
              .readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "sampleTopic") // the topic name from your question
              .load()

            df
              // Kafka delivers key/value as binary; cast value to a string first
              .selectExpr("CAST(value AS STRING)")
              .select(
                split($"value", ",")(0).as("userName"),
                split($"value", ",")(1).as("songName"))
              .writeStream
              .outputMode("append")
              .format("console")
              .start()
              .awaitTermination()
          }
        }

        /*
           +--------+--------+
           |userName|songName|
           +--------+--------+
           |     sid|Believer|
           +--------+--------+
         */
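    As a minimal check of the parsing step (plain Scala, no Spark or Kafka needed), the comma split that `split($"value", ",")` performs on each record can be sketched like this; `SplitDemo` and `parse` are illustrative names, not part of the original code:

    ```scala
    // A minimal sketch of the comma-split parsing applied to each Kafka record.
    object SplitDemo {
      // Split one record like "sid,Believer" into (userName, songName).
      // The limit of 2 keeps any extra commas inside the song name.
      def parse(line: String): (String, String) = {
        val parts = line.split(",", 2)
        (parts(0), parts(1))
      }

      def main(args: Array[String]): Unit = {
        println(parse("sid,Believer")) // prints (sid,Believer)
      }
    }
    ```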
    
  • 2021-01-16 19:03

    The spark-sql-kafka jar, which provides the implementation of the `kafka` data source, is missing.

    You can add the jar via a config option, or build a fat jar that includes spark-sql-kafka. Use the jar version that matches your Spark version.

    val spark = SparkSession.builder()
      .appName("SparkKafka").master("local[*]")
      .config("spark.jars","/path/to/spark-sql-kafka-xxxxxx.jar")
      .getOrCreate()
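
    Alternatively, Spark can download the package from Maven at startup via the `spark.jars.packages` option, so you do not need a local jar path. This is a sketch assuming Spark 2.3.0 built for Scala 2.11; adjust the coordinate to your versions:

    ```scala
    import org.apache.spark.sql.SparkSession

    // Resolve spark-sql-kafka from Maven instead of pointing at a local jar.
    // Coordinate assumes Spark 2.3.0 / Scala 2.11 (an assumption; match yours).
    val spark = SparkSession.builder()
      .appName("SparkKafka").master("local[*]")
      .config("spark.jars.packages",
        "org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0")
      .getOrCreate()
    ```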
    