KafkaUtils class not found in Spark streaming

囚心锁ツ 2021-01-11 20:14

I have just begun with Spark Streaming, and I am trying to build a sample application that counts words from a Kafka stream. Although it compiles with sbt package, running it fails because the KafkaUtils class cannot be found.
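
For context, here is a minimal sketch of the kind of word-count app involved (the ZooKeeper address, consumer group, and topic name are illustrative, not the asker's actual values):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object SampleApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaWordCount")
        val ssc = new StreamingContext(conf, Seconds(10))
        // createStream(context, ZooKeeper quorum, consumer group, Map(topic -> partition count))
        val messages = KafkaUtils.createStream(ssc, "localhost:2181", "wordcount", Map("test" -> 1))
        // Each Kafka record arrives as a (key, message) pair; keep only the message body
        val words = messages.map(_._2).flatMap(_.split(" "))
        words.map((_, 1)).reduceByKey(_ + _).print()
        ssc.start()
        ssc.awaitTermination()
      }
    }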

9 Answers
  • 2021-01-11 20:20

    Please try including all the dependency JARs when submitting the application:

    ./spark-submit --name "SampleApp" --deploy-mode client --master spark://host:7077 \
      --class com.stackexchange.SampleApp \
      --jars $SPARK_INSTALL_DIR/spark-streaming-kafka_2.10-1.3.0.jar,$KAFKA_INSTALL_DIR/libs/kafka_2.10-0.8.2.0.jar,$KAFKA_INSTALL_DIR/libs/metrics-core-2.2.0.jar,$KAFKA_INSTALL_DIR/libs/zkclient-0.3.jar \
      spark-example-1.0-SNAPSHOT.jar

  • 2021-01-11 20:23

    spark-submit does not automatically include the package containing KafkaUtils; you need to have it in your project JAR. For that, create an all-inclusive uber JAR using sbt assembly. Here is an example build.sbt:

    https://github.com/tdas/spark-streaming-external-projects/blob/master/kafka/build.sbt

    You obviously also need to add the assembly plugin to SBT; a typical build-and-submit flow is sketched after the link below.

    https://github.com/tdas/spark-streaming-external-projects/tree/master/kafka/project
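
    With the plugin in place, the flow typically looks like this (the main class and master URL are assumptions, and the assembly JAR name depends on your project's name, Scala version, and version settings):

    sbt assembly
    spark-submit --master spark://host:7077 \
      --class com.stackexchange.SampleApp \
      target/scala-2.10/SampleApp-assembly-1.0.jar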

  • 2021-01-11 20:23

    I met the same problem and solved it by building the JAR with its dependencies included.

    Add the snippet below to pom.xml:

    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <testSourceDirectory>src/test/java</testSourceDirectory>
        <plugins>
          <!--
            Bind the maven-assembly-plugin to the package phase;
            this creates a JAR that bundles all of the dependencies,
            suitable for submitting to a cluster.
           -->
          <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
              <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
              </descriptorRefs>
              <archive>
                <manifest>
                  <mainClass></mainClass>
                </manifest>
              </archive>
            </configuration>
            <executions>
              <execution>
                <id>make-assembly</id>
                <phase>package</phase>
                <goals>
                  <goal>single</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
    </build>    
    

    Then run mvn package and submit the resulting example-jar-with-dependencies.jar.
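
    A sketch of that step (the main class and master URL are placeholders, not from the original answer):

    mvn package
    spark-submit --master spark://host:7077 \
      --class com.example.SampleApp \
      target/example-jar-with-dependencies.jar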

  • 2021-01-11 20:26

    The following build.sbt worked for me. It also requires you to put the sbt-assembly plugin in a file under the project/ directory.

    build.sbt

    name := "NetworkStreaming" // https://github.com/sbt/sbt-assembly/blob/master/Migration.md#upgrading-with-bare-buildsbt
    
    libraryDependencies ++= Seq(
      "org.apache.spark" % "spark-streaming_2.10" % "1.4.1",
      "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.4.1",         // kafka
      "org.apache.hbase" % "hbase" % "0.92.1",
      "org.apache.hadoop" % "hadoop-core" % "1.0.2",
      "org.apache.spark" % "spark-mllib_2.10" % "1.3.0"
    )
    
    assemblyMergeStrategy in assembly := {
      case m if m.toLowerCase.endsWith("manifest.mf")          => MergeStrategy.discard
      case m if m.toLowerCase.matches("meta-inf.*\\.sf$")      => MergeStrategy.discard
      case "log4j.properties"                                  => MergeStrategy.discard
      case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
      case "reference.conf"                                    => MergeStrategy.concat
      case _                                                   => MergeStrategy.first
    }
    

    project/plugins.sbt

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

  • 2021-01-11 20:32

    Use the --packages argument of spark-submit; it takes a comma-separated list of Maven coordinates in the format group:artifact:version and resolves them, along with their transitive dependencies, at submit time.
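
    For instance (the version here mirrors the artifact mentioned in the first answer; pick the connector matching your Spark and Scala versions):

    spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.3.0 \
      --class com.stackexchange.SampleApp \
      spark-example-1.0-SNAPSHOT.jar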

  • 2021-01-11 20:36
    import org.apache.spark.streaming.kafka.KafkaUtils
    

    To make this import resolve, use the build.sbt below:


    name := "kafka"
    
    version := "0.1"
    
    scalaVersion := "2.11.12"
    
    retrieveManaged := true
    
    fork := true
    
    //libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.2.0"
    //libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-8_2.11" % "2.1.0"
    
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
    
    //libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
    
    libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0"
    
    // https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.2.0" % "provided"
    
    // https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8-assembly
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8-assembly" % "2.2.0"
    

    This should fix the issue.
