Question
When using -> in Spark Streaming 2.0.0 jobs, or when using spark-streaming-kafka-0-8_2.11 v2.0.0, and submitting the job with spark-submit, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 72.0 failed 1 times, most recent failure: Lost task 0.0 in stage 72.0 (TID 37, localhost): java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
I put a brief illustration of this phenomenon in a GitHub repo: spark-2-streaming-nosuchmethod-arrowassoc
Putting only the provided Spark dependencies in build.sbt,
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
)
using -> anywhere in the driver code, packing it with sbt-assembly, and submitting the job results in this error. That isn't a big problem by itself, since ArrowAssoc can be avoided, but spark-streaming-kafka-0-8_2.11 v2.0.0 uses it somewhere internally and triggers the same error.
Doing it like so:
wordCounts.map {
  case (w, c) => Map(w -> c)
}.print()
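For context, a complete driver reproducing this would look roughly like the sketch below. This is a minimal sketch only: the package and class name are taken from the spark-submit command further down, while the batch interval and argument handling are assumptions, not code from the linked repo.
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Lines arrive from a socket; host and port come from the spark-submit arguments.
    val lines = ssc.socketTextStream(args(0), args(1).toInt)
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    // The -> below compiles to a call to scala.Predef.ArrowAssoc,
    // the method the NoSuchMethodError complains about at runtime.
    wordCounts.map { case (w, c) => Map(w -> c) }.print()

    ssc.start()
    ssc.awaitTermination()
  }
}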
Then
sbt assembly
Then
spark-2.0.0-bin-hadoop2.7/bin/spark-submit \
--class org.apache.spark.examples.streaming.NetworkWordCount \
--master local[2] \
--deploy-mode client \
./target/scala-2.11/spark-2-streaming-nosuchmethod-arrowassoc-assembly-1.0.jar \
localhost 5555
Answer 1:
- Spark jobs should be packed without the Scala runtime; i.e., if you're doing it with sbt-assembly, add this (a full build.sbt sketch follows after this list):
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
- In my case, the SPARK_HOME environment variable was pointing to Spark 1.6.2. It doesn't matter where you run spark-submit from, SPARK_HOME should be set properly.
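Put together, a build.sbt along the lines of the sketch below should produce an assembly without the Scala runtime. Only the Spark 2.0.0 coordinates and the assemblyOption line come from this post; the project name, version, scalaVersion, and Kafka dependency line are assumptions, and the sbt-assembly plugin is assumed to be enabled in project/plugins.sbt.
// build.sbt sketch; see the assumptions noted above
name := "spark-2-streaming-nosuchmethod-arrowassoc"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Provided: spark-submit puts these on the classpath at runtime.
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  // The Kafka connector is not part of the Spark distribution, so it stays in the fat jar.
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
)

// Exclude the Scala runtime from the fat jar; the one shipped with Spark is used instead.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)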
Source: https://stackoverflow.com/questions/39395521/spark-2-0-0-streaming-job-packed-with-sbt-assembly-lacks-scala-runtime-methods