Type mismatch with identical types in Spark-shell

Asked by Deadly on 2019-12-10 21:23:52

Question


I have built a scripting workflow around the spark-shell, but I'm often vexed by bizarre type mismatches (probably inherited from the Scala REPL) where the "found" and "required" types are identical. The following example illustrates the problem. Executed in paste mode, there is no problem:

scala> :paste
// Entering paste mode (ctrl-D to finish)


import org.apache.spark.rdd.RDD
case class C(S:String)
def f(r:RDD[C]): String = "hello"
val in = sc.parallelize(List(C("hi")))
f(in)

// Exiting paste mode, now interpreting.

import org.apache.spark.rdd.RDD
defined class C
f: (r: org.apache.spark.rdd.RDD[C])String
in: org.apache.spark.rdd.RDD[C] = ParallelCollectionRDD[0] at parallelize at <console>:13
res0: String = hello

but

scala> f(in)
<console>:29: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[C]
 required: org.apache.spark.rdd.RDD[C]
              f(in)
                ^ 

There are related discussions about the Scala REPL and about the spark-shell, but the issue mentioned there seems unrelated (and resolved) to me.

This makes it very hard to write reusable code to run interactively in the REPL, and takes away much of the advantage of working in a REPL in the first place. Is there a solution? (And/or is this a known issue?)

Edits:

The problem occurred with Spark 1.2 and 1.3.0. The test was run on Spark 1.3.0 with Scala 2.10.4.

It seems that, at least in this test, repeating the statements that use the class in a paste block separate from the case class definition mitigates the problem:

scala> :paste
// Entering paste mode (ctrl-D to finish)


def f(r:RDD[C]): String = "hello"
val in = sc.parallelize(List(C("hi1")))

// Exiting paste mode, now interpreting.

f: (r: org.apache.spark.rdd.RDD[C])String
in: org.apache.spark.rdd.RDD[C] = ParallelCollectionRDD[1] at parallelize at <console>:26

scala> f(in)
res2: String = hello

Answer 1:


Unfortunately, this is still an open issue. Code in the spark-shell is wrapped in classes, and that sometimes causes strange behavior like this.
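A commonly suggested mitigation (a sketch only, not guaranteed to cover every case) is to keep the case class and everything that refers to it in a single :paste block, or to pin the class inside a named holder object so that later REPL lines all resolve the same type. The Holder name below is just an illustrative choice:

scala> :paste
// Entering paste mode (ctrl-D to finish)

object Holder extends Serializable {
  case class C(s: String)          // case class pinned inside a stable top-level object
}
import Holder.C                    // later lines refer to Holder.C, not a per-line REPL wrapper's C

def f(r: org.apache.spark.rdd.RDD[C]): String = "hello"
val in = sc.parallelize(List(C("hi")))
f(in)

// Exiting paste mode, now interpreting.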

A related problem: errors like value reduceByKey is not a member of org.apache.spark.rdd.RDD[(...,...)] can be caused by mixing different Spark versions in the same project. If you use IntelliJ, go to File -> Project Structure -> Libraries and delete entries like "SBT: org.apache.spark:spark-catalyst_2.10:1.1.0:jar". You need libraries that match your Spark version, 1.2.0 or 1.3.0.
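For sbt users, the equivalent check is to make sure only one Spark version appears among the dependencies. A minimal build.sbt sketch, assuming Spark 1.3.0 and Scala 2.10:

scalaVersion := "2.10.4"

// single, consistent Spark version; "provided" because spark-shell/spark-submit supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"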

Hope this helps.



Source: https://stackoverflow.com/questions/29768717/type-mismatch-with-identical-types-in-spark-shell
