I meet a very strange problem on Spark about serialization. The code is as below:
class PLSA(val sc : SparkContext, val numOfTopics : Int) extends Serializable
{
This is indeed a weird one, but I think I can guess the problem. But first, you have not provided the bare minimum to solve the problem (I can guess, because I've seen 100s of these before). Here are some problems with your question:
def infer(document: RDD[Document], numOfTopics: Int): RDD[DocumentParameter] = {
val docs = documents.map(doc => DocumentParameter(doc, numOfTopics))
}
This method doesn't return RDD[DocumentParameter]
it returns Unit
. You must have copied and pasted code incorrectly.
Secondly you haven't provided the entire stack trace? Why? There is no reason NOT to provide the full stack trace, and the full stack trace with message is necessary to understand the error - one needs the whole error to understand what the error is. Usually a not serializable exception tells you what is not serializable.
Thirdly you haven't told us where method infer
is, are you doing this in a shell? What is the containing object/class/trait etc of infer
?
Anyway, I'm going guess that by passing in the Int
your causing a chain of things to get serialized that you don't expect, I can't give you any more information than that until you provide the bare minimum code so we can fully understand your problem.