Question
I'm using a PySpark DataFrame. I have some code in which I'm trying to convert the DataFrame to an RDD, but I receive the following error:

AttributeError: 'SparkSession' object has no attribute 'serializer'

What could be the issue?
training, test = rescaledData.randomSplit([0.8, 0.2])
nb = NaiveBayes(smoothing=1.0, modelType="multinomial")
# Train a naive Bayes model.
model = nb.fit(training)
# Make predictions and test accuracy.
predictionAndLabel = test.rdd.map(lambda p: (model.predict(p.features), p.label))
accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()
print('model accuracy {}'.format(accuracy))
Does anyone have any insight into why the statement test.rdd causes an error? The DataFrame contains Row objects of (label, features).
Thanks
Answer 1:
Apologies, as I don't have enough rep to comment. The answer linked below may resolve this, as it pertains to the way the SQL context is initiated:
https://stackoverflow.com/a/54738984/8534357
When initiating the Spark session and the SQL context, I was doing this, which is not right:
sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sc)
This problem was resolved by doing this instead:
sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)
Source: https://stackoverflow.com/questions/53327006/how-to-resolve-error-attributeerror-sparksession-object-has-no-attribute-se