Question
I'm using a PySpark DataFrame. I have some code in which I'm trying to convert the DataFrame to an RDD, but I receive the following error:

AttributeError: 'SparkSession' object has no attribute 'serializer'

What could be the issue?
training, test = rescaledData.randomSplit([0.8, 0.2])
nb = NaiveBayes(smoothing=1.0, modelType="multinomial")
# Train a naive Bayes model.
model = nb.fit(training)
# Make predictions and test accuracy.
predictionAndLabel = test.rdd.map(lambda p: (model.predict(p.features), p.label))
accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()
print('model accuracy {}'.format(accuracy))
Does anyone have any insight into why the statement test.rdd causes an error? The DataFrame contains Row objects of (label, features).
Thanks
Answer 1:
Apologies, as I don't have enough rep to comment. The answer linked below may resolve this, as it pertains to the way the SQL context is initiated:
https://stackoverflow.com/a/54738984/8534357
When initiating the Spark session and the SQL context, I was doing this, which is not right:
sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sc)
This problem was resolved by doing this instead:
sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)
Source: https://stackoverflow.com/questions/53327006/how-to-resolve-error-attributeerror-sparksession-object-has-no-attribute-se