I want to perform k-fold cross-validation with PySpark to tune the parameters, and I'm using pyspark.ml. I am getting an AttributeError:
AttributeError: 'DataFrame'
If it is a metric evaluation error, you probably did what I did:
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
# Apply the Spark model to the test set, then convert a copy to pandas for inspection
predictions = model.transform(test)
predDF = predictions.toPandas()
predDF.head()
eval_acc = MulticlassClassificationEvaluator(
    labelCol='Label_index',
    predictionCol='prediction',
    metricName='accuracy'
)
# Evaluate performance
acc = eval_acc.evaluate(predDF)  # Error: predDF is a pandas DataFrame
print(f"accuracy: {acc}")
I forgot that predDF is a pandas DataFrame. The evaluator needs predictions, because that is the Spark DataFrame.
acc = eval_acc.evaluate(predictions) # Works
print(f"accuracy: {acc}")
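To catch this mix-up earlier and with a clearer message, a small guard can be placed in front of the evaluator call. This is only a sketch: `frame_library` and `require_spark_frame` are hypothetical helper names, not part of PySpark, and the check only looks at which package defines the object's type, so it does not prove the object is actually a DataFrame.

```python
def frame_library(obj):
    """Return the top-level package that defines obj's type, e.g. 'pandas' or 'pyspark'."""
    return type(obj).__module__.split(".")[0]


def require_spark_frame(obj):
    """Return obj unchanged if its type comes from pyspark; otherwise raise TypeError."""
    library = frame_library(obj)
    if library != "pyspark":
        # Hypothetical guard: fail here instead of inside the evaluator
        raise TypeError(
            f"expected a Spark DataFrame, got {type(obj).__name__} from '{library}'; "
            "pass the output of model.transform(...), not the result of .toPandas()"
        )
    return obj
```

With the names from the snippet above, the call would become `eval_acc.evaluate(require_spark_frame(predictions))`; passing predDF instead would raise the TypeError before the evaluator runs.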
To convert a pandas DataFrame to a Spark DataFrame:
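from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext.getOrCreate()
# SQLContext is deprecated since Spark 3.0; SparkSession.builder.getOrCreate() also provides createDataFrame
sqlContext = SQLContext(sc)
# pandas_df is the pandas DataFrame to convert
spark_df = sqlContext.createDataFrame(pandas_df)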