Error: AttributeError: 'DataFrame' object has no attribute '_jdf'

余生分开走 2021-02-20 02:08

I want to perform k-fold cross validation with pyspark to fine-tune the parameters, and I'm using pyspark.ml. I am getting an AttributeError.

AttributeError: 'DataFrame' object has no attribute '_jdf'

2 Answers
  • 2021-02-20 02:38

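    If the error is raised during metric evaluation, you probably did the following:

    1. Transformed the test set with the Spark model correctly, then converted the result to a pandas DataFrame to peek at it.
    # Transform the test set with the Spark model, then convert to a pandas DataFrame for inspection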
    predictions = model.transform(test)
    predDF = predictions.toPandas()
    predDF.head()
    
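    2. Then passed the pandas DataFrame to the evaluator:
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    eval_acc = MulticlassClassificationEvaluator(
        labelCol='Label_index',
        predictionCol='prediction',
        metricName='accuracy'
    )
    
    # Evaluate Performance
    acc = eval_acc.evaluate(predDF) # Error: predDF is a pandas DataFrame
    print(f"accuracy: {acc}")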
    

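    I forgot that predDF is a pandas DataFrame. The evaluator needs predictions, because that is a Spark DataFrame. pyspark.ml reaches the underlying JVM object through the _jdf attribute, which only a Spark DataFrame has, hence the AttributeError.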

    acc = eval_acc.evaluate(predictions) # Works
    print(f"accuracy: {acc}")
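
A small guard can turn this mistake into a clearer message before the evaluator is called. The sketch below is only an illustration: the helper names `describe_frame` and `require_spark_frame` are hypothetical, and the check assumes that the class of a Spark DataFrame is defined in a module under `pyspark.` and that of a pandas DataFrame in a module under `pandas.`.

```python
def describe_frame(df):
    """Return 'spark', 'pandas', or 'unknown' from the module that defines df's class."""
    module = type(df).__module__
    if module.startswith("pyspark."):
        return "spark"
    if module.startswith("pandas."):
        return "pandas"
    return "unknown"


def require_spark_frame(df, name="df"):
    """Return df unchanged if it looks like a Spark DataFrame, else raise TypeError."""
    kind = describe_frame(df)
    if kind != "spark":
        # Hypothetical message; raised instead of the opaque '_jdf' AttributeError
        raise TypeError(
            f"{name} is a {kind} object ({type(df).__name__}); "
            "pyspark.ml expects a Spark DataFrame"
        )
    return df
```

With this in place, `eval_acc.evaluate(require_spark_frame(predDF, "predDF"))` would fail with a message that names the offending variable, while `require_spark_frame(predictions, "predictions")` passes the DataFrame through untouched.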
    
  • 2021-02-20 02:59

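    Convert the pandas DataFrame to a Spark DataFrame:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    # createDataFrame accepts a pandas DataFrame and returns a Spark DataFrame
    spark_dff = sqlContext.createDataFrame(panada_df)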
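
Since Spark 2.0, `SparkSession` is the usual entry point and also provides `createDataFrame`, so the same conversion can be wrapped in a helper that only converts when needed. This is a rough sketch rather than a recommended API: `ensure_spark_frame` is a hypothetical name, and the pandas check again assumes the class is defined in a module under `pandas.`.

```python
def ensure_spark_frame(df, spark):
    """Return df as a Spark DataFrame, converting it only if it is a pandas DataFrame.

    `spark` is any object exposing createDataFrame, such as a SparkSession.
    """
    if type(df).__module__.startswith("pandas."):
        return spark.createDataFrame(df)
    # Anything else (for example an existing Spark DataFrame) is returned as-is
    return df


# Usage, assuming pyspark is installed and panada_df is a pandas DataFrame:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   spark_dff = ensure_spark_frame(panada_df, spark)
```

Passing the session in as an argument keeps the helper free of global state; the result can then be handed to a pyspark.ml estimator or evaluator.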
    