Spark: Extracting summary for a ML logistic regression model from a pipeline model

Submitted by 人走茶凉 on 2019-12-12 19:15:22

Question


I've estimated a logistic regression using pipelines.

My last few lines before fitting the logistic regression:

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Create an assembler that combines the numeric and encoded categorical features
lr_assembler = VectorAssembler(
    inputCols=numericColumns
    + [categoricalCol + "ClassVec" for categoricalCol in categoricalColumns],
    outputCol="lr_features",
)
# Model definition:
lr = LogisticRegression(featuresCol="lr_features", labelCol="targetvar")
# Pipeline definition:
lr_pipeline = Pipeline(stages=indexStages + encodeStages + [lr_assembler, lr])
# Fit the logistic regression model:
lrModel = lr_pipeline.fit(train_train)

I then tried to get the model's summary. However, this line:

trainingSummary = lrModel.summary

results in: 'PipelineModel' object has no attribute 'summary'

Any advice on how to extract, from a pipeline model, the summary information that a logistic regression model usually provides?

Thanks a lot!


Answer 1:


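Pipeline.fit returns a PipelineModel, which has no summary of its own; the fitted logistic regression model is one of its stages. Get the model from stages: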

lrModel.stages[-1].summary

If the model is not the last stage of the pipeline, replace -1 with its index.
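If you would rather not hard-code the index, one option is to look the stage up by type. The sketch below is only an illustration: find_stage is a hypothetical helper name, not part of PySpark, and it assumes nothing beyond the pipeline model exposing a stages list.

```python
def find_stage(pipeline_model, stage_type):
    """Return the last stage of a fitted pipeline that is an instance of stage_type.

    find_stage is a hypothetical helper, not a PySpark function. It relies
    only on the `stages` list that a fitted PipelineModel exposes.
    """
    # Walk backwards so that the stage closest to the end of the pipeline wins.
    for stage in reversed(pipeline_model.stages):
        if isinstance(stage, stage_type):
            return stage
    raise ValueError("no stage of type %s in this pipeline" % stage_type.__name__)


# Usage with the fitted pipeline from the question (requires pyspark):
# from pyspark.ml.classification import LogisticRegressionModel
# fitted_lr = find_stage(lrModel, LogisticRegressionModel)
# trainingSummary = fitted_lr.summary
# print(trainingSummary.totalIterations)
# print(trainingSummary.objectiveHistory)
```

This gives the same result as lrModel.stages[-1] when the logistic regression is the last stage, and keeps working if stages are later added after it.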



Source: https://stackoverflow.com/questions/47685234/spark-extracting-summary-for-a-ml-logistic-regression-model-from-a-pipeline-mod
